I recently started a lecture series at the Vrije University of Amsterdam (VU). As part of this I did a lecture on how to operate unreliable Information Systems in a reliable way – or: Everything breaks, All the time.
Behind the clouds of cloud computing! How can we reliably operate systems that are inherently unreliable?
What if for some hours, we do not have access to the services such as navigators, routers, and other communication technologies? It seems our life will be at stake if major digital services fail! Many promises of digital technologies, from big-data to the Internet of the things and many others are based on reliable infrastructures such as cloud computing. What if these critical infrastructures fail? Do they by the way fail? How the responsible companies and organizations manage these infrastructures in a reliable way? And what are the implications of all this for companies who want to base their business on such services?
As part of the lecture we explored modern complex systems and how we got there, using examples from Google and Amazon’s journey and how it relates to modern enterprise IT. We used the material of Mark Burgess to explore how to prevent systems from spiralling out of control. We closed off looking at knowledge management based on the ‘blameless retrospective’ principles and how feedback cycles from other domains are helping to create more reliable IT.
Relevant links supporting the lecture :
The used presentation can be found here: VU lecture
Recording of the session is available within the VU.
VU Assistant Professor Mohammad Mehrizi posted a nice lecture review on LinkedIn, including a picture with some of the attending students.
Less then 24 hours after I published my ‘thank you team’ post, Datacenter Dynamics announced their nominees for this year’s DCD EMEA 2016 Awards.
I’m very proud one of the mentioned projects in my original post got nominated in the Category Cloud Journey of the Year
The selected project is our move of SDL Machine Translation from our co-lo datacenter to a IAAS cloud solution;
Availability of content in multiple languages is key to driving useful international business. SDL’s Statistical machine translation delivers high quality translation services to more thousands of customers. While SDL’s research organisation already explored a new approach to machine translation, the future development and deployment needed more flexibility in technology choice and dynamic scalability to be commercially successful. Over 10 months, SDL migrated their current workload deployment consisting of hundreds of servers, without customer downtime, to a private Cloud deployment. The migration included a project team of more than 35 staff in 5 time zones. Besides flexibility and scalability gains, the migration saves SDL more than 450k GBP over 4 years.
The teams worked long hours, overcoming many obstacles a long the way. Congrats to all involved!
According to AWS CTO Werner Vogels “Cloud is now the new normal.”
Where the first day keynote at AWS’s ReInvent 2015 conference was all about enabling companies to migrate their current services to the cloud, the second day keynote by Vogels was all about the ‘new normal’ – developer enablement.
With new services like AWS Snowball , AWS Database Migration Service and AWS Schema Conversion Tool , AWS tries to smoothen the migration path from old on-premise infrastructure & application deployments, to using AWS’s Infrastructure As A Service offering (EC2, RDS, VPC, S3, ..).
While these new services help companies to move to a consumption model for compute, storage and networking, it is still very infrastructure focused. Design decisions around (virtual)network layout, load balancers and the build & management of the operating systems (Windows/Linux) are still the customer’s responsibility.
Needing to still deal with all these elements, holds developers back from moving fast as they go from idea to the launch of a new service. It slows the creation of real value to the company down.
In the real ‘new normal’ world, the developer is enabled to deploy a new service by building & releasing something fast, without needing to worry about the infrastructure behind it. By stitching external managed capabilities/services together in a smart way the developer can move even faster.
Where in the past a developer would try to speed releases up by code-reuse with, for example, software libraries, the availability of developer ready services like a fully managed message queuing service (AWS SQS) or a push messaging service (AWS SNS) have enabled developers to move even faster without worrying about the manageability of the solution.
Continue with reading