A presentation at Azure Day Rome 2020 in Rome, Metropolitan City of Rome, Italy by Baruch Sadogursky
DevOps Patterns & Antipatterns for Continuous Software Updates “What can possibly go wrong?!”
Why software updates?
@jbaruch #LiquidSoftware #AzureDayRome http://jfrog.com/shownotes
“As every company becomes a software company, security vulnerabilities are the new oil spills”
Identify, Fix, Deploy
Identify: immediately. Fix: OS upgrade. Deploy: years.
Identify: 2 months. Fix: Struts upgrade. Deploy: 2 months.
Identify: as fast as possible
Fix: as fast as possible
Deploy: as fast as possible
This is not a new idea!
XP: short feedback
Scrum: reducing cycle time to absolute minimum
TPS: decide as late as possible and deliver as fast as possible
Kanban: incremental change
🎩 @ErinMeyerINSEAD’s “Culture Map”
Shownotes: http://jfrog.com/shownotes
Slides, video, links, comments, ratings, raffle
[Flowchart. Start: Update available. Decision nodes: Do we want it? Are there any high risks? Do we trust the update? Branch labels: Yes, No, “How about no”. Outcome: Let’s update!]
Number of artifacts as a symptom of complexity
[Chart, from 2000 to today: Agile, Continuous Integration, Continuous Delivery, Infrastructure as Code, Microservices, Docker, Serverless, IoT]
The problem is not the code, it’s the data. Big data.
[Flowchart. Start: Update available. Decision nodes: Do we want it? Are there any high risks? Do we trust the update? Can we verify the update? Branch labels: Yes, No, “How about no”. Extra step: time-consuming verification. Outcome: Let’s update!]
Features that we want vs. acceptance test costs
Your browser. Twitter in your browser. Twitter on your smartphone. Your smartphone OS?!
[Flowchart. Start: Update available. Decision nodes: Do we want it? Are there any high risks? Branch labels: Yes, No, “No one asked you (auto update)”. Outcome: Let’s update!]
What can possibly go wrong?
Continuous updates pattern: Local rollback
Problem: An update went catastrophically wrong and an over-the-air patch can’t reach the device.
Solution: Save the previous version on the device before updating. Roll back if a problem occurs.
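One way to picture local rollback is a device with two version slots. The sketch below is illustrative only: the `Device` class, the slot names "A" and "B", and the `health_check` callback are assumptions made for this example, not something taken from the talk or from a real OTA framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Device:
    """Hypothetical device with two version slots (names are illustrative)."""
    slots: Dict[str, str] = field(default_factory=lambda: {"A": "1.0.0", "B": ""})
    active: str = "A"

    @property
    def version(self) -> str:
        return self.slots[self.active]

    def update(self, new_version: str, health_check: Callable[[str], bool]) -> bool:
        """Install into the inactive slot, switch to it, roll back locally on failure."""
        previous = self.active
        standby = "B" if previous == "A" else "A"
        self.slots[standby] = new_version  # the previous version stays untouched
        self.active = standby              # stands in for booting the new version
        if health_check(self.version):
            return True
        self.active = previous             # local rollback, no network required
        return False


device = Device()
# Simulate an update that fails its health check.
ok = device.update("2.0.0", health_check=lambda v: v != "2.0.0")
print(ok, device.version)
```

Because the old version is never overwritten during the update, the device can recover even when it is unreachable over the air.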
Continuous updates pattern: OTA software updates
Problem: Physical recalls are costly. Extremely costly. Also, you can’t force an upgrade.
Solution: Implement over-the-air software updates, preferably continuous updates.
Continuous OTA updates are like normal OTA updates, but better
Continuous updates pattern: Continuous updates
Problem: In batch updates, important features wait for unimportant ones.
Solution: Implement continuous updates.
You thought your problems are hard?
Things under your control        Server-side updates   IoT (mobile, automotive, edge) updates
The availability of the target   ✓                     ✕
The state of the target          ✓                     ✕
The version on the target        ✓                     ✕
The access to the target         ✓                     ✕
KNIGHT-MARE
New system reused old APIs
1 out of 8 servers was not updated
New clients sent requests to the machine that still contained old code
Engineers undeployed working code from the updated servers, increasing the load on the not-updated server
No monitoring, no alerting, no debugging
Continuous updates pattern: Automated deployment
Problem: People suck at repetitive tasks.
Solution: Automate everything.
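A minimal sketch of what "automate everything" might include: a scripted rollout that also verifies the result, which is the kind of check that would flag one server out of eight being left behind. The `deploy_all` function, the server names, and the in-memory `fleet` dictionary are all hypothetical stand-ins for real deployment tooling.

```python
from typing import Dict, FrozenSet, List


def deploy_all(
    fleet: Dict[str, str],
    version: str,
    unreachable: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Update every reachable server, then return the ones still on another version."""
    for name in fleet:
        if name not in unreachable:
            fleet[name] = version  # stands in for the real install step
    return sorted(name for name, current in fleet.items() if current != version)


# Hypothetical fleet of eight servers; one cannot be reached during the rollout.
fleet = {f"server-{i}": "1.0" for i in range(1, 9)}
stale = deploy_all(fleet, "2.0", unreachable=frozenset({"server-8"}))
if stale:
    print("Rollout incomplete, do not send new traffic yet:", stale)
```

The point of the sketch is the return value: the script does not assume success, it reports which targets were missed so that a person or a pipeline can stop before new clients go live.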
Continuous updates pattern: Frequent updates
Problem: Infrequent deployments generate anxiety and stress, leading to errors.
Solution: Update frequently to develop skill and habit.
Continuous updates pattern: State awareness
Problem: Target state can affect the update process and the behavior of the system after the update.
Solution: Know and consider target state when updating. Reverting might require reverting the state.
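To illustrate why reverting code may also mean reverting state, here is a small sketch of paired apply/revert migrations. The `schema_version` key, the `email` field, and the `migrate` helper are assumptions invented for this example.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Migration = Tuple[Callable[[State], None], Callable[[State], None]]  # (apply, revert)


def add_email_field(state: State) -> None:
    state["email"] = ""


def drop_email_field(state: State) -> None:
    state.pop("email", None)


# Hypothetical migration list; index i moves the schema from version i to i + 1.
MIGRATIONS: List[Migration] = [(add_email_field, drop_email_field)]


def migrate(state: State, target: int) -> None:
    """Move the stored schema version up or down to `target`, one step at a time."""
    current = int(state.get("schema_version", 0))
    while current < target:
        MIGRATIONS[current][0](state)  # apply the next migration
        current += 1
    while current > target:
        current -= 1
        MIGRATIONS[current][1](state)  # revert the most recent migration
    state["schema_version"] = current


state: State = {"schema_version": 0, "name": "device-42"}
migrate(state, 1)  # update: code and state move forward together
migrate(state, 0)  # rollback: the state is reverted along with the code
print(state)
```

Reading the stored version first is the "know the target state" half of the pattern; the paired revert functions are the "reverting the state" half.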
Cloud-dark
New rules are deployed frequently to battle attacks
Deployment of a single misconfigured rule
The rule included a regex that spiked CPU to 100%
“Affected region: Earth”
Continuous updates pattern: Progressive delivery
Problem: Releasing a bug affects ALL the users.
Solution: Release to a small number of users first, effectively reducing the blast radius, and observe. If a problem occurs, stop the release, then revert or update the affected users.
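A rough sketch of how a percentage-based rollout could be wired up. Everything here is an assumption for illustration: the `in_rollout` and `next_percent` helpers, the 100-bucket scheme, the doubling schedule, and the 1% error threshold.

```python
import hashlib


def in_rollout(user_id: str, release: str, percent: int) -> bool:
    """Deterministically place a user in one of 100 buckets for this release."""
    digest = hashlib.sha256(f"{release}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent


def next_percent(current: int, error_rate: float, max_error_rate: float = 0.01) -> int:
    """Widen the rollout while it looks healthy; drop to 0 to stop the release."""
    if error_rate > max_error_rate:
        return 0
    return min(100, max(1, current * 2))


percent = 5
exposed = [u for u in (f"user-{i}" for i in range(1000)) if in_rollout(u, "v2", percent)]
print(len(exposed), "of 1000 simulated users receive v2 first")
print("next step:", next_percent(percent, error_rate=0.002))
```

Hashing the user ID keeps each user in the same bucket between checks, so raising the percentage only adds users and never flips someone back and forth between versions.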
Continuous updates pattern: Observability
Problem: Some problems are hard to trace when relying on user feedback alone.
Solution: Implement tracing, monitoring and logging.
Continuous updates pattern: Rollbacks
Problem: Fixes might take time, and users suffer in the meantime.
Solution: Implement rollback, the ability to deploy a previous version without delay.
Continuous updates pattern: Feature flags
Problem: Rollbacks are not always supported by the deployment target platform.
Solution: Embed two versions of the feature in the app itself and switch between them with API calls.
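A minimal sketch of the idea, assuming an in-memory flag store. The `FeatureFlags` class, the `new_checkout` flag name, and the two checkout functions are hypothetical; a real app would typically refresh its flags from a remote service.

```python
from typing import Dict


class FeatureFlags:
    """In-memory flag store; a real app would refresh it from a remote API."""

    def __init__(self) -> None:
        self._flags: Dict[str, bool] = {}

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled

    def is_enabled(self, name: str) -> bool:
        return self._flags.get(name, False)  # unknown flags fall back to the old path


def old_checkout(total: float) -> str:
    return f"old:{total:.2f}"


def new_checkout(total: float) -> str:
    return f"new:{total:.2f}"


def checkout(flags: FeatureFlags, total: float) -> str:
    """Both versions ship in the app; the flag decides which one runs."""
    if flags.is_enabled("new_checkout"):
        return new_checkout(total)
    return old_checkout(total)


flags = FeatureFlags()
print(checkout(flags, 10))        # old path by default
flags.set("new_checkout", True)   # stands in for a remote API call
print(checkout(flags, 10))        # new path, no redeploy needed
flags.set("new_checkout", False)  # "rollback" without touching the installed app
```

Turning the flag off plays the role of a rollback on platforms where redeploying the previous version is not an option.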
Continuous updates pattern: Zero-downtime updates
Problem: You will probably lose all your users if you shut down for 5 weeks to perform an update.
Solution: Perform small, frequent, zero-downtime continuous OTA updates.
Continuous updates
Frequent
Automatic
Tested
Progressively delivered
State-aware
Observability
*Local Rollbacks
[Flowchart. Start: Update available. Decision nodes: Do we trust the update? Do we want it? Are there any high risks? Branch labels: Yes, No, “Sure, why not? (auto update)”. Outcome: Let’s update!]
“Our goal is to transition from bulk and rare software updates to extremely tiny and extremely frequent software updates; so tiny and so frequent that they provide an illusion of software flowing from development to the update target. We call it the Liquid Software vision.”
Corner cases?
Q&A and Twitter ads
@jbaruch #LiquidSoftware #AzureDayRome https://liquidsoftware.com https://jfrog.com/shownotes
So, you want to update the software for your users, be it the nodes in your K8s cluster, a browser on a user’s desktop, an app on a user’s smartphone or even a user’s car. What can possibly go wrong?
In this talk, we’ll analyze real-world software update fails and how multiple DevOps patterns that fit a variety of scenarios could have saved the developers. Manually making sure that everything works before sending an update, and expecting users to run acceptance tests before they update, is most definitely not on the list of such patterns.
Join us for some awesome and scary continuous update horror stories and some obvious (and some not so obvious) proven ideas for improvement and best practices you can start following tomorrow.