DevOps Patterns & Antipatterns for Continuous Software Updates

A presentation at GIDS.CLOUD Live in August 2020 by Baruch Sadogursky

Slide 1

DevOps Patterns & Antipatterns for Continuous Software Updates “What can possibly go wrong?!”

Slide 2

Why software updates?

Slide 3

@jbaruch #LiquidSoftware #GIDS http://jfrog.com/shownotes

Slide 4

Slide 5

Slide 6

“As every company becomes a software company, security vulnerabilities are the new oil spills”

Slide 7

Slide 8

Slide 9

Slide 10

This is not a new idea!
• XP: short feedback
• Scrum: reducing cycle time to the absolute minimum
• TPS: decide as late as possible and deliver as fast as possible
• Kanban: incremental change

Slide 11

Slide 12

Show notes: http://jfrog.com/shownotes
• Slides
• Video
• Links
• Comments, ratings
• Raffle

Slide 13

Slide 14

Slide 15

Slide 16

Slide 17

Slide 18

Flowchart: should we take the update? Questions: “Update available”, “Do we want it?”, “Are there any high risks?”, “Do we trust the update?”. Outcomes: “Let’s update!” or “How about no”.

Slide 19

Slide 20

The problem is not the code, it’s the data. Big data.

Slide 21

Flowchart: the same decision with verification added. Questions: “Update available”, “Do we want it?”, “Are there any high risks?”, “Do we trust the update?”, “Can we verify the update?”. Outcomes: “Let’s update!”, “Time consuming verification” or “How about no”.
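Only the node labels of the flowchart survive, so the branch order below is an assumption. It is one plausible reading of the decision, sketched in Python. `Update` and `should_update` are hypothetical names, and the function returns the flowchart's own labels as strings only for readability.

```python
from dataclasses import dataclass


@dataclass
class Update:
    """Hypothetical description of an available update."""
    wanted: bool      # "Do we want it?"
    high_risk: bool   # "Are there any high risks?"
    trusted: bool     # "Do we trust the update?"
    verifiable: bool  # "Can we verify the update?"


def should_update(update: Update) -> str:
    # Assumed ordering: an unwanted update is skipped outright.
    if not update.wanted:
        return "How about no"
    # Low-risk updates are taken without further checks.
    if not update.high_risk:
        return "Let's update!"
    # Risky but trusted updates are taken.
    if update.trusted:
        return "Let's update!"
    # Risky, untrusted updates can still be taken, but only after slow verification.
    if update.verifiable:
        return "Time consuming verification, then: Let's update!"
    return "How about no"
```

Under this reading, every high-risk update that isn't trusted goes through the expensive verification path. That is the cost which pushes teams toward bulkier, rarer updates.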

Slide 22

Slide 23

Features that we want vs. acceptance test costs

Slide 24

• Updates are bulkier and rarer
• Updates are even less trustworthy
• Updates are packed with more features to make them worth taking

Slide 25

What can possibly go wrong?

Slide 26

Slide 27

Slide 28

Continuous updates pattern: Local rollback
Problem: an update went catastrophically wrong, and an over-the-air patch can’t reach the device.
Solution: save the previous version on the device before updating, and roll back to it if a problem occurs.
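This is a minimal sketch of local rollback using the common A/B (dual-slot) approach. It assumes a device that can hold two firmware images and choose which one to boot. `Device`, `update_with_local_rollback` and the `health_check` callback are hypothetical. A real device would persist the active-slot pointer and run the check after an actual reboot.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Device:
    """Hypothetical device with two firmware slots (A/B) and a pointer to the active one."""
    slots: Dict[str, str] = field(default_factory=lambda: {"A": "1.0", "B": ""})
    active: str = "A"

    def inactive(self) -> str:
        return "B" if self.active == "A" else "A"

    def running_version(self) -> str:
        return self.slots[self.active]


def update_with_local_rollback(device: Device, new_version: str,
                               health_check: Callable[[str], bool]) -> str:
    """Install into the inactive slot, switch to it, and fall back locally if the check fails.

    The previous version stays untouched in its own slot, so rollback needs no network.
    """
    previous = device.active
    target = device.inactive()
    device.slots[target] = new_version  # write the new image next to the old one
    device.active = target              # "reboot" into the new slot
    if not health_check(device.running_version()):
        device.active = previous        # local rollback: switch back to the saved version
    return device.running_version()
```

For example, suppose an update to `"1.2-broken"` fails its health check. The device then keeps running the previously installed version, and the broken image sits unused in the other slot.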

Slide 29

Slide 30

Slide 31

Slide 32

Slide 33

Continuous updates pattern: OTA software updates
Problem: physical recalls are costly. Extremely costly. Also, you can’t force an upgrade.
Solution: implement over-the-air software updates, preferably continuous updates.
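With over-the-air delivery, the device installs whatever arrives over the network. A typical first safeguard is therefore to check the download against a digest published with the release. This is a minimal sketch that assumes a trusted manifest supplies `expected_sha256`. `apply_ota_update` and its `install` callback are hypothetical. A bare hash catches corruption but not a malicious publisher, so real OTA systems usually verify a cryptographic signature as well.

```python
import hashlib
import hmac
from typing import Callable


def apply_ota_update(image: bytes, expected_sha256: str,
                     install: Callable[[bytes], None]) -> bool:
    """Install an over-the-air image only if its SHA-256 digest matches the manifest."""
    actual = hashlib.sha256(image).hexdigest()
    # compare_digest performs a constant-time comparison of the two hex strings
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        return False  # reject: corrupted or tampered download, nothing is installed
    install(image)
    return True
```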

Slide 34

Continuous OTA updates are like normal OTA updates, but better

Slide 35

Slide 36

Slide 37

Continuous updates pattern: continuous updates
Problem: in batch updates, important features wait for unimportant ones.
Solution: implement continuous updates.

Slide 38

Slide 39

KNIGHT-MARE
• The new system reused old APIs
• 1 out of 8 servers was not updated
• New clients sent requests to the machine that still contained the old code
• Engineers undeployed the working code from the updated servers, increasing the load on the not-updated server
• No monitoring, no alerting, no debugging

Slide 40

Continuous updates pattern: Automated deployment
Problem: people suck at repetitive tasks.
Solution: automate everything.
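Automation is most valuable when it also checks its own work. In the Knight incident described earlier, 1 out of 8 servers was silently left on old code. This is a minimal sketch that assumes each server can be deployed to, and asked for its running version, through the hypothetical `deploy` and `reported_version` hooks. For example, these could be an agent call and a version endpoint.

```python
from typing import Callable, Dict, List


def deploy_everywhere(servers: List[str], version: str,
                      deploy: Callable[[str, str], None],
                      reported_version: Callable[[str], str]) -> Dict[str, str]:
    """Deploy `version` to every server, then verify that each one actually runs it."""
    for server in servers:
        deploy(server, version)
    # Ask every server what it is running instead of trusting that deploy() worked.
    actual = {server: reported_version(server) for server in servers}
    stale = sorted(server for server, running in actual.items() if running != version)
    if stale:
        # A partial rollout that nobody notices is worse than a loud failure.
        raise RuntimeError(f"deployment incomplete, stale servers: {stale}")
    return actual
```

Failing the whole run on any stale server is a deliberate design choice. It turns "one server was not updated" from a silent condition into an error that a person or a pipeline must handle.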

Slide 41

Continuous updates pattern: frequent updates
Problem: infrequent deployments generate anxiety and stress, which lead to errors.
Solution: update frequently to develop skill and habit.

Slide 42

Continuous updates pattern: state awareness
Problem: the state of the target can affect both the update process and the behavior of the system after the update.
Solution: know and consider the target’s state when updating. Reverting might require reverting the state as well.
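If an update changes stored data, reverting the code alone can leave the old version facing data it cannot read. This is a minimal sketch of state-aware updates, assuming each schema change ships with a forward and a backward migration. The record layout, version numbers and function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, dict]
# Each hypothetical migration is a (forward, backward) pair of functions on the stored state.
Migration = Tuple[Callable[[State], None], Callable[[State], None]]


def split_name(state: State) -> None:
    """v1 -> v2: replace 'name' with 'first'/'last' on every user record."""
    for user in state.values():
        first, _, last = user.pop("name").partition(" ")
        user["first"], user["last"] = first, last


def join_name(state: State) -> None:
    """v2 -> v1: the inverse, needed if the code is rolled back to v1."""
    for user in state.values():
        user["name"] = f"{user.pop('first')} {user.pop('last')}".strip()


MIGRATIONS: List[Migration] = [(split_name, join_name)]  # index i migrates v(i+1) -> v(i+2)


def migrate(state: State, current: int, target: int) -> int:
    """Move stored state between schema versions, forward or backward."""
    while current < target:
        MIGRATIONS[current - 1][0](state)
        current += 1
    while current > target:
        MIGRATIONS[current - 2][1](state)
        current -= 1
    return current
```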

Slide 43

Slide 44

Cloud-dark
• New rules are deployed frequently to battle attacks
• A single misconfigured rule was deployed
• It included a regex that spiked CPU to 100%
• “Affected region: Earth”

Slide 45

Continuous updates pattern: Progressive Delivery
Problem: releasing a bug affects ALL the users.
Solution: release to a small number of users first, reducing the blast radius, and observe. If a problem occurs, stop the release and revert or update the affected users.
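This is a minimal sketch of progressive delivery in two parts:
• Deterministic percentage bucketing keeps the same users in the rollout as it widens.
• A stage gate halts the release when the observed error rate exceeds a budget.

The stage percentages, the 1% error budget and the function names are assumptions for illustration.

```python
import hashlib
from typing import Tuple


def in_rollout(user_id: str, release: str, percent: float) -> bool:
    """Deterministically place a user in the first `percent`% of a release's rollout.

    Hashing user_id together with the release name keeps each user's bucket stable
    as the percentage grows, so users who already got the update keep it.
    """
    digest = hashlib.sha256(f"{release}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999, i.e. 0.01% steps
    return bucket < percent * 100


def next_stage(current_percent: float, error_rate: float,
               max_error_rate: float = 0.01,
               stages: Tuple[float, ...] = (1, 5, 25, 50, 100)) -> float:
    """Advance to the next rollout stage, or halt (return 0) if errors exceed the budget."""
    if error_rate > max_error_rate:
        return 0.0  # stop the release; affected users get reverted or updated
    for stage in stages:
        if stage > current_percent:
            return float(stage)
    return float(current_percent)  # already fully rolled out
```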

Slide 46

Slide 47

Continuous updates pattern: observability
Problem: some problems are hard to trace when relying on user feedback alone.
Solution: implement tracing, monitoring and logging.
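This is a minimal, standard-library-only sketch of the three signals. It assumes two things:
• Log lines are JSON, tagged with the running version and a per-request trace id.
• An in-process error counter is kept per version.

`APP_VERSION`, `handle_request` and the field names are hypothetical. Production systems would typically export these signals to a dedicated tracing and metrics backend instead.

```python
import json
import logging
from collections import Counter

APP_VERSION = "2.3.1"  # hypothetical version string baked in at build time
errors_by_version: Counter = Counter()  # minimal in-process metric


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line so logs can be searched and aggregated."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "msg": record.getMessage(),
            "version": APP_VERSION,
            "trace_id": getattr(record, "trace_id", None),
        })


logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)


def handle_request(trace_id: str, payload: dict) -> bool:
    """Process a request, logging with a trace id and counting failures per version."""
    try:
        int(payload["amount"])  # stand-in for real request handling
        logger.info("request ok", extra={"trace_id": trace_id})
        return True
    except (KeyError, ValueError) as exc:
        errors_by_version[APP_VERSION] += 1
        logger.error("request failed: %r", exc, extra={"trace_id": trace_id})
        return False
```

Tagging every signal with the version makes it possible to tell whether a spike in errors started with a particular update.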

Slide 48

Continuous updates pattern: Rollbacks
Problem: fixes might take time, and users suffer in the meantime.
Solution: implement rollbacks, the ability to deploy a previous version without delay.
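A rollback is only quick if the previous version still exists as a built artifact. This is a minimal sketch that assumes version strings identify immutable, already-published artifacts. `ReleaseHistory` is a hypothetical name.

```python
from typing import List


class ReleaseHistory:
    """Stack of live releases; rolling back reactivates an already-built artifact.

    Nothing is rebuilt on rollback, which is what makes it fast.
    """

    def __init__(self) -> None:
        self._live: List[str] = []

    def deploy(self, version: str) -> str:
        self._live.append(version)
        return version

    @property
    def current(self) -> str:
        return self._live[-1]

    def rollback(self) -> str:
        """Drop the current release and make the previous one live again."""
        if len(self._live) < 2:
            raise RuntimeError("no previous version to roll back to")
        self._live.pop()
        return self._live[-1]
```

Popping the bad release off the stack means a second rollback moves further back in history. It does not toggle back to the release that was just abandoned.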

Slide 49

Continuous updates pattern: feature flags
Problem: the deployment target platform does not always support rollbacks.
Solution: embed two versions of a feature in the app itself and switch between them with API calls.
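This is a minimal sketch of feature flags. It assumes both implementations ship in the same build and a remote configuration call flips the flag at runtime. `FeatureFlags`, `update_from_remote` and the `new_greeting` flag are hypothetical.

```python
from typing import Callable, Dict


class FeatureFlags:
    """Runtime-switchable flags; values would normally come from a remote config API."""

    def __init__(self, defaults: Dict[str, bool]) -> None:
        self._values = dict(defaults)

    def update_from_remote(self, payload: Dict[str, bool]) -> None:
        # Stand-in for the API call that flips flags without redeploying the app.
        self._values.update(payload)

    def is_enabled(self, name: str) -> bool:
        return self._values.get(name, False)  # unknown flags are treated as off


def greeting_v1(name: str) -> str:
    return f"Hello, {name}"


def greeting_v2(name: str) -> str:
    # The new behavior ships inside the same build but stays dark until the flag is on.
    return f"Welcome back, {name}!"


def greeting(flags: FeatureFlags, name: str) -> str:
    """Both versions are embedded in the app; the flag selects which one runs."""
    impl: Callable[[str], str] = greeting_v2 if flags.is_enabled("new_greeting") else greeting_v1
    return impl(name)
```

Turning the flag back off restores the old behavior without any redeployment. This is what lets flags stand in for rollbacks on platforms that don't support them.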

Slide 50

Flowchart: the decision once updates are trusted. Questions: “Update available”, “Do we want it?”, “Are there any high risks?”, “Do we trust the update?”. Outcomes: “Let’s update!” or “Sure, why not? (auto update)”.

Slide 51

“Our goal is to transition from bulk and rare software updates to extremely tiny and extremely frequent software updates; so tiny and so frequent that they provide an illusion of software flowing from development to the update target. We call it the Liquid Software vision.”

Slide 52

Slide 53

Corner cases?

Slide 54

Slide 55

Slide 56

Slide 57

Q&A and Twitter ads • @jbaruch • #LiquidSoftware • #GIDS • https://liquidsoftware.com • https://jfrog.com/shownotes