DevOps Patterns & Antipatterns for Continuous Software Updates

A presentation at DevOpsDays Buffalo 2020 in October 2020 in Buffalo, NY, USA by Baruch Sadogursky

Slide 1

DevOps Patterns & Antipatterns for Continuous Software Updates
What could possibly go wrong?
Kat Cosgrove

Slide 2

Kat Cosgrove
IoT Engineer, Developer Advocate
@Dixie3Flatline
katc@jfrog.com
jfrog.com/shownotes

Slide 3

Why do we update software?

Slide 4

@dixie3flatline @jbaruch #DODBuf jfrog.com/shownotes

Slide 5

WHO ARE WE? USERS!
WHAT DO WE WANT? FEATURES!

Slide 6

Slide 7

“As every company becomes a software company, security vulnerabilities are the new oil spills”

Slide 8

Identify → Fix → Deploy

Slide 9

Identify: Immediate
Fix: OS Update
Deploy: Years

Slide 10

Identify: 2 Months
Fix: Struts Upgrade
Deploy: 2 Months

Slide 11

Slide 12

Slide 13

Slide 14

Identify: As Fast as Possible
Fix: As Fast as Possible
Deploy: As Fast as Possible

Slide 15

Slide 16

Slide 17

Slide 18

Slide 19

This is not a new idea!
• XP: short feedback
• Scrum: reducing cycle time to the absolute minimum
• TPS: decide as late as possible and deliver as fast as possible
• Kanban: incremental change

Slide 20

Slide 21

Slide 22

Slide 23

Slide 24

How do we update?

Slide 25

[Flowchart] Update available
Decisions: Do we want it? · Are there any high risks? · Do we trust the update?
Outcomes: Let’s update! · How about no · Why not?

Slide 26

The problem is not the code, it’s the data. Big data.

Slide 27

[Flowchart] Update available
Decisions: Do we want it? · Are there any high risks? · Do we trust the update? · Can we verify the update?
Outcomes: Let’s update! · How about no · Time consuming verification

Slide 28

Slide 29

Features that we want
Acceptance test costs

Slide 30

The vicious cycle of ever-growing updates
• Users defer the updates
• Vendors pack more features
• Updates are more expensive to test

Slide 31

Slide 32

• Your browser
• Twitter in your browser
• Twitter on your smartphone
• Your smartphone OS?!

[Flowchart] Update available
Decisions: Do we want it? (No one asked you: auto update) · Are there any high risks?
Outcome: Let’s update!

Slide 33

What could possibly go wrong?

Slide 34

Slide 35

Slide 36

Continuous updates pattern: Local Rollback
• Problem: An update went catastrophically wrong and an over-the-air patch can’t reach the device.
• Solution: Save the previous version on the device before updating. Roll back if a problem occurs.
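
The local rollback idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `apply_update`, the `.previous` backup directory, and the `health_check` callback are hypothetical and do not come from the talk. The sketch copies the running version aside, installs the new one, and restores the copy if the health check fails, all without network access.

```python
import shutil
from pathlib import Path


def apply_update(install_dir: Path, new_version_dir: Path, health_check) -> bool:
    """Install new_version_dir into install_dir, keeping a local backup.

    Returns True if the update was kept, False if it was rolled back.
    All names here are hypothetical, chosen for illustration only.
    """
    backup_dir = install_dir.with_name(install_dir.name + ".previous")
    if backup_dir.exists():
        shutil.rmtree(backup_dir)
    # Save the currently installed version on the device before touching it.
    shutil.copytree(install_dir, backup_dir)

    try:
        shutil.rmtree(install_dir)
        shutil.copytree(new_version_dir, install_dir)
        healthy = health_check(install_dir)
    except Exception:
        healthy = False

    if not healthy:
        # Roll back from the local copy: no over-the-air patch is required.
        shutil.rmtree(install_dir, ignore_errors=True)
        shutil.copytree(backup_dir, install_dir)
        return False
    return True
```

A real device would more likely switch between two partitions or use an atomic rename than copy directories, but the principle is the same: the previous version must already be on the device when the update fails.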

Slide 37

Slide 38

Slide 39

Slide 40

Continuous updates pattern: OTA Software Updates
• Problem: Physical recalls are costly. Extremely costly. Also, you can’t force an upgrade.
• Solution: Implement over-the-air software updates, preferably continuous updates.

Slide 41

Continuous OTA updates are like normal OTA updates, but better

Slide 42

Slide 43

Slide 44

Slide 45

Continuous updates pattern: Continuous Updates
• Problem: In batch updates, important features wait for unimportant features.
• Solution: Implement continuous updates.

Slide 46

You thought your problems were hard?

Things under your control         Server-side Updates   IoT (Mobile, Automotive, Edge) Updates

Slide 47

You thought your problems were hard?

Things under your control         Server-side Updates   IoT (Mobile, Automotive, Edge) Updates
The availability of the target    ✓                     ✕

Slide 48

You thought your problems were hard?

Things under your control         Server-side Updates   IoT (Mobile, Automotive, Edge) Updates
The availability of the target    ✓                     ✕
The state of the target           ✓                     ✕

Slide 49

You thought your problems were hard?

Things under your control         Server-side Updates   IoT (Mobile, Automotive, Edge) Updates
The availability of the target    ✓                     ✕
The state of the target           ✓                     ✕
The version on the target         ✓                     ✕

Slide 50

You thought your problems were hard?

Things under your control         Server-side Updates   IoT (Mobile, Automotive, Edge) Updates
The availability of the target    ✓                     ✕
The state of the target           ✓                     ✕
The version on the target         ✓                     ✕
The access to the target          ✓                     ✕

Slide 51

Slide 52

KNIGHT-MARE
• New system reused old APIs
• 1 out of 8 servers was not updated
• New clients sent requests to the machine that contained old code
• Engineers removed working code from the updated servers, increasing the load on the un-updated server
• No monitoring, no alerting, no debugging

Slide 53

Continuous updates pattern: Automated Deployment
• Problem: People suck at repetitive tasks.
• Solution: Automate everything.
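
One way to picture what automation buys you: a script can deploy to every server and then verify that the whole fleet reports the target version, a check a person is likely to skip on the eighth machine. The sketch below is a hypothetical illustration; `deploy_to_fleet`, `deploy`, and `get_version` are assumed names, and the two callbacks stand in for whatever deployment tooling is actually in use.

```python
from typing import Callable, Dict, List


def deploy_to_fleet(
    servers: List[str],
    version: str,
    deploy: Callable[[str, str], None],
    get_version: Callable[[str], str],
) -> Dict[str, str]:
    """Deploy `version` to every server, then verify each one reports it.

    Returns a mapping of server -> reported version for servers that are
    NOT on the target version (empty when the whole fleet is consistent).
    """
    for server in servers:
        deploy(server, version)

    stale = {}
    for server in servers:
        reported = get_version(server)
        if reported != version:
            stale[server] = reported
    return stale
```

A non-empty result is a reason to stop and alert before any traffic reaches the new code.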

Slide 54

Continuous updates pattern: Frequent Updates
• Problem: Infrequent deployments generate anxiety and stress, leading to errors.
• Solution: Update frequently to develop skill and habit.

Slide 55

Continuous updates pattern: State awareness
• Problem: Target state can affect the update process and the behavior of the system after the update.
• Solution: Know and consider target state when updating. Reverting might require reverting the state.
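
A common way to make updates state-aware is to version the stored state and pair every upgrade step with a downgrade step, so that reverting the code can also revert the data. The following is a minimal sketch under that assumption; the `Migration` class, the `migrate` function, and the `schema_version` key are hypothetical names, not something the talk prescribes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Migration:
    version: int                        # schema version this step produces
    upgrade: Callable[[Dict], None]     # mutates state from version-1 to version
    downgrade: Callable[[Dict], None]   # mutates state back again


def migrate(state: Dict, migrations: List[Migration], target: int) -> None:
    """Move `state` to schema version `target`, upgrading or reverting."""
    current = state.get("schema_version", 0)
    ordered = sorted(migrations, key=lambda m: m.version)
    if target > current:
        for m in ordered:
            if current < m.version <= target:
                m.upgrade(state)
    else:
        # Revert the newest steps first.
        for m in reversed(ordered):
            if target < m.version <= current:
                m.downgrade(state)
    state["schema_version"] = target
```

The updater reads the current version from the target before deciding which steps to run, which is the "know and consider target state" part of the pattern.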

Slide 56

Slide 57

Cloud-dark
• New rules are deployed frequently to battle attacks
• A single misconfigured rule was deployed
• The rule included a regex that spiked CPU to 100%
• “Affected region: Earth”

Slide 58

Continuous updates pattern: Progressive Delivery
• Problem: Releasing a bug affects ALL the users.
• Solution: Release to a small number of users first, effectively reducing the blast radius, and observe. If a problem occurs, stop the release, then revert or update the affected users.
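
Releasing to "a small number of users first" is often done by hashing a stable user identifier into a bucket and comparing it with the current rollout percentage. The sketch below assumes that approach; `in_rollout` and its parameters are hypothetical names used for illustration.

```python
import hashlib


def in_rollout(user_id: str, release: str, percent: int) -> bool:
    """Return True if this user is inside the first `percent` of users.

    The bucket is derived from a hash, so a given user always gets the
    same answer for the same release, and raising `percent` only ever
    adds users to the rollout.
    """
    digest = hashlib.sha256(f"{release}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # 0..99
    return bucket < percent
```

Setting `percent` back to 0 stops the release for everyone, which is the "stop the release" half of the pattern.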

Slide 59

Continuous updates pattern: Observability
• Problem: Some problems are hard to trace when relying on user feedback only.
• Solution: Implement tracing, monitoring and logging.

Slide 60

Continuous updates pattern: Rollbacks
• Problem: Fixes might take time, and users suffer in the interim.
• Solution: Implement rollback, the ability to deploy a previous version without delay.
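
In its simplest form, "deploy a previous version without delay" means remembering what was deployed and redeploying the prior artifact instead of building a fix. The sketch below is a hypothetical illustration; the `Releases` class and its `deploy` callback are assumed names standing in for a real deployment tool.

```python
from typing import Callable, List


class Releases:
    """Tracks deployed versions so the previous one can be redeployed."""

    def __init__(self, deploy: Callable[[str], None]) -> None:
        self._deploy = deploy
        self._history: List[str] = []

    def release(self, version: str) -> None:
        self._deploy(version)
        self._history.append(version)

    def rollback(self) -> str:
        if len(self._history) < 2:
            raise RuntimeError("no previous version to roll back to")
        self._history.pop()            # drop the bad version
        previous = self._history[-1]
        self._deploy(previous)         # redeploy an already-built version
        return previous
```

For example, after `release("1.0")` and `release("1.1")`, calling `rollback()` redeploys `"1.0"`.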

Slide 61

Continuous updates pattern: Feature Flags
• Problem: Rollbacks are not always supported by the deployment target platform.
• Solution: Embed two versions of the feature in the app itself and switch between them with API calls.
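
A feature flag can be sketched as both code paths shipped in the same build, with a runtime switch choosing between them. Everything in this example is hypothetical: the flag name `new_dashboard`, the in-memory `_flags` store, and `set_flag`, which stands in for the API call that would flip the flag remotely.

```python
from typing import Dict

# In-memory flag store; a real app would fetch this from a flag service.
_flags: Dict[str, bool] = {"new_dashboard": False}


def set_flag(name: str, enabled: bool) -> None:
    """Stand-in for the API call that flips a flag without a redeploy."""
    _flags[name] = enabled


def render_banner() -> str:
    if _flags.get("new_dashboard", False):
        # New code path: shipped in the app, but dark until enabled.
        return "Welcome to the new dashboard"
    # Old code path: stays in the app as the fallback.
    return "Welcome"
```

Turning the flag off again acts as the rollback on platforms where the previous version cannot be redeployed.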

Slide 62

Continuous updates
• Frequent
• Automatic
• Tested
• Progressively delivered
• State-aware
• Observable
• *Local Rollbacks

Slide 63

Slide 64

Cloud-dark (the sequel)
• While doing maintenance on the Newark->Chicago segment, engineers wanted to route that traffic through Atlanta
• They updated the config manually
• They routed ALL THE TRAFFIC through Atlanta instead
• Atlanta crashed; in a chain reaction, everything else crashed as well
• “Affected region: Earth”

Slide 65

Continuous updates pattern: Automated deployment
• Problem: People suck at repetitive tasks.
• Solution: Automate everything.

Slide 66

[Flowchart] Update available
Decisions: Do we want it? (Sure, why not? Auto update) · Are there any high risks? · Do we trust the update?
Outcome: Let’s update!

Slide 67

“Our goal is to transition from bulk and rare software updates to extremely tiny and extremely frequent software updates; so tiny and so frequent that they provide an illusion of software flowing from development to the update target. We call it the Liquid Software vision.”

Slide 68

Slide 69

Corner cases?

Slide 70

Slide 71

Slide 72

Slide 73

● Q&A and Twitter ads
● @dixie3flatline
● @jbaruch
● #DODBuf
● https://liquidsoftware.com
● https://jfrog.com/shownotes