2013/10/16 ElectricCloud Spark

by Gene Kim on

#sparkec13

Steve Brodie, CEO, ElectricCloud

  • Brodie: "Jeff Immelt, CEO, GE: 'In an industrial company, avoid software at your peril. A software company could disintermediate GE someday, and we're better off being paranoid about that.'"
  • Brodie: "A typical car is now a rolling data center: 40-50% of a car's value is software; 100MM+ lines of code"
  • Brodie: "1979 car: 100 lines of code; 1981: 50K; 2012: 100M LoC; (!!); Joint Strike Fighter: 24MM LoC; 787 8MM LoC"
  • Brodie: "Mercedes announces that they'll deliver self-driving cars by 2020"
  • Brodie: "35% of organizations have adopted Agile"
  • Brodie: "Continuous delivery applies even to embedded systems, with huge benefits; you need fast feedback loops, esp. testing"
  • Brodie: "Kurt Bittner, Forrester: 'If Agile was the opening act to great performance, continuous delivery is the headliner.'"
  • Brodie: "Continuous delivery challenges: enterprise silos; embedded systems"
  • Brodie: "Mobile space is dynamic: Samsung is releasing on average one device per week" (!!!)
  • Brodie: "On average, a developer spends 10 hrs/week waiting on builds; testers wait 18 hrs/week" (!! source: ElectricCloud survey)
  • Brodie: "Brocade: 76K hours/year; GE: 6 hours -> 45m"
  • Brodie: "Pressure to reduce lead times is so enormous, many people turn off automated tests." (Hahaha. Oh, no...)
  • Brodie: "GE reduced error rates by 90%"
  • Brodie: "Cisco Aurora project creates self service builds"
  • Brodie: "FamilySearch: originally had release cycles every 3 months; now down to 10-30m from code commit"

Gene Kim

  • Goal: share my top learnings, and tantalize you with other DevOps patterns you may not have seen that can help your organizations
  • Paul Rogers, GE
  • Justin Arbuckle, GE
  • goal of science: explain observed phenomena, confirm deeply held intuitions, reveal surprising insights
  • @rj_tech: "@RealGeneKim: # Up: Paul Rogers, Chief Development Officer, GE #devops #SparkEC13"
  • @rcbaillargeon: "You don't choose chaos monkey, chaos monkey chooses you" Gene Kim #SparkEC13 #goingtoreusethis
  • @ElectricCloud: RT @rcbaillargeon: Foster cultures of experimentation and practice. #SparkEC13
  • @ElectricCloud: @RealGeneKim: "repetition is the prerequisite to mastery." Learn more at #SparkEC13; watch live stream: http://t.co/78fuyavTZR
  • @rcbaillargeon: Metrics in organizations are indicators of feedback and organizations operating with rigor and balance. #goodpractice #SparkEC13
  • @rcbaillargeon: Two biggest indicators of organizational high performers: Do they have version control? Do they have automated deploy? #SparkEC13

Up: Paul Rogers, Chief Development Officer, GE

  • Rogers: "I'll share the Agile journey of the $1 billion GE dev center in San Ramon" (Awesome. GE is the horsiest of the horses)
  • Rogers: "Us: SSG: 5 P&Ls & 100+ products: 70 scrum teams; 13 locations, $450MM revenue, 800 engineers"
  • (energy, M&D, T&D, Energy Mgmt, Smart Grid)
  • Rogers: "Utility market is dynamic: $50B market: prosperity (urban growth) vs. sustainability (environmental concerns)"
  • Rogers: "Utility market: bring energy to areas where they didn't have electricity (more than several hrs/day)"
  • Rogers: "Old days: COE that built 80% of sw in Energy"
  • Rogers: "Energy software division built from hw acquisitions; goodness expected to happen from sw jammed together" (haha)
  • Rogers: "Needed sr leadership buy-in, b/c all the angry phone calls go up the chain, and leadership has to roll them back down w/support" (haha)
  • Rogers: "Common problem: people don't start Agile, b/c of fear that they're not ready; coach kicks them, saying 'go go go'"
  • Rogers: "Rolled out new infrastructure: Xmas in July; team assembled w/o real understanding of real dev infrastructure"
  • Rogers: "Long cycle biz: new turbine takes 5yrs, $1B R&D; shortened our dev cycle to 3-6 months; took us 5 re-orgs over 5 yrs"
  • Rogers: "Reduced physical locations from, umm, 23-ish teams to 13 teams"
  • @rcbaillargeon: Why choose two weeks for an iteration? Because the organization couldn't do it and needed the objective to make dramatic change. #SparkEC13
  • Rogers: "We adopted Rally as our project management tool, used smart boards to sync Mumbai and San Ramon"
  • Rogers: "People at IBM/HP loved me; we had truckloads of equipment delivered"
  • Rogers: "Discovered that dev was using equipment that they were issued when they started; 'can I have 256G more RAM?'"
  • Rogers: "I was so touched by [impoverished] Dev groups, I offered them the laptop in my bag." (Hahaha)
  • Rogers telling funny/sad story, comparing his Dev groups in London to Oliver Twist, who begged for more memory
  • Rogers: "Stages of maturity: Shu, Ha, Ri"
  • @rcbaillargeon: "Tools and process are often confused" -- Paul Rogers. Yes! And they shouldn't be. #SparkEC13
  • @rcbaillargeon: New environment and new technology needed for transition to agile. #SparkEC13
  • Rogers: "We had massive build cycles, often taking > 1 day; brought down to minutes"
  • Rogers: "Unfortunately, we liked manual testing: reqd 5-10 weeks to complete; automated testing viewed as overhead (vs relief)"
  • Rogers: "Long build times -> only monthly builds" (argh.) "Electric Accelerator enabled 5x daily builds, continuous builds"
  • Rogers: "ThoughtWorks consultants: 'can you believe some orgs reqd up to 10 minutes to build?'" (laughed, b/c ours were days)
  • Rogers: "When you have 5 week test cycles, it means you test your code only once (if you have 6 month cycles)" (Hopeless!)
  • Rogers: "New policy: all new code required automated testing; brought test cycle time from 5 weeks to 5 minutes"
  • Rogers: "We shifted our Dev culture from '# of bugs' to 'are we shipping something we're proud of?'"
  • Rogers: "Biggest game changer: Year 2: creating continuous integration process: automated unit & integration tests"
  • Rogers: "Site A (before): 11 hour builds; broken builds for weeks/months. After Electric Cloud: found 73 build failures, saving 11 hours * 73; 20 min build times: build is ready when day begins"
  • Rogers: "It often took us 3 months to find/fix all the build failures, b/c 11 hour build times & 73 build problems" (!!)
  • Rogers: "As a large biz, we had to conform to NPI tollgates: backlog 0; NPI +50 pts; VTW: -70%" (corp speak for 'less late')
  • Rogers showing NPI tollgate metrics: haha. Each line reqs 2 min of explanations:
    https://pbs.twimg.com/media/BWt3h9YCAAAznl1.jpg
  • Rogers: "Touch time reduction: +3,225%: i.e., 5 wks of wait time reduced to minutes" (Six Sigma scorecard making me laugh/cry)
  • Rogers: "Next step: moving Dev groups back into the business units, vs. COE"
  • Rogers: "Waterfall: planning the most when you know the least; Scrumfall: process got same outcomes as waterfall"
  • @adr0sen: Rogers: Agile saved 700,000h of productivity across GE Energy! #SparkEC13 http://t.co/APdpDp3lmZ
  • @rcbaillargeon: Traditional waterfall -> planning the most when you know the least. #SparkEC13
  • @ElectricCloud: Rogers: GE measures productivity by % of time people didn't do something before. We were then able to deliver high quality code. #SparkEC13
  • Rogers: "Ouch: automated testing was slowing us down during sprint b/c we had to fix all the issues"
  • Rogers: "Prob w/oil rigs around the world: 'didn't know if they were on or not w/o driving car there'; 3 week startup times"
  • Rogers: "GE multi-million contract: remote oil rig monitoring/diag: allow remote oil rig startups; prototype in 30 days" (!!)
  • Rogers: "Effort: 3 months; 7 people; daily story development, deliver product daily; customers using product daily"
  • Rogers: "Currently monitoring 70+ oil wells; running on server in basement (it's still prototype); rollout is in Nov"
  • Rogers: "Gave customers value: product sold via subscription basis ($100s/month); Potential market: 1MM+ wells out there"
  • @rcbaillargeon: "Automated testing was slowing us down with the issues they were finding". #oddthingsdeveloperssay #SparkEC13
  • Rogers: "Be suspicious of quiet teams" (haha) "This is important; when u ask someone to do things diff, quiet == opting out"
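Rogers' Site A numbers can be made concrete with a rough worked calculation. It assumes, as a simplification not stated in the talk, that each of the 73 build failures costs exactly one full build cycle to surface; real triage almost certainly cost more.

```latex
\begin{aligned}
\text{11-hour builds:}\quad & 73 \times 11\,\text{h} = 803\,\text{h} \\
\text{20-minute builds:}\quad & 73 \times 20\,\text{min} = 1460\,\text{min} \approx 24.3\,\text{h}
\end{aligned}
```

Under that assumption, the same 73 failures go from roughly a month of back-to-back build time to about a day.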

Up: Gary Gruver, VP Quality Engineering And Release Operations, Macy's, "Practical Approach To Large-Scale Agile Development"

  • At HP, he led 400 engineers working on LaserJet
  • .@GruverGary: "I wrote a book... it would have been shorter, but then they wouldn't have published it." (har har)
  • .@GruverGary telling his HP Agile/cont delivery story: "400+ engrs for LaserJets, scanners, copiers, etc. Embedded SW, networks"
  • .@GruverGary: "When LaserJets scan docs & put them on network shares, software becomes as complex as PC OS; 10MM LoC"
  • .@GruverGary: "Early on: 6+ weeks to complete test cycle, lots of manual testing; integration to trunk taking 15-20% Dev time"
  • .@GruverGary: "Impact: customer satisfaction suffering, and marketing stopped asking us for features -- they had given up on us"
  • .@GruverGary: "2008: Dev costs grew 2.5x (uh oh); 10 different branches; branches of branches from last year's model"
  • .@GruverGary: "80-90% of resources just porting existing firmware to new products & qualifying"
  • .@GruverGary: "60% of our journey was organizational change management; must get leadership team engaged"
  • .@GruverGary: "1) arch for product variability: lots of #ifdef PRODA or PRODB or PROD_F... moved to HTML file & common code base"
  • .@GruverGary: "The goal: move all Dev onto common code base; 1 branch for all products; biggest inefficiency is multiple branches"
  • .@GruverGary: "On single branch: it's real, it works; it will be your biggest efficiency gain"
  • .@GruverGary: "You need continuous integration, otherwise it's the fastest way to get a big pile of junk"
  • .@GruverGary: "Big inefficiency: 'I can't reproduce ur problem; & you're an idiot because you can't even configure your laptop.'"
  • .@GruverGary: "If you keep breaking the build: shame: 'really? you didn't even try to compile it before you checked in code?'"
  • .@GruverGary: "I have no idea what Agile in the enterprise is if you don't have 'releasable all the time' capability"
  • .@GruverGary: "400+ dev; 10+M LOC; 75K-100K LOC turmoil; 100-150 commits/day"
    https://pbs.twimg.com/media/BWymiFfCYAAQ614.jpg
  • .@GruverGary: "2008->2011 improvements: builds: daily to 10-15/day; amazing..." Look at his chart;
    https://pbs.twimg.com/media/BWymz3qCMAAa1Pt.jpg
  • .@GruverGary: "Dev innovation: 5% -> 40%; first time in decades, firmware wasn't the constraint" Wow.
    https://pbs.twimg.com/media/BWynRRuCMAAB1lp.jpg
  • .@GruverGary: "Making an Enterprise Agile vs Enabling Small Agile Teams In The Enterprise"
  • .@GruverGary: "I joke w/@jezhumble that Agile conferences have become Scrum conferences; lost sight of Agile principles"
  • .@GruverGary: "What must be done at enterprise level: objectives, continuous improvement, CI/CD and test automation infrastructure"
  • .@GruverGary: "Our goal: automate, eliminate or engineer out the drivers that aren't key to the value prop; cost & cycle time"
  • .@GruverGary: "Don't start w/cont delivery; instead: increase quality & feedback freq; reduce time/resources betw trunk/branch; improve deployment repeatability"
  • .@GruverGary: "For embedded people, you don't need to do daily deploys; instead our goal is that our code is always releasable"
  • .@GruverGary: "Don't manage by metrics; instead use metrics to guide conversations abt what is not getting done; I didn't have status mtgs"
  • .@GruverGary: "Find offending code; it can't take triage experts [to tell you] 'you broke all the fax tests 6 weeks ago'; instead, fix it the next day"
  • .@GruverGary: "It's impossible to overstate
  • .@GruverGary: "What is DevOps for embedded systems? There aren't enuf trees in Idaho to do 15K tests that all print; emulators"
  • .@GruverGary: "Cost of testing goes up closer to physical; ergo, must drive all testing upstream (emulators, simulators, etc)"
  • .@GruverGary: "Created chat room: it became a lab felony to leave before confirming build is still green; no code commits on red build"
  • .@GruverGary: "One time, build was red for 5 days: frustrated b/c no features could get in; why? new tests were breaking things"
  • .@GruverGary: "I need a process to let good code in, and keep bad code out; duh. We used git and an auto-revert queue, moving train wrecks on the main track off the tracks" (one of the biggest productivity improvements)
  • .@GruverGary: "Moving from HP to current job at Macy's: deployment issues kept showing up"
  • .@GruverGary: "Macy's a very old company, started in 1800s; don't think there's any code left from back then, but..." (haha)
  • @dbgordon: "I need a process that keeps the bad code out and lets the good code in" @GaryGruver at #sparkec13 on CD for large enterprises.
  • @SmallFrenchGuy: @GRUVERGary when the build is "red", don't pile code on top of it until it's "green" #SparkEC13
  • @dbgordon: The value of the frequency and quality of feedback to your developers cannot be underestimated. @GarryGruver
  • .@GruverGary: "It's a big investment to mock up apps, so avoid them, but they're valuable when you need them; merge them"
  • .@GruverGary: "Getting mechanical engineers to manage sw is a significant challenge; so much uncertainty & yet so much ability to adapt"
  • @jezhumble: RT @ElectricCloud: @jezhumble delivers his #SparkEC13 keynote in ten minutes! Watch live here: http://t.co/78fuyavTZR
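Gruver's rule of letting good code in and keeping bad code out can be sketched as a landing queue that auto-reverts any commit that turns the build red. This is a minimal illustration under stated assumptions, not HP's actual system: the `Commit` and `Trunk` types, the `land_with_auto_revert` function, and the fake build check are all hypothetical. Per the talk, the real setup used git and a full build and test run per change.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Commit:
    sha: str
    author: str


@dataclass
class Trunk:
    history: List[Commit] = field(default_factory=list)   # commits that kept the build green
    reverted: List[Commit] = field(default_factory=list)  # commits bounced back to their authors


def land_with_auto_revert(trunk: Trunk, queue: List[Commit],
                          build_is_green: Callable[[List[Commit]], bool]) -> Trunk:
    """Land queued commits one at a time; revert any commit that turns the build red.

    Because a red commit is removed immediately, nobody's change is piled on top
    of a broken build, and trunk stays releasable.
    """
    for commit in queue:
        trunk.history.append(commit)           # tentatively land the change on trunk
        if not build_is_green(trunk.history):  # build + test the resulting trunk state
            trunk.history.pop()                # auto-revert: trunk returns to its last green state
            trunk.reverted.append(commit)      # the author fixes it and re-queues it
    return trunk


if __name__ == "__main__":
    broken = {"c2"}  # hypothetical: pretend commit c2 breaks the fax tests

    def fake_build(history: List[Commit]) -> bool:
        # Stand-in for a real build and test run: red if any broken commit is present.
        return not any(c.sha in broken for c in history)

    queue = [Commit("c1", "alice"), Commit("c2", "bob"), Commit("c3", "carol")]
    result = land_with_auto_revert(Trunk(), queue, fake_build)
    print([c.sha for c in result.history])   # ['c1', 'c3']
    print([c.sha for c in result.reverted])  # ['c2']
```

The design choice is that reverting is automatic and cheap, so the build never sits red for days waiting on a fix. A production version would likely batch commits and bisect a failing batch to find the offender, which this sketch does not attempt.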

Up: Jez Humble

  • .@jezhumble: "Continuous integration is the place to start; you must improve engineering first before tackling ops"
  • @SmallFrenchGuy: RT @rj_tech: @jezhumble on stage at #SparkEC13 continuous delivery is the CAPABILITY to release at any time. Now that applies to IT and em…
  • .@jezhumble: "Big problem: when biz comes up w/idea, the least efficient way to test whether it's valuable is to build it"
  • .@jezhumble: "In Gantt chart, we have this big band for testing; oddly, we don't have a band for fixing everything"
  • .@jezhumble: "What must be addressed at enterprise level: fix fuzzy end, continuous delivery; not generating scrum teams" (!!)
  • .@jezhumble: "Continuous delivery: make releases boring & low risk, so that you can do them at any time of day"
  • .@jezhumble: "Second Way: fast, automated feedback loops on production readiness of your app upon every change/commit"
  • .@jezhumble: "At Facebook, it used to take 6 wks to incr capacity (bring rack of server online); reduced to 6 hrs to 6 min"
  • .@jezhumble: (oops; brought it down to 6 hours, not 6 min): "Within 6 hrs, servers can start serving traffic"
  • .@jezhumble: "Continuous integration is hard; it's not running CI on feature branches; it's not
  • .@jezhumble: "If I commit bad code, I am selfishly putting every developer in a non-working state; so I must fix or revert"
  • .@jezhumble: "Everyone must check into trunk at least once per day"
  • .@jezhumble: "All of Google runs off of one massive Perforce repo; everyone checks into trunk once per day; it works"
  • .@jezhumble: "When a dev says they're 'done,' what does it mean? Usually 'it works on my laptop.' It should mean 'in Prod'"
  • .@jezhumble citing @GruverGary story: "don't even show me a demo until it's in trunk"
  • @cyetain: Continuous Integration on a Dollar a Day by @jamesshore, recommended by @jezhumble @ #sparkec13 http://t.co/C8QKTgmFAh
  • .@jezhumble: "We shouldn't be measuring ourselves on how many crappy features we call Dev Complete & checked into vers ctl"
  • .@jezhumble: "..instead, it should be about how many useful features we get to customers & reduction of cycle time" (lead time)
  • @SmallFrenchGuy: RT @dbgordon: Dev complete means releasable. No demo and you aren't done until you have automated tests. @jezhumble
  • .@jezhumble: "Deming: 'Cease dependence on mass inspection to achieve quality'; implication: never check bugs into vers ctl"
  • .@jezhumble: "Presence of feature branches indicates the components are too big and are untestable" (i.e., it's an architectural problem that needs to be fixed)
  • @ElectricCloud: RT @adr0sen: .@jezhumble: "If you're doing manual regression test in 2013, all computers get together at night and laugh at you" #SparkEC13
  • .@jezhumble: "Big problem happens when you can't run tests in parallel; it should be a design requirement for tests"
  • .@jezhumble: "Next big problem: lots of automated acceptance tests, but no unit tests; that's too expensive; must fix earlier"
  • @rj_tech: #SparkEC13 @jezhumble write your tests to be parallelizable and run across a lot of servers.
  • @SmallFrenchGuy: @jezhumble Branching should not be used for the lack of component architecture #SparkEC13
  • .@jezhumble: "What success looks like: sw is always releasable on demand (obviating need for stabilizing phase, even for 10MM LoC)"
  • .@jezhumble: "prioritize keeping system releasable" (improvement of daily work is more important than daily work)
  • .@jezhumble: "Here is the ROI of all this: buy @GruverGary book" Slide:
    https://pbs.twimg.com/media/BWy3ZKBCYAA3z3a.jpg
  • .@jezhumble: "You don't reduce costs by reducing costs; you do it by investing in things that reduce waste"
  • .@jezhumble: "Taiichi Ohno on theft of loom plans: 'what's valuable isn't the IP; it's the people & culture that generated IP.'"
  • @dbgordon: Intellectual property is not what's important. Important is having the mindset that leads to intellectual property. @jezhumble
  • @cyetain @RealGeneKim @jezhumble High costs are like slow processes: they hide problems. Fixing those problems naturally lowers the costs
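Humble's point that tests must be designed to run in parallel comes down to test independence: each test builds its own fixtures and shares no mutable state, so a runner can fan them out freely. The sketch below rests on assumptions: it uses a thread pool in one process (`concurrent.futures.ThreadPoolExecutor`) as a stand-in for spreading tests across many servers, and the test functions and the `run_in_parallel` helper are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


# Hypothetical tests. Each one creates its own data and touches no shared state,
# so the order and concurrency with which they run cannot change the outcome.
def test_addition() -> None:
    assert 1 + 1 == 2


def test_upper() -> None:
    assert "ci".upper() == "CI"


def test_own_fixture() -> None:
    cart: List[str] = []  # fresh fixture per test, never a module-level global
    cart.append("brick")
    assert cart == ["brick"]


def run_one(test: Callable[[], None]) -> bool:
    """Run a single test; any exception counts as a failure."""
    try:
        test()
        return True
    except Exception:
        return False


def run_in_parallel(tests: List[Callable[[], None]], workers: int = 4) -> Dict[str, bool]:
    """Fan independent tests out across a worker pool and collect pass/fail by name."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so zip() pairs each test with its result
        outcomes = list(pool.map(run_one, tests))
    return {t.__name__: ok for t, ok in zip(tests, outcomes)}


if __name__ == "__main__":
    print(run_in_parallel([test_addition, test_upper, test_own_fixture]))
    # {'test_addition': True, 'test_upper': True, 'test_own_fixture': True}
```

Swapping the thread pool for a process pool or a distributed runner only works if the tests stay independent; a test that reads a shared global or a fixed database row would make results depend on scheduling.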

Customer Panel: @damonedwards hosts: Jacob Aleksynas, Gap; Jorn-Erik Jensen, Lego; Hugo Gayosso, GM; Manuel Garcia-Duque, ETrade

  • @rcbaillargeon: DevOps is everywhere. Check out the diversity of toys, clothes, cars, and stocks. #SparkEC13 http://t.co/pM77dKModr
  • @adr0sen: Customer Panel - value of Accelerating Software Delivery? Speed (GM), Quality (Gap), Flexibility (Etrade), Scale (Lego) #SparkEC13
  • @SmallFrenchGuy: Value of EC for customers is improving quality, reducing build times and allowing CD #SparkEC13
  • Control vs. responsibility: centralized vs. decentralized (to get rid of bottlenecks)
  • Neat panel from Gap, Lego, GM, ETrade here:

    https://pbs.twimg.com/media/BWzB37GCcAAkHuM.jpg

  • Jacob Aleksynas, Gap

    • jumped right to quality vs. speed delivery
    • .@damonedwards: "Deploys equal broken stuff; you want to deploy more broken stuff, faster? Get out of my office." (haha)
    • Red/green builds are important; measuring apps decoupled, describe how you deliver it
    • Hard coded tests that want to talk to Dev environment; centralized team helps with that
    • But it does limit the ability to change; goal is encourage the right actions, not impede them
    • larger group may have tests that hit production; "that's just not right" (hahaha. awesome)
  • Jorn-Erik Jensen, Lego

    • One product team tried 10K feature branches; 1 branch exposed dependency nightmare
    • 1 branch: they knew they had a problem; 1 trunk for lego.com; spent a lot of time researching; given our state of automated testing, it would be impossible; it would be weeks of testing
    • we decided to do 1 branch of each product, and build automated testing along the way
    • We did an internal roadtrip;
  • Hugo Gayosso, GM

    • @dbgordon: "With @ElectricAccelerator we went from 100 builds a week to 1000's." GM at #sparkec13 CD panel.
    • 6 week cycles: integration leaders would bring in supplier code; problem was how to make it faster: we have to talk to developers; how do you do the tools so that you don't have to change anything?
    • instead of trying to convince them, we found volunteers; someone saw that we could do builds in one hour, and everyone begged to join
    • Powertrain group: 15 years: created centralized tool group to do build tools, versus silos developing each tool
    • People want to write code for cars, not build tools
    • @adr0sen: In complex GM supply chain, Dev resistant to change - so we had to build no-change tooling that could manage 10x speed. #SparkEC13
  • Manuel Garcia-Duque, ETrade

    • Manuel: TODO: talk about compliance at ETrade
    • the only way to scale best practices is through automation, not documentation

Misc

  • @davemcclure: RT @Forbes: Saving Your Infrastructure From #DevOps http://t.co/x0OG1GLF5W cc @ScriptRock @500startups
  • @NoemiFenyvesi: #DevOps – taking the SH out of IT!
  • @kinncj: RT @patrickdebois: TDD gets you thinking about your code-architecture, #devops gets you thinking about your organisational structure #observation
  • RT @adr0sen: Rogers: Agile saved 700,000h of productivity across GE Energy! #SparkEC13 http://t.co/APdpDp3lmZ
  • RT @rcbaillargeon: "Automated testing was slowing us down with the issues they were finding". #oddthingsdeveloperssay #SparkEC13 (Hahaha)
  • RT @dbgordon: "I need a process that keeps bad code out and lets the good code in" @GaryGruver at #sparkec13 on CD for large enterprises.
  • RT @ElectricCloud: @jezhumble delivers his #SparkEC13 keynote in ten minutes! Watch live here: http://t.co/78fuyavTZR
  • RT @rj_tech: @jezhumble: 'continuous delivery is the CAPABILITY to release at any time. Now that applies to IT and embedded sw…
  • RT @cyetain: Continuous Integration on a Dollar a Day by @jamesshore, recommended by @jezhumble @ #sparkec13 http://t.co/C8QKTgmFAh
  • RT @dbgordon: Dev complete means releasable. No demo & you aren't done until you have automated tests. @jezhumble
  • RT @adr0sen: .@jezhumble: "If u're doing manual regression test in 2013, all computers laugh at you at night" #SparkEC13
  • RT @rcbaillargeon: DevOps is everywhere. Check out diversity of toys, clothes, cars, & stocks. #SparkEC13 http://t.co/pM77dKModr