2013/06/19: Velocity Day 2

by Gene Kim on


My favorite conference!

Thank you to @appfirst! They’re giving away “Phoenix Project” books! I’ll be signing them tomorrow at #velocityconf #booth614 http://t.co/qsbnMXVhfT

Souders/Allspaw: Opening

  • Velocity, 2008 vs 2013: shows: 1 vs 4; originally at the San Francisco Airport Marriott Waterfront; Chair: Jesse Robbins; attendees: 350 vs 1800; exhibitors: 17 vs 74; sponsors: 9 vs 63
  • # of books on Amazon on "devops": 35!
  • Vote here: http://bit.ly/VelocityFavorites

Johan Bergstrom, Lund University, Sweden: "What, where and when is risk in system design?"

  • risk is a product of unreliable system components; risk is a function of non-linear relationships
  • Machine metaphor principle: mimic as closely as possible a machine
  • reliability = safety
  • sticking to the rules; matter of design; something the machine has/is; functioning of the whole is reducible to the functioning of the constituent components
  • biggest source of error: human error? "the unreliable human actor"
  • $\text{risk} = \sum_{i=1}^{k} C_i \cdot P_i$
  • Redundant barriers
  • Reducing variability: replace unreliable humans w/reliability technology; controlling unreliable humans with reliable rules; asking the unreliable humans to try harder
  • diverse actors; strong interdependencies; shift between loose and tight coupling; functioning of the whole cannot be reduced to the functioning of the constituent components
  • "More barriers [processes/controls] actually increase risk"; Yes. Reminds me of top myths of reliable systems. #velocityconf
  • A dynamic system does not allow a complete description - constantly changing and no one person can grasp the whole system #velocityconf
  • Reminded of two favorite books related to today's Keynote - "Drift into Failure" and "Managing the Unexpected" #velocityconf #devops
  • "Normalization of deviance" == "Technical debt"; tech debt is the drift into failure, impedes understanding of system
  • "Boundaries around a financial system: financially acceptable behavior, functionally acceptable behavior, workload"
  • "In complex systems, the only way to find out you've gone over the line is to go over the line"
  • "How do you get feedback that you get closer?"
  • "Minor changes all the time, or major ones less often?"
  • Fascinating example: "Twitter t.co link shortener built to reduce risk, but introduced fragility"
  • Managing risk: organizations who are good at this discuss risk all the time, even when things look safe
    • Invite dissenting opinions, constantly debate boundaries and their distance, monitor gap between work prescribed vs performed
    • Focus on understanding how people make tradeoffs guaranteeing safety; safety management is not about avoiding, it's about achieving
  • "Risk is a game played between values and frames of reference"; P. Slovic 2001, "The Risk Game"
  • This changed my perspective dramatically RT @noahsussman: "Amazon deploys once every 11.6s" (2011) http://t.co/rvnhODz1TD #velocityconf
  • @wickett: Knowing your technical debt, and where it came from is a way to understand the history of risk #velocityconf
  • I'd always pick minor code changes often versus major changes rarely. #velocityconf
  • If no single actor can understand the value of transactions in your enterprise you might be in a scary place. #velocityconf
  • Safety measures become their own risks - c.f. Twitter's t.co domain meant to prevent phishing links but then went down itself #velocityconf
  • Risk and safety are both products of the same kinds of processes in complex systems. #velocityconf @bergstrom_johan
  • Risk management: "[leaders] Focus on understanding how people make tradeoffs guaranteeing safety." #velocityconf
  • Make your values explicit - what are the values guiding your perception of risk? #velocityconf
  • Great talk on risk in Web Ops from @bergstrom_johan (and he used Prezi) slides up at http://t.co/vQWz1dwaY3 #velocityconf #yam
  • “Risk is a game played between values and frames of reference.” - @bergstrom_johan at #velocityconf
  • @bergstrom_johan LOVED your keynote. Such a great presentation! #velocityconf
  • Relevant to the #velocityconf keynote, Paul Slovic’s “The risk game”: http://t.co/bSek0lgHsz
  • create a culture of high performance inside your team #velocityconf
  • @RealGeneKim The book is much better than this wikipedia entry: http://t.co/awP8NQMpwW . I recommend it.
  • I love Systemantics book! “@cwestin63: Book is better than this wikipedia entry: http://t.co/awP8NQMpwW . I recommend it.” #velocityconf
  • "The cost of Technical Debt is increased Risk of Failure" (paraphrasing @bergstrom_johan's excellent keynote at #velocityconf)
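
The machine-metaphor risk formula from the keynote notes above (a sum of C_i times P_i over k components) can be sketched as a small calculation. The notes do not define the symbols, so this assumes C_i is the cost of component i failing and P_i is the probability of that failure; the function name and the sample numbers are hypothetical.

```python
def summed_risk(components):
    """Return sum(C_i * P_i) over (cost, probability) pairs.

    Assumption: cost is the consequence of a component failing and
    probability is the chance that it fails. This is the linear,
    component-by-component view that the keynote argued against
    for complex systems.
    """
    total = 0.0
    for cost, probability in components:
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must be between 0 and 1")
        total += cost * probability
    return total


# Hypothetical components: (cost of failure, probability of failure)
example_components = [(100.0, 0.1), (50.0, 0.2)]
example_risk = summed_risk(example_components)
```

The sum treats components as independent, which is exactly the assumption the talk questions: it has no term for the non-linear interactions between components.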

Up: Kyle Rush, Formerly Obama For America: @kylerush

  • .@kylerush: Online fundraising platform: $250 million raised; 4.27M donations; 81.5M views; 17.8M visits; 6 month lifespan
  • Goal: $500M online goal in 2008; expected to raise $1B; needed to raise as much money as what Amazon makes
  • @jdotp: @RealGeneKim @kylerush Wrong JP Gene! :) I'm hoping we get to chat more later, so much to talk about! Go Kyle!
  • "We used Blue State Digital: fundraising app, CMS app"
  • "Started seeing 5-7+ second load times; no CDN, so moved to CloudFront from Amazon; no caching; 46 requests, 700K loads"
  • "Overhauling Blue State app would have been heavy lift project; so we created Donate API, connected to Blue State"
  • .@kylerush: "Next was overhauling CMS: built one using Jekyll (static site generator written in Ruby; its Liquid templates come from Shopify) & GitHub"
  • .@kylerush: "Then we put the static assets on S3 and Akamai, which was awesome"
  • .@kylerush: "80% faster time to paint (vs page load times) as measured by WebPageTest (everyone should use this tool)"
  • .@kylerush: "We love Optimize.ly: faster page had 14% higher conversion vs. slower page; est $32MM in tight fundraising race"
  • .@kylerush: "Redundancy needed: processing $3MM/hour; S3 and HTML fine, but payment processing was weak point"
  • .@kylerush: "We duplicated the payment processing system, put it into EC2, split by geo location"
  • .@kylerush: "1,101 frontend deploys; 4K lines of JavaScript; 240 a/b tests" "Results? 49% increase in conversion rate"
  • .@kylerush is now Director of Technology at The New Yorker! Great talk!
  • .@jespi yeah, contribution pages are a very special, highly-cacheable case, and payment processing is nicely stateless #velocityconf
  • Blog post from @kylerush about their fundraising platform: http://t.co/I2XoG37rgN #velocityconf

  • @velocityconf: Feedback is love; I love #velocityconf, but I think “talk to vendor” ratio boundary got broken this yr. @courtneynash

Arvind Jain: Is The Web Getting Faster?

  • Speed of the network, browser speeds, speed of web page
  • Akamai: "Since 2007: peak connection speed in US increased 5x; now 30 Mbps"
  • "More important than connection speed is round trip time; cable: 26ms; DSL: 43 ms; Google Fiber: 4 ms" (!!)
  • "When Chrome launched in 9/2008, Javascript perf was 20x faster than peers; Last year, perf increased 24%"
  • "Each month, size of web pages keeps going up; average web page is 1.3 MB" (!!)
  • "In 2012, average desktop page load time was 3.7 seconds; 5.7% faster in 2013"
  • "At Google, we keep working to figure out how to get to the 1s page load times that we've all been dreaming of"
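
To see why round trip time can matter more than connection speed, here is a rough back-of-the-envelope model using the figures quoted above (1.3 MB page, 30 Mbps peak speed, 26 ms cable vs 4 ms Google Fiber RTT). The model itself, the function name, and the count of 20 sequential round trips are assumptions for illustration only, not something presented in the talk.

```python
def estimated_load_seconds(page_bytes, bandwidth_mbps, rtt_ms, round_trips):
    """Very rough load-time estimate: transfer time plus latency cost.

    Assumption: round trips happen one after another and bandwidth is
    fully used during transfer. Real browsers overlap requests, so
    treat the result as a comparison aid, not a prediction.
    """
    transfer_seconds = page_bytes * 8 / (bandwidth_mbps * 1_000_000)
    latency_seconds = round_trips * rtt_ms / 1000
    return transfer_seconds + latency_seconds


PAGE_BYTES = 1_300_000   # "average web page is 1.3 MB"
ROUND_TRIPS = 20         # hypothetical number of sequential round trips

cable = estimated_load_seconds(PAGE_BYTES, 30, 26, ROUND_TRIPS)
fiber_rtt = estimated_load_seconds(PAGE_BYTES, 30, 4, ROUND_TRIPS)
double_bandwidth = estimated_load_seconds(PAGE_BYTES, 60, 26, ROUND_TRIPS)
```

Under these assumptions, cutting RTT from 26 ms to 4 ms at the same bandwidth saves more time than doubling bandwidth at the same RTT.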

Brendan Gregg:

  • USE method: check utilization, saturation, errors for every resource @brendangregg #velocityconf
  • Utilization: time resource was busy/degree used, saturation: degree of queued extra work, errors - check output @brendangregg #velocityconf
  • Always a pleasure hearing @brendangregg talk. I remember the "weather symbols" dashboard status method in Sun's Fishworks NAS #velocityconf
  • Moar heatmaps: http://t.co/lERWErXWVh @brendangregg #velocityconf <3
  • OH: "take a picture.. Take a picture.. No. Ok glass.. Take a picture.. Ok glass. No" #velocityconf
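
The USE checklist noted above (utilization, saturation, errors for every resource) could be sketched roughly like this. `ResourceStats`, `use_check`, the sample resources, and the 80% utilization threshold are all hypothetical; the notes do not say how the metrics are collected or which thresholds to use.

```python
from dataclasses import dataclass


@dataclass
class ResourceStats:
    name: str
    utilization: float  # fraction of time the resource was busy, 0.0 to 1.0
    saturation: int     # amount of queued extra work
    errors: int         # count of error events


def use_check(resources, utilization_threshold=0.8):
    """Walk every resource and report which USE metric looks suspicious."""
    findings = []
    for res in resources:
        if res.errors > 0:
            findings.append((res.name, "errors"))
        if res.saturation > 0:
            findings.append((res.name, "saturation"))
        if res.utilization >= utilization_threshold:
            findings.append((res.name, "utilization"))
    return findings


# Hypothetical snapshot of three resources
snapshot = [
    ResourceStats("cpu", utilization=0.95, saturation=4, errors=0),
    ResourceStats("disk", utilization=0.30, saturation=0, errors=2),
    ResourceStats("network", utilization=0.10, saturation=0, errors=0),
]
findings = use_check(snapshot)
```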

I'm on my 8th CEO [at Yahoo] - Steven Woods at #velocityconf @ysaw
“Caching: the cause of, and solution to, all of life’s problems.” — @ysaw #velocityconf

Capacity planning at Twitter

  • "Super Bowl blackout created massive unexpected surge in Twitter traffic." (Capacity planning at Twitter is hard)
  • "Core drivers of capacity: rates of tweets, retweets, favorites, photos (drives storage)"
  • Twitter's cap. planning is strikingly similar to mainframe cap. planning. Is that stochastic smoothing due to scale? #velocityconf
  • "Twitter uses MACD stats to detect capacity breakouts, just like stock traders use to trigger buy/sell"
  • "Twitter rolls out features all the time, and this helps us do capacity planning"
  • "Twitter can simulate/replay production traffic to four canary builds. We can test in production, but that's not advised"
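
For readers unfamiliar with MACD (moving average convergence/divergence), here is a minimal sketch of how it can flag a breakout in a traffic series. The 12/26/9 spans are the conventional stock-trading defaults, not Twitter's settings, and the function names and sample data are hypothetical; the talk notes do not describe Twitter's actual implementation.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = []
    prev = None
    for v in values:
        prev = v if prev is None else prev + alpha * (v - prev)
        out.append(prev)
    return out


def macd(values, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line) for a series of measurements."""
    fast_ema = ema(values, fast)
    slow_ema = ema(values, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd_line, signal)
    return macd_line, signal_line


def upward_breakouts(values, fast=12, slow=26, signal=9):
    """Indices where the MACD line crosses above its signal line."""
    macd_line, signal_line = macd(values, fast, slow, signal)
    hits = []
    for i in range(1, len(values)):
        was_at_or_below = macd_line[i - 1] <= signal_line[i - 1]
        now_above = macd_line[i] > signal_line[i]
        if was_at_or_below and now_above:
            hits.append(i)
    return hits


# Hypothetical request rate: flat for 30 samples, then a sudden surge
traffic = [100.0] * 30 + [100.0 + 20.0 * step for step in range(1, 11)]
surge_points = upward_breakouts(traffic)
```

A crossover only says that the short-term trend has pulled ahead of the long-term one; deciding whether that means "add capacity" would still need thresholds tuned to the service.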

That takes skill.

  • Flickr is doing face detection for their Ken Burns effect @ysaw #velocityconf

  • Facebook uses tinydns for their geo dns. #djb win! #velocityconf

  • RT @jameshartig: Facebook: 12.5M HTTP/SPDY requests/s, 260M TCP connections/s at peak, says Adam Lazur #velocityconf


  • @Stephen Nelson-Smith: Lovely to see old friends at #velocityconf - we’re lucky to have such an awesome tribe - I’m proud to be a part of it. But now: sleep.