by Gene Kim on
We tease who we love, right? :) @botchagalupe @hsiboy
@rlengwinat: RT @stack72: devs need to take ownership of instrumenting their code. enable self service metric creation - key culture shift #VelocityConf
@KojiISHIMOTO: RT @bluesmoon: if you missed my late night tweet, the summary and slides of our #velocityconf talk on #webperf are online: http://t.co/N ...
Yo, u're at wrong conference, mate. Java One isn't until next month! Haha. Thx for kind words! (From fellow (ex) developer) :) @redturtleltd
@hsiboy: @RealGeneKim It was a great talk, packed out and i spotted @nasrat in the crowd too. lolz taken in good jest
Michael Rembetsy, Director Ops Engineering (@mrembetsy), Patrick McDonnell, Senior Ops Engr (@mcdonnps)
Next up: Michael Rembetsy, Director Ops Engineering (@mrembetsy), Patrick McDonnell, Senior Ops Engr (@mcdonnps)
.@mrembetsy/@mcdonnps: "Here's how we've scaled our culture at Etsy. 125 engrs; 12 ops people"
.@mrembetsy/@mcdonnps: "sprouter was the 'middleware of distrust', initially designed to give Dev access to the database"
@cmsj: Etsy has 12 ops folks, 125 engineers, 350 employees total.
.@mrembetsy/@mcdonnps: "2008 was the year of pain; deploys took hours, w/ops at helm, code didn't work; no communication
.@mrembetsy/@mcdonnps: "Pushes failed, but couldn't restart easily, causing 500 errors across the site."
.@mrembetsy/@mcdonnps: "After a deployment day, I'd be completely spent. That's when we created internal blog fix.etsy.com"
.@mrembetsy: "We realized we had to fix tech debt; can't keep living in sea of engineering filth" (hahaha)
.@mrembetsy: "The day before Cyber Monday (largest ecommerce day), thought 'WTF did I get myself into?'
.@mrembetsy: "2008 stats: $87.3M; 163M visits; made decision to switch to CDN
@mrmanc_tech: Etsy 2008: 250 servers in two DCs, deploy take hours, and complicated process. Rollback similarly complicated. Sound familiar? #VelocityConf
@stack72: Etsy posts outage updates on their blog. Total transparency
.@mrembetsy: "Sea change was 2009; brought people back inside to Brooklyn [less remote]; 1st step was to manage banner
.@mrembetsy: "Breakthrough: could change banner w/o pushing code; then built Deployinator; infrastructure overhaul
.@mrembetsy: "Moved to hiring people in Brooklyn, moving offices to DUMBO; standup mtgs start to improve communication
@botchagalupe: etsy / deployinator http://t.co/0iI1tW69
.@mrembetsy: "2009: the year stability arrives; mgmt stops saying 'go do this.' Ppl happy to come to work; dev helping rack
.@mrembetsy: "Ended scheduled downtime; site remains up as much as possible; Master db purchased as capacity stopgap
.@mrembetsy: "2009 takeaways: beginning of DevOps culture; sales grow 102% that year
.@mrembetsy: "2009 action items: stabilize most painful part in org; hire staff that make diff; pick projects that matter
.@mrembetsy: "Just ship it
@_neckbeard: RT @mpaluchowski: The place you work in must fit with culture. Can't have lean, creative, agile in a plain, dull office.
.@mrembetsy: "2010: renewed energy; Kellen comes in as VP Engr; @allspaw comes in as VP Ops (was on etsy advisory board)
.@mrembetsy: "Started Code As Craft.etsy.com; created continuous integration team at end 2009; started standardizing on PHP
.@mrembetsy: "Use it or nothing else: benefits: everyone could read/rewrite your code; MySQL migration begins from Postgres
.@mrembetsy: "If it moves, graph it; tools: ganglia, graphite, built 'incomparable line technology' to correlate w/deploys
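A minimal sketch of what "graph it" can look like for Graphite, which accepts metrics as plaintext lines of the form `<path> <value> <unix timestamp>`. The metric names (`site.errors.500`, `deploys.web`) and the helper names are hypothetical, not from the talk, and drawing a deploy marker as a vertical line is only an assumption about how deploys were correlated with graphs:

```python
import time


def graphite_line(path, value, timestamp=None):
    # Graphite's plaintext protocol: "<metric path> <value> <unix timestamp>\n"
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def deploy_marker(timestamp=None):
    # Hypothetical metric name: emitting a 1 at deploy time gives the
    # grapher a point it can overlay on any other metric.
    return graphite_line("deploys.web", 1, timestamp)
```

Each line would then be written to the Carbon plaintext listener (TCP port 2003 by default), for example over a plain socket.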
.@mrembetsy: "Nagios: 7000 checks for 700 hosts: we used to have lots more; pulled chks for unimportant 3am wakeups
.@mrembetsy: "Management ideals: accept failures but don't lower stds; blameless post-mortems; career planning
.@mrembetsy: "Happy company = happy community
@scoobiedoobie: Have a blameless postmortem if you have a failure
@mpaluchowski: "We're being woken up at 3AM for stupid things. Why?" Find out and fix.
@lozzd: I want to expand on my "If it moves, graph it" comment: Even if it doesn't move, you should graph it, because it might move!
RT @lozzd: Expanding on my "If it moves, graph it" comment: if it doesn't move, you should graph it, because it might move!
@itarchitectkev: Etsy chose PHP and MySQL so Dev and Ops could understand the stack and everyone can contribute if they wanted to. #VelocityConf
.@mrembetsy: "Etsy is now B Corp, certified to follow sustainability, etc.; Belief: happy = successful"
Kudos to @mrembetsy/@mcdonnps. Fantastic talk!!! cc @stack72: Hearing how etsy built a culture is amazing. Lots to learn here. Kevin Costner was right, build it and they will come #VelocityConf
@allspaw: +100 RT @stack72: Hearing how etsy built culture is amazing. Lots to learn here. Kevin Costner right, build it & they came #VelocityConf
Amazed anyone found this room! Excited! RT @botchagalupe: An englishman talking about queuing "iLoviT" .. @ph
Pattern: @unixdaemon: "job queues allow you to do slow work outside of the http request" @ph
.@ph: "Anything not shown on page load: no one will notice: email, tweets, external apis, webhooks: even if it's only couple of secs"
.@ph: "Use job queues for: email, tweets, external apis, webhooks: even if it's only couple of secs"
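A minimal sketch of that pattern, assuming an in-process `queue.Queue` stands in for a real job queue. The names `handle_signup`, `send_welcome_email`, and `run_worker_once`, and the job dict shape, are hypothetical and not from the talk:

```python
import queue

jobs = queue.Queue()  # stand-in for a real job queue (assumption)
sent = []             # records "emails" so the sketch has no side effects


def send_welcome_email(address):
    # Placeholder for slow work (SMTP call, external API, webhook).
    sent.append(address)


def handle_signup(address):
    # Inside the HTTP request: only enqueue, then respond right away.
    jobs.put({"type": "welcome_email", "address": address})
    return "200 OK"


def run_worker_once():
    # Outside the request: a worker picks up one job and does the slow part.
    job = jobs.get_nowait()
    if job["type"] == "welcome_email":
        send_welcome_email(job["address"])
    jobs.task_done()
```

The request returns before any email is sent; the email only goes out once a worker calls `run_worker_once()`.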
.@ph: "User exp: 200-500ms threshold for 'wtf? click reload'; follower: 1-10s; stranger: 1-2m old data is just fine"
@scoobiedoobie: RT @itarchitectkev: Nice hearing about developing the right culture and giving back to the community. @etsy rocks. #VelocityConf
Astonishing. RT @itarchitectkev: Nice hearing about developing right culture & giving back to community. @etsy rocks. #VelocityConf
@stack72: Service = code + infrastructure #VelocityConf
.@ph: "Problem: if you're outside HTTP request [like in job queue], can't take adv of 500 error handling"; drop/requeue?
.@ph: "Resilient system soln: reserve_job(), remove() upon complete (prob: can't guarantee job only run once; bad for email)
.@ph: "Choice: run job 0 or 1; or run job 1 or many"
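The two choices can be sketched with a toy in-memory queue. Only `reserve_job()` and `remove()` come from the talk; `ToyQueue`, `release_reserved`, and the two runner functions are hypothetical names, and the "crash" flags merely simulate a worker dying at the worst moment:

```python
class ToyQueue:
    """In-memory stand-in for a queue with reserve/remove semantics."""

    def __init__(self, jobs):
        self.ready = list(jobs)   # jobs waiting to run
        self.reserved = []        # jobs handed out but not yet removed

    def reserve_job(self):
        job = self.ready.pop(0)
        self.reserved.append(job)
        return job

    def remove(self, job):
        self.reserved.remove(job)

    def release_reserved(self):
        # Roughly what a queue does on timeout: put unfinished jobs back.
        self.ready.extend(self.reserved)
        self.reserved.clear()


def run_at_least_once(q, work, crash_before_remove=False):
    # "1 or many": remove only after the work has been done.
    job = q.reserve_job()
    work(job)
    if crash_before_remove:
        return  # simulated worker death; job is still reserved
    q.remove(job)


def run_at_most_once(q, work, crash_before_work=False):
    # "0 or 1": remove first, so a crash loses the job instead of repeating it.
    job = q.reserve_job()
    q.remove(job)
    if crash_before_work:
        return  # simulated worker death; job is gone for good
    work(job)
```

With `run_at_least_once`, a crash between the work and `remove()` means the job runs again after it is released, which is why this mode is bad for email. With `run_at_most_once`, the same crash drops the job entirely.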
.@ph: "Choice: slow reliable queue AND Fast unreliable queue"
.@ph: "Idempotent: ok to run twice: resizing photo; Not: sending emails/tweets; Almost: external API"
.@ph: "pattern: update canonical source in request: queue idempotent job to update denormalized copies"
.@ph: "#4: jobs don't run in order: no shared state between workers: update user 20; NOT reindex user 20 w/these attributes"
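A minimal sketch of these two points together, assuming plain dicts stand in for the canonical store and the denormalized copy. The names `users`, `search_index`, `pending`, `update_user`, and `reindex_user` are hypothetical. The job carries only the user ID, and the worker re-reads canonical state, so running it twice or out of order gives the same result:

```python
users = {20: {"name": "Ann"}}   # canonical source (assumption: a dict)
search_index = {}               # denormalized copy
pending = []                    # stand-in job queue holding user IDs only


def update_user(user_id, **attrs):
    # In the request: write the canonical record, then queue "update user N".
    users[user_id].update(attrs)
    pending.append(user_id)


def reindex_user(user_id):
    # In the worker: re-read current canonical state instead of trusting
    # attributes captured when the job was queued.
    search_index[user_id] = dict(users[user_id])
```

Had the job carried the attributes themselves, a late-running older job could overwrite newer data in the index.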
.@ph: "jobs may create jobs when finished"
.@ph: "#5: lock contention hurts; queue churn when a user does lots of ops and leaves"
.@ph: "#6: alerting is hard
.@ph: Despite not wanting to talk tools, here's the list of tools @ph mentioned. Helpful! https://pbs.twimg.com/media/A4XdD9RCMAAbrOv.jpg
@phrawzty: @alq breaking down the webops cycle. #velocityconf http://t.co/mlQy8I27
@allspaw: “@AndrewBrockway: SPOF-O-Matic - superb tool for finding 3rd party blocking scripts. #velocityconf http://t.co/LxmiqeQy” /cc @sethwalker
@stack72: RT @mcdonnps: Slides are up from @mrembetsy's and my presentation on Continuously Deploying Culture at @Etsy #velocityconf #devops http: ...
RT @mcdonnps: Slides from Continuously Deploying Culture at @Etsy #velocityconf
At risk of looking stupid: I've heard mainframes kicked ass at queueing. How do modern queues compare to mainframes? @ph