2014/05/05: Monitorama Portland

by Gene Kim on

#monitorama

Next: Adrian Cockcroft: Please, No More Minutes, Milliseconds, Monoliths... Or Monitoring Tools (@adrianco, Battery Ventures)

  • @adrianco: #Monitorama - Please, no more Minutes, Milliseconds, Monoliths or Monitoring Tools! by @adrianco #cloud http://t.co/gLK7Q3LAtXg

  • .@adrianco: "My job these days: 'baffling-late-adopters as a service'" OMG. Amazing graph:

    https://pbs.twimg.com/media/Bm40zNwCIAEZpke.jpg

  • Slide showing @adrianco incredible contribs to monitoring over last 15+ years; "virtual adrian" Yes.

    https://pbs.twimg.com/media/Bm41Tt3CEAAyAFg.jpg

  • .@adrianco: "No more monitoring tools; we need analysis. Let's rename this conf to #analysisrama" (haha)

  • .@adrianco: "I want people to spend more time understanding systems, & dynamically controlling systems, feedback loops"

  • .@adrianco: "What's wrong with mins? Usually 8m delay after something bad, rollback, then 8 min more to see if fixed!"

  • .@adrianco: "CD/DevOps: lots of small chgs, but 1 chg much more likely to break; needs instantaneous detection to recover"

  • .@adrianco: "Netflix Hystrix/Turbine circuit breaker monitoring: 1 data pt per second;

  • .@adrianco: "Rule #2: total feedback loop (detect/fix) needs to be less than human perception (~10s)

  • .@adrianco: "Milliseconds too long; most JVMs have ns timers

  • .@adrianco: "Rule #3: Validate your measurement system has enough accuracy and precision"

  • .@adrianco arguing that monolithic monitoring systems don't cut it: 'can't have gaps in your telemetry':

    https://pbs.twimg.com/media/Bm43fxdCcAAg1tq.jpg

  • @a32an: RT @hertling: #1: Spend more time working on code that analyzes the meaning of metrics than code that collects/stores/displays metrics. #mo

  • .@adrianco: "Use in-band monitoring [uses same services/infrastructure as your service] & out-of-band [like SaaS]"

  • .@adrianco: "Your monitoring MUST be more available than the service you're actually monitoring." (!!)

  • .@adrianco: "High rate of chgs: ephemeral configs (can't hand-tweak); microservices w/complex calling patterns"

  • .@adrianco showing Gilt's amazing growth in services:

    https://pbs.twimg.com/media/Bm44pU8CYAAKxUR.jpg

  • .@adrianco: Haha. OMG. The Death Star architecture for Netflix, Gilt Group, Twitter:

    https://pbs.twimg.com/media/Bm443CKCMAEkNG8.jpg

  • .@adrianco desc using FFT to forward-predicting to auto-scale: see weekend bulge: biz metrics:

    https://pbs.twimg.com/media/Bm45VCHIEAAhfSr.jpg

  • @vingado12345678: RT @newrelic: "Monitoring systems need to be more available and scalable than the systems being monitored" @adrianco

  • .@adrianco: "In DevOps, devs are managing services, now driving APM based: biz transactions, JVM metrics, transaction errors (Netflix Servo, Yammer Metrics)

  • .@adrianco: "Embedding metrics (allowing use in other people's tool) is so useful, allows virality

  • .@adrianco: "Cloud assets bursty: Netflix code push (once every 40s) creates 100s of servers; often re-uses IP/MAC

  • .@adrianco: "NetflixOSS Edda: record a full history of your configuration

  • .@adrianco: "Many of our Cassandra clusters span 4 different regions;

  • .@adrianco's 5 New Rules of Monitoring:

    https://pbs.twimg.com/media/Bm47Rg-CMAAEE2N.jpg

  • .@adrianco: "There's no more architecture diagram anymore; everything always changing; ppl don't even try anymore at Netflix

  • .@adrianco: "Problem with many OSS monitoring tools: great backend, but often front-end not as good as commercial tools

  • .@adrianco: "Netflix uses JMeter to do canary testing, post functional testing; compares old vs new (CPU, biz metrics, latency)
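
The FFT-based forward prediction mentioned above can be sketched roughly as follows. This is a minimal illustration in Python, not Netflix's implementation: the function names `dft` and `forecast`, the `harmonics` parameter, and the synthetic hourly traffic series are all assumptions made for the example. It keeps the mean and the strongest frequency components of a metric's history and extends them past the end of the data, which is the kind of forecast an auto-scaler could act on ahead of a daily or weekend bulge.

```python
import cmath
import math


def dft(series):
    """Naive O(n^2) discrete Fourier transform; fine for short illustrative series."""
    n = len(series)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series))
        for k in range(n)
    ]


def forecast(series, steps, harmonics=3):
    """Extrapolate `series` by `steps` points using its strongest frequency components."""
    n = len(series)
    coeffs = dft(series)
    # Rank the positive frequencies by magnitude and keep the strongest few.
    # Index 0 (the mean) is always kept.
    ranked = sorted(range(1, n // 2 + 1), key=lambda k: abs(coeffs[k]), reverse=True)
    keep = ranked[:harmonics]
    predicted = []
    for t in range(n, n + steps):
        value = coeffs[0].real / n
        for k in keep:
            # A real signal has conjugate-symmetric coefficients, so each kept
            # frequency counts twice, except the Nyquist bin when n is even.
            weight = 1 if (n % 2 == 0 and k == n // 2) else 2
            value += weight * (coeffs[k] * cmath.exp(2j * math.pi * k * t / n)).real / n
        predicted.append(value)
    return predicted


# Hypothetical example: four days of hourly request rates with a daily cycle.
history = [100 + 20 * math.sin(2 * math.pi * hour / 24) for hour in range(96)]
next_day = forecast(history, steps=24, harmonics=1)
print([round(v, 1) for v in next_day[:6]])
```

The naive transform keeps the sketch dependency-free; a real pipeline would more likely use a library FFT, and would also have to deal with trend and non-periodic history, which this sketch ignores.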

Next: James Mickens, Microsoft Research: "Computers Are A Sadness, I Am The Cure"

  • Mickens: 'Real title of my talk: "I am infallible; you are lucky to receive my wisdom"'
  • @sigje: IF YOU EVER GET A CHANCE TO SEE JAMES MICKENS TALK.. GO. AWESOME. FUNNY. #winning
  • @dshack: RT @ashedryden: OH: however, I am not here to discuss racial inequality in Middle Earth here today.
  • @selenamarie: "i am so tired of hearing the word count example"
  • @TerribleDev: #monitorama "Lets stop talking about map reduce, uninteresting"
  • @selenamarie: "you know what you can use that whole warehouse of machines for? you can use them to count words!"
  • Mickens: "Cloud is awful; If you use cloud, you know IOPS queue is 8 gazillion long, none of them for you."
  • Mickens: "Working in cloud is not fun; proof: go into cloud engineering troubleshooting meeting" (OMG. Funny.)
  • Mickens: "You rolled out an OS upgrade? No! Everything will reboot for decades!
  • Mickens: "Cloud will never work; too big, too complicated; Just give up; this is a msg of hope" (haha)
    https://pbs.twimg.com/media/Bm4_dXHCAAAHV34.jpg
  • @hertling: RT @nphase: How does the cloud even work? #monitorama http://t.co/DMq2VZbjsa
    https://pbs.twimg.com/media/Bm4ODSCQAA5xW.jpg
  • @aneel: "asking fewer things from life is a very powerful strategy for dealing with an adversarial world" -Mickens
  • @aneel: James Mickens msft research page http://t.co/132KzSt9sE
  • Mickens: "NoSQL is not generically evil; here's what's evil: 'letting read & writes fly around ur system in random order"
  • @the_mckern: RT @sigje: "Go learn to play the guitar when your web mail is down, the universe is doing you a favor. It's rude to turn down favors" #moni
  • @nigelkersten: I'd really like to see James Mickens and @grim_radical face off on stage for maximum hilarity some time.
  • @pims: RT @auxesis: This slide of @adrianco's right here is why I love working in the monitoring space #monitorama http://t.co/XNWvAlqJsa
  • @selenamarie: "LET YOUR READS AND WRITES CHOOSE THEIR OWN DESTINY" #monitorama /cc @aphyr
  • Mickens: "If u're a VC that invests in [certain app], I hope u become poor. Like destitute poor. Like leper in Bible poor
  • @williamkratz: Is James Mickens available for parties? Best. Talk. Ever.
  • @aneel: RT @benzobot: These consistency protocols, they want to constrain your freedoms! #monitorama http://t.co/MoS2UpG4I8
  • Mickens: hilarious threat analysis mitigation model: "Is Ur Adversary Mossad or Not-Mossad?"
    https://pbs.twimg.com/media/Bm5BSntCcAA32TQ.jpg
  • @bstrand: James Mickens for president. #monitorama http://t.co/6weqsm5o96
  • @michaeldexter: Mossad vs. Not-Mossad @allanjude @ChrisLAS #Monitorama James Mickens http://t.co/46kBnpFE4h
  • @mary_grace: srsly--James Mickens--best conference talk i've ever heard. stomach hurts from laughing, but the points he's making are solid. #Monitorama.
  • @TerribleDev: What we thought the NSA was like #monitorama http://t.co/75IkURksxI
  • @aneel: RT @danslimmon: Handy security reference table #monitorama http://t.co/FLciJcLB79
  • @drcab1e: RT @ashedryden: “The only thing you need to consider with security is if you’re being attacked by Mossad or not-Mossad.” #monitorama http:…
  • @bridgetkromhout: They have guys fast-roping out of helicopters and you're on a frisbee golf team. @MarkovMickens on threat models and security
  • @mike_julian: James Mickens' comedic style reminds me of Chris Rock, except with a technical background.
  • @leighd2k: 1/3 of my twitter stream is at #monitorama and all just tweeted the security threat model slide @danslimmon @adrianco @nphase etc
  • Mickens: "Do I know Mary? Of course not... But I'm a guy... So I'm dumb. Maybe I do know Mary."
  • Mickens: "#1 Infosec Research Priority: Eliminate Men As A Gender"
    https://pbs.twimg.com/media/Bm5C5zVCcAAtg_m.jpg
  • @hertling: RT @ashedryden: Literally dying rn #monitorama http://t.co/8lygtcURuf
  • @sigje: Dude overflow detected.. this is hilarious because .. #monitorama does have a dude overflow problem. :)
  • @1technodiva: RT @ashedryden: “If you work in the cloud industry, stop talking to reporters because they make it sound cool, when in reality it’s only pa…
  • @standaloneSA: To everyone at #monitorama, if you are a @USENIX member, you know that James Mickens has a most hilarious column in the semi-monthly mag.
  • @bstrand: Hey #Monitorama, you loved James Mickens; check out his body of work: http://t.co/6weqsm5o96
  • @elnoelle: RT @mary_grace: srsly--James Mickens--best conference talk i've ever heard. stomach hurts from laughing, but the points he's making are sol…
  • @mary_grace: RT @RealGeneKim: RT @standaloneSA: To #monitorama, if u are @USENIX member, you know that James Mickens has a most hilarious column in the …
  • @mary_grace: how do we know Mary is Mary, and not creepy hacker dude who wants to be your "friend"? #monitorama http://t.co/ON03gL9U9h
  • @adrianco: RT @mary_grace: how do we know Mary is Mary, and not creepy hacker dude who wants to be your "friend"? #monitorama http://t.co/ON03gL9U9h
  • @aneel: RT @ashedryden: Dude overflow detected. #monitorama http://t.co/5TNjn0CcKK

Next: Ignite Talks

Next: Toufic Boubez: Simple math to get some signal out of your noisy sea of data, CTO, Co-Founder Metafor Software

  • Boubez: "The fact that all three of my companies are Gartner Cool Vendors must put into question the low bar of being a cool vendor
  • @spazm: Simple math for data Spoiler: no simple tricks. #monitorama
  • "Metrics could be unicorns per second." http://t.co/silafZcUVk
  • Boubez: "@lusis: alert fatigue is the single largest problem we have"
  • @puppetmasterd: RT @nigelkersten: I'd really like to see James Mickens and @grim_radical face off on stage for maximum hilarity some time.
  • @Xorlev: “Watching walls of screens is useless.” Amen. @metaforsoftware
  • @TerribleDev: Warehouse of kids watching screens does not scale #monitorama http://t.co/MfoO4Kp6hF
  • Boubez: "Three stddev rule works right? No! Wakeup calls at 2am, 2:37a, 4:13a, 5:17a" (haha) "Why? Not gaussian distrib"
  • @TerribleDev: 3 sigma == phone calls at 2am...3am...4am
  • @benzobot: RT @brianholcomb: protip from #monitorama: learn stats.
  • @postwait: Huge work and $$ made @circonus meet these => RT @RealGeneKim: .@adrianco's 5 New Rules of Monitoring: #monitorama http://t.co/XntJNo0sJZ
  • @TerribleDev: Most data from Data Centers is not Gaussian and we need to stop looking at it as such
  • @benzobot: No single stats technique fits all your data #monitorama @tboubez
  • @TerribleDev: The Mean as a predictor is too static....Moving Averages == Big Ideas
  • .@benzobot: progression of analysis madness: weighted avg, expon. weighted avg, 2x & 3x expon weighted avgs, Holt Winters...
  • @aneel: @tboubez doing a good job of pillorying all the statistical methods used to cope with problems that start with the data
  • .@tboubez: "Tip #1: creating histograms is your friend" (the first step of exploring any data)
  • .@tboubez: "Tip #2: Kolmogorov-Smirnov test: non-parametric test"
    https://pbs.twimg.com/media/Bm5KZAuCEAAUdNx.jpg
  • .@tboubez: "Tip #2: KS test: measures max distance betw cumulatv distributions; compare periodic data
    https://pbs.twimg.com/media/Bm5Ky07CUAAr5kE.jpg
  • @n0nsequitarian: Kolmogorov-Smirnov test useful for anomaly detection in non-Gaussian distribution data.
  • @DamirHot: RT @metaforsoftware: Want to talk math that works (anomaly detection) at #monitorama? Ping @tboubez or @DamirHot = will be happy to geek o…
  • (My #1 stats challenge in last 4 months; trying to understand the SPSS pricing/licensing screen; gave up after 20m)
  • .@tboubez: "Tip #3: Diffing/Derivatives: often data is not stationary, but the derivatives are"
    https://pbs.twimg.com/media/Bm5LdDeCMAAVFy1.jpg
  • .@tboubez: (original data looked totally noisy; derivative series looks gaussian. voila.)
    https://pbs.twimg.com/media/Bm5L3PXCcAAVoXk.jpg
  • @aneel: RT @selenamarie: Diffing/derivatives... most frequently, first difference tends to be stationary
  • @rhm2k: ARE YOU KIDDING ME? #Monitorama live video feed is now 'up to date', and I've missed @tboubez ?? I am calling in an air strike. Feh!
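
Tips #2 and #3 above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not @tboubez's or Metafor's code: the function names, the sample windows, and the 5% significance level are hypothetical choices for the example. `ks_statistic` computes the two-sample Kolmogorov-Smirnov statistic (the maximum distance between two empirical cumulative distributions), `ks_threshold` gives the usual large-sample critical value, and `first_difference` takes the first difference of a series.

```python
import math
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    gap = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so checking every observed value is enough to find the maximum.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def ks_threshold(n, m, alpha=0.05):
    """Large-sample critical value; an approximation that is rough for small windows."""
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return c * math.sqrt((n + m) / (n * m))


def first_difference(series):
    """Successive differences; often closer to stationary than the raw series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Hypothetical windows of one metric, e.g. this hour versus the same hour last week.
last_week = [12, 15, 14, 13, 16, 15, 14, 13, 15, 14]
this_week = [22, 25, 24, 27, 23, 26, 24, 25, 23, 26]
distance = ks_statistic(last_week, this_week)
anomalous = distance > ks_threshold(len(last_week), len(this_week))
print(distance, anomalous)
print(first_difference([1, 3, 6, 10]))
```

Because the test is non-parametric, it makes no Gaussian assumption about either window, which is the property the talk was after.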

Up: The Care and Feeding of Monitoring, Katherine Daniels

  • @sigje: Wooh!! Sparkly devops princess @beerops is up with the care and feeding of monitoring
  • .@beerops: "This ops war story begins like so many stories begin.. With a @PagerDuty alert..."
  • .@beerops: "What happens when you monitor all the things" (HAHA)
    https://pbs.twimg.com/media/Bm5SKk-CAAAWT.jpg
  • .@beerops: "When searching for needle in haystack, don't add more hay" (i.e., don't monitor all the things. :)
  • @TerribleDev: Xenos dashboard :( #monitorama http://t.co/vbKosr8uUR
    https://pbs.twimg.com/media/Bm5SF6wCYAATfHn.jpg
  • @spazm: "Our Mongos were up and mongo-ing" #firstproblem
  • @TerribleDev: "it rains in the cloud and the load balancers get rusty"
  • @bridgetkromhout: Great "blip" story - @beerops "We didn't deploy anything using math.random to take down the site, but..." #monitorama http://t.co/2zIeiUckQT
  • @parkercloud: @RealGeneKim @beerops #monitorama Yes monitor all the things; how to do it well is another question @EnterpriseMonitoringArchitecture
  • .@beerops: "We lowered our timeouts (down to 60s): found db queries going too long; deleted some crappy code. How'd we get here?
  • @benzobot: “Just rub some webscale sauce on it” #monitorama @beerops
  • .@beerops: "Why were 2 APIs on same svr? No Dev sat down & decided, 'I'm going to really screw over our future Ops ppl"
  • .@beerops: "Countermeasure (instead of splitting API across 2 diff servers): put Nginx on it. Worked for 1 year"
  • @sigje: Awesome, when you migrate to external services doesn't eliminate your needs for monitoring. #monitorama @beerops
  • @Xorlev: “Nobody set out to make these bad decisions” @beerops
  • .@beerops: (on high mem usage): "If we weren't using all the memory on all our servers, then we'd be worried"
  • .@beerops: OMG. Is that WordStar?
    https://pbs.twimg.com/media/Bm5VAcuCcAAKjXQ.jpg
  • @TerribleDev: "Oh that check...thats red? its fine" things that should not be said
  • @TerribleDev: "Consider load testing your monitoring" @beerops
  • @TerribleDev: "I can't monitor something if I didn't know it existed in the world" @beerops
  • @randomfrequency: Sensu aggregates are your friend - http://t.co/97JG2VYvIH at the bottom @sensuapp @portertech
  • .@beerops: Interesting. "Lasting counter-measure: We eventually got rid of API #2" (which shared server with API #1)
  • @TerribleDev: Great talk by @beerops on how devs, ops, everyone can work together to monitor properly....
  • @hertling: My notes from Katherine Daniels' talk at #monitorama: http://t.co/gDP2VRQtre @beerops

Next: Car Alarms and Smoke Alarms, Dan Slimmon

  • .@danslimmon: formerly Ops Team Manager at Blue State Digital (woohoo! Say hi to Leigh!), formerly at Exosite
  • .@danslimmon: "90% detect plagiarism; 20% false positive when there is no plagiarism; 30% plagiarism in population"
    • 30% plagiarism: .3 * .9 = .27
    • 70% no plagiarism: .7 * .2 = .14
    • Total: 41% (no :)
    • Right answer: 59%
  • @hertling: My notes from Dan Slimmon on Monitoring at #monitorama: http://t.co/iW186S2wdc
  • .@danslimmon: "Why our alerts so sensitive/prone to false positive? B/c outages caught by our boss or customers seems worse"
  • @TerribleDev: "When smoke alarms go off we dont put our ear buds back in and continue working"
  • @HypertextRanch: RT @benzobot: Car alarms: highly sensitive, but effectively useless. Don’t do monitoring like that!
  • @benzobot: Car alarms: highly sensitive, but effectively useless. Don’t do monitoring like that!
  • @aneel: RT @robertolupi: A talk at #monitorama about, basically, the F1-score. With words from med school instead of data mining. We always redisco…
  • @TerribleDev: Car alarms go off and its usually not someone stealing your car #monitorama http://t.co/tYNcb8kpye
  • @aneel: RT @benzobot: Positive Predictive Value: ensuring that the alert that wakes you up at 3am is telling you something is actually wrong #monit
  • .@danslimmon: "As our service gets more reliable, false-positive rate becomes far more important" (did I get that right?)
  • @selenamarie: So much great basic information about how to apply basic stats to monitoring complex systems.
  • @HypertextRanch: As your service gets better your probes get worse if you don't tune specificity.
  • @RealGeneKim @danslimmon it increases so have to refine your monitoring to decrease the false positives
  • @selenamarie: PONY WANTED: something like nagios, but runs diagnostic routines in response to an alert before paging
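
The plagiarism arithmetic above is an instance of Bayes' rule, and the same calculation gives the positive predictive value of an alert. A minimal Python sketch follows; the function and parameter names are assumptions for illustration, not taken from the talk. With the quoted figures (90% detection, 20% false-positive rate, 30% prevalence), it puts the chance that a flagged paper is really plagiarized at .27 / .41, or about 66%. That differs from the 59% recorded in the notes above, which may reflect a differently worded question.

```python
def positive_predictive_value(sensitivity, false_positive_rate, prevalence):
    """Probability that a positive result (or alert) reflects a real problem."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Figures quoted in the talk notes: 90% detection, 20% false positives, 30% prevalence.
print(round(positive_predictive_value(0.9, 0.2, 0.3), 3))

# Hypothetical alerting case: the same probe on an increasingly reliable service.
for outage_rate in (0.1, 0.01, 0.001):
    print(outage_rate, round(positive_predictive_value(0.9, 0.2, outage_rate), 3))
```

The loop illustrates the point about reliability: with the probe unchanged, the rarer real outages become, the smaller the share of alerts that are genuine, so the false-positive rate has to be tuned down to keep alerts trustworthy.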

Misc

  • @rhm2k: RT @johnlkinsella: sigh Does ustream ever work for conference streaming? #monitorama < In a word … No.
  • Haha. RT @rhm2k: RT @johnlkinsella: sigh Does ustream ever work for conference streaming? #monitorama < In a word … No.
    https://pbs.twimg.com/media/Bm5Cs6kCUAAqyUb.jpg
  • OH: "Just hit Command-F1. Then Go into Rehearse Slideshow. Yeah, there you go."
  • "Why hugs & conf buddies? B/c I was at a conf & I felt like total outsider, and was lonely. No one should feel like that here"
  • Trivia fact: "Emotional pain [e.g., loneliness] is as painful as physical pain. Loneliness biggest cause of early morbidity!"
  • Link to an amazing lecture on social pain, loneliness, and social media: http://scribes.tweetscriber.com/realgenekim/260
  • @rberger: Love going to conferences where empathy is the first topic (it’s rare but #monitorama is one)
  • RT @hertling: #1: Spend more time working on code that analyzes the meaning of metrics than code that collects/stores/displays metrics
  • RT @hertling: RT @nphase: How does the cloud even work? #monitorama http://t.co/DMq2VZbjsa
  • RT @aneel: "asking fewer things from life is a very powerful strategy for dealing with an adversarial world" -Mickens
  • RT @aneel: James Mickens msft research page http://t.co/132KzSt9sE
  • RT @standaloneSA: To #monitorama, if u are @USENIX member, you know that James Mickens has a most hilarious column in the semi-monthly mag.
  • RT @mary_grace: how do we know Mary is Mary, and not creepy hacker dude who wants to be your "friend"? #monitorama http://t.co/ON03gL9U9h
  • And huge trauma & pain, right? :) RT @postwait: Huge work, $$ made @circonus meet these: .@adrianco's 5 New Rules of Monitoring
  • RT @aneel: @tboubez doing a good job of pillorying all statistical methods used to cope with problems that start with the data
  • Hahaha. RT @TerribleDev: Xenos dashboard :( #monitorama http://t.co/vbKosr8uUR
  • RT @bridgetkromhout: Great "blip" story - @beerops "We didn't deploy anything using math.random to take down the site, but..."

  • @metaforsoftware: RT @SeenFeed: Just trended for #monitorama: "data, @tboubez tip" (20 tweets): http://t.co/7lU4wx9cQE

  • RT @TerribleDev: "Consider load testing your monitoring" @beerops

  • @bridgetkromhout: Thx for all the help on #devops survey, btw! The results are astonishing... I'll give you sneak peek?

  • RT @hertling: My notes from Katherine Daniels' talk at #monitorama: http://t.co/gDP2VRQtre @beerops

  • @interrante: RT @hertling: My notes from Adrian Cockcroft's keynote at #monitorama: http://t.co/9alwc8AWj4

  • RT @interrante: RT @hertling: My notes from @adrianco's keynote at #monitorama: http://t.co/9alwc8AWj4

  • Yes! .@adrianco slides! "Please, no more Minutes, Milliseconds, Monoliths or Monitoring Tools! by http://t.co/gLK7Q3LAtX

  • @aneel: there's a guy hacking on @OpenStack during #monitorama in the row in front of me #startuplife

  • (Nice) RT @HypertextRanch: As your service gets better your probes get worse if you don't tune specificity.

  • RT @selenamarie: PONY WANTED: something like nagios, but runs diagnostic routines in resp to an alert before paging #monitorama @danslimmon

  • RT @hertling: My notes from Dan Slimmon on Monitoring at #monitorama: http://t.co/iW186S2wdc @danslimmon