# 2019/12/10: YOW! Brisbane Day 2 1c

by Gene Kim on

#Yow19

2019/12/10: YOW! Brisbane Day 2 1a

  • @lizthegrey: So we used to have monoliths, then we broke things down into microservices... and microservice architectures are a chain of dominoes.

We need to systemically watch and protect... which means design patterns like circuit breakers, to stop large meltdowns/cascading failure. #YOW19
- @lizthegrey: Things like Hystrix, Istio, Consul, etc. are service meshes that help scale this out. We can then take action on the data that's gathered, scale things out, etc.

and now people have service diagrams to cope with the complexity and relationships... #YOW19
- @unixbigot: Your microservice architecture, says @giltene, should incorporate circuit breakers. These are there not to stop work, but to prevent meltdowns. Separate data and control, so that you know what’s happening even when things break. #yow19 https://t.co/0jG1JJO2q5
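
To make the circuit-breaker idea above concrete, here is a minimal sketch in Python. It is an illustration only, not how Hystrix, Istio, or Consul implement it; the class name, the `max_failures` and `reset_after` parameters, and the injectable `clock` are all assumptions made for the example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without being attempted."""


class CircuitBreaker:
    """Hypothetical minimal breaker: fail fast after repeated failures, retry after a cool-down."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so the breaker can be tested
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Open: reject immediately instead of piling load on a sick dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: half-open, so let one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The point matches the tweet: the breaker does not stop work in general, it stops callers from hammering a dependency that is already failing, which is how one slow service turns into a cascade.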
- @lizthegrey: so here's a percentile view of our service... and the 95th percentile spiked, but how bad did the worst 5% get? Things have melted down and we're missing telemetry.

It's useful for marketing but useless otherwise. [ed: yuuup, this is why heatmaps >>>>> percentile lines] #YOW19
- @lizthegrey: and the 99th percentile line indeed looks... much much worse than the 95th percentile, and most people don't keep things beyond 99th percentile... (he's showing misleading New Relic graphs...) [ed: thank you thank you thank you @giltene <3] #YOW19
- @lizthegrey: What percentile numbers do people use? Well, it gets complicated when we try to compose the 99th percentile of each microservice and try to construct the chance an external interaction hits the 99th percentile of at least one? Way too high because of fan-out. #YOW19
- @lizthegrey: Over a long enough session, with enough parallel resource fetches, the user experience degrades.

We need to actually measure session behavior and not single transaction behavior. #YOW19
- @unixbigot: Let go of averages, says @giltene, and especially abandon averages of percentile values. These, says Gil, are worse than useless. When a webpage involves hundreds of HTTP requests, you are almost certain to see worse-than-99th percentile outcomes. #yow19 https://t.co/Hs9S3P5VsR
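
The fan-out argument above is simple arithmetic, sketched here in Python. It assumes each request independently has a 1% chance of landing beyond the 99th percentile, which is a simplification (real requests are correlated); the function name is made up for the example.

```python
def chance_of_hitting_tail(n_requests: int, percentile: float = 99.0) -> float:
    """Probability that at least one of n independent requests is worse than the given percentile."""
    p_ok = percentile / 100.0
    return 1.0 - p_ok ** n_requests


# A page load or session that fans out into many requests
for n in (1, 10, 100, 200):
    print(f"{n:>4} requests: {chance_of_hitting_tail(n):.1%} chance of at least one worse-than-p99 response")
```

Under that independence assumption, a page that makes 100 requests sees at least one worse-than-p99 response roughly 63% of the time, which is why the 99th percentile of a single service says little about what a session feels like.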
- @lizthegrey: Real systems tend to have good, bad, and terrible latency humps, depending upon whether they hit the happy path. But we've tuned the "happy paths" to align at whole numbers of nines that we're paying attention to. #YOW19
- @lizthegrey: Why don't we look at the tail latency? Because we can't meaningfully measure high numbers of 9s in short windows, and we tend to summarize the data in short windows with lower percentiles. [ed: hashtag-kill-percentile-metrics-and-go-to-events]

Use HDRHistogram instead. #YOW19
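
A small Python sketch of why summarizing short windows with percentiles and then averaging them goes wrong. The latency numbers are invented, and the nearest-rank `percentile` helper is a stand-in written for this example, not HdrHistogram's API.

```python
def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, -(-len(ordered) * pct // 100))  # integer ceiling of n * pct / 100
    return ordered[rank - 1]


# Hypothetical latencies (ms) for two one-minute windows
quiet = [10] * 1000
busy = [10] * 900 + [2000] * 100  # 10% of requests hit a stall

avg_of_p99 = (percentile(quiet, 99) + percentile(busy, 99)) / 2
p99_of_all = percentile(quiet + busy, 99)

print(f"average of per-window p99s: {avg_of_p99} ms")
print(f"p99 over all requests:      {p99_of_all} ms")
```

The averaged figure (1005 ms) describes no request that ever happened, while the percentile over the combined data is 2000 ms. Keeping the full distribution, as a histogram does, lets you combine windows first and ask for percentiles afterwards.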
- @lizthegrey: Our monitoring metrics code tends to show delayed effects, because the time at which the high latency measurement arrives and is bucketed is... after the hiccup.

[ed: yuuup, this is why wide events with start timestamps matter!!] #YOW19
- @lizthegrey: And even worse, if you stop processing requests, then you have one bad outlier request taking 100 sec... rather than tens of thousands of slow requests!

We need to see the actual 10,000 requests impacted that queued outside the system... it's coordinated omission #YOW19
- @lizthegrey: We care about the response time inclusive of the queuing time, rather than the service time spent inside our service.

Averaged percentiles based off your own service's internal data are nonsense and have nothing to do with real user experiences. #YOW19
- @unixbigot: Two key points from @giltene: most monitoring systems give overoptimistic metrics because they can’t measure the response of normal load during an outage. Second, most systems don’t measure true end-to-end response time, which is what customers see. #yow19 https://t.co/A2LagT7Xwz
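
Coordinated omission can be sketched in a few lines of Python. The idea below is similar in spirit to the expected-interval correction that HdrHistogram offers, but the function name and the numbers (one request every 10 ms, a single 100-second stall) are assumptions for illustration.

```python
def correct_for_coordinated_omission(latencies_ms, expected_interval_ms):
    """Add back the requests that would have been sent, and queued, during each stall."""
    corrected = []
    for latency in latencies_ms:
        corrected.append(latency)
        # Requests that should have started during the stall would each have
        # waited one interval less than the one before.
        missing = latency - expected_interval_ms
        while missing >= expected_interval_ms:
            corrected.append(missing)
            missing -= expected_interval_ms
    return corrected


# A load generator that waits for each response before sending the next request:
# 99 fast responses, then one that takes 100 seconds.
measured = [1] * 99 + [100_000]
corrected = correct_for_coordinated_omission(measured, expected_interval_ms=10)

print(len(measured), "samples measured;", len(corrected), "after correction")
print("median measured: ", sorted(measured)[len(measured) // 2], "ms")
print("median corrected:", sorted(corrected)[len(corrected) // 2], "ms")
```

The naive view records one bad outlier out of 100 samples; the corrected view contains roughly 10,000 slow requests, which is much closer to what users queued outside the system would have experienced.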
- @RealGeneKim: @lizthegrey @giltene @stoker_lindsay https://t.co/PPxB31t3hy

#yow19

Up: Casey Rosenthal, CEO and co-founder Verica, @caseyrosenthal

  • @lizthegrey: Giving my fingers a break during @caseyrosenthal's talk by linking to my previous livetweet from #YOW19 Sydney instead :) https://t.co/Sq1oD4jc8d

  • @unixbigot: It all began, says @Gaohmee, with this picture. They had made a tech demo for an astronaut game. Some leaked screenshots made it to Reddit, where of course (!) NASA was lurking. NASA has been working with VR for decades, but commercial headsets’re literal game changers. #yow19 https://t.co/en6jMvFxBB

  • @hillelogram: Since commercial VR runs on game engines, NASA needed game designers. Found the tech demo on Reddit.

"Lesson, hang out more on Reddit."

First collab project was Earthlight, now public: https://t.co/6BIxvUkwtq

[showing trailer]

#yow19

- @MichelePlayfair: #yow19 the most essential molecule for software development: Caffeine.
- @keesan starting off his talk on quantum computing with some essential science! https://t.co/eIIZZTwqhv
- @unixbigot: Cool, you can play @Gaohmee’s game in VR in NASAs ARGOS wire-harness microgravity simulation. Well, you can’t. You could if you were an astronaut trainee. #yow19 https://t.co/fXLGXpPnhK
- @hillelogram: Example: lots of astro training takes place underwater. Lots of work, super expensive. VR can be cheaper

What's it like to work with NASA? Lots of special parameters. Everybody is way smarter than you and has crazy good work ethic. "You have to let go of your ego" #yow19
- @sarahtarap: Excited to be session host 🎤 at #YOW19 to introduce the very funny @caseyrosenthal tell us about Chaos Engineering https://t.co/VHMZLMlQQL
- @unixbigot: Space and game dev have things in common, says @Gaohmee. Neither can happen without collaboration, iteration and subordinating individual egos. #yow19

https://t.co/NrQi5hwXh0
- @unixbigot: One of Jennifer’s takeaways from her time with NASA was their idiom of problem briefs. A brief explains the problem and the necessary resources but does not presuppose a particular solution. #yow19 https://t.co/9zZmmSgQhZ
- @RealGeneKim: #YOW19 @caseyrosenthal https://t.co/5BnyPlTFXD
- @hillelogram: Problem brief should be:

  • Clear, factual, inclusive explanation of problem
  • Keeps content focused on goal
  • Does not suggest preemptive solutions, only shows problem
  • Psychological drivers behind problem
  • Simple

Gets potential solutions from diverse set of people

#yow19

  • @LareneLg: Inclusive and accessible language are key to #NASA problem briefs. It keeps the content focused on the goal and does not prematurely suggest solutions. — @Gaohmee

Also, leave ego at the door.

#YOW19 https://t.co/hBipNq14hm

  • @LareneLg: “You can’t work 5-8 years on something for someone you don’t care about”

@Gaohmee talks about a genuine deep love for players as an essential part of #GameDesign #YOW19
- @hillelogram: Drivers: gives players a familiar touchstone, helps people personalize the experience, engages people to explore to get interesting pictures, gives people a souvenir.

"What I want to do now is a little bit strange:" Hidden Game Design.

#yow19

  • code freeze caused error; code deploys ensured the restarting of JVMs.
  • no other profession divides what is to be done, how it's done, and who does the work; pinnacle of bureaucracy
  • @janellekz: This is so cool! 😎

Learning about how trapped ion quantum computers work, and new quantum computing gates with @keesan at #yow19

These are super cooled ions lining up like a string of pearls, that get shot with lasers to do logic operations: https://t.co/I8sLcnIQoF
- @DamianM: Myths about making systems safer from @caseyrosenthal #YOW19 https://t.co/uPKXp1PsYP
- @sabinehauert: Cool talk by @caseyrosenthal on Chaos Engineering. Interesting parallels with swarm robotics, haven’t quite figured out the mapping yet. #yow19 https://t.co/Q6qu6IkHfd
- @AnthonyBorton: Next up at #yow19 in the red room is @lizthegrey on detangling complex systems. https://t.co/KaEsi9cHyA

Beautiful slide art by @emilywithcurls #yow19 https://t.co/OpCWb3qbc2
- @ryangribble_: @mcallana sharing insights on "ML-Ops" at Expedia
Closing the operations loop by automatically detecting anomalies, diagnosing issues and suggesting remediation actions with machine learning approaches
... at #YOW19 https://t.co/Dkx3JCwked
- @unixbigot: Pinball has come a long way says Dwight. LCDs and LEDs and multiple CPUs. TIL there’s a Game of Thrones pinball game. #yow19
- @janellekz: Listening to @lizthegrey give us a reality check of all the hard in DevOps, and you can’t just “buy” the alphabet soup of technologies and expect things to get better...

“We as humans are part of the Sociotechnical system... tools are not magic”

#YOW19 https://t.co/5P9H0sPJlu

  • expedia/adaptive-alerting (aa)
  • @unixbigot: What makes a game fun? asks Dwight. Learning. Anxiety. Choices. A clear core game mechanic but with a unique twist. Risk vs Reward is key. #yow19
  • constant threshold (3 sigma)
  • holt winters (triple exponential, seasonality)
  • stl regression
  • custom regression
  • holt winters: cut into periods: t-1, t, predict
  • alert when outside 2 sigma V
  • kubeflow: https://www.kubeflow.org/
  • @janellekz: We can’t predict failure cases, we need to be able to debug novel cases... brilliant insight.

Design for Observability!!

@lizthegrey #YOW19 https://t.co/bWmay3RCKv
- AWS: closing loops and opening minds video
- @ByteSizedSeamus: Very true, for me at least. Working at @seekjobs I feel a strong sense of fulfillment knowing that we make serious efforts to improve the way candidates find and apply for jobs.
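
Going back to the anomaly-detection notes from the ML-Ops talk above (constant threshold at 3 sigma, alert when outside 2 sigma), here is a rough Python sketch of the simplest item on that list: a rolling sigma threshold. It is not Expedia's implementation and it is not Holt-Winters; the function name, the window size, and the sample series are assumptions for illustration.

```python
from statistics import mean, stdev


def sigma_alerts(series, window=12, n_sigma=3.0):
    """Return indexes whose value is more than n_sigma standard deviations from the trailing window's mean."""
    alerts = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sd = mean(history), stdev(history)
        # A perfectly flat history has sd == 0; skip it rather than alert on any change.
        if sd > 0 and abs(series[i] - mu) > n_sigma * sd:
            alerts.append(i)
    return alerts


# Hypothetical metric: steady around 100, with one spike
series = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 99, 100, 250, 100]
print(sigma_alerts(series))               # flags the spike at index 12
print(sigma_alerts(series, n_sigma=2.0))  # the tighter 2-sigma band from the notes
```

A fixed band like this ignores seasonality, which is presumably why the list goes on to Holt-Winters and STL: those model the expected shape of the series first and alert on the residual.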

Up: Mature microservices and how to operate them: Sarah Wells (@sarahjwells)

In the Plaza Auditorium: @sarahjwells presents "Mature microservices and how to operate them". #YOW19 http://ow.ly/VtEx30pYVPC https://twitter.com/yow_conf/status/1204236815768666112/photo/1

  • @sarahjwells: "5 years ago, in the bad old days at FT, we couldn't publish news during a deployment. We had to pick slow news times to do releases."
  • @sarahjwells: "It is important for FT to experiment to figure out sustainable revenue model"
  • @hillelogram: Journalists reached out to https://t.co/ZWM0b2rwuq, group of vetted investigative journalists who are trusted not to leak or dump public data. 370+ people worked on Panama Papers for a year without anybody knowing, not even the tech companies they used technology from #yow19
  • @ryangribble_: Feeling very privileged to spend some time with one of the biggest names in #devops @RealGeneKim
    Thanks #YOW19 https://t.co/4OoybAG2Jq

  • @sarahjwells: "the team concluded that zero-downtime deploys were the top factor that improved our quality; it enabled us to deploy more frequently, during times when everyone was in office"

1-5K employees

  • @sarahjwells: "reduce need for coordination; we used to have to fill out change approval forms in... Salesforce..." (wow! I've never heard that one before!)
  • @sarahjwells: "We no longer have change approval board; with our release rate of 2000 per year, that would have been 47 person-days of just filling in forms"
  • @sarahjwells: "Now we're doing 180 releases per day"; 250x more frequently
  • @unixbigot: Fabulous micronap during lunch, has perked me up wonderfully for @LeeRyanCampbell’s talk. Lee calls back to the opening keynote’s five principles. Is your codebase like the office kitchen, with stacks of dirty dishes that no one admits to leaving? #yow19
  • @hillelogram: First found out from anon John Doe. One of the biggest leaks to date, over 2.6 TB total, mostly emails, db formats, pdfs. Some images & text documents. Data was sorted by company, shell company, etc. Super well organized data -- but too much information to find things

#yow19

  • @sarahjwells: "...deploying microservices is easier, but RUNNING microservices is much more difficult..."
  • @RealGeneKim: #Yow19 @sarahjwells: "...deploying microservices is easier, but RUNNING microservices is much more difficult..." https://t.co/ongp1Jtv2J
  • @sarahjwells: "Lots of FT runs on Heroku — it's more expensive, but it's so simple; more complex apps are in AWS, using many of their services"
  • @janellekz: Using a combination of neo4j and graph visualization to support sense-making in a collaborative investigative journalism effort.

Even a fairly basic data model allows for some incredible capabilities.

@mesirii #YOW19 https://t.co/4Px5G6O12X
- @jchyip: Neo4j tutorial using the Panama Papers leak! #yow19

  • @hillelogram: Based on putting labelled edges between nodes in the db. A owns B, C details B, so A and C are connected in some fashion.

Specific tool was Neo4J. Stores data as graph, schemaless, designed to find relationships. A better whiteboard; usable by business owners. #yow19
- @sarahjwells: "ft.com: 4 or 5 teams -> 150 microservices; content management: same..."
- @hillelogram: Concluding with graph visualization as a thing you can do to get insights and share information

Recommending book https://t.co/PGhv6BL0ys if you want to get into this, as well as a number of journalism datasets I couldn't copy down quickly #yow19
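
The "A owns B, C details B, so A and C are connected" idea from the Neo4j notes above can be sketched with a plain adjacency list in Python. This is not Neo4j or Cypher, just an illustration of labelled edges and path finding; the edge labels and function names are made up for the example.

```python
from collections import defaultdict, deque

# Labelled edges: (source, label, target)
edges = [("A", "OWNS", "B"), ("C", "DETAILS", "B")]


def build_graph(edges):
    """Index every edge from both ends so relationships can be followed in either direction."""
    graph = defaultdict(list)
    for src, label, dst in edges:
        graph[src].append((label, dst, "->"))
        graph[dst].append((label, src, "<-"))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search for the shortest chain of relationships linking two nodes."""
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for label, neighbour, direction in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                step = f"-[{label}]->" if direction == "->" else f"<-[{label}]-"
                queue.append((neighbour, path + [step, neighbour]))
    return None  # no chain of relationships connects the two


graph = build_graph(edges)
print(" ".join(find_path(graph, "A", "C")))
```

A graph database does this kind of traversal natively and at scale; the sketch only shows why even a very basic data model is enough to surface connections that are hard to spot in documents sorted by company.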

  • @sarahjwells: "A big surprise: how much time we've spent migrating, upgrading.. doing anything 150 times is painful"
  • @RealGeneKim: #yow19 - @sarahjwells on modeling services, teams, attributes, etc… https://t.co/rS7GrifCKM

Up: 193 Easy Steps to DevOpsing Your Monolith: Cat Swetel (@catswetel)

  • @yow_conf: Green Room: @CatSwetel tells of the true (and at times ugly) story of one company’s journey towards a more flexible, adaptable, and easily maintainable architecture supported by a culture that prizes learning and respect above all else. #YOW19 https://t.co/FS8jWFZdYq

  • @CatSwetel: PDP-11 created before professional software industry

  • @RealGeneKim: #Yow19 @CatSwetel: PDP-11 created before professional software industry https://t.co/8pSliMXf5C

  • @unixbigot: They had Cat’s attention when they told her Ticketmaster’s platform was an Emulated Vax. A custom operating system running on a custom emulator. That’s right, young person, it’s zany all the way down. #yow19 https://t.co/h7pTd5lB3r

  • @unixbigot: Ticketmaster was born in 1976. They bet on the cutting edge tech of the time, the PDP/11 and bespoke software. Half the company weren’t sure if these newfangled computers would stick around. #yow19

  • @CatSwetel: "characteristics of tech being in Commodity era: like electricity; no one came in this room and said, 'wow, this room is having a great electricity day!'"

  • @CatSwetel: "Ticketmaster: moving from transactions to relationships; I now have a relationship with my local grocery store, who tell me when my son's favorite apples are in stock; enables higher order innovation"

  • @CatSwetel: "Ticketmaster: we want to tell you when your fave artist is in town, offer discounts and merchandise... has security implications: who is entering venue w/paper ticket vs. phone (better chain of custody); some artists don't want brokers/resold tickets"

  • @CatSwetel: "information value x2; now we measure how good we're doing at creating effective relationship with the customer"

  • @CatSwetel: "Great video: person who built toaster from scratch: https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3&ved=2ahUKEwjYrK6RoqrmAhXk6nMBHVsPDkoQtwIwAnoECAkQAQ&url=https%3A%2F%2Fwww.ted.com%2Ftalks%2Fthomasthwaiteshowibuiltatoasterfromscratch%3Flanguage%3Den&usg=AOvVaw0sOcMizalkyYri7r9IiJWI"

  • @CatSwetel: "Things in VAX system we're taking out: games, custom printer, custom messaging (replacing with email)"

  • I think @CatSwetel is trying to say that we excrete code

  • @CatSwetel: "Maintenance vs. metabolism"

  • @CatSwetel: "Anyone have executives read The Phoenix Project, who then tell you, 'all you have to do is change your mind and be DevOps. That can't be that hard, right?'" 😂😂😱😱🤭☠️

  • @CatSwetel: "first and foremost when rewriting systems built in 1976: respect for history"

  • @CatSwetel: "Back then, compute was so expensive, efficiency was important" (reminds me of a talk about someone trying to rewrite LaTeX; code so difficult to understand because it was written to be fast on hardware of its time)

TODO

  • Have field to say who speaker is:
  • why did my service stream fail at 3pm