- Read the online version below
- Grab the digital zine PDF version, or
- Print the formatted PDF to assemble into a physical copy [printing instructions PDF]
- Editor's Note
- Featured Post-Incident Report: Discord
- Imagining Your Post-Incident Report As A Documentary
- Featured Illustration: “Humans of On-Call”
What a difference two seasons can make.
When we launched Post-Incident Review, we excitedly brought stacks of hand-crafted printed copies to SREcon EMEA, looking forward to sharing how different post-incident reports felt when laid out and printed like a journal. Mid-conference, people were coming up to us asking for a copy. We happily obliged.
Now, it looks like it’ll be at least a few months before we’ll get to see friends from far and wide. We were ready to print copies of Issue 2, but we realized things had changed, so we too had to think about how to adapt PIR. With everyone (us included) at home, we asked ourselves: how can we keep the spirit of what we want to achieve going? It hit us: “What about leaning into our zine vibe?”
What you’ll find is a zine that can now be printed on just a handful of letter-size sheets. There’s an illustration by Denise Yu in black and white, ready to be coloured in to give you (or maybe a little one) a break in the day. And, even better, you’ll get more PIR, as we move to a monthly cadence.
It’s difficult to imagine things returning to the way they were, but part of the silver lining that comes from being students of the incident response process is that we can learn, we can adapt, and we can try to find ways to make things better.
- Emil Stolarsky and Jaime Woo
Featured Post-Incident Report: Discord
"Server Outages and Increased API Errors"
Published March 20, 2020
Discord was unavailable for most users for a period of an hour. The root cause is well understood and fixed. The bug was in our service discovery system, which is used by services within our infrastructure to discover one another. In this instance, service discovery is used by our real-time chat services in order to discover the RPC endpoint that they use to load data from our databases when you connect to Discord, or when a Discord server (or "guild") is created for the first time, or needs to be re-loaded from the database.
View the full post-incident report on the Discord status page.
Imagining Your Post-Incident Report As A Documentary
By Jaime Woo
How you tell a story matters. We know this because the best stories feel like they arrive fully formed, spontaneous and unmanipulated. The worst ones have us regularly checking our phones for the time, and then questioning physics: how have only four minutes passed?
We may not think of post-incident reports as a form of storytelling, but they are an interpretation of the truth. They can’t be the definitive truth, since, frankly, that’s an impossibility. No matter how rigorous the investigation, there is simply no way to capture every detail needed to encapsulate the capital-T truth.
Every story has a point of view. Even technical reports inject some subjectivity through which facts are included, the descriptive language used, and the order in which information is presented. That these reports are subjective doesn’t diminish their power; in fact, by accepting that we are telling a story, we can do a better job of relating an incident.
What happens when we, instead, embrace that the best thing we can do might be telling a story? What benefits come from imagining our reports as documentaries? We think of documentary as the creative retelling of truth, a riff on the definition by John Grierson, who coined the term “documentary film.” This framing does two things. First, it underlines the distinction between actuality and any work we create about it. Second, it acknowledges that building a report requires human judgment: because we can’t know everything that occurred, we capture what we can in an attempt to reconstruct what happened.
“To truly understand how an incident unfolded, you need to experience the incident from the perspectives of the people who were directly involved in it,” notes Netflix’s Lorin Hochstein on his blog. “Only then can you understand how they came to their conclusions and made their decisions.” One helpful tip Lorin offers is to write the narrative description in the first person rather than the third person.
You might also consider asking world-building questions familiar to any documentarian:
- Who are your characters?
- What are their goals?
- What are their worries?
- What do they see, hear, and feel?
The right details build context, enrich understanding, and allow the reader to form connections between ideas. We should always acknowledge the caveat that all narratives are speculative; still, building a stronger narrative in post-incident reports can do the seemingly magical task of transporting readers to the past to answer why the actions taken by those involved made sense at the time.
Check out Lorin’s blog post at https://surfingcomplexity.blog/2020/01/16/getting-into-peoples-heads-how-and-why-to-fake-it
Thanks for checking out the Post-Incident Review.
You can find more of our writing at Morning Mind-Meld, a biweekly newsletter with thoughts on industry news.
Recently, we discussed what it meant to lose the hallway track in light of 2020 conferences being cancelled. Here’s an excerpt:
“The hallway track is a place where you can learn from others like you, and hear conversations that are too private to share on social media. (You know, the ones that don’t have to be polished and performed.) The hallway track is where you get to enjoy not just the ideas of the industry you’re in, but the people. We’ve met our heroes through the hallway track, and strangers who instantly felt like friends we’ve known forever. These moments are necessarily ephemeral: that is the magic of live events.”
Read the full article at https://morningmindmeld.com
Featured Illustration: “Humans of On-Call”
Illustration by @DeniseYu21, story from @rothgar.
Enjoy this issue? Check out the entire catalogue.
Interested in receiving the next issue direct to your inbox? Enter your email here.
Enjoyed the Post-Incident Review? Check out our other projects!
- The Morning Mind-Meld: Summary and context from interesting conference talks
- Ovvy: a Slack bot that simplifies scheduling overrides in PagerDuty
- SRE for Mere Mortals: An upcoming book on practical SRE
Printable version and bulk orders
Print the PDF double-sided yourself! Or, if you’d like enough copies for your team or org, request a bulk order by shooting us an email at firstname.lastname@example.org!