Read the online version below or grab the Post-Incident Review PDF.
Table of Contents
- Editor's Note
- Featured postmortems
- Are Your Post-Incident Reports Gathering Digital Dust?
- Ain't No Party Like A Journal Club Party
Editor's Note
Welcome to the first issue of the Post-Incident Review.
The Review starts with a topo, or illustration of a climbing route, because of our shared love of rock climbing, and also as a nod to a relic of a solved problem not unlike post-incident reviews or postmortems. The idea for the Review came from one of us (Emil) reading an annual collection of climbing accidents and wondering what that might look like for tech. Given that the other of us (Jaime) had editorial and publishing experience, after months of thinking, designing, and creating, here we are.
We wanted to create a space for focused reading, although every post-incident review included is publicly available. We have not touched the content, and added design only for clarity. There were plenty more to include were it not for space considerations.
You’ll find wide margins to jot down notes. Let your mind run free. Maybe you’re one of those people who highlights and then draws a line to the side, before writing your thoughts. However you see fit, that’s the right way to do it.
We hope you enjoy this, and happy reading!
- Emil Stolarsky and Jaime Woo
Featured postmortems
- Honeycomb: “You Can’t Deploy Binaries That Don’t Exist”
- Monzo: “We Had Issues With Monzo On 29th July. Here's What Happened, And What We Did To Fix It.”
- Stripe: “Significantly Elevated Error Rates on 2019‑07‑10”
- Google Cloud: “Networking Incident #19009”
Are Your Post-Incident Reports Gathering Digital Dust?
By Jaime Woo
A four in the morning page, followed by a few (groggy) hours trying to figure out what happened. Thankfully a fix is holding, and the system’s returning to normal. Your bed beckons, but so does the start of the day. Your workload didn’t need a post-incident report on top: you can’t recall the last time anyone read one of these, but it still needs to get done. So you grind one out quickly, and hope to put the incident far, far behind you.
Sound familiar? Not surprising. Most companies understand the need for a post-incident report, especially for disruptive or novel incidents. A meeting happens to discuss and beef up the report, and then it lands in the metaphorical filing cabinet in the sky, er, cloud. Companies invest thousands of dollars’ worth of people-hours on the post-incident process only to have those learnings end up gathering digital dust in some Google Drive folder. Like the proverbial tree in the forest, if a postmortem gets written but no one reads it, does it really exist?
Yet it’s understandable why people don't regularly read post-incident reports. They are overworked as it is. They may not see how the reports tie directly into their jobs. They might want to read them, but never know when the reports are released, and who has time to keep track?
Underlying these reasons is the fact that learning has a non-trivial level of difficulty, and requires time and attention. Unless its future benefit is evidently worth the effort, there will always be other competing demands on people.
What can organizations do then? Luckily, this isn’t unique to tech, and we can find inspiration in the lessons learned model, which has been adopted by numerous commercial, government, and military organizations. An adapted version of its cycle for knowledge includes the steps of: collection, prioritization, storage, dissemination, and reuse.
This framework helps illustrate a potential hiccup: how many companies reach “storage” and then think, “mission accomplished”? Unfortunately, organizations can assume that dissemination and reuse happen organically, a kind of magical thinking where if something is helpful enough then it will miraculously reach those who need it; however, given the constant and conflicting demands placed on people, that idealized case rarely is reality.
What does intentionally planning the dissemination and reuse steps look like? It means asking the questions: “what channels communicate the report to its different audiences?” and “how will those audiences take that knowledge and apply it?” Let's break down these two steps further.
A successful dissemination strategy for getting knowledge to people employs the channels they currently use or the places they already frequent. Some teams prefer sharing through email; some congregate around platforms like Slack or Teams; and others may find an announcement during all-hands meetings or AMAs best. In addition, frequency matters: how often can notifications occur before people begin to tune them out? Better to start sparingly, like once a month. There’s no blanket solution, but setting goals and tracking uptake will point in the right direction.
Once the knowledge is in people’s hands, how should they apply it? Reuse is the notion that not only should knowledge get shared, but also consumed and made use of. What does that mean? American educational theorist David Allen Kolb suggested an experiential learning cycle where, to truly gain knowledge, people have to absorb the lesson, relate it to their own experiences and behaviours, conjure new or modified ideas, and then test them out to form new lessons.
One tactic is the journal club, which is similar to a book club except that the group discusses journal papers rather than Oprah’s latest pick. (The Water Dancer sounds excellent, by the way.) Having regular sessions of journal club can facilitate teams digesting lessons from the report.
All of this might be appealing, and yet the question of whether it’s worth the effort lingers. Luckily research confirms that a learning organization has many benefits, including more innovation, greater sense of community, improved decision making, and higher quality output.
The authors of Accelerate: Building and Scaling High Performing Technology Organizations write: “In today’s fast-moving and competitive world, the best thing you can do for your products, your company, and your people is institute a culture of experimentation and learning.” Encouraging and supporting learning was a cultural capability to drive improvement, and a healthy organizational culture led to improved organizational performance.
While learning isn't automatic, it also isn't a mystery. Thoughtfully seeding lessons learned across teams will hopefully make those four in the morning calls a bit easier.
Ain't No Party Like A Journal Club Party?
By Jaime Woo
Journal clubs are highly effective for a group to work together to consume and digest information and to convert it into knowledge. Here’s how they work: groups choose a paper, read it in advance, gather to critically appraise the information, and then apply the knowledge to their own work; through discussion, participants can share, develop, and evolve their understanding.
Why does journal club exist? Not only to increase knowledge within the group but to improve comprehension skills: journal papers are a vital stream of information for researchers; however, they shouldn't be taken at face value.
You've seen the perils of this before: sometimes the media will report findings of a study without noting the qualifications, constraints, or gaps in the methodology. This is an issue because how authors reached their conclusions matters: a small sample size suggests, for example, that we shouldn't extrapolate the findings beyond that group.
Although post-incident reports aren't constructed the same as journal articles, we think there's a lot to gain from borrowing from journal clubs, both in how companies resolve incidents and in how they then communicate what happened. A 2008 paper suggested these characteristics for successful journal clubs:
- Regular and anticipated meetings
- Mandatory attendance
- Clear long‐ and short‐term purpose
- Appropriate meeting timing and incentives
- A trained journal club leader to choose papers and lead discussion
- Circulating papers prior to the meeting
- Using the internet for wider dissemination and data storage
- Using established critical appraisal processes
- Summarizing journal club findings
We can boil that down into five main questions to answer before your first meeting:
1. What do you want out of journal club?
Journal clubs shouldn't be overly rigid, and in fact being too prescriptive dampens the creative back-and-forth that characterizes the best discussions. However, you’ll want to set objectives to provide some structure to the conversation. Here are some examples of objectives you may want to target:
- Increasing exposure to new ideas
- Gathering of people with different perspectives to exchange knowledge
- Improving critical assessments around processes and outcomes
- Applying the lessons toward your own work
- Familiarizing participants with the post-incident review process
What would best strengthen your team? Pick a handful of objectives, and let participants know so that they understand the context for attending journal club.
2. What does the timing look like?
A common cadence for journal clubs is monthly: often enough to maintain momentum, but not so often as to exhaust attendees. To further encourage participation, you should choose a repeated day (such as the third Tuesday of every month) so people can put the sessions in their calendars and plan their schedules around them. You'll want sessions of sixty to ninety minutes, which should be enough time for discussion.
It's tempting to slot journal club during lunch or after the standard work day, but doing so can signal that journal club is superfluous. Given the positive evidence-based results for learning from journal club, treat it like any other meeting within typical work hours.
3. What does the format look like?
There are some procedural aspects of a journal club you’ll want to consider, such as:
- Is your journal club in-person only, or is it remote-friendly?
- How many people do you expect in each session? (More than 8 is probably too many.)
- Is attendance mandatory?
- Are notes being taken? And if they are, who takes them, and how will they be stored and distributed?
As for the itinerary for journal club: generally, the journal club leader begins with a brief synopsis of the paper and the context for why it was chosen. Then, they will moderate discussion—first with overall thoughts, and then a more specific breakdown of the ideas in the paper.
Some helpful questions to ask include:
- What was the most interesting part of the post-mortem?
- Do you agree with their methodology? And how would you do things differently?
- What next steps would we take if this happened to us?
4. Who will facilitate journal club?
The journal club leader is responsible for choosing the reading material. This is because they are usually a subject matter expert and can spot a report that will have enough relevant material to allow for discussion. Once chosen, it should be distributed to the group early enough for everyone to get a chance to have a proper read—at least a week, ideally.
At journal club, the leader is responsible for stoking a healthy conversation, enacting the agenda, keeping the session on time, and ensuring participants feel they can equally contribute. They should watch out for groupthink: sometimes, participants will defer to more senior people in the room, reducing the amount of discourse. A good journal club leader ensures psychological safety so that the group can openly share ideas.
Whether the role of journal club leader rotates or not between participants is up to you.
5. How will you measure success?
Before you start a journal club, you should think through how you will evaluate how well it's working. You don't need to go overboard: a few key metrics will do. Then, every three to six months, assess your journal club to see what's working and where you can improve.
This is an important step because companies each have different goals, and furthermore not every group within the same company will share exactly the same goals. So, figuring out what meets the needs of your group results in better outcomes than following a template. It will also help people understand why they should attend, which leads to better participation rates.
A successful journal club may be one where everyone enjoys their time together, or one that leads to a creative insight on a current project: it's up to you, so long as the metrics are clearly stated. Don't, however, feel bound by your initial rubric: play around with the format, the timing, the size of the group, and so on, and adapt it as you gain experience running the discussions.
Enjoyed the Post-Incident Review? Check out our other projects!
- The Morning Mind-Meld: Summary and context from interesting conference talks
- Ovvy: a Slack bot that simplifies scheduling overrides in PagerDuty
- SRE for Mere Mortals: An upcoming book on practical SRE
Printable version and bulk orders
Print the PDF double-sided yourself! Or, request bulk orders if you’d like to pick up enough copies for your team or org by shooting us an email at firstname.lastname@example.org!