
Incident Post-Mortems: A Step-by-Step Checklist

A complete checklist for running effective blameless post-mortems. From preparation to follow-up actions, ensure your team learns from every incident.

By OutageReview Team | December 10, 2025

A post-mortem done well transforms an incident from a painful experience into a learning opportunity. Done poorly, it becomes a blame session that teaches nothing and damages team trust. This checklist covers everything you need to run effective, blameless post-mortems that actually prevent future incidents.

What Makes a Good Post-Mortem?

An effective post-mortem answers three questions: What happened? Why did it happen? And what will we do to prevent it from happening again? But the format matters less than the culture. A post-mortem in a blame-focused organization will always be shallow because people won't share honestly.

The goal isn't to find who was at fault. It's to understand what systemic factors allowed the incident to happen and what changes will make the system more resilient.

Blameless Doesn't Mean Unaccountable

Blameless culture means we don't punish individuals for honest mistakes. It doesn't mean nobody is responsible for fixing things. Every action item should have an owner. Every improvement should have accountability. We just don't blame people for the incidents that reveal the need for those improvements.

Complete Post-Mortem Checklist

Phase 1: Immediate (Within 24 Hours)

  • Confirm the incident is fully resolved

    Don't start the post-mortem while still firefighting

  • Assign a post-mortem lead

    Someone to gather data, schedule the meeting, and drive the process

  • Preserve evidence

    Save logs, metrics, chat transcripts, and any temporary debugging artifacts before they expire or get cleaned up

  • Document immediate facts while fresh

    Ask responders to write brief notes about what they did and observed

  • Schedule the post-mortem meeting

    Aim for 1-3 days after resolution, while details are fresh but emotions have cooled

Phase 2: Preparation (Before the Meeting)

  • Build the incident timeline

    Gather logs, metrics, and communications into a chronological sequence

  • Identify key timestamps

    When did the incident start? When was it detected? When was it resolved?

  • Calculate impact metrics

    Duration, affected users/requests, error rates, revenue impact if applicable

  • Identify attendees

    Include responders, relevant engineers, and affected stakeholders

  • Share pre-read materials

    Send the timeline and basic facts so people come prepared

  • Review previous similar incidents

    Check if this is a pattern or first occurrence
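The key timestamps and impact metrics above reduce to simple arithmetic once the three timestamps are agreed on. The following is a minimal sketch, assuming timezone-aware timestamps; the function names and the example values are hypothetical and only illustrate the calculation.

```python
from datetime import datetime, timezone


def incident_durations(started, detected, resolved):
    """Return time-to-detect, time-to-resolve, and total duration as timedeltas."""
    if not (started <= detected <= resolved):
        raise ValueError("expected started <= detected <= resolved")
    return {
        "time_to_detect": detected - started,
        "time_to_resolve": resolved - detected,
        "total_duration": resolved - started,
    }


def error_rate(failed_requests, total_requests):
    """Fraction of requests that failed during the incident window."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return failed_requests / total_requests


# Hypothetical example values, not taken from a real incident.
started = datetime(2025, 1, 1, 10, 0, tzinfo=timezone.utc)
detected = datetime(2025, 1, 1, 10, 12, tzinfo=timezone.utc)
resolved = datetime(2025, 1, 1, 11, 30, tzinfo=timezone.utc)

durations = incident_durations(started, detected, resolved)
for name, value in durations.items():
    print(f"{name}: {value}")
```

Keeping detection time separate from resolution time helps the meeting distinguish monitoring gaps from slow mitigation.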

Timeline Building Tip

Building timelines from raw logs can take hours of tedious work. Consider using tools that can parse logs and extract events automatically. The key is to verify the extracted timeline and add context that only your team knows, like who made decisions and why.
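As a rough illustration of automated extraction, the sketch below merges lines from several sources into one chronological list. It assumes every line starts with an ISO 8601 timestamp followed by a message; real log formats vary, and the `Event`, `parse_lines`, and `build_timeline` names are hypothetical.

```python
import re
from datetime import datetime
from typing import NamedTuple


class Event(NamedTuple):
    timestamp: datetime
    source: str
    message: str


# Assumed line shape: "<ISO 8601 timestamp> <message>"
LINE_PATTERN = re.compile(r"^(\S+)\s+(.*)$")


def parse_lines(lines, source):
    """Turn raw lines into events, skipping lines without a leading timestamp."""
    events = []
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if not match:
            continue
        try:
            timestamp = datetime.fromisoformat(match.group(1))
        except ValueError:
            continue  # first token is not a timestamp
        events.append(Event(timestamp, source, match.group(2)))
    return events


def build_timeline(*event_lists):
    """Merge events from all sources and sort them chronologically.

    All timestamps must be consistently timezone-aware (or consistently
    naive); Python cannot compare the two kinds.
    """
    merged = [event for events in event_lists for event in events]
    return sorted(merged, key=lambda event: event.timestamp)


# Hypothetical sample lines.
app_log = [
    "2025-01-01T10:05:00+00:00 error rate above threshold",
    "not a log line",
]
chat_log = ["2025-01-01T10:02:00+00:00 deploy started by on-call"]

timeline = build_timeline(parse_lines(app_log, "app"), parse_lines(chat_log, "chat"))
for event in timeline:
    print(event.timestamp.isoformat(), f"[{event.source}]", event.message)
```

The output is only a starting point: skipped lines and missing context, such as who made a decision and why, still need a human pass.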

Phase 3: The Post-Mortem Meeting

  • Set the tone

    Remind everyone this is blameless; we're here to learn, not to punish

  • Review the timeline together

    Walk through what happened, filling in gaps and correcting errors

  • Conduct root cause analysis

    Use 5 Whys, Fishbone, or other techniques to dig into causes

  • Identify multiple contributing factors

    Don't stop at one cause; complex incidents have multiple factors

  • Discuss what went well

    Acknowledge effective response actions; reinforce good practices

  • Brainstorm preventive actions

    What changes would prevent this incident or detect it faster?

  • Prioritize action items

    Not everything can be fixed immediately; focus on highest impact

  • Assign owners and deadlines

    Every action item needs a specific owner and due date

Phase 4: Documentation

  • Write a clear incident summary

    One paragraph that anyone can understand

  • Document the complete timeline

    Include timestamps, events, and decision points

  • Record root causes

    Both immediate and underlying systemic causes

  • Document impact

    Duration, affected services, user impact, business impact

  • List all action items

    With owners, due dates, and priority

  • Include lessons learned

    What should others know about this type of incident?

  • Link to relevant resources

    Dashboards, runbooks, code changes, related incidents

Phase 5: Follow-Through

  • Share the post-mortem widely

    Other teams can learn from your experience

  • Enter action items in your tracking system

    Treat them like any other work item

  • Schedule follow-up reviews

    Check that actions are completed on schedule

  • Verify action effectiveness

    Did the changes actually prevent recurrence?

  • Update runbooks and documentation

    Incorporate lessons learned into operational docs

  • Add to lessons learned library

    Make post-mortems searchable for future reference
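For the follow-up reviews above, a small script can surface action items that have slipped past their due date. This is a sketch under assumptions: the `ActionItem` fields mirror the owner and due-date requirements from the checklist, but the structure itself is hypothetical and would normally come from your tracking system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ActionItem:
    description: str
    owner: str
    due: date
    completed_on: Optional[date] = None  # None means still open


def overdue(items, today):
    """Return open items whose due date has passed, oldest first."""
    late = [item for item in items if item.completed_on is None and item.due < today]
    return sorted(late, key=lambda item: item.due)


# Hypothetical items for illustration.
items = [
    ActionItem("Add canary stage to deploys", "alice", date(2025, 1, 20)),
    ActionItem("Alert on queue depth", "bob", date(2025, 1, 10)),
    ActionItem("Update rollback runbook", "carol", date(2025, 1, 5), date(2025, 1, 4)),
    ActionItem("Load-test failover path", "dan", date(2025, 3, 1)),
]

late_items = overdue(items, today=date(2025, 2, 1))
for item in late_items:
    print(f"{item.due} {item.owner}: {item.description}")
```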

Post-Mortem Document Template

Post-Mortem: [Incident Title]

Date: [Date of incident]
Author: [Post-mortem lead]
Status: [Draft / Final / Actions Complete]


Summary

[1-2 paragraph summary: what happened, impact, and key takeaway]

Impact

  • Duration: [X hours/minutes]
  • Users affected: [number or percentage]
  • Services affected: [list]
  • Business impact: [revenue, SLA, etc.]

Timeline

[Detailed chronological timeline with timestamps]

Root Causes

[List of contributing factors at technical and systemic levels]

What Went Well

[Effective response actions]

What Could Be Improved

[Areas for improvement in detection, response, or prevention]

Action Items

Action | Owner | Due Date | Status
[Action description] | [Name] | [Date] | [Status]

Lessons Learned

[Key insights for the organization]

Common Post-Mortem Anti-Patterns

The Blame Game

If your post-mortem includes phrases like "Bob should have..." or "The team failed to...", you're doing it wrong. Reframe: What systemic factors allowed this to happen? What guardrails were missing?
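One lightweight way to catch this wording before a draft is shared is a simple phrase scan. The sketch below flags only the two phrasings quoted above; the pattern list and the `find_blame_phrases` name are assumptions, and a match is a prompt to reframe, not proof of blame.

```python
import re

# Phrases taken from the examples above; extend to suit your team.
BLAME_PATTERNS = [
    re.compile(r"\bshould have\b", re.IGNORECASE),
    re.compile(r"\bfailed to\b", re.IGNORECASE),
]


def find_blame_phrases(text):
    """Return (line_number, line) pairs for lines containing a flagged phrase."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in BLAME_PATTERNS):
            hits.append((number, line.strip()))
    return hits


# Hypothetical draft text.
draft = (
    "Bob should have checked the config.\n"
    "The deploy pipeline had no canary stage.\n"
    "The team failed to notice the alert."
)

hits = find_blame_phrases(draft)
for number, line in hits:
    print(f"line {number}: {line}")
```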

The Checkbox Exercise

Post-mortems done because policy requires them, not because the team wants to learn, produce shallow analysis and action items that get ignored. If you're just going through the motions, you're wasting everyone's time.

The Novel

A 20-page post-mortem that nobody reads helps nobody. Be thorough but concise. Focus on what matters for preventing recurrence.

The Action Item Graveyard

Post-mortems that produce action items that never get done are theater. Track actions to completion or don't bother writing them down.

Measuring Post-Mortem Quality

How do you know if your post-mortems are effective? Consider tracking:

  • Action completion rate: What percentage of actions get done?
  • Time to completion: How long do actions take to implement?
  • Recurrence rate: Do similar incidents happen again?
  • Investigation depth: Are you finding systemic causes or just immediate triggers?

Some teams implement quality scoring for post-mortems, measuring completeness of timelines, depth of root cause analysis, specificity of actions, and follow-through on implementation.
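The first two metrics in the list above can be computed from nothing more than creation and completion dates. The following is a minimal sketch, assuming each action is a `(created, completed)` pair with `completed` set to `None` while the action is open; the function names and sample data are hypothetical.

```python
from datetime import date
from statistics import mean


def completion_rate(actions):
    """Fraction of actions that have a completion date."""
    if not actions:
        return 0.0
    done = sum(1 for _, completed in actions if completed is not None)
    return done / len(actions)


def mean_days_to_complete(actions):
    """Average days from creation to completion, or None if nothing is done."""
    durations = [
        (completed - created).days
        for created, completed in actions
        if completed is not None
    ]
    return mean(durations) if durations else None


# Hypothetical (created, completed) pairs.
actions = [
    (date(2025, 1, 1), date(2025, 1, 11)),
    (date(2025, 1, 1), date(2025, 1, 21)),
    (date(2025, 1, 1), None),
    (date(2025, 1, 5), None),
]

print(f"completion rate: {completion_rate(actions):.0%}")
print(f"mean days to complete: {mean_days_to_complete(actions)}")
```

Recurrence rate and investigation depth need human judgment about which incidents count as similar and how deep an analysis went, so they are left out of this sketch.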

Key Takeaways

  • Start the post-mortem process within 24 hours of resolution, while details are fresh
  • Build a detailed timeline before analyzing causes
  • Focus on systems and processes, not individual blame
  • Track action items to completion, then verify they worked
