Stories
3 stories from the trenches
🦸 Heroic Save
Recovering from Terraform State Corruption 30 Minutes Before a Board Demo
👤 @sre_hero_maya · infrastructure · 2025
“We provided a cloud infrastructure management platform. Our own infrastructure was managed by Terraform with state stored in an S3 backend with DynamoDB locking. We had a board dem...”
Terraform · AWS · Incident Response · CI/CD · +1
🔄 Culture Change
Building an On-Call Culture from Scratch at a "Move Fast, Break Things" Startup
👤 @startup_sam · SaaS · 2023
“We were a 7-person engineering team at a seed-stage B2B SaaS startup. There was no on-call rotation — when things broke, the CTO would get a text from a customer and scramble to fi...”
PagerDuty · Datadog · On-Call · Incident Response · +1
⚡ Incident Report
The Black Friday Meltdown: How a Missing Index Took Down Our Checkout
👤 @sre_sarah · e-commerce · 2024
“We were a mid-size e-commerce platform processing about 50k orders per day on normal days. Our stack was a Node.js monolith backed by PostgreSQL, deployed on AWS ECS. We had monito...”
PostgreSQL · AWS · Datadog · Incident Response · +1