It’s an embarrassing day for Judoscale. Last night through this morning we had the longest and most severe production incident in our history, and we didn’t know anything was wrong for almost 12 hours. It was caused by some unexpected data and a line of code that never should have been written.
In this post I’ll air our dirty laundry and tell you exactly what happened, where we screwed up, and how we’re fixing it.
The timeline
- 00:25 UTC: Upscaling stopped working for most Judoscale customers. We were not aware of this at the time.
- 12:00 UTC: Carlos begins his day and opens our support queue to find 30 new messages (1-2 is typical). He updates our status page and begins investigating. …