Postmortem of an outage in a complex distributed app at Monzo

community.monzo.com posted by kenny 2361 days ago  

Hi everyone, I’m Monzo’s Head of Engineering, and as I promised on Friday, I’d like to share some more information about what happened during this outage. Because the nature of the issue was technical, this post is also quite technical.

A large-scale failure in a distributed system can be very difficult to understand, and well-intentioned human action can sometimes compound issues, as happened here. When things like this do happen, we want to learn as much as possible from the event to ensure it can’t resurface.