Intermittent database access

Incident Report for Routific

Postmortem

Yesterday, Routific was intermittently unavailable for 32 minutes, between 10:08am and 10:40am Pacific Time. The cause has been identified and the incident resolved.

What happened?

We deployed a feature that included a DB query that was inefficient on large datasets. This overloaded our DB servers, which resulted in very slow performance and intermittently dropped connections. We reverted the deployment as soon as we confirmed that this query was indeed the culprit.

What are we doing about it?

Our engineering team is reviewing our processes to understand how this change reached production despite our current testing and QA practices.

Reliability and stability of the Routific service is of utmost importance to us. We are making sure that we learn from our mistake and work hard towards a more stable platform going forward.

Sorry for the disruption to your business; thanks again for your patience and ongoing support!

Posted Sep 14, 2018 - 11:02 PDT

Resolved

Things have been stable for the last 4 hours and no other incidents have been recorded.

Post-mortem to follow.

Posted Sep 13, 2018 - 15:29 PDT

Monitoring

We've finished the revert, and the DB load is back to normal. We will closely monitor the situation while we write a post-mortem.

Posted Sep 13, 2018 - 10:45 PDT

Identified

We've identified a performance bottleneck in a recent deployment, which is causing intermittent DB access issues. Our users may experience some dropped connections and long loading times. We are in the process of reverting the deployment...

Posted Sep 13, 2018 - 10:32 PDT

This incident affected: Routific API and Routific SaaS product.