Routific inaccessible

Incident Report for Routific

Postmortem

On September 19th, Routific was unavailable for 84 minutes between 12:39 p.m. and 2:05 p.m. Pacific Time. The cause has been identified and our systems have been stable since.

What happened?

The messaging queue system that our geocoding service depends on went down, which in turn made our platform unavailable. We had not correctly configured the messaging queue to be highly available, so a single failed instance brought down the entire system.

Resolution

We immediately spun up a new messaging queue to replace the old service. We are still working closely with AWS support to investigate why the messaging queue failed.

What are we doing about it?

We are now working on moving our messaging queue system to a managed third-party queueing service to ensure future scalability and high availability. In addition, our engineering team will work on decoupling our system so that a failure in one part will not affect the rest of our platform. We will also improve our failover processes to ensure uptime.
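
As an illustration only, the decoupling idea can be sketched with a simple circuit breaker: when a dependency such as the queue keeps failing, callers stop waiting on it and return a degraded result instead of going down with it. This is a minimal sketch under stated assumptions, not Routific's actual implementation. The names `CircuitBreaker`, `geocode_via_queue`, and `geocode_deferred` are hypothetical, and the queue outage is simulated with a plain `ConnectionError`.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Hypothetical helper: skips a failing dependency for `reset_after` seconds."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        """Return True while calls to the dependency should be skipped."""
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_after:
            # Cool-down elapsed: reset and let the next call try the dependency.
            self.opened_at = None
            self.failures = 0
            return False
        return True

    def call(self, primary, fallback, *args):
        """Run `primary`, or `fallback` if the dependency is failing."""
        if self.is_open():
            return fallback(*args)
        try:
            result = primary(*args)
        except ConnectionError:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback(*args)
        self.failures = 0
        return result


def geocode_via_queue(address: str) -> dict:
    # Assumption: stands in for a geocoding request sent through the queue.
    # Here it always fails, to simulate the outage described above.
    raise ConnectionError("message queue unreachable")


def geocode_deferred(address: str) -> dict:
    # Assumption: a degraded response that lets the rest of the platform continue.
    return {"address": address, "status": "pending"}


breaker = CircuitBreaker(max_failures=2, reset_after=10.0)
print(breaker.call(geocode_via_queue, geocode_deferred, "123 Main St"))
```

The trade-off in a design like this is that geocoding results may arrive late during an outage, but a single failed component no longer makes the whole platform unavailable.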

Posted Sep 21, 2018 - 15:42 PDT

Resolved

Things have been stable for the last 3 hours and no other incidents have been recorded. We are looking into an alternate service to host RabbitMQ in order to improve stability.

Posted Sep 19, 2018 - 17:01 PDT

Monitoring

Systems are back online. There was an issue with the connection to our RabbitMQ cluster; we've connected to a new cluster and things are running again. We will deep-dive to understand what happened, while closely monitoring the stability of our systems right now...

Posted Sep 19, 2018 - 14:08 PDT

Investigating

Sorry, it looks like the fix didn't hold for long. We are continuing the investigation.

Posted Sep 19, 2018 - 13:03 PDT

Monitoring

Systems are back online after a server reboot. We are continuing our investigation.

Posted Sep 19, 2018 - 12:57 PDT

Investigating

We are currently investigating the issue...

Posted Sep 19, 2018 - 12:53 PDT

This incident affected: Routific API and Routific SaaS product.