Routific inaccessible
Incident Report for Routific
Postmortem

On September 19th, Routific was unavailable for 84 minutes between 12:39 p.m. and 2:05 p.m. Pacific Time. The cause has been identified, and our systems have been stable since.

What happened?

The messaging queue system that our geocoding service depends on went down, which in turn made our platform unavailable. We had not configured our messaging queue for high availability, so a single failed instance brought down the entire system.
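
To illustrate the single point of failure described above, here is a minimal sketch of client-side failover across several queue nodes. It is not Routific's actual code: the node names (mq-1, mq-2, mq-3), the connect_with_failover helper, and the fake_connect stand-in are all hypothetical, and a real setup would call the connect function of whichever client library is in use.

```python
from typing import Callable, Sequence


class AllBrokersDown(Exception):
    """Raised when no broker node in the list accepts a connection."""


def connect_with_failover(
    nodes: Sequence[str],
    connect: Callable[[str], object],
) -> object:
    """Try each broker node in order and return the first live connection."""
    errors = []
    for node in nodes:
        try:
            return connect(node)
        except ConnectionError as exc:
            # Remember why this node failed, then move on to the next one.
            errors.append(f"{node}: {exc}")
    raise AllBrokersDown("; ".join(errors))


def fake_connect(node: str) -> str:
    # Hypothetical stand-in for a real client call; "mq-1" simulates the failed instance.
    if node == "mq-1":
        raise ConnectionError("connection refused")
    return f"connected to {node}"


print(connect_with_failover(["mq-1", "mq-2", "mq-3"], fake_connect))
# prints "connected to mq-2"
```

With only one node in the list, the same failure raises AllBrokersDown, which mirrors what happened here. Client-side failover is only half of high availability: the queue nodes themselves also need to replicate queues between them so that a surviving node can take over the work.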

Resolution

We immediately spun up a new messaging queue to replace the old service. We are still working closely with AWS support to investigate why the messaging queue failed.

What are we doing about it?

We are now working on moving our messaging queue system to a managed third-party queueing service to ensure future scalability and high availability. In addition, our engineering team will work on decoupling our system so that a failure in one part will not affect the rest of our platform. We will also improve our failover processes to minimize downtime.
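
One common way to decouple a platform from a failing dependency is a circuit breaker: after repeated failures, callers stop waiting on the dependency and receive a degraded response instead. The sketch below is an illustration under assumptions, not a description of what Routific will build. The CircuitBreaker class, the thresholds, and the geocode_via_queue and geocode_unavailable functions are hypothetical.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Stop calling a failing dependency for a cool-down period."""

    def __init__(
        self,
        max_failures: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable[[], object], fallback: Callable[[], object]) -> object:
        if self.opened_at is not None:
            # Circuit is open: skip the dependency until the cool-down elapses.
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            # Cool-down is over: close the circuit and try the dependency again.
            self.opened_at = None
            self.failures = 0
        try:
            result = func()
        except ConnectionError:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open the circuit
            return fallback()
        self.failures = 0  # a success resets the failure count
        return result


def geocode_via_queue() -> dict:
    # Hypothetical dependency call that fails while the queue is down.
    raise ConnectionError("queue unreachable")


def geocode_unavailable() -> dict:
    # Degraded response: the rest of the platform keeps working.
    return {"status": "geocoding_unavailable"}


breaker = CircuitBreaker(max_failures=2, reset_after=30.0)
for _ in range(3):
    print(breaker.call(geocode_via_queue, geocode_unavailable))
```

In this sketch the third call never reaches the queue: the breaker opened after the second failure, so the fallback is returned immediately. The point of the design is that an outage in one service shows up as a degraded feature, not as an unavailable platform.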

Posted Sep 21, 2018 - 15:42 PDT

Resolved
Things have been stable for the last 3 hours and no other incidents have been recorded. We are looking into an alternate service to host RabbitMQ in order to improve stability.
Posted Sep 19, 2018 - 17:01 PDT
Monitoring
Systems are back online. There was an issue with the connection to our RabbitMQ cluster; we've connected to a new cluster and things are running again. We will do a deep dive to understand what happened while closely monitoring the stability of our systems.
Posted Sep 19, 2018 - 14:08 PDT
Investigating
Sorry, it looks like the fix didn't hold for long. We are continuing the investigation.
Posted Sep 19, 2018 - 13:03 PDT
Monitoring
Systems are back online after a server reboot. We are continuing our investigation.
Posted Sep 19, 2018 - 12:57 PDT
Investigating
We are currently investigating the issue...
Posted Sep 19, 2018 - 12:53 PDT
This incident affected: Routific API and Routific SaaS product.