Routific outage

Incident Report for Routific

Postmortem

On September 17, from 12:57 a.m. to 1:04 p.m. PST, Routific’s route optimization and SaaS services were partially degraded. The cause has been identified and fixed, and our system is now stable.

We're very sorry for the stress this has caused you. We fully appreciate how important Routific's availability is to our customers.

What happened?

On September 16, a bug caused an optimization job to get stuck in an infinite processing loop. The bug was fixed promptly, but we failed to identify a hidden side effect: error logs from the bug had filled our system disk space. This subsequently caused the load balancer to have trouble communicating with two of our API instances.

What are we doing about it?

Our team has been committed to improving the reliability and stability of our systems this quarter, and will continue to do so next quarter. We will also continue to enhance our monitoring systems. To prevent this incident from recurring, we will set up new monitors for load balancer connectivity issues and storage concerns.
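
As an illustration only, and not Routific's actual monitoring code, a storage monitor of the kind described above could be sketched in Python roughly as follows. The function names, the 90% threshold, and the checked path are all assumptions made for this example. A real setup would more likely rely on an existing monitoring service and send alerts rather than print.

```python
import shutil
from typing import NamedTuple


class DiskStatus(NamedTuple):
    path: str
    used_fraction: float
    alert: bool


def evaluate_usage(path: str, total: int, used: int,
                   threshold: float = 0.9) -> DiskStatus:
    """Classify a disk reading; alert once used space reaches the threshold.

    The 0.9 default is an arbitrary example value, not a recommendation.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    used_fraction = used / total
    return DiskStatus(path, used_fraction, used_fraction >= threshold)


def check_disk(path: str = "/", threshold: float = 0.9) -> DiskStatus:
    """Read the real usage of the filesystem holding `path` and classify it."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return evaluate_usage(path, usage.total, usage.used, threshold)


if __name__ == "__main__":
    status = check_disk("/")
    print(f"{status.path}: {status.used_fraction:.1%} used, alert={status.alert}")
```

Run periodically (for example from a scheduler), a check like this would flag a disk that is filling up with logs before it is completely full.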

Posted Sep 17, 2019 - 19:26 PDT

Resolved

Issue has been resolved. We will follow up with a post-mortem about the incident soon. So sorry about the trouble!

Posted Sep 17, 2019 - 13:34 PDT

Investigating

We are experiencing degraded performance on our API and a partial outage for our SaaS product. Our engineers are currently investigating the issue.

Posted Sep 17, 2019 - 11:16 PDT

This incident affected: Routific API and Routific SaaS product.