Increased errors and timeouts on API calls

Incident Report for Routific

Postmortem

On May 11th, one of our internal services was unavailable between 12:43 and 13:00 PDT. This resulted in increased errors and/or timeouts on the API calls that depend on that service.

What happened?

A recent change caused intermittently higher latency in the internal service that handles map geometry. As a result, some customer API requests were not served or waited longer than usual for a response.

What are we doing about it?

We shipped a fix to lower the latency of the internal service. We have also started setting up more monitoring and alerts for our internal services so that we learn of an issue like this before it affects customers.

Posted May 13, 2021 - 14:28 PDT

Resolved

This incident has been resolved. From 12:50 PDT until 13:04 PDT we had an increased number of failed requests to our API, which may have caused issues for users of the Routific App and Routific Engine API.
We are investigating the causes of the issue and how to prevent it from happening again, and we will share a postmortem when we have more details.

Posted May 11, 2021 - 16:39 PDT

Monitoring

A fix has been implemented and we are monitoring the results.

Posted May 11, 2021 - 13:04 PDT

Identified

The issue has been identified and a fix is being implemented.

Posted May 11, 2021 - 13:03 PDT

Investigating

We are currently investigating this issue.

Posted May 11, 2021 - 12:52 PDT

This incident affected: Routific API.