Mercury and other applications down for 1h37m due to cloud provider incident

Mar 19 at 04:16pm CET
Affected services
Mercury
MMM
Ad Schedule
Data Connection Service

Resolved
Mar 19 at 04:16pm CET

The application became unavailable at 1:59 PM, when load balancing failed. The failure followed an incident at our cloud provider, Hetzner, during which config changes degraded LoadBalancer performance. Our investigation indicates that an autoscaling event in the cluster at 1:59 PM propagated a config change to the LoadBalancer, which then caused the failure.

https://status.hetzner.com/incident/b8246859-8fdd-4c44-b35c-d3c9a6cad3a7

We then redeployed the LoadBalancer and deactivated autoscaling in the cluster to prevent further problems related to the Hetzner incident.

We will keep autoscaling deactivated until we have verified in a test setup that Hetzner's mitigation is reliable.