Dashboard & API Unavailable

Incident Report for 5c SMS Site Status

Resolved

Root Cause Analysis

A Message from Our Leadership Team
We want to take a moment to sincerely thank our customers for their patience and understanding during the outage on 6 March 2026. We recognise that your ability to send messages and access your account is critical to your operations, and we do not take lightly the disruption this incident caused.

This outage falls below the standard of availability we hold ourselves to, and that our customers rightly expect from us. We are committed to understanding exactly what went wrong, being transparent about it, and putting the right safeguards in place to ensure it is not repeated.

We also want to acknowledge the volume of enquiries our support team received during the incident. Our phone lines and email were overwhelmed, and while our team worked as hard as they could to respond in a timely manner, we understand that wait times were not acceptable. We sincerely apologise for any frustration this caused.

To help keep you informed during any future incidents, we strongly encourage all customers to subscribe to real-time updates at status.5centsms.com.au. Our engineering team posts updates there as incidents develop, so you can stay informed without needing to contact support.

If you would like to discuss this incident further, please don't hesitate to reach out to us at hello@5centsms.com.au or to your account manager. We are happy to talk through what happened and how we can best support you.

Executive Summary
On 6 March 2026, 5centsms.com.au experienced a service outage affecting the customer dashboard and API. The outage was caused by an automated certificate renewal process that deployed invalid certificate files to backend database nodes, triggering replication failures and a cascading database outage. Automated alerts notified the on-call team at 14:36 AEST, and investigation began immediately. The root cause was identified at 15:40, corrected certificates were manually provisioned, and core services were restored at 16:21.

The automated certificate deployment service has been disabled pending a full review and will not be re-enabled until remediation controls are in place.

Incident Timeline (All times AEST on 06/03/2026)
14:36 Automated certificate renewal process deploys invalid certificate files to backend database nodes
14:36 Automated alerts triggered; on-call team notified
14:36 Dashboard and API begin returning errors; investigation commenced
15:40 Root cause identified: invalid certificates causing database replication failure
15:40 Manual certificate provisioning initiated
16:21 Core services (dashboard and API) restored
~16:51 Auxiliary services fully restored

Root Cause
During a renewal cycle, the automated certificate renewal service deployed invalid certificate files to backend database nodes. The invalid certificates caused database replication to fail across the affected nodes. As replication failed, the databases became unavailable, which in turn caused the customer dashboard and API to return errors.

This was not a deliberate or malicious event. The certificate deployment service had not previously issued incorrect certificates, and no prior issues with this process had been observed.

Detection & Response
Automated monitoring detected the failure as it occurred, at 14:36 AEST, and immediately alerted the on-call team. The team commenced investigation without delay.

Troubleshooting was initially impeded by the invalid certificates themselves. Engineers were unable to connect directly to the affected database nodes to investigate, because the same certificate failure that caused the outage also blocked administrative access. A workaround had to be implemented to restore connectivity before root cause investigation could proceed in earnest, which contributed to the time between the alert and root cause identification.

The workaround and the steps taken have been documented to improve response speed and reduce friction in any similar future incident. Once the root cause was identified, manual certificate provisioning was completed and core services were restored.

Remediation Actions
Immediate
Invalid certificates were removed from all affected database nodes
Correct certificates were manually provisioned across the database cluster
Core services were restored and verified
Auxiliary services were confirmed fully operational

Short-Term
The automated certificate deployment service has been disabled pending a full review
A review of the certificate deployment pipeline is underway to identify how invalid certificate files were generated

Long-Term
The certificate deployment service will not be re-enabled until the following controls are implemented:
1. Staggered deployments: certificate updates will be applied to a single replica at a time, with a minimum 24-hour interval between each node update. This allows the team to detect and halt a bad deployment before it affects the full cluster.
2. Pre-deployment certificate validation: automated checks will verify certificate validity, chain integrity, and expiry before any deployment proceeds.
3. Automated rollback: if a replica fails to successfully apply a new certificate, the process will automatically roll back and halt further deployments, triggering an alert for manual review.
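
The three controls above can be illustrated with a minimal sketch. This is not our deployment tooling. It is a hedged example of how validation, one-node-at-a-time rollout, and rollback-then-halt might fit together. The function staggered_rollout, the RolloutResult type, and every callable passed in (validate, apply_cert, is_healthy, rollback, alert, wait) are hypothetical names introduced for illustration only. In practice, wait would enforce the minimum 24-hour interval between node updates.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RolloutResult:
    """Outcome of one rollout attempt (hypothetical type for this sketch)."""
    updated: List[str] = field(default_factory=list)  # nodes now on the new certificate
    failed_node: Optional[str] = None                 # node that triggered the halt, if any
    reason: str = ""                                  # why the rollout stopped early


def staggered_rollout(
    nodes: List[str],
    validate: Callable[[], bool],        # assumed: checks validity, chain, expiry
    apply_cert: Callable[[str], None],   # assumed: installs the new cert on one node
    is_healthy: Callable[[str], bool],   # assumed: e.g. replication is working
    rollback: Callable[[str], None],     # assumed: restores the previous cert
    alert: Callable[[str], None],        # assumed: pages the on-call team
    wait: Callable[[], None],            # assumed: blocks for the stagger interval
) -> RolloutResult:
    result = RolloutResult()

    # Control 2: refuse to touch any node if the certificate fails validation.
    if not validate():
        result.reason = "certificate failed pre-deployment validation"
        alert(result.reason)
        return result

    # Control 1: update a single node at a time.
    for index, node in enumerate(nodes):
        try:
            apply_cert(node)
            ok = is_healthy(node)
        except Exception:
            # Treat any error while applying or checking as a failed update.
            ok = False

        if not ok:
            # Control 3: roll back this node, halt the rollout, and alert.
            rollback(node)
            result.failed_node = node
            result.reason = f"{node} unhealthy after certificate update"
            alert(result.reason)
            return result

        result.updated.append(node)

        # Pause between nodes, but not after the last one.
        if index < len(nodes) - 1:
            wait()

    return result
```

In this sketch, a failure on the second of three nodes leaves the first node on the new certificate, restores the second, and never touches the third, so a bad certificate cannot reach the full cluster.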

Contact
If you have questions regarding this incident, please contact the 5centsms.com.au support team at hello@5centsms.com.au.
Posted Mar 08, 2026 - 06:22 AEST

Update

Queues are returning to normal operation and backlogs are in the process of being cleared. We are monitoring all systems.
Posted Mar 06, 2026 - 16:35 AEST

Monitoring

The Dashboard and API have been restored. Backend services and message queues are expected to return to normal in 10-15 minutes.
Posted Mar 06, 2026 - 16:21 AEST

Identified

A fix has been developed. We are currently deploying the required changes to the affected system.
Posted Mar 06, 2026 - 16:13 AEST

Update

We have identified the cause of the issue and are looking into a fix. We will provide another update within the next 30 minutes.
Posted Mar 06, 2026 - 15:56 AEST

Update

Our team is investigating the cause of the issue, which appears to be a failed backend service.
Posted Mar 06, 2026 - 15:37 AEST

Investigating

We are aware of an issue affecting the dashboard and API, and it is currently being investigated.
Posted Mar 06, 2026 - 15:11 AEST
This incident affected: Dashboard & API.