Multiple issues on Production

Incident Report for ServiceChannel

Postmortem

Incident Report: Cloud Provider issue cause Service Channel Platform Interruption  

 

Date of Incident:                   02/26/2026 

Time/Date Incident Started: 02/26, 9:42 AM EST  

Time/Date Stability Restored: 02/26, 10:06 AM EST 

Time/Date Incident Resolved: 02/26, 10:15 AM EST 

  

Users Impacted: All U.S. 

 Frequency: Continuous 

Impact: Major 

  

Incident description: 

On February 26, 2026, the Service Channel platform experienced multiple service interruptions due to an outage at a cloud hosting provider. The outage impacted several platform services that rely on the provider’s managed infrastructure. 

Our monitoring systems alerted the SRE team at 9:42 AM EST. The cloud hosting provider identified and resolved the underlying issue, after which service stability was restored by 10:06 AM EST. All servers and services were fully verified and confirmed operational by 10:15 AM EST. 

Root Cause Analysis: 

The cloud hosting provider experienced an issue with their managed SQL instances, which disrupted communication between several web applications and the database layer. This resulted in partial service interruptions across the platform. 

The cloud provider implemented a corrective action to resolve the SQL service issue, which restored connectivity and platform functionality. 

 Actions Taken: 

  • Monitoring alerts were received by the SRE team indicating service degradation. 

  • Investigation identified connectivity issues between certain application modules and the database servers. 

  • During incident triage, multiple recovery actions were attempted across impacted components; the issue persisted throughout these efforts 

  • Services returned to normal operation once the cloud provider resolved the SQL issue

Posted Mar 11, 2026 - 20:30 EDT

Resolved

Issues on multiple services
Posted Feb 26, 2026 - 09:45 EST