Code Release Causes US Environment Outage

Incident Report for ServiceChannel

Postmortem

Incident Report: Code Release Causes US Environment Outage 

 

Date of Incident: 05/08/2025

Time/Date Incident Started: 05/08/2025, 2:29 AM EDT

Time/Date Stability Restored: 05/08/2025, 4:07 AM EDT

Time/Date Incident Resolved: 05/08/2025, 4:12 AM EDT

  

Users Impacted: Many 

Frequency: Continuous 

Impact: Major 

  

Incident description: 

During the scheduled US production code release on May 8, 2025, ServiceChannel encountered technical issues that impacted service availability on our platform. Users experienced login difficulties from 2:29 AM to 3:07 AM EDT, while critical dashboard functionality was unavailable from 2:29 AM to 4:12 AM EDT. 

   

Root Cause Analysis: 

Login Module Issue: As part of ongoing enhancements to the deployment process, a configuration adjustment was made that worked correctly in our testing environments but behaved differently in production. The issue was identified through our standard troubleshooting procedures and resolved by rolling back the login module to the prior version.

Dashboard Issue: A configuration setting that was present in our development environments had not been fully synchronized to the production environment. The discrepancy was not detected until the new code attempted to access the setting during the deployment.

Full platform functionality was confirmed restored by 4:12 AM EDT.
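The kind of discrepancy described in the Dashboard Issue can, in principle, be caught before a release by comparing the setting names defined in two environments. The following is a minimal sketch only, not a description of ServiceChannel's tooling: the function name `find_missing_settings`, the setting names, and the values are all hypothetical and chosen purely for illustration.

```python
def find_missing_settings(reference: dict, target: dict) -> list[str]:
    """Return the setting names defined in `reference` but absent from `target`."""
    return sorted(set(reference) - set(target))


# Hypothetical settings; real names and values are not given in this report.
development = {
    "dashboard.widget_source": "v2",
    "dashboard.cache_ttl": "300",
    "login.session_timeout": "900",
}
production = {
    "dashboard.cache_ttl": "300",
    "login.session_timeout": "900",
}

# Settings the new code may expect but production does not yet define.
missing = find_missing_settings(development, production)
print(missing)
```

A check like this only compares names, so it would flag a setting that was never synchronized but would not catch a setting that exists in both environments with different values.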

 

Actions Taken: 

  • The SRE team began investigating immediately after alerts starting at 2:29 AM EDT indicated issues with two critical systems: dashboard and login 

  • The CI/CD team rolled back the login module to the prior version, restoring user access by 3:07 AM EDT 

  • The dashboard continued to experience issues after login was restored, so investigation continued 

  • Dashboard functionality was restored by ensuring all required configuration settings were properly applied to production 

 

Mitigation Measures:    

  • Reviewed and updated existing deployment procedures to include improved configuration validation and rollback protocols, to prevent similar configuration-related issues in the future 

  • Implemented process improvements for immediate communication with support teams following any service disruptions to ensure proper customer follow-up and transparency
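
One way to picture how configuration validation and rollback can fit together in a deployment procedure is a release gate that refuses to deploy when required settings are missing and rolls back when a post-deploy health check fails. This is an illustrative sketch under stated assumptions, not the procedure ServiceChannel adopted: `run_release` and its parameters (`required_settings`, `target_config`, `deploy`, `is_healthy`, `rollback`) are hypothetical names.

```python
from typing import Callable, Iterable, Mapping


def run_release(
    required_settings: Iterable[str],
    target_config: Mapping[str, str],
    deploy: Callable[[], None],
    is_healthy: Callable[[], bool],
    rollback: Callable[[], None],
) -> str:
    """Hypothetical release gate: validate first, deploy, then verify or roll back."""
    # Step 1: abort before touching production if any required setting is absent.
    missing = sorted(k for k in required_settings if k not in target_config)
    if missing:
        return "aborted: missing settings " + ", ".join(missing)

    # Step 2: deploy, then confirm health; roll back on failure.
    deploy()
    if not is_healthy():
        rollback()
        return "rolled back"
    return "released"
```

In this sketch, a missing setting stops the release before any deployment step runs, and a failed health check triggers the rollback automatically.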

Posted May 15, 2025 - 18:02 EDT

Resolved

During the scheduled US production code release on May 8, 2025, ServiceChannel encountered technical issues that impacted service availability on our platform. Users experienced login difficulties from 2:29 AM to 3:07 AM EDT, while critical dashboard functionality was unavailable from 2:29 AM to 4:12 AM EDT.
Posted May 08, 2025 - 02:30 EDT