System Performance Degradation Across Modules - Incident Report
Date of Incident: 05/19/2022
Time/Date Incident Started: 05/19/2022, 08:50 am EDT
Time/Date Stability Restored: 05/19/2022, 09:46 am EDT
Time/Date Incident Resolved: 05/19/2022, 09:58 am EDT
Users Impacted: All users
Frequency: Intermittent
Impact: Major
Incident description:
The ServiceChannel monitoring system detected system performance degradation and increased application latencies. The ServiceChannel Site Reliability Engineering (SRE) team established an investigation. Impacted customers reported general slowness while using the ServiceChannel platform.
After confirming that core system components that would have a cascading adverse effect on many modules if they were in a degraded state were in fact healthy, the SRE team determined that an issue at our cloud provider was likely experiencing unreported performance degradation on their backend.
The SRE team engaged our cloud provider’s support engineers to establish a root cause. During a lengthy exploratory conference bridge, the cloud service provider’s support engineers were able to find evidence of a networking failure within a primary datacenter, consistent with the timeline of the incident.
Root Cause Analysis:
A transient networking failure at our cloud provider’s datacenter caused slowness and degraded performance for end users of the ServiceChannel platform.
Actions Taken:
Mitigation Measures: