Incident Report: Increased Platform Latency and Unresponsive Workorder Reports
Date of Incident: 04/01/2024
Time/Date Incident Started: 04/01/2024, 10:34 am EST
Time/Date Stability Restored: 04/01/2024, 12:45 pm EST
Time/Date Incident Resolved: 04/01/2024, 1:05 pm EST
Users Impacted: All Users
Frequency: Intermittent
Impact: Major
Incident description:
Users experienced sporadic latency and timeouts while using the ServiceChannel Platform, particularly with workorder report services.
Root Cause Analysis:
The automated monitoring systems of the ServiceChannel SRE and DBA teams detected elevated CPU utilization on database read replicas. A subsequent review of the logs showed that the incident coincided with a spike in user traffic. This surge in activity caused extended wait times for certain ServiceChannel services, notably the Excel report services, leading to slower page loads and timeouts.
The SRE team swiftly acted by scaling up our infrastructure resources to accommodate the increased traffic. Following the expansion of capacity, normal system operations resumed.
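
For illustration only, the kind of check that flags sustained high CPU on a read replica can be sketched as follows. This is a minimal sketch and not ServiceChannel's actual monitoring configuration: the function name, the 80% threshold, and the three-sample window are all assumptions chosen for the example.

```python
from typing import Iterable


def sustained_high_cpu(
    samples: Iterable[float],
    threshold: float = 80.0,   # hypothetical alert threshold, in percent
    min_consecutive: int = 3,  # hypothetical number of samples in a row
) -> bool:
    """Return True if CPU stayed above `threshold` for
    `min_consecutive` consecutive samples."""
    streak = 0
    for cpu_percent in samples:
        if cpu_percent > threshold:
            streak += 1
            if streak >= min_consecutive:
                return True
        else:
            # A single sample at or below the threshold resets the streak,
            # so brief spikes do not trigger an alert.
            streak = 0
    return False


if __name__ == "__main__":
    # Hypothetical per-minute CPU readings from one read replica.
    readings = [52.0, 85.5, 91.2, 96.8, 60.3]
    print(sustained_high_cpu(readings))
```

Requiring several consecutive samples above the threshold, rather than a single reading, is one common way to separate a sustained traffic surge from a short-lived spike.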
Actions Taken:
Mitigation Measures: