ServiceChannel System Performance Degradation

Incident Report for ServiceChannel

Postmortem

Dashboard Latency - Incident Report

Date of Incident: 5/16/2024

Time/Date Incident Started: 5/16/2024, 10:18 am EDT

Time/Date Stability Restored: 5/16/2024, 11:37 am EDT

Time/Date Incident Resolved: 5/16/2024, 12:00 pm EDT

 

Users Impacted: Some Users

Frequency: Intermittent

Impact: Minor

 

Incident description:

Some ServiceChannel users who were using the Dashboard experienced slow loading times.

 

Root Cause Analysis:

At approximately 10:18 AM EDT, the ServiceChannel Site Reliability Engineering (SRE) team was alerted to slow response times on the Dashboard that were affecting customer experience. The team investigated and found that one of the ServiceClick application pools was exhibiting unusually high response times. Restarting individual nodes did not resolve the problem, so the team rebooted the entire ServiceClick application pool. This reduced response times and returned services to their normal operating state.

Actions Taken:

  1. Manually tested and reproduced the issue.
  2. Investigated, then restarted the affected nodes that were reporting slow responses; when that did not resolve the issue, rebooted the entire ServiceClick application pool.
  3. Retested to confirm the problem had been resolved and continued to monitor.

Mitigation Measures:

  1. Increased monitoring of Dashboard and ServiceClick application response times.
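
As an illustration only, response-time monitoring of this kind can be sketched as a rolling-window check on the 95th-percentile latency. This is a minimal sketch and not ServiceChannel's actual monitoring: the class name LatencyMonitor, the 2000 ms threshold, the window size, and the minimum sample count are all hypothetical values chosen for the example.

```python
from collections import deque
from statistics import quantiles
from typing import Optional


class LatencyMonitor:
    """Keeps a rolling window of response times and checks the p95 against a threshold.

    All names and default values here are hypothetical, for illustration only.
    """

    def __init__(self, threshold_ms: float, window: int = 100, min_samples: int = 20):
        if min_samples < 2:
            raise ValueError("min_samples must be at least 2 to compute quantiles")
        self.threshold_ms = threshold_ms
        self.min_samples = min_samples
        # deque with maxlen silently drops the oldest sample once the window is full
        self.samples = deque(maxlen=window)

    def record(self, response_ms: float) -> None:
        """Add one observed response time, in milliseconds."""
        self.samples.append(response_ms)

    def p95(self) -> Optional[float]:
        """Return the 95th-percentile response time, or None if there is too little data."""
        if len(self.samples) < self.min_samples:
            return None
        # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile
        return quantiles(self.samples, n=20)[-1]

    def is_degraded(self) -> bool:
        """True when enough samples exist and the p95 exceeds the threshold."""
        value = self.p95()
        return value is not None and value > self.threshold_ms


if __name__ == "__main__":
    # Hypothetical threshold: flag the pool when p95 response time exceeds 2 seconds.
    monitor = LatencyMonitor(threshold_ms=2000, window=50, min_samples=20)
    for _ in range(30):
        monitor.record(300)   # normal responses
    print("degraded after normal traffic:", monitor.is_degraded())
    for _ in range(30):
        monitor.record(5000)  # slow responses, as seen during an incident
    print("degraded after slow traffic:", monitor.is_degraded())
```

Checking a percentile over a window, rather than a single slow request, is one way to avoid alerting on isolated outliers while still catching sustained slowness in an application pool.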
Posted May 28, 2024 - 16:46 EDT

Resolved

This incident has been resolved. All services are working as expected.
Posted May 16, 2024 - 12:11 EDT

Monitoring

System stability has been restored and services are functioning normally. We will continue to monitor closely for any further issues.
Posted May 16, 2024 - 11:55 EDT

Investigating

We are actively investigating degraded system performance. An update will be provided shortly. Thank you for your patience.
Posted May 16, 2024 - 11:11 EDT
This incident affected: Service Automation (Dashboard, Work Order Manager) and Provider Automation (Work Order Manager).