System Performance Degradation
Incident Report for ServiceChannel
Postmortem

Date of Incident: 06/17/2019

Time/Date Incident Started: 06/17/2019, 11:11 am EDT

Time/Date Stability Restored: 06/17/2019, 11:30 pm EDT

Time/Date Incident Resolved: 06/17/2019, 11:40 pm EDT

Users Impacted: Some users

Frequency: Intermittent

Impact: Minor

Incident description:

System performance degraded: some users were unable to log in to the system or experienced errors during the login process.

Root Cause Analysis:

We identified an issue in login session management in the classic ASP code. It triggered a number of cascading failures, which in turn caused timeouts throughout the ServiceChannel platform. It is related to the issue identified on 06/13/2019.

Actions Taken:

Implemented the manual fix identified as a stopgap on 06/13.

Mitigation Measures:

Added monitoring to notify the SRE team when Redis cache hits exceed defined thresholds.

Implemented manual, temporary stopgap measures; a permanent solution is in progress.
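
As an illustration of the Redis cache-hit monitoring described above, the following is a minimal sketch only, not the monitoring actually deployed. It assumes the check compares the per-second rate of the cumulative `keyspace_hits` counter (as reported by Redis `INFO stats`) against a limit. The names `Sample`, `hits_per_second`, and `check_threshold`, and the threshold value in the usage example, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """One reading of the cumulative hit counter (hypothetical structure)."""
    timestamp: float    # seconds since epoch when the counter was read
    keyspace_hits: int  # cumulative hits, as in the Redis INFO stats field


def hits_per_second(previous: Sample, current: Sample) -> float:
    """Convert two cumulative counter readings into a per-second rate."""
    elapsed = current.timestamp - previous.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = current.keyspace_hits - previous.keyspace_hits
    # A negative delta means the counter was reset (for example, a server
    # restart); treat the current value as the count since the reset.
    if delta < 0:
        delta = current.keyspace_hits
    return delta / elapsed


def check_threshold(previous: Sample, current: Sample,
                    max_hits_per_second: float) -> Optional[str]:
    """Return an alert message if the hit rate exceeds the limit, else None."""
    rate = hits_per_second(previous, current)
    if rate > max_hits_per_second:
        return (f"ALERT: Redis cache hits at {rate:.0f}/s "
                f"exceed threshold {max_hits_per_second:.0f}/s")
    return None


if __name__ == "__main__":
    # Assumed threshold of 5000 hits/s, chosen only for illustration.
    earlier = Sample(timestamp=0.0, keyspace_hits=1_000)
    later = Sample(timestamp=10.0, keyspace_hits=61_000)
    print(check_threshold(earlier, later, max_hits_per_second=5000.0))
```

In practice the two samples would come from periodic polling of the cache, and the returned message would be routed to whatever paging or notification channel the SRE team uses.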

Posted Oct 09, 2019 - 14:32 EDT

Resolved
We are currently investigating degraded system performance. We will provide an update shortly. Thank you for your patience.

...

Our engineering team has identified the issue and services are returning to normal. We are continuing to monitor. Thank you for your patience.

...

All services are confirmed running as expected. We consider this incident to be resolved.
Posted Jun 17, 2019 - 11:11 EDT