System Performance Degradation
Incident Report for ServiceChannel
Postmortem

Date of Incident: 06/13/2019

Time/Date Incident Started: 06/13/2019, 10:35 am EDT

Time/Date Stability Restored: 06/13/2019, 1:12 pm EDT

Time/Date Incident Resolved: 06/13/2019, 1:18 pm EDT

Users Impacted: Active users

Frequency: Intermittent

Impact: Major

Incident description:

System performance degradation in which users were unable to log in to the system or experienced errors during the login process.

Root Cause Analysis:

We identified an issue with login session management in the classic ASP code. This issue caused a series of cascading failures, which in turn produced timeouts throughout the ServiceChannel platform.

Actions Taken:

Reverted the code from the previous release

Restarted the Redis cluster

Mitigation Measures:

Added monitoring to notify the SRE team when Redis cache hits exceed defined thresholds.

Implemented temporary manual stopgap measures; a permanent solution is in progress.
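
For illustration only, a threshold check of the kind described in the monitoring measure above might be sketched as follows. This is not the monitoring that was deployed. It assumes the alert is driven by the rate of change of the cumulative `keyspace_hits` counter that Redis reports in its INFO stats. The names `StatsSnapshot`, `hits_per_second`, and `should_alert`, and the threshold values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StatsSnapshot:
    """One sample of Redis INFO stats (hypothetical container type)."""
    timestamp: float      # sample time in seconds
    keyspace_hits: int    # cumulative hit counter from Redis INFO


def hits_per_second(prev: StatsSnapshot, curr: StatsSnapshot) -> float:
    """Return the cache-hit rate between two samples of the counter."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("snapshots must be in increasing time order")
    delta = curr.keyspace_hits - prev.keyspace_hits
    # A negative delta means the counter was reset (for example, after a
    # server restart); count only the hits recorded since the reset.
    if delta < 0:
        delta = curr.keyspace_hits
    return delta / elapsed


def should_alert(prev: StatsSnapshot, curr: StatsSnapshot,
                 threshold_per_second: float) -> bool:
    """True when the hit rate is strictly above the assumed threshold."""
    return hits_per_second(prev, curr) > threshold_per_second
```

In practice the two snapshots would come from periodic INFO polls, and a positive result would page the SRE team. Handling the counter reset matters here because restarting the Redis cluster was one of the actions taken during this incident.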

Posted Oct 09, 2019 - 14:31 EDT

Resolved
We are currently investigating degraded system performance. We will provide an update shortly. Thank you for your patience.

...

We are continuing to investigate this issue. Thank you for your patience.

...

All services are confirmed running as expected. We consider this incident to be resolved.
Posted Jun 13, 2019 - 10:35 EDT