Date of Incident: 06/17/2019
Time/Date Incident Started: 06/17/2019, 11:11 am EST
Time/Date Stability Restored: 06/17/2019, 11:30 pm EST
Time/Date Incident Resolved: 06/17/2019, 11:40 pm EST
Users Impacted: Some users
Frequency: Intermittent
Impact: Minor
Incident description:
System performance degradation: users were unable to log in to the system or experienced errors during the login process.
Root Cause Analysis:
We identified an issue with login session management in Classic ASP code. It caused a series of cascading failures, which in turn produced timeouts throughout the ServiceChannel platform. This issue is related to the one identified on 06/13/2019.
Actions Taken:
Implemented the manual stopgap fix identified on 06/13/2019.
Mitigation Measures:
Added monitoring that notifies the SRE team when Redis Cache hits exceed defined thresholds.
Implemented temporary manual stopgap measures; work on a permanent solution is in progress.
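For illustration only, a threshold check like the Redis Cache hit alert could be sketched as follows. This is not the monitoring that was deployed. It assumes the threshold applies to the hit rate per second, derived from two samples of the cumulative `keyspace_hits` counter that Redis reports in the `stats` section of `INFO`. The names `HitSample`, `hits_per_second`, `should_alert` and the value of `HITS_PER_SECOND_THRESHOLD` are hypothetical; the actual thresholds are not stated in this report.

```python
from dataclasses import dataclass

# Hypothetical threshold: the real "defined thresholds" are not given in this report.
HITS_PER_SECOND_THRESHOLD = 5000.0


@dataclass
class HitSample:
    timestamp: float     # seconds since epoch when the INFO stats were read
    keyspace_hits: int   # cumulative keyspace_hits counter from Redis INFO "stats"


def hits_per_second(prev: HitSample, curr: HitSample) -> float:
    """Average cache hit rate between two samples of the cumulative counter."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr.keyspace_hits - prev.keyspace_hits
    if delta < 0:
        # Counter went backwards (e.g. server restart or CONFIG RESETSTAT);
        # approximate by counting only the hits recorded since the reset.
        delta = curr.keyspace_hits
    return delta / elapsed


def should_alert(prev: HitSample, curr: HitSample,
                 threshold: float = HITS_PER_SECOND_THRESHOLD) -> bool:
    """True when the observed hit rate is over the (assumed) threshold."""
    return hits_per_second(prev, curr) > threshold


if __name__ == "__main__":
    before = HitSample(timestamp=0.0, keyspace_hits=1_000)
    after = HitSample(timestamp=10.0, keyspace_hits=61_000)
    print(hits_per_second(before, after))  # 6000.0
    print(should_alert(before, after))     # True
```

With the `redis-py` client, each sample could be taken from `client.info("stats")["keyspace_hits"]` on a fixed polling interval; comparing a rate rather than the raw counter avoids alerts that fire simply because the cumulative total keeps growing.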