Date of Incident: 07/26/2019
Time/Date Incident Started: 07/26/2019, 09:53 am EDT
Time/Date Stability Restored: 07/26/2019, 10:35 am EDT
Time/Date Incident Resolved: 07/26/2019, 11:29 am EDT
Users Impacted: Some active users
System performance degradation resulting in slow login, dashboards, and work order lists.
Root Cause Analysis:
An intermittent failure on one read replica in our production database cluster delayed certain application requests, intermittently degrading performance for login, dashboards, and work order lists. Stability was restored by 10:35 am EDT, and the incident was declared resolved at 11:29 am EDT.
After the affected node was replaced, some users briefly experienced degraded performance while database indexes were rebuilt on the new node. The index rebuild completed at 12:30 pm EDT.
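
For reference, the elapsed times implied by the timestamps in this report can be worked out with a short script. This is only an illustrative sketch: the function name `minutes_between` and the timestamp format string are assumptions made for the example and do not describe any tooling used during the incident. All times are EDT on the same day, so no time zone conversion is applied.

```python
from datetime import datetime

# Assumed timestamp layout for this example: MM/DD/YYYY HH:MM AM/PM
FMT = "%m/%d/%Y %I:%M %p"

def minutes_between(start: str, end: str) -> int:
    """Return whole minutes elapsed between two same-zone timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)

# Timestamps taken from this report (all EDT).
STARTED = "07/26/2019 09:53 AM"
STABLE = "07/26/2019 10:35 AM"
RESOLVED = "07/26/2019 11:29 AM"
INDEX_REBUILT = "07/26/2019 12:30 PM"

print(minutes_between(STARTED, STABLE))         # 42  (time to stability)
print(minutes_between(STARTED, RESOLVED))       # 96  (time to resolution)
print(minutes_between(STARTED, INDEX_REBUILT))  # 157 (until index rebuild completed)
```

By this calculation, stability was restored 42 minutes after the incident started, and the incident was declared resolved after 1 hour 36 minutes.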