Infrastructure/Hardware Instability
Incident Report
Date of Incident:09/05/2023
Time/Date Incident Started: 09/05/2023, 09:15 am EDT
Time/Date Stability Restored:09/05/2023, 10:19 am EDT
Time/Date Incident Resolved:09/05/2023, 10:25 am EDT
Users Impacted: All
Frequency: Intermittent
Impact: Major
Incident description:
Third party vendor infrastructure/hardware instability
Root Cause Analysis:
A third-party vendor infrastructure issue affected performance and system availability for the underlying data storage layer that services platform resources.
Actions Taken:
Investigated system-generated alerts and identified affected platform functionality.
SRE and DBA teams initiated a platform infrastructure redeployment, forcing the new infrastructure to be spun up on unaffected infrastructure/hardware.
Mitigation Measures:
Continue the ongoing investigation into root causes of infrastructure issues within our cloud hosting provider.
Continue to implement high availability improvements to prepare the platform to respond better to unexpected hardware issues that are beyond our control.