Date of Incident: 02/16/2021
Time/Date Incident Started: 02/16/2021, 05:09pm EDT
Time/Date Stability Restored: 02/16/2021, 06:06pm EDT
Time/Date Incident Resolved: 02/16/2021, 08:08pm EDT
Users Impacted: All API Users
Frequency: Sustained
Impact: Outage
Incident description:
ServiceChannel API failure impacting a broad range of API endpoints.
Root Cause Analysis:
Due to an operational oversight, a system service account used by many API endpoints was accidentally deleted.
While testing permissions for a new production service account, a member of the ServiceChannel SRE team used an existing production API service account as a reference for the roles and permissions of the new test production service account.
Once testing had concluded, the engineer deleted the test production service account without realizing that the existing production API service account was also selected for deletion. The test and production accounts were deleted together.
Without the production API service account, many API calls were unable to authenticate against ServiceChannel resources, causing a broad range of API requests to fail. Because API calls are used by both customers and the ServiceChannel platform itself, the failure manifested as a wide range of system issues that all shared the same root cause.
Only a small number of people within the ServiceChannel SRE team are granted permissions to create or delete system accounts like the one identified in this incident.
Actions Taken:
Mitigation Measures: