Date/Time Incident Started
Feb 24, 2021, 4:30 PM
Date/Time Stability Restored
Feb 24, 2021, 9:30 PM
Date/Time Incident Resolved
Feb 24, 2021, 10:00 PM
Users Impacted
Few
Frequency
Continuous
Impact
Minor
Description
Degraded performance while processing Notification messages, resulting in delayed processing of automatic assignments, universal connector integrations, and webhooks.
Root Cause Analysis
Around 4:40 PM EST, our support team submitted a template on behalf of a ServiceChannel client. Processing this template caused a large number of messages to be queued. Because of the resulting backlog, issues submitted to the auto-assignment process, universal connector, and webhooks were delayed until the queue could be worked through.
Actions Taken
- Identified a large template that had been submitted on behalf of a client
- Observed a large number of messages queued for processing (>300K)
- Restarted application services that appeared to have stalled, to increase queue processing throughput
- Adjusted instance counts for worker processes to improve queue processing performance
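To illustrate why a single large submission delayed unrelated work, and why adding worker instances helped, here is a rough back-of-the-envelope model. It assumes one shared first-in, first-out queue drained at a constant aggregate rate; the function name and the worker counts and rates in the usage example are hypothetical and are not measurements from this incident.

```python
def wait_time_seconds(backlog: int, workers: int, per_worker_rate: float) -> float:
    """Approximate time before a message enqueued behind `backlog` messages
    is picked up, assuming a single FIFO queue drained at a constant rate.

    backlog         -- number of messages already ahead in the queue
    workers         -- number of worker instances consuming the queue
    per_worker_rate -- messages each worker processes per second (hypothetical)
    """
    if workers <= 0 or per_worker_rate <= 0:
        raise ValueError("workers and per_worker_rate must be positive")
    # Aggregate throughput grows linearly with worker count in this simple model.
    return backlog / (workers * per_worker_rate)


# Hypothetical figures: 300,000 queued messages, 50 messages/second per worker.
print(wait_time_seconds(300_000, 10, 50.0))  # 600.0 seconds with 10 workers
print(wait_time_seconds(300_000, 20, 50.0))  # 300.0 seconds with 20 workers
```

In this simplified model, doubling the worker count halves the wait for a message queued behind the backlog. Real queues rarely scale perfectly linearly, and a stalled worker contributes nothing to throughput, which is consistent with restarting stalled services as well as adjusting instance counts.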
Mitigation Measures
- Implement functionality to better handle large templates with asynchronous, stepwise processing
- Investigate functionality to ignore messages without subscriptions
- Review system monitors to improve detection of stalled queue worker processes
- Improve queue processing performance through worker process optimizations
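The following is a minimal sketch of three of the mitigation ideas above: enqueuing a large template's messages in fixed-size steps, skipping messages whose event type has no subscribers, and flagging workers whose heartbeat has gone quiet. It is illustrative only. `Message`, `enqueue_template_stepwise`, `stalled_workers`, the subscription map, and the heartbeat timestamps are all hypothetical and do not describe ServiceChannel's actual implementation. For brevity the steps run synchronously; a real system would presumably schedule each step as a separate background job.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List, Set, TypeVar

T = TypeVar("T")


@dataclass
class Message:
    """Hypothetical notification message."""
    event_type: str
    payload: dict = field(default_factory=dict)


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def enqueue_template_stepwise(
    messages: Iterable[Message],
    subscriptions: Dict[str, Set[str]],
    enqueue: Callable[[List[Message]], None],
    batch_size: int = 1000,
) -> int:
    """Enqueue messages in fixed-size steps, dropping any message whose
    event type has no subscribers. Returns the number of messages enqueued."""
    enqueued = 0
    for batch in batched(messages, batch_size):
        # A missing key or an empty subscriber set means nobody consumes it.
        wanted = [m for m in batch if subscriptions.get(m.event_type)]
        if wanted:
            enqueue(wanted)
            enqueued += len(wanted)
    return enqueued


def stalled_workers(last_heartbeat: Dict[str, float], now: float, timeout_s: float) -> List[str]:
    """Return workers whose last heartbeat is older than `timeout_s` seconds."""
    return sorted(w for w, ts in last_heartbeat.items() if now - ts > timeout_s)


# Usage with hypothetical data.
queue: List[List[Message]] = []
msgs = [Message("assignment") for _ in range(5)] + [Message("unused") for _ in range(3)]
count = enqueue_template_stepwise(msgs, {"assignment": {"webhook-1"}}, queue.append, batch_size=2)
print(count, [len(b) for b in queue])  # 5 [2, 2, 1]
print(stalled_workers({"w1": 100.0, "w2": 40.0}, now=110.0, timeout_s=30.0))  # ['w2']
```

Filtering before enqueuing keeps messages that no subscriber will consume out of the queue entirely. Bounded steps limit how much work a single submission can add to the queue at once.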