ServiceChannel performance degradation
Incident Report for ServiceChannel
Postmortem

Date/Time Incident Started

Feb 24, 2021, 4:30 PM

Date/Time Stability Restored

Feb 24, 2021, 9:30 PM

Date/Time Incident Resolved

Feb 24, 2021, 10:00 PM

Users Impacted

Few

Frequency

Continuous

Impact

Minor

Description

Degraded performance while processing notification messages, resulting in delayed processing of automatic assignments, universal connector integrations, and webhooks.

Root Cause Analysis

Around 4:40 PM EST, our support team submitted a template on behalf of a ServiceChannel client. This caused a large number of messages to be queued for processing. Because of the resulting backlog, issues submitted to the auto-assignment process, universal connector, and webhooks were delayed until the queues could be processed.

Actions Taken

  1. Identified a large template that was submitted on behalf of a client
  2. Observed a large number of messages queued for processing (>300K)
  3. Restarted application services that appeared to have stalled, to increase queue processing throughput
  4. Adjusted instance counts for worker processes to improve queue processing performance

Mitigation Measures

  1. Implement functionality to better handle large templates with asynchronous, stepwise processing
  2. Investigate functionality to ignore messages without subscriptions
  3. Review system monitors to improve detection of stalled queue worker processes
  4. Improve queue processing performance through worker process optimizations
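
As an illustration only, the first three measures could be sketched roughly as below. This is not ServiceChannel's implementation: the function names, the message shape, the batch size, and the heartbeat timeout are all assumptions made for the example. It shows a large submission being enqueued in small batches, messages without subscribers being dropped before they reach the queue, and workers being flagged as stalled when their last heartbeat is too old. The batching here is synchronous; a real system would likely hand each batch to a background job so other traffic can be processed in between.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def batches(items: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def enqueue_stepwise(
    messages: Iterable[dict],
    has_subscribers: Callable[[dict], bool],  # hypothetical subscription lookup
    enqueue: Callable[[List[dict]], None],    # hypothetical queue client call
    batch_size: int = 1000,                   # assumed value, tune per system
) -> Tuple[int, int]:
    """Enqueue messages batch by batch, skipping ones nobody subscribes to.

    Returns a tuple (enqueued_count, skipped_count).
    """
    enqueued = skipped = 0
    for batch in batches(messages, batch_size):
        wanted = [m for m in batch if has_subscribers(m)]
        skipped += len(batch) - len(wanted)
        if wanted:
            enqueue(wanted)
            enqueued += len(wanted)
    return enqueued, skipped


def stalled_workers(
    last_heartbeat: Dict[str, float],  # worker id -> last heartbeat (seconds)
    now: float,
    timeout_s: float = 300.0,          # assumed threshold
) -> List[str]:
    """Return ids of workers whose last heartbeat is older than `timeout_s`."""
    return sorted(w for w, ts in last_heartbeat.items() if now - ts > timeout_s)
```

Filtering before enqueueing keeps messages that no one consumes out of the backlog entirely, and the heartbeat check gives a monitor a simple signal for restarting workers before the queue grows.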
Posted Mar 24, 2022 - 09:02 EDT

Resolved
The ServiceChannel platform is no longer experiencing delays in any of our processes.
All systems have returned to normal. We consider this issue resolved.
Posted Feb 24, 2021 - 22:02 EST
Monitoring
All delayed queues have caught up and are processing as expected. We are now monitoring the system to ensure stabilization.
Posted Feb 24, 2021 - 21:51 EST
Identified
The ServiceChannel platform is currently experiencing some delays, affecting the following processes:

* auto assignment processes
* universal connector
* webhooks

Queues are running and events are being processed. We will provide an update shortly about when we expect the processing backlog to be completed.
Posted Feb 24, 2021 - 20:05 EST
This incident affected: API (API Response).