Campaign Automation Participants Delayed in US Data Center
Incident Report for ClickDimensions
Resolved
This incident has been resolved.

At the end of March, ClickDimensions released a code change for an A/B testing use case in which some participants remained in “running” status even though they had completed an automation. The release showed no issues when first deployed to production. However, after about a month in production, the Azure services behind the ClickDimensions campaign automation backend began having scaling issues: Azure nodes were dropping out of service. This resulted in Campaign Automation delays. We immediately escalated the issue to Microsoft, which was unable to determine a root cause. At the time, an Azure upgrade was assumed to be the likely cause. As the issue persisted, and out of an abundance of caution, ClickDimensions decided to back out the 2023.03 code changes. After the code was backed out and the services were restarted, we no longer experienced the scaling issue. Microsoft is still unable to explain why only one ClickDimensions Azure region was affected or why it took almost a month for the scaling issues to manifest. However, all campaign automation delays have now been remediated. We will continue to work internally, and with Microsoft, to review the offending code and determine why it caused the Azure scaling issues.

Regarding the IP changes mentioned in earlier status updates: customers do not need to make these changes at this time, as the issue was resolved without implementing a new IP address.
Posted Jun 13, 2023 - 12:07 UTC
Update
Our team has been able to implement a fix and we are seeing improved performance.

At this time we are not seeing delays in progression and will continue to monitor for a few days. We will provide a root cause in the coming days.
Posted Jun 08, 2023 - 12:26 UTC
Update
For customers on our US Data Center: a new Campaign Automation IP address range has been created and needs to be whitelisted on any relevant firewalls. The new range is 20.98.25.64/28 (20.98.25.64 / 255.255.255.240).
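
For anyone translating the range above into firewall rules, the following is a minimal sketch, using only Python's standard ipaddress module, of what the /28 notation covers. The names CAMPAIGN_AUTOMATION_RANGE and is_in_campaign_automation_range are hypothetical and exist only for this illustration; they are not part of any ClickDimensions tooling.

```python
import ipaddress

# The range quoted in the update above (hypothetical constant name).
CAMPAIGN_AUTOMATION_RANGE = ipaddress.ip_network("20.98.25.64/28")

# A /28 prefix leaves 4 host bits, so the block holds 2**4 = 16 addresses.
netmask = str(CAMPAIGN_AUTOMATION_RANGE.netmask)
address_count = CAMPAIGN_AUTOMATION_RANGE.num_addresses
first_address = str(CAMPAIGN_AUTOMATION_RANGE[0])
last_address = str(CAMPAIGN_AUTOMATION_RANGE[-1])


def is_in_campaign_automation_range(address: str) -> bool:
    """Return True if the given IPv4 address falls inside the /28 block."""
    return ipaddress.ip_address(address) in CAMPAIGN_AUTOMATION_RANGE


if __name__ == "__main__":
    print(netmask, address_count, first_address, last_address)
    print(is_in_campaign_automation_range("20.98.25.70"))
    print(is_in_campaign_automation_range("20.98.25.80"))
```

The block spans 20.98.25.64 through 20.98.25.79, so a rule written for the single address 20.98.25.64 would not cover the whole range.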

EU Data Center customers: no IP changes at this time.

We continue to work with Microsoft and will share updates here.
Posted Jun 05, 2023 - 12:00 UTC
Monitoring
Our team is continuing to work with Microsoft, our cloud services provider, on a long-term fix for this issue. We have a critical severity escalation with their App Service team at this time.

You should not experience any data loss or participant failures due to this issue. If you are seeing participant delays of more than a few hours at each node, please alert the support team so they can investigate further.
Posted Jun 01, 2023 - 19:31 UTC
Investigating
Our technology team has narrowed the issue down to our Microsoft services. We have a critical severity escalation with their App Service team at this time and continue to work hard to expedite the resolution of this issue.
Posted May 25, 2023 - 12:31 UTC
Update
Our team continues to investigate this issue. We are seeing intermittent delays in the processing of Campaign Automations, primarily in our US data center.
Posted May 23, 2023 - 14:41 UTC
Update
We have implemented a few measures to mitigate the issue while working on the long-term fix and we are monitoring the results. If you are observing any delays with Campaign Automations, please contact our Support Team for further troubleshooting and validation.
Posted May 11, 2023 - 12:41 UTC
Monitoring
A fix has been implemented to mitigate the issue and we are monitoring the results.
Posted May 04, 2023 - 18:48 UTC
Investigating
We are currently investigating a recurring issue causing delays for participants in Campaign Automations in the US and EU data centers.
Posted May 04, 2023 - 15:27 UTC
This incident affected: US Data Center (Campaign Automations) and EU Data Center (Campaign Automations).