US - Authentication Errors due to Networking Issue
Incident Report for ClickDimensions
Postmortem

Issue Description: Users were unable to sign in to Click services due to an authentication outage.

Timeline: An upgrade of a critical piece of infrastructure, run in an approved maintenance window, completed successfully on Wednesday at 8am Irish time. During post-migration testing, an issue relating to the authentication service was identified on the migrated infrastructure, and a High Severity ticket was raised with our vendor. Our vendor engaged with us within 20 minutes, and a period of interactive investigation followed. At 12pm Irish time, the issue was identified as a configuration fault on the vendor's side and was escalated to the vendor product team for resolution. At 1:05pm, the vendor product team implemented a fix by correcting an infrastructure DNS misconfiguration, and all Click services were restored by 1:30pm.

RCA: The authentication service was successfully migrated to new infrastructure, but the automated migration processes (managed by our vendor) did not complete the DNS updates correctly. Our vendor corrected this misconfiguration within an hour of identifying the fault.

Mitigation: The migration activity was a one-time operation needed to move Click services onto the latest supported platforms. It was fully vetted by our vendor prior to the maintenance window, so no issue was expected. No further migration activity is required, so there is no risk of a repeat of this incident.

Posted Aug 14, 2024 - 14:05 UTC

Resolved
This incident has been resolved.
Posted Aug 07, 2024 - 13:40 UTC
Monitoring
A fix has been implemented and we are monitoring the results. Services have been restarted and may take up to 30 minutes to stabilize. We recommend that customers clear their cookies and cache if authentication errors persist.
Posted Aug 07, 2024 - 12:24 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted Aug 07, 2024 - 12:16 UTC
Update
We are continuing to investigate this issue.
Posted Aug 07, 2024 - 12:10 UTC
Investigating
We are currently investigating a networking issue that is causing authentication errors throughout our product. We are working with Microsoft on a resolution and will share updates here as they become available.
Posted Aug 07, 2024 - 11:41 UTC
This incident affected: US Data Center (Campaign Automations, CRM SDK / Connectivity, Email, Email Statistics, Eventbrite Connector, Forms, GoToWebinar Connector, Image Manager, Import Tool, Landing Pages, Lead Scoring, Marketing Calendar, Service Bus Relay Connector, SMS, Social Posting, Solution, Subscription Management, Surveys, Web Tracking).