Increased errors in login, regionalization and Session Manager
Updates

Write-up published

Resolved

This incident has been resolved. We will provide further information in a public incident report.

Summary

On Mar 17, 2024, from 10:30 to 15:52 UTC, stores experienced an increase in errors in Session Manager and VTEX ID. This was caused by a failure in the SSL certificate renewal of an internal dependency of these systems.

Our global sales flow was unaffected during this incident. However, stores that depend on login, regionalization or Session Manager to complete purchases may have experienced significant impact.

We apologize for any inconvenience this may have caused.

Timeline

At 10:30 UTC, the error rate for an internal dependency of Session Manager started increasing. Later analysis indicated that this was triggered by a failure in an SSL certificate renewal.

At 10:50 UTC, an on-call engineer was notified of an increase in errors in Session Manager. Our monitoring systems indicated that global sessions and orders were within forecasted levels. The impact on a subset of our stores was unclear at this point.

At 13:27 UTC, we received reports from stores experiencing errors in login, regionalization or customizations that depend on Session Manager.

At 13:44 UTC, our incident response team was notified of the issue and started investigating.

At 14:40 UTC, new logs were added to Session Manager to aid in the investigation, which had until then been inconclusive.

At 14:53 UTC, our team identified that the issue was triggered by errors thrown by an internal dependency.

At 15:36 UTC, our team started deploying a fix to the internal dependency and errors started to decrease.

At 15:52 UTC, our team completed deploying the fix and the incident was fully mitigated.
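As an illustration of the kind of check that can surface a stalled certificate renewal before errors appear, here is a minimal Python sketch. It rests on an assumption this report does not state: that the failed renewal would have been visible as a served certificate approaching its expiry date. The function names, the 14-day threshold and the sample date are hypothetical and are not part of our tooling.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the served certificate's notAfter field, e.g. 'Mar 17 10:30:00 2024 GMT'.

    With the default context, the handshake raises ssl.SSLCertVerificationError
    if the certificate has already expired, which is itself an alert condition.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` (timezone-aware) and the certificate's notAfter time."""
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch, UTC
    return (expires_at - now.timestamp()) / 86400


def needs_attention(not_after: str, now: datetime, warn_days: float = 14.0) -> bool:
    """True when the certificate expires in fewer than `warn_days` days.

    The 14-day default is an illustrative threshold, not a recommendation.
    """
    return days_until_expiry(not_after, now) < warn_days


# Hypothetical sample value; no network access is needed for this part.
sample_not_after = "Mar 17 10:30:00 2024 GMT"
one_week_before = datetime(2024, 3, 10, 10, 30, tzinfo=timezone.utc)
print(needs_attention(sample_not_after, one_week_before))
```

Run on a schedule against each internal endpoint, a check like this alerts while the certificate is still valid, so a renewal that fails silently is noticed before it causes errors.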

Sun, Mar 17, 2024, 04:35 PM

Monitoring

The fix for the issue in Session Manager has been implemented.

Stores should no longer be experiencing increased errors in login, regionalization, B2B and assisted sales flows.

Our incident response team is monitoring the platform to confirm that normal behavior is fully reestablished.

We will send an additional update in the next 30 minutes, or as soon as we have more information to share.

Sun, Mar 17, 2024, 03:57 PM

Identified

We have identified the contributing factors of the issue in Session Manager.

Our monitoring systems indicate that orders and sessions are within forecasted levels for most stores, but stores that depend on login, regionalization or Session Manager to complete purchases may experience significant impact. This is the case for some stores using B2B and telesales features.

Our incident response team estimates that the fix will be implemented in the next 30 minutes.

We will send an additional update in the next 30 minutes, or as soon as we have more information to share.

Sun, Mar 17, 2024, 03:36 PM

Investigating

We are currently investigating an issue in Session Manager.

Shoppers in affected accounts may be experiencing shopping cart instability and regionalization errors.

Our incident response team is working to identify the root cause and implement a solution.

We will send an additional update in the next 30 minutes, or as soon as we have more information to share.

Sun, Mar 17, 2024, 02:41 PM