Elevated Errors in the Platform
Resolved
Lasted for 1h

Between 16:56 and 17:12 UTC, our shoppers and merchants experienced a major outage in the platform caused by CPU saturation in the License Manager module.

At 16:56 UTC, CPU usage in the License Manager module rose abnormally, exceeding normal levels. At 16:58 UTC, our alarms were triggered and the platform experienced a major outage affecting both the Sales flow and the Administrative environment.

At 17:03 UTC, we increased the minimum number of nodes for the License Manager module to relieve the strain on the system and restore stability. By 17:07 UTC, the License Manager's CPU usage had returned to normal levels.

The platform's metrics began returning to nominal levels at 17:10 UTC. By 17:12 UTC, the platform had fully recovered, and we kept the License Manager and its dependencies under ongoing monitoring to prevent further incidents.

Tue, Jul 11, 2023, 06:18 PM
Affected components

No components marked as affected

Updates

Resolved

Tue, Jul 11, 2023, 06:18 PM

Monitoring

We continue to monitor the platform for further issues. The user experience on the platform is back to normal.

Tue, Jul 11, 2023, 05:37 PM

Monitoring

The VTEX Sales flow and Administrative environment are back to normal.

We experienced a major degradation in our License Manager module that impacted the Sales flow and the Administrative environment. Our team quickly identified and fixed the issue. We are now monitoring the environment.

Tue, Jul 11, 2023, 05:28 PM

Identified

We are now recovering from the increased error rates. We continue to work towards full recovery.

Tue, Jul 11, 2023, 05:18 PM

Investigating

We are investigating increased error rates in our platform.

Tue, Jul 11, 2023, 05:09 PM