Elevated Errors with administrative environment

Resolved

Lasted for 4d

We’ve published a write-up of this incident.

Read it here

Affected components

No components marked as affected

Updates

Write-up published

Read it here

Resolved

We deployed a new version of a proxy system in the VTEX IO platform, which caused the admin environment to go down. We rolled the deploy back within the first 10 minutes, but the rollback did not fix the problem. While we fixed production, we tried switching traffic to our staging environment, which all VTEX employees use to test new versions. This did not produce the outcome we expected: the staging environment is less resilient and also went down under the sudden load increase. The switch did, however, relieve the load on the production environment, allowing it to recover, and the admin started working again when we switched traffic back.

Tue, Aug 7, 2018, 07:16 PM

4d earlier...

Resolved

Between 16:41 and 18:02 UTC-3 (Brasilia), we experienced elevated error rates in our administrative environment. We will work to avoid such issues in the future. The service is now operating normally.

Thu, Aug 2, 2018, 09:41 PM

35m earlier...

Monitoring

We can confirm an improvement in the elevated error rates in our platform. We are monitoring the result of our actions.

Thu, Aug 2, 2018, 09:05 PM

Identified

We are continuing to work towards full resolution of this issue.

Thu, Aug 2, 2018, 08:58 PM

24m earlier...

Identified

We are now recovering from the increased error rates. We continue to work towards full recovery.

Thu, Aug 2, 2018, 08:33 PM

21m earlier...

Investigating

We are continuing to investigate increased error rates in some parts of our platform.

Thu, Aug 2, 2018, 08:12 PM

30m earlier...

Investigating

We are investigating increased error rates with our administrative environment ("myvtex.com").

Thu, Aug 2, 2018, 07:41 PM