No components marked as affected
Write-up published
Resolved
We deployed a new version of a proxy system in the VTEX IO platform, which caused the admin environment to go down. We rolled the deploy back within the first 10 minutes, but that did not fix the problem. While we worked on the production environment, we tried switching traffic to our staging environment, which all VTEX employees use to test new versions. This did not produce the outcome we expected: the staging environment is less resilient and also went down under the sudden increase in load. The switch did, however, relieve the load on the production environment, which allowed it to recover, and the admin started working again when we switched traffic back.
Resolved
Between 16:41 and 18:02 UTC-3 (Brasília), we experienced elevated error rates in our administrative environment. We will work to avoid these issues in the future. The service is now operating normally.
Monitoring
We can confirm that the elevated error rates on our platform are improving. We are monitoring the results of our actions.
Identified
We are continuing to work towards full resolution of this issue.
Identified
We are now recovering from the increased error rates. We continue to work towards full recovery.
Investigating
We are continuing to investigate increased error rates in some parts of our platform.
Investigating
We are investigating increased error rates in our administrative environment ("myvtex.com").