At 03:14 PM UTC-3, we lost one of the Search engine nodes used by our Logistics service and immediately started the recovery process, which finished at 03:33 PM UTC-3 with full recovery of the service.
At 05:27 PM UTC-3, however, we lost another node in the same cluster. We started the recovery process again, and the service was fully recovered at 05:51 PM UTC-3.
During the recovery period, we experienced elevated error rates on our platform. This incident had a partial impact on sales, mainly affecting our Logistics service, which is part of the critical path of Checkout.
The service is now operating normally. We have identified the cause of the issue, and our engineering team is testing measures to prevent it from recurring.
Apr 2, 19:55 GMT-03:00
We can confirm an improvement in the elevated error rates in our platform. We are monitoring the result of our actions.
Apr 2, 18:36 GMT-03:00
We are now recovering from the increased error rates. We continue to work towards full recovery.
Apr 2, 18:34 GMT-03:00
Since 05:27 PM (UTC-3), we have been experiencing high error rates in our Logistics service, which is responsible for calculating freight.
We have already identified the root cause and are working to resolve the issue.
Apr 2, 18:22 GMT-03:00