Degraded Performance on MasterData Search
Incident Report for VTEX
Postmortem
Posted Aug 17, 2023 - 13:32 UTC

Resolved
On August 3rd, from 11:50 UTC until 00:27 UTC on August 4th, an internal component of our Master Data was partially degraded.

We want to assure you that our global sales flow remained unaffected during this incident. However, customers who rely on Master Data to customize their sales flow were affected.

We apologize for any inconvenience this may have caused.


Timeline:

At 11:50 UTC, we identified that one cluster had stopped responding. Our engineering and support teams promptly engaged and began the restoration work for the affected cluster.

At 12:58 UTC, we paused the indexing of MasterData.

At 15:09 UTC, our engineering team initiated the Disaster Recovery process for a new cluster and started the reindexing of the entire cluster backup.

At 22:31 UTC, we successfully restored the original cluster, reinstating the service for all our customers.

At 22:40 UTC, we began the indexing process for the paused data and closely monitored the reestablishment of all affected clients.

At 00:27 UTC on August 4th, processing of the entire queue was normalized.
Posted Aug 04, 2023 - 00:50 UTC
Monitoring
We can confirm we have mitigated this incident for the majority of accounts affected. We are monitoring the result of our actions.
Posted Aug 03, 2023 - 22:10 UTC
Update
We are now recovering from the increased error rates. Most accounts should see significant improvements in indexing and search in Master Data and applications that depend on that functionality.
Posted Aug 03, 2023 - 21:40 UTC
Update
We are continuing to work towards full resolution of this issue, prioritizing the most affected accounts in our mitigation actions. We appreciate your patience as we remain committed to fully restoring our platform.
Posted Aug 03, 2023 - 20:35 UTC
Update
We are continuing to work towards full resolution of this issue, prioritizing the most affected accounts in our mitigation actions.
Posted Aug 03, 2023 - 18:58 UTC
Update
We are continuing to work towards full resolution of this issue. Some accounts should already experience normalized behavior of indexing and search in Master Data at this point. We are prioritizing the most affected accounts in our mitigation actions.
Posted Aug 03, 2023 - 18:23 UTC
Update
We are continuing to work towards full resolution of this issue. We are observing reduced delays in the processing of indexing queues for Master Data. We appreciate your patience as we remain committed to fully restoring our platform.
Posted Aug 03, 2023 - 17:09 UTC
Update
We are continuing to work towards full resolution of this issue. We have restarted indexing processes to gradually restore indexing and search functionality.
Posted Aug 03, 2023 - 16:07 UTC
Update
We are continuing to work towards full resolution of this issue. We have started to observe healthier metrics for the degraded internal component in Master Data.
Posted Aug 03, 2023 - 15:47 UTC
Update
We are continuing to work towards full resolution of this issue. Our internal recovery actions are taking longer than expected, so we have initiated our disaster recovery plan.

We are fully committed to reestablishing our services as soon as possible, and appreciate your patience during this incident.
Posted Aug 03, 2023 - 15:14 UTC
Update
We are continuing to work towards full resolution of this issue. We were able to isolate the degraded component in Master Data, and will proceed to gradually restore indexing and search functionality.
Posted Aug 03, 2023 - 14:38 UTC
Update
We continue to work on recovery. Some customers have reported impact in applications that leverage Master Data, such as B2B Suite.
Posted Aug 03, 2023 - 14:03 UTC
Update
We are performing actions to recover the degraded component. Our global sales flow is not impacted by this incident. However, customers that use MasterData to customize their sales flow may be affected.
Posted Aug 03, 2023 - 13:28 UTC
Identified
An internal component of our MasterData is partially degraded, and some customers are experiencing an impact in the MasterData module. We are acting to mitigate the issue, and we'll update you soon.
Posted Aug 03, 2023 - 13:08 UTC
This incident affected: Administrative Environment and Internal Modules.