On December 3rd, Calibreapp.com suffered approximately 1 hour 30 minutes of downtime after difficulties during a routine data migration, followed by a period of degraded performance.
During the data migration, tests recorded prior to December 3rd were temporarily unavailable to view. New tests continued to run, but their aggregation was delayed by the ongoing migration, so their results were also temporarily unavailable.
No data was lost.
Forty-five minutes into the data migration, we noticed drastically degraded Postgres database performance, which brought Calibreapp.com down for almost an hour.
Calibreapp.com was then brought back up, though performance remained degraded due to the migration load.
A routine vacuum and the automatic daily database backups then began operating on the same table that was being migrated, which caused further issues.
By Tuesday, the migration had processed data back to September 2018, which meant that timeline metrics were 100% available, but detailed reports for those tests were still unavailable to view.
We continued to monitor the database.
After numerous process efficiency fixes and the replacement of a database replica, the remaining queue backlog was processed smoothly and the service returned to full availability.