Opalstack - Emergency MariaDB maintenance for opal4, opal7, and opal9 – Incident details

Emergency MariaDB maintenance for opal4, opal7, and opal9

Resolved
Operational
Started about 1 year ago
Lasted about 15 hours

Affected

Americas Hosting

Operational from 7:23 AM to 10:21 PM

Web Hosting - Shared

Operational from 7:23 AM to 10:21 PM

opal4.opalstack.com

Operational from 7:23 AM to 10:21 PM

opal7.opalstack.com

Operational from 7:23 AM to 10:21 PM

Europe Hosting

Operational from 7:23 AM to 10:21 PM

Web Hosting - Shared

Operational from 7:23 AM to 10:21 PM

Updates
  • Resolved

    The MariaDB recovery on opal9 is complete. Databases with names starting with tro through zzz are being restored from our 2 December 2023 backup.

    Our plan now is to test the recovery process on a test server to determine why we were not able to import the dump files and then refine our maintenance procedures based on those findings.

  • Update

    The MariaDB recovery on opal9 is still in progress. We expect the recovery to be complete in around 90 minutes.

  • Update

    The MariaDB recovery on opal7 is complete.

    The MariaDB recovery on opal9 is still in progress. We'll update with an ETA as soon as possible.

  • Update

    The MariaDB recovery on opal7 and opal9 is still in progress. We expect that it will take at least a couple of hours more to complete.

  • Update

    The MariaDB recovery on opal7 and opal9 is still in progress. We expect that it will take at least a few more hours to complete.

    The MariaDB recovery on opal4 is now complete; however, we were not able to use the most recent backup for about 50% of the recovered databases. Those databases were recovered as follows:

    • Databases with names starting with lsa through shn were restored from our 2 December 2023 backup.
    • Databases with names starting with sho through zzz were restored from our 1 December 2023 backup.
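
To check which backup a given opal4 database came from, the ranges listed above can be expressed as a small lookup. This is only a sketch: it assumes the ranges are compared on the first three characters of the name, case-insensitively and in plain ASCII order, and that any name outside the listed ranges was recovered from the most recent backup. The function name `backup_date_for` is hypothetical.

```python
from datetime import date
from typing import Optional

# Ranges as listed for opal4: (first prefix, last prefix, backup date).
OPAL4_RANGES = [
    ("lsa", "shn", date(2023, 12, 2)),
    ("sho", "zzz", date(2023, 12, 1)),
]


def backup_date_for(db_name: str) -> Optional[date]:
    """Return the dated backup a database was restored from, or None.

    None means the name falls outside the listed ranges; by assumption
    such databases were recovered from the most recent backup instead.
    """
    # Assumption: only the first three characters decide the range.
    prefix = db_name.lower()[:3]
    for first, last, backup in OPAL4_RANGES:
        if first <= prefix <= last:
            return backup
    return None


for name in ("acme_blog", "myuser_wp", "shop_db", "zebra"):
    print(name, backup_date_for(name))
```

If the actual restore used a different ordering (for example a locale-aware sort), names near a range boundary could land in a different group than this sketch reports.
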
  • Update

    It seems some opal4 databases were not restored. We're restoring those databases now.

    The recovery on opal7 and opal9 is still in progress.

  • Update

    The MariaDB recovery on opal4 is complete. Recovery on opal7 and opal9 is still in progress.

  • Update

    The database restore is still in progress. We expect that it will take at least a few more hours.

  • Update

    What happened?

    We took the MariaDB database service down to stop the InnoDB ibdata file from growing exponentially. The only way to do this is to dump all databases, delete the log files, update the server configuration, and restore the data.

    The dump process worked without any errors. The clean up and server configuration completed without any errors. The restore process is where things failed.

    Because of errors in one or more databases, transactions failed to complete, so the restore never got past a certain point. The amount of data in the mysql data directory would increase up to a certain point, then drop by half, and the cycle would repeat. After running the restore process twice and hitting errors in different spots, we decided to extract each database from the monolithic backup we had taken previously. Why a monolithic backup? It's usually faster to dump and restore, except in this case.

    The extraction process is painfully slow compared to just running a working dump restore. That's why this process is taking so long.
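
For readers curious what extracting individual databases from a monolithic dump can look like, here is a minimal sketch. It is not the tooling used during this incident. It assumes the dump was produced by `mysqldump --all-databases`, which writes a comment of the form ``-- Current Database: `name` `` before each database. The function name `split_dump` and the sample data are hypothetical.

```python
import io
import re
from typing import Dict, Iterable, List

# mysqldump emits this comment before each database in a multi-database dump.
DB_MARKER = re.compile(r"^-- Current Database: `(?P<name>.+)`\s*$")


def split_dump(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group the lines of a multi-database dump by database name.

    Lines before the first marker (the dump header) are stored under the
    key "" so a caller can prepend them to each extracted piece.
    """
    sections: Dict[str, List[str]] = {"": []}
    current = ""
    for line in lines:
        match = DB_MARKER.match(line)
        if match:
            # A new database section starts at its marker line.
            current = match.group("name")
            sections.setdefault(current, [])
        sections[current].append(line)
    return sections


# Tiny stand-in for a real dump file.
sample = io.StringIO(
    "-- MariaDB dump (header)\n"
    "--\n"
    "-- Current Database: `shop`\n"
    "--\n"
    "CREATE DATABASE `shop`;\n"
    "USE `shop`;\n"
    "--\n"
    "-- Current Database: `wiki`\n"
    "--\n"
    "CREATE DATABASE `wiki`;\n"
)
parts = split_dump(sample)
print(sorted(name for name in parts if name))
```

This version holds every section in memory, which is fine for illustration only. A real dump of a shared server would need to be streamed to one file per database, and each piece restored separately so that an error in one database does not block the others.
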

    What didn't we do?

    We could have restored from the latest backup, which would have been at most 24 hours old; however, that option comes with a significant risk of data loss. Rather than risk losing up to 24 hours of data, we went with the slower, safer process.

    The current restore process for each server is now past the previous failure point.

    We sincerely apologize for the downtime this has caused for your apps that use MariaDB. In the tests we performed before the actual event, we did not run into any of these errors.

  • Update

    The data restoration process is still ongoing. There is currently no ETA we can provide.

  • Monitoring

    Opal4, Opal7, Opal9: We are still restoring data to the MariaDB databases. The exact amount of time left in the restoration process is unknown, but we will post an update every hour until all of the restores have finished.

  • Investigating

    On Sunday, 03 December 2023 at 0500 UTC we'll be taking the managed MariaDB database service offline for maintenance on the following servers:

    • opal4.opalstack.com (Dallas)
    • opal7.opalstack.com (Phoenix)
    • opal9.opalstack.com (Frankfurt)

    The maintenance window is 2 hours. During the maintenance, sites and applications that use MariaDB (including WordPress sites) will not function.

    We apologize for the short notice and any inconvenience. If you have any questions or concerns regarding the maintenance, please contact our support team.