Troubleshooting for Aurora MySQL in-place upgrade - Amazon Aurora

Troubleshooting for Aurora MySQL in-place upgrade

Use the following tips to help troubleshoot problems with Aurora MySQL in-place upgrades. These tips don't apply to Aurora Serverless DB clusters.

Reason for in-place upgrade being canceled or slow Effect Solution to allow in-place upgrade to complete within maintenance window
Associated Aurora cross-Region replica isn't patched yet Aurora cancels the upgrade. Upgrade the Aurora cross-Region replica and try again.
Cluster has XA transactions in the prepared state Aurora cancels the upgrade. Commit or roll back all prepared XA transactions.
Cluster is processing any data definition language (DDL) statements Aurora cancels the upgrade. Consider waiting and performing the upgrade after all DDL statements are finished.
Cluster has uncommitted changes for many rows Upgrade might take a long time.

The upgrade process rolls back the uncommitted changes. The indicator for this condition is the value of TRX_ROWS_MODIFIED in the INFORMATION_SCHEMA.INNODB_TRX table.

Consider performing the upgrade only after all large transactions are committed or rolled back.

Cluster has high number of undo records Upgrade might take a long time.

Even if the uncommitted transactions don't affect a large number of rows, they might involve a large volume of data. For example, you might be inserting large BLOBs. Aurora doesn't automatically detect or generate an event for this kind of transaction activity. The indicator for this condition is the history list length (HLL). The upgrade process rolls back the uncommitted changes.

You can check the HLL in the output from the SHOW ENGINE INNODB STATUS SQL command, or directly by using the following SQL query:

SELECT count FROM information_schema.innodb_metrics WHERE name = 'trx_rseg_history_len';

You can also monitor the RollbackSegmentHistoryListLength metric in Amazon CloudWatch.

Consider performing the upgrade only after the HLL is smaller.

Cluster is in the process of committing a large binary log transaction Upgrade might take a long time.

The upgrade process waits until the binary log changes are applied. More transactions or DDL statements could start during this period, further slowing down the upgrade process.

Schedule the upgrade process when the cluster isn't busy with generating binary log replication changes. Aurora doesn't automatically detect or generate an event for this condition.

Schema inconsistencies resulting from file removal or corruption Aurora cancels the upgrade.

Change the default storage engine for temporary tables from MyISAM to InnoDB. Perform the following steps:

  1. Modify the default_tmp_storage_engine DB parameter to InnoDB.

  2. Reboot the DB cluster.

  3. After rebooting, confirm that the default_tmp_storage_engine DB parameter is set to InnoDB. Use the following command:

    show global variables like 'default_tmp_storage_engine';
  4. Make sure not to create any temporary tables that use the MyISAM storage engine. We recommend that you pause any database workload and not create any new database connections, because you're upgrading soon.

  5. Try the in-place upgrade again.

Master user was deleted Aurora cancels the upgrade.
Important

Don't delete the master user.

However, if for some reason you should happen to delete the master user, restore it using the following SQL commands:

CREATE USER 'master_username'@'%' IDENTIFIED BY 'master_user_password' REQUIRE NONE PASSWORD EXPIRE DEFAULT ACCOUNT UNLOCK; GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, RELOAD, PROCESS, REFERENCES, INDEX, ALTER, SHOW DATABASES, CREATE TEMPORARY TABLES, LOCK TABLES, EXECUTE, REPLICATION SLAVE, REPLICATION CLIENT, CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, EVENT, TRIGGER, LOAD FROM S3, SELECT INTO S3, INVOKE LAMBDA, INVOKE SAGEMAKER, INVOKE COMPREHEND ON *.* TO 'master_username'@'%' WITH GRANT OPTION;

For more details on troubleshooting issues that cause upgrade prechecks to fail, see the following blogs:

You can use the following steps to perform your own checks for some of the conditions in the preceding table. That way, you can schedule the upgrade at a time when you know the database is in a state where the upgrade can complete successfully and quickly.

  • You can check for open XA transactions by executing the XA RECOVER statement. You can then commit or roll back the XA transactions before starting the upgrade.

  • You can check for DDL statements by executing a SHOW PROCESSLIST statement and looking for CREATE, DROP, ALTER, RENAME, and TRUNCATE statements in the output. Allow all DDL statements to finish before starting the upgrade.

  • You can check the total number of uncommitted rows by querying the INFORMATION_SCHEMA.INNODB_TRX table. The table contains one row for each transaction. The TRX_ROWS_MODIFIED column contains the number of rows modified or inserted by the transaction.

  • You can check the length of the InnoDB history list by executing the SHOW ENGINE INNODB STATUS SQL statement and looking for the History list length in the output. You can also check the value directly by running the following query:

    SELECT count FROM information_schema.innodb_metrics WHERE name = 'trx_rseg_history_len';

    The length of the history list corresponds to the amount of undo information stored by the database to implement multi-version concurrency control (MVCC).