Migrate an on-premises Oracle database to Amazon RDS for Oracle by using direct Oracle Data Pump Import over a database link
Created by Rizwan Wangde (AWS)
Environment: Production | Source: On-premises Oracle database | Target: Amazon RDS for Oracle |
R Type: Replatform | Workload: Oracle | Technologies: Migration; Databases |
AWS services: AWS DMS; AWS Direct Connect; Amazon RDS |
Summary
Numerous patterns cover migrating on-premises Oracle databases to Amazon RDS for Oracle by using Oracle Data Pump, a native Oracle utility that is the preferred way to migrate large Oracle workloads. These patterns typically involve exporting application schemas or tables into dump files, transferring the dump files to a database directory on Amazon RDS for Oracle, and then importing the application schemas and data from the dump files.
Using that approach, a migration can take longer depending on the size of the data and the time that it takes to transfer the dump files to the Amazon RDS instance. In addition, the dump files reside on the Amazon RDS instance's Amazon Elastic Block Store (Amazon EBS) volume, which must be large enough for the database and the dump files. When the dump files are deleted after the import, the empty space cannot be reclaimed, so you continue to pay for unused space.
This pattern mitigates those issues by performing a direct import on the Amazon RDS instance by using the Oracle Data Pump API (DBMS_DATAPUMP) over a database link. The pattern initiates a simultaneous export and import pipeline between the source and target databases. This pattern doesn't require sizing an EBS volume for the dump files because no dump files are created or stored on the volume. This approach saves the monthly cost of unused disk space.
Prerequisites and limitations
Prerequisites
An active Amazon Web Services (AWS) account.
A virtual private cloud (VPC) configured with private subnets across at least two Availability Zones, to provide the network infrastructure for the Amazon RDS instance.
An Oracle database in an on-premises data center.
An existing Amazon RDS for Oracle instance in a single Availability Zone. Using a single Availability Zone improves write performance during migration. A Multi-AZ deployment can be enabled 24–48 hours before cutover.
AWS Direct Connect (recommended for large databases).
Network connectivity and firewall rules on premises configured to allow an inbound connection from the Amazon RDS instance to the on-premises Oracle database.
Limitations
The database size limit on Amazon RDS for Oracle is 64 TiB (as of December 2022).
Product versions
Source database: Oracle Database version 10g Release 1 and later.
Target database: For the latest list of supported versions and editions on Amazon RDS, see Amazon RDS for Oracle in the AWS documentation.
Architecture
Source technology stack
Self-managed Oracle database on premises or in the cloud
Target technology stack
Amazon RDS for Oracle
Target architecture
The following diagram shows the architecture for migrating from an on-premises Oracle database to Amazon RDS for Oracle in a Single-AZ environment. The arrows depict the data flow in the architecture; the diagram doesn't show which component initiates the connection.
The Amazon RDS for Oracle instance connects to the on-premises source Oracle database to perform a full-load migration over the database link.
AWS DMS connects to the on-premises source Oracle database to perform ongoing replication by using change data capture (CDC).
CDC changes are applied to the Amazon RDS for Oracle database.
Tools
AWS services
AWS Database Migration Service (AWS DMS) helps you migrate data stores into the AWS Cloud or between combinations of cloud and on-premises setups. This pattern uses CDC and the Replicate data changes only setting.
AWS Direct Connect links your internal network to a Direct Connect location over a standard Ethernet fiber-optic cable. With this connection, you can create virtual interfaces directly to public AWS services while bypassing internet service providers in your network path.
Amazon Relational Database Service (Amazon RDS) for Oracle helps you set up, operate, and scale an Oracle relational database in the AWS Cloud.
Other tools
Oracle Data Pump helps you move data and metadata from one database to another at high speeds.
Client tools such as Oracle Instant Client or SQL Developer are used to connect and run SQL queries on the database.
Best practices
Although AWS Direct Connect uses dedicated, private network connections between the on-premises network and AWS, consider the following options for additional security of data in transit:
A virtual private network (VPN) using AWS Site-to-Site VPN or an IPsec VPN connection from the on-premises network to the AWS network
Oracle Database Native Network Encryption configured on the on-premises Oracle database
Encryption using TLS
Epics
Task | Description | Skills required |
---|---|---|
Set up network connectivity from the target database to the source database. | Configure the on-premises network and firewall to allow inbound connections from the target Amazon RDS instance to the on-premises source Oracle database. | Network administrator, Security engineer |
Create a database user with the appropriate privileges. | Create a database user in the on-premises source Oracle database with the privileges that are required to migrate data between the source and target by using Oracle Data Pump (see the hedged example after this table). | DBA |
Prepare the on-premises source database for AWS DMS CDC migration. | (Optional) To capture ongoing changes with AWS DMS CDC after the Oracle Data Pump full load completes, prepare the on-premises source Oracle database, for example by enabling archive logging and supplemental logging (see the hedged example after this table). | DBA |
Install and configure SQL Developer. | Install and configure SQL Developer (or another client tool) to connect to the source and target databases and run SQL queries. | DBA, Migration engineer |
Generate a script to create the tablespaces. | Use a SQL query on the source database to generate the tablespace creation script (see the hedged example after this table). The script will be applied on the target database. | DBA |
Generate a script to create users, profiles, roles, and privileges. | To generate a script to create the database users, profiles, roles, and privileges, use the scripts from the Oracle Support document How to Extract DDL for User including Privileges and Roles Using dbms_metadata.get_ddl (Doc ID 2739952.1). The script will be applied on the target database. | DBA |
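The table above references a migration user but doesn't include the commands. The following is a minimal sketch, assuming a user named migration_user and a broad privilege set; adjust the name, password, and grants to your security policies and to the schemas being migrated. This is the account that the database link on the target Amazon RDS instance connects as.

-- Run as a privileged user on the source database (hedged sketch; names and grants are assumptions).
CREATE USER migration_user IDENTIFIED BY "<password>";
GRANT CREATE SESSION TO migration_user;
GRANT EXP_FULL_DATABASE TO migration_user;   -- allows exporting schemas other than the user's own over the database link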
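The CDC preparation steps are also referenced without an example. If you plan to run an AWS DMS CDC-only task after the full load, the source Oracle database generally must be in ARCHIVELOG mode and must have supplemental logging enabled. A hedged sketch of the typical statements follows; archive log retention and any per-table logging requirements depend on your environment.

-- Check and prepare the source database for AWS DMS CDC (hedged sketch).
SELECT log_mode FROM v$database;                                 -- must return ARCHIVELOG
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;                        -- minimal supplemental logging
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;  -- primary key logging used by AWS DMS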
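The SQL query for generating the tablespace creation script isn't shown in the table. One possible approach is to use DBMS_METADATA on the source database, as in the following sketch; the excluded tablespace names are assumptions, and on Amazon RDS for Oracle the generated DDL usually must be simplified because the instance uses Oracle Managed Files and doesn't allow explicit data file paths.

-- Run on the source database to generate CREATE TABLESPACE statements (hedged sketch).
SET LONG 100000 LONGCHUNKSIZE 100000 PAGESIZE 0 LINESIZE 1000
EXEC DBMS_METADATA.SET_TRANSFORM_PARAM(DBMS_METADATA.SESSION_TRANSFORM, 'SQLTERMINATOR', TRUE);
SELECT DBMS_METADATA.GET_DDL('TABLESPACE', tablespace_name)
FROM dba_tablespaces
WHERE tablespace_name NOT IN ('SYSTEM', 'SYSAUX', 'TEMP', 'UNDOTBS1');  -- exclude system tablespaces (adjust to your database)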
Task | Description | Skills required |
---|---|---|
Create a database link to the source database and verify connectivity. | Create a database link from the target Amazon RDS for Oracle instance to the on-premises source database, and then run a test query over the link to verify connectivity (see the hedged example after this table). | DBA |
Run the scripts to prepare the target instance. | Run the previously generated scripts (tablespaces, and users, profiles, roles, and privileges) on the target Amazon RDS for Oracle instance. This helps ensure that the Oracle Data Pump migration can create the schemas and their objects. | DBA, Migration engineer |
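The example commands for creating and testing the database link aren't included in the table. The following is a hedged sketch; the link name, credentials, host, port, and service name are placeholders, and the connection must be allowed by the on-premises firewall rules configured earlier.

-- Run on the target Amazon RDS for Oracle instance (hedged sketch).
CREATE DATABASE LINK link_to_source
  CONNECT TO migration_user IDENTIFIED BY "<password>"
  USING '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=<on-premises host or IP>)(PORT=<listener port>))(CONNECT_DATA=(SERVICE_NAME=<service name>)))';

-- Verify connectivity; a successful query returns the current date from the source database.
SELECT sysdate FROM dual@link_to_source;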
Task | Description | Skills required |
---|---|---|
Migrate the required schemas. | To migrate the required schemas from the source on-premises database to the target Amazon RDS instance, use Code 1 (single schema) or Code 2 (multiple schemas) in the Additional information section. To tune the performance of the migration, you can adjust the number of parallel processes (see the example after this table). | DBA |
Gather schema statistics to improve performance. | The Gather Schema Statistics command gathers the Oracle query optimizer statistics for database objects. By using this information, the optimizer can select the best execution plan for any query against these objects (see the example after this table). | DBA |
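The command for adjusting the number of parallel processes isn't shown in the table. You can change the degree value in the SET_PARALLEL call of the codes in the Additional information section before starting the job, or attach to a running job and change it, as in the following sketch. The job name is an assumption: because the codes open the job with job_name => null, you must first look up the system-generated name.

-- Hedged sketch: change the parallel degree of a running Data Pump job.
-- Find the generated job name first: SELECT owner_name, job_name FROM dba_datapump_jobs;
DECLARE
  v_hdnl NUMBER;
BEGIN
  v_hdnl := DBMS_DATAPUMP.ATTACH(job_name => '<JOB_NAME>', job_owner => '<JOB_OWNER>');
  DBMS_DATAPUMP.SET_PARALLEL(handle => v_hdnl, degree => 8);  -- example value; tune for your instance
  DBMS_DATAPUMP.DETACH(v_hdnl);
END;
/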
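The Gather Schema Statistics command itself isn't shown in the table; a typical invocation for a migrated schema follows. The schema name, degree, and cascade option are assumptions to adjust for your workload.

-- Hedged sketch: gather optimizer statistics for one migrated schema on the target database.
BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(
    ownname => '<SCHEMA_NAME>',
    cascade => TRUE,   -- also gather index statistics
    degree  => 4);     -- parallel degree for statistics collection
END;
/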
Task | Description | Skills required |
---|---|---|
Capture the SCN on the source on-premises Oracle database. | Capture the system change number (SCN) to use as the consistent point for the full-load export and as the starting point for AWS DMS CDC replication. To generate the current SCN on the source database, run a SQL query against V$DATABASE (see the example after this table). | DBA |
Perform the full-load migration of the schemas. | To migrate the required schemas (single or multiple) from the source on-premises database to the target Amazon RDS instance, use Code 3 or Code 4 in the Additional information section. In the code, replace <CURRENT_SCN_VALUE_IN_SOURCE_DATABASE> with the SCN that you captured from the source database. To tune the performance of the migration, you can adjust the number of parallel processes. | DBA |
Disable the triggers under the migrated schemas. | Before you begin the AWS DMS CDC-only task, disable the triggers under the migrated schemas so that the changes replicated by AWS DMS don't fire them again on the target (see the example after this table). | DBA |
Gather schema statistics to improve performance. | The Gather Schema Statistics command gathers the Oracle query optimizer statistics for database objects. By using this information, the optimizer can select the best execution plan for any query against these objects. | DBA |
Use AWS DMS to perform an ongoing replication from the source to target. | Use AWS DMS to perform an ongoing replication from the source Oracle database to the target Amazon RDS for Oracle instance. For more information, see Creating tasks for ongoing replication using AWS DMS and the blog post How to work with native CDC support in AWS DMS | DBA, Migration engineer |
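The SQL statement for capturing the SCN isn't shown in the table. On Oracle Database 10g and later, the current SCN can be read from V$DATABASE:

SELECT current_scn FROM V$DATABASE;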
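The table above references disabling the triggers but doesn't include a script. One way is to generate ALTER TRIGGER statements from the data dictionary, as in the following sketch; the schema names are placeholders, and at cutover you can regenerate the statements with ENABLE instead of DISABLE.

-- Hedged sketch: run on the target database, then execute the generated statements.
SELECT 'ALTER TRIGGER "' || owner || '"."' || trigger_name || '" DISABLE;'
FROM dba_triggers
WHERE owner IN ('<SCHEMA_1>', '<SCHEMA_2>');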
Task | Description | Skills required |
---|---|---|
Enable Multi-AZ on the instance 48 hours before cutover. | If this is a production instance, we recommend enabling Multi-AZ deployment on the Amazon RDS instance to provide the benefits of high availability (HA) and disaster recovery (DR). | DBA, Migration engineer |
Stop the AWS DMS CDC-only task (if CDC was turned on). | Verify that the AWS DMS task has applied all changes and that there is no replication latency from the source, and then stop the CDC-only task. | DBA |
Enable the triggers. | Enable the triggers that you disabled before the CDC task was created. | DBA |
Related resources
AWS
Oracle documentation
Additional information
Code 1: Full-load migration only, single application schema
DECLARE
  v_hdnl NUMBER;
BEGIN
  v_hdnl := DBMS_DATAPUMP.OPEN(
    operation   => 'IMPORT',
    job_mode    => 'SCHEMA',
    remote_link => '<DB LINK Name to Source Database>',
    job_name    => null);
  DBMS_DATAPUMP.ADD_FILE(
    handle    => v_hdnl,
    filename  => 'import_01.log',
    directory => 'DATA_PUMP_DIR',
    filetype  => dbms_datapump.ku$_file_type_log_file);
  DBMS_DATAPUMP.METADATA_FILTER(v_hdnl, 'SCHEMA_EXPR', 'IN (''<schema_name>'')');     -- To migrate one selected schema
  DBMS_DATAPUMP.METADATA_FILTER(v_hdnl, 'EXCLUDE_PATH_EXPR', 'IN (''STATISTICS'')');  -- To prevent gathering statistics during the import
  DBMS_DATAPUMP.SET_PARALLEL(handle => v_hdnl, degree => 4);                          -- Number of parallel processes performing export and import
  DBMS_DATAPUMP.START_JOB(v_hdnl);
END;
/
Code 2: Full-load migration only, multiple application schemas
DECLARE
  v_hdnl NUMBER;
BEGIN
  v_hdnl := DBMS_DATAPUMP.OPEN(
    operation   => 'IMPORT',
    job_mode    => 'SCHEMA',
    remote_link => '<DB LINK Name to Source Database>',
    job_name    => null);
  DBMS_DATAPUMP.ADD_FILE(
    handle    => v_hdnl,
    filename  => 'import_01.log',
    directory => 'DATA_PUMP_DIR',
    filetype  => dbms_datapump.ku$_file_type_log_file);
  DBMS_DATAPUMP.METADATA_FILTER(v_hdnl, 'SCHEMA_LIST', '''<SCHEMA_1>'',''<SCHEMA_2>'',''<SCHEMA_3>''');  -- To migrate multiple schemas
  DBMS_DATAPUMP.METADATA_FILTER(v_hdnl, 'EXCLUDE_PATH_EXPR', 'IN (''STATISTICS'')');                     -- To prevent gathering statistics during the import
  DBMS_DATAPUMP.SET_PARALLEL(handle => v_hdnl, degree => 4);                                             -- Number of parallel processes performing export and import
  DBMS_DATAPUMP.START_JOB(v_hdnl);
END;
/
Code 3: Full-load migration before CDC-only task, single application schema
DECLARE
  v_hdnl NUMBER;
BEGIN
  v_hdnl := DBMS_DATAPUMP.OPEN(
    operation   => 'IMPORT',
    job_mode    => 'SCHEMA',
    remote_link => '<DB LINK Name to Source Database>',
    job_name    => null);
  DBMS_DATAPUMP.ADD_FILE(
    handle    => v_hdnl,
    filename  => 'import_01.log',
    directory => 'DATA_PUMP_DIR',
    filetype  => dbms_datapump.ku$_file_type_log_file);
  DBMS_DATAPUMP.METADATA_FILTER(v_hdnl, 'SCHEMA_EXPR', 'IN (''<schema_name>'')');     -- To migrate one selected schema
  DBMS_DATAPUMP.METADATA_FILTER(v_hdnl, 'EXCLUDE_PATH_EXPR', 'IN (''STATISTICS'')');  -- To prevent gathering statistics during the import
  DBMS_DATAPUMP.SET_PARAMETER(handle => v_hdnl, name => 'FLASHBACK_SCN', value => <CURRENT_SCN_VALUE_IN_SOURCE_DATABASE>);  -- SCN required for the AWS DMS CDC-only task
  DBMS_DATAPUMP.SET_PARALLEL(handle => v_hdnl, degree => 4);                          -- Number of parallel processes performing export and import
  DBMS_DATAPUMP.START_JOB(v_hdnl);
END;
/
Code 4: Full-load migration before CDC-only task, multiple application schemas
DECLARE
  v_hdnl NUMBER;
BEGIN
  v_hdnl := DBMS_DATAPUMP.OPEN(
    operation   => 'IMPORT',
    job_mode    => 'SCHEMA',
    remote_link => '<DB LINK Name to Source Database>',
    job_name    => null);
  DBMS_DATAPUMP.ADD_FILE(
    handle    => v_hdnl,
    filename  => 'import_01.log',
    directory => 'DATA_PUMP_DIR',
    filetype  => dbms_datapump.ku$_file_type_log_file);
  DBMS_DATAPUMP.METADATA_FILTER(v_hdnl, 'SCHEMA_LIST', '''<SCHEMA_1>'',''<SCHEMA_2>'',''<SCHEMA_3>''');  -- To migrate multiple schemas
  DBMS_DATAPUMP.METADATA_FILTER(v_hdnl, 'EXCLUDE_PATH_EXPR', 'IN (''STATISTICS'')');                     -- To prevent gathering statistics during the import
  DBMS_DATAPUMP.SET_PARAMETER(handle => v_hdnl, name => 'FLASHBACK_SCN', value => <CURRENT_SCN_VALUE_IN_SOURCE_DATABASE>);  -- SCN required for the AWS DMS CDC-only task
  DBMS_DATAPUMP.SET_PARALLEL(handle => v_hdnl, degree => 4);                                             -- Number of parallel processes performing export and import
  DBMS_DATAPUMP.START_JOB(v_hdnl);
END;
/
Scenario where a mixed migration approach can work better
In rare scenarios where the source database contains tables with millions of rows and very large LOBSEGMENT columns, this pattern slows down the migration. Oracle migrates LOBSEGMENTs over the network link one row at a time: it extracts a single row (along with the LOB column data) from the source table and inserts it into the target table, repeating the process until all rows are migrated. Oracle Data Pump over a database link doesn't support bulk-load or direct-path load mechanisms for LOBSEGMENTs.
In this situation, we recommend the following:
Skip the identified tables during the Oracle Data Pump migration by adding the following metadata filter to the import job (the handle matches the v_hdnl variable used in the codes in the Additional information section).
DBMS_DATAPUMP.METADATA_FILTER(handle => v_hdnl, name => 'NAME_EXPR', value => 'NOT IN (''TABLE_1'',''TABLE_2'')');
Use an AWS DMS task (full-load migration, with CDC replication if required) to migrate the identified tables. AWS DMS will extract multiple rows from the source Oracle database, and insert them in a batch to the target Amazon RDS instance, which improves performance.