Oracle Data Pump and PostgreSQL pg_dump and pg_restore - Oracle to Aurora PostgreSQL Migration Playbook

Oracle Data Pump and PostgreSQL pg_dump and pg_restore

With AWS DMS, you can migrate data from source databases to target databases using Oracle Data Pump and PostgreSQL pg_dump and pg_restore. Oracle Data Pump is a utility for transferring data between Oracle databases, while PostgreSQL pg_dump and pg_restore create a backup of a PostgreSQL database.

Feature compatibility AWS SCT / AWS DMS automation level AWS SCT action code index Key differences

No compatibility

N/A

N/A

Non-compatible tool.

Oracle usage

Oracle Data Pump is a utility for exporting and importing data from/to an Oracle database. It can be used to copy an entire database, entire schemas, or specific objects in a schema. Oracle Data Pump is commonly used as a part of a backup strategy for restoring individual database objects (specific records, tables, views, stored procedures, and so on) as opposed to snapshots or Oracle RMAN, which provides backup and recovery capabilities at the database level. By default (without using the sqlfile parameter during export), the dump file generated by Oracle Data Pump is binary (it can’t be opened using a text editor).

Oracle Data Pump supports:

  • Export Data from an Oracle database — The Data Pump EXPDP command creates a binary dump file containing the exported database objects. Objects can be exported with data or with metadata only. Exports can be performed for specific timestamps or Oracle SCNs to ensure cross-object consistency.

  • Import Data to an Oracle database — The Data Pump IMPDP command imports objects and data from a specific dump file created with the EXPDP command. The IMPDP command can filter on import (for example, only import certain objects) and remap object and schema names during import.

The term “Logical backup” refers to a dump file created by Oracle Data Pump.

Both EXPDP and IMPDP can only read/write dump files from file system paths that were pre-configured in the Oracle database as directories. During export/import, users must specify the logical directory name where the dump file should be created; not the actual file system path.

Examples

Use EXPDP to export the HR schema.

$ expdp system/**** directory=expdp_dir schemas=hr dumpfile=hr.dmp logfile=hr.log

The command contains the credentials to run Data Pump, the logical Oracle directory name for the dump file location (which maps in the database to a physical file system location), the schema name to export, the dump file name, and log file name.

Use IMPDP to import the HR a schema and rename to HR_COPY.

$ impdp system/**** directory=expdp_dir schemas=hr dumpfile=hr.dmp logfile=hr.log REMAP_SCHEMA=hr:hr_copy

The command contains the database credentials to run Data Pump, the logical Oracle directory for where the export dump file is located, the dump file name, the schema to export, the name for the dump file, the log file name, and the REMAP_SCHEMA parameter

For more information, see Oracle Data Pump in the Oracle documentation.

PostgreSQL usage

PostgreSQL provides native utilities — pg_dump and pg_restore can be used to perform logical database exports and imports with a degree of comparable functionality to the Oracle Data Pump utility. Such as for moving data between two databases and creating logical database backups.

  • pg_dump is an equivalent to Oracle expdp

  • pg_restore is an equivalent to Oracle impdp

Amazon Aurora PostgreSQL supports data export and import using both pg_dump and pg_restore, but the binaries for both utilities will need to be placed on your local workstation or on an Amazon EC2 server as part of the PostgreSQL client binaries.

PostgreSQL dump files created using pg_dump can be copied, after export, to an Amazon S3 bucket as cloud backup storage or for maintaining the desired backup retention policy. Later, when dump files are needed for database restore, the dump files should be copied back to the desktop / server that has a PostgreSQL client (such as your workstation or an Amazon EC2 server) to issue the pg_restore command.

Starting with PostgreSQL 10, the following capabilities were added:

  • A schema can be excluded in pg_dump or pg_restore commands.

  • Can create dumps with no blobs.

  • Allow to run pg_dumpall by non-superusers, using the --no-role-passwords option.

  • Create additional integrity option to ensure that the data is stored in disk using the fsync() method.

Starting with PostgreSQL 11, pg_dump and pg_restore can export and import relationships between extensions and database objects established with ALTER …​ DEPENDS ON EXTENSION, which allows these objects to be dropped when extension is dropped with CASCADE option.

Note

pg_dump will create consistent backups even if the database is being used concurrently. pg_dump doesn’t block other users accessing the database (readers or writers).pg_dump only exports a single database, in order to backup global objects that are common to all databases in a cluster, such as roles and tablespaces, use pg_dumpall. PostgreSQL dump files can be both plain-text and custom format files.

Another option to export and import data from PostgreSQL database is to use COPY TO and COPY FROM commands. Starting with PostgreSQL 12 the COPY FROM command, that can be used to load data into DB, has support for filtering incoming rows with WHERE condition.

CREATE TABLE tst_copy(v TEXT);
COPY tst_copy FROM '/home/postgres/file.csv' WITH (FORMAT CSV) WHERE v LIKE '%apple%';

Examples

Export data using pg_dump: Use a workstation or server with the PostgreSQL client installed in order to connect to the Aurora PostgreSQL instance in AWS; providing the hostname (-h), database user name (-U) and database name (-d) while issuing the pg_dump command.

$ pg_dump -h hostname.rds.amazonaws.com -U username -d db_name -f dump_file_name.sql

The output file, dump_file_name.sql, will be stored on the server where the pg_dump command runs. You can later copy the output file to an Amazon S3 Bucket, if needed.

Run pg_dump and copy the backup file to an Amazon S3 bucket using pipe and the AWS CLI.

$ pg_dump -h hostname.rds.amazonaws.com -U username -d db_name -f dump_file_name.sql | aws s3 cp - s3://pg-backup/pg_bck-$(date"+%Y-%m-%d-%H-%M-%S")

Restore data with pg_restore. Use a workstation or server with the PostgreSQL client installed to connect to the Aurora PostgreSQL instance providing the hostname (-h), database user name (-U), database name (-d) and the dump file to restore from while issuing the pg_restore command.

$ pg_restore -h hostname.rds.amazonaws.com -U username -d dbname_restore dump_file_name.sql

Copy the output file from the local server to an Amazon S3 Bucket using the AWS CLI. Upload the dump file to Amazon S3 bucket.

$ aws s3 cp /usr/Exports/hr.dmp s3://my-bucket/backup-$(date "+%Y-%m-%d-%H-%M-%S")

The {-$(date "+%Y-%m-%d-%H-%M-%S")} format is valid on Linux servers only.

Download the output file from the Amazon S3 bucket.

$ aws s3 cp s3://my-bucket/backup-2017-09-10-01-10-10 /usr/Exports/hr.dmp

You can create a copy of an existing database without having to use pg_dump or pg_restore. Instead, use the template keyword to signify the database used as the source.

CREATE DATABASE mydb_copy TEPLATE mydb;

Summary

Description Oracle Data Pump PostgreSQL Dump

Export data to a local file

expdp system/***
schemas=hr
dumpfile=hr.dmp
logfile=hr.log
pg_dump -F c -h
hostname.rds.amazonaws.com
-U username -d hr
-p 5432 > c:\Export\hr.dmp

Export data to a remote file

Create Oracle directory on remote storage mount or NFS directory called EXP_DIR. Use the export command:

expdp system/***
schemas=hr directory=EXP_DIR
dumpfile=hr.dmp logfile=hr.log

Export:

pg_dump -F c
-h hostname.rds.amazonaws.com
-U username -d hr
-p 5432 > c:\Export\hr.dmp

Upload to Amazon S3

aws s3 cp
c:\Export\hr.dmp
s3://my-bucket/backup-$(date"+%Y-%m-%d-%H-%M-%S")

Import data to a new database with a new name

impdp system/***
schemas=hr dumpfile=hr.dmp
logfile=hr.log
REMAP_SCHEMA=hr:hr_copy
TRANSFORMM=OID:N
pg_restore
-h hostname.rds.amazonaws.com
-U hr -d hr_restore
-p 5432 c:\Expor\hr.dmp

Exclude schemas

expdp system/*** FULL=Y
directory=EXP_DIR
dumpfile=hr.dmp
logfile=hr.log
exclude=SCHEMA:"HR"
pg_dump -F c
-h hostname.rds.amazonaws.com
-U username -d hr -p 5432
-N 'log_schema'
c:\Export\hr_nolog.dmp

For more information, see SQL Dump and pg_restore in the PostgreSQL documentation.