
Migrate Oracle CLOB values to individual rows in PostgreSQL on AWS

Created by Sai Krishna Namburu (AWS) and Sindhusha Paturu (AWS)

Environment: PoC or pilot

Source: Oracle Database

Target: Aurora PostgreSQL-Compatible or Amazon RDS for PostgreSQL

R Type: Replatform

Workload: Oracle; Open-source

Technologies: Migration; Storage & backup; Databases

AWS services: Amazon Aurora; AWS DMS; Amazon S3; Amazon RDS

Summary

This pattern describes how to split Oracle character large object (CLOB) values into individual rows in Amazon Aurora PostgreSQL-Compatible Edition and Amazon Relational Database Service (Amazon RDS) for PostgreSQL. PostgreSQL doesn’t support the CLOB data type.

Tables with interval partitions are identified in the source Oracle database, and the table name, the type of partition, the interval of the partition, and other metadata are captured and loaded into the target database. You can load CLOB data that is less than 1 GB in size into target tables as text by using AWS Database Migration Service (AWS DMS), or you can export the data in CSV format, load it into an Amazon Simple Storage Service (Amazon S3) bucket, and migrate it to your target PostgreSQL database.

After migration, you can use the custom PostgreSQL code that is provided with this pattern to split the CLOB data into individual rows, using the newline character (CHR(10)) as the delimiter, and populate the target table.

Prerequisites and limitations

Prerequisites 

  • An Oracle database table that has interval partitions and records with a CLOB data type.

  • An Aurora PostgreSQL-Compatible or Amazon RDS for PostgreSQL database that has a table structure that’s similar to the source table (same columns and data types).
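If you're creating the staging structure by hand, the following is a minimal sketch, assuming the example table and column names used later in this pattern; the Oracle CLOB column maps to the PostgreSQL text type.

-- Staging table on the PostgreSQL side; the CLOB data arrives as text
CREATE TABLE clobdata_pg (
  id      INTEGER,
  rawdata TEXT
);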

Limitations 

  • The CLOB value cannot exceed 1 GB.

  • Each row to be inserted into the target table must be delimited in the CLOB data by a newline character (CHR(10)). Rows without this delimiter can't be split (see the check after this list).
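You can identify source rows that would violate this limitation before you migrate. The following check is a minimal sketch in Oracle SQL; DBMS_LOB.INSTR returns 0 when the search pattern isn't found, and the clobdata_or table and rawdata column match the example in the Additional information section.

-- Find CLOB values that contain no newline delimiter
SELECT id
FROM clobdata_or
WHERE DBMS_LOB.INSTR(rawdata, CHR(10)) = 0;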

Product versions

  • Oracle 12c

  • Aurora PostgreSQL-Compatible version 11.6

Architecture

The following diagram shows a source Oracle table with CLOB data, and the equivalent PostgreSQL table in Aurora PostgreSQL-Compatible version 11.6.

Source CLOB table and equivalent target PostgreSQL table.

Tools

AWS services

  • AWS Database Migration Service (AWS DMS) helps you migrate data stores into the AWS Cloud or between combinations of cloud and on-premises setups.

  • Amazon Aurora PostgreSQL-Compatible Edition is a fully managed, ACID-compliant relational database engine that helps you set up, operate, and scale PostgreSQL deployments.

  • Amazon Relational Database Service (Amazon RDS) for PostgreSQL helps you set up, operate, and scale a PostgreSQL relational database in the AWS Cloud.

  • Amazon Simple Storage Service (Amazon S3) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data.

Other tools

You can use the following client tools to connect to, access, and manage your Aurora PostgreSQL-Compatible and Amazon RDS for PostgreSQL databases. (These tools aren’t used within this pattern.)

  • pgAdmin is an open-source management tool for PostgreSQL. It provides a graphical interface that helps you create, maintain, and use database objects.

  • DBeaver is an open-source database tool for developers and database administrators. You can use the tool to manipulate, monitor, analyze, administer, and migrate your data.

Best practices

For best practices for migrating your database from Oracle to PostgreSQL, see the AWS blog post Best practices for migrating an Oracle database to Amazon RDS PostgreSQL or Amazon Aurora PostgreSQL: Migration process and infrastructure considerations.

For best practices for configuring the AWS DMS task for migrating large binary objects, see Migrating large binary objects (LOBs) in the AWS DMS documentation.

Epics


Analyze the CLOB data.

In the source Oracle database, analyze the CLOB data to see whether it contains column headers, so you can determine the method for loading the data into the target table. 

To analyze the input data, use the following query.

SELECT * FROM clobdata_or;  

Developer
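Because this pattern supports only CLOB values under 1 GB, this is also a good point to check value sizes. The following is a minimal sketch in Oracle SQL; DBMS_LOB.GETLENGTH returns the length of a CLOB in characters, and the table name matches the example in this pattern.

-- List CLOB lengths, largest first, to confirm the 1-GB limit is met
SELECT id, DBMS_LOB.GETLENGTH(rawdata) AS clob_length
FROM clobdata_or
ORDER BY clob_length DESC;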

Load the CLOB data to the target database.

Migrate the table that has CLOB data to an interim (staging) table in the Aurora or Amazon RDS target database. You can use AWS DMS or upload the data as a CSV file to an Amazon S3 bucket.

For information about using AWS DMS for this task, see Using an Oracle database as a source and Using a PostgreSQL database as a target in the AWS DMS documentation.

For information about using Amazon S3 for this task, see Using Amazon S3 as a target in the AWS DMS documentation.

Migration engineer, DBA
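If you take the Amazon S3 route, one option is to import the exported CSV file into the staging table directly from the bucket by using the aws_s3 extension, which is available on Aurora PostgreSQL-Compatible and Amazon RDS for PostgreSQL. The following is a sketch, not a complete procedure: the bucket name, file name, and Region are hypothetical, and the database instance must have an IAM role that allows reading from the bucket.

-- One-time setup: install the S3 import extension
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;

-- Import the CSV file into the staging table
SELECT aws_s3.table_import_from_s3(
  'clobdata_pg',                  -- target staging table
  'id,rawdata',                   -- column list
  '(format csv)',                 -- import options
  aws_commons.create_s3_uri('amzn-s3-demo-bucket', 'clobdata.csv', 'us-east-1')
);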

Validate the target PostgreSQL table.

Validate the target data, including headers, against the source data by using the following queries in the target database.

SELECT * FROM clobdata_pg;
SELECT * FROM clobdatatarget;

Compare the results against the query results from the source database (from the first step).

Developer
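As an additional check, you can compare the character length of each staged value with the length of the source CLOB. A minimal sketch, assuming the example table names from this pattern; run the first query on the target and the second on the source, and confirm that the lengths match.

-- Target (PostgreSQL): length of the staged text
SELECT id, length(rawdata) AS staged_length FROM clobdata_pg;

-- Source (Oracle): length of the original CLOB
SELECT id, DBMS_LOB.GETLENGTH(rawdata) AS source_length FROM clobdata_or;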

Split the CLOB data into separate rows.

Run the custom PostgreSQL code provided in the Additional information section to split the CLOB data and insert it into separate rows in the target PostgreSQL table.

Developer

Validate the data in the target table.

Validate the data inserted into the target table by using the following queries.

SELECT * FROM clobdata_pg;
SELECT * FROM clobdatatarget;

Developer
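You can also verify the row count. If every line in the staged CLOB text, including the last one, ends with CHR(10), as in the sample data in this pattern, the number of newline characters equals the number of rows the split should produce. A minimal sketch under that assumption:

-- Expected rows: count the newline characters in the staged text
SELECT length(rawdata) - length(replace(rawdata, chr(10), '')) AS expected_rows
FROM clobdata_pg;

-- Actual rows produced by the split
SELECT count(*) AS actual_rows FROM clobdatatarget;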


Additional information

PostgreSQL function for splitting CLOB data

do $$
declare
  totalstr varchar;
  str1 varchar;
  pos1 integer := 1;
  pos2 integer;
  len integer;
begin
  -- Read the staged CLOB text (the example stages a single row) and append
  -- a trailing newline so that the last line is also CHR(10)-terminated.
  select rawdata || chr(10) into totalstr from clobdata_pg;
  len := length(totalstr);
  raise notice 'Total length : %', len;
  raise notice 'totalstr : %', totalstr;
  while pos1 < len loop
    -- Locate the next newline, insert everything before it as its own row,
    -- and continue with the remainder of the string.
    select position(chr(10) in totalstr) into pos2;
    raise notice 'position of next new line : %', pos2;
    str1 := substring(totalstr, pos1, pos2 - 1);
    raise notice 'str1 : %', str1;
    insert into clobdatatarget(data) values (str1);
    totalstr := substring(totalstr, pos2 + 1, len);
    raise notice 'new totalstr : %', totalstr;
    len := length(totalstr);
  end loop;
end $$ language plpgsql;
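On PostgreSQL 11, a set-based statement can perform the same split without a loop. The following alternative is a sketch, assuming the same clobdata_pg staging table and clobdatatarget target table; rtrim strips the trailing newline so the split doesn't emit an empty final row.

-- Split each staged CLOB value into one row per CHR(10)-delimited line
INSERT INTO clobdatatarget (data)
SELECT regexp_split_to_table(rtrim(rawdata, chr(10)), chr(10))
FROM clobdata_pg;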

Input and output examples

You can use the following examples to try out the PostgreSQL code before you migrate your data.

Create an Oracle table with a CLOB value that contains three input lines.

CREATE TABLE clobdata_or (
  id INTEGER GENERATED ALWAYS AS IDENTITY,
  rawdata CLOB
);

INSERT INTO clobdata_or (rawdata)
VALUES (TO_CLOB('test line 1') || CHR(10) ||
        TO_CLOB('test line 2') || CHR(10) ||
        TO_CLOB('test line 3') || CHR(10));

COMMIT;

SELECT * FROM clobdata_or;

This displays the following output.

id    rawdata
1     test line 1
      test line 2
      test line 3

Load the source data into a PostgreSQL staging table (clobdata_pg) for processing.

SELECT * FROM clobdata_pg;

CREATE TEMP TABLE clobdatatarget (id1 SERIAL, data VARCHAR);

<Run the code in the Additional information section.>

SELECT * FROM clobdatatarget;

This displays the following output.

id1   data
1     test line 1
2     test line 2
3     test line 3