AWS Database Migration Service
User Guide (Version API Version 2016-01-01)

Using Amazon DocumentDB as a Target for AWS Database Migration Service

You can use AWS DMS to migrate data to Amazon DocumentDB (with MongoDB compatibility) from any of the source data engines that AWS DMS supports. The source engine can run on an AWS managed service such as Amazon RDS, Amazon Aurora, or Amazon S3. Alternatively, the source can be a self-managed database, such as MongoDB running on Amazon EC2 or on-premises.

Note

Support for Amazon DocumentDB (with MongoDB compatibility) as a target is available in AWS DMS versions 3.1.3 and later.

You can use AWS DMS to replicate source data to Amazon DocumentDB databases, collections, or documents.

If the source endpoint is MongoDB, make sure to enable the following extra connection attributes:

  • nestingLevel=NONE

  • extractDocID=false

For more information, see Extra Connection Attributes When Using MongoDB as a Source for AWS DMS.

MongoDB stores data in a binary JSON format (BSON). AWS DMS supports all of the BSON data types that are supported by Amazon DocumentDB. For a list of these data types, see Supported MongoDB APIs, Operations, and Data Types in the Amazon DocumentDB Developer Guide.

If the source endpoint is a relational database, AWS DMS maps database objects to Amazon DocumentDB as follows:

  • A relational database, or database schema, maps to an Amazon DocumentDB database.

  • Tables within a relational database map to collections in Amazon DocumentDB.

  • Records in a relational table map to documents in Amazon DocumentDB. Each document is constructed from data in the source record.

If the source endpoint is Amazon S3, then the resulting Amazon DocumentDB objects correspond to AWS DMS mapping rules for Amazon S3. For example, consider the following URI.

s3://mybucket/hr/employee

In this case, AWS DMS maps the objects in mybucket to Amazon DocumentDB as follows:

  • The top-level URI part (hr) maps to an Amazon DocumentDB database.

  • The next URI part (employee) maps to an Amazon DocumentDB collection.

  • Each object in employee maps to a document in Amazon DocumentDB.

For more information on mapping rules for Amazon S3, see Using Amazon S3 as a Source for AWS DMS.
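The URI-to-object mapping described above can be sketched in a few lines. This is an illustrative sketch only, not AWS DMS code; the bucket and folder names come from the example URI.

```python
from urllib.parse import urlparse

def map_s3_uri(uri):
    """Sketch of how an Amazon S3 source URI maps to DocumentDB objects.

    The top-level URI part becomes the database, the next part becomes
    the collection, and each object under it becomes a document.
    """
    parsed = urlparse(uri)                     # s3://bucket/db/collection
    parts = parsed.path.strip("/").split("/")
    return {"database": parts[0], "collection": parts[1]}

print(map_s3_uri("s3://mybucket/hr/employee"))
# {'database': 'hr', 'collection': 'employee'}
```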

To help increase the speed of the transfer, AWS DMS supports a multithreaded full load to an Amazon DocumentDB target instance. DMS supports this multithreading with task settings that include the following:

  • MaxFullLoadSubTasks – Use this option to indicate the maximum number of source tables to load in parallel. DMS loads each table into its corresponding Amazon DocumentDB target table using a dedicated subtask. The default is 8; the maximum value is 49.

  • ParallelLoadThreads – Use this option to specify the number of threads that AWS DMS uses to load each table into its Amazon DocumentDB target table. The maximum value for an Amazon DocumentDB target is 32. You can ask to have this maximum limit increased.

  • ParallelLoadBufferSize – Use this option to specify the maximum number of records to store in the buffer that the parallel load threads use to load data to the Amazon DocumentDB target. The default value is 50. The maximum value is 1,000. Use this setting with ParallelLoadThreads. ParallelLoadBufferSize is valid only when there is more than one thread.

  • Table-mapping settings for individual tables – Use table-settings rules to identify individual tables from the source that you want to load in parallel. Also use these rules to specify how to segment the rows of each table for multithreaded loading. For more information, see Table-Settings Rules and Operations.

    Note

    DMS assigns each segment of a table to its own thread for loading. Therefore, set ParallelLoadThreads to the maximum number of segments that you specify for a table in the source.
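For reference, the multithreaded-load settings above appear in the task settings JSON roughly as follows. This is a hedged sketch: the values shown are examples, not recommendations.

```json
{
  "FullLoadSettings": {
    "MaxFullLoadSubTasks": 8
  },
  "TargetMetadata": {
    "ParallelLoadThreads": 8,
    "ParallelLoadBufferSize": 100
  }
}
```

A table-settings rule that segments one table for parallel load might look like the following sketch; the schema name, table name, column, and boundary values are placeholders. With two boundaries there are three segments, so ParallelLoadThreads should be at least 3 for this table, per the note above.

```json
{
  "rules": [
    {
      "rule-type": "table-settings",
      "rule-id": "1",
      "rule-name": "1",
      "object-locator": {
        "schema-name": "hr",
        "table-name": "employee"
      },
      "parallel-load": {
        "type": "ranges",
        "columns": ["employee_id"],
        "boundaries": [
          ["10000"],
          ["20000"]
        ]
      }
    }
  ]
}
```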

For additional details on working with Amazon DocumentDB as a target for AWS DMS, including a walkthrough of the migration process, see the following sections.

Mapping Data from a Source to an Amazon DocumentDB Target

AWS DMS reads records from the source endpoint, and constructs JSON documents based on the data it reads. For each JSON document, AWS DMS must determine an _id field to act as a unique identifier. It then writes the JSON document to an Amazon DocumentDB collection, using the _id field as a primary key.

Source Data That Is a Single Column

If the source data consists of a single column, the data must be of a string type. (Depending on the source engine, the actual data type might be VARCHAR, NVARCHAR, TEXT, LOB, CLOB, or similar.) AWS DMS assumes that the data is a valid JSON document, and replicates the data to Amazon DocumentDB as is.

If the resulting JSON document contains a field named _id, then that field is used as the unique _id in Amazon DocumentDB.

If the JSON doesn't contain an _id field, then Amazon DocumentDB generates an _id value automatically.
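The single-column behavior can be illustrated with a short sketch. This is illustrative only, not DMS internals; the generated-_id placeholder stands in for the value that Amazon DocumentDB would create.

```python
import json

def migrate_single_column(value):
    """Sketch: a single string column is treated as a JSON document.

    If the parsed document has an _id field, that value becomes the
    DocumentDB _id; otherwise DocumentDB would generate one
    (represented here by a placeholder).
    """
    doc = json.loads(value)  # DMS assumes the string is valid JSON
    if "_id" not in doc:
        doc["_id"] = "<generated-by-documentdb>"  # placeholder
    return doc

print(migrate_single_column('{"_id": "1", "name": "John"}')["_id"])
# 1
```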

Source Data That Is Multiple Columns

If the source data consists of multiple columns, then AWS DMS constructs a JSON document from all of these columns. To determine the _id field for the document, AWS DMS proceeds as follows:

  • If one of the columns is named _id, then the data in that column is used as the target _id.

  • If there is no _id column, but the source data has a primary key or a unique index, then AWS DMS uses that key or index value as the _id value. The data from the primary key or unique index also appears as explicit fields in the JSON document.

  • If there is no _id column, and no primary key or a unique index, then Amazon DocumentDB generates an _id value automatically.
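The order of precedence above can be sketched as follows. This is an illustrative sketch, not DMS internals; composite keys are shown joined for simplicity.

```python
def choose_id(record, primary_key_columns=None):
    """Sketch of _id selection for a multi-column source row.

    Order of precedence, per the rules above:
      1. a column literally named _id
      2. the primary-key (or unique-index) value
      3. otherwise None, and DocumentDB generates an _id automatically
    """
    if "_id" in record:
        return record["_id"]
    if primary_key_columns:
        # A composite key combines its values; joined here for simplicity.
        return "-".join(str(record[c]) for c in primary_key_columns)
    return None  # DocumentDB generates an _id automatically

row = {"emp_id": 101, "name": "John"}
print(choose_id(row, primary_key_columns=["emp_id"]))  # 101
```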

Coercing a Data Type at the Target Endpoint

AWS DMS can modify data structures when it writes to an Amazon DocumentDB target endpoint. You can request these changes by renaming columns and tables at the source endpoint, or by providing transformation rules that are applied when a task is running.

Using a Nested JSON Document (json_ Prefix)

To coerce a data type, you can prefix the source column name with json_ (that is, json_columnName) either manually or using a transformation. In this case, the column is created as a nested JSON document within the target document, rather than as a string field.

For example, suppose that you want to migrate the following document from a MongoDB source endpoint.

{
  "_id": "1",
  "FirstName": "John",
  "LastName": "Doe",
  "ContactDetails": "{\"Home\": {\"Address\": \"Boston\", \"Phone\": \"1111111111\"}, \"Work\": {\"Address\": \"Boston\", \"Phone\": \"2222222222\"}}"
}

If you don't coerce any of the source data types, the embedded ContactDetails document is migrated as a string.

{
  "_id": "1",
  "FirstName": "John",
  "LastName": "Doe",
  "ContactDetails": "{\"Home\": {\"Address\": \"Boston\", \"Phone\": \"1111111111\"}, \"Work\": {\"Address\": \"Boston\", \"Phone\": \"2222222222\"}}"
}

However, you can add a transformation rule to coerce ContactDetails to a JSON object. For example, suppose that the original source column name is ContactDetails, and that you rename the source column to json_ContactDetails. AWS DMS then replicates the ContactDetails field as nested JSON, as follows.

{
  "_id": "1",
  "FirstName": "John",
  "LastName": "Doe",
  "ContactDetails": {
    "Home": {
      "Address": "Boston",
      "Phone": "1111111111"
    },
    "Work": {
      "Address": "Boston",
      "Phone": "2222222222"
    }
  }
}
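A transformation rule that performs this rename might look like the following sketch, using the DMS table-mapping rule format. The wildcard schema and table names are placeholders; scope them to your actual objects.

```json
{
  "rules": [
    {
      "rule-type": "transformation",
      "rule-id": "1",
      "rule-name": "1",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "%",
        "table-name": "%",
        "column-name": "ContactDetails"
      },
      "rule-action": "rename",
      "value": "json_ContactDetails"
    }
  ]
}
```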

Using a JSON Array (array_ Prefix)

To coerce a data type, you can prefix a column name with array_ (that is, array_columnName), either manually or using a transformation. In this case, AWS DMS considers the column as a JSON array, and creates it as such in the target document.

Suppose that you want to migrate the following document from a MongoDB source endpoint.

{ "_id" : "1", "FirstName": "John", "LastName": "Doe",
 "ContactAddresses": ["Boston", "New York"],
 "ContactPhoneNumbers": ["1111111111", "2222222222"] }

If you don't coerce any of the source data types, the ContactAddresses and ContactPhoneNumbers arrays are migrated as strings.

{ "_id": "1", "FirstName": "John", "LastName": "Doe",
 "ContactAddresses": "[\"Boston\", \"New York\"]",
 "ContactPhoneNumbers": "[\"1111111111\", \"2222222222\"]"
 }

However, you can add transformation rules to coerce ContactAddresses and ContactPhoneNumbers to JSON arrays, as shown in the following table.

Original Source Column Name Renamed Source Column
ContactAddresses array_ContactAddresses
ContactPhoneNumbers array_ContactPhoneNumbers

AWS DMS replicates ContactAddresses and ContactPhoneNumbers as follows.

{
  "_id": "1",
  "FirstName": "John",
  "LastName": "Doe",
  "ContactAddresses": ["Boston", "New York"],
  "ContactPhoneNumbers": ["1111111111", "2222222222"]
}
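The corresponding transformation rules might look like the following sketch, again using the DMS table-mapping rule format with placeholder wildcard locators.

```json
{
  "rules": [
    {
      "rule-type": "transformation",
      "rule-id": "1",
      "rule-name": "1",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "%",
        "table-name": "%",
        "column-name": "ContactAddresses"
      },
      "rule-action": "rename",
      "value": "array_ContactAddresses"
    },
    {
      "rule-type": "transformation",
      "rule-id": "2",
      "rule-name": "2",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "%",
        "table-name": "%",
        "column-name": "ContactPhoneNumbers"
      },
      "rule-action": "rename",
      "value": "array_ContactPhoneNumbers"
    }
  ]
}
```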

Connecting to Amazon DocumentDB Using TLS

By default, a newly created Amazon DocumentDB cluster accepts secure connections only using Transport Layer Security (TLS). When TLS is enabled, every connection to Amazon DocumentDB requires a public key.

You can download the public key for Amazon DocumentDB as the rds-combined-ca-bundle.pem file from an AWS-hosted Amazon S3 bucket.

After you download this .pem file, you can import the file into AWS DMS as described following.

AWS Management Console

To import the public key (.pem) file

  1. Open the AWS DMS console at https://console.aws.amazon.com/dms.

  2. In the navigation pane, choose Certificates.

  3. Choose Import certificate and do the following:

    • For Certificate identifier, enter a unique name for the certificate, for example docdb-cert.

    • For Import file, navigate to the location where you saved the .pem file.

    When the settings are as you want them, choose Add new CA certificate.

AWS CLI

Use the aws dms import-certificate command, as shown in the following example.

aws dms import-certificate \
    --certificate-identifier docdb-cert \
    --certificate-pem file://./rds-combined-ca-bundle.pem

When you create an AWS DMS target endpoint, provide the certificate identifier (for example, docdb-cert). Also, set the SSL mode parameter to verify-full.

Ongoing Replication with Amazon DocumentDB as a Target

If ongoing replication is enabled, AWS DMS ensures that documents in Amazon DocumentDB stay in sync with the source. When a source record is created or updated, AWS DMS must first determine which Amazon DocumentDB record is affected by doing the following:

  • If the source record has a column named _id, the value of that column determines the corresponding _id in the Amazon DocumentDB collection.

  • If there is no _id column, but the source data has a primary key or unique index, then AWS DMS uses that key or index value as the _id for the Amazon DocumentDB collection.

  • If the source record doesn't have an _id column, a primary key, or a unique index, then AWS DMS matches all of the source columns to the corresponding fields in the Amazon DocumentDB collection.

When a new source record is created, AWS DMS writes a corresponding document to Amazon DocumentDB. If an existing source record is updated, AWS DMS updates the corresponding fields in the target document in Amazon DocumentDB. Any fields that exist in the target document but not in the source record remain untouched.

When a source record is deleted, AWS DMS deletes the corresponding document from Amazon DocumentDB.
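The matching rules above can be sketched as the construction of a lookup filter. This is illustrative only, not DMS internals; real DMS also handles composite keys, which this sketch omits.

```python
def build_match_filter(record, primary_key=None):
    """Sketch of how a changed source record is matched to its target
    document in Amazon DocumentDB.

    Per the rules above: prefer an _id column, then a primary-key or
    unique-index value, else match on every source column.
    """
    if "_id" in record:
        return {"_id": record["_id"]}
    if primary_key is not None:
        return {"_id": record[primary_key]}
    return dict(record)  # no _id, key, or index: match all columns

print(build_match_filter({"emp_id": 101, "name": "John"}, primary_key="emp_id"))
# {'_id': 101}
```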

Structural Changes (DDL) at the Source

With ongoing replication, any changes to source data structures (such as tables, columns, and so on) are propagated to their counterparts in Amazon DocumentDB. In relational databases, these changes are initiated using data definition language (DDL) statements. You can see how AWS DMS propagates these changes to Amazon DocumentDB in the following table.

DDL at Source Effect at Amazon DocumentDB Target
CREATE TABLE Creates an empty collection.
Statement that renames a table (RENAME TABLE, ALTER TABLE...RENAME, and similar) Renames the collection.
TRUNCATE TABLE Removes all the documents from the collection, but only if HandleSourceTableTruncated is true. For more information, see Task Settings for Change Processing DDL Handling.
DROP TABLE Deletes the collection, but only if HandleSourceTableDropped is true. For more information, see Task Settings for Change Processing DDL Handling.
Statement that adds a column to a table (ALTER TABLE...ADD and similar) The DDL statement is ignored, and a warning is issued. When the first INSERT is performed at the source, the new field is added to the target document.
ALTER TABLE...RENAME COLUMN The DDL statement is ignored, and a warning is issued. When the first INSERT is performed at the source, the newly named field is added to the target document.
ALTER TABLE...DROP COLUMN The DDL statement is ignored, and a warning is issued.
Statement that changes the column data type (ALTER COLUMN...MODIFY and similar) The DDL statement is ignored, and a warning is issued. When the first INSERT is performed at the source with the new data type, the target document is created with a field of that new data type.

Limitations to Using Amazon DocumentDB as a Target

The following limitations apply when using Amazon DocumentDB as a target for AWS DMS:

  • In Amazon DocumentDB, collection names can't contain the dollar symbol ($). In addition, database names can't contain any Unicode characters.

  • AWS DMS doesn't support merging of multiple source tables into a single Amazon DocumentDB collection.

  • When AWS DMS processes changes from a source table that doesn't have a primary key, any LOB columns in that table are ignored.

  • If the Change table option is enabled and AWS DMS encounters a source column named "_id", then that column appears as "__id" (two underscores) in the change table.

  • If you choose Oracle as a source endpoint, then the Oracle source must have full supplemental logging enabled. Otherwise, if there are columns at the source that weren't changed, then the data is loaded into Amazon DocumentDB as null values.

Target Data Types for Amazon DocumentDB

In the following table, you can find the Amazon DocumentDB target data types that are supported when using AWS DMS, and the default mapping from AWS DMS data types. For more information about AWS DMS data types, see Data Types for AWS Database Migration Service.

AWS DMS Data Type Amazon DocumentDB Data Type
BOOLEAN Boolean
BYTES Binary data
DATE Date
TIME String (UTF8)
DATETIME Date
INT1 32-bit integer
INT2 32-bit integer
INT4 32-bit integer
INT8 64-bit integer
NUMERIC String (UTF8)
REAL4 Double
REAL8 Double
STRING If the data is recognized as JSON, then AWS DMS migrates it to Amazon DocumentDB as a document. Otherwise, the data is mapped to String (UTF8).
UINT1 32-bit integer
UINT2 32-bit integer
UINT4 64-bit integer
UINT8 String (UTF8)
WSTRING If the data is recognized as JSON, then AWS DMS migrates it to Amazon DocumentDB as a document. Otherwise, the data is mapped to String (UTF8).
BLOB Binary
CLOB If the data is recognized as JSON, then AWS DMS migrates it to Amazon DocumentDB as a document. Otherwise, the data is mapped to String (UTF8).
NCLOB If the data is recognized as JSON, then AWS DMS migrates it to Amazon DocumentDB as a document. Otherwise, the data is mapped to String (UTF8).
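The integer rows of the table can be summarized in a small lookup, shown here as a sketch for reference; the mapping values are copied from the table above.

```python
# Sketch: default AWS DMS -> Amazon DocumentDB mappings for the
# fixed-width integer types listed in the table above.
INT_TYPE_MAP = {
    "INT1": "32-bit integer",
    "INT2": "32-bit integer",
    "INT4": "32-bit integer",
    "INT8": "64-bit integer",
    "UINT1": "32-bit integer",
    "UINT2": "32-bit integer",
    "UINT4": "64-bit integer",   # unsigned 32-bit needs a wider target
    "UINT8": "String (UTF8)",    # no wider integer type is available
}

print(INT_TYPE_MAP["UINT4"])  # 64-bit integer
```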