AWS Database Migration Service
User Guide (API Version 2016-01-01)

Using an Amazon Elasticsearch Service Cluster as a Target for AWS Database Migration Service

You can use AWS DMS to migrate data to Amazon Elasticsearch Service (Amazon ES). Amazon ES is a managed service that makes it easy to deploy, operate, and scale an Elasticsearch cluster.

In Elasticsearch, you work with indexes and documents. An index is a collection of documents, and a document is a JSON object containing scalar values, arrays, and other objects. Elasticsearch provides a JSON-based query language, so that you can query data in an index and retrieve the corresponding documents.

When AWS DMS creates indexes for an Amazon Elasticsearch Service target endpoint, it creates one index for each table from the source endpoint. The cost of creating an Elasticsearch index depends on several factors: the number of indexes created, the total amount of data in these indexes, and the small amount of metadata that Elasticsearch stores for each document.

You must use AWS Database Migration Service engine version 3.1.2 or higher to migrate data to Amazon Elasticsearch Service.

Configure your Elasticsearch cluster with compute and storage resources that are appropriate for the scope of your migration. We recommend that you consider the following factors, depending on the replication task you want to use:

  • For a full data load, consider the total amount of data that you want to migrate, and also the speed of the transfer.

  • For replicating ongoing changes, consider the frequency of updates, and your end-to-end latency requirements.
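For a rough sense of how long a full load might take, divide the data volume by the sustained write throughput you expect the cluster to absorb. The following Python sketch does that arithmetic. The 20 MiB/s figure is an assumed example value, not a measured AWS DMS or Amazon ES rate, and the helper name is hypothetical.

```python
def estimate_full_load_hours(total_gib: float, throughput_mib_per_s: float) -> float:
    """Rough full-load duration: data volume divided by sustained throughput."""
    if throughput_mib_per_s <= 0:
        raise ValueError("throughput must be positive")
    total_mib = total_gib * 1024          # GiB -> MiB
    seconds = total_mib / throughput_mib_per_s
    return seconds / 3600                 # seconds -> hours


# 500 GiB at an assumed 20 MiB/s sustained write rate.
print(f"{estimate_full_load_hours(500, 20):.1f} hours")  # prints 7.1 hours
```

Real throughput depends on document size, shard layout, and cluster capacity, so treat the result as a starting point for sizing rather than a prediction.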

Also, configure the index settings on your Elasticsearch cluster, paying close attention to the shard and replica count.
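The following Python sketch builds an Elasticsearch create-index request body with explicit number_of_shards and number_of_replicas settings. The per-shard size target (30 GiB here) is an assumption to check against current Amazon ES sizing guidance. Whether a pre-created index is reused depends on how your task prepares the target, so verify that before relying on it.

```python
import json
import math


def index_settings(expected_index_gib: float, replicas: int = 1,
                   target_shard_gib: float = 30.0) -> dict:
    """Build a create-index request body with explicit shard and replica counts.

    target_shard_gib is an assumed sizing target, not an AWS DMS or Amazon ES
    setting; adjust it to your own sizing guidance.
    """
    # At least one primary shard, plus enough shards to keep each near the target.
    shards = max(1, math.ceil(expected_index_gib / target_shard_gib))
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }


# 100 GiB at an assumed 30 GiB per shard gives 4 primary shards.
print(json.dumps(index_settings(100), indent=2))
```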

Migrating from a Relational Database Table to an Amazon ES Index

AWS DMS supports migrating data to Elasticsearch's scalar data types. When migrating from a relational database like Oracle or MySQL to Elasticsearch, you might want to restructure how you store this data.

AWS DMS supports the following Elasticsearch scalar data types:

  • Boolean

  • Date

  • Float

  • Int

  • String

AWS DMS converts data of type Date into type String. You can specify a custom mapping to interpret these dates.
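As an illustration of such a custom mapping, the following Python sketch builds an index body that declares string-encoded date columns as Elasticsearch date fields with an explicit format. The field name order_date and the yyyy-MM-dd format are hypothetical. The typeless "properties" layout shown is the Elasticsearch 7.x style; Elasticsearch 6.x nests properties under a mapping type name instead.

```python
import json


def date_mapping(date_fields: list[str], fmt: str = "yyyy-MM-dd") -> dict:
    """Return an index body that maps the given fields as Elasticsearch dates.

    Uses the typeless Elasticsearch 7.x layout; 6.x would nest "properties"
    under a mapping type name.
    """
    return {
        "mappings": {
            "properties": {
                # Each migrated date string is parsed with the given format.
                name: {"type": "date", "format": fmt}
                for name in date_fields
            }
        }
    }


# "order_date" is a hypothetical column name.
print(json.dumps(date_mapping(["order_date"]), indent=2))
```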

AWS DMS doesn't support migration of LOB data types.

Prerequisites for Using Amazon Elasticsearch Service as a Target for AWS Database Migration Service

Before you begin working with Elasticsearch as a target for AWS DMS, create an AWS Identity and Access Management (IAM) role that lets AWS DMS access the Elasticsearch indexes at the target endpoint. The role's trust policy must allow the AWS DMS service to assume the role, as shown in the following sample policy.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "1",
            "Effect": "Allow",
            "Principal": {
                "Service": "dms.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

The role that you use for the migration to Elasticsearch must have the following permissions.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "es:ESHttpDelete",
                "es:ESHttpGet",
                "es:ESHttpHead",
                "es:ESHttpPost",
                "es:ESHttpPut"
            ],
            "Resource": "arn:aws:es:region:account-id:domain/domain-name/*"
        }
    ]
}

In the preceding example, replace region with the AWS Region identifier, account-id with your AWS account ID, and domain-name with the name of your Amazon Elasticsearch Service domain. An example domain ARN is arn:aws:es:us-west-2:123456789012:domain/my-es-domain.
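If you manage IAM programmatically, the following Python sketch builds both policy documents shown above from a Region, account ID, and domain name, and then creates the role. It takes the IAM client as a parameter (for example, boto3.client("iam")), so the policy builders can be used without AWS credentials. The function names, the role name, and the inline policy name are hypothetical choices, not names that AWS DMS requires.

```python
import json


def trust_policy() -> dict:
    """Trust policy that lets the AWS DMS service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "1",
            "Effect": "Allow",
            "Principal": {"Service": "dms.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def es_access_policy(region: str, account_id: str, domain_name: str) -> dict:
    """Permissions policy granting the HTTP actions AWS DMS needs on one domain."""
    resource = f"arn:aws:es:{region}:{account_id}:domain/{domain_name}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "es:ESHttpDelete", "es:ESHttpGet", "es:ESHttpHead",
                "es:ESHttpPost", "es:ESHttpPut",
            ],
            "Resource": resource,
        }],
    }


def create_dms_es_role(iam, role_name: str, region: str,
                       account_id: str, domain_name: str) -> None:
    """Create the role with the trust policy, then attach the inline policy.

    `iam` is expected to be a boto3 IAM client; the policy name is hypothetical.
    """
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(trust_policy()),
    )
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=f"{role_name}-es-access",
        PolicyDocument=json.dumps(
            es_access_policy(region, account_id, domain_name)),
    )
```

For example, create_dms_es_role(boto3.client("iam"), "dms-es-target-role", "us-west-2", "123456789012", "my-es-domain") would create a role whose resource scope matches the sample ARN above.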

Extra Connection Attributes When Using Elasticsearch as a Target for AWS DMS

When you set up your Elasticsearch target endpoint, you can specify extra connection attributes. You specify them as key-value pairs separated by semicolons.
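The following Python sketch joins attribute settings into that semicolon-separated key=value form. The helper name is hypothetical. The resulting string is what you would supply as the endpoint's extra connection attributes, for example through the ExtraConnectionAttributes parameter of the AWS DMS CreateEndpoint API.

```python
def format_extra_connection_attributes(attrs: dict) -> str:
    """Join key-value pairs into a semicolon-separated attribute string."""
    return ";".join(f"{key}={value}" for key, value in attrs.items())


print(format_extra_connection_attributes(
    {"fullLoadErrorPercentage": 10, "errorRetryDuration": 300}))
# prints fullLoadErrorPercentage=10;errorRetryDuration=300
```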

The following extra connection attributes are available when using an Elasticsearch instance as an AWS DMS target.

fullLoadErrorPercentage

  • Valid values: A positive integer greater than 0 but no larger than 100.

  • Default value: 10

  • Description: For a full load task, this attribute determines the threshold of errors allowed before the task fails. For example, suppose that there are 1,500 rows at the source endpoint and this parameter is set to 10. Then the task fails if AWS DMS encounters more than 150 errors (10 percent of the row count) when writing to the target endpoint.

errorRetryDuration

  • Valid values: A positive integer greater than 0.

  • Default value: 300

  • Description: If an error occurs at the target endpoint, AWS DMS retries for this many seconds. If the error persists beyond that period, the task fails.
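To make the semantics of fullLoadErrorPercentage and errorRetryDuration concrete, the following Python sketch computes the full-load error threshold and models a retry window. It illustrates the documented behavior under stated assumptions and does not reproduce AWS DMS internals. Flooring non-integer thresholds and the fixed one-second retry delay are assumptions, and all function names are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def error_threshold(source_rows: int, full_load_error_percentage: int) -> int:
    """Maximum number of write errors tolerated (flooring is an assumption)."""
    if not 0 < full_load_error_percentage <= 100:
        raise ValueError("fullLoadErrorPercentage must be in 1..100")
    return source_rows * full_load_error_percentage // 100


def full_load_fails(errors: int, source_rows: int, pct: int) -> bool:
    # The task fails only when errors exceed the threshold.
    return errors > error_threshold(source_rows, pct)


def retry_for(operation: Callable[[], T], error_retry_duration: float,
              delay: float = 1.0) -> T:
    """Retry `operation` until it succeeds or the retry window elapses."""
    deadline = time.monotonic() + error_retry_duration
    while True:
        try:
            return operation()
        except Exception:
            if time.monotonic() >= deadline:
                raise  # window exhausted: surface the error (the task fails)
            time.sleep(delay)


print(error_threshold(1500, 10))        # prints 150
print(full_load_fails(151, 1500, 10))   # prints True
```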

Limitations When Using Amazon Elasticsearch Service as a Target for AWS Database Migration Service

The following limitations apply when using Amazon Elasticsearch Service as a target:

  • AWS DMS only supports replication of tables with noncomposite primary keys. The primary key of the source table must consist of a single column.

  • Elasticsearch uses dynamic mapping (automatic detection of field types) to determine the data types to use for migrated data.

  • Elasticsearch stores each document with a unique ID. The following is an example ID.

    "_id": "D359F8B537F1888BC71FE20B3D79EAE6674BE7ACA9B645B0279C7015F6FF19FD"

    Each document ID is 64 bytes long, so anticipate this as a storage requirement. For example, if you migrate 100,000 rows from an AWS DMS source, the resulting Elasticsearch index requires storage for an additional 6,400,000 bytes.

  • With Amazon ES, you can't make updates to the primary key attributes. This restriction matters when you use ongoing replication with change data capture (CDC), because such updates can result in unwanted data in the target. In CDC mode, primary keys are mapped to SHA-256 values, which are 32 bytes long. These values are converted to human-readable 64-character hexadecimal strings, which are used as Elasticsearch document IDs.

  • If AWS DMS encounters any items that can't be migrated, it writes error messages to Amazon CloudWatch Logs. This behavior differs from that of other AWS DMS target endpoints, which write errors to an exceptions table.
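The following Python sketch shows how a SHA-256 digest (32 raw bytes) becomes a 64-character hexadecimal document ID, and reproduces the storage arithmetic for 100,000 rows. How AWS DMS serializes a primary key before hashing isn't described here, so encoding the key as a UTF-8 string is an assumption, and the IDs this produces won't necessarily match the ones AWS DMS generates. The function names are hypothetical.

```python
import hashlib


def document_id(primary_key_value: str) -> str:
    """Hash a primary key with SHA-256 and render it as 64 hex characters.

    UTF-8 encoding of the key is an assumption, not documented DMS behavior.
    """
    digest = hashlib.sha256(primary_key_value.encode("utf-8")).digest()
    # 32 raw bytes -> 64 hex characters (two per byte), upper-cased like the sample ID.
    return digest.hex().upper()


def id_storage_bytes(row_count: int, id_length: int = 64) -> int:
    """Extra storage consumed by document IDs alone."""
    return row_count * id_length


print(len(document_id("42")))        # prints 64
print(id_storage_bytes(100_000))     # prints 6400000
```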

Target Data Types for Amazon Elasticsearch Service

When AWS DMS migrates data from heterogeneous databases, the service maps data types from the source database to intermediate data types called AWS DMS data types. The service then maps the intermediate data types to the target data types. The following table shows each AWS DMS data type and the data type it maps to in Elasticsearch.

AWS DMS Data Type    Elasticsearch Data Type
-----------------    -----------------------
Boolean              boolean
Date                 string
Time                 date
Timestamp            date
INT4                 integer
Real4                float
UINT4                integer

For additional information about AWS DMS data types, see Data Types for AWS Database Migration Service.