AWS Schema Conversion Tool
User Guide (Version 1.0)

Using Data Extraction Agents

You can use data extraction agents to extract data from your on-premises data warehouse and migrate it to Amazon Redshift. To manage the data extraction agents, you can use AWS SCT. Data extraction agents can work in the background while AWS SCT is closed. After your agents extract your data, they upload the data to Amazon S3 and then copy the data into Amazon Redshift.

The following diagram shows the supported scenario.


Figure: Extraction agent architecture

Data extraction agents are currently supported for the following source data warehouses:

  • Greenplum Database (version 4.3 and later)

  • Microsoft SQL Server (version 2008 and later)

  • Netezza (version 7.0.3 and later)

  • Oracle (version 10 and later)

  • Teradata (version 13 and later)

  • Vertica (version 7.2.2 and later)

Amazon S3 Settings

After your agents extract your data, they upload it to your Amazon S3 bucket. Before you continue, you must provide the credentials to connect to your AWS account and your Amazon S3 bucket. You store your credentials and bucket information in a profile in the global application settings, and then associate the profile with your AWS SCT project. If necessary, choose Global Settings to create a new profile. For more information, see Storing AWS Profiles in the AWS Schema Conversion Tool.

Security Settings

The AWS Schema Conversion Tool and the extraction agents can communicate through Secure Sockets Layer (SSL). To enable SSL, set up a trust store and key store.

To set up secure communication with your extraction agent

  1. Start the AWS Schema Conversion Tool.

  2. Open the Settings menu, and then choose Global Settings. The Global settings dialog box appears.

    Choose the Security tab as shown following.

    
    Figure: The Security tab on the Global Settings dialog box

  3. Choose Generate Trust and Key Store, or choose Select existing Trust and Key Store.

    If you choose Generate Trust and Key Store, you then specify the name and password for the trust and key stores, and the path to the location for the generated files. You use these files in later steps.

    If you choose Select existing Trust and Key Store, you then specify the password and file name for the trust and key stores. You use these files in later steps.

  4. After you have specified the trust store and key store, choose OK to close the Global Settings dialog box.

Installing Extraction Agents

We recommend that you install multiple extraction agents on individual computers, separate from the computer that is running the AWS Schema Conversion Tool.

Extraction agents are currently supported on the following operating systems:

  • macOS

  • Microsoft Windows

  • Red Hat Enterprise Linux (RHEL) 6.0

  • Ubuntu Linux (version 14.04 and later)

Use the following procedure to install extraction agents. Repeat this procedure for each computer that you want to install an extraction agent on.

To install an extraction agent

  1. If you have not already downloaded the AWS SCT installer file, follow the instructions at Installing and Updating the AWS Schema Conversion Tool to download it. The .zip file that contains the AWS SCT installer file also contains the extraction agent installer file.

  2. Locate the installer file for your extraction agent in a subfolder named agents. The correct file for the operating system of the computer that you want to install the extraction agent on is shown following.

    Operating System     File Name
    macOS                aws-schema-conversion-tool-extractor-1.0.build-number.dmg
    Microsoft Windows    aws-schema-conversion-tool-extractor-1.0.build-number.msi
    RHEL                 aws-schema-conversion-tool-extractor-1.0.build-number.x86_64.rpm
    Ubuntu Linux         aws-schema-conversion-tool-extractor-1.0.build-number.deb

  3. To install the extraction agent on a separate computer, copy the installer file to the new computer.

  4. Run the installer file. Use the instructions for your operating system, shown following.

    Operating System Install Instructions

    macOS

    In Finder, open aws-schema-conversion-tool-extractor-1.0.build-number.dmg.

    Drag the application icon from the opened disk image to the Applications folder.

    Microsoft Windows

    Double-click the file to run the installer.

    RHEL

    Run the following command in the folder that you downloaded or moved the file to:

    sudo rpm -ivh aws-schema-conversion-tool-extractor-1.0.build-number.x86_64.rpm

    Ubuntu Linux

    Run the following command in the folder that you downloaded or moved the file to:

    sudo dpkg -i aws-schema-conversion-tool-extractor-1.0.build-number.deb

  5. Install the Java Database Connectivity (JDBC) drivers for your source database engine. For instructions and download links, see Installing the Required Database Drivers. Follow the instructions for your source database engine only, not your target database engine.

  6. Copy the SSL trust and key stores (.zip or individual files) that you generated in an earlier procedure to the computer where you installed the agent. If you copy the .zip file to a new computer, extract the individual files from the .zip file on the new computer.

    You can put the files anywhere you want, but note the locations because in a later procedure you tell the agent where to find the files.

Continue installing your extraction agent by completing the procedure in the following section.
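
The file names and Linux install commands in the preceding procedure can be collected into a small lookup, which can be convenient when you install agents on several computers. The following is a minimal sketch, not part of AWS SCT. The function names and the dictionary layout are hypothetical, and the build number is whatever value appears in the installer file that you downloaded.

```python
# Hypothetical helper (not part of AWS SCT): maps an operating system to the
# installer file name and install command listed in the procedure above.
INSTALLER_SUFFIX = {
    "macos": ".dmg",
    "windows": ".msi",
    "rhel": ".x86_64.rpm",
    "ubuntu": ".deb",
}

# Only the Linux packages are installed from the command line.
INSTALL_COMMAND = {
    "rhel": ["sudo", "rpm", "-ivh"],
    "ubuntu": ["sudo", "dpkg", "-i"],
}


def installer_file(os_name: str, build_number: str) -> str:
    """Return the installer file name for the given OS and build number."""
    suffix = INSTALLER_SUFFIX[os_name.lower()]
    return f"aws-schema-conversion-tool-extractor-1.0.{build_number}{suffix}"


def install_command(os_name: str, build_number: str):
    """Return the install command as an argument list, or None for GUI installs."""
    prefix = INSTALL_COMMAND.get(os_name.lower())
    if prefix is None:
        return None  # macOS and Windows use the graphical installer
    return prefix + [installer_file(os_name, build_number)]
```

You could pass the returned list to subprocess.run from the folder that contains the installer file. For macOS and Microsoft Windows the function returns None, because those installers are run interactively.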

Configuring Extraction Agents

Use the following procedure to configure extraction agents. Repeat this procedure on each computer that has an extraction agent installed.

To configure your extraction agent

  • From the location where you installed the agent, run the setup program. For RHEL and Ubuntu, the file is named sct-extractor-setup.sh. For macOS and Microsoft Windows, the file is named AWS SCT Data Extractor Agent, and you can double-click the file to run it.

    The setup program prompts you for information. For each prompt, a default value appears. You can accept the default value, or type a new value. You specify the following information:

    • The data warehouse engine.

    • The port number the agent listens on.

    • The location where you installed the JDBC drivers.

    • The working folder. Your extracted data goes into a subfolder of this location. The working folder can be on a different computer from the agent, and a single working folder can be shared by multiple agents on different computers.

    • The location of the key store file.

    • The password for the key store.

    • The location of the trust store file.

    • The password for the trust store.

The setup program updates the settings file for the extraction agent. The settings file is named Settings.properties, and is located where you installed the extraction agent. The following is a sample settings file.

port=8888
vendor=ORACLE
driver.jars=<driver path>/Install/Drivers/ojdbc7.jar
location=<output path>/dmt/8888/out
extractor.log.folder=<log path>/dmt/8888/log
extractor.storage.folder=<storage path>/dmt/8888/storage
extractor.start.fetch.size=20000
extractor.out.file.size=10485760
ssl.option=OFF
#ssl.option=ON
#ssl.keystore.path=<key store path>/dmt/8888/vault/keystore
#ssl.truststore.path=<trust store path>/dmt/8888/vault/truststore
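
If you edit the settings file by hand, a quick check before you start the agent can catch simple mistakes. The following is a minimal sketch that reads key=value lines like those in the sample and reports obvious problems. The function names are hypothetical, and the rule that both store paths must be present when ssl.option=ON is an assumption inferred from the commented lines in the sample, not a documented requirement.

```python
def load_settings(text: str) -> dict:
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def check_settings(settings: dict) -> list:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    port = settings.get("port", "")
    if not port.isdecimal() or not 1 <= int(port) <= 65535:
        problems.append("port must be an integer between 1 and 65535")
    # Assumption: the store paths are needed only when SSL is switched on.
    if settings.get("ssl.option", "OFF").upper() == "ON":
        for key in ("ssl.keystore.path", "ssl.truststore.path"):
            if not settings.get(key):
                problems.append(f"{key} is required when ssl.option=ON")
    return problems
```

This parser handles only the simple form shown in the sample. It does not implement the full Java properties format, such as line continuations or escaped characters.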

Starting Extraction Agents

Use the following procedure to start extraction agents. Repeat this procedure on each computer that has an extraction agent installed.

Extraction agents act as listeners. When you start an agent with this procedure, the agent starts listening for instructions. In a later section, you send the agents instructions to extract data from your data warehouse.

To start your extraction agent

  • On the computer that has the extraction agent installed, run the command listed following for your operating system.

    Operating System Start Command

    macOS

    Run the StartAgent.command file.

    Microsoft Windows

    Double-click the StartAgent.bat batch file.

    RHEL

    Run the following command from the folder where you installed the agent:

    sudo initctl start sct-extractor

    Ubuntu Linux

    Run the following command from the folder where you installed the agent. Use the command appropriate for your version of Ubuntu.

    Ubuntu 14.04: sudo initctl start sct-extractor

    Ubuntu 15.04 and later: sudo systemctl start sct-extractor

To check the status of the agent, run the same command but replace start with status.

To stop an agent, run the same command but replace start with stop.
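
On RHEL and Ubuntu, the start, status, and stop commands differ only in the service tool and the action word. The following minimal sketch builds the command for a given system. The function name and its parameters are hypothetical. The version cutoff follows the table above, and treating every Ubuntu release earlier than 15.04 like 14.04 is an assumption.

```python
def agent_service_command(os_name: str, action: str = "start", ubuntu_version: str = "") -> list:
    """Build the Linux service command for an extraction agent.

    action is one of "start", "status", or "stop". For Ubuntu, pass the
    release as a string such as "14.04" or "16.04".
    """
    if action not in ("start", "status", "stop"):
        raise ValueError(f"unsupported action: {action}")
    os_name = os_name.lower()
    if os_name == "rhel":
        tool = "initctl"
    elif os_name == "ubuntu":
        major, minor = (int(part) for part in ubuntu_version.split(".")[:2])
        # The table lists initctl for 14.04 and systemctl for 15.04 and later.
        tool = "systemctl" if (major, minor) >= (15, 4) else "initctl"
    else:
        raise ValueError("macOS and Windows start the agent from the StartAgent files")
    return ["sudo", tool, action, "sct-extractor"]
```

For example, agent_service_command("ubuntu", "status", "16.04") produces the argument list for sudo systemctl status sct-extractor, which you could pass to subprocess.run.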

Registering Extraction Agents

You manage your extraction agents by using AWS SCT. The extraction agents act as listeners. When they receive instructions from AWS SCT, they extract data from your data warehouse.

Use the following procedure to register extraction agents with your AWS SCT project.

To register an extraction agent

  1. Start the AWS Schema Conversion Tool, and open a project.

    1. Open an existing project or create a new one. For more information, see Creating an AWS Schema Conversion Tool Project.

    2. Connect to your source database. For more information, see Connecting to Your Source Database.

    3. Connect to your target database. For more information, see Connecting to Your Target Database.

    4. Convert your schema. For more information, see Converting Data Warehouse Schema to Amazon Redshift by Using the AWS Schema Conversion Tool.

    5. Apply your schema. For more information, see Saving and Applying Your Converted Schema in the AWS Schema Conversion Tool.

  2. Open the View menu, and then choose Data Migration View. The Agents tab appears. If you have previously registered agents, they appear in a grid at the top of the tab as shown following.

    
    Figure: Agents grid

  3. Choose Register. The New Agent Registration dialog box appears.

    Note

    After you register an agent with an AWS SCT project, you can't register the same agent with a different project. If you're no longer using an agent in an AWS SCT project, you can unregister it, and then register it with a different project.

  4. Enter your information in the New Agent Registration dialog box:

    1. For Description, type a description of the agent.

    2. For Host Name, type the host name or IP address of the computer that runs the agent.

    3. For Port, type the port number that the agent is listening on.

    4. Choose Register to register the agent with your AWS SCT project.

  5. Repeat the previous steps to register multiple agents with your AWS SCT project.
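
Before you register an agent, it can help to confirm that the host name and port you plan to enter are reachable from the computer that runs AWS SCT. The following is a minimal sketch that attempts a plain TCP connection. The function name is hypothetical. A successful connection shows only that something is listening on that port; it does not prove that the listener is an extraction agent or that SSL is configured correctly.

```python
import socket


def agent_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

If the function returns False, check that the agent is started, that the port matches the value in the agent's settings file, and that no firewall blocks the connection.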

Filtering Data Before Extraction

Before you extract your data, you can set up filters that reduce the amount of data that you extract. You create data extraction filters by using WHERE clauses. For more information, see Creating Data Extraction Filters in the AWS Schema Conversion Tool.

Managing Extraction Tasks

Use the following procedures to create, run, and monitor data extraction tasks.

To assign tasks to agents and migrate data

  1. In the AWS Schema Conversion Tool, after you have converted your schema, choose one or more tables from the left panel of your project. You can choose all tables, but we recommend against that for performance reasons. We recommend that you create multiple tasks for multiple tables based on the size of the tables in your data warehouse.

  2. Open the context (right-click) menu for the tables, and then choose Create Task. The Create Local Task dialog box opens.

    1. For Task Name, type a name for the task.

    2. For Migration Mode, choose one of the following:

      • Extract Only – Extract your data, and save the data to your local working folders.

      • Extract and Upload – Extract your data, and upload your data to Amazon S3.

      • Extract, Upload and Copy – Extract your data, upload your data to Amazon S3, and copy it into your Amazon Redshift data warehouse.

    3. If you want to see detailed information about a task, select Enable Task Logging. You can use the task log to debug problems.

      If you enable task logging, choose the level of detail that you want to see. The levels are the following, with each level including all messages from the previous level:

      • ERROR — The smallest amount of detail.

      • WARNING

      • INFO

      • DEBUG

      • TRACE — The largest amount of detail.

    4. Choose Test Task to verify that you can connect to your working folder, Amazon S3 bucket, and Amazon Redshift data warehouse. The verification depends on the migration mode you chose.

    5. Choose Create to create the task.

  3. Repeat the previous steps to create tasks for all the data that you want to migrate.
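
The recommendation to create multiple tasks based on table size can be planned ahead of time. The following is a minimal sketch of one way to do so: a greedy grouping that places the largest tables first and keeps each group at or below a size limit. AWS SCT does not prescribe this approach. The function name, the size limit, and the table names and sizes in the usage note are all hypothetical.

```python
def group_tables_into_tasks(table_sizes: dict, max_task_bytes: int) -> list:
    """Greedily pack tables into tasks so that no task exceeds max_task_bytes.

    A table larger than the limit gets a task of its own.
    """
    tasks = []  # each entry: [total_bytes, [table names]]
    # Place the largest tables first; ties are broken by name for stable output.
    for name, size in sorted(table_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        for task in tasks:
            if task[0] + size <= max_task_bytes:
                task[0] += size
                task[1].append(name)
                break
        else:
            # No existing task has room, so start a new one.
            tasks.append([size, [name]])
    return [names for _, names in tasks]
```

For example, with sizes {"sales": 900, "orders": 600, "customers": 300, "regions": 50} and a limit of 1000, the sketch proposes two tasks: one for sales and regions, and one for orders and customers. You then create each task in AWS SCT by selecting those tables and choosing Create Task.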

To run and monitor tasks

  1. Open the View menu, and then choose Data Migration View. The Agents tab appears.

  2. Choose the Tasks tab. Your tasks appear in the grid at the top as shown following.

    
    Figure: Tasks grid

  3. Select a task in the top grid and expand it. Depending on the migration mode you chose, you see the task divided into Extract, Upload, and Copy.

  4. Select a task in the top grid. You can see the status of the task in the top grid, and the status of its subtasks in the bottom grid.

  5. To run a task, choose Start. You can monitor the status of your tasks while they work. The subtasks run in parallel. The Extract, Upload, and Copy phases of a task also run in parallel.

  6. If you enabled logging when you set up the task, you can view the log.

    1. Choose Download Log. A message appears with the name of the folder that contains the log file. Dismiss the message.

    2. A link appears in the Task details tab. Choose the link to open the folder that contains the log file.

You can close AWS SCT, and your agents and tasks continue to run. You can reopen AWS SCT later to check the status of your tasks and view the task logs.

Extraction Task Output

After your migration tasks complete, your data is ready. Use the following information to determine how to proceed based on the migration mode you chose and the location of your data.

Migration Mode Data Location

Extract, Upload and Copy

The data is already in your Amazon Redshift data warehouse. You can verify that the data is there, and start using it. For more information, see Connecting to Clusters From Client Tools and Code.

Extract and Upload

The extraction agents saved your data as files in your Amazon S3 bucket. You can use the Amazon Redshift COPY command to load your data to Amazon Redshift. For more information, see Loading Data from Amazon S3 in the Amazon Redshift documentation.

There are multiple folders in your Amazon S3 bucket, corresponding to the extraction tasks that you set up. When you load your data to Amazon Redshift, specify the name of the manifest file created by each task. The manifest file appears in the task folder in your S3 bucket as shown following.


Figure: File list in S3 bucket

Extract Only

The extraction agents saved your data as files in your working folder. Manually copy your data to your Amazon S3 bucket, and then proceed with the instructions for Extract and Upload.
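
For the Extract and Upload mode, the load step comes down to one COPY statement per task that points at the task's manifest file. The following is a minimal sketch that builds such a statement. The function name, the table name, the bucket and key, and the IAM role ARN in the usage note are hypothetical. The sketch assumes that you authorize the load with an IAM role, and it omits data format options such as delimiter or compression because they depend on how the agent wrote the files, which this section does not describe.

```python
def build_copy_statement(table: str, bucket: str, manifest_key: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement that loads the files listed in a manifest."""
    # Reject quote and semicolon characters so that values cannot break the statement.
    for value in (table, bucket, manifest_key, iam_role_arn):
        if "'" in value or ";" in value:
            raise ValueError(f"unexpected character in {value!r}")
    return (
        f"COPY {table}\n"
        f"FROM 's3://{bucket}/{manifest_key}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "MANIFEST;"
    )
```

For example, build_copy_statement("public.sales", "my-bucket", "task1/manifest", "arn:aws:iam::123456789012:role/RedshiftCopy") returns a statement that you can review, extend with the format options your files need, and run from a SQL client connected to your cluster.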

Best Practices and Troubleshooting

The following are some best practices and troubleshooting suggestions for using extraction agents.

Issue Troubleshooting Suggestions

Performance is slow

To improve performance, we recommend the following:

  • Install multiple agents.

  • Install agents on computers close to your data warehouse.

  • Don't run all tables on a single agent task.

Contention delays

Avoid having too many agents accessing your data warehouse at the same time.

An agent goes down temporarily

If an agent is down, the status of each of its tasks appears as failed in AWS SCT. In some cases, the agent can recover if you wait. When it does, the status of its tasks updates in AWS SCT.

An agent goes down permanently

If the computer running an agent goes down permanently, and that agent is running a task, you can substitute a new agent to continue the task. You can substitute a new agent only if the working folder of the original agent was not on the same computer as the original agent. To substitute a new agent, do the following:

  • Install an agent on a new computer.

  • Configure the new agent with the same settings, including port number and working folder, as the original agent.

  • Start the agent. After the agent starts, the task discovers the new available agent and continues running on the new agent.