AWS Data Pipeline
Developer Guide (API Version 2012-10-29)
« PreviousNext »
View the PDF for this guide.Go to the AWS Discussion Forum for this product.Go to the Kindle Store to download this guide in Kindle format.Did this page help you?  Yes | No |  Tell us about it...

Copy Data from Amazon S3 to MySQL

This example pipeline definition automatically creates an Amazon EC2 instance that will copy the specified data from a CSV file in Amazon S3 into a MySQL database table. For simplicity, the structure of the example MySQL insert statement assumes that you have a CSV input file with two columns of data that you are writing into a MySQL database table that has two matching columns of the appropriate data type. If you have data of a different scope, you would modify the MySQL statement to include additional data columns or data types.

Example Pipeline Definition

{
  "objects": [
    {
      "id": "Default",
      "logUri": "s3://testbucket/error_log",
      "schedule": {
        "ref": "MySchedule"
      }
    },
    {
      "id": "MySchedule",
      "type": "Schedule",
      "startDateTime": "2012-11-26T00:00:00",
      "endDateTime": "2012-11-27T00:00:00",
      "period": "1 day"
    },
    {
      "id": "MyS3Input",
      "filePath": "s3://testbucket/input_data_file.csv",
      "type": "S3DataNode"
    },
    {
      "id": "MyCopyActivity",
      "input": {
        "ref": "MyS3Input"
      },
      "output": {
        "ref": "MyDatabaseNode"
      },
      "type": "CopyActivity",
      "runsOn": {
        "ref": "MyEC2Resource"
      }
    },
    {
      "id": "MyEC2Resource",
      "type": "Ec2Resource",
      "actionOnTaskFailure": "terminate",
      "actionOnResourceFailure": "retryAll",
      "maximumRetries": "1",
      "role": "test-role",
      "resourceRole": "test-role",
      "instanceType": "m1.medium",
      "securityGroups": [
        "test-group",
        "default"
      ],
      "keyPair": "test-pair"
    },
    {
      "id": "MyDatabaseNode",
      "type": "MySqlDataNode",
      "table": "table_name",
      "username": "user_name",
      "*password": "my_password",
      "connectionString": "jdbc:mysql://mysqlinstance-rds.example.us-east-1.rds.amazonaws.com:3306/database_name",
      "insertQuery": "insert into #{table} (column1_ name, column2_name) values (?, ?);"
      
    }
  ]
}

This example has the following fields defined in the MySqlDataNode:

id

User-defined identifier for the MySQL database, which is a label for your reference only.

type

MySqlDataNode type that matches the kind of location for our data, which is an Amazon RDS instance using MySQL in this example.

table

Name of the database table that contains the data to copy. Replace table_name with the name of your database table.

username

User name of the database account that has sufficient permission to retrieve data from the database table. Replace user_name with the name of your user account.

*password

Password for the database account with the asterisk prefix to indicate that AWS Data Pipeline must encrypt the password value. Replace my_password with the correct password for your user account.

connectionString

JDBC connection string for CopyActivity to connect to the database.

insertQuery

A valid SQL SELECT query that specifies which data to copy from the database table. Note that #{table} is a variable that re-uses the table name provided by the "table" variable in the preceding lines of the JSON file.