Amazon EMR
Amazon EMR Release Guide

Apache Sqoop

Sqoop is a tool for transferring data between Amazon S3, Hadoop, HDFS, and relational databases (RDBMS).

Release Information

Application: Sqoop 1.4.6

Amazon EMR Release Label: emr-5.8.0

Components installed with this application: emrfs, emr-ddb, emr-goodies, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, mysql-server, sqoop-client

By default, Sqoop has a MariaDB driver and a PostgreSQL driver installed. The PostgreSQL driver installed for Sqoop works only with PostgreSQL 8.4. To use an alternate set of JDBC connectors with Sqoop, install them in /usr/lib/sqoop/lib. The following are links for various JDBC connectors:
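As a minimal sketch of the installation step described above, the following helper copies a driver JAR into the Sqoop library directory. The function name install_jdbc_driver and the JAR file name in the example are hypothetical; only the /usr/lib/sqoop/lib path comes from this guide.

```shell
#!/usr/bin/env bash
# Sketch: copy a JDBC driver JAR into Sqoop's library directory.
# install_jdbc_driver is a hypothetical helper, not part of Sqoop or Amazon EMR.

install_jdbc_driver() {
  local jar="$1"
  local lib_dir="${2:-/usr/lib/sqoop/lib}"  # default is the directory named in this guide

  if [ ! -f "$jar" ]; then
    echo "driver JAR not found: $jar" >&2
    return 1
  fi
  if [ ! -d "$lib_dir" ]; then
    echo "Sqoop library directory not found: $lib_dir" >&2
    return 1
  fi

  # Copy the JAR, then make it readable by the user that runs the Sqoop client.
  cp "$jar" "$lib_dir/" && chmod 644 "$lib_dir/$(basename "$jar")"
}

# Example (hypothetical file name; run as a user that can write to the directory):
#   install_jdbc_driver /tmp/my-jdbc-driver.jar
```

The second argument is optional and exists mainly so the helper can be tried against a scratch directory before touching the cluster's Sqoop installation.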

Sqoop's supported databases are listed at http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_supported_databases. If your JDBC connect string does not match one of those in the list, you must specify a driver with the --driver option.
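The rule above can be sketched as a small check on the connect-string prefix. The helper name needs_explicit_driver is hypothetical, and the two prefixes it recognizes are an illustrative subset chosen for this sketch; the linked Sqoop table remains the authoritative list.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a connect string calls for an explicit --driver.
# needs_explicit_driver is a hypothetical helper. The prefixes below are an
# illustrative subset (an assumption); consult the Sqoop supported-databases table.

needs_explicit_driver() {
  case "$1" in
    jdbc:mysql://*|jdbc:postgresql://*)
      return 1 ;;  # prefix is recognized: --driver is not required
    *)
      return 0 ;;  # any other prefix: pass --driver explicitly
  esac
}

for url in "jdbc:mysql://dbhost:3306/mydb" "jdbc:redshift://dbhost:5439/mydb"; do
  if needs_explicit_driver "$url"; then
    echo "$url: specify --driver"
  else
    echo "$url: --driver not required"
  fi
done
```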

For example, you can export to an Amazon Redshift database table with the following command (for JDBC 4.1):

sqoop export --connect jdbc:redshift://$MYREDSHIFTHOST:5439/mydb --table mysqoopexport --export-dir s3://mybucket/myinputfiles/ --driver com.amazon.redshift.jdbc41.Driver --username master --password Mymasterpass1

You can use either the MariaDB or the MySQL connection string, but if you specify the MariaDB connection string, you must also specify the driver:

sqoop export --connect jdbc:mariadb://$HOSTNAME:3306/mydb --table mysqoopexport --export-dir s3://mybucket/myinputfiles/ --driver org.mariadb.jdbc.Driver --username master --password Mymasterpass1

If you are using Secure Sockets Layer (SSL) encryption to access your database, use a JDBC URI like the one in the following Sqoop export example:

sqoop export --connect "jdbc:mariadb://$HOSTNAME:3306/mydb?verifyServerCertificate=false&useSSL=true&requireSSL=true" --table mysqoopexport --export-dir s3://mybucket/myinputfiles/ --driver org.mariadb.jdbc.Driver --username master --password Mymasterpass1
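The SSL connect string in this example contains ? and &, which most shells treat specially, so it is safer to build it once and pass it as a single quoted value. The following is a sketch under that assumption; the function name build_ssl_connect_string and the host db.example.com are hypothetical, while the URI parameters are the ones shown in this guide.

```shell
#!/usr/bin/env bash
# Sketch: assemble the MariaDB SSL connect string as one value so the shell
# does not interpret '?' or '&'. build_ssl_connect_string is a hypothetical helper.

build_ssl_connect_string() {
  local host="$1"
  local port="$2"
  local db="$3"
  # Single quotes around the format string keep '&' and '?' literal.
  printf 'jdbc:mariadb://%s:%s/%s?verifyServerCertificate=false&useSSL=true&requireSSL=true' \
    "$host" "$port" "$db"
}

# db.example.com is a placeholder host used only for illustration.
CONNECT="$(build_ssl_connect_string "db.example.com" 3306 mydb)"
echo "$CONNECT"

# The value would then be passed quoted, for example:
#   sqoop export --connect "$CONNECT" --table mysqoopexport \
#     --export-dir s3://mybucket/myinputfiles/ \
#     --driver org.mariadb.jdbc.Driver --username master --password Mymasterpass1
```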

For more information about SSL encryption in RDS, see Using SSL to Encrypt a Connection to a DB Instance in the Amazon Relational Database Service User Guide.

For more information, see the Apache Sqoop documentation.