Working with MongoDB connections in ETL jobs

You can create a connection for MongoDB and then use that connection in your AWS Glue job. For more information, see MongoDB connections in the AWS Glue programming guide. The connection URL, username, and password are stored in the MongoDB connection. Other options can be specified in your ETL job script using the additionalOptions parameter of glueContext.getCatalogSource. These options can include:

database: (Required) The MongoDB database to read from.
collection: (Required) The MongoDB collection to read from.

By placing the database and collection information inside the ETL job script, you can use the same connection in multiple jobs.

Create an AWS Glue Data Catalog connection for the MongoDB data source. See "connectionType": "mongodb" for a description of the connection parameters. You can create the connection using the console, APIs or CLI.
Create a database in the AWS Glue Data Catalog to store the table definitions for your MongoDB data. See Creating databases for more information.
Create a crawler that crawls the data in the MongoDB database, using the information in the connection to connect to MongoDB. The crawler creates tables in the AWS Glue Data Catalog that describe the collections in the MongoDB database that you use in your job. See Using crawlers to populate the Data Catalog for more information.
Create a job with a custom script. You can create the job using the console, APIs or CLI. For more information, see Adding Jobs in AWS Glue.
Choose the data targets for your job. The tables that represent the data target can be defined in your Data Catalog, or your job can create the target tables when it runs. You choose a target location when you author the job. If the target requires a connection, the connection is also referenced in your job. If your job requires multiple data targets, you can add them later by editing the script.

Customize the job-processing environment by providing arguments for your job and generated script.

Here is an example of creating a DynamicFrame from the MongoDB database based on the table structure defined in the Data Catalog. The code uses additionalOptions to provide the additional data source information:
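A minimal Python sketch of such a read might look like the following. It assumes the PySpark form of the API, where the parameter is spelled additional_options and the reader is glue_context.create_dynamic_frame.from_catalog. The helper names mongo_additional_options and read_mongo_table, and all database, table, and collection names in the usage comment, are hypothetical placeholders and not part of AWS Glue.

```python
def mongo_additional_options(mongo_database, mongo_collection):
    """Build the options that are not stored in the Glue connection.

    Both values are required for a MongoDB source.
    """
    if not mongo_database or not mongo_collection:
        raise ValueError("Both database and collection are required")
    return {"database": mongo_database, "collection": mongo_collection}


def read_mongo_table(glue_context, catalog_database, catalog_table,
                     mongo_database, mongo_collection):
    """Create a DynamicFrame from a Data Catalog table backed by MongoDB.

    glue_context is expected to be an awsglue.context.GlueContext.
    """
    return glue_context.create_dynamic_frame.from_catalog(
        database=catalog_database,   # Data Catalog database (not the MongoDB one)
        table_name=catalog_table,    # table created by the crawler
        additional_options=mongo_additional_options(
            mongo_database, mongo_collection
        ),
    )


# Usage inside a Glue job script (placeholder names, assumed for illustration):
#
#   from pyspark.context import SparkContext
#   from awsglue.context import GlueContext
#
#   glue_context = GlueContext(SparkContext.getOrCreate())
#   frame = read_mongo_table(
#       glue_context, "my_catalog_db", "my_catalog_table",
#       "my_mongo_db", "my_collection",
#   )
```

Keeping the MongoDB database and collection as function arguments mirrors the point made earlier: the connection holds only the URL and credentials, so the same connection can serve several jobs that read different collections.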

Run the job, either on-demand or through a trigger.
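As one illustration of an on-demand run, the sketch below starts a job through the AWS SDK for Python. It assumes a boto3 Glue client is passed in and uses its start_job_run call. The function name start_job_on_demand and the job name in the usage comment are hypothetical.

```python
def start_job_on_demand(glue_client, job_name, arguments=None):
    """Start a single run of an existing Glue job and return its run ID.

    glue_client is expected to be a boto3 Glue client; arguments is an
    optional dict of job arguments passed to this run only.
    """
    request = {"JobName": job_name}
    if arguments:
        request["Arguments"] = arguments
    response = glue_client.start_job_run(**request)
    return response["JobRunId"]


# Usage (placeholder job name, assumed for illustration):
#
#   import boto3
#
#   run_id = start_job_on_demand(boto3.client("glue"), "my-mongodb-etl-job")
```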
