Appendix F: Run the clinvar-to-parquet AWS Glue job - Genomics Tertiary Analysis and Data Lakes Using AWS Glue

Appendix F: Run the clinvar-to-parquet AWS Glue job

This solution includes an example ClinVar to Parquet extract, transform, and load (ETL) AWS Glue job. You can run the job using either the AWS Command Line Interface (AWS CLI) or the AWS Glue console.

To start the job using the AWS CLI, run the following command:

    aws glue start-job-run \
      --job-name clinvar-to-parquet \
      --arguments 'BucketName=<data-lake-bucket>,KeyPrefix=annotation/clinvar.tsv'

The data lake bucket name can be found as the DataLakeBucket output value in the GenomicsAnalysisPipe CloudFormation stack.
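The bucket lookup and the job start can also be scripted together. The following is a minimal sketch, not part of the solution itself. It assumes the AWS CLI is configured for the account that hosts the solution and that the stack is named exactly GenomicsAnalysisPipe (your deployed stack name may differ). The function names get_data_lake_bucket and start_clinvar_job are hypothetical helpers introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch: look up the data lake bucket, then start the clinvar-to-parquet job.
# Assumptions: AWS CLI is installed and configured; the stack name passed in
# is the one that exposes the DataLakeBucket output.

# Print the DataLakeBucket output value of the given CloudFormation stack.
get_data_lake_bucket() {
  local stack_name="$1"
  aws cloudformation describe-stacks \
    --stack-name "$stack_name" \
    --query "Stacks[0].Outputs[?OutputKey=='DataLakeBucket'].OutputValue" \
    --output text
}

# Start the Glue job with the same arguments as the command above.
# Prints the JobRunId returned by start-job-run.
start_clinvar_job() {
  local bucket="$1"
  aws glue start-job-run \
    --job-name clinvar-to-parquet \
    --arguments "BucketName=${bucket},KeyPrefix=annotation/clinvar.tsv" \
    --query JobRunId \
    --output text
}

# Usage (uncomment to run against your account):
# bucket="$(get_data_lake_bucket GenomicsAnalysisPipe)"
# start_clinvar_job "$bucket"
```

Keeping the two calls in separate functions lets you confirm the bucket name before you start a job run.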

Use the following process to run the job in the AWS Glue console:

  1. Sign in to the AWS Glue console.

  2. Choose Jobs from the left navigation menu. On the Jobs page, select the name of the example job, clinvar-to-parquet.

  3. Choose Actions and select Run Job.

  4. Expand Security configuration, script libraries, and job parameters.

  5. Under Job Parameters, observe the values for the --input-path and --output-path keys.

    Do not change these values.

  6. Select Run Job.
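If you prefer to review the job parameters from a terminal instead of the console, the following sketch prints the job's default arguments without modifying them. It assumes the AWS CLI is configured and that the --input-path and --output-path keys shown in the console are stored as the job's default arguments. The function name show_job_parameters is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: read-only view of a Glue job's default parameters.
# Assumption: the keys seen in the console are stored in Job.DefaultArguments.

# Print the default arguments of the given job (defaults to clinvar-to-parquet).
show_job_parameters() {
  local job_name="${1:-clinvar-to-parquet}"
  aws glue get-job \
    --job-name "$job_name" \
    --query "Job.DefaultArguments" \
    --output json
}

# Usage (uncomment to run against your account):
# show_job_parameters
```

Because get-job only reads the job definition, running it leaves the parameter values unchanged.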