Creating HealthOmics variant store import jobs
The following example shows how to use the AWS CLI to create an import job for a variant store.
aws omics start-variant-import-job \
    --destination-name myvariantstore \
    --no-run-left-normalization \
    --role-arn arn:aws:iam::555555555555:role/roleName \
    --items source=s3://my-omics-bucket/sample.vcf.gz source=s3://my-omics-bucket/sample2.vcf.gz
{
    "destinationName": "store_a",
    "roleArn": "....",
    "runLeftNormalization": false,
    "items": [
        {"source": "s3://my-omics-bucket/sample.vcf.gz"},
        {"source": "s3://my-omics-bucket/sample2.vcf.gz"}
    ]
}
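The JSON shape above can also be assembled programmatically. The following Python sketch builds the same structure from a list of S3 URIs. The helper name `build_variant_import_request` and its validation rules are hypothetical, and the example only constructs the dictionary; it does not call AWS.

```python
import json
from typing import Any, Dict, List, Optional


def build_variant_import_request(
    destination_name: str,
    role_arn: str,
    sources: List[str],
    run_left_normalization: bool = False,
    annotation_fields: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Build a request body shaped like the JSON shown above (hypothetical helper)."""
    if not sources:
        raise ValueError("at least one source is required")
    for uri in sources:
        if not uri.startswith("s3://"):
            raise ValueError(f"not an S3 URI: {uri}")
    request: Dict[str, Any] = {
        "destinationName": destination_name,
        "roleArn": role_arn,
        "runLeftNormalization": run_left_normalization,
        # Each source file becomes one entry in the "items" list.
        "items": [{"source": uri} for uri in sources],
    }
    if annotation_fields:
        request["annotationFields"] = dict(annotation_fields)
    return request


example = build_variant_import_request(
    "myvariantstore",
    "arn:aws:iam::555555555555:role/roleName",
    ["s3://my-omics-bucket/sample.vcf.gz", "s3://my-omics-bucket/sample2.vcf.gz"],
)
print(json.dumps(example, indent=2))
```

Assuming boto3 is installed and credentials are configured, a dictionary like this could be passed as keyword arguments to the `start_variant_import_job` method of the `omics` client.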
For stores created after May 15, 2023, the following example shows how to add the --annotation-fields parameter. The annotation fields are defined with the import.
aws omics start-variant-import-job \
    --destination-name annotationparsingvariantstore \
    --role-arn arn:aws:iam::123456789012:role/<role_name> \
    --items source=s3://pathToS3/sample.vcf \
    --annotation-fields '{"VEP": "CSQ"}'
{ "jobId": "981e2286-e954-4391-8a97-09aefc343861" }
Use get-variant-import-job to check the status.
aws omics get-variant-import-job --job-id 08279950-a9e3-4cc3-9a3c-a574f9c9e229
You'll receive a JSON response that shows the status of your import job. VEP annotations in the VCF are parsed from the INFO column, where they are stored as an ID/Value pair. The default ID for Ensembl Variant Effect Predictor (VEP) annotations is CSQ, but you can use the --annotation-fields parameter to indicate a custom value used in the INFO column. Parsing is currently supported for VEP annotations.
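To make the ID/Value convention concrete, the following Python sketch pulls annotation entries out of a VCF INFO string for a given ID. It is an illustration only, not the HealthOmics parser. The function names, the sample INFO strings, and the custom ID `vep_custom` are hypothetical.

```python
from typing import Dict, List


def parse_info_column(info: str) -> Dict[str, str]:
    """Split a VCF INFO string into an ID -> value mapping.

    Flag entries that have no '=' are mapped to an empty string.
    """
    fields: Dict[str, str] = {}
    if info in ("", "."):
        return fields
    for entry in info.split(";"):
        if not entry:
            continue
        key, _, value = entry.partition("=")
        fields[key] = value
    return fields


def vep_entries(info: str, annotation_id: str = "CSQ") -> List[str]:
    """Return the comma-separated annotation entries stored under annotation_id."""
    raw = parse_info_column(info).get(annotation_id, "")
    return raw.split(",") if raw else []


# Default ID: the annotations sit under CSQ.
default_info = "DP=14;CSQ=A|missense_variant|MODERATE|BRCA2,T|intron_variant|MODIFIER|BRCA2;DB"
print(vep_entries(default_info))

# Custom ID (hypothetical): the same lookup with a different key.
custom_info = "DP=14;vep_custom=A|missense_variant|MODERATE|BRCA2"
print(vep_entries(custom_info, annotation_id="vep_custom"))
```

In this sketch, the `annotation_id` argument plays the role that the value in `{"VEP": "CSQ"}` plays for the import job: it names the INFO key to read.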
For a store created before May 15, 2023, or for VCF files that don't include VEP annotations, the response doesn't include any annotation fields.
{
    "creationTime": "2023-04-11T17:52:37.241958+00:00",
    "destinationName": "annotationparsingvariantstore",
    "id": "7a1c67e3-b7f9-434d-817b-9c571fd63bea",
    "items": [
        {
            "jobStatus": "COMPLETED",
            "source": "s3://amzn-s3-demo-bucket/NA12878.2k.garvan.vcf"
        }
    ],
    "roleArn": "arn:aws:iam::555555555555:role/<role_name>",
    "runLeftNormalization": false,
    "status": "COMPLETED",
    "updateTime": "2023-04-11T17:58:22.676043+00:00"
}
The VEP annotations that are part of VCF files are stored as a predefined schema with the following structure. The extras field can be used to store any additional VEP fields that aren't included in the default schema.
annotations struct<
    vep: array<struct<
        allele: string,
        consequence: array<string>,
        impact: string,
        symbol: string,
        gene: string,
        `feature_type`: string,
        feature: string,
        biotype: string,
        exon: struct<rank: string, total: string>,
        intron: struct<rank: string, total: string>,
        hgvsc: string,
        hgvsp: string,
        `cdna_position`: string,
        `cds_position`: string,
        `protein_position`: string,
        `amino_acids`: struct<reference: string, variant: string>,
        codons: struct<reference: string, variant: string>,
        `existing_variation`: array<string>,
        distance: string,
        strand: string,
        flags: array<string>,
        symbol_source: string,
        hgnc_id: string,
        `extras`: map<string, string>
    >>
>
The parsing is performed with a best-effort approach. If the VEP entry doesn't follow the VEP standard specifications, it might not be parsed.
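As a rough illustration of how one pipe-delimited VEP entry could map onto the struct above, consider the following Python sketch. It is not the HealthOmics implementation. It assumes that VEP field names match the schema names case-insensitively, that multi-valued fields are joined with `&`, that exon, intron, amino acid, and codon pairs are joined with `/`, and that an entry whose field count doesn't match the header is skipped. The function and constant names are hypothetical.

```python
from typing import Dict, List, Optional

# Field names taken from the struct above; anything else goes to "extras".
SCHEMA_FIELDS = {
    "allele", "consequence", "impact", "symbol", "gene", "feature_type",
    "feature", "biotype", "exon", "intron", "hgvsc", "hgvsp",
    "cdna_position", "cds_position", "protein_position", "amino_acids",
    "codons", "existing_variation", "distance", "strand", "flags",
    "symbol_source", "hgnc_id",
}
ARRAY_FIELDS = {"consequence", "existing_variation", "flags"}
RANK_TOTAL_FIELDS = {"exon", "intron"}
REF_VARIANT_FIELDS = {"amino_acids", "codons"}


def _pair(value: str, first: str, second: str) -> Dict[str, str]:
    """Split 'a/b' into {first: 'a', second: 'b'}; missing parts become ''."""
    left, _, right = value.partition("/")
    return {first: left, second: right}


def to_vep_record(header: List[str], entry: str) -> Optional[dict]:
    """Map one pipe-delimited entry to a dict shaped like the vep struct."""
    values = entry.split("|")
    if len(values) != len(header):
        return None  # assumption: non-conforming entries are skipped
    record: dict = {"extras": {}}
    for name, value in zip(header, values):
        key = name.lower()
        if key not in SCHEMA_FIELDS:
            record["extras"][name] = value
        elif key in ARRAY_FIELDS:
            record[key] = value.split("&") if value else []
        elif key in RANK_TOTAL_FIELDS:
            record[key] = _pair(value, "rank", "total")
        elif key in REF_VARIANT_FIELDS:
            record[key] = _pair(value, "reference", "variant")
        else:
            record[key] = value
    return record


header = ["Allele", "Consequence", "IMPACT", "SYMBOL", "EXON", "SIFT"]
entry = "A|missense_variant&splice_region_variant|MODERATE|BRCA2|3/27|tolerated(0.3)"
print(to_vep_record(header, entry))
```

Here `SIFT` is not part of the default schema, so the sketch places it in the `extras` map, mirroring the role the text gives that field.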
For a new variant store, the response for get-variant-import-job includes the annotation fields, as shown.
aws omics get-variant-import-job --job-id 08279950-a9e3-4cc3-9a3c-a574f9c9e229
You receive a JSON response that shows the status of your import job.
{
    "creationTime": "2023-04-11T17:52:37.241958+00:00",
    "destinationName": "annotationparsingvariantstore",
    "id": "7a1c67e3-b7f9-434d-817b-9c571fd63bea",
    "items": [
        {
            "jobStatus": "COMPLETED",
            "source": "s3://amzn-s3-demo-bucket/NA12878.2k.garvan.vcf"
        }
    ],
    "roleArn": "arn:aws:iam::123456789012:role/<role_name>",
    "runLeftNormalization": false,
    "status": "COMPLETED",
    "updateTime": "2023-04-11T17:58:22.676043+00:00",
    "annotationFields": {"VEP": "CSQ"}
}
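Because an import job runs asynchronously, checking the status usually means polling until the job reaches a final state. The following Python sketch shows one way to do that. The helper name, the polling defaults, and the set of terminal status values are assumptions, so verify the status values against the API reference. The job lookup is passed in as a callable, which lets the sketch run without AWS access.

```python
import time
from typing import Callable, Dict

# Assumed terminal states; only COMPLETED appears in the responses above.
TERMINAL_STATUSES = {"COMPLETED", "COMPLETED_WITH_FAILURES", "FAILED", "CANCELLED"}


def wait_for_import_job(
    get_job: Callable[[], Dict],
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call get_job() until its "status" is terminal, then return the response."""
    for attempt in range(max_polls):
        job = get_job()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    raise TimeoutError(f"job not finished after {max_polls} polls")


# Stand-in for real API calls: a canned sequence of responses.
responses = iter([
    {"status": "IN_PROGRESS"},
    {"status": "COMPLETED", "annotationFields": {"VEP": "CSQ"}},
])
final = wait_for_import_job(lambda: next(responses), poll_seconds=0)
print(final["status"], final.get("annotationFields"))
```

With boto3 installed and credentials configured, `get_job` could be a lambda that calls `get_variant_import_job(jobId=...)` on the `omics` client.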
You can use list-variant-import-jobs to see all import jobs and their statuses.
aws omics list-variant-import-jobs --ids 7a1c67e3-b7f9-434d-817b-9c571fd63bea
The response contains information as follows.
{
    "variantImportJobs": [
        {
            "creationTime": "2023-04-11T17:52:37.241958+00:00",
            "destinationName": "annotationparsingvariantstore",
            "id": "7a1c67e3-b7f9-434d-817b-9c571fd63bea",
            "roleArn": "arn:aws:iam::555555555555:role/roleName",
            "runLeftNormalization": false,
            "status": "COMPLETED",
            "updateTime": "2023-04-11T17:58:22.676043+00:00",
            "annotationFields": {"VEP": "CSQ"}
        }
    ]
}
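When many jobs are listed, it can help to group them by status. The following Python sketch does this for a response shaped like the one above. The helper name `job_ids_by_status` is hypothetical, and the second job in the sample data is invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def job_ids_by_status(response: Dict) -> Dict[str, List[str]]:
    """Group job IDs from a list-variant-import-jobs style response by status."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for job in response.get("variantImportJobs", []):
        grouped[job["status"]].append(job["id"])
    return dict(grouped)


sample_response = {
    "variantImportJobs": [
        {
            "id": "7a1c67e3-b7f9-434d-817b-9c571fd63bea",
            "destinationName": "annotationparsingvariantstore",
            "status": "COMPLETED",
        },
        {
            # Invented entry, included only to show the grouping.
            "id": "00000000-0000-0000-0000-000000000000",
            "destinationName": "annotationparsingvariantstore",
            "status": "IN_PROGRESS",
        },
    ]
}
print(job_ids_by_status(sample_response))
```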
If necessary, you can cancel an import job with the following command.
aws omics cancel-variant-import-job --job-id edd7b8ce-xmpl-47e2-bc99-258cac95a508