A reference store in HealthOmics is a data store for storing reference genomes. You can have a single reference store in each AWS account and Region. You can create a reference store by using the console or the AWS CLI.
Creating a reference store using the console
To create a reference store
- Open the HealthOmics console at https://console.aws.amazon.com/omics/.
- In the left navigation pane, choose Get started with HealthOmics.
- Choose Reference genomes from the Genomics data storage options.
- You can either choose a previously imported reference genome or import a new one. If you haven't imported a reference genome, choose Import reference genome in the top right.
- On the Create reference genome import job page, choose either the Quick create or Manual create option to create a reference store, and then provide the following information.
  - Reference genome name - A unique name for this store.
  - Description (optional) - A description of this reference store.
  - IAM Role - Select a role with access to your reference genome.
  - Reference from Amazon S3 - Select your reference sequence file in an Amazon S3 bucket.
  - Tags (optional) - Provide up to 50 tags for this reference store.
Creating a reference store using the CLI
The following example shows you how to create a reference store by using the AWS CLI. You can have one reference store per AWS account in each Region.
Reference stores support storage of FASTA files with the extensions .fasta, .fa, .fas, .fsa, .faa, .fna, .ffn, .frn, .mpfa, .seq, and .txt. The bgzip version of these extensions is also supported.
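Before starting an import, it can help to check file names locally. The following Python sketch tests a file name against the extensions listed above. It assumes that a bgzip-compressed file carries a trailing .gz suffix (for example, genome.fa.gz) and that matching is case-sensitive; neither detail is stated here, so treat both as assumptions. The function name is hypothetical.

```python
# Extensions that reference stores accept for FASTA files.
SUPPORTED_FASTA_EXTENSIONS = (
    ".fasta", ".fa", ".fas", ".fsa", ".faa", ".fna",
    ".ffn", ".frn", ".mpfa", ".seq", ".txt",
)

# Assumption: bgzip-compressed files end in ".gz" (for example "genome.fa.gz").
BGZIP_SUFFIX = ".gz"


def is_supported_reference_file(filename: str) -> bool:
    """Return True if filename has a supported FASTA extension,
    optionally followed by the assumed bgzip suffix.

    Matching is case-sensitive, which is also an assumption.
    """
    name = filename
    if name.endswith(BGZIP_SUFFIX):
        # Strip the compression suffix and check the inner extension.
        name = name[: -len(BGZIP_SUFFIX)]
    return name.endswith(SUPPORTED_FASTA_EXTENSIONS)
```

For example, Homo_sapiens_assembly38.fasta and genome.fa.gz pass the check, while reads.bam and a bare archive.gz do not.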
In the following example, replace reference store name with the name you've chosen for your reference store.
aws omics create-reference-store --name "reference store name"
You receive a JSON response with the reference store ID and name, the ARN, and the timestamp of when your reference store was created.
{
"id": "3242349265",
"arn": "arn:aws:omics:us-west-2:555555555555:referenceStore/3242349265",
"name": "MyReferenceStore",
"creationTime": "2022-07-01T20:58:42.878Z"
}
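If you work from Python rather than the CLI, the same operation is available through the AWS SDK for Python (boto3). The sketch below assumes a boto3 HealthOmics client created with boto3.client("omics"); the client is passed in as a parameter so the function can be exercised without AWS credentials. The function name is illustrative, not part of any SDK.

```python
def create_reference_store(client, name: str) -> str:
    """Create a reference store and return its ID.

    `client` is assumed to be a boto3 HealthOmics client, for example
    boto3.client("omics", region_name="us-west-2").
    """
    response = client.create_reference_store(name=name)
    # The response mirrors the CLI output: id, arn, name, creationTime.
    return response["id"]
```

A call such as create_reference_store(boto3.client("omics"), "MyReferenceStore") would return an ID like the "3242349265" shown in the CLI response.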
You can use the reference store ID in additional AWS CLI commands. You can retrieve the list of reference store IDs linked to your account by using the list-reference-stores command, as shown in the following example.
aws omics list-reference-stores
In response, you receive a list of your reference stores, including the one you just created.
{
"referenceStores": [
{
"id": "3242349265",
"arn": "arn:aws:omics:us-west-2:555555555555:referenceStore/3242349265",
"name": "MyReferenceStore",
"creationTime": "2022-07-01T20:58:42.878Z"
}
]
}
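Scripts often need the store ID rather than its name. This Python sketch, assuming a boto3 HealthOmics client, looks up a reference store by name in the list-reference-stores results and follows nextToken in case the results span several pages. The helper name is hypothetical.

```python
from typing import Optional


def find_reference_store_id(client, name: str) -> Optional[str]:
    """Return the ID of the reference store named `name`, or None.

    `client` is assumed to be a boto3 HealthOmics client. Results may be
    paginated, so keep requesting pages until no nextToken is returned.
    """
    kwargs = {}
    while True:
        page = client.list_reference_stores(**kwargs)
        for store in page.get("referenceStores", []):
            if store.get("name") == name:
                return store["id"]
        token = page.get("nextToken")
        if not token:
            return None
        kwargs["nextToken"] = token
```

Since an account has at most one reference store per Region, the list is usually short, but following the token keeps the helper correct regardless.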
After you create a reference store, you can create import jobs to load genomic reference files into it. To do so, you must use or create an IAM role to access the data. The following is an example policy.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:GetBucketLocation"
],
"Resource": [
"arn:aws:s3:::amzn-s3-demo-bucket",
"arn:aws:s3:::amzn-s3-demo-bucket/*"
]
}
]
}
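When the bucket name varies between environments, generating the policy avoids hand-editing JSON. The following Python sketch builds a policy with the same actions and resources as the example, parameterized by bucket name. The function names are hypothetical.

```python
import json


def build_s3_read_policy(bucket: str) -> dict:
    """Return a permissions policy granting s3:GetObject and
    s3:GetBucketLocation on `bucket` and the objects in it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:GetBucketLocation"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # the bucket itself
                    f"arn:aws:s3:::{bucket}/*",  # every object in it
                ],
            }
        ],
    }


def policy_document(bucket: str) -> str:
    """Serialize the policy as the JSON string that IAM APIs accept."""
    return json.dumps(build_s3_read_policy(bucket), indent=2)
```

Printing policy_document("amzn-s3-demo-bucket") produces a document you can paste into the IAM console or pass to the IAM API.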
The role must also have a trust policy that allows HealthOmics to assume it, similar to the following example.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": [
"omics.amazonaws.com"
]
},
"Action": "sts:AssumeRole"
}
]
}
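Both policies can be applied in one step from Python. This sketch assumes a boto3 IAM client (boto3.client("iam")): it creates the role with the trust policy shown above and attaches a permissions policy, passed in as a Python dict, as an inline policy. The role name OmicsReferenceImport matches the ARNs in this page's examples; the inline policy name you choose is arbitrary.

```python
import json

# Trust policy that lets the HealthOmics service assume the role.
OMICS_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": ["omics.amazonaws.com"]},
            "Action": "sts:AssumeRole",
        }
    ],
}


def create_import_role(iam, role_name: str, policy_name: str,
                       permissions: dict) -> str:
    """Create a role HealthOmics can assume, attach `permissions` as an
    inline policy, and return the role ARN.

    `iam` is assumed to be a boto3 IAM client (boto3.client("iam")).
    """
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(OMICS_TRUST_POLICY),
    )
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(permissions),
    )
    return role["Role"]["Arn"]
```

The returned ARN is the value to pass as the role ARN when you start an import job.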
You can now import a reference genome. This example uses Genome Reference Consortium Human Build 38 (hg38), which is open access and available from the Registry of Open Data on AWS. First, copy the FASTA file to your own Amazon S3 bucket.
aws s3 cp s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta s3://amzn-s3-demo-bucket/
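The same copy can be scripted with boto3. This sketch assumes a boto3 S3 client and uses its managed copy method, which splits large objects into parts as needed; it keeps the source file name as the destination key, mirroring what the aws s3 cp command above does. The helper names are hypothetical.

```python
from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> tuple:
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def copy_to_bucket(s3, source_uri: str, dest_bucket: str) -> str:
    """Copy an object into dest_bucket under its original file name and
    return the new S3 URI.

    `s3` is assumed to be a boto3 S3 client (boto3.client("s3")).
    """
    src_bucket, src_key = parse_s3_uri(source_uri)
    dest_key = src_key.rsplit("/", 1)[-1]  # keep only the file name
    # Managed copy: large objects are copied in parts automatically.
    s3.copy({"Bucket": src_bucket, "Key": src_key}, dest_bucket, dest_key)
    return f"s3://{dest_bucket}/{dest_key}"
```

For the hg38 file, the returned URI is s3://amzn-s3-demo-bucket/Homo_sapiens_assembly38.fasta, which is the source path used when you start the import job.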
You can then begin your import job. Replace reference store ID, role ARN, and source file path with your own input, and give the reference a name.
aws omics start-reference-import-job --reference-store-id "reference store ID" --role-arn "role ARN" --sources sourceFile="source file path",name=MyReference
After the import job is created, you receive the following JSON response.
{
"id": "7252016478",
"referenceStoreId": "3242349265",
"roleArn": "arn:aws:iam::555555555555:role/OmicsReferenceImport",
"status": "CREATED",
"creationTime": "2022-07-01T21:15:13.727Z"
}
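In Python, the equivalent request can be sketched with boto3. This assumes a boto3 HealthOmics client; each entry in sources pairs an S3 file with the name the imported reference receives. The function name is hypothetical.

```python
def start_reference_import(client, store_id: str, role_arn: str,
                           source_uri: str, name: str) -> str:
    """Start an import job for one FASTA file and return the job ID.

    `client` is assumed to be a boto3 HealthOmics client.
    """
    response = client.start_reference_import_job(
        referenceStoreId=store_id,
        roleArn=role_arn,
        # Each source pairs an S3 file with the name the reference gets.
        sources=[{"sourceFile": source_uri, "name": name}],
    )
    return response["id"]
```

The returned ID corresponds to the "id" field in the JSON response and is what you pass when checking the job's status.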
You can monitor the status of a job by using the following command. In the following example, replace reference store ID and job ID with your reference store ID and the job ID that you want to learn more about.
aws omics get-reference-import-job --reference-store-id "reference store ID" --id "job ID"
In response, you receive the details of that import job, including its status and the status of each source file.
{
"id": "7252016478",
"referenceStoreId": "3242349265",
"roleArn": "arn:aws:iam::555555555555:role/OmicsReferenceImport",
"status": "RUNNING",
"creationTime": "2022-07-01T21:15:13.727Z",
"sources": [
{
"sourceFile": "s3://amzn-s3-demo-bucket/Homo_sapiens_assembly38.fasta",
"status": "IN_PROGRESS",
"name": "MyReference"
}
]
}
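Because a job's status changes over time, scripts typically poll until the job reaches a final state. This sketch assumes a boto3 HealthOmics client and a set of terminal status values (COMPLETED, COMPLETED_WITH_FAILURES, FAILED, CANCELLED) that are not listed on this page; verify them against the HealthOmics API reference before relying on them. The sleep function is injectable so the loop can be tested without waiting, and the polling interval and limit are arbitrary defaults.

```python
import time

# Assumed terminal states; verify against the HealthOmics API reference.
TERMINAL_STATUSES = frozenset(
    {"COMPLETED", "COMPLETED_WITH_FAILURES", "FAILED", "CANCELLED"}
)


def wait_for_import_job(client, store_id: str, job_id: str,
                        poll_seconds: float = 30.0, max_polls: int = 240,
                        sleep=time.sleep) -> dict:
    """Poll get_reference_import_job until the job reaches a terminal
    state, then return the final job description.

    Raises TimeoutError if the job is still running after max_polls checks.
    """
    for _ in range(max_polls):
        job = client.get_reference_import_job(referenceStoreId=store_id, id=job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        sleep(poll_seconds)
    raise TimeoutError(f"import job {job_id} did not finish after {max_polls} polls")
```

After the function returns, inspect the status of each entry in the job's sources list to confirm that every file was imported.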
You can also find the reference that was imported by listing your references and filtering them based on the reference name. Replace reference store ID with your reference store ID, and add an optional filter to narrow the list.
aws omics list-references --reference-store-id "reference store ID" --filter name=MyReference
In response, you receive the following information.
{
"references": [
{
"id": "1234567890",
"arn": "arn:aws:omics:us-west-2:555555555555:referenceStore/1234567890/reference/1234567890",
"referenceStoreId": "1234567890",
"md5": "7ff134953dcca8c8997453bbb80b6b5e",
"status": "ACTIVE",
"name": "MyReference",
"creationTime": "2022-07-02T00:15:19.787Z",
"updateTime": "2022-07-02T00:15:19.787Z"
}
]
}
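The same filtered listing can be sketched in Python. This assumes a boto3 HealthOmics client whose list_references call accepts a filter with a name key, mirroring the CLI's --filter name=... option, and it collects every page of results by following nextToken. The helper name is hypothetical.

```python
def list_references_by_name(client, store_id: str, name: str) -> list:
    """Return every reference in the store whose name matches `name`.

    `client` is assumed to be a boto3 HealthOmics client; additional
    pages are requested through nextToken until none remain.
    """
    kwargs = {"referenceStoreId": store_id, "filter": {"name": name}}
    references = []
    while True:
        page = client.list_references(**kwargs)
        references.extend(page.get("references", []))
        token = page.get("nextToken")
        if not token:
            return references
        kwargs["nextToken"] = token
```

Each returned entry carries the same fields as the CLI output, so the reference ID needed by the metadata and download commands is in entry["id"].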
To learn more about the reference metadata, use the get-reference-metadata API operation. In the following example, replace reference store ID with your reference store ID and reference ID with the reference ID that you want to learn more about.
aws omics get-reference-metadata --reference-store-id "reference store ID" --id "reference ID"
You receive the following information in response.
{
"id": "1234567890",
"arn": "arn:aws:omics:us-west-2:555555555555:referenceStore/referencestoreID/reference/referenceID",
"referenceStoreId": "1234567890",
"md5": "7ff134953dcca8c8997453bbb80b6b5e",
"status": "ACTIVE",
"name": "MyReference",
"creationTime": "2022-07-02T00:15:19.787Z",
"updateTime": "2022-07-02T00:15:19.787Z",
"files": {
"source": {
"totalParts": 31,
"partSize": 104857600,
"contentLength": 3249912778
},
"index": {
"totalParts": 1,
"partSize": 104857600,
"contentLength": 160928
}
}
}
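The totalParts values in this response are consistent with each file being split into chunks of partSize bytes: the part count equals the content length divided by the part size, rounded up. For the source file, 3249912778 / 104857600 is about 30.99, which rounds up to 31; the 160928-byte index fits in a single part. The following sketch checks that relationship with integer ceiling division. It illustrates the arithmetic only and is not a documented API guarantee.

```python
def expected_parts(content_length: int, part_size: int) -> int:
    """Ceiling division: number of part_size chunks needed to hold
    content_length bytes (integer math avoids float rounding)."""
    return (content_length + part_size - 1) // part_size


def part_counts_match(metadata: dict) -> dict:
    """For each file in get-reference-metadata output (source, index),
    report whether totalParts equals the computed part count."""
    return {
        kind: expected_parts(info["contentLength"], info["partSize"]) == info["totalParts"]
        for kind, info in metadata["files"].items()
    }
```

With the example response, both the source and index entries report True.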
You can also download parts of the reference file by using get-reference. In the following example, replace reference store ID with your reference store ID and reference ID with the reference ID that you want to download from.
aws omics get-reference --reference-store-id "reference store ID" --id "reference ID" --part-number 1 outfile.fa
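Because get-reference returns one part per call, downloading a whole file means requesting each part in turn. This sketch assumes a boto3 HealthOmics client whose get_reference call accepts partNumber and file (SOURCE or INDEX) and returns the bytes in a streaming payload, and that parts are numbered from 1 to the totalParts value reported by get-reference-metadata. Writing to an already-open binary file object keeps the function simple to test; the function name is hypothetical.

```python
def download_reference(client, store_id: str, reference_id: str,
                       total_parts: int, out, file: str = "SOURCE") -> int:
    """Fetch parts 1..total_parts in order, write them to the binary
    file object `out`, and return the number of bytes written.

    `client` is assumed to be a boto3 HealthOmics client.
    """
    written = 0
    for part in range(1, total_parts + 1):
        response = client.get_reference(
            referenceStoreId=store_id,
            id=reference_id,
            partNumber=part,
            file=file,
        )
        chunk = response["payload"].read()  # streaming body -> bytes
        out.write(chunk)
        written += len(chunk)
    return written
```

For the example reference, you might open outfile.fa in "wb" mode and call download_reference with total_parts=31; comparing the byte count with contentLength (3249912778) is a quick completeness check.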