Creating and managing sequence stores - AWS HealthOmics

Creating and managing sequence stores

HealthOmics sequence stores support storage of genomic files in the unaligned formats FASTQ (gzip-only) and uBAM, and in the aligned formats BAM and CRAM. Imported files are stored as read sets. Because read sets are AWS resources, you can add tags to them and control access to them through IAM. Aligned read sets require a reference genome, which is used to align the genomic sequences. For unaligned read sets, a reference genome is optional.
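The reference genome rule described above can be captured in a small lookup. The following Python sketch is illustrative only: the function name `reference_required` and the format labels are assumptions made for this example and are not part of any HealthOmics API.

```python
# Hypothetical helper: the names and format labels below are assumptions
# chosen for illustration, based only on the rule stated in the text.
UNALIGNED_FORMATS = {"FASTQ", "UBAM"}  # reference genome is optional
ALIGNED_FORMATS = {"BAM", "CRAM"}      # reference genome is required


def reference_required(file_format: str) -> bool:
    """Return True if a read set in this format needs a reference genome."""
    fmt = file_format.upper()
    if fmt in ALIGNED_FORMATS:
        return True
    if fmt in UNALIGNED_FORMATS:
        return False
    raise ValueError(f"Unsupported read set format: {file_format}")
```

A check like this could run before an import is started, so that a missing reference genome is caught early for aligned formats.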

To store read sets, you first create a sequence store. When you create a sequence store, you can optionally specify an Amazon S3 bucket as a fallback location. The fallback location stores any files that fail to create a read set during a direct upload. Fallback locations are available for sequence stores created after May 15, 2023. You can set the fallback location only when you create the sequence store. You can't add one afterward.

In the following example, replace sequence store name with the name you chose for your sequence store.

aws omics create-sequence-store --name sequence store name --fallback-location "s3://DOC-EXAMPLE-BUCKET"
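The same request can also be made from Python. The sketch below assumes the boto3 `omics` client and its `create_sequence_store` operation with the `name` and `fallbackLocation` parameters. The wrapper `create_store` and its `s3://` prefix check are hypothetical additions for this example. Because the fallback location can't be added later, the wrapper accepts it at creation time.

```python
from typing import Any, Optional


def create_store(client: Any, name: str,
                 fallback_location: Optional[str] = None) -> dict:
    """Create a sequence store, optionally with an S3 fallback location.

    `client` is assumed to be a boto3 "omics" client (or any object with a
    compatible create_sequence_store method).
    """
    params = {"name": name}
    if fallback_location is not None:
        # Assumption for this sketch: reject values that are not S3 URIs
        # before sending the request.
        if not fallback_location.startswith("s3://"):
            raise ValueError("fallback location must be an s3:// URI")
        params["fallbackLocation"] = fallback_location
    return client.create_sequence_store(**params)


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   store = create_store(boto3.client("omics"), "my-sequence-store",
#                        "s3://DOC-EXAMPLE-BUCKET")
```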

You receive the following response in JSON, which includes the ID number for your newly created sequence store.

{
    "id": "3936421177",
    "arn": "arn:aws:omics:us-west-2:111122223333:sequenceStore/3936421177",
    "name": "sequence_store_example_name",
    "creationTime": "2022-07-13T20:09:26.038Z",
    "fallbackLocation": "s3://DOC-EXAMPLE-BUCKET"
}
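Later commands need the sequence store ID from this response. As a minimal sketch, assuming the response has been captured as text (for example, from the CLI's standard output), the ID and fallback location can be read with Python's standard `json` module. The variable names here are chosen for illustration.

```python
import json

# Sample response text, mirroring the fields shown in the example response.
response_text = """
{
    "id": "3936421177",
    "arn": "arn:aws:omics:us-west-2:111122223333:sequenceStore/3936421177",
    "name": "sequence_store_example_name",
    "creationTime": "2022-07-13T20:09:26.038Z",
    "fallbackLocation": "s3://DOC-EXAMPLE-BUCKET"
}
"""

store = json.loads(response_text)
store_id = store["id"]                       # needed by later commands
fallback = store.get("fallbackLocation")     # None if no fallback was set

print(store_id, fallback)
```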

You can also view all sequence stores associated with your account by using the list-sequence-stores command, as shown in the following.

aws omics list-sequence-stores

You receive the following response.

{
    "sequenceStores": [
        {
            "arn": "arn:aws:omics:us-west-2:111122223333:sequenceStore/3936421177",
            "id": "3936421177",
            "name": "MySequenceStore",
            "creationTime": "2022-07-13T20:09:26.038Z",
            "fallbackLocation": "s3://DOC-EXAMPLE-BUCKET"
        }
    ]
}
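When an account has several sequence stores, it can help to look up an ID by name. The following Python sketch assumes the list output has been captured as text. The helper `index_by_name` is hypothetical and assumes store names are unique; if two stores share a name, the later entry wins.

```python
import json


def index_by_name(list_output: str) -> dict:
    """Map each sequence store name to its ID (assumes unique names)."""
    stores = json.loads(list_output)["sequenceStores"]
    return {store["name"]: store["id"] for store in stores}


# Sample output, mirroring the fields shown in the example response.
sample_output = """
{
    "sequenceStores": [
        {
            "arn": "arn:aws:omics:us-west-2:111122223333:sequenceStore/3936421177",
            "id": "3936421177",
            "name": "MySequenceStore",
            "creationTime": "2022-07-13T20:09:26.038Z",
            "fallbackLocation": "s3://DOC-EXAMPLE-BUCKET"
        }
    ]
}
"""

store_ids = index_by_name(sample_output)
print(store_ids["MySequenceStore"])
```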

To get more details about a specific sequence store, use the get-sequence-store command with the store's ID, as shown in the following example.

aws omics get-sequence-store --id sequence store ID

You receive the following response.

{
    "arn": "arn:aws:omics:us-west-2:123456789012:sequenceStore/sequencestoreID",
    "creationTime": "2024-01-12T04:45:29.857Z",
    "description": null,
    "fallbackLocation": null,
    "id": "2015356892",
    "name": "MySequenceStore",
    "s3Access": {
        "s3AccessPointArn": "arn:aws:s3:us-west-2:123456789012:accesspoint/592761533288-2015356892",
        "s3Uri": "s3://592761533288-2015356892-ajdpi90jdas90a79fh9a8ja98jdfa9jf98-s3alias/592761533288/sequenceStore/2015356892/"
    },
    "sseConfig": {
        "keyArn": "arn:aws:kms:us-west-2:123456789012:key/eb2b30f5-635d-4b6d-b0f9-d3889fe0e648",
        "type": "KMS"
    }
}
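The get-sequence-store response nests the S3 access and encryption settings, and some fields can be null. As a rough sketch, assuming the response has been captured as text, the following Python function pulls out the commonly needed values. The function `summarize_store` and the keys of its result are hypothetical names for this example, and the sample input is trimmed to the fields it reads.

```python
import json


def summarize_store(get_output: str) -> dict:
    """Extract a few fields from a get-sequence-store response."""
    store = json.loads(get_output)
    # JSON null becomes None, so fall back to empty dicts for nested objects.
    s3_access = store.get("s3Access") or {}
    sse_config = store.get("sseConfig") or {}
    return {
        "id": store["id"],
        "has_fallback": store.get("fallbackLocation") is not None,
        "s3_uri": s3_access.get("s3Uri"),
        "encryption_type": sse_config.get("type"),
    }


# Trimmed sample response containing only the fields read above.
sample_output = """
{
    "fallbackLocation": null,
    "id": "2015356892",
    "name": "MySequenceStore",
    "s3Access": {
        "s3Uri": "s3://592761533288-2015356892-ajdpi90jdas90a79fh9a8ja98jdfa9jf98-s3alias/592761533288/sequenceStore/2015356892/"
    },
    "sseConfig": {
        "type": "KMS"
    }
}
"""

summary = summarize_store(sample_output)
print(summary)
```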