HealthOmics storage
Use HealthOmics storage to store, retrieve, organize, and share genomics data efficiently and at low cost. HealthOmics storage understands the relationships between different data objects, so that you can define which read sets originated from the same source data. This provides you with data provenance.
Data that's stored in ACTIVE state is retrievable immediately. Data that hasn't been accessed for 30 days or more is stored in ARCHIVE state. To access archived data, you can reactivate it through the API operations or console.
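As a rough illustration of reactivating through the API, the following Python sketch checks a read set's status and starts an activation job only when needed. It assumes a boto3 `omics` client is passed in, that archived read sets report the status string `ARCHIVED`, and that the job response carries an `id` field. The helper name `reactivate_if_archived` is hypothetical and not part of any SDK.

```python
def reactivate_if_archived(omics_client, sequence_store_id, read_set_id):
    """Start an activation job if the read set is archived.

    omics_client is expected to behave like boto3.client("omics").
    Returns the activation job ID, or None when no job was started.
    """
    metadata = omics_client.get_read_set_metadata(
        id=read_set_id,
        sequenceStoreId=sequence_store_id,
    )
    # Assumption: archived read sets report the status string "ARCHIVED".
    if metadata.get("status") != "ARCHIVED":
        return None

    job = omics_client.start_read_set_activation_job(
        sequenceStoreId=sequence_store_id,
        sources=[{"readSetId": read_set_id}],
    )
    # Assumption: the response includes the job identifier under "id".
    return job["id"]
```

A call might look like `reactivate_if_archived(boto3.client("omics"), store_id, read_set_id)`, where both IDs come from your own sequence store. Activation is asynchronous, so the read set is not retrievable the moment the job is submitted.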
HealthOmics sequence stores are designed to preserve the content integrity of files. However, bitwise equivalence of imported data files and exported files isn't preserved because of the compression during active and archive tiering.
During ingestion, HealthOmics generates an entity tag, or HealthOmics ETag, to make it possible to validate the content integrity of your data files. Sequencing portions are identified and captured as an ETag at the source level of a read set. The ETag calculation doesn't alter the actual file or genomic data. After a read set is created, the ETag shouldn't change throughout the lifecycle of the read set source. This means that reimporting the same file results in the same ETag value being calculated.
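Because reimporting the same file yields the same ETag, comparing ETags is one way to check whether two read sets carry the same content. The sketch below is a minimal example under stated assumptions: the client behaves like a boto3 `omics` client, and the read set metadata exposes an `etag` structure with `algorithm`, `source1`, and `source2` fields. The helper names `read_set_etag` and `same_content` are hypothetical.

```python
def read_set_etag(omics_client, sequence_store_id, read_set_id):
    """Return (algorithm, source1, source2) for a read set's ETag.

    Missing fields come back as None.
    """
    metadata = omics_client.get_read_set_metadata(
        id=read_set_id,
        sequenceStoreId=sequence_store_id,
    )
    # Assumption: the metadata exposes an "etag" mapping with these keys.
    etag = metadata.get("etag") or {}
    return (etag.get("algorithm"), etag.get("source1"), etag.get("source2"))


def same_content(omics_client, sequence_store_id, read_set_id_a, read_set_id_b):
    """Return True only when both read sets have matching ETags."""
    etag_a = read_set_etag(omics_client, sequence_store_id, read_set_id_a)
    etag_b = read_set_etag(omics_client, sequence_store_id, read_set_id_b)
    # Without a source1 ETag on both sides, equality cannot be confirmed.
    if etag_a[1] is None or etag_b[1] is None:
        return False
    return etag_a == etag_b
```

The comparison includes the algorithm name on the assumption that ETags computed with different algorithms are not comparable with each other.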
Topics
- HealthOmics ETags and data provenance
- Creating a HealthOmics reference store
- Creating a HealthOmics sequence store
- Deleting HealthOmics reference and sequence stores
- Importing read sets into a HealthOmics sequence store
- Direct upload to a HealthOmics sequence store
- Exporting HealthOmics read sets to an Amazon S3 bucket
- Accessing HealthOmics read sets with Amazon S3 URIs
- Activating read sets in HealthOmics