Importing files into HealthLake data stores - AWS HealthLake

Importing files into HealthLake data stores

After you create your HealthLake data store, you can import files from an Amazon Simple Storage Service (Amazon S3) bucket. You can use the HealthLake console or the StartFHIRImportJob API operation to start an import job. HealthLake accepts input files in newline-delimited JSON (.ndjson) format, where each line consists of a valid FHIR resource. You can use the API operations DescribeFHIRImportJob and ListFHIRImportJobs to describe and list ongoing import jobs. A customer-owned or AWS-owned KMS key is required for encryption of the Amazon S3 bucket for all import jobs. To learn more about creating and using KMS keys, see Creating keys in the AWS Key Management Service Developer Guide.
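
As a rough illustration of the .ndjson requirement, the following Python sketch checks that every line of an input file parses as a JSON object and carries a resourceType field. It is only a structural pre-check under stated assumptions, not FHIR validation: the function name find_invalid_ndjson_lines is hypothetical, and treating blank lines as errors is an assumption based on the rule that each line must be a FHIR resource.

```python
import json
from typing import Iterable, List, Tuple


def find_invalid_ndjson_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, reason) for every line that fails the structural check."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            # Assumption: a blank line is not a valid FHIR resource.
            problems.append((number, "blank line"))
            continue
        try:
            resource = json.loads(text)
        except json.JSONDecodeError as err:
            problems.append((number, f"invalid JSON: {err.msg}"))
            continue
        # Every FHIR resource is a JSON object that names its resourceType.
        if not isinstance(resource, dict) or "resourceType" not in resource:
            problems.append((number, "missing resourceType"))
    return problems


# Hypothetical sample input: one well-formed line and two broken ones.
sample = [
    '{"resourceType": "Patient", "id": "example-1"}',
    '{"id": "no-type"}',
    '{not json}',
]
problems = find_invalid_ndjson_lines(sample)
for number, reason in problems:
    print(f"line {number}: {reason}")
```

To check a real file, pass an open file handle instead of the sample list, for example find_invalid_ndjson_lines(open("patients.ndjson")), where the file name is a placeholder.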

Only one import or export job can run concurrently per HealthLake data store. However, users can create, read, update, or delete FHIR resources while an import job is in progress.

For each import job, HealthLake generates a manifest.json file that describes both the successes and failures of the job. The job's output files are organized into two folders, named SUCCESS and FAILURE, and users can navigate to them programmatically. Because an output file can contain sensitive information, users must provide both an output Amazon S3 bucket and an AWS KMS key for encryption.

The following is an example of the output manifest.json file. We recommend using this file as the first step in troubleshooting a failed import job, because it provides details on each file and on what caused the import job to fail.

{
    "inputDataConfig": {
        "s3Uri": "s3://inputS3Bucket/healthlake-input/invalidInput/"
    },
    "outputDataConfig": {
        "s3Uri": "s3://outputS3Bucket/32839038a2f47f17c2fe0f53f0c3a0ba-FHIR_IMPORT-19dd7bb7bcc8ee12a09bf6d322744a3d/",
        "encryptionKeyID": "arn:aws:kms:us-west-2:123456789012:key/fbbbfee3-20b3-42a5-a99d-c48c655ed545"
    },
    "successOutput": {
        "successOutputS3Uri": "s3://outputS3Bucket/32839038a2f47f17c2fe0f53f0c3a0ba-FHIR_IMPORT-19dd7bb7bcc8ee12a09bf6d322744a3d/SUCCESS/"
    },
    "failureOutput": {
        "failureOutputS3Uri": "s3://outputS3Bucket/32839038a2f47f17c2fe0f53f0c3a0ba-FHIR_IMPORT-19dd7bb7bcc8ee12a09bf6d322744a3d/FAILURE/"
    },
    "numberOfScannedFiles": 1,
    "numberOfFilesImported": 1,
    "sizeOfScannedFilesInMB": 0.023627,
    "sizeOfDataImportedSuccessfullyInMB": 0.011232,
    "numberOfResourcesScanned": 9,
    "numberOfResourcesImportedSuccessfully": 4,
    "numberOfResourcesWithCustomerError": 5,
    "numberOfResourcesWithServerError": 0
}
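
When troubleshooting, the counters in the manifest can be condensed into a short report. The following Python sketch assumes the manifest has already been downloaded and uses only the field names shown in the example above. The summarize_manifest helper and the shortened S3 URI are hypothetical.

```python
import json


def summarize_manifest(manifest: dict) -> dict:
    """Condense the resource counters from a manifest.json into a short report."""
    customer_errors = manifest["numberOfResourcesWithCustomerError"]
    server_errors = manifest["numberOfResourcesWithServerError"]
    failed = customer_errors + server_errors
    return {
        "scanned": manifest["numberOfResourcesScanned"],
        "imported": manifest["numberOfResourcesImportedSuccessfully"],
        "failed": failed,
        "customer_errors": customer_errors,
        "server_errors": server_errors,
        # Folder to inspect first when any resource failed to import.
        "failure_folder": (
            manifest["failureOutput"]["failureOutputS3Uri"] if failed else None
        ),
    }


# Trimmed copy of the example manifest; the job folder name is a
# hypothetical stand-in for the long generated prefix.
manifest_text = """
{
    "failureOutput": {
        "failureOutputS3Uri": "s3://outputS3Bucket/example-job/FAILURE/"
    },
    "numberOfResourcesScanned": 9,
    "numberOfResourcesImportedSuccessfully": 4,
    "numberOfResourcesWithCustomerError": 5,
    "numberOfResourcesWithServerError": 0
}
"""
summary = summarize_manifest(json.loads(manifest_text))
print(summary)
```

In the example manifest, all five failures are counted as customer errors, which suggests looking at the input resources rather than retrying the job unchanged.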

Performing an import

You can start an import job by using either the AWS HealthLake console or the StartFHIRImportJob API operation (start-fhir-import-job in the AWS CLI).

Importing files by using the API operations

Prerequisites

When you use the AWS HealthLake API operations, you must first create an AWS Identity and Access Management (IAM) policy and attach it to an IAM role. To learn more about IAM roles and trust policies, see IAM Policies and Permissions. Customers must also use a KMS key for encryption. To learn more about using KMS keys, see AWS Key Management Service.

To import files (API), use the following steps.
  1. Upload your data into an Amazon S3 bucket.

  2. To start a new import job, use the start-fhir-import-job operation. When you start the job, specify the name of the Amazon S3 bucket that contains the input files, the KMS key you want to use for encryption, and the output data configuration.

  3. To learn more about a FHIR import job, use the describe-fhir-import-job operation to get the job's ID, ARN, name, start time, end time, and current status. Use list-fhir-import-jobs to show all import jobs and their statuses.
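
The API steps above can be sketched in Python. The example builds the parameters for StartFHIRImportJob and then asks DescribeFHIRImportJob for the job status. It assumes a boto3 HealthLake client with start_fhir_import_job and describe_fhir_import_job methods; the helper names, the job name, and the role ARN are hypothetical, and the bucket names and key ARN are the placeholders used elsewhere on this page. Treat it as a sketch to check against the API reference, not a definitive implementation.

```python
import json
import uuid
from typing import Any, Dict, Tuple


def build_import_request(datastore_id: str, input_s3_uri: str, output_s3_uri: str,
                         kms_key_arn: str, role_arn: str, job_name: str) -> Dict[str, Any]:
    """Assemble the parameters for a StartFHIRImportJob call."""
    return {
        "JobName": job_name,
        "DatastoreId": datastore_id,
        "InputDataConfig": {"S3Uri": input_s3_uri},
        "JobOutputDataConfig": {
            "S3Configuration": {"S3Uri": output_s3_uri, "KmsKeyId": kms_key_arn}
        },
        "DataAccessRoleArn": role_arn,
        # Idempotency token so that a retried request does not start a second job.
        "ClientToken": str(uuid.uuid4()),
    }


def start_import(client: Any, request: Dict[str, Any]) -> Tuple[str, str]:
    """Start the job, then look up its current status. Returns (job_id, status)."""
    job_id = client.start_fhir_import_job(**request)["JobId"]
    details = client.describe_fhir_import_job(
        DatastoreId=request["DatastoreId"], JobId=job_id
    )
    return job_id, details["ImportJobProperties"]["JobStatus"]


# Placeholder values; replace them with your own resources.
request = build_import_request(
    datastore_id="(datastoreId)",
    input_s3_uri="s3://inputS3Bucket/healthlake-input/",
    output_s3_uri="s3://outputS3Bucket/",
    kms_key_arn="arn:aws:kms:us-east-1:012345678910:key/d330e7fc-b56c-4216-a250-f4c43ef46e83",
    role_arn="arn:aws:iam::012345678910:role/ExampleHealthLakeImportRole",  # hypothetical
    job_name="example-import",  # hypothetical
)
print(json.dumps(request, indent=2))

# With boto3 installed and credentials configured, the call would look like:
#   import boto3
#   job_id, status = start_import(boto3.client("healthlake"), request)
```

Keeping the request construction separate from the network call makes it possible to review or log the parameters before any job is started.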

Importing files by using the console

To import files (console), use the following steps.
  1. Upload your data into an Amazon S3 bucket.

  2. To start a new import job, identify the Amazon S3 bucket, and either create or identify the IAM role and the KMS key you want to use. To learn more about IAM roles and trust policies, see IAM Roles. To learn more about using KMS keys, see AWS Key Management Service.

  3. To see the status of your import job, use ListFHIRImportJobs. For more details on the ListFHIRImportJobs API command, see ListFHIRImportJobs in the AWS HealthLake API Reference.
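
The status check in the last step can also be scripted. Because only one import or export job can run at a time per data store, it can be useful to look for active import jobs before starting another one. The following Python sketch assumes a boto3 HealthLake client whose list_fhir_import_jobs method returns an ImportJobPropertiesList and an optional NextToken. The set of status names treated as active is an assumption to verify against the API reference, and the stub client exists only to make the example runnable without AWS access.

```python
from typing import Any, Dict, List

# Assumption: these JobStatus values mean the job has not finished yet.
ACTIVE_STATUSES = {"SUBMITTED", "QUEUED", "IN_PROGRESS"}


def list_import_jobs(client: Any, datastore_id: str) -> List[Dict[str, Any]]:
    """Collect every import job for a data store, following NextToken pagination."""
    jobs: List[Dict[str, Any]] = []
    token = None
    while True:
        kwargs = {"DatastoreId": datastore_id}
        if token:
            kwargs["NextToken"] = token
        page = client.list_fhir_import_jobs(**kwargs)
        jobs.extend(page.get("ImportJobPropertiesList", []))
        token = page.get("NextToken")
        if not token:
            return jobs


def active_import_jobs(client: Any, datastore_id: str) -> List[Dict[str, Any]]:
    """Return only the jobs whose status is in ACTIVE_STATUSES."""
    return [
        job for job in list_import_jobs(client, datastore_id)
        if job.get("JobStatus") in ACTIVE_STATUSES
    ]


class StubClient:
    """Hypothetical stand-in for boto3.client("healthlake") that serves two pages."""

    def list_fhir_import_jobs(self, DatastoreId: str, NextToken: str = None):
        if NextToken is None:
            return {
                "ImportJobPropertiesList": [{"JobId": "a", "JobStatus": "COMPLETED"}],
                "NextToken": "page-2",
            }
        return {"ImportJobPropertiesList": [{"JobId": "b", "JobStatus": "IN_PROGRESS"}]}


running = active_import_jobs(StubClient(), "(datastoreId)")
print([job["JobId"] for job in running])
```

With real credentials, pass boto3.client("healthlake") in place of StubClient().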

IAM policies for import jobs

The IAM role that calls the AWS HealthLake API operations must have a policy that grants access to the Amazon S3 buckets containing the input files. It must also be assigned a trust relationship that enables HealthLake to assume the role. To learn more about IAM roles and trust policies, see IAM Roles.

The role must have the following policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketPublicAccessBlock",
                "s3:GetEncryptionConfiguration"
            ],
            "Resource": [
                "arn:aws:s3:::inputS3Bucket",
                "arn:aws:s3:::outputS3Bucket"
            ],
            "Effect": "Allow"
        },
        {
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::inputS3Bucket/*"
            ],
            "Effect": "Allow"
        },
        {
            "Action": [
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::outputS3Bucket/*"
            ],
            "Effect": "Allow"
        },
        {
            "Action": [
                "kms:DescribeKey",
                "kms:GenerateDataKey*"
            ],
            "Resource": [
                "arn:aws:kms:us-east-1:012345678910:key/d330e7fc-b56c-4216-a250-f4c43ef46e83"
            ],
            "Effect": "Allow"
        }
    ]
}
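
If you manage several data stores, you might generate this policy from your own bucket names and key ARN instead of editing the JSON by hand. The following Python sketch reproduces the statements above for arbitrary inputs. The build_import_policy helper is hypothetical, and the values passed to it are the placeholders from the example policy.

```python
import json
from typing import Any, Dict


def build_import_policy(input_bucket: str, output_bucket: str,
                        kms_key_arn: str) -> Dict[str, Any]:
    """Build the import-job permissions policy for the given buckets and KMS key."""
    input_arn = f"arn:aws:s3:::{input_bucket}"
    output_arn = f"arn:aws:s3:::{output_bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Bucket-level checks on both buckets.
                "Action": [
                    "s3:ListBucket",
                    "s3:GetBucketPublicAccessBlock",
                    "s3:GetEncryptionConfiguration",
                ],
                "Resource": [input_arn, output_arn],
                "Effect": "Allow",
            },
            {   # Read the input files.
                "Action": ["s3:GetObject"],
                "Resource": [f"{input_arn}/*"],
                "Effect": "Allow",
            },
            {   # Write the job output.
                "Action": ["s3:PutObject"],
                "Resource": [f"{output_arn}/*"],
                "Effect": "Allow",
            },
            {   # Use the KMS key for encryption.
                "Action": ["kms:DescribeKey", "kms:GenerateDataKey*"],
                "Resource": [kms_key_arn],
                "Effect": "Allow",
            },
        ],
    }


policy = build_import_policy(
    "inputS3Bucket",
    "outputS3Bucket",
    "arn:aws:kms:us-east-1:012345678910:key/d330e7fc-b56c-4216-a250-f4c43ef46e83",
)
print(json.dumps(policy, indent=4))
```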

The role must have the following trust relationship.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "healthlake.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "(accountId)"
                },
                "ArnEquals": {
                    "aws:SourceArn": "arn:aws:healthlake:(region):(accountId):datastore/fhir/(datastoreId)"
                }
            }
        }
    ]
}
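
The placeholders in the trust relationship can be filled in programmatically, which also guarantees that the result is well-formed JSON. The following Python sketch is one way to do that. The build_trust_policy helper and the account ID, Region, and data store ID passed to it are hypothetical example values.

```python
import json
from typing import Any, Dict


def build_trust_policy(account_id: str, region: str, datastore_id: str) -> Dict[str, Any]:
    """Build the trust relationship that lets HealthLake assume the import role."""
    source_arn = f"arn:aws:healthlake:{region}:{account_id}:datastore/fhir/{datastore_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": ["healthlake.amazonaws.com"]},
                "Action": "sts:AssumeRole",
                # Restrict the role to requests made for this account and data store.
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": account_id},
                    "ArnEquals": {"aws:SourceArn": source_arn},
                },
            }
        ],
    }


# Hypothetical example values; substitute your own.
trust_policy = build_trust_policy("123456789012", "us-west-2", "exampledatastoreid")
print(json.dumps(trust_policy, indent=4))
```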