Creating Amazon OpenSearch Service data source integrations with Amazon S3

You can create a new Amazon S3 direct-query data source for OpenSearch Service through the AWS Management Console or the API. Each new data source uses the AWS Glue Data Catalog to manage tables that represent Amazon S3 buckets.

Prerequisites

Before you can create a data source, you must have an OpenSearch Service domain running OpenSearch version 2.13 or later. For instructions on setting this up, see Creating OpenSearch Service domains.

Set up a new direct-query data source

You can set up a direct-query data source on a domain with the AWS Management Console or the OpenSearch Service API.

  1. Navigate to the Amazon OpenSearch Service console at https://console.aws.amazon.com/aos/.

  2. In the left navigation pane, choose Domains.

  3. Select the domain that you want to set up a new data source for. This opens the domain details page. Choose the Connections tab below the general domain details and find the Direct query section.

  4. Choose Create.

  5. On the data source creation page, enter a name for your new data source. Under Data source type, choose Amazon S3. Choose an existing IAM role that restricts what can be accessed in the AWS Glue Data Catalog and Amazon S3.

  6. Choose Create. This opens the data source details screen with an OpenSearch Dashboards URL. You can navigate to this URL to complete the next steps.

To use the API instead of the console, call the AddDataSource API operation to create a new data source in your domain.

    POST https://es.region.amazonaws.com/2021-01-01/opensearch/domain/domain-name/dataSource
    {
        "DataSourceType": {
            "s3GlueDataCatalog": {
                "RoleArn": "arn:aws:iam::account-id:role/Admin"
            }
        },
        "Description": "data-source-description",
        "Name": "my-data-source"
    }
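As a rough illustration, the request above can be assembled programmatically. The following Python sketch only builds the URL and JSON body; the function name and the example values (Region, domain name, account ID) are hypothetical, the key names simply mirror the sample request, and a real call would still need to be signed with AWS Signature Version 4 before it is sent.

```python
import json


def build_add_data_source_request(region, domain_name, name, role_arn, description=""):
    """Return (method, url, body) for an AddDataSource request.

    Hypothetical helper: it only assembles the request and does not sign or send it.
    """
    url = (
        f"https://es.{region}.amazonaws.com"
        f"/2021-01-01/opensearch/domain/{domain_name}/dataSource"
    )
    body = {
        # Key names mirror the sample request in this section.
        "DataSourceType": {"s3GlueDataCatalog": {"RoleArn": role_arn}},
        "Description": description,
        "Name": name,
    }
    return "POST", url, json.dumps(body)


# Example values are placeholders, not real resources.
method, url, body = build_add_data_source_request(
    region="us-east-1",
    domain_name="my-domain",
    name="my-data-source",
    role_arn="arn:aws:iam::123456789012:role/Admin",
    description="data-source-description",
)
print(method, url)
print(body)
```

Building the body with a JSON library rather than by hand avoids syntax slips such as a missing comma between top-level fields.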

The following sample policy demonstrates the least-privilege permissions required to create and manage a data source. If you have broader permissions, such as s3:* or the AdministratorAccess policy, these permissions encompass the least-privilege permissions in the sample policy.

The integration needs access to write to Amazon S3 and AWS Glue Data Catalog. For Amazon S3, we need write access to maintain a checkpoint location when building accelerations. For AWS Glue Data Catalog, we need write access to manage databases, tables, and partitions needed for the integration from within OpenSearch Service.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "HttpActionsForOpenSearchDomain",
                "Effect": "Allow",
                "Action": "es:ESHttp*",
                "Resource": "arn:aws:es:<region>:<account>:domain/<domain_name>/*"
            },
            {
                "Sid": "AmazonOpenSearchS3GlueDirectQueryReadAllS3Buckets",
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:ListBucket"
                ],
                "Condition": {
                    "StringEquals": {
                        "aws:ResourceAccount": "<account>"
                    }
                },
                "Resource": "*"
            },
            {
                "Sid": "AmazonOpenSearchDirectQueryGlueCreateAccess",
                "Effect": "Allow",
                "Action": [
                    "glue:CreateDatabase",
                    "glue:CreatePartition",
                    "glue:CreateTable",
                    "glue:BatchCreatePartition"
                ],
                "Resource": "*"
            },
            {
                "Sid": "AmazonOpenSearchS3GlueDirectQueryModifyAllGlueResources",
                "Effect": "Allow",
                "Action": [
                    "glue:DeleteDatabase",
                    "glue:DeletePartition",
                    "glue:DeleteTable",
                    "glue:GetDatabase",
                    "glue:GetDatabases",
                    "glue:GetPartition",
                    "glue:GetPartitions",
                    "glue:GetTable",
                    "glue:GetTableVersions",
                    "glue:GetTables",
                    "glue:UpdateDatabase",
                    "glue:UpdatePartition",
                    "glue:UpdateTable",
                    "glue:BatchGetPartition",
                    "glue:BatchDeletePartition",
                    "glue:BatchDeleteTable"
                ],
                "Resource": [
                    "arn:aws:glue:us-east-1:<account>:table/*",
                    "arn:aws:glue:us-east-1:<account>:database/*",
                    "arn:aws:glue:us-east-1:<account>:catalog"
                ],
                "Condition": {
                    "StringEquals": {
                        "aws:ResourceAccount": "<account>"
                    }
                }
            },
            {
                "Sid": "ReadAndWriteActionsForS3CheckpointBucket",
                "Effect": "Allow",
                "Action": [
                    "s3:ListMultipartUploadParts",
                    "s3:DeleteObject",
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:GetBucketLocation",
                    "s3:ListBucket"
                ],
                "Condition": {
                    "StringEquals": {
                        "aws:ResourceAccount": "<account>"
                    }
                },
                "Resource": [
                    "arn:aws:s3:::<checkpoint_bucket_name>",
                    "arn:aws:s3:::<checkpoint_bucket_name>/*"
                ]
            }
        ]
    }
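To show how the placeholders in the sample policy fit together, the following Python sketch generates only the checkpoint bucket statement from an account ID and a bucket name. The function name and the example values are hypothetical; the Sid, actions, and condition are copied from the sample policy.

```python
import json


def checkpoint_bucket_statement(account_id, bucket_name):
    """Build the checkpoint bucket statement from the sample policy (hypothetical helper)."""
    return {
        "Sid": "ReadAndWriteActionsForS3CheckpointBucket",
        "Effect": "Allow",
        "Action": [
            "s3:ListMultipartUploadParts",
            "s3:DeleteObject",
            "s3:GetObject",
            "s3:PutObject",
            "s3:GetBucketLocation",
            "s3:ListBucket",
        ],
        "Condition": {"StringEquals": {"aws:ResourceAccount": account_id}},
        "Resource": [
            # Bucket ARN: needed for bucket-level actions such as s3:ListBucket.
            f"arn:aws:s3:::{bucket_name}",
            # Object ARN: needed for object-level actions such as s3:PutObject.
            f"arn:aws:s3:::{bucket_name}/*",
        ],
    }


# Example values are placeholders, not real resources.
statement = checkpoint_bucket_statement("123456789012", "my-checkpoint-bucket")
print(json.dumps(statement, indent=4))
```

Both resource entries are needed because the statement mixes bucket-level and object-level actions.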

To support Amazon S3 buckets in different accounts, add a condition to the Amazon S3 policy statement that specifies the appropriate account.

    "Condition": {
        "StringEquals": {
            "aws:ResourceAccount": "{{accountId}}"
        }
    }

The role must also have the following trust policy, which allows the OpenSearch Service direct query service principal to assume the role.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": "directquery.opensearchservice.amazonaws.com"
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }
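If you create the role from a script, the trust policy can be generated and serialized as shown in the following minimal Python sketch. The function name is hypothetical; the policy content is the same as the sample above.

```python
import json

# Service principal taken from the sample trust policy.
DIRECT_QUERY_PRINCIPAL = "directquery.opensearchservice.amazonaws.com"


def direct_query_trust_policy():
    """Return the trust policy document as a dict (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": DIRECT_QUERY_PRINCIPAL},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# The serialized string is the form expected wherever a policy document
# is passed as text, for example when creating the role.
trust_policy_json = json.dumps(direct_query_trust_policy(), indent=4)
print(trust_policy_json)
```

The resulting string could, for example, be supplied as the AssumeRolePolicyDocument parameter of the IAM CreateRole operation.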

For instructions to create the role, see Creating a role using custom trust policies.

If you have fine-grained access control enabled in OpenSearch Service, a new OpenSearch fine-grained access control role will automatically be created for your data source. The name of the new fine-grained access control role will be AWSOpenSearchDirectQuery <name of data source>.

By default, the role has access to direct query data source indexes only. Although you can configure the role to limit or grant access to your data source, we recommend that you don't adjust the access of this role. If you delete the data source, this role is also deleted, which removes access for any users who are mapped to the role.

Map the AWS Glue Data Catalog role (if fine-grained access control is enabled after creating data source)

If you enabled fine-grained access control after creating a data source, you must map non-admin users to an IAM role with AWS Glue Data Catalog access so that they can run direct queries. To manually create a backend glue_access role that you can map to the IAM role, perform the following steps:

Note

Every query against the data source uses a request index and a result index. A user with read access to the request index for a given data source can read all queries against that data source. A user with read access to the result index can read results for all queries against that data source.

  1. From the main menu in OpenSearch Dashboards, choose Security, Roles, and Create roles.

  2. Name the role glue_access.

  3. For Cluster permissions, select indices:data/write/bulk*, indices:data/read/scroll, indices:data/read/scroll/clear.

  4. For Index, enter the following indexes that users with this role should be able to access:

    • .query_execution_request_<name of data source>

    • query_execution_result_<name of data source>

    • flint_*

  5. For Index permissions, select indices_all.

  6. Choose Create.

  7. Choose Mapped users, Manage mapping.

  8. Under Backend roles, add the ARN of the AWS Glue role that needs permission to call your domain.

    arn:aws:iam::account-id:role/role-name

  9. Select Map and confirm the role shows up under Mapped users.

For more information on mapping roles, see Mapping roles to users.
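The role and mapping created in the steps above can also be expressed as request bodies for the OpenSearch Security plugin REST API. The following Python sketch is an assumption about how the console steps translate to that API, not a procedure taken from this guide: the function names and example values are hypothetical, and the sketch only prints the requests rather than sending them.

```python
import json


def glue_access_role(data_source_name):
    """Role body with the permissions listed in steps 3 to 5 (hypothetical helper)."""
    return {
        "cluster_permissions": [
            "indices:data/write/bulk*",
            "indices:data/read/scroll",
            "indices:data/read/scroll/clear",
        ],
        "index_permissions": [
            {
                "index_patterns": [
                    f".query_execution_request_{data_source_name}",
                    f"query_execution_result_{data_source_name}",
                    "flint_*",
                ],
                "allowed_actions": ["indices_all"],
            }
        ],
    }


def glue_access_mapping(iam_role_arn):
    """Role mapping body that maps the IAM role ARN as a backend role (step 8)."""
    return {"backend_roles": [iam_role_arn]}


# Example values are placeholders, not real resources.
role_body = glue_access_role("my-data-source")
mapping_body = glue_access_mapping("arn:aws:iam::123456789012:role/role-name")

print("PUT _plugins/_security/api/roles/glue_access")
print(json.dumps(role_body, indent=2))
print("PUT _plugins/_security/api/rolesmapping/glue_access")
print(json.dumps(mapping_body, indent=2))
```

Note that a PUT to the role mapping endpoint replaces any existing mapping for the role, so existing backend roles would need to be included in the body.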

Next steps

After you create a data source, OpenSearch Service provides you with an OpenSearch Dashboards URL. Use this URL to configure access control, define tables, set up log-type based dashboards for popular log types, and query your data.