Creating Amazon OpenSearch Service data source integrations with Amazon S3 - Amazon OpenSearch Service

This is prerelease documentation for Amazon OpenSearch Service direct queries with Amazon S3, which is in preview release. The documentation and the feature are both subject to change. We recommend that you use this feature only in test environments, and not in production environments. For preview terms and conditions, see Betas and Previews in AWS Service Terms.

You can create a new Amazon S3 direct-query data source for OpenSearch Service through the AWS Management Console or the OpenSearch Service API. Each new data source uses the AWS Glue Data Catalog to manage tables that represent data stored in your Amazon S3 buckets.

Prerequisites

Before you create a data source, you must have the following:

  • An OpenSearch Service domain running OpenSearch version 2.11 or later

For setup instructions, see Creating OpenSearch Service domains and Getting started with the AWS Glue Data Catalog.

Required permissions

To create a data source, your user or role must have an attached identity-based policy with the appropriate IAM permissions. The following sample policy demonstrates the least-privilege permissions required to create and manage a data source. Note that if you have broader permissions, such as s3:* or the AdministratorAccess policy, these permissions encompass the least-privilege permissions in the sample policy.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "es:ESHttp*",
        "es:AddDataSource",
        "es:DeleteDataSource",
        "es:GetDataSource",
        "es:ListDataSource",
        "es:UpdateDataSource",
        "s3:Get*",
        "s3:List*",
        "s3:Put*",
        "s3:Describe*",
        "glue:*"
      ],
      "Resource": [
        "arn:aws:s3:::bucket-name",
        "arn:aws:s3:::bucket-name/*",
        "arn:aws:glue:us-east-1:{aws-account-id}:database/*"
      ]
    },
    {
      "Sid": "GlueCreateAndReadDataCatalog",
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:CreateDatabase",
        "glue:GetDatabases",
        "glue:CreateTable",
        "glue:GetTable",
        "glue:UpdateTable",
        "glue:DeleteTable",
        "glue:GetTables",
        "glue:GetPartition",
        "glue:GetPartitions",
        "glue:CreatePartition",
        "glue:BatchCreatePartition",
        "glue:GetUserDefinedFunctions"
      ],
      "Resource": "*"
    }
  ]
}
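If you manage IAM policies as code, you can fill in the sample policy programmatically. The following Python sketch is a minimal illustration, not an official tool. The function name build_data_source_policy and the example bucket, Region, and account ID are hypothetical. The sketch only substitutes the bucket-name, us-east-1, and {aws-account-id} placeholders; it copies the actions unchanged from the sample policy.

```python
import json


def build_data_source_policy(bucket: str, region: str, account_id: str) -> dict:
    """Return the sample least-privilege policy with its placeholders filled in.

    Hypothetical helper: bucket, region, and account_id replace the
    bucket-name, us-east-1, and {aws-account-id} placeholders.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "es:ESHttp*",
                    "es:AddDataSource",
                    "es:DeleteDataSource",
                    "es:GetDataSource",
                    "es:ListDataSource",
                    "es:UpdateDataSource",
                    "s3:Get*",
                    "s3:List*",
                    "s3:Put*",
                    "s3:Describe*",
                    "glue:*",
                ],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                    f"arn:aws:glue:{region}:{account_id}:database/*",
                ],
            },
            {
                "Sid": "GlueCreateAndReadDataCatalog",
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase",
                    "glue:CreateDatabase",
                    "glue:GetDatabases",
                    "glue:CreateTable",
                    "glue:GetTable",
                    "glue:UpdateTable",
                    "glue:DeleteTable",
                    "glue:GetTables",
                    "glue:GetPartition",
                    "glue:GetPartitions",
                    "glue:CreatePartition",
                    "glue:BatchCreatePartition",
                    "glue:GetUserDefinedFunctions",
                ],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # Example values only; replace with your own bucket, Region, and account ID.
    policy = build_data_source_policy("my-log-bucket", "us-east-1", "111122223333")
    print(json.dumps(policy, indent=2))
```

Review the generated JSON before you attach it to a user or role, just as you would the hand-edited sample.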

The role must also have the following trust policy, which allows the OpenSearch Service direct query service principal (directquery.opensearchservice.amazonaws.com) to assume it.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "directquery.opensearchservice.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
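To sanity-check an existing role, the following Python sketch builds the trust policy shown above and tests whether a policy document lets the direct query service principal assume the role. The helper names are hypothetical. The check is deliberately simplified: it ignores Condition blocks, NotPrincipal, and wildcard actions, so it is no substitute for IAM's own policy evaluation.

```python
import json

# Service principal taken from the trust policy above.
DIRECT_QUERY_PRINCIPAL = "directquery.opensearchservice.amazonaws.com"


def direct_query_trust_policy() -> dict:
    """Trust policy that lets the direct query service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": DIRECT_QUERY_PRINCIPAL},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def trusts_direct_query(policy: dict) -> bool:
    """Simplified check: does an Allow statement grant sts:AssumeRole to the service?

    Ignores Condition, NotPrincipal, and wildcard actions (sketch only).
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        if not isinstance(principal, dict):  # e.g. "Principal": "*" is not handled here
            continue
        services = principal.get("Service", [])
        if isinstance(services, str):
            services = [services]
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRole" in actions and DIRECT_QUERY_PRINCIPAL in services:
            return True
    return False


if __name__ == "__main__":
    policy = direct_query_trust_policy()
    print(json.dumps(policy, indent=2))
    print(trusts_direct_query(policy))  # True
```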

For instructions to create the role, see Creating a role using custom trust policies.

If you have fine-grained access control enabled, OpenSearch Service automatically creates a new fine-grained access control role for your data source, named AWSOpenSearchDirectQuery_<name of data source>.

By default, the role has access to direct query data source indexes only. Although you can configure the role to limit or grant access to your data source, we recommend that you don't adjust its access. If you delete the data source, this role is also deleted, which removes the access it granted to any other users mapped to it.
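The naming rule and deletion behavior described above can be sketched in Python. The role_mappings dictionary is a hypothetical stand-in for however you track role mappings (for example, your own export of the Security plugin's mappings); the sketch only applies the AWSOpenSearchDirectQuery_<name of data source> naming convention and does not call any AWS or OpenSearch API.

```python
def direct_query_role_name(data_source_name: str) -> str:
    """Name of the fine-grained access control role created for a data source."""
    return f"AWSOpenSearchDirectQuery_{data_source_name}"


def users_losing_role(data_source_name: str, role_mappings: dict) -> list:
    """Users mapped to the data source's auto-created role, sorted by name.

    role_mappings is a hypothetical {role_name: [user, ...]} dictionary.
    Deleting the data source deletes this role, so these users lose the
    access this role granted. Other roles they hold are not examined here.
    """
    role = direct_query_role_name(data_source_name)
    return sorted(role_mappings.get(role, []))


if __name__ == "__main__":
    mappings = {
        "AWSOpenSearchDirectQuery_my-data-source": ["analyst", "alice"],
        "all_access": ["admin"],
    }
    print(direct_query_role_name("my-data-source"))
    print(users_losing_role("my-data-source", mappings))  # ['alice', 'analyst']
```

Running a check like this before you delete a data source shows which mapped users will lose the access granted by its role.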

Map the AWS Glue Data Catalog role (if fine-grained access control is enabled after creating data source)

If you enabled fine-grained access control after creating a data source, you must map non-admin users to an IAM role with AWS Glue Data Catalog access to run direct queries. To manually create a backend glue_access role that you can map to the IAM role, perform the following steps:

Note

Direct queries against a data source store their requests and results in indexes. A user with read access to the request index for a given data source can read all queries against that data source, and a user with read access to the result index can read the results of all queries against that data source.

  1. From the main menu in OpenSearch Dashboards, choose Security, Roles, and Create role.

  2. Name the role glue_access.

  3. For Cluster permissions, select indices:data/write/bulk*, indices:data/read/scroll, indices:data/read/scroll/clear.

  4. For Index, enter the following indexes that users mapped to this role need access to:

    • .query_execution_request_<name of data source>

    • query_execution_result_<name of data source>

    • flint_*

  5. For Index permissions, select indices_all.

  6. Choose Create.

  7. Choose Mapped users, Manage mapping.

  8. Under Backend roles, add the ARN of the AWS Glue role that needs permission to call your domain.

    arn:aws:iam::account-id:role/role-name
  9. Select Map and confirm the role shows up under Mapped users.

For more information about mapping roles, see Mapping roles to users.
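If you prefer to script this setup, the same role can be expressed as request bodies for the OpenSearch Security plugin REST API (PUT _plugins/_security/api/roles/glue_access and PUT _plugins/_security/api/rolesmapping/glue_access). The following Python sketch only builds those bodies. It assumes that you send them yourself as a user who is allowed to manage security roles, and the helper names, data source name, and role ARN are hypothetical. It also includes a rough check related to the Note above: a role whose index patterns match the request or result index can expose all queries or results for that data source.

```python
import json
from fnmatch import fnmatchcase


def query_indexes(data_source_name: str) -> list:
    """Request and result index names from step 4 for one data source."""
    return [
        f".query_execution_request_{data_source_name}",
        f"query_execution_result_{data_source_name}",
    ]


def glue_access_role(data_source_name: str) -> dict:
    """Role body mirroring steps 3-5 (cluster and index permissions)."""
    return {
        "cluster_permissions": [
            "indices:data/write/bulk*",
            "indices:data/read/scroll",
            "indices:data/read/scroll/clear",
        ],
        "index_permissions": [
            {
                "index_patterns": query_indexes(data_source_name) + ["flint_*"],
                "allowed_actions": ["indices_all"],
            }
        ],
    }


def glue_access_mapping(backend_role_arn: str) -> dict:
    """Role-mapping body mirroring step 8: map an IAM role ARN as a backend role."""
    return {"backend_roles": [backend_role_arn]}


def covered_query_indexes(role: dict, data_source_name: str) -> list:
    """Query indexes whose names match one of the role's index patterns.

    Approximation: shell-style wildcards via fnmatchcase; allowed_actions
    are ignored, so a match means "named by the role", not "readable".
    """
    patterns = [
        pattern
        for perm in role.get("index_permissions", [])
        for pattern in perm.get("index_patterns", [])
    ]
    return [
        index
        for index in query_indexes(data_source_name)
        if any(fnmatchcase(index, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    role = glue_access_role("my-data-source")
    # Body for PUT _plugins/_security/api/roles/glue_access
    print(json.dumps(role, indent=2))
    # Body for PUT _plugins/_security/api/rolesmapping/glue_access
    print(json.dumps(glue_access_mapping("arn:aws:iam::111122223333:role/role-name"), indent=2))
    # Both query indexes match, so mapped users can reach all queries and results.
    print(covered_query_indexes(role, "my-data-source"))
```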

Set up a new direct-query data source

You can set up a direct-query data source on a domain with the AWS Management Console or the OpenSearch Service API. To use the console, complete the following steps:

  1. Navigate to the Amazon OpenSearch Service console at https://console.aws.amazon.com/aos/.

  2. In the left navigation pane, choose Domains.

  3. Select the domain that you want to set up a new data source for. This opens the domain details page. Choose the Connections tab below the general domain details and find the Direct query section.

  4. Choose Create.

  5. On the data source creation page, enter a name for your new data source. Under Data source type, choose Amazon S3. Choose an existing IAM role that limits what the data source can access in the AWS Glue Data Catalog and Amazon S3.

  6. Choose Create. This opens the data source details screen with an OpenSearch Dashboards URL. You can navigate to this URL to complete the next steps.

To use the API instead, call the AddDataSource operation to create a new data source in your domain.

POST https://es.region.amazonaws.com/2021-01-01/opensearch/domain/domain-name/dataSource
{
  "DataSourceType": {
    "S3GlueDataCatalog": {
      "RoleArn": "arn:aws:iam::account-id:role/Admin"
    }
  },
  "Description": "data-source-description",
  "Name": "my-data-source"
}
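The AddDataSource request above can also be assembled in code. This Python sketch only builds the method, URL, and JSON body. It does not sign or send the request, which must be signed with AWS Signature Version 4 before it is sent. The function name and example values are hypothetical, and the sketch uses the member name S3GlueDataCatalog as spelled in the AddDataSource API reference.

```python
import json


def add_data_source_request(region: str, domain_name: str, name: str,
                            role_arn: str, description: str = "") -> tuple:
    """Build (method, url, body) for an AddDataSource call; no signing is done here."""
    url = (
        f"https://es.{region}.amazonaws.com/2021-01-01/opensearch/domain/"
        f"{domain_name}/dataSource"
    )
    body = {
        "DataSourceType": {"S3GlueDataCatalog": {"RoleArn": role_arn}},
        "Name": name,
    }
    if description:
        # Description is optional; omit it rather than sending an empty string.
        body["Description"] = description
    return "POST", url, json.dumps(body)


if __name__ == "__main__":
    method, url, body = add_data_source_request(
        "us-east-1",
        "my-domain",
        "my-data-source",
        "arn:aws:iam::111122223333:role/Admin",
        "data-source-description",
    )
    print(method, url)
    print(body)
```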

Next steps

After you create a data source, OpenSearch Service provides you with an OpenSearch Dashboards URL. You use this URL to configure access control, define tables, set up dashboards for popular log types, and query your data.