Step 5: Create a job that uses the OpenSearch connection

After creating a role for your ETL job, you can create a job in AWS Glue Studio that uses the connection and the Elasticsearch Spark connector.

If your job runs within an Amazon Virtual Private Cloud (Amazon VPC), make sure the VPC is configured correctly. For more information, see Configure a VPC for your ETL job.

To create a job that uses the Elasticsearch Spark Connector

  1. In AWS Glue Studio, choose Connectors.

  2. In the Your connections list, select the connection you just created and choose Create job.

  3. In the visual job editor, choose the Data source node. On the right, on the Data source properties - Connector tab, configure additional information for the connector.

    1. Choose Add schema and enter the schema of the data set in the data source. Connections do not use tables stored in the Data Catalog, which means that AWS Glue Studio doesn't know the schema of the data. You must manually provide this schema information. For instructions on how to use the schema editor, see Editing the schema in a custom transform node.

    2. Expand Connection options.

    3. Choose Add new option and enter the information needed for the connector that was not entered in the AWS secret:

      • es.nodes : https://<ElasticSearch endpoint>

      • es.port : 443

      • path : test

      • es.nodes.wan.only : true

      
        [Screenshot: the Data source node of the job graph is selected. The Data source properties tab on the right shows the connection MyEsConn, with the connection options listed above added as key-value pairs.]

      For an explanation of these connection options, refer to: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html.
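As a sketch, the key-value pairs above map directly to the options a Glue PySpark script would pass to the connector. The helper below is hypothetical (not part of any AWS API), and the endpoint value is a placeholder:

```python
# Hypothetical helper (not an AWS Glue API) that assembles the connection
# options shown above. The endpoint value is a placeholder, not a real cluster.
def build_es_source_options(endpoint, index_path):
    """Return the key-value pairs entered under Connection options."""
    return {
        "es.nodes": "https://" + endpoint,  # Elasticsearch endpoint over HTTPS
        "es.port": "443",                   # HTTPS port
        "path": index_path,                 # index to read, e.g. "test"
        "es.nodes.wan.only": "true",        # connect only to the declared nodes
    }

options = build_es_source_options("my-elasticsearch-endpoint.example.com", "test")
```

In a Glue script these pairs would typically be supplied as the connection options of the data source; in the visual editor you enter them one at a time, as described above.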

  4. Add a target node to the graph as described in Adding nodes to the job diagram and Editing the data target node.

    Your data target can be Amazon S3, or it can use information from an AWS Glue Data Catalog or a connector to write data in a different location. For example, you can use a Data Catalog table to write to a database in Amazon RDS, or you can use a connector as your data target to write to data stores that are not natively supported in AWS Glue.

    
      [Screenshot: a Join transform node and a Data target node for the ElasticSearch Spark Connector (selected). The Node properties tab on the right shows Name "ElasticSearch Spark Connector" and Node type ElasticSearch Spark Connector; the Node type drop-down lists the available data targets, including S3, Data Catalog, AWS Glue Connector for Google BigQuery, Apache Hudi Connector, and ElasticSearch Spark Connector (selected).]

    If you choose a connector for your data target, you must choose a connection created for that connector. Also, if required by the connector provider, you must add options to provide additional information to the connector. If you use a connection that contains information for an AWS secret, then you don't need to provide the user name and password in the connection options.
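For the write side, the same idea can be sketched with a hypothetical helper. The credential options are included only for the case where the connection's AWS secret does not supply them; every value below is a placeholder:

```python
# Hypothetical helper (not an AWS Glue API) for assembling data target options.
def build_es_target_options(endpoint, index_path, user=None, password=None):
    opts = {
        "es.nodes": "https://" + endpoint,  # Elasticsearch endpoint over HTTPS
        "es.port": "443",                   # HTTPS port
        "path": index_path,                 # index to write, e.g. "es_write_loc"
        "es.nodes.wan.only": "true",        # connect only to the declared nodes
    }
    # Only needed when the connection's AWS secret does not hold credentials.
    if user is not None:
        opts["es.net.http.auth.user"] = user
    if password is not None:
        opts["es.net.http.auth.pass"] = password
    return opts

write_opts = build_es_target_options(
    "search-example-endpoint.example.com", "es_write_loc",
    user="MyUser", password="example-password",  # placeholders only
)
```

When the connection's secret already carries the user name and password, you would omit the last two arguments and enter only the remaining options in the editor.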

    
      [Screenshot: four nodes of a job graph, an ElasticSearch source node, a Data Catalog source node, a Join transform node, and an ElasticSearch Data target node (selected). The Data target properties tab on the right shows the connection MyEsConn. Under Connection options, additional options have been added: (es.net.http.auth.user, MyUser), (path, es_write_loc), (es.nodes.wan.only, true), (es.nodes, https://search-glue-etl-job-vtr...), (es.net.http.auth.pass, HiddenPassword), and (es.port, 443).]
  5. Optionally, add additional data sources and one or more transform nodes as described in Editing the data transform node.

  6. Configure the job properties as described in Modify the job properties, starting with step 3, and save the job.

Next step

Step 6: Run the job