Providing your own custom scripts
Scripts perform the extract, transform, and load (ETL) work in AWS Glue. AWS Glue creates a script when it automatically generates the source code logic for a job. You can either edit this generated script or provide your own custom script.
Different versions of AWS Glue support different versions of Apache Spark. Your custom script must be compatible with the supported Apache Spark version. For information about AWS Glue versions, see the Glue version job property.
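One way to make the Spark compatibility requirement explicit is to have the custom script check the runtime version before it does any work. The following is a minimal sketch, not part of AWS Glue itself: the helper names and the `EXPECTED_SPARK` value are hypothetical placeholders. In a real job script, the version string would come from the live session (PySpark exposes it as `spark.version`).

```python
def spark_version_matches(actual: str, expected: str) -> bool:
    """Return True when `actual` starts with the components of `expected`.

    The comparison is component-wise, so "2.4" matches "2.4.3"
    but does not match "2.40.0".
    """
    actual_parts = actual.strip().split(".")
    expected_parts = expected.strip().split(".")
    return actual_parts[: len(expected_parts)] == expected_parts


def require_spark_version(actual: str, expected: str) -> None:
    """Fail early instead of partway through the ETL run."""
    if not spark_version_matches(actual, expected):
        raise RuntimeError(
            f"Script expects Spark {expected} but is running on Spark {actual}"
        )


# Placeholder value (an assumption for illustration): replace it with the
# Spark version supported by the AWS Glue version your job uses.
EXPECTED_SPARK = "2.4"

# In a job script this call would be:
#   require_spark_version(spark.version, EXPECTED_SPARK)
require_spark_version("2.4.3", EXPECTED_SPARK)
```

Checking only the leading version components keeps the guard tolerant of patch releases while still catching a major or minor version mismatch.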
To provide your own custom script in AWS Glue, follow these general steps:
- Sign in to the AWS Management Console and open the AWS Glue console at https://console.aws.amazon.com/glue/.
- Choose the Jobs tab, and then choose Add job to start the Add job wizard.
- In the Job properties screen, choose the IAM role that is required for your custom script to run. For more information, see Identity and access management for AWS Glue.
- Under This job runs, choose one of the following:
  - An existing script that you provide
  - A new script to be authored by you
- Choose any connections that your script references. These objects are needed to connect to the necessary JDBC data stores.
  An elastic network interface is a virtual network interface that you can attach to an instance in a virtual private cloud (VPC). Choose the elastic network interface that is required to connect to the data store that's used in the script.
- If your script requires additional libraries or files, you can specify them as follows:
  - Python library path: Comma-separated Amazon Simple Storage Service (Amazon S3) paths to Python libraries that are required by the script.
    Note: Only pure Python libraries can be used. Libraries that rely on C extensions, such as the pandas Python Data Analysis Library, are not yet supported.
  - Dependent jars path: Comma-separated Amazon S3 paths to JAR files that are required by the script.
    Note: Currently, only pure Java or Scala (2.11) libraries can be used.
  - Referenced files path: Comma-separated Amazon S3 paths to additional files (for example, configuration files) that are required by the script.
- If you want, you can add a schedule to your job. To change a schedule, you must delete the existing schedule and add a new one.
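The console steps above can also be expressed as API requests. The following is a minimal sketch, assuming the boto3 Glue client, whose `create_job` and `create_trigger` calls accept keyword arguments shaped like these dictionaries. The helper functions, job name, role name, connection name, bucket paths, and cron expression are all hypothetical. The three library fields correspond to the `--extra-py-files`, `--extra-jars`, and `--extra-files` job parameters. The code only builds the requests, so it runs without AWS credentials.

```python
from typing import Dict, List, Optional


def join_s3_paths(paths: List[str]) -> str:
    """Join S3 paths into the comma-separated form the job fields expect."""
    cleaned = [p.strip() for p in paths if p.strip()]
    for path in cleaned:
        if not path.startswith("s3://"):
            raise ValueError(f"not an Amazon S3 path: {path}")
    return ",".join(cleaned)


def build_job_request(
    name: str,
    role: str,
    script_location: str,
    connections: Optional[List[str]] = None,
    python_libs: Optional[List[str]] = None,
    jars: Optional[List[str]] = None,
    files: Optional[List[str]] = None,
) -> Dict:
    """Build keyword arguments for a Glue create_job call (hypothetical helper)."""
    default_args: Dict[str, str] = {}
    if python_libs:
        default_args["--extra-py-files"] = join_s3_paths(python_libs)  # Python library path
    if jars:
        default_args["--extra-jars"] = join_s3_paths(jars)  # Dependent jars path
    if files:
        default_args["--extra-files"] = join_s3_paths(files)  # Referenced files path

    request: Dict = {
        "Name": name,
        "Role": role,
        # "An existing script that you provide": point at the script in Amazon S3.
        "Command": {"Name": "glueetl", "ScriptLocation": script_location},
        "DefaultArguments": default_args,
    }
    if connections:
        request["Connections"] = {"Connections": list(connections)}
    return request


def build_schedule_request(job_name: str, cron_expression: str) -> Dict:
    """Build keyword arguments for a scheduled create_trigger call (hypothetical helper)."""
    return {
        "Name": f"{job_name}-schedule",
        "Type": "SCHEDULED",
        "Schedule": cron_expression,
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


# All names and paths below are placeholders for illustration only.
job = build_job_request(
    name="my-custom-job",
    role="MyGlueServiceRole",
    script_location="s3://my-bucket/scripts/my_script.py",
    connections=["my-jdbc-connection"],
    python_libs=["s3://my-bucket/libs/helpers.zip"],
    files=["s3://my-bucket/config/settings.json"],
)
schedule = build_schedule_request("my-custom-job", "cron(0 12 * * ? *)")

# With boto3 installed and credentials configured, the requests could be sent as:
#   import boto3
#   glue = boto3.client("glue")
#   glue.create_job(**job)
#   glue.create_trigger(**schedule)
```

Keeping request construction separate from the API calls makes it possible to inspect or unit test the job definition before anything is created in your account.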
For more information about adding jobs in AWS Glue, see Adding jobs in AWS Glue.
For step-by-step guidance, see the Add job tutorial in the AWS Glue console.