There are more AWS SDK examples available in the AWS Doc SDK Examples
AWS Glue examples using AWS CLI with Bash script
The following code examples show you how to perform actions and implement common scenarios by using the AWS Command Line Interface with Bash script with AWS Glue.
Scenarios are code examples that show you how to accomplish specific tasks by calling multiple functions within a service or combined with other AWS services.
Each example includes a link to the complete source code, where you can find instructions on how to set up and run the code in context.
Topics
Scenarios
The following code example shows how to:
Create a database
Create a table
Clean up resources
- AWS CLI with Bash script
-
Note
There's more on GitHub. Find the complete example and learn how to set up and run in the Sample developer tutorials
repository. #!/bin/bash # AWS Glue Data Catalog Tutorial Script # This script demonstrates how to create and manage AWS Glue Data Catalog resources using the AWS CLI # Cost improvements: Reduced API calls, optimized queries, eliminated redundant operations # Reliability improvements: Enhanced error handling, input validation, resource tracking set -euo pipefail # Setup logging LOG_FILE="glue-tutorial-$(date +%Y%m%d-%H%M%S).log" exec > >(tee -a "$LOG_FILE") 2>&1 echo "Starting AWS Glue Data Catalog tutorial script at $(date)" echo "All operations will be logged to $LOG_FILE" # Generate a unique identifier for resource names UNIQUE_ID=$(openssl rand -hex 4) DB_NAME="tutorial-db-${UNIQUE_ID}" TABLE_NAME="flights-data-${UNIQUE_ID}" TABLE_INPUT_FILE="table-input-${UNIQUE_ID}.json" # Track created resources declare -a CREATED_RESOURCES=() # Set default region if not provided AWS_REGION="${AWS_REGION:-us-east-1}" # Flag to track if database was successfully created DATABASE_CREATED=false # Trap to ensure cleanup on exit trap cleanup_resources EXIT # Function to check command status check_status() { if [ $? -ne 0 ]; then echo "ERROR: $1 failed." >&2 exit 1 fi } # Function to cleanup resources cleanup_resources() { local exit_code=$? echo "Attempting to clean up resources..." # Delete resources in reverse order for ((i=${#CREATED_RESOURCES[@]}-1; i>=0; i--)); do resource=${CREATED_RESOURCES[$i]} resource_type=$(echo "$resource" | cut -d':' -f1) resource_name=$(echo "$resource" | cut -d':' -f2) echo "Deleting $resource_type: $resource_name" case $resource_type in "table") if [ "$DATABASE_CREATED" = true ]; then aws glue delete-table \ --database-name "$DB_NAME" \ --name "$resource_name" \ --region "$AWS_REGION" \ 2>/dev/null || echo "Warning: Failed to delete table $resource_name" fi ;; "database") aws glue delete-database \ --name "$resource_name" \ --region "$AWS_REGION" \ 2>/dev/null || echo "Warning: Failed to delete database $resource_name" ;; *) echo "Unknown resource type: $resource_type" >&2 ;; esac done # Clean up temporary files securely if [ -f "$TABLE_INPUT_FILE" ]; then if command -v shred &> /dev/null; then shred -vfz -n 3 "$TABLE_INPUT_FILE" 2>/dev/null || rm -f "$TABLE_INPUT_FILE" else rm -f "$TABLE_INPUT_FILE" fi fi echo "Cleanup completed." exit $exit_code } # Function to validate prerequisites validate_prerequisites() { # Validate AWS CLI is available if ! command -v aws &> /dev/null; then echo "ERROR: AWS CLI is not installed or not in PATH" >&2 exit 1 fi # Validate AWS CLI version local AWS_CLI_VERSION AWS_CLI_VERSION=$(aws --version 2>&1 | cut -d' ' -f1 | cut -d'/' -f2 | cut -d'.' -f1) if [ "$AWS_CLI_VERSION" -lt 1 ]; then echo "ERROR: AWS CLI is required" >&2 exit 1 fi # Validate jq is available for JSON validation if ! command -v jq &> /dev/null; then echo "ERROR: jq is not installed or not in PATH" >&2 exit 1 fi # Validate AWS credentials and get account identity in single call (cost optimization) local CALLER_IDENTITY CALLER_IDENTITY=$(aws sts get-caller-identity --region "$AWS_REGION" --query 'Account' --output text 2>/dev/null) || { echo "ERROR: Failed to get AWS caller identity. Check credentials and permissions." >&2 exit 1 } if [ -z "$CALLER_IDENTITY" ] || [ "$CALLER_IDENTITY" == "None" ]; then echo "ERROR: Unable to determine AWS account identity" >&2 exit 1 fi echo "Using AWS Account: $CALLER_IDENTITY" echo "Using Region: $AWS_REGION" } # Function to create database with verification create_database() { echo "Step 1: Creating a database named $DB_NAME" if ! aws glue create-database \ --database-input "Name=$DB_NAME,Description=Database for AWS Glue tutorial" \ --region "$AWS_REGION" \ --output json > /dev/null 2>&1; then echo "ERROR: Failed to create database $DB_NAME" >&2 exit 1 fi DATABASE_CREATED=true CREATED_RESOURCES+=("database:$DB_NAME") echo "Database $DB_NAME created successfully." } # Function to prepare table input JSON prepare_table_input() { # Create a temporary JSON file for table input with restricted permissions if ! touch "$TABLE_INPUT_FILE" 2>/dev/null; then echo "ERROR: Failed to create temporary file $TABLE_INPUT_FILE" >&2 exit 1 fi if ! chmod 600 "$TABLE_INPUT_FILE" 2>/dev/null; then echo "ERROR: Failed to set permissions on $TABLE_INPUT_FILE" >&2 rm -f "$TABLE_INPUT_FILE" exit 1 fi cat > "$TABLE_INPUT_FILE" << 'EOF' { "Name": "TABLE_NAME_PLACEHOLDER", "StorageDescriptor": { "Columns": [ { "Name": "year", "Type": "bigint" }, { "Name": "quarter", "Type": "bigint" } ], "Location": "s3://crawler-public-us-west-2/flight/2016/csv", "InputFormat": "org.apache.hadoop.mapred.TextInputFormat", "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat", "Compressed": false, "NumberOfBuckets": -1, "SerdeInfo": { "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe", "Parameters": { "field.delim": ",", "serialization.format": "," } } }, "PartitionKeys": [ { "Name": "mon", "Type": "string" } ], "TableType": "EXTERNAL_TABLE", "Parameters": { "EXTERNAL": "TRUE", "classification": "csv", "columnsOrdered": "true", "compressionType": "none", "delimiter": ",", "skip.header.line.count": "1", "typeOfData": "file" } } EOF # Replace placeholder with actual table name if ! sed -i "s/TABLE_NAME_PLACEHOLDER/$TABLE_NAME/g" "$TABLE_INPUT_FILE" 2>/dev/null; then echo "ERROR: Failed to substitute table name in JSON file" >&2 rm -f "$TABLE_INPUT_FILE" exit 1 fi # Validate JSON syntax before using it if ! jq empty "$TABLE_INPUT_FILE" 2>/dev/null; then echo "ERROR: Invalid JSON in table input file" >&2 rm -f "$TABLE_INPUT_FILE" exit 1 fi } # Function to create table create_table() { echo "Step 2: Creating a table named $TABLE_NAME in database $DB_NAME" prepare_table_input if ! aws glue create-table \ --database-name "$DB_NAME" \ --table-input "file://${TABLE_INPUT_FILE}" \ --region "$AWS_REGION" \ --output json > /dev/null 2>&1; then echo "ERROR: Failed to create table $TABLE_NAME" >&2 rm -f "$TABLE_INPUT_FILE" exit 1 fi CREATED_RESOURCES+=("table:$TABLE_NAME") echo "Table $TABLE_NAME created successfully." } # Function to get and display table details display_table_details() { echo "Step 3: Getting details of table $TABLE_NAME" if ! aws glue get-table \ --database-name "$DB_NAME" \ --name "$TABLE_NAME" \ --region "$AWS_REGION" \ --output json; then echo "ERROR: Failed to retrieve table details" >&2 exit 1 fi } # Function to display summary display_summary() { echo "" echo "===========================================" echo "RESOURCES CREATED" echo "===========================================" echo "Database: $DB_NAME" echo "Table: $TABLE_NAME" echo "===========================================" } # Main execution flow validate_prerequisites create_database create_table display_table_details display_summary echo "" echo "===========================================" echo "CLEANUP CONFIRMATION" echo "===========================================" echo "Starting cleanup process..." echo "Script completed at $(date)"-
For API details, see the following topics in AWS CLI Command Reference.
-