This documentation is for Version 1 of the AWS CLI only.
We announced the upcoming end-of-support for AWS CLI version 1. We recommend that you migrate to AWS CLI version 2. For dates, additional details, and information on how to migrate, see the announcement.
Athena examples using AWS CLI with Bash script
The following code examples show you how to perform actions and implement common scenarios by using the AWS Command Line Interface with Bash script with Athena.
Scenarios are code examples that show you how to accomplish specific tasks by calling multiple functions within a service or combined with other AWS services.
Each example includes a link to the complete source code, where you can find instructions on how to set up and run the code in context.
Topics
Scenarios
The following code example shows how to:
Create an S3 bucket for query results
Create a database
Create a table
Run a query
Create and use named queries
Clean up resources
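Every Athena DDL and SQL step in this scenario follows the same asynchronous pattern: start the query, then poll `get-query-execution` until the state is `SUCCEEDED`, `FAILED`, or `CANCELLED`, with a timeout. The sketch below isolates that pattern; the `get_status` function is a local stub standing in for the real `aws athena get-query-execution --query "QueryExecution.Status.State" --output text` call, so the loop can be exercised without AWS credentials.

```shell
#!/bin/bash
# Minimal sketch of the poll-until-complete pattern used throughout the
# scenario. get_status is a stub standing in for:
#   aws athena get-query-execution --query-execution-id "$QUERY_ID" \
#       --query "QueryExecution.Status.State" --output text
get_status() {
    # Stub: report RUNNING for the first two polls, then SUCCEEDED.
    if [ "$1" -lt 2 ]; then echo "RUNNING"; else echo "SUCCEEDED"; fi
}

wait_for_query() {
    local timeout=10 elapsed=0 status
    while [ "$elapsed" -lt "$timeout" ]; do
        status=$(get_status "$elapsed")
        case "$status" in
            SUCCEEDED) echo "Query succeeded"; return 0 ;;
            FAILED|CANCELLED) echo "Query ended with status: $status"; return 1 ;;
        esac
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "Query timed out"
    return 1
}

wait_for_query   # prints "Query succeeded" after two polls
```

The full script below applies this same loop after each `start-query-execution` call, treating a timeout or a terminal failure state as a fatal error.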
- AWS CLI with Bash script
Note
There's more on GitHub. Find the complete example and learn how to set up and run in the Sample developer tutorials
repository.

#!/bin/bash
# Amazon Athena Getting Started Script
# This script demonstrates how to use Amazon Athena with AWS CLI
# It creates a database, table, runs queries, and manages named queries

set -euo pipefail

# Security: Validate AWS credentials are configured
if ! aws sts get-caller-identity &>/dev/null; then
    echo "ERROR: AWS credentials not configured or invalid"
    exit 1
fi

# Security: Restrict umask to prevent world-readable files
umask 0077

# Set up logging with restricted permissions
LOG_FILE="athena-tutorial.log"
touch "$LOG_FILE"
chmod 600 "$LOG_FILE"
exec > >(tee -a "$LOG_FILE") 2>&1

echo "Starting Amazon Athena Getting Started Tutorial..."
echo "Logging to $LOG_FILE"

# Function to handle errors
handle_error() {
    echo "ERROR: $1"
    echo "Resources created:"
    if [ -n "${NAMED_QUERY_ID:-}" ]; then
        echo "- Named Query: $NAMED_QUERY_ID"
    fi
    if [ -n "${DATABASE_NAME:-}" ]; then
        echo "- Database: $DATABASE_NAME"
        if [ -n "${TABLE_NAME:-}" ]; then
            echo "- Table: $TABLE_NAME in $DATABASE_NAME"
        fi
    fi
    if [ -n "${S3_BUCKET:-}" ]; then
        echo "- S3 Bucket: $S3_BUCKET"
    fi
    echo "Exiting..."
    exit 1
}

# Security: Validate bucket name format
validate_bucket_name() {
    local bucket_name="$1"
    if [[ ! "$bucket_name" =~ ^[a-z0-9][a-z0-9.-]*[a-z0-9]$ ]] || [ ${#bucket_name} -lt 3 ] || [ ${#bucket_name} -gt 63 ]; then
        return 1
    fi
    return 0
}

# Security: Validate database and table names
validate_identifier() {
    local identifier="$1"
    if [[ ! "$identifier" =~ ^[a-zA-Z_][a-zA-Z0-9_]*$ ]]; then
        return 1
    fi
    return 0
}

# Security: Safely generate random identifier
if ! command -v openssl &>/dev/null; then
    RANDOM_ID=$(head -c 6 /dev/urandom | od -An -tx1 | tr -d ' ')
else
    RANDOM_ID=$(openssl rand -hex 6)
fi

# Security: Validate random ID format
if [[ ! "$RANDOM_ID" =~ ^[a-f0-9]{12}$ ]]; then
    handle_error "Failed to generate valid random ID"
fi

# Check for shared prereq bucket with proper error handling
PREREQ_BUCKET=""
if aws cloudformation describe-stacks --stack-name tutorial-prereqs-bucket \
    --query 'Stacks[0].Outputs[?OutputKey==`BucketName`].OutputValue' --output text 2>/dev/null | grep -qv "^$"; then
    PREREQ_BUCKET=$(aws cloudformation describe-stacks --stack-name tutorial-prereqs-bucket \
        --query 'Stacks[0].Outputs[?OutputKey==`BucketName`].OutputValue' --output text 2>/dev/null)
fi

if [ -n "$PREREQ_BUCKET" ] && [ "$PREREQ_BUCKET" != "None" ]; then
    S3_BUCKET="$PREREQ_BUCKET"
    BUCKET_IS_SHARED=true
    echo "Using shared bucket: $S3_BUCKET"
else
    BUCKET_IS_SHARED=false
    S3_BUCKET="athena-${RANDOM_ID}"
fi

if ! validate_bucket_name "$S3_BUCKET"; then
    handle_error "Invalid S3 bucket name: $S3_BUCKET"
fi

DATABASE_NAME="mydatabase"
TABLE_NAME="cloudfront_logs"

if ! validate_identifier "$DATABASE_NAME"; then
    handle_error "Invalid database name: $DATABASE_NAME"
fi
if ! validate_identifier "$TABLE_NAME"; then
    handle_error "Invalid table name: $TABLE_NAME"
fi

# Get the current AWS region with validation
AWS_REGION=$(aws configure get region 2>/dev/null || echo "")
if [ -z "$AWS_REGION" ]; then
    AWS_REGION="us-east-1"
    echo "No AWS region found in configuration, defaulting to $AWS_REGION"
fi

# Security: Validate region format - expanded regex for newer regions
if [[ ! "$AWS_REGION" =~ ^[a-z]{2}-[a-z]+-[0-9]{1}$ ]] && [[ ! "$AWS_REGION" =~ ^[a-z]+-[a-z]+-[0-9]{1}$ ]]; then
    echo "WARNING: Region format may be invalid: $AWS_REGION"
fi

echo "Using AWS Region: $AWS_REGION"

# Create S3 bucket for Athena query results
echo "Creating S3 bucket for Athena query results: $S3_BUCKET"
if [ "$BUCKET_IS_SHARED" = false ]; then
    CREATE_BUCKET_RESULT=$(aws s3 mb "s3://$S3_BUCKET" --region "$AWS_REGION" 2>&1)
    if echo "$CREATE_BUCKET_RESULT" | grep -qi "error\|failed"; then
        handle_error "Failed to create S3 bucket: $CREATE_BUCKET_RESULT"
    fi

    # Security: Enable S3 bucket encryption with KMS validation
    echo "Enabling default encryption on S3 bucket..."
    if ! aws s3api put-bucket-encryption \
        --bucket "$S3_BUCKET" \
        --server-side-encryption-configuration '{
            "Rules": [{
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "AES256"
                }
            }]
        }' 2>&1; then
        echo "Warning: Could not enable encryption on bucket"
    fi

    # Security: Block public access
    echo "Blocking public access to S3 bucket..."
    if ! aws s3api put-public-access-block \
        --bucket "$S3_BUCKET" \
        --public-access-block-configuration \
        "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true" 2>&1; then
        echo "Warning: Could not block public access on bucket"
    fi

    # Security: Enable versioning for data protection
    echo "Enabling versioning on S3 bucket..."
    if ! aws s3api put-bucket-versioning \
        --bucket "$S3_BUCKET" \
        --versioning-configuration Status=Enabled 2>&1; then
        echo "Warning: Could not enable versioning on bucket"
    fi

    echo "S3 bucket created successfully: $S3_BUCKET"
fi

# Step 1: Create a database
echo "Step 1: Creating Athena database: $DATABASE_NAME"
CREATE_DB_RESULT=$(aws athena start-query-execution \
    --query-string "CREATE DATABASE IF NOT EXISTS $DATABASE_NAME" \
    --result-configuration "OutputLocation=s3://$S3_BUCKET/output/" \
    --region "$AWS_REGION" 2>&1)
if echo "$CREATE_DB_RESULT" | grep -qi "error\|failed"; then
    handle_error "Failed to create database: $CREATE_DB_RESULT"
fi

QUERY_ID=$(echo "$CREATE_DB_RESULT" | jq -r '.QueryExecutionId // empty' 2>/dev/null || \
    echo "$CREATE_DB_RESULT" | grep -o '"QueryExecutionId": "[^"]*' | cut -d'"' -f4)
if [ -z "$QUERY_ID" ]; then
    handle_error "Failed to extract Query ID from database creation response"
fi
echo "Database creation query ID: $QUERY_ID"

# Wait for database creation to complete
echo "Waiting for database creation to complete..."
WAIT_TIMEOUT=60
ELAPSED=0
while [ $ELAPSED -lt $WAIT_TIMEOUT ]; do
    QUERY_STATUS=$(aws athena get-query-execution --query-execution-id "$QUERY_ID" \
        --query "QueryExecution.Status.State" --output text --region "$AWS_REGION" 2>&1)
    if [ "$QUERY_STATUS" = "SUCCEEDED" ]; then
        echo "Database creation completed successfully."
        break
    elif [ "$QUERY_STATUS" = "FAILED" ] || [ "$QUERY_STATUS" = "CANCELLED" ]; then
        handle_error "Database creation failed with status: $QUERY_STATUS"
    fi
    echo "Database creation in progress, status: $QUERY_STATUS"
    sleep 2
    ((ELAPSED+=2))
done
if [ $ELAPSED -ge $WAIT_TIMEOUT ]; then
    handle_error "Database creation timed out"
fi

# Verify the database was created
echo "Verifying database creation..."
LIST_DB_RESULT=$(aws athena list-databases --catalog-name AwsDataCatalog --region "$AWS_REGION" 2>&1)
if echo "$LIST_DB_RESULT" | grep -qi "error\|failed"; then
    handle_error "Failed to list databases: $LIST_DB_RESULT"
fi
echo "$LIST_DB_RESULT"

# Step 2: Create a table
echo "Step 2: Creating Athena table: $TABLE_NAME"
# Replace the region placeholder in the S3 location
CREATE_TABLE_QUERY="CREATE EXTERNAL TABLE IF NOT EXISTS $DATABASE_NAME.$TABLE_NAME (
    \`Date\` DATE,
    Time STRING,
    Location STRING,
    Bytes INT,
    RequestIP STRING,
    Method STRING,
    Host STRING,
    Uri STRING,
    Status INT,
    Referrer STRING,
    os STRING,
    Browser STRING,
    BrowserVersion STRING
) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES (
    \"input.regex\" = \"^(?!#)([^ ]+)\\\\s+([^ ]+)\\\\s+([^ ]+)\\\\s+([^ ]+)\\\\s+([^ ]+)\\\\s+([^ ]+)\\\\s+([^ ]+)\\\\s+([^ ]+)\\\\s+([^ ]+)\\\\s+([^ ]+)\\\\s+[^\\\\(]+[\\\\(]([^\\\\;]+).*\\\\%20([^\\\\/]+)[\\\\/](.*)$\"
) LOCATION 's3://athena-examples-us-east-1/cloudfront/plaintext/';"

CREATE_TABLE_RESULT=$(aws athena start-query-execution \
    --query-string "$CREATE_TABLE_QUERY" \
    --result-configuration "OutputLocation=s3://$S3_BUCKET/output/" \
    --region "$AWS_REGION" 2>&1)
if echo "$CREATE_TABLE_RESULT" | grep -qi "error\|failed"; then
    handle_error "Failed to create table: $CREATE_TABLE_RESULT"
fi

QUERY_ID=$(echo "$CREATE_TABLE_RESULT" | jq -r '.QueryExecutionId // empty' 2>/dev/null || \
    echo "$CREATE_TABLE_RESULT" | grep -o '"QueryExecutionId": "[^"]*' | cut -d'"' -f4)
if [ -z "$QUERY_ID" ]; then
    handle_error "Failed to extract Query ID from table creation response"
fi
echo "Table creation query ID: $QUERY_ID"

# Wait for table creation to complete
echo "Waiting for table creation to complete..."
ELAPSED=0
while [ $ELAPSED -lt $WAIT_TIMEOUT ]; do
    QUERY_STATUS=$(aws athena get-query-execution --query-execution-id "$QUERY_ID" \
        --query "QueryExecution.Status.State" --output text --region "$AWS_REGION" 2>&1)
    if [ "$QUERY_STATUS" = "SUCCEEDED" ]; then
        echo "Table creation completed successfully."
        break
    elif [ "$QUERY_STATUS" = "FAILED" ] || [ "$QUERY_STATUS" = "CANCELLED" ]; then
        handle_error "Table creation failed with status: $QUERY_STATUS"
    fi
    echo "Table creation in progress, status: $QUERY_STATUS"
    sleep 2
    ((ELAPSED+=2))
done
if [ $ELAPSED -ge $WAIT_TIMEOUT ]; then
    handle_error "Table creation timed out"
fi

# Verify the table was created
echo "Verifying table creation..."
LIST_TABLE_RESULT=$(aws athena list-table-metadata \
    --catalog-name AwsDataCatalog \
    --database-name "$DATABASE_NAME" \
    --region "$AWS_REGION" 2>&1)
if echo "$LIST_TABLE_RESULT" | grep -qi "error\|failed"; then
    handle_error "Failed to list tables: $LIST_TABLE_RESULT"
fi
echo "$LIST_TABLE_RESULT"

# Step 3: Query data
echo "Step 3: Running a query on the table..."
QUERY="SELECT os, COUNT(*) count FROM $DATABASE_NAME.$TABLE_NAME WHERE date BETWEEN date '2014-07-05' AND date '2014-08-05' GROUP BY os"
QUERY_RESULT=$(aws athena start-query-execution \
    --query-string "$QUERY" \
    --result-configuration "OutputLocation=s3://$S3_BUCKET/output/" \
    --region "$AWS_REGION" 2>&1)
if echo "$QUERY_RESULT" | grep -qi "error\|failed"; then
    handle_error "Failed to run query: $QUERY_RESULT"
fi

QUERY_ID=$(echo "$QUERY_RESULT" | jq -r '.QueryExecutionId // empty' 2>/dev/null || \
    echo "$QUERY_RESULT" | grep -o '"QueryExecutionId": "[^"]*' | cut -d'"' -f4)
if [ -z "$QUERY_ID" ]; then
    handle_error "Failed to extract Query ID from query execution response"
fi
echo "Query execution ID: $QUERY_ID"

# Wait for query to complete
echo "Waiting for query to complete..."
ELAPSED=0
while [ $ELAPSED -lt $WAIT_TIMEOUT ]; do
    QUERY_STATUS=$(aws athena get-query-execution --query-execution-id "$QUERY_ID" \
        --query "QueryExecution.Status.State" --output text --region "$AWS_REGION" 2>&1)
    if [ "$QUERY_STATUS" = "SUCCEEDED" ]; then
        echo "Query completed successfully."
        break
    elif [ "$QUERY_STATUS" = "FAILED" ] || [ "$QUERY_STATUS" = "CANCELLED" ]; then
        handle_error "Query failed with status: $QUERY_STATUS"
    fi
    echo "Query in progress, status: $QUERY_STATUS"
    sleep 2
    ((ELAPSED+=2))
done
if [ $ELAPSED -ge $WAIT_TIMEOUT ]; then
    handle_error "Query execution timed out"
fi

# Get query results
echo "Getting query results..."
RESULTS=$(aws athena get-query-results --query-execution-id "$QUERY_ID" --region "$AWS_REGION" 2>&1)
if echo "$RESULTS" | grep -qi "error\|failed"; then
    handle_error "Failed to get query results: $RESULTS"
fi
echo "$RESULTS"

# Download results from S3
echo "Downloading query results from S3..."
S3_PATH=$(aws athena get-query-execution --query-execution-id "$QUERY_ID" \
    --query "QueryExecution.ResultConfiguration.OutputLocation" --output text \
    --region "$AWS_REGION" 2>&1)
if echo "$S3_PATH" | grep -qi "error\|failed"; then
    handle_error "Failed to get S3 path for results: $S3_PATH"
fi
if [ -z "$S3_PATH" ] || [ "$S3_PATH" = "None" ]; then
    handle_error "S3 path for query results is empty"
fi

DOWNLOAD_RESULT=$(aws s3 cp "$S3_PATH" "./query-results.csv" 2>&1)
if echo "$DOWNLOAD_RESULT" | grep -qi "error\|failed"; then
    handle_error "Failed to download query results: $DOWNLOAD_RESULT"
fi

# Security: Secure the downloaded file
chmod 600 "./query-results.csv"
echo "Query results downloaded to query-results.csv (permissions: 600)"

# Step 4: Create a named query
echo "Step 4: Creating a named query..."
NAMED_QUERY_RESULT=$(aws athena create-named-query \
    --name "OS Count Query" \
    --description "Count of operating systems in CloudFront logs" \
    --database "$DATABASE_NAME" \
    --query-string "$QUERY" \
    --region "$AWS_REGION" 2>&1)
if echo "$NAMED_QUERY_RESULT" | grep -qi "error\|failed"; then
    handle_error "Failed to create named query: $NAMED_QUERY_RESULT"
fi

NAMED_QUERY_ID=$(echo "$NAMED_QUERY_RESULT" | jq -r '.NamedQueryId // empty' 2>/dev/null || \
    echo "$NAMED_QUERY_RESULT" | grep -o '"NamedQueryId": "[^"]*' | cut -d'"' -f4)
if [ -z "$NAMED_QUERY_ID" ]; then
    handle_error "Failed to extract Named Query ID from response"
fi
echo "Named query created with ID: $NAMED_QUERY_ID"

# List named queries
echo "Listing named queries..."
LIST_QUERIES_RESULT=$(aws athena list-named-queries --region "$AWS_REGION" 2>&1)
if echo "$LIST_QUERIES_RESULT" | grep -qi "error\|failed"; then
    handle_error "Failed to list named queries: $LIST_QUERIES_RESULT"
fi
echo "$LIST_QUERIES_RESULT"

# Get the named query details
echo "Getting named query details..."
GET_QUERY_RESULT=$(aws athena get-named-query --named-query-id "$NAMED_QUERY_ID" \
    --region "$AWS_REGION" 2>&1)
if echo "$GET_QUERY_RESULT" | grep -qi "error\|failed"; then
    handle_error "Failed to get named query: $GET_QUERY_RESULT"
fi
echo "$GET_QUERY_RESULT"

# Execute the named query
echo "Executing the named query..."
QUERY_STRING=$(aws athena get-named-query --named-query-id "$NAMED_QUERY_ID" \
    --query "NamedQuery.QueryString" --output text --region "$AWS_REGION" 2>&1)
if echo "$QUERY_STRING" | grep -qi "error\|failed"; then
    handle_error "Failed to get query string: $QUERY_STRING"
fi
if [ -z "$QUERY_STRING" ] || [ "$QUERY_STRING" = "None" ]; then
    handle_error "Query string is empty"
fi

EXEC_RESULT=$(aws athena start-query-execution \
    --query-string "$QUERY_STRING" \
    --result-configuration "OutputLocation=s3://$S3_BUCKET/output/" \
    --region "$AWS_REGION" 2>&1)
if echo "$EXEC_RESULT" | grep -qi "error\|failed"; then
    handle_error "Failed to execute named query: $EXEC_RESULT"
fi

QUERY_ID=$(echo "$EXEC_RESULT" | jq -r '.QueryExecutionId // empty' 2>/dev/null || \
    echo "$EXEC_RESULT" | grep -o '"QueryExecutionId": "[^"]*' | cut -d'"' -f4)
if [ -z "$QUERY_ID" ]; then
    handle_error "Failed to extract Query ID from named query execution response"
fi
echo "Named query execution ID: $QUERY_ID"

# Wait for named query to complete
echo "Waiting for named query execution to complete..."
ELAPSED=0
while [ $ELAPSED -lt $WAIT_TIMEOUT ]; do
    QUERY_STATUS=$(aws athena get-query-execution --query-execution-id "$QUERY_ID" \
        --query "QueryExecution.Status.State" --output text --region "$AWS_REGION" 2>&1)
    if [ "$QUERY_STATUS" = "SUCCEEDED" ]; then
        echo "Named query execution completed successfully."
        break
    elif [ "$QUERY_STATUS" = "FAILED" ] || [ "$QUERY_STATUS" = "CANCELLED" ]; then
        handle_error "Named query execution failed with status: $QUERY_STATUS"
    fi
    echo "Named query execution in progress, status: $QUERY_STATUS"
    sleep 2
    ((ELAPSED+=2))
done
if [ $ELAPSED -ge $WAIT_TIMEOUT ]; then
    handle_error "Named query execution timed out"
fi

# Summary of resources created
echo ""
echo "==========================================="
echo "RESOURCES CREATED"
echo "==========================================="
echo "- S3 Bucket: $S3_BUCKET"
echo "- Database: $DATABASE_NAME"
echo "- Table: $TABLE_NAME"
echo "- Named Query: $NAMED_QUERY_ID"
echo "- Query results saved to: query-results.csv"
echo "==========================================="

# Auto-confirm cleanup
echo ""
echo "==========================================="
echo "CLEANUP CONFIRMATION"
echo "==========================================="
echo "Starting cleanup..."
CLEANUP_CHOICE="y"

if [[ "$CLEANUP_CHOICE" =~ ^[Yy]$ ]]; then
    echo "Starting cleanup..."

    # Delete named query
    echo "Deleting named query: $NAMED_QUERY_ID"
    DELETE_QUERY_RESULT=$(aws athena delete-named-query --named-query-id "$NAMED_QUERY_ID" \
        --region "$AWS_REGION" 2>&1)
    if echo "$DELETE_QUERY_RESULT" | grep -qi "error\|failed"; then
        echo "Warning: Failed to delete named query: $DELETE_QUERY_RESULT"
    else
        echo "Named query deleted successfully."
    fi

    # Drop table
    echo "Dropping table: $TABLE_NAME"
    DROP_TABLE_RESULT=$(aws athena start-query-execution \
        --query-string "DROP TABLE IF EXISTS $DATABASE_NAME.$TABLE_NAME" \
        --result-configuration "OutputLocation=s3://$S3_BUCKET/output/" \
        --region "$AWS_REGION" 2>&1)
    if echo "$DROP_TABLE_RESULT" | grep -qi "error\|failed"; then
        echo "Warning: Failed to drop table: $DROP_TABLE_RESULT"
    else
        QUERY_ID=$(echo "$DROP_TABLE_RESULT" | jq -r '.QueryExecutionId // empty' 2>/dev/null || \
            echo "$DROP_TABLE_RESULT" | grep -o '"QueryExecutionId": "[^"]*' | cut -d'"' -f4)
        if [ -n "$QUERY_ID" ]; then
            echo "Waiting for table deletion to complete..."
            ELAPSED=0
            while [ $ELAPSED -lt $WAIT_TIMEOUT ]; do
                QUERY_STATUS=$(aws athena get-query-execution --query-execution-id "$QUERY_ID" \
                    --query "QueryExecution.Status.State" --output text --region "$AWS_REGION" 2>&1)
                if [ "$QUERY_STATUS" = "SUCCEEDED" ]; then
                    echo "Table dropped successfully."
                    break
                elif [ "$QUERY_STATUS" = "FAILED" ] || [ "$QUERY_STATUS" = "CANCELLED" ]; then
                    echo "Warning: Table deletion failed with status: $QUERY_STATUS"
                    break
                fi
                echo "Table deletion in progress, status: $QUERY_STATUS"
                sleep 2
                ((ELAPSED+=2))
            done
        fi
    fi

    # Drop database
    echo "Dropping database: $DATABASE_NAME"
    DROP_DB_RESULT=$(aws athena start-query-execution \
        --query-string "DROP DATABASE IF EXISTS $DATABASE_NAME" \
        --result-configuration "OutputLocation=s3://$S3_BUCKET/output/" \
        --region "$AWS_REGION" 2>&1)
    if echo "$DROP_DB_RESULT" | grep -qi "error\|failed"; then
        echo "Warning: Failed to drop database: $DROP_DB_RESULT"
    else
        QUERY_ID=$(echo "$DROP_DB_RESULT" | jq -r '.QueryExecutionId // empty' 2>/dev/null || \
            echo "$DROP_DB_RESULT" | grep -o '"QueryExecutionId": "[^"]*' | cut -d'"' -f4)
        if [ -n "$QUERY_ID" ]; then
            echo "Waiting for database deletion to complete..."
            ELAPSED=0
            while [ $ELAPSED -lt $WAIT_TIMEOUT ]; do
                QUERY_STATUS=$(aws athena get-query-execution --query-execution-id "$QUERY_ID" \
                    --query "QueryExecution.Status.State" --output text --region "$AWS_REGION" 2>&1)
                if [ "$QUERY_STATUS" = "SUCCEEDED" ]; then
                    echo "Database dropped successfully."
                    break
                elif [ "$QUERY_STATUS" = "FAILED" ] || [ "$QUERY_STATUS" = "CANCELLED" ]; then
                    echo "Warning: Database deletion failed with status: $QUERY_STATUS"
                    break
                fi
                echo "Database deletion in progress, status: $QUERY_STATUS"
                sleep 2
                ((ELAPSED+=2))
            done
        fi
    fi

    # Empty and delete S3 bucket (only if not shared)
    if [ "$BUCKET_IS_SHARED" = false ]; then
        echo "Emptying S3 bucket: $S3_BUCKET"
        EMPTY_BUCKET_RESULT=$(aws s3 rm "s3://$S3_BUCKET" --recursive 2>&1)
        if echo "$EMPTY_BUCKET_RESULT" | grep -qi "error\|failed"; then
            echo "Warning: Failed to empty S3 bucket: $EMPTY_BUCKET_RESULT"
        else
            echo "S3 bucket emptied successfully."
        fi

        echo "Deleting S3 bucket: $S3_BUCKET"
        DELETE_BUCKET_RESULT=$(aws s3 rb "s3://$S3_BUCKET" 2>&1)
        if echo "$DELETE_BUCKET_RESULT" | grep -qi "error\|failed"; then
            echo "Warning: Failed to delete S3 bucket: $DELETE_BUCKET_RESULT"
        else
            echo "S3 bucket deleted successfully."
        fi
    else
        echo "Skipping S3 bucket deletion (shared resource)"
    fi

    # Security: Remove downloaded query results
    if [ -f "./query-results.csv" ]; then
        if command -v shred &>/dev/null; then
            shred -vfz -n 3 "./query-results.csv" 2>/dev/null || rm -f "./query-results.csv"
        else
            rm -f "./query-results.csv"
        fi
        echo "Query results file securely removed."
    fi

    echo "Cleanup completed."
fi

echo "Tutorial completed successfully!"
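Because `jq` may not be installed, the script extracts `QueryExecutionId` with `jq` and falls back to `grep`/`cut`. That idiom can be checked locally against a canned response (the ID below is an example value, not real output):

```shell
#!/bin/bash
# Exercise the QueryExecutionId extraction idiom from the script above:
# prefer jq, fall back to grep/cut when jq is unavailable. RESPONSE is a
# canned stand-in for aws athena start-query-execution output.
RESPONSE='{
    "QueryExecutionId": "a1b2c3d4-5678-90ab-cdef-EXAMPLE11111"
}'

QUERY_ID=$(echo "$RESPONSE" | jq -r '.QueryExecutionId // empty' 2>/dev/null || \
    echo "$RESPONSE" | grep -o '"QueryExecutionId": "[^"]*' | cut -d'"' -f4)

echo "Extracted Query ID: $QUERY_ID"
# prints: Extracted Query ID: a1b2c3d4-5678-90ab-cdef-EXAMPLE11111
```

The `cut -d'"' -f4` takes the fourth double-quote-delimited field of the `grep` match, which is the ID itself.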
For API details, see the following topics in AWS CLI Command Reference.