Amazon Athena SAP HANA connector

Prerequisites

Limitations

  • Write DDL operations are not supported.

  • In a multiplexer setup, the spill bucket and prefix are shared across all database instances.

  • Any relevant Lambda limits apply. For more information, see Lambda quotas in the AWS Lambda Developer Guide.

  • In SAP HANA, object names are converted to uppercase when they are stored in the SAP HANA database. However, because names in quotation marks are case sensitive, it is possible for two tables to have the same name in lower and upper case (for example, EMPLOYEE and employee).

    In Athena Federated Query, schema and table names are provided to the Lambda function in lowercase. To work around this issue, you can provide @schemaCase and tableCase query hints to retrieve data from tables that have case-sensitive names. Following are two sample queries with query hints.

    SELECT * FROM "lambda:saphanaconnector".SYSTEM."MY_TABLE@schemaCase=upper&tableCase=upper"
    SELECT * FROM "lambda:saphanaconnector".SYSTEM."MY_TABLE@schemaCase=upper&tableCase=lower"
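
The following Python sketch shows one way to assemble such a query string programmatically. The helper names hinted_table and build_query are hypothetical, and the sketch assumes that upper and lower are the only hint values, because those are the only values shown in the sample queries.

```python
def hinted_table(table: str, schema_case: str = "upper", table_case: str = "upper") -> str:
    """Return a quoted table identifier that carries the case query hints."""
    allowed = {"upper", "lower"}  # assumption: only the values shown in the samples
    if schema_case not in allowed or table_case not in allowed:
        raise ValueError("case hints must be 'upper' or 'lower'")
    return f'"{table}@schemaCase={schema_case}&tableCase={table_case}"'


def build_query(catalog: str, schema: str, table: str, **hints: str) -> str:
    """Build a SELECT statement against a federated catalog with query hints."""
    return f'SELECT * FROM "{catalog}".{schema}.{hinted_table(table, **hints)}'


# Reproduces the second sample query above.
query = build_query("lambda:saphanaconnector", "SYSTEM", "MY_TABLE", table_case="lower")
print(query)
```

The sketch only concatenates strings. It does not escape identifiers, so use it only with trusted table and schema names.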

Terms

The following terms relate to the SAP HANA connector.

  • Database instance – Any instance of a database deployed on premises, on Amazon EC2, or on Amazon RDS.

  • Handler – A Lambda handler that accesses your database instance. A handler can be for metadata or for data records.

  • Metadata handler – A Lambda handler that retrieves metadata from your database instance.

  • Record handler – A Lambda handler that retrieves data records from your database instance.

  • Composite handler – A Lambda handler that retrieves both metadata and data records from your database instance.

  • Property or parameter – A database property used by handlers to extract database information. You configure these properties as Lambda environment variables.

  • Connection String – A string of text used to establish a connection to a database instance.

  • Catalog – A non-AWS Glue catalog registered with Athena that is a required prefix for the connection_string property.

  • Multiplexing handler – A Lambda handler that can accept and use multiple database connections.

Parameters

Use the Lambda environment variables in this section to configure the SAP HANA connector.

Connection string

Use a JDBC connection string in the following format to connect to a database instance.

saphana://${jdbc_connection_string}

Using a multiplexing handler

You can use a multiplexer to connect to multiple database instances with a single Lambda function. Requests are routed by catalog name. Use the following classes in Lambda.

Handler            Class
Composite handler  SaphanaMuxCompositeHandler
Metadata handler   SaphanaMuxMetadataHandler
Record handler     SaphanaMuxRecordHandler

Multiplexing handler parameters

Parameter                   Description
$catalog_connection_string  Required. A database instance connection string. Prefix the environment variable with the name of the catalog used in Athena. For example, if the catalog registered with Athena is mysaphanacatalog, then the environment variable name is mysaphanacatalog_connection_string.
default                     Required. The default connection string. This string is used when the catalog is lambda:${AWS_LAMBDA_FUNCTION_NAME}.

The following example properties are for an SAP HANA MUX Lambda function that supports two database instances: saphana1 (the default) and saphana2.

Property                            Value
default                             saphana://jdbc:sap://saphana1.host:port/?${Test/RDS/Saphana1}
saphana_catalog1_connection_string  saphana://jdbc:sap://saphana1.host:port/?${Test/RDS/Saphana1}
saphana_catalog2_connection_string  saphana://jdbc:sap://saphana2.host:port/?user=sample2&password=sample2
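
The routing rule described above can be sketched in Python as follows. This is an illustration of the naming convention only, not the connector's implementation. The function name resolve_connection_string, the Lambda function name saphana-mux, and the error raised for an unknown catalog are assumptions.

```python
def resolve_connection_string(catalog: str, env: dict, function_name: str) -> str:
    """Pick the connection string for a catalog name, following the naming rule above."""
    # The default string is used when the catalog is lambda:<function name>.
    if catalog == f"lambda:{function_name}":
        return env["default"]
    # Otherwise the variable name is the catalog name plus "_connection_string".
    key = f"{catalog}_connection_string"
    if key not in env:
        # Assumption: this sketch fails loudly for an unconfigured catalog.
        raise KeyError(f"no connection string configured for catalog {catalog!r}")
    return env[key]


# Environment variables from the example table above.
example_env = {
    "default": "saphana://jdbc:sap://saphana1.host:port/?${Test/RDS/Saphana1}",
    "saphana_catalog1_connection_string": "saphana://jdbc:sap://saphana1.host:port/?${Test/RDS/Saphana1}",
    "saphana_catalog2_connection_string": "saphana://jdbc:sap://saphana2.host:port/?user=sample2&password=sample2",
}

print(resolve_connection_string("saphana_catalog2", example_env, "saphana-mux"))
print(resolve_connection_string("lambda:saphana-mux", example_env, "saphana-mux"))
```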

Providing credentials

To provide a user name and password for your database in your JDBC connection string, you can use connection string properties or AWS Secrets Manager.

  • Connection String – A user name and password can be specified as properties in the JDBC connection string.

  • AWS Secrets Manager – To use the Athena Federated Query feature with AWS Secrets Manager, the VPC connected to your Lambda function should have internet access or a VPC endpoint to connect to Secrets Manager.

    You can put the name of a secret in AWS Secrets Manager in your JDBC connection string. The connector replaces the secret name with the username and password values from Secrets Manager.

    For Amazon RDS database instances, this support is tightly integrated. If you use Amazon RDS, we highly recommend using AWS Secrets Manager and credential rotation. If your database does not use Amazon RDS, store the credentials as JSON in the following format:

    {"username": "${username}", "password": "${password}"}

Example connection string with secret name

The following string has the secret name ${Test/RDS/Saphana1}.

saphana://jdbc:sap://saphana1.host:port/?${Test/RDS/Saphana1}&...

The connector uses the secret name to retrieve secrets and provide the user name and password, as in the following example.

saphana://jdbc:sap://saphana1.host:port/?user=sample2&password=sample2&...

Currently, the SAP HANA connector recognizes the user and password JDBC properties.
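
The substitution described in this section can be pictured with the following Python sketch. It is not the connector's code. An in-memory dictionary stands in for AWS Secrets Manager, and the regular expression and the function name inject_credentials are assumptions made for illustration.

```python
import re

# Assumption: a secret reference looks like ${secret_name}, as in the example above.
SECRET_PATTERN = re.compile(r"\$\{([^}]+)\}")


def inject_credentials(connection_string: str, secrets: dict) -> str:
    """Replace each ${secret_name} with user and password JDBC properties."""

    def replace(match: re.Match) -> str:
        secret = secrets[match.group(1)]  # stand-in for a Secrets Manager lookup
        return f"user={secret['username']}&password={secret['password']}"

    return SECRET_PATTERN.sub(replace, connection_string)


# Secret stored as JSON in the format shown earlier: {"username": ..., "password": ...}
fake_secrets = {"Test/RDS/Saphana1": {"username": "sample2", "password": "sample2"}}

resolved = inject_credentials(
    "saphana://jdbc:sap://saphana1.host:port/?${Test/RDS/Saphana1}", fake_secrets
)
print(resolved)
```

The sketch does not URL-encode the credential values. A real secret that contains characters such as & or = would need additional handling.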

Using a single connection handler

You can use the following single connection metadata and record handlers to connect to a single SAP HANA instance.

Handler type       Class
Composite handler  SaphanaCompositeHandler
Metadata handler   SaphanaMetadataHandler
Record handler     SaphanaRecordHandler

Single connection handler parameters

Parameter  Description
default    Required. The default connection string.

The single connection handlers support one database instance and require a default connection string parameter. All other connection strings are ignored.

The following example property is for a single SAP HANA instance supported by a Lambda function.

Property  Value
default   saphana://jdbc:sap://saphana1.host:port/?${Test/RDS/Saphana1}

Spill parameters

The Lambda SDK can spill data to Amazon S3. All database instances accessed by the same Lambda function spill to the same location.

Parameter                  Description
spill_bucket               Required. Spill bucket name.
spill_prefix               Required. Spill bucket key prefix.
spill_put_request_headers  Optional. A JSON encoded map of request headers and values for the Amazon S3 putObject request that is used for spilling (for example, {"x-amz-server-side-encryption" : "AES256"}). For other possible headers, see PutObject in the Amazon Simple Storage Service API Reference.
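
As a minimal sketch, the following Python code assembles the spill parameters as a dictionary of Lambda environment variables and JSON-encodes the optional headers map. The function name spill_environment and the bucket and prefix values are hypothetical.

```python
import json


def spill_environment(bucket: str, prefix: str, put_headers: dict = None) -> dict:
    """Build the spill-related environment variables for the connector's Lambda function."""
    if not bucket or not prefix:
        raise ValueError("spill_bucket and spill_prefix are both required")
    env = {"spill_bucket": bucket, "spill_prefix": prefix}
    if put_headers:
        # The parameter value is a JSON encoded map of header names to values.
        env["spill_put_request_headers"] = json.dumps(put_headers)
    return env


# Hypothetical bucket and prefix; the header is the example from the table above.
spill_env = spill_environment(
    "amzn-s3-demo-bucket",
    "athena-spill/saphana",
    {"x-amz-server-side-encryption": "AES256"},
)
print(spill_env["spill_put_request_headers"])
```

You can pass the resulting dictionary to whatever tool you use to set the environment variables of the Lambda function.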

Data type support

The following table shows the corresponding data types for JDBC and Apache Arrow.

JDBC        Arrow
Boolean     Bit
Integer     Tiny
Short       Smallint
Integer     Int
Long        Bigint
Float       Float4
Double      Float8
Date        DateDay
Timestamp   DateMilli
String      Varchar
Bytes       Varbinary
BigDecimal  Decimal
ARRAY       List

Data type conversions

In addition to the JDBC to Arrow conversions, the connector performs certain other conversions to make the SAP HANA source and Athena data types compatible. These conversions help ensure that queries get executed successfully. The following table shows these conversions.

Source data type (SAP HANA)  Converted data type (Athena)
DECIMAL                      BIGINT
INTEGER                      INT
DATE                         DATEDAY
TIMESTAMP                    DATEMILLI

All other unsupported data types are converted to VARCHAR.

Partitions and splits

A partition is represented by a single partition column of type Integer. The column contains the names of the partitions defined on an SAP HANA table. For a table that has no named partitions, * is returned, which is equivalent to a single partition. Each partition corresponds to one split.

Name     Type     Description
PART_ID  Integer  Named partition in SAP HANA.
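
The relationship between partitions and splits can be illustrated with the following Python sketch. It is a simplified model, not the connector's implementation. The function name splits_for and the dictionary shape used for a split are assumptions.

```python
ALL_PARTITIONS = "*"  # value returned for a table without named partitions


def splits_for(partition_names: list) -> list:
    """Return one split per partition, or a single '*' split when there are none."""
    names = list(partition_names) or [ALL_PARTITIONS]
    # Each split carries the PART_ID column value for its partition.
    return [{"PART_ID": name} for name in names]


print(splits_for([1, 2, 3]))  # three partitions, so three splits
print(splits_for([]))         # no named partitions, so a single split
```

Because each split can be read independently, a table with more partitions gives the connector more opportunities to retrieve data in parallel.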

Performance tuning

SAP HANA supports native partitions. The Athena Lambda connector can retrieve data from these partitions in parallel. If you want to query very large datasets with uniform partition distribution, native partitioning is highly recommended.

The Lambda function performs predicate pushdown to decrease the data scanned by the query. LIMIT clauses reduce the amount of data scanned, but if you do not provide a predicate, you should expect SELECT queries with a LIMIT clause to scan at least 16 MB of data. Selecting a subset of columns significantly speeds up query runtime and reduces data scanned.

Under high query concurrency, the connector shows significant throttling and sometimes query failures.

License information

By using this connector, you acknowledge the inclusion of third party components, a list of which can be found in the pom.xml file for this connector, and agree to the terms in the respective third party licenses provided in the LICENSE.txt file on GitHub.com.

See also

For the latest JDBC driver version information, see the pom.xml file for the SAP HANA connector on GitHub.com.

For additional information about this connector, visit the corresponding site on GitHub.com.