Using Machine Learning (ML) with Amazon Athena (Preview)
Machine Learning (ML) with Amazon Athena (Preview) lets you use Athena to write SQL statements that run Machine Learning (ML) inference using Amazon SageMaker. This feature simplifies access to ML models for data analysis, eliminating the need to use complex programming methods to run inference.
To use ML with Athena (Preview), you define an ML with Athena (Preview) function with
the USING FUNCTION
clause. The function points to the SageMaker model endpoint that you want to use and
specifies
the variable names and data types to pass to the model. Subsequent clauses in the
query
reference the function to pass values to the model. The model runs inference based
on the
values that the query passes and then returns inference results. For more information
about
SageMaker and how SageMaker endpoints work, see the Amazon SageMaker Developer Guide.
For an example that uses ML with Athena (Preview) and SageMaker inference to detect
an anomalous value in a
result set, see the AWS Big Data Blog article Detecting anomalous values by invoking the Amazon Athena machine learning inference
function
Considerations and Limitations
-
Available Regions – The Athena ML feature is available in preview in the US East (N. Virginia), Asia Pacific (Mumbai), Europe (Ireland), and US West (Oregon) Regions.
-
AmazonAthenaPreviewFunctionality workgroup – To use this feature in preview, you must create an Athena workgroup named
AmazonAthenaPreviewFunctionality
and join that workgroup. For more information, see Managing Workgroups. -
SageMaker model endpoint must accept and return
text/csv
– For more information about data formats, see Common Data Formats for Inference in the Amazon SageMaker Developer Guide. -
SageMaker endpoint scaling – Make sure that the referenced SageMaker model endpoint is sufficiently scaled up for Athena calls to the endpoint. For more information, see Automaticaly Scale SageMaker Models in the Amazon SageMaker Developer Guide and CreateEndpointConfig in the Amazon SageMaker API Reference.
-
IAM permissions – To run a query that specifies an ML with Athena (Preview) function, the IAM principal running the query must be allowed to perform the
sagemaker:InvokeEndpoint
action for the referenced SageMaker model endpoint. For more information, see Allowing Access for ML with Athena (Preview). -
ML with Athena (Preview) functions cannot be used in
GROUP BY
clauses directly
ML with Athena (Preview) Syntax
The USING FUNCTION
clause specifies an ML with Athena (Preview) function or multiple
functions that can be referenced by a subsequent SELECT
statement in the
query. You define the function name, variable names, and data types for the variables
and return values.
Synopsis
The following example illustrates a USING FUNCTION
clause that
specifies ML with Athena (Preview) function.
USING FUNCTION ML_function_name
(variable1 data_type
[, variable2 data_type
][,...]) RETURNS data_type
TYPE SAGEMAKER_INVOKE_ENDPOINT WITH (sagemaker_endpoint= 'my_sagemaker_endpoint
')[, FUNCTION...][, ...] SELECT [...] ML_function_name(expression) [...]
Parameters
- USING FUNCTION
ML_function_name
(variable1 data_type
[,variable2 data_type
][,...]) -
ML_function_name
defines the function name, which can be used in subsequent query clauses. Eachvariable data_type
specifies a named variable with its corresponding data type, which the SageMaker model can accept as input. Specifydata_type
as one of the supported Athena data types that the SageMaker model can accept as input. - RETURNS data_type TYPE
-
data_type
specifies the SQL data type thatML_function_name
returns to the query as output from the SageMaker model. - SAGEMAKER_INVOKE_ENDPOINT WITH (sagemaker_endpoint=
'
my_sagemaker_endpoint
') -
my_sagemaker_endpoint
specifies the endpoint of the SageMaker model. - SELECT [...] ML_function_name(expression) [...]
-
The SELECT query that passes values to function variables and the SageMaker model to return a result.
ML_function_name
specifies the function defined earlier in the query, followed by anexpression
that is evaluated to pass values. Values that are passed and returned must match the corresponding data types specified for the function in theUSING FUNCTION
clause.
Example
The following example demonstrates a query using ML with Athena (Preview).
USING FUNCTION predict_customer_registration(age INTEGER) RETURNS DOUBLE TYPE SAGEMAKER_INVOKE_ENDPOINT WITH (sagemaker_endpoint = 'xgboost-2019-09-20-04-49-29-303') SELECT predict_customer_registration(age) AS probability_of_enrolling, customer_id FROM "sampledb"."ml_test_dataset" WHERE predict_customer_registration(age) < 0.5;
Customer Use Examples
The following videos showcase ways in which you can use SageMaker with Athena.
Predicting Customer Churn
The following video shows how to combine Athena with the machine learning capabilities of Amazon SageMaker to predict customer churn.
Detecting Botnets
The following video shows how one company uses Amazon Athena and Amazon SageMaker to detect botnets.