AWS Glue
Developer Guide

Classifier API

Data Types

Classifier Structure

Classifiers are written in Python and triggered during a Crawl Task. You can write your own Classifiers to best categorize your data sources and specify the appropriate schemas to use for them. A Classifier first checks whether a given file is in a format it can handle, and then, if so, creates a schema in the form of a StructType object that matches that data format.

Fields

  • GrokClassifier – A GrokClassifier object.

    A GrokClassifier object.

GrokClassifier Structure

A classifier that uses grok.

Fields

  • Name – String, matching the Single-line string pattern. Required.

    The name of the classifier.

  • Classification – String. Required.

    The data format that the classifier matches, such as Twitter, JSON, Omniture logs, and so forth.

  • CreationTime – Timestamp.

    The time this classifier was registered.

  • LastUpdated – Timestamp.

    The time this classifier was last updated.

  • Version – Number (long).

    The version of this classifier.

  • GrokPattern – String, matching the Logstash Grok string pattern. Required.

    The grok pattern used by this classifier.

  • CustomPatterns – String, matching the URI address multi-line string pattern.

    Custom grok patterns used by this classifier.
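
As a rough illustration of how these fields fit together, the sketch below assembles a GrokClassifier-shaped dictionary. It assumes that CustomPatterns uses the Logstash custom-pattern layout of one `NAME regex` pair per line. The helper `build_custom_patterns`, the classifier name `app-logs`, and the `APP_LEVEL` pattern are hypothetical and are not part of the API.

```python
def build_custom_patterns(patterns):
    """Join {NAME: regex} pairs into multi-line text for CustomPatterns.

    Assumption: each line has the form "NAME regex", the layout Logstash
    uses for custom grok pattern files.
    """
    lines = []
    for name, regex in patterns.items():
        # A pattern name must be one token, since a space separates it from the regex.
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"pattern name must be a single token: {name!r}")
        lines.append(f"{name} {regex}")
    return "\n".join(lines)


# Hypothetical classifier for application logs; every value here is made up.
custom = build_custom_patterns({"APP_LEVEL": "(?:DEBUG|INFO|WARN|ERROR)"})
grok_classifier = {
    "Name": "app-logs",
    "Classification": "app_logs",
    # TIMESTAMP_ISO8601 and GREEDYDATA are built-in Logstash patterns;
    # APP_LEVEL is the custom pattern defined above.
    "GrokPattern": "%{TIMESTAMP_ISO8601:ts} %{APP_LEVEL:level} %{GREEDYDATA:message}",
    "CustomPatterns": custom,
}
```

CreationTime, LastUpdated, and Version are left out of the sketch because they describe an already registered classifier and do not appear in the request structures.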

CreateGrokClassifierRequest Structure

Specifies a Grok classifier for CreateClassifier to create.

Fields

  • Classification – String. Required.

    The type of result that the classifier matches, such as Twitter JSON, Omniture logs, CloudWatch logs, and so forth.

  • Name – String, matching the Single-line string pattern. Required.

    The name of the new Classifier.

  • GrokPattern – String, matching the Logstash Grok string pattern. Required.

    The grok pattern used by this classifier.

  • CustomPatterns – String, matching the URI address multi-line string pattern.

    Custom grok patterns used by this classifier.
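
The required and optional fields above could be checked on the client side before a request is sent. The following is a minimal sketch under that assumption. The function name `make_create_grok_request` is hypothetical, and the single-line check is only an approximation of the Single-line string pattern.

```python
def make_create_grok_request(name, classification, grok_pattern, custom_patterns=None):
    """Build a dict shaped like CreateGrokClassifierRequest.

    Name, Classification, and GrokPattern are required;
    CustomPatterns is included only when supplied.
    """
    required = {
        "Name": name,
        "Classification": classification,
        "GrokPattern": grok_pattern,
    }
    missing = [field for field, value in required.items() if not value]
    if missing:
        raise ValueError("missing required field(s): " + ", ".join(missing))
    # Rough stand-in for the Single-line string pattern: reject line breaks.
    if "\n" in name or "\r" in name:
        raise ValueError("Name must be a single-line string")
    request = dict(required)
    if custom_patterns is not None:
        request["CustomPatterns"] = custom_patterns
    return request
```

Leaving CustomPatterns out of the dictionary, instead of sending an empty value, keeps the request limited to the fields the caller actually set.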

UpdateGrokClassifierRequest Structure

Specifies a Grok classifier to update when passed to UpdateClassifier.

Fields

  • Name – String, matching the Single-line string pattern. Required.

    The name of the GrokClassifier.

  • Classification – String.

    The type of result that the classifier matches, such as Twitter JSON, Omniture logs, CloudWatch logs, and so forth.

  • GrokPattern – String, matching the Logstash Grok string pattern.

    The grok pattern used by this classifier.

  • CustomPatterns – String, matching the URI address multi-line string pattern.

    Custom grok patterns used by this classifier.

Operations

CreateClassifier Action (Python: create_classifier)

Creates a Classifier in the user's account.

Request

  • GrokClassifier – A CreateGrokClassifierRequest object.

    A grok classifier to create.

Response

  • No Response parameters.

Errors

  • AlreadyExistsException

  • InvalidInputException

  • OperationTimeoutException
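
A call to this action from Python might look like the sketch below. It assumes `client` is a boto3 Glue client (for example, one obtained from `boto3.client("glue")`), which exposes modeled errors under `client.exceptions`. The wrapper name `create_grok_classifier` is hypothetical. The client is passed in as an argument so the function itself does not import boto3.

```python
def create_grok_classifier(client, request):
    """Call CreateClassifier with a CreateGrokClassifierRequest-shaped dict.

    Returns True if the classifier was created and False if a classifier
    with the same name already exists. Other errors, such as
    InvalidInputException, propagate to the caller.
    """
    try:
        client.create_classifier(GrokClassifier=request)
    except client.exceptions.AlreadyExistsException:
        return False
    return True
```

Treating AlreadyExistsException as a normal outcome is a design choice for setup scripts that may run more than once; callers that need strict behavior can let the exception propagate instead.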

DeleteClassifier Action (Python: delete_classifier)

Removes a Classifier from the metadata store.

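Request

  • Name – String, matching the Single-line string pattern. Required.

    The name of the Classifier to remove.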

Response

  • No Response parameters.

Errors

  • EntityNotFoundException

  • OperationTimeoutException
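
The following sketch shows one way to make deletion idempotent. It assumes `client` is a boto3 Glue client and that the classifier is identified by its `Name`, as in boto3's `delete_classifier`. The wrapper name `delete_classifier_if_exists` is hypothetical.

```python
def delete_classifier_if_exists(client, name):
    """Call DeleteClassifier for the given name.

    Returns True if a classifier was removed and False if none with that
    name was found. OperationTimeoutException propagates to the caller.
    """
    try:
        client.delete_classifier(Name=name)
    except client.exceptions.EntityNotFoundException:
        return False
    return True
```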

GetClassifier Action (Python: get_classifier)

Retrieves a Classifier by name.

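Request

  • Name – String, matching the Single-line string pattern. Required.

    The name of the Classifier to retrieve.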

Response

  • Classifier – A Classifier object.

    The requested Classifier.

Errors

  • EntityNotFoundException

  • OperationTimeoutException
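
Because the response wraps the result in a Classifier object whose only documented field is GrokClassifier, a caller usually has to unwrap it. The sketch below assumes `client` is a boto3 Glue client and that the lookup key is passed as `Name`, as in boto3's `get_classifier`. The helper name `get_grok_classifier` is hypothetical.

```python
def get_grok_classifier(client, name):
    """Call GetClassifier and return the nested GrokClassifier dict.

    Returns None if no classifier with that name exists, or if the
    returned Classifier does not carry a GrokClassifier field.
    """
    try:
        response = client.get_classifier(Name=name)
    except client.exceptions.EntityNotFoundException:
        return None
    return response["Classifier"].get("GrokClassifier")
```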

GetClassifiers Action (Python: get_classifiers)

Lists all Classifier objects in the metadata store.

Request

  • MaxResults – Number (integer).

    Size of the list to return (optional).

  • NextToken – String.

    An optional continuation token.

Response

  • Classifiers – An array of Classifiers.

    The requested list of Classifier objects.

  • NextToken – String.

    A continuation token.

Errors

  • OperationTimeoutException
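
The MaxResults and NextToken parameters imply a paging loop: request a page, yield its items, and repeat with the returned token until none comes back. A minimal sketch, assuming `client` is a boto3 Glue client; the generator name `iter_classifiers` and its `page_size` argument are hypothetical.

```python
def iter_classifiers(client, page_size=None):
    """Yield every Classifier object, following NextToken across pages."""
    kwargs = {}
    if page_size is not None:
        kwargs["MaxResults"] = page_size
    while True:
        page = client.get_classifiers(**kwargs)
        yield from page.get("Classifiers", [])
        token = page.get("NextToken")
        if not token:
            # No continuation token means this was the last page.
            break
        kwargs["NextToken"] = token
```

A generator keeps only one page in memory at a time, so callers can stop early without fetching the rest of the list.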

UpdateClassifier Action (Python: update_classifier)

Modifies an existing Classifier.

Request

  • GrokClassifier – An UpdateGrokClassifierRequest object.

    A GrokClassifier object with updated fields.

Response

  • No Response parameters.

Errors

  • InvalidInputException

  • VersionMismatchException

  • EntityNotFoundException

  • OperationTimeoutException
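
Since only Name is required in UpdateGrokClassifierRequest, an update can carry just the fields being changed. The sketch below builds such a partial request and sends it. It assumes `client` is a boto3 Glue client; the wrapper name `update_grok_classifier` is hypothetical.

```python
def update_grok_classifier(client, name, **changes):
    """Call UpdateClassifier, sending only the fields that were supplied.

    Accepted keyword arguments mirror the optional fields of
    UpdateGrokClassifierRequest. Returns the request dict that was sent.
    """
    allowed = {"Classification", "GrokPattern", "CustomPatterns"}
    unknown = set(changes) - allowed
    if unknown:
        raise ValueError(f"unsupported field(s): {sorted(unknown)}")
    if not name:
        raise ValueError("Name is required")
    request = {"Name": name}
    # Skip fields explicitly passed as None so they are not sent at all.
    request.update({k: v for k, v in changes.items() if v is not None})
    client.update_classifier(GrokClassifier=request)
    return request
```

For example, `update_grok_classifier(client, "app-logs", GrokPattern="%{GREEDYDATA:message}")` would send a request containing only Name and GrokPattern, where `app-logs` is a made-up classifier name.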