AWS Glue
Developer Guide

DynamicFrameReader Class

 — Methods —

__init__

__init__(glue_context)

from_rdd

from_rdd(data, name, schema=None, sampleRatio=None)

Reads a DynamicFrame from a Resilient Distributed Dataset (RDD).

  • data – The RDD to read from.

  • name – The name to assign to the resulting DynamicFrame.

  • schema – The schema to read (optional).

  • sampleRatio – The sample ratio (optional).
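A minimal sketch of calling from_rdd, assuming the reader is reached through the create_dynamic_frame attribute of a GlueContext, as is usual in Glue job scripts. The helper name read_from_rdd, its parameters, and the default frame name "people" are hypothetical and chosen only for illustration.

```python
def read_from_rdd(spark_context, glue_context, records, name="people"):
    """Turn an in-memory list of records into a DynamicFrame.

    Assumptions (not from the reference above): spark_context is the job's
    SparkContext, glue_context is an awsglue.context.GlueContext, and
    records is a list that Spark can infer a schema from (for example,
    pyspark.sql.Row objects).
    """
    # Distribute the records as an RDD.
    rdd = spark_context.parallelize(records)
    # schema and sampleRatio are left at their defaults (None).
    return glue_context.create_dynamic_frame.from_rdd(rdd, name)
```

Inside a Glue job this might be called as read_from_rdd(sc, glueContext, rows), where sc, glueContext, and rows come from the job's own setup.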

from_options

from_options(connection_type, connection_options={}, format=None, format_options={}, transformation_ctx="")

Reads a DynamicFrame using the specified connection and format.

  • connection_type – The connection type. Valid values include s3, mysql, postgresql, redshift, sqlserver, oracle, and dynamodb.

  • connection_options – Connection options, such as path and database table (optional). For a connection_type of s3, Amazon S3 paths are specified as an array under the paths key.

    connection_options = {"paths": [ "s3://mybucket/object_a", "s3://mybucket/object_b"]}

    For JDBC connections, several properties must be defined. The database name must be part of the URL; it can optionally also be included in the connection options.

    connection_options = {"url": "jdbc-url/database", "user": "username", "password": "password", "dbtable": "table-name", "redshiftTmpDir": "s3-tempdir-path"}

  • format – A format specification (optional). This is used for an Amazon Simple Storage Service (Amazon S3) or an AWS Glue connection that supports multiple formats. See Format Options for ETL Output in AWS Glue for the formats that are supported.

  • format_options – Format options for the specified format (optional). See Format Options for ETL Output in AWS Glue for the formats that are supported.

  • transformation_ctx – The transformation context to use (optional).
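The two connection_options shapes described above can be sketched as small builder functions, followed by one from_options call for Amazon S3. This is an illustration under stated assumptions, not a definitive implementation: the helper names are hypothetical, the reader is assumed to be reached through glue_context.create_dynamic_frame, and "json" is used as an example format value.

```python
def s3_connection_options(paths):
    """Build connection_options for connection_type "s3"."""
    # Amazon S3 paths go in an array under the "paths" key.
    return {"paths": list(paths)}


def jdbc_connection_options(url, user, password, dbtable, redshift_tmp_dir=None):
    """Build connection_options for a JDBC connection type.

    The url must already include the database name.
    """
    options = {"url": url, "user": user, "password": password, "dbtable": dbtable}
    # Assumption: the temporary directory is only supplied when one is given.
    if redshift_tmp_dir is not None:
        options["redshiftTmpDir"] = redshift_tmp_dir
    return options


def read_json_from_s3(glue_context, paths):
    """Read JSON objects from Amazon S3 into a DynamicFrame.

    Assumes glue_context is an awsglue.context.GlueContext.
    """
    return glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options=s3_connection_options(paths),
        format="json",
        transformation_ctx="read_json_from_s3",  # hypothetical context name
    )
```

Keeping the option dictionaries in separate functions makes them easy to inspect or unit test without a running Glue job.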

from_catalog

from_catalog(name_space, table_name, redshift_tmp_dir="", transformation_ctx="")

Reads a DynamicFrame using the specified catalog namespace and table name.

  • name_space – The database to read from.

  • table_name – The name of the table to read from.

  • redshift_tmp_dir – An Amazon Redshift temporary directory to use (optional).

  • transformation_ctx – The transformation context to use (optional).
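A short sketch of reading a catalog table with from_catalog, again assuming the reader is available as glue_context.create_dynamic_frame. The helper name read_catalog_table and the way the transformation context string is built are hypothetical choices made for this example.

```python
def read_catalog_table(glue_context, database, table, redshift_tmp_dir=""):
    """Read a Data Catalog table into a DynamicFrame.

    Assumes glue_context is an awsglue.context.GlueContext. The database
    and table are passed positionally, matching name_space and table_name
    in the signature above.
    """
    return glue_context.create_dynamic_frame.from_catalog(
        database,
        table,
        redshift_tmp_dir=redshift_tmp_dir,
        # Hypothetical naming scheme for the transformation context.
        transformation_ctx="read_{}_{}".format(database, table),
    )
```

The redshift_tmp_dir argument can be left at its empty default unless an Amazon Redshift temporary directory is needed.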