AWS Glue
Developer Guide

DynamicFrameReader Class

 — Methods —

__init__

__init__(glue_context)

from_rdd

from_rdd(data, name, schema=None, sampleRatio=None)

Reads a DynamicFrame from a Resilient Distributed Dataset (RDD).

  • data – The dataset to read from.

  • name – The name to read from.

  • schema – The schema to read (optional).

  • sampleRatio – The sample ratio (optional).
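As a minimal sketch of how `from_rdd` might be called: the helper below assumes `glue_context` is an `awsglue.context.GlueContext`, whose `create_dynamic_frame` attribute is a `DynamicFrameReader`, and that `rdd` is an existing Spark RDD of row-like records. The helper name `frame_from_rdd` and its `sample_ratio` argument are hypothetical names used only for illustration.

```python
def frame_from_rdd(glue_context, rdd, name, schema=None, sample_ratio=None):
    """Create a DynamicFrame from an existing RDD.

    Assumption: glue_context.create_dynamic_frame is a DynamicFrameReader.
    """
    reader = glue_context.create_dynamic_frame
    # data and name are passed positionally; schema and sampleRatio are optional.
    return reader.from_rdd(rdd, name, schema=schema, sampleRatio=sample_ratio)
```

Leaving `schema` and `sample_ratio` at `None` matches the defaults in the signature above.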

from_options

from_options(connection_type, connection_options={}, format=None, format_options={}, transformation_ctx="")

Reads a DynamicFrame using the specified connection and format.

  • connection_type – The connection type. Valid values include s3, mysql, postgresql, redshift, sqlserver, and oracle.

  • connection_options – Connection options, such as path and database table (optional). For a connection_type of s3, Amazon S3 paths are defined in an array.

    connection_options = {"paths": [ "s3://mybucket/object_a", "s3://mybucket/object_b"]}

    For JDBC connections, several properties must be defined. The database name must be part of the URL; it can optionally also be included in the connection options.

    connection_options = {"url": "jdbc-url/database", "user": "username", "password": "password", "dbtable": "table-name", "redshiftTmpDir": "s3-tempdir-path"}

  • format – A format specification such as JSON, CSV, or other format (optional). This is used for an Amazon S3 or AWS Glue connection that supports multiple formats.

  • format_options – Format options, such as delimiter (optional).

  • transformation_ctx – The transformation context to use (optional).
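The two `connection_options` shapes described above can be sketched as small helpers, together with a read from Amazon S3. This is an illustrative sketch, not a definitive implementation: it assumes `glue_context` is an `awsglue.context.GlueContext` whose `create_dynamic_frame` attribute is a `DynamicFrameReader`. The helper names (`s3_options`, `jdbc_options`, `read_from_s3`) are hypothetical.

```python
def s3_options(paths):
    """Build connection_options for connection_type="s3"."""
    # Amazon S3 paths are supplied as a list under the "paths" key.
    return {"paths": list(paths)}


def jdbc_options(url, user, password, dbtable, redshift_tmp_dir=None):
    """Build connection_options for a JDBC connection type.

    The database name is expected to be part of the JDBC URL.
    """
    options = {"url": url, "user": user, "password": password, "dbtable": dbtable}
    if redshift_tmp_dir is not None:
        # Only added when the caller supplies a temporary directory.
        options["redshiftTmpDir"] = redshift_tmp_dir
    return options


def read_from_s3(glue_context, paths, fmt="json", format_options=None,
                 transformation_ctx=""):
    """Read a DynamicFrame from Amazon S3 with DynamicFrameReader.from_options."""
    return glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options=s3_options(paths),
        format=fmt,
        format_options=format_options or {},
        transformation_ctx=transformation_ctx,
    )
```

For example, `read_from_s3(glue_context, ["s3://mybucket/object_a"], fmt="csv", format_options={"separator": ","})` would request a CSV read. The `separator` key is given here as an assumption about the CSV format options; check it against the format documentation.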

from_catalog

from_catalog(name_space, table_name, redshift_tmp_dir = "", transformation_ctx="")

Reads a DynamicFrame using the specified catalog namespace and table name.

  • name_space – The database to read from.

  • table_name – The name of the table to read from.

  • redshift_tmp_dir – An Amazon Redshift temporary directory to use (optional).

  • transformation_ctx – The transformation context to use (optional).
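A catalog read might be wrapped as follows. This is a minimal sketch, assuming `glue_context` is an `awsglue.context.GlueContext` whose `create_dynamic_frame` attribute is a `DynamicFrameReader`. The helper name `read_catalog_table` is hypothetical.

```python
def read_catalog_table(glue_context, database, table,
                       redshift_tmp_dir="", transformation_ctx=""):
    """Read a Data Catalog table as a DynamicFrame.

    database and table are passed positionally, matching the order of
    name_space and table_name in the from_catalog signature.
    """
    return glue_context.create_dynamic_frame.from_catalog(
        database,
        table,
        redshift_tmp_dir=redshift_tmp_dir,
        transformation_ctx=transformation_ctx,
    )
```

`redshift_tmp_dir` is needed only when the table is backed by Amazon Redshift; otherwise the empty-string default can be left in place.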