Reading from Salesforce entities

Prerequisite

A Salesforce sObject that you would like to read from. You will need the object name, such as Account, Case, or Opportunity.

Example:

salesforce_read = glueContext.create_dynamic_frame.from_options(
    connection_type="salesforce",
    connection_options={
        "connectionName": "connectionName",
        "ENTITY_NAME": "Account",
        "API_VERSION": "v60.0"
    }
)
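For context, the following is a minimal sketch of how this read might sit inside a complete Glue PySpark script. The connection name "connectionName" is a placeholder for an existing AWS Glue Salesforce connection, and the schema inspection and DataFrame conversion at the end are illustrative additions, not part of the example above.

# Minimal sketch of a Glue PySpark job around the Salesforce read.
# "connectionName" must match a Salesforce connection already configured in AWS Glue.
from pyspark.context import SparkContext
from awsglue.context import GlueContext

sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)

salesforce_read = glueContext.create_dynamic_frame.from_options(
    connection_type="salesforce",
    connection_options={
        "connectionName": "connectionName",  # name of your Glue connection
        "ENTITY_NAME": "Account",            # Salesforce sObject to read
        "API_VERSION": "v60.0"
    }
)

salesforce_read.printSchema()        # inspect the inferred schema
df = salesforce_read.toDF()          # convert to a Spark DataFrame if needed
print(df.count())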

Partitioning queries

You can provide the additional Spark options PARTITION_FIELD, LOWER_BOUND, UPPER_BOUND, and NUM_PARTITIONS if you want to utilize concurrency in Spark. With these parameters, the original query is split into NUM_PARTITIONS sub-queries that Spark tasks can execute concurrently.

  • PARTITION_FIELD: the name of the field to be used to partition the query.

  • LOWER_BOUND: an inclusive lower bound value of the chosen partition field.

    For a timestamp field, the Spark timestamp format used in Spark SQL queries is supported.

    Examples of valid values:

    "TIMESTAMP \"1707256978123\"" "TIMESTAMP ’2024-02-06 22:02:58.123 UTC'" "TIMESTAMP \"2018-08-08 00:00:00 Pacific/Tahiti\" "TIMESTAMP \"2018-08-08 00:00:00\"" "TIMESTAMP \"-123456789\" Pacific/Tahiti" "TIMESTAMP \"1702600882\""
  • UPPER_BOUND: an exclusive upper bound value of the chosen partition field.

  • NUM_PARTITIONS: the number of partitions.

Example:

salesforce_read = glueContext.create_dynamic_frame.from_options(
    connection_type="salesforce",
    connection_options={
        "connectionName": "connectionName",
        "ENTITY_NAME": "Account",
        "API_VERSION": "v60.0",
        "PARTITION_FIELD": "SystemModstamp",
        "LOWER_BOUND": "TIMESTAMP '2021-01-01 00:00:00 Pacific/Tahiti'",
        "UPPER_BOUND": "TIMESTAMP '2023-01-10 00:00:00 Pacific/Tahiti'",
        "NUM_PARTITIONS": "10"
    }
)
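As a follow-up sketch (an illustrative assumption, not part of the example above), you could convert the partitioned DynamicFrame to a Spark DataFrame and check how many partitions back it, which gives a quick sanity check that the NUM_PARTITIONS setting took effect. The Industry column used in the aggregation is a standard Account field chosen only for illustration.

# Hypothetical follow-up to the partitioned read above: convert the
# DynamicFrame to a Spark DataFrame and inspect its partition count.
df = salesforce_read.toDF()
print("Spark partitions:", df.rdd.getNumPartitions())

# A simple aggregation that can benefit from the concurrent sub-queries.
df.groupBy("Industry").count().show()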