Specifies a new DataBrew dataset.
Syntax
To declare this entity in your AWS CloudFormation template, use the following syntax:
JSON
{
  "Type" : "AWS::DataBrew::Dataset",
  "Properties" : {
      "Format" : String,
      "FormatOptions" : FormatOptions,
      "Input" : Input,
      "Name" : String,
      "PathOptions" : PathOptions,
      "Source" : String,
      "Tags" : [ Tag, ... ]
    }
}
YAML
Type: AWS::DataBrew::Dataset
Properties:
  Format: String
  FormatOptions:
    FormatOptions
  Input:
    Input
  Name: String
  PathOptions:
    PathOptions
  Source: String
  Tags:
    - Tag
Properties
Format
-
The file format of a dataset that is created from an Amazon S3 file or folder.
Required: No
Type: String
Allowed values: CSV | JSON | PARQUET | EXCEL | ORC
Update requires: No interruption
FormatOptions
-
A set of options that define how DataBrew interprets the data in the dataset.
Required: No
Type: FormatOptions
Update requires: No interruption
Input
-
Information on how DataBrew can find the dataset, in either the AWS Glue Data Catalog or Amazon S3.
Required: Yes
Type: Input
Update requires: No interruption
Name
-
The unique name of the dataset.
Required: Yes
Type: String
Minimum: 1
Maximum: 255
Update requires: Replacement
PathOptions
-
A set of options that defines how DataBrew interprets an Amazon S3 path of the dataset.
Required: No
Type: PathOptions
Update requires: No interruption
Source
-
The location of the data for the dataset: Amazon S3, the AWS Glue Data Catalog, or a database.
Required: No
Type: String
Allowed values: S3 | DATA-CATALOG | DATABASE
Update requires: No interruption
Tags
-
Metadata tags that have been applied to the dataset.
Required: No
Type: Array of Tag
Update requires: No interruption
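To illustrate how the optional properties above combine, the following is a minimal sketch of a CSV dataset read from an Amazon S3 folder. The bucket name, key prefix, and dataset name are placeholders. The nested property names (Csv, Delimiter, HeaderRow, FilesLimit, MaxFiles, Order, OrderedBy) are not defined on this page; they are assumed from the FormatOptions and PathOptions property types, so check those pages before relying on them.

```yaml
Resources:
  SalesCsvDataset:
    Type: AWS::DataBrew::Dataset
    Properties:
      Name: sales-csv-dataset            # placeholder; 1-255 characters, replacement on change
      Format: CSV                        # one of CSV | JSON | PARQUET | EXCEL | ORC
      Input:
        S3InputDefinition:
          Bucket: amzn-s3-demo-bucket    # placeholder bucket name
          Key: sales/                    # placeholder folder prefix
      FormatOptions:
        Csv:                             # assumed CsvOptions sub-properties
          Delimiter: ','
          HeaderRow: true
      PathOptions:
        FilesLimit:                      # assumed FilesLimit sub-properties
          MaxFiles: 10
          OrderedBy: LAST_MODIFIED_DATE
          Order: DESCENDING
      Tags:
        - Key: team
          Value: analytics
```

Only Name and Input are required; the remaining properties can be added or changed later without interruption.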
Return values
Ref
When you pass the logical ID of this resource to the intrinsic Ref function, Ref returns the resource name. For example:
{ "Ref": "myDataset" }
For an AWS Glue DataBrew dataset named myDataset, Ref returns the name of the dataset.
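As a sketch of how the Ref return value might be used, the fragment below exposes the dataset name as a stack output. The logical ID myDataset matches the example above; the dataset name, bucket, key, and output name are hypothetical placeholders.

```yaml
Resources:
  myDataset:
    Type: AWS::DataBrew::Dataset
    Properties:
      Name: my-dataset                   # placeholder dataset name
      Input:
        S3InputDefinition:
          Bucket: amzn-s3-demo-bucket    # placeholder bucket name
          Key: data.csv                  # placeholder object key
Outputs:
  DatasetName:
    Description: Name of the DataBrew dataset
    Value: !Ref myDataset                # resolves to the dataset name
```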
Examples
Creating datasets
The following examples create new DataBrew datasets.
YAML
Resources:
  TestDataBrewDataset:
    Type: AWS::DataBrew::Dataset
    Properties:
      Name: dataset-name
      Input:
        S3InputDefinition:
          Bucket: !Join [ '', ['databrew-cfn-integration-tests-', !Ref 'AWS::Region', '-', !Ref 'AWS::AccountId' ] ]
          Key: cocktails.json
      FormatOptions:
        Json:
          MultiLine: True
JSON
{
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "This CloudFormation template specifies a DataBrew Dataset",
    "Resources": {
        "TestDataBrewDataset": {
            "Type": "AWS::DataBrew::Dataset",
            "Properties": {
                "Name": "cf-test-dataset1",
                "Input": {
                    "S3InputDefinition": {
                        "Bucket": "test-location",
                        "Key": "test.xlsx"
                    }
                },
                "FormatOptions": {
                    "Excel": {
                        "SheetNames": ["test"]
                    }
                },
                "Tags": [
                    {
                        "Key": "key00AtCreate",
                        "Value": "value001AtCreate"
                    }
                ]
            }
        }
    }
}
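Both examples above read from Amazon S3. Because Input can also point to the AWS Glue Data Catalog, a catalog-backed dataset might look like the following sketch. The DataCatalogInputDefinition block and its DatabaseName and TableName properties are assumed from the Input property type rather than taken from this page, and the database, table, and dataset names are placeholders.

```yaml
Resources:
  CatalogDataBrewDataset:
    Type: AWS::DataBrew::Dataset
    Properties:
      Name: catalog-dataset              # placeholder dataset name
      Input:
        DataCatalogInputDefinition:      # assumed Input sub-property
          DatabaseName: sales_db         # placeholder Glue database
          TableName: orders              # placeholder Glue table
```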