Filtering the Data Transferred by AWS DataSync
When you transfer data from your source to your destination location, you can apply filters to transfer only a subset of the files in your source location. For example, if your source location includes temporary files that end with .tmp, you can create an exclude filter that ensures that these files are not transferred to the destination.
Filtering is optional. Configure a filter only if you want to transfer a subset of your source files. To transfer all files from the source to the destination location, leave the filter configuration empty.
Filtering Terms, Definitions, and Syntax
Following are some terms and definitions for use with filtering:
- Filter
  The whole string that makes up a particular filter, for example: *.tmp|*.temp
  Filters are made up of patterns that are delimited with a | (pipe). A delimiter is not needed when you add patterns on the console because each pattern is added separately.
- Pattern
  A pattern within a filter. For example, *.tmp is a pattern that is part of the *.tmp|*.temp filter.
- Folders
  - All filters are relative to the source location path. For example, suppose that you specify /my_source/ as the source path when you create the task and create the include filter /transfer_this/. In this case, only the directory /my_source/transfer_this/ and its content are transferred.
  - To specify a folder directly under the source location, include a forward slash (/) in front of the folder name. In the preceding example, the pattern uses /transfer_this, not transfer_this.
  - The following patterns are interpreted the same way and match both the folder and its content: /dir and /dir/
  - When you are transferring data from or to an Amazon S3 bucket, DataSync treats the / character in the object key as the equivalent of a folder on a file system.
- Special Characters
  Following are special characters for use with filtering.
  - * (wildcard)
    A character used to match zero or more characters. For example, /movies_folder* matches both /movies_folder and /movies_folder1.
  - | (pipe delimiter)
    A character used as a delimiter between patterns. It enables specifying multiple patterns, any of which can match the filter. For example, *.tmp|*.temp matches files ending with either tmp or temp.
    Note: This delimiter is not needed when you add patterns on the console because each pattern is added on a separate line.
  - \ (backslash)
    This character is used for escaping when a file or object name contains special characters (*, |, \).
    A double backslash (\\) is required when a backslash is part of a file name. Similarly, \\\\ represents two consecutive backslashes in a file name.
    A backslash followed by a pipe (\|) is required when a pipe is part of a file name.
    A backslash followed by any other character, or at the end of a pattern, is ignored.
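The escaping and delimiter rules above can be sketched as two small helpers. This is a minimal illustration, not part of DataSync: the function names `escape_name` and `build_filter` are hypothetical, and escaping a literal * with a backslash is an assumption inferred from the list of special characters, since the text only spells out the \\ and \| cases.

```python
def escape_name(name: str) -> str:
    """Escape a literal file or object name for use inside a pattern."""
    # Escape the backslash first so the backslashes added below are not doubled.
    escaped = name.replace("\\", "\\\\")
    # A pipe that is part of a name must be written as \|.
    escaped = escaped.replace("|", "\\|")
    # Assumption: a literal * is escaped the same way as | and \.
    escaped = escaped.replace("*", "\\*")
    return escaped


def build_filter(patterns: list[str]) -> str:
    """Join patterns into one filter string, delimited by | (pipe)."""
    return "|".join(patterns)


print(build_filter(["*.tmp", "*.temp"]))  # *.tmp|*.temp
print(escape_name("a|b.txt"))             # a\|b.txt
```

Use `escape_name` only on literal names. Wildcards and delimiters that are meant to act as special characters should be passed to `build_filter` unescaped.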
Excluding Data from a Transfer
Exclude filters define files, folders, and objects that are excluded when you transfer files from a source to a destination location. You can configure these filters when you create or edit a task.
To create a task with an exclude filter in the DataSync console, specify a list of patterns in the Filtering configuration – (optional) section in the Exclude patterns box. For example, to exclude temporary folders, you can specify */temp in the exclude patterns text box, choose Add patterns and then specify */tmp in the second text box. To add more patterns to the filter, choose Add pattern. When you are using the CLI, note that quotes are required around the filter and a | (pipe) is used as a delimiter. For this example, you would specify '*/temp|*/tmp'.
The following screenshot shows the Edit Task wizard with patterns that exclude temporary folders.

After you have created a task, you can edit the task configuration to add or remove patterns from the filter.
You can also use the AWS Command Line Interface (AWS CLI) to create an exclude filter. The following example shows such a CLI command.
aws datasync create-task --source-location-arn 'arn:aws:datasync:region:account-id:location/location-id' --destination-location-arn 'arn:aws:datasync:region:account-id:location/location-id' --cloud-watch-log-group-arn 'arn:aws:logs:region:account-id:log-group:your-log-group' --name your-task-name --excludes FilterType=SIMPLE_PATTERN,Value='*/temp|*/tmp'
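When the command is assembled from a script, the same `--excludes` option can be generated from a list of patterns. The following is a minimal sketch under stated assumptions: the helper name `exclude_args` is hypothetical, the patterns are assumed to contain no commas or quotes that would interfere with the shorthand syntax, and `shlex.join` is used only to show how a shell would need the argument quoted.

```python
import shlex


def exclude_args(patterns):
    """Build the --excludes option for `aws datasync create-task`."""
    # Patterns within one filter are delimited with a | (pipe).
    value = "|".join(patterns)
    return ["--excludes", f"FilterType=SIMPLE_PATTERN,Value={value}"]


args = exclude_args(["*/temp", "*/tmp"])

# Passing the list to subprocess.run() needs no shell quoting.
# shlex.join() shows the quoting a shell command line would need.
print(shlex.join(args))
```

Quoting the whole argument, as `shlex.join` does, is equivalent in the shell to quoting only the value as in the command above.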
If you are migrating files from NetApp, we recommend that you exclude NetApp backup folders by specifying */.snapshot as a pattern in your filter.
Including Data in a Transfer
Include filters define the files, folders, and objects that are transferred when you run a task. You configure these filters when you start a task.
To start a task with an include filter, specify a list of patterns to be included in the optional configuration when you start a task. To do this, use the Start with Overrides option in the DataSync console.
Files and folders matching the include filters are the only ones that are transferred.
For example, to include only a subset of your source folders, you might specify /important_folder_1|/important_folder_2.
You can also use the AWS CLI to create an include filter. The following example shows the CLI command. Take note of the quotes around the filter and the | (pipe) that is used as a delimiter.
aws datasync start-task-execution --task-arn 'arn:aws:datasync:region:account-id:task/task-id' --includes FilterType=SIMPLE_PATTERN,Value='/important_folder1|/important_folder2'
Currently, include filters support the * character only as the rightmost character in a pattern. For example, /documents*|/code* is supported but *.txt is not supported.
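A pattern can be checked against this restriction before a task execution is started. The sketch below is illustrative only: the function name `is_supported_include_pattern` is hypothetical, it checks one pattern at a time rather than a whole filter, and it assumes that a * preceded by a backslash is a literal character and therefore not subject to the restriction.

```python
def is_supported_include_pattern(pattern: str) -> bool:
    """Return True if every unescaped * is the rightmost character."""
    last = len(pattern) - 1
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # Skip the backslash and the character it escapes.
            i += 2
            continue
        if ch == "*" and i != last:
            # A wildcard anywhere but the end is not supported.
            return False
        i += 1
    return True


for p in ["/documents*", "/code*", "*.txt"]:
    print(p, is_supported_include_pattern(p))
```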
Sample Filters for Common Uses
In this section, you can find common uses for filtering and sample filters for them.
Exclude some folders from your source location
In some cases, you might need to exclude folders in your source location to keep them from being copied to your destination. For example, you might have temporary work-in-progress folders. Or you might use NetApp and want to exclude NetApp backup folders. To exclude NetApp backup folders, for example, you can use the following filter.
*/.snapshot
To exclude folders at any level in the file hierarchy, you can create a task to configure an exclude filter like the following.
*/folder-to-exclude-1|*/folder-to-exclude-2
To exclude folders at the top level of the source location, you can create a task to configure an exclude filter like the following.
/top-level-folder-to-exclude-1|/top-level-folder-to-exclude-2
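The difference between the two forms is only the prefix: */ matches a folder at any level, while a leading / anchors the pattern to the top of the source location. A small sketch with hypothetical helper names (`exclude_anywhere`, `exclude_top_level`), assuming the folder names contain no special characters:

```python
def exclude_anywhere(folder_names):
    """Filter that matches the named folders at any level of the hierarchy."""
    return "|".join(f"*/{name}" for name in folder_names)


def exclude_top_level(folder_names):
    """Filter that matches the named folders directly under the source location."""
    return "|".join(f"/{name}" for name in folder_names)


print(exclude_anywhere(["folder-to-exclude-1", "folder-to-exclude-2"]))
print(exclude_top_level(["top-level-folder-to-exclude-1"]))
```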
Include only a subset of the folders on your source location
In some cases, your source location might be a large share, and you need to transfer only a subset of the folders under the root. To include specific folders, start a task execution with an include filter like the following.
/folder-to-transfer
Exclude specific file types
To exclude certain file types from the transfer, you can create a task with an exclude filter such as *.temp.
Transfer only individual files you specify
To transfer a list of individual files, start a task execution with an include filter like the following:
"/folder/subfolder/file1.txt|/folder/subfolder/file2.txt"
The string length is limited to 100,000 characters.
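When the file list is generated by a script, it can help to check the length limit before starting the task execution. The following is a minimal sketch: the names `MAX_FILTER_LENGTH` and `include_filter_for_files` are hypothetical, the paths are assumed to be relative to the source location and already escaped, and dropping duplicate paths is a design choice of this example rather than a documented requirement.

```python
MAX_FILTER_LENGTH = 100_000  # limit on the filter string length stated above


def include_filter_for_files(paths):
    """Join file paths into one include filter and enforce the length limit."""
    # Drop duplicates while keeping the original order.
    unique_paths = list(dict.fromkeys(paths))
    value = "|".join(unique_paths)
    if len(value) > MAX_FILTER_LENGTH:
        raise ValueError(
            f"filter is {len(value)} characters; the limit is {MAX_FILTER_LENGTH}"
        )
    return value


print(include_filter_for_files([
    "/folder/subfolder/file1.txt",
    "/folder/subfolder/file2.txt",
]))
```

If the list does not fit in one filter string, one option is to split it across several task executions.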