Amazon CloudSearch
Developer Guide (API Version 2013-01-01)

cs-import-documents

NAME
    cs-import-documents

DESCRIPTION
    Uploads documents to a search domain for indexing. If necessary, the
    source data is analyzed and converted to JSON or XML so it can be
    indexed by Amazon CloudSearch. The data source can be a local
    directory or file, an S3 bucket, or a DynamoDB table.

    Specify the --domain-name (-d) option to upload the documents to your
    search domain. To save the generated JSON or XML files to your local
    file system or an S3 bucket, specify the --output option.

    The cs-import-documents command can process the following content
    types:

        text/csv
        text/html
        text/plain
        application/msword
        application/pdf
        application/vnd.ms-excel
        application/vnd.ms-powerpoint
        application/vnd.openxmlformats-officedocument.presentationml.presentation
        application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
        application/vnd.openxmlformats-officedocument.wordprocessingml.document

    For more information, see the developer guide topic "Generating JSON
    or XML from Your Source Data for Amazon CloudSearch".

SYNOPSIS
    cs-import-documents
        --source PATH|S3_URI|DDB_TABLE
        [--output PATH|S3_URI]
        [--modified-after yyyy-MM-ddTHH:mm:ssZ]
        [--exclude-metadata]
        [--exclude-content]
        [--single-doc-per-csv]
        [--multivalued FIELDS]
        [--sdf-format json|xml]
        [--docid-prefix STRING]
        [--batch-size MB]
        [--batch-docs NUM]
        [--num-rows NUM]
        [--dynamodb-rcu-percent NUM]
        [--start-hash-key STRING]
        [--start-range-key STRING]
        [--delimiter CHAR]
        [--encapsulator CHAR]
        [--comment-character CHAR]
        COMMON_OPTIONS

COMMON OPTIONS
    -a, --access-key STRING
        Your AWS access key ID. Used in conjunction with --secret-key.
        Required if you do not use an AWS credential file.

    -c, --aws-credential-file FILE
        The path to the file that contains your AWS access key ID and
        secret access key. Required if you have not set the
        AWS_CREDENTIAL_FILE environment variable or explicitly set your
        credentials with --access-key and --secret-key.

    -d, --domain-name STRING
        The name of the domain that you are querying or configuring.
        Required.

    -e, --endpoint URL
        The endpoint for the Amazon CloudSearch Configuration Service.
        Defaults to the CS_ENDPOINT environment variable or
        cloudsearch.us-east-1.amazonaws.com if the environment variable
        is not set. Optional.

    -h, --help
        Display this help message. Optional.

    -k, --secret-key STRING
        Your AWS secret access key. Used in conjunction with
        --access-key. Required if you do not use an AWS credential file.

    -ve, --verbose
        Display verbose log messages. Optional.

    -v, --version
        Display the version number of the command line tools. Optional.

BASIC OPTIONS
    -o, --output PATH|S3_URI
        The local directory or S3 bucket where you want to save the
        generated JSON or XML files. Required if you do not specify the
        --domain-name option to upload the documents to a search domain.

    -s, --source PATH|S3_URI|DDB_TABLE
        The local directory or file, S3 bucket, or DynamoDB table that
        contains your source data. You can process data from multiple
        locations by specifying multiple sources. For example:
        --source c:\DataSet1 c:\DataSet2.
        Supports wildcards for filenames, directories, and S3 prefixes:
            ?   matches any single character
            *   matches zero or more characters
            **  matches zero or more directories or prefixes
        Required.

ADVANCED OPTIONS
    -bd, --batch-docs NUM
        The maximum number of documents in a batch. Optional.

    -bs, --batch-size MB
        The maximum batch size in MB. Defaults to 5 MB. Optional.

    -char, --comment-character CHAR
        The character used to identify comments in CSV source files. If
        not specified, the default comment character is a hash character
        (#). Optional.

    -del, --delimiter CHAR
        The character used to delimit fields in CSV source files. If not
        specified, the default delimiter is a comma (,). Optional.

    -dp, --docid-prefix STRING
        The prefix to prepend to the document ID while processing CSV
        data. If not specified, the filename is used as the prefix. The
        docid column is used as the document ID if it is included in the
        CSV data; otherwise, the row number is used as the document ID.
        Optional.

    -enc, --encapsulator CHAR
        The character used to encapsulate individual values of a
        multi-valued field in CSV source files. If not specified, the
        default encapsulator is a double quote ("). Optional.

    -ec, --exclude-content
        Do not include the content of the source files in the generated
        JSON or XML; process only the metadata. Optional.

    -em, --exclude-metadata
        Do not include the metadata of the source files in the generated
        JSON or XML; process only the content. Optional.

    -m, --modified-after TIMESTAMP
        Only process files or S3 objects modified after the specified
        date and time. Specify the timestamp as yyyy-MM-dd'T'HH:mm:ssZ,
        with the time zone in RFC 822 format. For example,
        2012-12-12T01:00:00GMT. Optional.

    -mv, --multivalued FIELDS
        Treat the specified fields as multi-valued fields when processing
        CSV files. Specify multiple fields as a comma-separated list. If
        no fields are specified, all fields other than docid are
        processed as multi-valued fields. This option is not valid if the
        -sdpc option is specified, and it has no effect on non-CSV files.
        Optional.

    -format, --sdf-format json|xml
        The format of the generated documents: json or xml. Defaults to
        json. Optional.

    -sdpc, --single-doc-per-csv
        Treat each CSV file as a single document. If this option is
        specified, the contents of a CSV file are treated as a single
        text field. This option is not valid if the -mv option is
        specified, and it has no effect on non-CSV files. Optional.

DynamoDB SOURCE OPTIONS
    -drp, --dynamodb-rcu-percent NUM
        The maximum percentage of configured read capacity units to use
        while reading from the DynamoDB table. By default, the maximum is
        20% of the table's configured read capacity units. Optional.

    -n, --num-rows NUM
        The maximum number of rows to read from the DynamoDB table. By
        default, the entire table is read. Optional.

    -shk, --start-hash-key STRING
        The hash attribute of the item in the DynamoDB table where you
        want to begin reading. If the table has a hash and range type
        primary key, the --start-range-key option must also be specified.
        By default, the table is read starting with the first item.
        Optional.

    -srk, --start-range-key STRING
        The range attribute of the item in the DynamoDB table where you
        want to begin reading. Required if --start-hash-key is specified
        and the DynamoDB table has a hash and range type primary key. Not
        used if the table has a hash type primary key. Optional.

EXAMPLES
    Process all of the source documents in a directory and upload the
    data for indexing:

        cs-import-documents -d mydomain --source c:\myAmazingDataSet\* COMMON_OPTIONS

    Process a DynamoDB table and save the generated XML files to a local
    directory:

        cs-import-documents --source ddb://myDDBTable --output c:\myAmazingDataSet\SDF\batch -format xml COMMON_OPTIONS
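
When the command is run from a script, it can help to assemble the argument list in one place so that the rules stated in the option descriptions are checked before anything is launched. The following Python sketch is one possible way to do that and is not part of the command line tools. The helper name build_import_command and its parameters are hypothetical. It uses only the documented -d, --source, --output, --modified-after, and --sdf-format options, and it assumes that a numeric RFC 822 zone offset such as +0000 is accepted wherever the documented example uses GMT.

```python
from datetime import datetime, timezone
from typing import List, Optional


def build_import_command(
    sources: List[str],
    domain: Optional[str] = None,
    output: Optional[str] = None,
    modified_after: Optional[datetime] = None,
    sdf_format: str = "json",
) -> List[str]:
    """Return an argument list for cs-import-documents (hypothetical helper)."""
    if not sources:
        raise ValueError("--source is required")
    if domain is None and output is None:
        # Per the option descriptions, --output is required when the
        # documents are not uploaded to a search domain.
        raise ValueError("specify a domain, an output location, or both")
    if sdf_format not in ("json", "xml"):
        raise ValueError("--sdf-format must be json or xml")

    argv = ["cs-import-documents"]
    if domain is not None:
        argv += ["-d", domain]
    # Multiple sources follow a single --source option, as in
    # "--source c:\DataSet1 c:\DataSet2".
    argv += ["--source", *sources]
    if output is not None:
        argv += ["--output", output]
    if modified_after is not None:
        if modified_after.tzinfo is None:
            raise ValueError("modified_after must be timezone-aware")
        # yyyy-MM-dd'T'HH:mm:ssZ with a numeric zone offset (assumption:
        # the tool accepts +0000 as well as a zone name such as GMT).
        stamp = modified_after.strftime("%Y-%m-%dT%H:%M:%S%z")
        argv += ["--modified-after", stamp]
    argv += ["--sdf-format", sdf_format]
    return argv


if __name__ == "__main__":
    # Mirrors the second documented example: DynamoDB table to local XML.
    cmd = build_import_command(
        sources=["ddb://myDDBTable"],
        output="c:\\myAmazingDataSet\\SDF\\batch",
        sdf_format="xml",
    )
    print(" ".join(cmd))
```

The returned list could be passed to subprocess.run together with whichever COMMON_OPTIONS apply, provided cs-import-documents is on the PATH and credentials are supplied through the AWS_CREDENTIAL_FILE environment variable or the --access-key and --secret-key options.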