Amazon CloudSearch
Developer Guide (API Version 2011-02-01)

Uploading Data to an Amazon CloudSearch Domain

To make your data searchable, you must describe it according to the Search Data Format (SDF) and upload the resulting SDF batches to a search domain. Amazon CloudSearch can then generate a search index from your SDF data according to the index fields and text options that you have configured for the domain. As your data changes, you submit SDF updates to add, change, or delete documents from your index. Amazon CloudSearch applies data updates continuously, so your changes become searchable in near real-time.

Amazon CloudSearch ensures that the most recent changes are applied to your domain using the document version numbers specified in the SDF add and delete operations. The operation with the greatest version number always takes precedence. To be applied, the version number in the add or delete operation must be greater than the document's current version number in the index. If the version number in an add or delete operation is less than the document's current version number, the operation is ignored. If an operation specifies the same document version that already exists in the index, the result is undefined—there's no guarantee which one will take precedence.
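The version rule above can be sketched as a small in-memory simulation. This is only an illustration, not how the service is implemented: the `apply_operations` function and the dictionary-based index are hypothetical, and because the behavior for equal version numbers is undefined, the sketch arbitrarily ignores such operations. It also assumes that the version of a deleted document is still tracked.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical in-memory model of an index: document id -> (version, fields).
# A fields value of None marks a deleted document whose version is still
# tracked (an assumption made for this sketch).
Index = Dict[str, Tuple[int, Optional[dict]]]


def apply_operations(index: Index, operations: Iterable[dict]) -> Index:
    """Apply add/delete operations, keeping only the greatest version per id."""
    for op in operations:
        current = index.get(op["id"])
        if current is not None and op["version"] <= current[0]:
            # Lower versions are ignored. Equal versions are undefined in the
            # service; this sketch chooses to ignore them as well.
            continue
        fields = op["fields"] if op["type"] == "add" else None
        index[op["id"]] = (op["version"], fields)
    return index


operations = [
    {"type": "add", "id": "doc1", "version": 2, "fields": {"title": "new"}},
    {"type": "add", "id": "doc1", "version": 1, "fields": {"title": "stale"}},
    {"type": "delete", "id": "doc2", "version": 5},
]
index = apply_operations({}, operations)
```

In this run the version 1 add for `doc1` arrives after the version 2 add and is dropped, so the order in which batches arrive does not change the final state.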

Important

To upload successfully, your SDF data must be valid JSON or XML and conform to the SDF data conventions. For information about creating SDF batches, see Preparing Your Data for Amazon CloudSearch.

For information about configuring index fields for a domain, see Configuring Index Fields for an Amazon CloudSearch Domain.
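As a rough illustration of what a valid JSON batch might look like, the following sketch builds one add and one delete operation and serializes them. The property names (`type`, `id`, `version`, `lang`, `fields`) reflect the 2011-02-01 JSON layout as commonly documented and should be checked against Preparing Your Data for Amazon CloudSearch; the helper functions, document IDs, and field names are hypothetical.

```python
import json


def add_operation(doc_id, version, fields, lang="en"):
    """Build an SDF-style add operation (property names are an assumption)."""
    return {"type": "add", "id": doc_id, "version": version,
            "lang": lang, "fields": fields}


def delete_operation(doc_id, version):
    """Build an SDF-style delete operation; it carries no fields."""
    return {"type": "delete", "id": doc_id, "version": version}


def to_sdf_json(operations):
    """Serialize a list of operations as one JSON batch."""
    return json.dumps(operations, indent=2)


batch = [
    add_operation("tt0484562", 1, {"title": "Example Movie", "year": 2007}),
    delete_operation("tt0484575", 2),
]
print(to_sdf_json(batch))
```

Writing the result to a file such as data1.sdf would give you something to pass to the upload methods described in the rest of this section.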

You can submit SDF data to a domain using the cs-post-sdf command, from the Amazon CloudSearch console, or by posting it directly to the domain's Document endpoint.

Note

You are billed for the total number of document batches uploaded to your search domain, including batches that contain delete operations. For more information about Amazon CloudSearch pricing, see aws.amazon.com/cloudsearch/pricing/.

Command Line Tools

You use the cs-post-sdf command to send SDF data to your search domain. The SDF batches can be local or stored in Amazon S3. For information about installing and setting up the Amazon CloudSearch command line tools, see Amazon CloudSearch Command Line Tool Reference.

To send data to a domain for indexing

  1. If you haven't already, prepare your data according to the SDF schema. For more information about generating SDF, see Preparing Your Data for Amazon CloudSearch.

  2. Run the cs-post-sdf command to upload your SDF data to your domain. You must include at least one --source option that gives the location of the SDF data you want to upload.

    cs-post-sdf -d mydomain --source data1.sdf
    Processing: data1.sdf
    Detected source format for data1.sdf as json
    Status: success
    Added: 5208
    Deleted: 0

AWS Management Console

In the Amazon CloudSearch console, you can upload data to your domain from the domain dashboard. The console can automatically convert the following types of files to SDF during the upload process:

  • Comma Separated Value (.csv)

  • Adobe Portable Document Format (.pdf)

  • HTML (.htm, .html)

  • Microsoft Excel (.xls, .xlsx)

  • Microsoft PowerPoint (.ppt, .pptx)

  • Microsoft Word (.doc, .docx)

  • Text Documents (.txt)

  • JSON Documents (.json)

  • XML Documents (.xml)

CSV files are parsed row-by-row and a separate document is generated for each row. All other types of files are treated as a single document. For more information about automatically generating SDF, see Preparing Your Data for Amazon CloudSearch.
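The row-by-row treatment of CSV files can be approximated locally. The sketch below is not the console's actual conversion, whose rules for choosing document IDs and versions are not described here; the `csv_to_sdf` function, its `id_column` parameter, and the add-operation layout are assumptions made for illustration.

```python
import csv
import io


def csv_to_sdf(csv_text, id_column, version=1, lang="en"):
    """Turn each CSV row into one SDF-style add operation."""
    reader = csv.DictReader(io.StringIO(csv_text))
    batch = []
    for row in reader:
        # Every column except the id column becomes a document field.
        fields = {name: value for name, value in row.items()
                  if name != id_column}
        batch.append({"type": "add", "id": row[id_column],
                      "version": version, "lang": lang, "fields": fields})
    return batch


sample = "id,title,year\nm1,Alpha,2001\nm2,Beta,2002\n"
documents = csv_to_sdf(sample, id_column="id")
```

Here a two-row file yields two separate documents, whereas a .pdf or .txt file would yield a single document.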

You can also upload SDF batches through the Amazon CloudSearch console.

To send data to a domain for indexing

  1. Go to the Amazon CloudSearch console at https://console.aws.amazon.com/cloudsearch/home.

  2. In the Navigation panel, click the name of the domain.

  3. At the top of the domain dashboard, click Upload Documents.

  4. Select the location of the data you want to upload to your domain:

    • File(s) on my local disk

    • Object(s) from Amazon S3

    • Predefined data

    Note

    If you upload data in a format other than SDF, it will automatically be converted to SDF during the upload process.

  5. If you are uploading local files, click Browse to choose the file(s) to upload:

  6. If you are uploading objects from Amazon S3, select the bucket you want to upload from. To upload the entire contents of the bucket, leave the Prefix field empty and click Add. To upload selected objects, enter a filter in the Prefix field and click Add. (You can add multiple prefixes.)

  7. If you are uploading predefined sample data, choose the data set that you want to use:

  8. Once you've selected the data you want to upload, click Continue.

  9. On the Review Documents step, review the documents to be uploaded and click Upload Documents to continue.

  10. On the Document Summary step, if SDF batches have been automatically generated from your data, you can click Download the generated SDF files to get them. Click Finish to return to the domain dashboard.


API

You use the documents/batch API to post SDF data to your domain to add, update, or remove documents. For example:

    curl -X POST --upload-file data1.sdf doc.movies-123456789012.us-east-1.cloudsearch.amazonaws.com/2011-02-01/documents/batch --header "Content-Type:application/json"
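The same request can be prepared from code. The following is a minimal sketch using only the Python standard library; the `build_batch_request` helper is hypothetical, the endpoint is the example one from the curl command, and the `http://` scheme is an assumption based on curl's default when no scheme is given. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_batch_request(doc_endpoint, operations, api_version="2011-02-01"):
    """Prepare a POST of a JSON batch to the domain's documents/batch resource."""
    url = f"http://{doc_endpoint}/{api_version}/documents/batch"
    body = json.dumps(operations).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_batch_request(
    "doc.movies-123456789012.us-east-1.cloudsearch.amazonaws.com",
    [{"type": "delete", "id": "tt0484575", "version": 2}],
)
# To actually send the batch (requires a real domain endpoint):
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and body before any data reaches the domain.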