You create SDF batches to describe the data that you want to make searchable. When you send SDF batches to a domain, the data is indexed automatically according to the domain's indexing and text options.
An SDF batch is a collection of add and delete operations that represent the documents you want to add, update, or delete from your domain. SDF batches can be described in either JSON or XML. The maximum batch size is 5 MB. The maximum size of an individual document is 1 MB. If your documents total more than 5 MB, you must split the updates across multiple batches of no more than 5 MB each.
Important
Whenever possible, you should group add and delete operations in batches that are close to the maximum batch size. Submitting a large volume of single-document batches to the document service can increase the time it takes for your changes to become visible in search results.
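As a rough illustration of the grouping advice above, the following Python sketch packs operations into JSON batches that stay under a size limit. The function name chunk_operations is hypothetical, and two details are assumptions rather than documented behavior: "5 MB" and "1 MB" are read as 5 MiB and 1 MiB, and the per-document limit is approximated by the size of the serialized operation.

```python
import json

MAX_BATCH_BYTES = 5 * 1024 * 1024  # assumption: "5 MB" interpreted as 5 MiB
MAX_DOC_BYTES = 1024 * 1024        # assumption: "1 MB" interpreted as 1 MiB


def chunk_operations(operations, max_batch_bytes=MAX_BATCH_BYTES,
                     max_doc_bytes=MAX_DOC_BYTES):
    """Yield JSON batch strings whose UTF-8 size is at most max_batch_bytes."""
    current = []
    current_size = 2  # the enclosing "[" and "]"
    for op in operations:
        encoded = json.dumps(op, separators=(",", ":"))
        size = len(encoded.encode("utf-8"))
        if size > max_doc_bytes or size + 2 > max_batch_bytes:
            raise ValueError("operation for %r is too large" % op.get("id"))
        separator = 1 if current else 0  # comma between operations
        if current_size + separator + size > max_batch_bytes:
            # The next operation would overflow, so emit the batch so far.
            yield "[" + ",".join(current) + "]"
            current, current_size, separator = [], 2, 0
        current.append(encoded)
        current_size += separator + size
    if current:
        yield "[" + ",".join(current) + "]"
```

Each yielded string is a complete JSON array, so it can be sent as one batch. Filling batches greedily like this keeps the number of uploads low without exceeding the limit.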
For each document in a batch, you must specify:
The operation you want to perform: add or delete.
A unique ID for the document (docid). A document ID can contain the following characters: a-z (lower-case letters), 0-9, and _ (underscore). Document IDs cannot begin with an underscore.
A document version number for the add or delete operation. The version is used to guarantee that older updates aren't accidentally applied, and to provide control over the ordering of concurrent updates to the service. The document service guarantees that the update with the highest version will be applied and remain there until an add or delete operation with a higher version number and the same document ID is received. If you submit multiple add or delete operations with the same version number, which one takes precedence is undefined. You must increase the version number every time you submit a new add or delete operation for a document. For more information, see Document Versions in Amazon CloudSearch.
The document language as a two-letter language code, such as en for English. (Add operations only.)
A name-value pair for each document field. When specifying SDF in JSON, the value for a field cannot be null. (Add operations only.)
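The requirements listed above can be checked before a batch is built. The sketch below is one possible way to do that in Python; the helper names is_valid_doc_id, make_add, and make_delete are hypothetical and not part of any Amazon CloudSearch tool. It checks only the document ID rule and the no-null rule for JSON fields.

```python
import re

# a-z, 0-9, and underscore; the first character must not be an underscore.
_DOC_ID = re.compile(r"[a-z0-9][a-z0-9_]*")


def is_valid_doc_id(doc_id):
    """Return True if doc_id uses only the characters allowed for document IDs."""
    return _DOC_ID.fullmatch(doc_id) is not None


def make_add(doc_id, version, lang, fields):
    """Build an add operation as a dict ready for JSON serialization."""
    if not is_valid_doc_id(doc_id):
        raise ValueError("invalid document ID: %r" % doc_id)
    if any(value is None for value in fields.values()):
        raise ValueError("field values cannot be null in JSON SDF")
    return {"type": "add", "id": doc_id, "version": version,
            "lang": lang, "fields": dict(fields)}


def make_delete(doc_id, version):
    """Build a delete operation; language and fields do not apply to deletes."""
    if not is_valid_doc_id(doc_id):
        raise ValueError("invalid document ID: %r" % doc_id)
    return {"type": "delete", "id": doc_id, "version": version}
```

For instance, make_delete("tt0484575", 2) returns a dict with the type, id, and version keys that a delete operation requires.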
For example, the following JSON SDF batch adds one document and deletes one document:
[
{ "type": "add",
"id": "tt0484562",
"version": 1,
"lang": "en",
"fields": {
"title": "The Seeker: The Dark Is Rising",
"director": "Cunningham, David L.",
"genre": ["Adventure","Drama","Fantasy","Thriller"],
"actor": ["McShane, Ian","Eccleston, Christopher","Conroy, Frances",
"Crewson, Wendy","Ludwig, Alexander","Cosmo, James",
"Warner, Amelia","Hickey, John Benjamin","Piddock, Jim",
"Lockhart, Emma"]
}
},
{ "type": "delete",
"id": "tt0484575",
"version": 2
}
]
The same batch formatted in XML looks like this:
<batch>
  <add id="tt0484562" version="1" lang="en">
    <field name="title">The Seeker: The Dark Is Rising</field>
    <field name="director">Cunningham, David L.</field>
    <field name="genre">Adventure</field>
    <field name="genre">Drama</field>
    <field name="genre">Fantasy</field>
    <field name="genre">Thriller</field>
    <field name="actor">McShane, Ian</field>
    <field name="actor">Eccleston, Christopher</field>
    <field name="actor">Conroy, Frances</field>
    <field name="actor">Ludwig, Alexander</field>
    <field name="actor">Crewson, Wendy</field>
    <field name="actor">Warner, Amelia</field>
    <field name="actor">Cosmo, James</field>
    <field name="actor">Hickey, John Benjamin</field>
    <field name="actor">Piddock, Jim</field>
    <field name="actor">Lockhart, Emma</field>
  </add>
  <delete id="tt0484575" version="2" />
</batch>
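The XML form follows directly from the JSON form: each array value becomes a repeated field element. The Python sketch below shows one way to produce the XML with the standard library's ElementTree. The function name operations_to_xml is hypothetical, and the input is assumed to be a list of dicts shaped like the JSON operations shown earlier.

```python
import xml.etree.ElementTree as ET


def operations_to_xml(operations):
    """Serialize add/delete operation dicts as an XML SDF batch string."""
    batch = ET.Element("batch")
    for op in operations:
        attrs = {"id": op["id"], "version": str(op["version"])}
        if op["type"] == "add":
            attrs["lang"] = op["lang"]
            add = ET.SubElement(batch, "add", attrs)
            for name, value in op["fields"].items():
                # A multi-valued field is written as one element per value.
                values = value if isinstance(value, list) else [value]
                for item in values:
                    field = ET.SubElement(add, "field", {"name": name})
                    field.text = str(item)
        elif op["type"] == "delete":
            ET.SubElement(batch, "delete", attrs)
        else:
            raise ValueError("unknown operation type: %r" % op["type"])
    return ET.tostring(batch, encoding="unicode")
```

ElementTree escapes characters such as & and < in field text when serializing, so field values can be passed in unescaped.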
Uploading SDF batches that contain invalid JSON or XML will produce unpredictable results. Processing stops when an error is encountered, but the preceding add and delete operations are applied to the domain. You can verify the validity of your JSON or XML data using tools such as xmllint and jsonlint.
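If xmllint or jsonlint is not at hand, a similar pre-upload check can be sketched with Python's standard parsers. The function name check_batch_syntax is hypothetical. This verifies only that the payload is well-formed JSON or XML; it does not verify that the payload is a valid SDF batch.

```python
import json
import xml.etree.ElementTree as ET


def check_batch_syntax(payload, fmt):
    """Return None if payload parses as fmt ("json" or "xml"), else an error message."""
    try:
        if fmt == "json":
            json.loads(payload)
        elif fmt == "xml":
            ET.fromstring(payload)
        else:
            raise ValueError("fmt must be 'json' or 'xml'")
    except (json.JSONDecodeError, ET.ParseError) as exc:
        # Both parsers report the position of the first syntax error.
        return str(exc)
    return None
```

Running a check like this before every upload avoids the partial-application case described above, where operations preceding a syntax error are applied and the rest are not.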
Both JSON and XML batches can only contain UTF-8 characters that are valid in XML. Valid characters are the control characters tab (0009), carriage return (000D), and line feed (000A), and the legal characters of Unicode and ISO/IEC 10646. FFFE, FFFF, and the surrogate blocks D800–DBFF and DC00–DFFF are invalid and will cause errors. (For more information, see Extensible Markup Language (XML) 1.0 (Fifth Edition).) You can use the following regular expression to match invalid characters so you can remove them: /[^\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD]/ .
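A Python equivalent of the regular expression above might look like the following sketch; the function name strip_invalid_chars is hypothetical. Note that the pattern, as given, also removes code points above U+FFFD (for example, most emoji). XML 1.0 itself permits U+10000 through U+10FFFF, and this page does not say whether the service accepts them, so the sketch follows the pattern literally.

```python
import re

# Same character class as the regular expression in the text above.
_INVALID_CHARS = re.compile(r"[^\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD]")


def strip_invalid_chars(text):
    """Remove every character that the pattern above treats as invalid."""
    return _INVALID_CHARS.sub("", text)
```

Applying the function to each field value before serialization keeps tab, line feed, and carriage return, and drops other control characters such as NUL.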
When formatting SDF in JSON, quotes (") and backslashes (\) within field values must be escaped with a backslash. For example:
"title":"Where the Wild Things Are"
"isbn":"0-06-025492-0"
"image":"images\\covers\\Where_The_Wild_Things_Are_(book)_cover.jpg"
"comment":"Sendak's \"Where the Wild Things Are\" is a children's classic."
When formatting SDF in XML, ampersands (&) and less-than symbols (<) within field values need to be represented with the corresponding entity references (&amp; and &lt;).
For example:
<field name="title">Little Cow &amp; the Turtle</field>
<field name="isbn">0-84466-4774</field>
<field name="image">images\covers\Little_Cow_&amp;_the_Turtle_(book)_cover.jpg</field>
<field name="comment">&lt;insert comment&gt;</field>
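When field elements are assembled as strings, the entity substitution can be delegated to Python's standard library instead of being done by hand. The following is a minimal sketch; the helper name field_xml is hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def field_xml(name, value):
    """Return one <field> element with the name and value escaped for XML."""
    # escape() replaces &, <, and > with entity references;
    # quoteattr() escapes the name and wraps it in quotes.
    return "<field name=%s>%s</field>" % (quoteattr(name), escape(value))
```

For example, field_xml("title", "Little Cow & the Turtle") returns the first line of the example above.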
If you have large blocks of user-generated content, you might want to wrap the entire field in a CDATA section, rather than replacing every occurrence with the entity reference. For example:
<field name="comment"><![CDATA[Monsters & mayhem--what's not to like! ]]></field>
The command line tools and Amazon CloudSearch console include an experimental mechanism for automatically generating SDF from a variety of source documents.