To build an index from your SDF data, Amazon CloudSearch needs to know what data you want to search, what data you want to be able to include in the search results, what data you want to use as facets, and whether any custom stopwords, synonyms, and stems need to be defined for your data set. You define this metadata in your domain configuration by configuring indexing and text options. In your domain configuration, you also specify access policies to control who can send data updates and search your domain, and rank expressions to customize how search results are ranked.
A domain's indexing options configure the index fields that will be included in the search index. An index field represents a named field and value pair that you want to store in your index. You configure an index field for each SDF document field that will be searched, used as a facet, or returned in search results.
Every index field has a unique name and a source that specifies one or more SDF document fields. The sources are used to populate the index field. If no source is specified, the source defaults to the SDF document field that has the same name as the index field. An index field definition also includes meta-information such as:
The index field type.
Whether a literal field is searchable. (Text and uint fields are always searchable.)
Whether the value of a text or literal field can be returned in results. (Uint fields are always returnable.)
Whether facet counts can be calculated for a text or literal field. (Facet counts can always be calculated for uint fields.)
Amazon CloudSearch supports three types of index fields:
text—contains arbitrary alphanumeric data. For example, a text field might contain a name, description, or the entire body of a document. Text fields are always searchable and Amazon CloudSearch performs text processing on them according to the stopwords, synonyms, and stems you configure in your domain's text options.
literal—contains an identifier or other data that you want to be able to match exactly. Unlike text fields, Amazon CloudSearch does not perform any text processing on literal fields. Literal fields can be used for fields that have a small set of possible values, as well as for more arbitrary values like email addresses or titles where an exact match is important. Literal fields are frequently used to enable faceted searches where you want to count the number of exact matches for a particular value.
uint—contains an unsigned integer value. For example, you might use a uint field for a field that contains a quantity or numerical rating, or for a date field that contains a time_t value.
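The rules above can be summarized in a small Python sketch. This is an illustrative model only, not the Amazon CloudSearch configuration API: the `IndexField` class and its attribute names are hypothetical. It encodes the defaults described in this section: the source falls back to the field's own name, text and uint fields are always searchable, uint fields are always returnable and facetable, and a text or literal field cannot be both facet-enabled and result-enabled.

```python
from dataclasses import dataclass
from typing import List, Optional

FIELD_TYPES = {"text", "literal", "uint"}


@dataclass
class IndexField:
    """Hypothetical model of an index field definition (not an AWS API)."""
    name: str
    type: str
    sources: Optional[List[str]] = None
    search_enabled: bool = False
    result_enabled: bool = False
    facet_enabled: bool = False

    def __post_init__(self):
        if self.type not in FIELD_TYPES:
            raise ValueError(f"unknown index field type: {self.type}")
        # With no explicit source, use the SDF field with the same name.
        if not self.sources:
            self.sources = [self.name]
        # Text and uint fields are always searchable.
        if self.type in ("text", "uint"):
            self.search_enabled = True
        if self.type == "uint":
            # Uint fields are always returnable and always facetable.
            self.result_enabled = True
            self.facet_enabled = True
        elif self.facet_enabled and self.result_enabled:
            # Text and literal fields are one or the other, never both.
            raise ValueError(
                f"{self.name}: a {self.type} field cannot be both "
                "facet-enabled and result-enabled"
            )


title = IndexField("title", "text", result_enabled=True)
genre_facet = IndexField("genre_facet", "literal",
                         sources=["genre"], facet_enabled=True)
year = IndexField("year", "uint")
print(title.sources, genre_facet.sources, year.facet_enabled)
```

In this sketch, `genre_facet` names `genre` as its source, so two index fields can draw on the same SDF document field when one needs to be returned and the other faceted.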
For information about how to configure index fields for Amazon CloudSearch, see Configuring Index Fields for an Amazon CloudSearch Domain.
A facet is an index field that represents a category that you want to use to refine and filter search results. When you submit search requests to Amazon CloudSearch, you can request facet information to find out how many hits share the same value in a facet. You can display this information along with the search results and use it to enable users to interactively refine their searches. (This is often referred to as faceted navigation or faceted search.)
A facet can be any uint field, or any text or literal field that has faceting enabled in your domain configuration. To request facet information in your search request, you specify:
One or more facets
Facet constraints that specify the particular values you want to count (optional)
How you want the facet values to be sorted in the results (optional)
For each facet, Amazon CloudSearch calculates the number of hits that share the same value. If you specify constraints, the facet counts are calculated only for values that match the constraints. Only constraints that have matches are included in the facet results.
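To make the counting behavior concrete, here is a minimal Python sketch of how facet counts could be derived from a set of hits. It is not how Amazon CloudSearch is implemented, and the `facet_counts` function, its parameters, and the sample documents are all hypothetical. It shows counts per value, optional constraints, and the fact that a constraint with no matching hits does not appear in the results. Sorting by descending count is just one possible sort order chosen for the example.

```python
from collections import Counter


def facet_counts(hits, facet, constraints=None, top_n=None):
    """Count how many hits share each value of `facet`.

    hits        -- list of dicts representing matching documents
    facet       -- name of the field to count
    constraints -- optional set of values to restrict counting to
    top_n       -- optional limit on the number of values returned
    """
    counts = Counter()
    for doc in hits:
        values = doc.get(facet, [])
        if not isinstance(values, list):
            values = [values]  # treat a single value as a one-item list
        for value in values:
            if constraints is None or value in constraints:
                counts[value] += 1
    # Values that never occurred are never added, so a constraint
    # without matches is simply absent from the output.
    return counts.most_common(top_n)


hits = [
    {"title": "A", "genre": "Action"},
    {"title": "B", "genre": "Comedy"},
    {"title": "C", "genre": "Action"},
    {"title": "D", "genre": "Drama"},
    {"title": "E", "genre": "Action"},
    {"title": "F", "genre": "Comedy"},
]

print(facet_counts(hits, "genre"))
print(facet_counts(hits, "genre", constraints={"Comedy", "Horror"}))
```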
Values from a facet-enabled text or literal field cannot be returned in the search results. Text and literal fields can be facet-enabled or result-enabled, but not both. If you want to return the value from an SDF document field as well as use the field as a facet, create two index fields that use the same SDF document field as a source and make one result-enabled, and the other facet-enabled.
For information about configuring facets, see Configuring Index Fields for an Amazon CloudSearch Domain. For information about using facet information to support faceted navigation, see Getting and Using Facet Information in Amazon CloudSearch.
During indexing, Amazon CloudSearch performs a number of text-processing steps on text fields. First, Amazon CloudSearch strips punctuation and splits the text into individual terms that are indexed separately. For example, the string spider-man would be split into two terms, spider and man.
Text fields are then processed using the domain-specific stopword, stemming, and synonym dictionaries:
Stopwords configured for the domain are excluded from the index. For example, the stopwords dictionary generally contains insignificant, frequently occurring terms such as "a", "and", and "the" that would result in a massive number of matches if they were included in the index.
Related words are mapped to a common stem according to the stemming dictionary configured for the domain. For example, the stemming dictionary might map "running" and "ran" to the stem "run".
Synonyms are mapped according to the synonym dictionary configured for the domain. For example, the synonym dictionary might define "colt" and "filly" as synonyms for "horse".
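The processing steps above can be sketched in a few lines of Python. This is a simplified illustration, not the actual Amazon CloudSearch text pipeline: the `index_terms` function and the three small dictionaries are hypothetical, lowercasing is an assumption made for the example, and synonyms are modeled as a simple one-way mapping to a single term, which may differ from how the service applies them.

```python
import re

# Tiny example dictionaries; a real domain would configure its own.
STOPWORDS = {"a", "and", "the"}
STEMS = {"running": "run", "ran": "run"}
SYNONYMS = {"colt": "horse", "filly": "horse"}


def index_terms(text):
    """Return the terms that would be indexed for `text` in this sketch."""
    # Strip punctuation and split into terms (lowercasing is assumed here).
    tokens = [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]
    terms = []
    for token in tokens:
        if token in STOPWORDS:
            continue  # stopwords are excluded from the index
        token = STEMS.get(token, token)      # map related words to a stem
        token = SYNONYMS.get(token, token)   # map synonyms to one term
        terms.append(token)
    return terms


print(index_terms("spider-man"))
print(index_terms("The colt ran and the filly was running"))
```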
Amazon CloudSearch defines a default stopword dictionary that you can fine-tune for your application. Stemming and synonym dictionaries are application-specific and are empty by default. For information about how to configure stopwords, stems, and synonyms for your domain, see Configuring Text Options for an Amazon CloudSearch Domain.
For more information about how Amazon CloudSearch normalizes and tokenizes text and applies configured text options when indexing text fields and processing search requests, see Text Processing in Amazon CloudSearch.
Access to your search domain's endpoints is restricted by IP address so that only authorized hosts can submit documents and send search requests. IP address authorization is used only to control access to the document and search endpoints. All Amazon CloudSearch configuration requests must be authenticated using standard AWS authentication.
Amazon CloudSearch access policies are specified using the AWS Identity and Access Management (IAM) Access Policy Language.
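As a rough illustration of IP-based authorization, the following Python sketch checks whether a client address falls inside a list of allowed ranges for a given endpoint. It does not use the IAM policy format or any AWS API: the `ALLOWED` table, the `is_allowed` function, and the addresses (taken from documentation-reserved ranges) are all hypothetical.

```python
import ipaddress

# Hypothetical allow-lists, one per endpoint type.
ALLOWED = {
    "doc": ["192.0.2.10"],                        # single trusted uploader
    "search": ["192.0.2.0/24", "198.51.100.0/24"],
}


def is_allowed(endpoint, client_ip, allowed=ALLOWED):
    """Return True if client_ip is inside any range listed for endpoint."""
    addr = ipaddress.ip_address(client_ip)
    return any(
        addr in ipaddress.ip_network(cidr)
        for cidr in allowed.get(endpoint, [])
    )


print(is_allowed("search", "192.0.2.55"))   # inside 192.0.2.0/24
print(is_allowed("doc", "192.0.2.55"))      # not the trusted uploader
```

Keeping separate lists for the document and search endpoints reflects the point that update access and search access can be granted to different hosts.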
For information about how to configure access policies for your domain, see Configuring Access for an Amazon CloudSearch Domain.
You can customize how search results are ranked by defining your own rank expressions. Rank expressions are numeric expressions that can be used at search time to calculate a score for every document that matches the search. A rank expression uses standard numeric operators and functions and can reference uint fields, other rank expressions, and a document's text_relevance score. When you submit search requests, you specify the rank expression(s) you want to use to rank or constrain the search results.
A document's text_relevance score indicates how relevant a particular search hit is to the search request. To calculate the relevance score, Amazon CloudSearch takes into account how many times the search terms appear (term frequency) and how close the search terms are to each other (proximity).
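The idea of a rank expression can be illustrated with a short Python sketch. It does not use Amazon CloudSearch rank expression syntax, and the `popularity_boost` expression, the `rank` helper, the `rating` field, and the sample scores are all hypothetical. The sketch computes a score for every hit from its text_relevance score and a uint field, then uses that score both to order the hits and, optionally, to constrain them to a minimum score.

```python
def popularity_boost(doc):
    """Hypothetical expression: relevance plus a weighted uint rating."""
    return doc["text_relevance"] + 5 * doc["rating"]


def rank(hits, expression, min_score=None):
    """Score each hit with `expression`, filter, and sort descending."""
    scored = [(expression(doc), doc) for doc in hits]
    if min_score is not None:
        # Use the computed score to constrain the result set.
        scored = [(s, d) for s, d in scored if s >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc["id"] for _, doc in scored]


hits = [
    {"id": "a", "text_relevance": 200, "rating": 1},   # score 205
    {"id": "b", "text_relevance": 150, "rating": 20},  # score 250
    {"id": "c", "text_relevance": 100, "rating": 2},   # score 110
]

print(rank(hits, popularity_boost))
print(rank(hits, popularity_boost, min_score=200))
```

Here document b outranks document a despite a lower text_relevance score, because the expression gives weight to its higher rating.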
For information about how to configure rank expressions for your domain, see Customizing Result Ranking with Amazon CloudSearch.