Amazon CloudSearch
Developer Guide (API Version 2011-02-01)

Using Relative Field Weighting to Customize Text Relevance in Amazon CloudSearch

You can use the cs.text_relevance function in a rank expression to calculate text_relevance scores using custom field weights. This lets you control how much matches in particular text or literal fields affect a document's text_relevance score. By default, all fields are considered equally important. With cs.text_relevance you can assign different weights to different fields, boosting the text_relevance score of documents with matches in key fields such as the title field and reducing the impact of matches in less important fields. The weights you specify with cs.text_relevance in a rank expression are applied only when that rank expression is evaluated.

The cs.text_relevance function takes a JSON object that can contain two members:

  • weights—a JSON object that defines weights for one or more source fields. Note that you specify weights for the source fields defined in your SDF data. By default, a source field is mapped to the index field with the same name, but source fields can also be explicitly mapped to any index field. For more information, see Adding Sources for an Amazon CloudSearch Index Field.

  • default_weight—the default weight to use for fields for which no weight is specified. (If the default_weight is not specified, the default weight for all fields is 1.0.)

Both the weights and default_weight members are optional. The specified weights must be in the range 0.0 to 10.0, inclusive. Each weight must be specified as a numeric value; you cannot use mathematical functions or other rank expressions to define a field weight. Keep in mind that a document's text_relevance score is a value from 0 to 1000 (inclusive). If the text_relevance score calculated for a document is greater than 1000, the document's score is set to 1000. Specifying a large default weight increases the likelihood that text_relevance scores will exceed 1000 and be clipped.
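The constraints above can be checked on the client before a rank expression is submitted. The following Python sketch is illustrative only: check_weight and clip_score are hypothetical helper names, not part of any Amazon CloudSearch API, and clip_score simply mirrors the documented 0 to 1000 clipping rule rather than the service's actual scoring.

```python
from numbers import Real

MIN_WEIGHT, MAX_WEIGHT = 0.0, 10.0   # documented weight range, inclusive
MAX_SCORE = 1000                     # documented upper bound of text_relevance


def check_weight(name, value):
    """Return value as a float, or raise ValueError if it is not a plain
    number in the range 0.0 to 10.0 (hypothetical client-side check)."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, Real):
        raise ValueError(f"{name}: weight must be a numeric value, got {value!r}")
    if not MIN_WEIGHT <= value <= MAX_WEIGHT:
        raise ValueError(f"{name}: weight {value} is outside 0.0 to 10.0")
    return float(value)


def clip_score(raw_score):
    """Mirror the documented rule: scores above 1000 are set to 1000."""
    return max(0, min(MAX_SCORE, raw_score))
```

For instance, check_weight("title", 1.5) passes, while check_weight("title", "sqrt(2)") is rejected because a weight cannot be an expression.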

For example, if you want matches within the title field to be ranked higher than matches within the description field, you could create a rank expression that sets the weight of the title field to 1.5 and the weight of the description field to 0.5:

cs.text_relevance({"weights": {"title": 1.5, "description": 0.5}})

Note that the keys can be enclosed in single or double quotes, or the quotes can be omitted entirely. To simplify this example, you can specify it as:

cs.text_relevance({weights: {title: 1.5, description: 0.5}})

Because the default_weight is not specified, the weight of all other fields defaults to 1.0. If you want all fields other than the title field to be treated the same, you could set the weight of the title field to 1.5, and set the default_weight to 0.5:

cs.text_relevance({weights: {title: 1.5}, default_weight: 0.5})
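If you generate rank expressions programmatically, a small helper can assemble the cs.text_relevance call from a dictionary of weights. This is a minimal sketch: text_relevance_expr is a hypothetical name, and the code assumes field names are simple identifiers that are safe to emit without quotes, as in the unquoted examples above.

```python
def text_relevance_expr(weights=None, default_weight=None):
    """Build a cs.text_relevance(...) expression string.

    weights: dict mapping source field name -> numeric weight (optional).
    default_weight: numeric weight for unlisted fields (optional).
    Assumes field names need no quoting.
    """
    members = []
    if weights:
        pairs = ", ".join(f"{field}: {weight}" for field, weight in weights.items())
        members.append(f"weights: {{{pairs}}}")
    if default_weight is not None:
        members.append(f"default_weight: {default_weight}")
    return "cs.text_relevance({" + ", ".join(members) + "})"


print(text_relevance_expr({"title": 1.5}, default_weight=0.5))
```

Called as shown, the helper reproduces the expression above that sets the title weight to 1.5 and the default_weight to 0.5.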

In your rank expressions, you can use the cs.text_relevance function in conjunction with uint fields, other rank expressions, a document's default text_relevance score, and the standard numeric operators and functions. For example, you could create a custom rank expression that's based on popularity (which could be defined as a uint field, or as another rank expression) and boosts the importance of the title field by setting its weight to 4.0:

((0.3*popularity)/10.0)+(0.7*cs.text_relevance({weights: {title:4.0}}))
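To see how the two terms of this expression trade off, you can reproduce the arithmetic locally. The Python sketch below only mirrors the formula; the input values are made up for illustration, and the weighted text_relevance score is passed in as a number because the service's scoring itself is not reproduced here.

```python
def combined_rank(popularity, weighted_text_relevance):
    """Mirror ((0.3*popularity)/10.0) + (0.7*cs.text_relevance(...)).

    weighted_text_relevance stands in for the value the service would
    compute with the custom field weights (0 to 1000).
    """
    popularity_term = (0.3 * popularity) / 10.0
    relevance_term = 0.7 * weighted_text_relevance
    return popularity_term + relevance_term


# Hypothetical document: popularity of 500, weighted text_relevance of 800.
score = combined_rank(popularity=500, weighted_text_relevance=800)
print(round(score, 2))
```

With these inputs the popularity term contributes about 15 and the text term about 560. Because text_relevance is capped at 1000, the text term can contribute at most 700 to this expression.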

The cs.text_relevance function can only be used in the definition of a rank expression. To rank or threshold results using the specified weights, you specify the name of the rank expression. To rank or threshold results based on the default text_relevance score, you specify text_relevance.
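As a rough illustration of ranking and thresholding by a named rank expression, the sketch below builds a search query string. It rests on several assumptions that are not stated in this section: that the search request accepts a rank parameter (with a leading - for descending order) and a t-NAME threshold parameter with a range such as 500.., and that a rank expression named title_boost has already been defined. Check the search API reference before relying on these parameter names.

```python
from urllib.parse import urlencode


def search_query(q, rank_by="text_relevance", descending=True, min_score=None):
    """Build a query string that ranks (and optionally thresholds) results.

    rank_by: "text_relevance" for the default score, or the name of a
             rank expression (title_boost below is hypothetical).
    The rank and t-NAME parameter names are assumptions, see the lead-in.
    """
    params = {"q": q, "rank": ("-" if descending else "") + rank_by}
    if min_score is not None:
        # Open-ended range: keep documents scoring min_score or higher.
        params[f"t-{rank_by}"] = f"{min_score}.."
    return urlencode(params)


print(search_query("star wars", rank_by="title_boost", min_score=500))
```

Passing rank_by="text_relevance" instead would rank by the default score rather than by the custom field weights.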