Amazon CloudSearch
Developer Guide (API Version 2011-02-01)

Configuring Stemming in Amazon CloudSearch

A stemming dictionary maps related words to a common stem. A stem is typically the root or base word from which variants are derived. For example, run is the stem of running and ran. During indexing, Amazon CloudSearch uses the stemming dictionary when it performs text-processing on text fields. At search time, the stemming dictionary is used to perform text-processing on the search request. This enables matching on variants of a word. For example, if you map the term running to the stem run and then search for running, the request matches documents that contain run as well as running.
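The effect of applying the same dictionary at index time and at search time can be illustrated with a small toy model. This is only a sketch of the idea, not a description of how Amazon CloudSearch is implemented; the `normalize` and `search` functions and the sample documents are hypothetical.

```python
# Toy model of dictionary-based stemming. Illustrative only: this is
# not how Amazon CloudSearch processes text internally.
stems = {"running": "run", "ran": "run"}


def normalize(text, stems):
    """Lowercase, split on whitespace, and replace each term with its stem if one is defined."""
    return [stems.get(term, term) for term in text.lower().split()]


documents = {
    "doc1": "She likes to run",
    "doc2": "He was running late",
    "doc3": "They walk home",
}

# "Index time": store the normalized terms of every document.
index = {doc_id: set(normalize(text, stems)) for doc_id, text in documents.items()}


def search(query):
    """Return the IDs of documents that contain every normalized query term."""
    terms = set(normalize(query, stems))
    return sorted(doc_id for doc_id, doc_terms in index.items() if terms <= doc_terms)


print(search("running"))  # ['doc1', 'doc2']
```

Because both the documents and the query pass through the same mapping, a search for running finds the document that contains only run.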

Stems are specified as a collection of term and stem pairs. When you configure stemming options, the existing stemming dictionary is replaced with the mappings you specify. By default, Amazon CloudSearch does not define any stems. However, some basic algorithmic stemming is always performed, such as removing plural suffixes. (This is done whether or not you specify a custom stemming dictionary.)

The maximum size of a stemming dictionary is 500 KB.
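A quick size check before uploading can catch an oversized dictionary early. The sketch below rests on two assumptions the guide does not confirm: that the limit applies to the JSON form of the dictionary, and that 1 KB is counted as 1000 bytes (the more conservative reading). The function names are hypothetical.

```python
import json

# Assumption: 500 KB is treated as 500 * 1000 bytes, the stricter interpretation.
MAX_DICTIONARY_BYTES = 500 * 1000


def dictionary_size_bytes(stems):
    """Return the UTF-8 size of the dictionary serialized as {"stems": {...}}."""
    return len(json.dumps({"stems": stems}, ensure_ascii=False).encode("utf-8"))


def fits_limit(stems):
    """Return True if the serialized dictionary is within the assumed limit."""
    return dictionary_size_bytes(stems) <= MAX_DICTIONARY_BYTES


print(fits_limit({"mice": "mouse", "people": "person", "running": "run"}))  # True
```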

You can configure stems using the cs-configure-text-options command, from the Amazon CloudSearch console, or using the UpdateStemmingOptions configuration action.

Command Line Tools

You can use the cs-configure-text-options command to upload a text file that contains a list of term and stem pairs.

To configure stemming options

  1. Create a text file for your stemming dictionary and specify one comma-separated term and stem pair per line. For example:

    mice, mouse
    people, person
    running, run

  2. Run the cs-configure-text-options command with the --stems option to upload the stemming dictionary to your domain:

    cs-configure-text-options -d mydomain --stems stems.txt
    Updating Stemming options
    Read the stems file
    Sent 3 token stem pairs.

  3. If you are done making configuration changes, run the cs-index-documents command to rebuild the domain's index:

    cs-index-documents -d mydomain
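If the term and stem pairs already live in a program, the text file from step 1 can be generated rather than typed by hand. The following is a minimal sketch; `format_stems_file` is a hypothetical helper, and rejecting entries that contain commas is an assumption made because the guide does not describe any escaping for the comma-separated format.

```python
def format_stems_file(stems):
    """Render a term-to-stem mapping as one 'term, stem' pair per line."""
    lines = []
    for term, stem in stems.items():
        # Assumption: no escaping mechanism exists, so commas are rejected.
        if "," in term or "," in stem:
            raise ValueError(f"comma not allowed in pair: {term!r}, {stem!r}")
        lines.append(f"{term}, {stem}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    stems = {"mice": "mouse", "people": "person", "running": "run"}
    # Write the file that is then passed to cs-configure-text-options.
    with open("stems.txt", "w", encoding="utf-8") as f:
        f.write(format_stems_file(stems))
```

The resulting stems.txt can then be uploaded as described in step 2.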

AWS Management Console

You can configure a domain's stemming options from the Text Options panel in the Amazon CloudSearch console.

To configure stemming options

  1. Go to the Amazon CloudSearch console at https://console.aws.amazon.com/cloudsearch/home.

  2. In the Navigation panel, click the name of the domain, and then click the domain's Text Options link.

  3. In the Text Options panel, click the Stemming tab.

  4. For each term, stem pair you want to add to the stemming dictionary, enter the term and its stem and click the Add button. You can also edit the list directly or copy and paste the list into a text editor to make changes.

  5. Click Submit to save your changes.

  6. If you are done making configuration changes, click Run Indexing on the domain dashboard to rebuild the domain's index.

API

Use the UpdateStemmingOptions configuration action to upload a JSON-formatted stemming dictionary to your domain. A stemming dictionary is a single JSON object with one property, stems. The value of the stems property is an object whose name: value pairs map terms to their stems:

{"stems": {"term1": "stem1", "term2": "stem2", "term3": "stem3"}}

For example:

https://cloudsearch.us-east-1.amazonaws.com
?Action=UpdateStemmingOptions
&DomainName=movies
&Stems={"stems": {"mice": "mouse", "people": "person", "running": "run"} }
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120402/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-04-02T21:43:50.884Z
&X-Amz-SignedHeaders=host
&X-Amz-Signature=4f7a17dc53fbd7e08b3d3a0c4d771466fe48d2739c8d6333ebe0261d88941488
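The request above shows the Stems parameter unencoded for readability. When the request is actually sent, the JSON value has to be percent-encoded like any other query string value. The sketch below builds the unsigned part of the query string with the Python standard library; `build_update_stemming_query` is a hypothetical helper, and the Signature Version 4 parameters (X-Amz-Algorithm, X-Amz-Credential, X-Amz-Date, X-Amz-SignedHeaders, X-Amz-Signature) are deliberately left out and would still need to be added by a signing step.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_update_stemming_query(domain_name, stems):
    """Return the percent-encoded, unsigned query string for UpdateStemmingOptions."""
    params = {
        "Action": "UpdateStemmingOptions",
        "DomainName": domain_name,
        # The dictionary is a JSON object with a single "stems" property.
        "Stems": json.dumps({"stems": stems}),
        "Version": "2011-02-01",
    }
    return urlencode(params)


query = build_update_stemming_query(
    "movies", {"mice": "mouse", "people": "person", "running": "run"}
)

# Decoding the query string recovers the original JSON document.
decoded = parse_qs(query)
print(json.loads(decoded["Stems"][0]))
```

Serializing with `json.dumps` rather than concatenating strings keeps quotes and special characters in terms correctly escaped.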