A stemming dictionary maps related words to a common stem. A stem is typically the root or base word from which variants are derived. For example, run is the stem of running and ran. During indexing, Amazon CloudSearch applies the stemming dictionary when it processes the contents of text fields. At search time, it applies the same dictionary to the terms in the search request. This enables matching on variants of a word. For example, if you map the term running to the stem run and then search for running, the request matches documents that contain run as well as running.
Stems are specified as a collection of term and stem pairs. When you configure stemming options, the existing stemming dictionary is replaced with the mappings you specify. By default, Amazon CloudSearch does not define any stems. However, some basic algorithmic stemming is always performed, such as removing plural suffixes. (This is done whether or not you specify a custom stemming dictionary.)
The maximum size of a stemming dictionary is 500 KB.
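The effect of a term-to-stem mapping can be illustrated with a small Python sketch. This is a toy model only, not how Amazon CloudSearch is implemented: the `normalize` and `matches` helpers are hypothetical, tokenization is reduced to lowercasing and splitting on whitespace, and the built-in algorithmic stemming is left out. It shows the same dictionary being applied to both the document text and the query.

```python
# Toy model of dictionary-based stemming; NOT CloudSearch's actual implementation.
# The helper names and the whitespace tokenizer are assumptions for illustration.
STEMS = {"mice": "mouse", "people": "person", "running": "run"}

def normalize(text, stems=STEMS):
    """Lowercase, split on whitespace, and replace each mapped term with its stem."""
    return [stems.get(token, token) for token in text.lower().split()]

def matches(query, document, stems=STEMS):
    """Return True if every normalized query token occurs in the normalized document."""
    doc_tokens = set(normalize(document, stems))
    return all(token in doc_tokens for token in normalize(query, stems))

print(matches("running", "I run every morning"))  # query term is mapped to "run"
print(matches("mouse", "three blind mice"))       # document term is mapped to "mouse"
```

Because only explicitly mapped terms are rewritten in this sketch, a variant such as ran would match run only if it were added to the dictionary.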
You can configure stems using the cs-configure-text-options command, from the Amazon CloudSearch console, or using the UpdateStemmingOptions configuration action.
You can use the cs-configure-text-options command to upload a text file that contains a list of term and stem pairs.
To configure stemming options
Create a text file for your stemming dictionary and specify one comma-separated term, stem pair per line. For example:
mice, mouse
people, person
running, run
Run the cs-configure-text-options command with the --stems option to upload the stemming dictionary to your domain:
cs-configure-text-options -d mydomain -stems stems.txt
Updating Stemming options
Read the stems file
Sent 3 token stem pairs.
If you are done making configuration changes, run the cs-index-documents command to rebuild the domain's index.
cs-index-documents -d mydomain
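Before uploading, it may be useful to check the stems file locally. The following Python sketch parses the comma-separated format described in the procedure above and compares the file size against the 500 KB limit. It is a minimal sketch under stated assumptions: the `parse_stems` and `check_stems_file` helpers are hypothetical and not part of the command line tools, 500 KB is taken to mean 500 * 1024 bytes, and the limit is assumed to apply to the file as written.

```python
import os

# Assumption: "500 KB" means 500 * 1024 bytes, measured on the text file itself.
MAX_DICTIONARY_BYTES = 500 * 1024

def parse_stems(lines):
    """Parse 'term, stem' lines into a dict, skipping blank lines."""
    stems = {}
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        parts = [part.strip() for part in line.split(",")]
        # Each line must hold exactly one non-empty term and one non-empty stem.
        if len(parts) != 2 or not all(parts):
            raise ValueError(f"line {number}: expected 'term, stem', got {line!r}")
        term, stem = parts
        stems[term] = stem
    return stems

def check_stems_file(path):
    """Check a stems file's size and format, then return the parsed mappings."""
    size = os.path.getsize(path)
    if size > MAX_DICTIONARY_BYTES:
        raise ValueError(f"{path} is {size} bytes, over the 500 KB limit")
    with open(path, encoding="utf-8") as handle:
        return parse_stems(handle)
```

Calling `check_stems_file("stems.txt")` on the example file would return a dictionary with three mappings, which should agree with the pair count that the command reports.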
You can configure a domain's stemming options from the Text Options panel in the Amazon CloudSearch console.
To configure stemming options
Go to the Amazon CloudSearch console at https://console.aws.amazon.com/cloudsearch/home.
In the Navigation panel, click the name of the domain, and then click the domain's Text Options link.
In the Text Options panel, click the Stemming tab.
For each term, stem pair you want to add to the stemming dictionary, enter the term and its stem and click the Add button. You can also edit the list directly or copy and paste the list into a text editor to make changes.

Click Submit to save your changes.
If you are done making configuration changes, click Run Indexing on the domain dashboard to rebuild the domain's index.
Use the UpdateStemmingOptions configuration action to upload a JSON-formatted stemming dictionary to your domain. A stemming dictionary is a single JSON object with one property, stems. The value of the stems property is an object that contains a collection of string: value pairs that map terms to their stems:
{"stems": {"term1": "stem1", "term2": "stem2", "term3": "stem3"}}
For example:
https://cloudsearch.us-east-1.amazonaws.com
?Action=UpdateStemmingOptions
&DomainName=movies
&Stems={"stems": {"mice": "mouse", "people": "person", "running": "run"} }
&Version=2011-02-01
&X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20120402/us-east-1/cloudsearch/aws4_request
&X-Amz-Date=2012-04-02T21:43:50.884Z
&X-Amz-SignedHeaders=host
&X-Amz-Signature=4f7a17dc53fbd7e08b3d3a0c4d771466fe48d2739c8d6333ebe0261d88941488
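The Stems value in the request above is a JSON document carried in a query parameter, so in practice it needs to be serialized and URL-encoded. The following Python sketch shows one way to assemble the unsigned part of the query string using only the standard library. The `build_stemming_query` helper is hypothetical, and the sketch deliberately omits the endpoint and the X-Amz-* signing parameters, which would normally be produced by a Signature Version 4 implementation or an AWS SDK.

```python
import json
from urllib.parse import urlencode

def build_stemming_query(domain_name, stems):
    """Return the unsigned, URL-encoded query string for UpdateStemmingOptions.

    Hypothetical helper: request signing (the X-Amz-* parameters) is not handled here.
    """
    params = {
        "Action": "UpdateStemmingOptions",
        "DomainName": domain_name,
        # The stemming dictionary is a JSON object with a single "stems" property.
        "Stems": json.dumps({"stems": stems}),
        "Version": "2011-02-01",
    }
    return urlencode(params)

query = build_stemming_query(
    "movies", {"mice": "mouse", "people": "person", "running": "run"}
)
print(query)
```

Using `json.dumps` rather than hand-written strings keeps the dictionary valid JSON even when a term contains quotes or non-ASCII characters.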