Getting and Using Facet Information in Amazon CloudSearch - Amazon CloudSearch

Getting and Using Facet Information in Amazon CloudSearch

A facet is an index field that represents a category that you want to use to refine and filter search results. When you submit search requests to Amazon CloudSearch, you can request facet information to find out how many documents share the same value in a particular field. You can display this information along with the search results, and use it to enable users to interactively refine their searches. (This is often referred to as faceted navigation or faceted search.)

You can get facet information for any facet-enabled field by specifying the facet.FIELD parameter in your search request. By default, Amazon CloudSearch returns facet counts for the top 10 values. For more information about enabling a field to return facets, see configure indexing options. For a description of the facet.FIELD parameter, see Search Request Parameters in the Search API reference.

You can specify facet options to control the sorting of the facet values for each field, limit the number of facet values returned, or choose what facet values to count and return.

Getting Facet Information in Amazon CloudSearch

To get facet information for a field, you use the facet.FIELD parameter. FIELD is the name of a facet-enabled field. You specify facet options as a JSON object. If the JSON object is empty (facet.FIELD={}), facet counts are computed for all field values, the facets are sorted by facet count, and the top 10 facets are returned in the results. You can request facet information for multiple fields in the same request.

You can retrieve facet information in two ways:

  • sort—Returns facet information sorted either by facet counts or facet values.

  • buckets—Returns facet information for particular facet values or ranges.

Sorting Facet Information

You specify the sort option to control how the facet information is sorted. There are two sort options: count and bucket:

  • Use count to sort the facets by facet counts. For example, facet.year={sort:'count'} counts the number of matches that have the same year value and sorts the facet information by that number.

  • Use bucket to sort the facets by the facet values. For example, facet.year={sort:'bucket'} .

When you use the sort option, you can specify the size option to control the maximum number of facet values returned in the results. The size option is valid only when you use the sort option.

In the following example, facet information is calculated for the genres field, the genres are sorted by facet value, and the first 5 genres are returned in the results:

facet.genres={sort:'bucket', size:5}

Bucketing Facet Information

You can explicitly specify the facet values or ranges that you want to count by using the buckets option. Buckets are specified as an array of values or ranges, for example, facet.color={buckets:["red","green","blue"]}.

To specify a range of values, use a comma (,) to separate the upper and lower bounds and enclose the range using brackets or braces. A square bracket, [ or ], indicates that the bound is included in the range, a curly brace, { or }, excludes the bound. You can omit the upper or lower bound to specify an open-ended range. When omitting a bound, you must use a curly brace. For example, facet.year={buckets:["[1970,1979]","[1980,1989]", "[1990,1999]","[2000,2009]","[2010,}"]}. For a timestamp, you can use q=-poet&facet.release_date={buckets:["[\'1980-01-01T00:00:00Z\',\'1986-01-01T00:00:01Z\']"]}.

The sort and size options are not valid if you specify buckets.

Amazon CloudSearch supports two methods for calculating bucket counts, filter and interval. By default, the filter method is used, which simply submits an additional filter query for each bucket to get the bucket counts. While this works well in many cases, if you have a high update rate or are retrieving a large number of facets, performance can suffer because those queries can't take advantage of the internal caching mechanism.

If you're experiencing slow query performance for bucketed facets, try setting the buckets method to interval, which post-processes the result set rather than submitting multiple queries:

facet.year={buckets:["[1970,1979]","[1980,1989]","[1990,1999]"],method:"interval"}

We recommend doing your own performance testing to determine which method is best for your application. In general, the filter method is faster if you have a fairly low update rate and aren't retrieving a large number of buckets. However, if you have a high update rate or a lot of buckets, using the interval method to post-process the result set can result in significantly faster query performance.

Using Facet Information in Amazon CloudSearch

You can display facet information to enable users to more easily browse search results and identify the information they are interested in. For example, if a user is trying to find one of the Star Trek movies, but can't remember the full title, he might start by searching for star. If you want to display top facets for genre, you would include facet.FIELD in the query, along with the number of facet values that you want to retrieve for each facet:

search?q=star&facet.genres={sort:'count',size:5}&format=xml&return=_no_fields

The preceding example gives you the following information in the search response:

<results> <status rid="v7r9hs8oFQqMHnk=" time-ms="3"/> <hits found="85" start="0"> <hit id="tt1411664"/> <hit id="tt1911658"/> <hit id="tt0086190"/> <hit id="tt0120601"/> <hit id="tt2141761"/> <hit id="tt1674771"/> <hit id="tt0056687"/> <hit id="tt0397892"/> <hit id="tt0258153"/> <hit id="tt0796366"/> </hits> <facets> <facet name="genres"> <bucket value="Comedy" count="41"/><bucket value="Drama" count="35"/> <bucket value="Adventure" count="29"/> <bucket value="Sci-Fi" count="24"/> <bucket value="Action" count="20"/> </facet> </facets> </results>

Multi-Select Facets in Amazon CloudSearch

If you want to display the available facets and enable users to select multiple values to refine the results, you can submit one request to get the documents that match the facet constraints and additional requests to get the facet counts.

For example, in the sample movie data, the genres, rating, and year fields are facet enabled. If the user searches for the term poet, you can submit the following request to get the matching movies and the facet counts for the genres, rating, and year fields:

q=poet&facet.genres={}&facet.rating={}&facet.year={}&return=_no_fields

Because no facet.FIELD options are specified, Amazon CloudSearch counts all of the facet values and returns the top 10 values for each facet:

{ "status" : { "rid" : "it3T8tIoDgrUSvA=", "time-ms" : 5 }, "hits" : { "found" : 14, "start" : 0, "hit" : [ {"id" : "tt0097165"}, {"id" : "tt0059113"}, { "id" : "tt0108174"}, {"id" : "tt1067765"}, { "id" : "tt1311071"}, {"id" : "tt0810784"}, {"id" : "tt0819714"}, {"id" : "tt0203009"}, {"id" : "tt0114702"}, {"id" : "tt0107840"} ] }, "facets" : { "genres" : { "buckets" : [ {"value" : "Drama","count" : 12}, {"value" : "Romance","count" : 9}, {"value" : "Biography", "count" : 4}, {"value" : "Comedy","count" : 2}, {"value" : "Thriller","count" : 2}, {"value" : "War","count" : 2}, {"value" : "Crime","count" : 1}, {"value" : "History","count" : 1}, {"value" : "Musical","count" : 1} ] }, "rating" : { "buckets" : [ {"value" : "6.3","count" : 3}, {"value" : "6.2","count" : 2}, {"value" : "7.1","count" : 2}, {"value" : "7.9","count" : 2}, {"value" : "5.3","count" : 1}, {"value" : "6.1""count" : 1}, {"value" : "6.4","count" : 1}, {"value" : "6.9","count" : 1}, {"value" : "7.6","count" : 1} ] }, "year" : { "buckets" : [ {"value" : "2013","count" : 3}, {"value" : "1993","count" : 2}, {"value" : "1965","count" : 1}, {"value" : "1989","count" : 1}, {"value" : "1995","count" : 1}, {"value" : "2001","count" : 1}, {"value" : "2004","count" : 1}, {"value" : "2006","count" : 1}, {"value" : "2008","count" : 1}, {"value" : "2009","count" : 1} ] } } }

When the user refines the search by selecting facet values, you use those facet selections to filter the results. For example, if the user selects 2013, 2012, and 1993, the following request gets the matching movies released during those years:

q=poet&fq=(or year:2013 year:2012 year:1993)&facet.genres={}&facet.rating={} &facet.year={}&return=_no_fields

This gets the documents that match the user's selection and the facet counts with the filter applied:

{ "status" : { "rid" : "zMP38tIoDwrUSvA=", "time-ms" : 6 }, "hits" : { "found" : 6, "start" : 0, "hit" : [ {"id" : "tt0108174"}, {"id" : "tt1067765"}, {"id" : "tt1311071"}, {"id" : "tt0107840"}, {"id" : "tt1462411"}, {"id" : "tt0455323"} ] }, "facets" : { "genres" : { "buckets" : [ {"value" : "Drama","count" : 4}, {"value" : "Romance","count" : 3}, {"value" : "Comedy","count" : 2}, {"value" : "Thriller","count" : 2}, {"value" : "Biography","count" : 1}, {"value" : "Crime","count" : 1} ] }, "rating" : { "buckets" : [ {"value" : "6.3","count" : 2}, {"value" : "5.3","count" : 1}, {"value" : "6.2","count" : 1}, {"value" : "6.4","count" : 1}, {"value" : "7.1","count" : 1} ] }, "year" : { "buckets" : [ {"value" : "2013","count" : 3}, {"value" : "1993","count" : 2}, {"value" : "2012","count" : 1} ] } } }

This is what you want to show for the genres and ratings. However, to enable the user to change the year filter, you need to get the facet counts for the years that aren't selected. To do this, you submit a second request to retrieve the facet counts for the year field without the filter:

q=poet&facet.year={}&size=0

There's no need to retrieve the matching documents, so the size parameter is set to zero to minimize the request latency. The request returns just the facet information for the year field:

{ "status" : { "rid" : "x/7r0NIoRwqlHfo=", "time-ms" : 4 }, "hits" : { "found" : 14, "start" : 0, "hit" : [ ] }, "facets" : { "year" : { "buckets" : [ {"value" : "2013","count" : 3}, {"value" : "1993","count" : 2}, {"value" : "1965","count" : 1}, {"value" : "1989","count" : 1}, {"value" : "1995","count" : 1}, {"value" : "2001","count" : 1}, {"value" : "2004","count" : 1}, {"value" : "2006","count" : 1}, {"value" : "2008","count" : 1}, {"value" : "2009","count" : 1} ] } } }

To minimize the response time, you can send this request in parallel with the request to get the filtered results. However, keep in mind that these additional requests can impact your overall query performance, and it might be necessary to scale your domain up to handle the additional traffic. (For more information about scaling, see Configuring Scaling Options in Amazon CloudSearch.)

If the user further refines the search by selecting a genre or rating, you add that to the filter criteria to get the matching documents. For example, the following request gets the movies released in 2013, 2012, or 1993 that have a rating of 6.3:

q=poet&fq=(and rating:6.3 (or year:2013 year:2012 year:1993))&facet.genres={}&return=_no_fields

Getting the facet information for genres in this request returns the facet counts with the rating and year filters applied:

{ "status" : { "rid" : "l66b89IoEArUSvA=", "time-ms" : 6 }, "hits" : { "found" : 2, "start" : 0, "hit" : [ {"id" : "tt1462411"}, {"id" : "tt0455323"} ] }, "facets" : { "genres" : { "buckets" : [ {"value" : "Drama","count" : 2} ] } } }

To enable the user to select a different rating, you submit an additional request to get the rating facet counts with only the year filter applied:

q=poet&fq=(or year:2013 year:2012 year:1993)&facet.rating={}&size=0

This request gets the following response:

{ "status" : { "rid" : "jqWj89IoEQrUSvA=", "time-ms" : 5 }, "hits" : { "found" : 6, "start" : 0, "hit" : [ ] }, "facets" : { "rating" : { "buckets" : [ {"value" : "6.3","count" : 2}, {"value" : "5.3","count" : 1}, {"value" : "6.2","count" : 1}, {"value" : "6.4","count" : 1}, {"value" : "7.1","count" : 1} ] } } }

Similarly, you need another request to get the year facet counts with only the rating filter applied:

q=poet&fq=rating:6.3&facet.year={}&size=0

This request gets the following response:

{ "status" : { "rid" : "4L6F8NIoDQrUSvA=", "time-ms" : 4 }, "hits" : { "found" : 3, "start" : 0, "hit" : [ ] }, "facets" : { "year" : { "buckets" : [ {"value" : "1995","count" : 1}, {"value" : "2012","count" : 1}, {"value" : "2013","count" : 1} ] } } }