Amazon CloudSearch
Developer Guide (API Version 2011-02-01)

What Is Amazon CloudSearch?

Amazon CloudSearch is a fully managed service in the cloud that makes it easy to set up, manage, and scale a search solution for your website. Amazon CloudSearch enables you to search large collections of data such as web pages, document files, forum posts, or product information. With Amazon CloudSearch, you can quickly add search capabilities to your website without having to become a search expert or worry about hardware provisioning, setup, and maintenance. As your volume of data and traffic fluctuates, Amazon CloudSearch automatically scales to meet your needs.

You can use Amazon CloudSearch to index and search both structured data and plain text. Amazon CloudSearch supports full text search, searching within fields, prefix searches, Boolean searches, and faceting. You can get search results in JSON or XML, sort and filter results based on field values, and rank results alphabetically, numerically, or according to custom rank expressions.

To build a search solution with Amazon CloudSearch, you:

  • Create and configure a search domain. A search domain encapsulates your searchable data and the search instances that handle your search requests. You set up a separate domain for each data set you want to search.

  • Upload the data you want to search to your domain. Amazon CloudSearch automatically indexes your data and deploys the search index to one or more search instances.

  • Search your domain. You send a search request to your domain's search endpoint as an HTTP/HTTPS GET request.
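As a rough illustration of the last step, the Python sketch below builds a search request URL. It assumes the 2011-02-01 search path (`/2011-02-01/search`) and the `q` and `return-fields` query parameters. The endpoint `search-movies-abc123.us-east-1.cloudsearch.amazonaws.com` and the `title` and `year` fields are hypothetical placeholders for your own domain's search endpoint and fields.

```python
from urllib.parse import urlencode

API_VERSION = "2011-02-01"

# Hypothetical search endpoint; use the one reported for your own domain.
SEARCH_ENDPOINT = "search-movies-abc123.us-east-1.cloudsearch.amazonaws.com"


def build_search_url(endpoint, query, return_fields=None, use_https=True):
    """Build a GET URL for a simple text search against a search endpoint."""
    params = {"q": query}
    if return_fields:
        # Assumed to be a comma-separated list of fields to return per hit.
        params["return-fields"] = ",".join(return_fields)
    scheme = "https" if use_https else "http"
    # urlencode escapes spaces as "+" and commas as "%2C".
    return f"{scheme}://{endpoint}/{API_VERSION}/search?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url(SEARCH_ENDPOINT, "star wars", ["title", "year"]))
```

The sketch only builds the URL, so it runs without a live domain. Against a real endpoint, you could send the request with `urllib.request.urlopen(url)`.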

The rest of this section introduces the key concepts and terms you need to understand to build a search solution with Amazon CloudSearch.

For a high-level overview of Amazon CloudSearch, service highlights, and pricing information, see the Amazon CloudSearch detail page. The rest of this guide describes how to use Amazon CloudSearch and provides detailed information about the APIs and command line tools. If you are new to Amazon CloudSearch, you should begin with Getting Started with Amazon CloudSearch. For more information about working with your own data sets, see Preparing Your Data for Amazon CloudSearch. For more information about constructing searches with the Amazon CloudSearch query language, see Searching Your Data with Amazon CloudSearch.

The following table lets you jump directly to specific task or reference topics.

Automatic Scaling in Amazon CloudSearch

A search domain has one or more search instances, each with a finite amount of RAM and CPU resources for indexing data and processing requests. The number of search instances deployed for a domain depends on the amount of data in your collection and the volume and complexity of your search requests.

As a managed service, Amazon CloudSearch determines the size and number of search instances required to deliver low-latency, high-throughput search performance. When you upload your data and configure your index, Amazon CloudSearch builds an index and picks the appropriate initial search instance type. As you use your search domain, Amazon CloudSearch automatically scales to accommodate the amount of data uploaded to the domain and the volume and complexity of search requests.

Note

At this time, scaling is completely automatic. Amazon CloudSearch does not provide a mechanism for choosing a particular search instance type or configuring the desired number of instances.

When you create a search domain, a single instance is deployed for the domain. As the following illustration shows, your domain always has at least one instance, and additional instances are added as the volume of data or traffic increases.

Scaling for Data and Traffic

Scaling for Data

When the amount of data you add to your domain exceeds the capacity of the initial search instance type, Amazon CloudSearch automatically scales your search domain to a larger search instance type. Once a domain exceeds the capacity of the largest search instance type, Amazon CloudSearch partitions the search index across multiple search instances. (The number of search instances required to hold the index partitions is sometimes referred to as the domain's width.)

Conversely, if the volume of data in your domain shrinks, your domain is scaled down to fewer search instances or a smaller search instance type to minimize costs.
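The scaling-for-data behavior can be pictured with a toy model. The instance type names and capacities below are hypothetical: Amazon CloudSearch does not expose them, and you cannot choose them. The sketch only shows the order of decisions. The domain first moves to a larger instance type, and it partitions the index (increasing its width) only once the index outgrows the largest type.

```python
import math

# Hypothetical instance types and the index size (GB) each can hold,
# smallest first. Real types and capacities are managed by the service.
INSTANCE_TYPES = [("small", 2.0), ("medium", 8.0), ("large", 16.0)]


def plan_for_data(index_size_gb):
    """Return (instance_type, width) for an index of the given size."""
    for name, capacity in INSTANCE_TYPES:
        if index_size_gb <= capacity:
            # The whole index fits on one instance of this type.
            return name, 1
    # Larger than the biggest type: partition the index across instances.
    name, capacity = INSTANCE_TYPES[-1]
    return name, math.ceil(index_size_gb / capacity)


if __name__ == "__main__":
    for size in (1.5, 5.0, 40.0, 1.0):
        print(size, plan_for_data(size))
```

Calling `plan_for_data` with a smaller size after documents are deleted returns a smaller type or a smaller width. This mirrors the scale-down described above.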

Note

If your domain has scaled up to accommodate your index size and you delete a large number of documents, the domain scales down the next time the full index is rebuilt. Although the index is rebuilt automatically from time to time, you can explicitly rebuild it when you finish deleting documents so that the domain scales down as quickly as possible.

Scaling for Traffic

As your search request volume or complexity increases, it takes more processing power to handle the load. A high volume of document uploads also increases the load on a domain's search instances. When a search instance nears its maximum load, Amazon CloudSearch automatically deploys a duplicate search instance to provide additional processing power. (The number of duplicate search instances is sometimes referred to as the domain's depth.)

Conversely, when traffic drops, Amazon CloudSearch removes unneeded search instances to minimize costs. For example, a new domain might scale up to handle the initial influx of documents, and scale back down once you have finished uploading your data and are only submitting updates.
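Scaling for traffic can be modeled the same way. This hypothetical sketch makes three assumptions. Each instance handles a fixed number of requests per second. A duplicate is added whenever the existing copies would otherwise run above a chosen fraction of that capacity (`headroom`, an assumed threshold standing in for "nears its maximum load"). The total instance count is width × depth, which assumes every partition is duplicated equally. None of these numbers are published by Amazon CloudSearch.

```python
import math


def depth_for_traffic(requests_per_sec, capacity_per_instance, headroom=0.8):
    """Copies of each partition needed so none runs above headroom x capacity.

    capacity_per_instance and headroom are hypothetical model inputs, not
    values published by Amazon CloudSearch.
    """
    usable = capacity_per_instance * headroom
    # A domain always keeps at least one copy, even with no traffic.
    return max(1, math.ceil(requests_per_sec / usable))


def total_instances(width, depth):
    # Model assumption: every index partition is duplicated `depth` times.
    return width * depth


if __name__ == "__main__":
    print(depth_for_traffic(300, 100))  # initial influx of documents
    print(depth_for_traffic(50, 100))   # only occasional updates
```

In this model, a bulk upload generating 300 requests per second against instances rated at 100 needs 4 copies. The model drops back to 1 copy once the load falls to 50, which matches the example of a new domain scaling back down after its initial upload.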

If your domain experiences a sudden surge in traffic, Amazon CloudSearch will automatically deploy additional search instances. However, it takes a few minutes to set up the new instances, so you might see an increase in 5xx errors until they are ready to start processing requests. For more information about handling 5xx errors, see Handling Errors in Amazon CloudSearch.
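A common way for clients to ride out temporary 5xx errors while new instances come online is to retry with exponential backoff and jitter. The sketch below shows one such approach; it is not the procedure from Handling Errors in Amazon CloudSearch. `get_with_retries` and its default delays are hypothetical. The `fetch` and `sleep` parameters exist only so the function can be exercised without a live domain.

```python
import random
import time
import urllib.error
import urllib.request


def _default_fetch(url):
    # Plain HTTP GET; raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def get_with_retries(url, max_attempts=5, base_delay=0.5, max_delay=8.0,
                     fetch=_default_fetch, sleep=time.sleep):
    """GET url, retrying HTTP 5xx responses with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            # 4xx errors are client mistakes and are not retried; the last
            # 5xx error is re-raised once attempts are exhausted.
            if err.code < 500 or attempt == max_attempts - 1:
                raise
        # Wait a random time in [0, min(max_delay, base_delay * 2**attempt)].
        sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise ValueError("max_attempts must be at least 1")
```

Randomizing each wait ("full jitter") keeps many clients from retrying in lockstep against a domain that is still adding instances.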

Keep in mind that the type and complexity of your search requests can impact overall search performance and in some cases increase the number of search instances required to operate your domain. For more information, see Tuning Search Requests in Amazon CloudSearch. Submitting a high volume of small or single-document batches can also have a negative impact on your search domain's performance.
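One way to avoid sending many small batches is to accumulate document operations on the client and send them in fewer, larger batches. The sketch below assumes the 2011-02-01 Search Data Format (SDF). In that format, a batch is a JSON array of operations, and each add operation carries `type`, `id`, `version`, `lang`, and `fields`. The sketch also assumes a batch size limit of 5 MB. Confirm both assumptions against the document service reference and the service limits for your API version. `make_add` and `batch_operations` are hypothetical helpers.

```python
import json

MAX_BATCH_BYTES = 5 * 1024 * 1024  # assumed batch size limit


def make_add(doc_id, version, fields, lang="en"):
    """Build one 'add' operation in the assumed 2011-02-01 SDF layout."""
    return {"type": "add", "id": doc_id, "version": version,
            "lang": lang, "fields": fields}


def batch_operations(operations, max_bytes=MAX_BATCH_BYTES):
    """Pack operations, in order, into JSON array strings of at most max_bytes each."""
    batches, current, size = [], [], 2  # 2 bytes for the enclosing "[" and "]"
    for op in operations:
        encoded = json.dumps(op)  # ASCII-only by default, so len() == byte count
        if len(encoded) + 2 > max_bytes:
            raise ValueError(f"operation for id {op.get('id')!r} exceeds max_bytes")
        extra = len(encoded) + (1 if current else 0)  # +1 for the comma separator
        if size + extra > max_bytes:
            # Current batch is full: close it and start a new one.
            batches.append("[" + ",".join(current) + "]")
            current, size, extra = [], 2, len(encoded)
        current.append(encoded)
        size += extra
    if current:
        batches.append("[" + ",".join(current) + "]")
    return batches
```

Each returned string would then be posted as one request to the domain's document service endpoint, rather than as one request per document.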