k-Nearest Neighbor (k-NN) search in Amazon OpenSearch Service
Short for its associated k-nearest neighbors algorithm, k-NN for Amazon OpenSearch Service lets you search for points in a vector space and find the "nearest neighbors" for those points by Euclidean distance or cosine similarity. Use cases include recommendations (for example, an "other songs you might like" feature in a music application), image recognition, and fraud detection.
Note
This documentation describes version compatibility between OpenSearch Service and various versions
of the k-NN plugin, as well as limitations when using the plugin with managed OpenSearch Service. For
comprehensive documentation of the k-NN plugin, including simple and complex examples,
parameter references, and the complete API reference for the plugin, see the open source
OpenSearch documentation
Use the following tables to find the version of the k-NN plugin running on your Amazon OpenSearch Service
domain. Each k-NN plugin version corresponds to an OpenSearch
OpenSearch | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
OpenSearch version | k-NN plugin version | Notable features | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2.13 | 2.13.0.0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2.11 | 2.11.0.0 |
Added support for |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2.9 | 2.9.0.0 | Implemented k-NN byte vectors and efficient filtering with the Faiss |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2.7 | 2.7.0.0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2.5 | 2.5.0.0 | Extended SystemIndexPlugin for k-NN model system index, added Lucene-specific file extensions to core HybridFS | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2.3 | 2.3.0.0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1.3 | 1.3.0.0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1.2 | 1.2.0.0 | Added support for the Faiss |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1.1 | 1.1.0.0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1.0 |
1.0.0.0 |
Renamed REST APIs while supporting backwards compatibility, renamed
namespace from opendistro to opensearch |
Elasticsearch | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Elasticsearch version | k-NN plugin version | Notable features | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
7.1 |
1.3.0.0 |
Euclidean distance | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
7.4 |
1.4.0.0 |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
7.7 |
1.8.0.0 |
Cosine similarity | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
7.8 |
1.9.0.0 |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
7.9 |
1.11.0.0 |
Warmup API, custom scoring | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
7.10 |
1.13.0.0 |
Hamming distance, L1 Norm distance, Painless scripting |
Getting started with k-NN
To use k-NN, you must create an index with the index.knn
setting and add
one or more fields of the knn_vector
data type.
PUT my-index { "settings": { "index.knn": true }, "mappings": { "properties": { "
my_vector1
": { "type": "knn_vector", "dimension": 2 }, "my_vector2
": { "type": "knn_vector", "dimension": 4 } } } }
The knn_vector
data type supports a single list of up to 10,000 floats,
with the number of floats defined by the required dimension
parameter. After you
create the index, add some data to it.
POST _bulk { "index": { "_index": "my-index", "_id": "1" } } { "my_vector1": [1.5, 2.5], "price": 12.2 } { "index": { "_index": "my-index", "_id": "2" } } { "my_vector1": [2.5, 3.5], "price": 7.1 } { "index": { "_index": "my-index", "_id": "3" } } { "my_vector1": [3.5, 4.5], "price": 12.9 } { "index": { "_index": "my-index", "_id": "4" } } { "my_vector1": [5.5, 6.5], "price": 1.2 } { "index": { "_index": "my-index", "_id": "5" } } { "my_vector1": [4.5, 5.5], "price": 3.7 } { "index": { "_index": "my-index", "_id": "6" } } { "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 10.3 } { "index": { "_index": "my-index", "_id": "7" } } { "my_vector2": [2.5, 3.5, 5.6, 6.7], "price": 5.5 } { "index": { "_index": "my-index", "_id": "8" } } { "my_vector2": [4.5, 5.5, 6.7, 3.7], "price": 4.4 } { "index": { "_index": "my-index", "_id": "9" } } { "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 8.9 }
Then you can search the data using the knn
query type.
GET my-index/_search { "size": 2, "query": { "knn": { "
my_vector2
": { "vector": [2, 3, 5, 6], "k": 2 } } } }
In this case, k
is the number of neighbors you want the query to return,
but you must also include the size
option. Otherwise, you get
k
results for each shard (and each segment) rather than k
results for the entire query. k-NN supports a maximum k
value of
10,000.
If you mix the knn
query with other clauses, you might receive fewer than
k
results. In this example, the post_filter
clause reduces the
number of results from 2 to 1.
GET my-index/_search { "size": 2, "query": { "knn": { "
my_vector2
": { "vector": [2, 3, 5, 6], "k": 2 } } }, "post_filter": { "range": { "price
": { "gte": 6, "lte": 10 } } } }
If you need to handle a large volume of queries while maintaining optimal performance,
you can use the _msearch
GET _msearch { "index": "my-index"} { "query": { "knn": {"my_vector2":{"vector": [2, 3, 5, 6],"k":2 }} } } { "index": "my-index", "search_type": "dfs_query_then_fetch"} { "query": { "knn": {"my_vector1":{"vector": [2, 3],"k":2 }} } }
The following video demonstrates how to set up bulk vector searches for K-NN queries.
k-NN differences, tuning, and limitations
OpenSearch lets you modify all k-NN
settings_cluster/settings
API. On OpenSearch Service, you can
change all settings except knn.memory.circuit_breaker.enabled
and
knn.circuit_breaker.triggered
. k-NN statistics are included as Amazon CloudWatch metrics.
In particular, check the KNNGraphMemoryUsage
metric on each data node against
the knn.memory.circuit_breaker.limit
statistic and the available RAM for
the instance type. OpenSearch Service uses half of an instance's RAM for the Java heap (up to a heap
size of 32 GiB). By default, k-NN uses up to 50% of the remaining half, so an instance
type with 32 GiB of RAM can accommodate 8 GiB of graphs (32 * 0.5 * 0.5). Performance
can suffer if graph memory usage exceeds this value.
You can't migrate a k-NN index to UltraWarm or cold storage if the index uses approximate k-NN"index.knn": true
). If
index.knn
is set to false
(exact
k-NN