If we could search for only one word at a time, full-text search would be
pretty inflexible. Fortunately, the match
query makes multiword queries
just as simple:
GET /my_index/my_type/_search
{
"query": {
"match": {
"title": "BROWN DOG!"
}
}
}
The preceding query returns all four documents in the results list:
{
"hits": [
{
"_id": "4",
"_score": 0.73185337, (1)
"_source": {
"title": "Brown fox brown dog"
}
},
{
"_id": "2",
"_score": 0.47486103, (2)
"_source": {
"title": "The quick brown fox jumps over the lazy dog"
}
},
{
"_id": "3",
"_score": 0.47486103, (2)
"_source": {
"title": "The quick brown fox jumps over the quick dog"
}
},
{
"_id": "1",
"_score": 0.11914785, (3)
"_source": {
"title": "The quick brown fox"
}
}
]
}
-
Document 4 is the most relevant because it contains
"brown"
twice and"dog"
once. -
Documents 2 and 3 both contain
brown
anddog
once each, and thetitle
field is the same length in both docs, so they have the same score. -
Document 1 matches even though it contains only
brown
, notdog
.
Because the match
query has to look for two terms--["brown","dog"]
--internally it has to execute two term
queries and combine their individual
results into the overall result. To do this, it wraps the two term
queries
in a bool
query, which we examine in detail in [bool-query].
The important thing to take away from this is that any document whose
title
field contains at least one of the specified terms will match the
query. The more terms that match, the more relevant the document.
Matching any document that contains any of the query terms may result in a
long tail of seemingly irrelevant results. It’s a shotgun approach to search.
Perhaps we want to show only documents that contain all of the query terms.
In other words, instead of brown OR dog
, we want to return only documents
that match brown AND dog
.
The match
query accepts an operator
parameter that defaults to or
.
You can change it to and
to require that all specified terms must match:
GET /my_index/my_type/_search
{
"query": {
"match": {
"title": { (1)
"query": "BROWN DOG!",
"operator": "and"
}
}
}
}
-
The structure of the
match
query has to change slightly in order to accommodate theoperator
parameter.
This query would exclude document 1, which contains only one of the two terms.
The choice between all and any is a bit too black-or-white. What if the
user specified five query terms, and a document contains only four of them?
Setting operator
to and
would exclude this document.
Sometimes that is exactly what you want, but for most full-text search use cases, you want to include documents that may be relevant but exclude those that are unlikely to be relevant. In other words, we need something in-between.
The match
query supports the minimum_should_match
parameter, which allows
you to specify the number of terms that must match for a document to be considered
relevant. While you can specify an absolute number of terms, it usually makes
sense to specify a percentage instead, as you have no control over the number of words the user may enter:
GET /my_index/my_type/_search
{
"query": {
"match": {
"title": {
"query": "quick brown dog",
"minimum_should_match": "75%"
}
}
}
}
When specified as a percentage, minimum_should_match
does the right thing:
in the preceding example with three terms, 75%
would be rounded down to 66.6%
,
or two out of the three terms. No matter what you set it to, at least one term
must match for a document to be considered a match.
Note
|
The |
To fully understand how the match
query handles multiword queries, we need
to look at how to combine multiple queries with the bool
query.