Help:CirrusSearch/Logical operators
Note: When you edit this page, you agree to release your contribution under the CC0. See Public Domain Help Pages for more info.
AND and OR should be used with great care, if at all.
Negation and parentheses
CirrusSearch does support several ways of indicating negation.
The following queries are all equivalent: -dog (minus sign), !dog (exclamation point), and NOT dog (NOT operator).
CirrusSearch does not support parentheses, and they are removed from the query.
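The equivalence of the three negation forms, and the dropping of parentheses, can be illustrated with a small sketch. This is not CirrusSearch code: the function name normalize_negation is hypothetical, and the sketch assumes simple whitespace-separated terms and that each parenthesis is treated like a space.

```python
def normalize_negation(query: str) -> list[tuple[str, bool]]:
    """Return (term, negated) pairs for a simple whitespace-separated query.

    Illustrative only: assumes parentheses are replaced by spaces and that
    -term, !term, and NOT term all mean the same thing.
    """
    tokens = query.replace("(", " ").replace(")", " ").split()
    result = []
    negate_next = False
    for token in tokens:
        if token == "NOT":
            # NOT negates the term that follows it.
            negate_next = True
            continue
        negated = negate_next
        negate_next = False
        if len(token) > 1 and token[0] in "-!":
            # A leading minus sign or exclamation point negates the term.
            negated = True
            token = token[1:]
        result.append((token, negated))
    return result
```

Under these assumptions, normalize_negation("-dog"), normalize_negation("!dog"), and normalize_negation("NOT dog") all return the same single negated term.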
Lucene, MUST, and SHOULD
CirrusSearch is built on top of Elasticsearch, which in turn is built on Lucene.
Our Lucene implementation does not support the classic boolean AND or OR operators, though it does offer those keywords as binary operators.
Instead, Lucene converts AND and OR to a different formalism, the unary MUST and SHOULD operators, giving results that sometimes mimic the expected boolean results but can also diverge sharply from them.
(Note that CirrusSearch does not currently support MUST or SHOULD operators in user queries.
They are used here only to demonstrate the internal workings of Lucene.)
In Lucene, MUST indicates that a search term is required and must be present in any results.
So, a query like MUST dog would only return results that contain some form of dog in them (note that this would also be equivalent to just searching for dog).
On the other hand, SHOULD terms are optional but should be present if possible; while they are not strictly required, they do affect ranking.
So MUST dog SHOULD cat would require dog in every result, but would generally rank those that also contain cat as better matches.
The one exception to SHOULD terms being optional is that if there are zero MUST terms, then at least one SHOULD term must be present in each result.
Thus, SHOULD dog SHOULD cat SHOULD fish would actually give results that have at least one of dog, cat, or fish present, though any results with all three would generally rank higher.
Classic boolean search often has an implicit AND, meaning that any query terms without an explicit boolean operator between them are assumed to have an AND between them.
In Lucene, any query term without an explicit MUST or SHOULD is assumed to have an implicit MUST applied to it.
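The MUST and SHOULD rules described above can be sketched as a toy matcher over sets of terms. This is a simplified model, not Lucene's implementation: the names matches and toy_score are hypothetical, documents are reduced to sets of words, and the score simply counts matching SHOULD terms, whereas real ranking is far more involved.

```python
def matches(doc_terms: set[str], must: set[str], should: set[str]) -> bool:
    """Decide whether a document satisfies a MUST/SHOULD query (toy model)."""
    # Every MUST term has to be present.
    if not must <= doc_terms:
        return False
    # With zero MUST terms, at least one SHOULD term has to be present.
    if not must and should:
        return bool(should & doc_terms)
    return True


def toy_score(doc_terms: set[str], should: set[str]) -> int:
    """Assumed stand-in for ranking: count the SHOULD terms a document contains."""
    return len(should & doc_terms)
```

For example, matches({"dog"}, {"dog"}, {"cat"}) is true because the required term is present, while a document containing both dog and cat gets a higher toy_score than one containing only dog.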
Converting AND and OR
Lucene converts AND and OR to MUST and SHOULD in a way that sometimes gives the expected results, but often leads to very unexpected results.
When Lucene encounters AND, it applies MUST to the terms before and after the AND.
When it encounters OR, it applies SHOULD to the terms before and after the OR.
The query is processed left to right, and later AND or OR operators override earlier ones (see examples below).
This effectively gives an unusual "backward order precedence" to the operators, and the results can be quite unexpected compared to classic boolean searching.
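The left-to-right conversion described above can be sketched in a few lines. This is an illustration of the rule as stated on this page, not Lucene's actual parser: the function name to_must_should is hypothetical, and the sketch only handles plain whitespace-separated terms with AND and OR between them.

```python
def to_must_should(query: str) -> list[tuple[str, str]]:
    """Convert a simple AND/OR query to (flag, term) pairs, left to right."""
    terms = []      # each entry is [term, flag or None]
    pending = None  # flag imposed on the next term by the operator just seen
    for token in query.split():
        if token in ("AND", "OR"):
            flag = "MUST" if token == "AND" else "SHOULD"
            if terms:
                # A later operator overrides whatever flag the preceding term had.
                terms[-1][1] = flag
            pending = flag
        else:
            terms.append([token, pending])
            pending = None
    # Any term left without an explicit flag gets the implicit MUST.
    return [(flag or "MUST", term) for term, flag in terms]
```

With this sketch, to_must_should("blue OR red AND green") yields SHOULD blue, MUST red, MUST green: the AND overrides the SHOULD that the earlier OR had placed on red.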
Examples that go wrong
Below are some worked examples where the conversion from AND/OR to MUST/SHOULD gives results that diverge from the expectations of classic boolean operators.
blue OR red AND green
- convert OR to SHOULD before and after, giving:
  - SHOULD blue SHOULD red AND green
- convert AND to MUST before and after (in this case overriding the previously applied SHOULD), giving:
  - SHOULD blue MUST red MUST green
- The result set is thus the same as red green, with blue being optional (and only affecting ranking).
blue OR red green
- convert OR to SHOULD before and after, giving:
  - SHOULD blue SHOULD red green
- apply an implicit MUST to any term without an explicit MUST or SHOULD, giving:
  - SHOULD blue SHOULD red MUST green
- In a classic boolean system with implicit AND, we would expect blue OR red AND green and blue OR red green to be the same, but compare this to the example above to see the difference: only green is required here, while red and green are both required above.
blue AND red OR green
- convert AND to MUST before and after, giving:
  - MUST blue MUST red OR green
- convert OR to SHOULD before and after, giving:
  - MUST blue SHOULD red SHOULD green
- The result set is thus the same as simply searching for blue, with red and green only affecting ranking. This also means that if there are zero documents with either red or green in them, you will get the same results searching for blue AND red OR green as you would for just searching for blue, which is not what you would expect from a classic boolean system.
In general, mixing OR with AND (including implicit AND) in one query gives results that are unintuitive in a classic boolean framework.
It can also be very difficult to detect these cases where the boolean logic goes awry, unless you already know exactly how many documents contain each possible positive and negative combination of your query terms.
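On a toy scale, such divergences can be found by enumerating every combination of present and absent terms and comparing the two interpretations. The sketch below is illustrative only: all function names are hypothetical, the classic evaluator assumes the common convention that AND (explicit or implicit) binds more tightly than OR, and the Lucene-style evaluator follows the conversion rule as described on this page rather than Lucene's real code.

```python
from itertools import product

OPS = ("AND", "OR")


def lucene_flags(query: str) -> list[tuple[str, str]]:
    """Left-to-right AND/OR to MUST/SHOULD conversion, as described above."""
    terms, pending = [], None
    for tok in query.split():
        if tok in OPS:
            flag = "MUST" if tok == "AND" else "SHOULD"
            if terms:
                terms[-1][1] = flag  # later operator overrides earlier flag
            pending = flag
        else:
            terms.append([tok, pending])
            pending = None
    return [(t, f or "MUST") for t, f in terms]


def lucene_match(query: str, present: set[str]) -> bool:
    flags = lucene_flags(query)
    must = {t for t, f in flags if f == "MUST"}
    should = {t for t, f in flags if f == "SHOULD"}
    if not must <= present:
        return False
    # With no MUST terms, at least one SHOULD term is required.
    return bool(must) or not should or bool(should & present)


def classic_match(query: str, present: set[str]) -> bool:
    # Assumed classic semantics: split on OR, AND the terms within each group.
    groups = [[]]
    for tok in query.split():
        if tok == "OR":
            groups.append([])
        elif tok != "AND":
            groups[-1].append(tok)
    return any(bool(g) and all(t in present for t in g) for g in groups)


def disagreements(query: str) -> list[set[str]]:
    """List the term combinations on which the two interpretations differ."""
    terms = [t for t in query.split() if t not in OPS]
    out = []
    for bits in product([False, True], repeat=len(terms)):
        present = {t for t, b in zip(terms, bits) if b}
        if lucene_match(query, present) != classic_match(query, present):
            out.append(present)
    return out
```

Calling disagreements("blue red green") returns an empty list, while disagreements("blue AND red OR green") includes the case where only blue is present: the Lucene-style reading matches it, but the assumed classic reading does not.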
Common use cases
If you have no explicit operators, then the boolean default is AND and the Lucene default is MUST, which are equivalent if they are the only operators present in the query:
- blue red green (user intent): all three terms must be present in any results
- blue AND red AND green (explicit classic boolean query): all three terms must be present in any results
- MUST blue MUST red MUST green (Lucene interpretation): all three terms must be present in any results
However, since MUST is implicit, nothing is gained by making it explicit by using AND, other than the potential for later boolean confusion.
If the only operator in the query is OR (crucially meaning that there is no implicit AND), then it is the same as everything having a SHOULD (recall that if a query has SHOULD terms but no MUST terms, then at least one of the SHOULD terms will be present in any result):
- blue OR red OR green (classic boolean query): at least one of the three terms must be present in any results
- SHOULD blue SHOULD red SHOULD green (Lucene interpretation): at least one of the three terms must be present in any results
Be very careful with implicit AND/MUST!
In the earlier example blue OR red green, the implicit MUST applied to green means that neither blue nor red is strictly required to be in the results.
Booleans, keywords, and prefixes
AND and OR do not interact predictably with special keywords (like insource: or hastemplate:) or with namespaces (like Talk: or User:) and probably should not be used in conjunction with either.
Future plans
Of course, the Search Platform team is not very happy with this state of affairs.
In the short term we are creating this document and updating the Help:CirrusSearch documentation to reflect the reality of our current system.
Longer term, we plan to implement a new layer in CirrusSearch that will properly construct a Lucene MUST/SHOULD query that is equivalent to a given classic boolean query, including proper support for parentheses, and that returns the expected results.
(It is possible to specify in Lucene that at least one of a set of query terms or clauses is required to match, which is equivalent to a boolean OR; requiring that all of a set of query terms or clauses match is the same as a boolean AND.)
Beyond that, we may also make explicit the MUST and SHOULD operators, possibly using the unary syntax shown in this document, but also possibly using some other syntax, as yet to be determined.
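As a rough illustration of the parenthetical point above, a classic query such as (blue OR red) AND green could be expressed as a nested Elasticsearch bool query, using a group of should clauses with minimum_should_match set to 1 for the OR and a must list for the AND. This is only a sketch and not how CirrusSearch builds its queries: the helper names and the field name text are assumptions made for the example.

```python
import json


def match(term: str, field: str = "text") -> dict:
    """A single-term clause; the field name 'text' is an assumption."""
    return {"match": {field: term}}


def boolean_or(*clauses: dict) -> dict:
    """At least one clause is required to match (classic OR)."""
    return {"bool": {"should": list(clauses), "minimum_should_match": 1}}


def boolean_and(*clauses: dict) -> dict:
    """Every clause is required to match (classic AND)."""
    return {"bool": {"must": list(clauses)}}


# (blue OR red) AND green, with the grouping made explicit by nesting.
query = boolean_and(boolean_or(match("blue"), match("red")), match("green"))
print(json.dumps(query, indent=2))
```

Because each group is its own nested clause, the grouping that parentheses would express in a classic boolean query is preserved rather than being flattened into a single list of MUST and SHOULD terms.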
Further reading
- BooleanQuerySyntax: a summary of a mailing list discussion about the problem, going back to 2005, with a link to a bug report on the problem from 2003. (The 2003 bug was closed in 2009, and claims there is a different Lucene query parser that does the right thing with boolean queries, but we don't have access to it in CirrusSearch.)