Discussion:
xapian parser bug?
David Bremner
2018-09-29 22:09:01 UTC
Today we noticed that keywords can't be searched as prefixed terms, or
at least that's what it looks like. I tested "and", "or", and "not".

╰─% NOTMUCH_DEBUG_QUERY=y notmuch search 'subject:"and"'
Query string is:
subject:"and"
notmuch search: A Xapian exception occurred
A Xapian exception occurred parsing query: Syntax: <expression> AND <expression>
Query string was: subject:"and"

╰─% NOTMUCH_DEBUG_QUERY=y notmuch search 'subject:"or"'
Query string is:
subject:"or"
notmuch search: A Xapian exception occurred
A Xapian exception occurred parsing query: Syntax: <expression> OR <expression>
Query string was: subject:"or"

╰─% NOTMUCH_DEBUG_QUERY=y notmuch search 'subject:"not"'
Query string is:
subject:"not"
notmuch search: A Xapian exception occurred
A Xapian exception occurred parsing query: Syntax: <expression> NOT <expression>
Query string was: subject:"not"

Interestingly, putting space around the operator seems to be a
workaround. Something about turning on phrase parsing maybe?

╰─% NOTMUCH_DEBUG_QUERY=y notmuch count 'subject:" not "'
Query string is:
subject:" not "
Exclude query is:
Query((((Kspam OR Kdeleted) OR Kmuted) OR Kbad-address))
Final query is:
Query(((Tmail AND 0 * ***@1) AND_NOT (((Kspam OR Kdeleted) OR Kmuted) OR Kbad-address)))
9927
Olly Betts
2018-09-30 09:20:39 UTC
Note that I'm using 1.4.7, and from your output I believe you're not
(the * in the query description I believe doesn't happen in those
situations any more).
1.4.4 and later eliminate redundant 0 scaling factors, but this one
If it was on the right-hand side of AND_NOT it would be eliminated
(because the right-hand side doesn't contribute any weight anyway).

FWIW, I also couldn't reproduce this (I tried with quest and 1.4.7):

$ quest -psubject:S -fdefault,boolean_any_case 'subject:"and"'
Parsed Query: Query(***@1)

Cheers,
Olly
David Bremner
2018-09-30 12:05:25 UTC
Post by Olly Betts
$ quest -psubject:S -fdefault,boolean_any_case 'subject:"and"'
Ah, OK, it must have something to do with the way that notmuch is using
field processors. And I see now that the following code (from
lib/regexp-fields.cc) is probably related (at least it explains why
subject:" not" works)

    if (str.find (' ') != std::string::npos)
        query_str = '"' + str + '"';
    else
        query_str = str;

    return parser.parse_query (query_str, NOTMUCH_QUERY_PARSER_FLAGS, term_prefix);

The motivation for not always triggering phrase processing is that it
breaks/disables wildcards. In particular this change was to fix the
query 'subject:foo*'. The difficulty here is that the field processor
doesn't know if its string argument was originally quoted.
David Bremner
2018-09-30 17:49:49 UTC
Post by David Bremner
Post by Olly Betts
$ quest -psubject:S -fdefault,boolean_any_case 'subject:"and"'
Ah, OK, it must have something to do with the way that notmuch is using
field processors. And I see now that the following code (from
lib/regexp-fields.cc) is probably related (at least it explains why
subject:" not" works)
if (str.find (' ') != std::string::npos)
query_str = '"' + str + '"';
else
query_str = str;
return parser.parse_query (query_str, NOTMUCH_QUERY_PARSER_FLAGS, term_prefix);
For the record, I have proposed a fix for notmuch (str is known to be
non-empty there). This will phrase quote by default, unless the string
looks like a wildcard query (without spaces).

diff --git a/lib/regexp-fields.cc b/lib/regexp-fields.cc
index 084bc8c0..52f30d82 100644
--- a/lib/regexp-fields.cc
+++ b/lib/regexp-fields.cc
@@ -194,7 +194,7 @@ RegexpFieldProcessor::operator() (const std::string & str)
      * phrase parsing, when possible */
     std::string query_str;
 
-    if (str.find (' ') != std::string::npos)
+    if (*str.rbegin () != '*' || str.find (' ') != std::string::npos)
         query_str = '"' + str + '"';
     else
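
As a rough illustration of the condition in the patch above, the following standalone C++ sketch applies the same test to a plain std::string. The names wants_phrase_quotes and make_query_string are made up for this example and are not notmuch functions; the sketch also assumes, as stated above, that str is non-empty.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper mirroring the patched condition; not a notmuch function.
// Precondition (assumed, as at the real call site): str is non-empty.
static bool
wants_phrase_quotes (const std::string &str)
{
    // Quote unless the string ends in '*' and contains no space,
    // i.e. unless it looks like a single-word wildcard query.
    return *str.rbegin () != '*' || str.find (' ') != std::string::npos;
}

// Hypothetical helper: build the string that would be handed to the parser.
static std::string
make_query_string (const std::string &str)
{
    return wants_phrase_quotes (str) ? '"' + str + '"' : str;
}
```

With this condition, and is turned into "and" (quoted), foo* is left unquoted so the wildcard can still apply, and foo bar* is quoted because it contains a space.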
Olly Betts
2018-09-30 20:43:27 UTC
Post by David Bremner
if (str.find (' ') != std::string::npos)
query_str = '"' + str + '"';
else
query_str = str;
return parser.parse_query (query_str, NOTMUCH_QUERY_PARSER_FLAGS, term_prefix);
I wouldn't recommend trying to generate strings to feed to QueryParser
like this code seems to be doing. QueryParser aims to parse input from
humans not machines.

As well as the case where str is an operation name, the code above looks
like it will mishandle cases where str contains a tab or double quotes.
There are likely other problem cases too.
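
To make the tab and double-quote cases concrete, here is a small standalone C++ sketch of the string-building step quoted above, detached from Xapian. The function name naive_query_string is hypothetical; only the if/else logic is taken from the quoted code. It shows which strings get generated, not how QueryParser then treats them.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the quoted logic: wrap the input in double
// quotes only when it contains a space character.
static std::string
naive_query_string (const std::string &str)
{
    if (str.find (' ') != std::string::npos)
        return '"' + str + '"';
    return str;
}
```

For an input such as foo<TAB>bar there is no space character, so the string is passed through without quotes. For say "hi" the result is "say "hi"", with the embedded quotes left unescaped inside the added ones.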

Cheers,
Olly
David Bremner
2018-10-01 01:25:33 UTC
Post by Olly Betts
Post by David Bremner
if (str.find (' ') != std::string::npos)
query_str = '"' + str + '"';
else
query_str = str;
return parser.parse_query (query_str, NOTMUCH_QUERY_PARSER_FLAGS, term_prefix);
I wouldn't recommend trying to generate strings to feed to QueryParser
like this code seems to be doing. QueryParser aims to parse input from
humans not machines.
str is the parameter to the FieldProcessor () operator. The field
processor needs a way to approximate the standard probabilistic prefix
parsing in the fallback case. The addition of quotes is to force the
generation of a phrase query, otherwise e.g. subject:"christmas party"
doesn't work out well.

I tried using OP_PHRASE as the default operator, but it doesn't
handle some cases I need.

% quest -o phrase 'bob jones <***@example.com>'
UnimplementedError: OP_NEAR and OP_PHRASE only currently support leaf subqueries

If I don't recursively call parse_query, then I guess I need to generate
terms in a compatible way before turning them into a phrase query. Maybe
that's not as hard as I originally thought, since being in a phrase turns
off the stemmer anyway iiuc. Is there a Xapian API I can use to extract
"bob", "jones", "bob", "example", "com" from the example above? I guess
I could use a throwaway Xapian::Document and a TermGenerator
(basically aping xapian_core/tests/api_termgen.cc).

d
