[PATCH RFC] index: add body: search query term
William Casarin
2018-10-10 05:53:26 UTC
This adds the ability to search specifically on the message body,

e.g.

notmuch search tag:notmuch and body:PATCH

Signed-off-by: William Casarin <***@jb55.com>
---

Hey there,

I'm looking to add the ability to search specifically on the body. I
was poking around in the indexer, added these lines and reindexed a
few tags. It appears to work!

I was just wondering if there's anything I'm missing? That seemed a
bit too easy. I noticed there are some NOTMUCH_FIELD_* flags whose
purpose I'm not sure about.

If anyone with Xapian knowledge could shed some light on what the
next steps might be, if any, I'd appreciate it.

Thanks!
Will


lib/database.cc | 3 +++
lib/index.cc | 2 +-
2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/database.cc b/lib/database.cc
index 9cf8062c..0b085b21 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -297,6 +297,9 @@ prefix_t prefix_table[] = {
     { "subject", "XSUBJECT", NOTMUCH_FIELD_EXTERNAL |
                              NOTMUCH_FIELD_PROBABILISTIC |
                              NOTMUCH_FIELD_PROCESSOR},
+    { "body", "XBODY", NOTMUCH_FIELD_EXTERNAL |
+                       NOTMUCH_FIELD_PROBABILISTIC |
+                       NOTMUCH_FIELD_PROCESSOR},
 };

static void
diff --git a/lib/index.cc b/lib/index.cc
index 3f694387..299b8770 100644
--- a/lib/index.cc
+++ b/lib/index.cc
@@ -506,7 +506,7 @@ _index_mime_part (notmuch_message_t *message,
     body = (char *) g_byte_array_free (byte_array, false);

     if (body) {
-        _notmuch_message_gen_terms (message, NULL, body);
+        _notmuch_message_gen_terms (message, "body", body);

         free (body);
     }
--
2.19.0
David Bremner
2018-10-10 10:43:11 UTC
Post by William Casarin
lib/database.cc | 3 +++
lib/index.cc | 2 +-
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/lib/database.cc b/lib/database.cc
index 9cf8062c..0b085b21 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -297,6 +297,9 @@ prefix_t prefix_table[] = {
{ "subject", "XSUBJECT", NOTMUCH_FIELD_EXTERNAL |
NOTMUCH_FIELD_PROBABILISTIC |
NOTMUCH_FIELD_PROCESSOR},
+ { "body", "XBODY", NOTMUCH_FIELD_EXTERNAL |
+ NOTMUCH_FIELD_PROBABILISTIC |
+ NOTMUCH_FIELD_PROCESSOR},
};
static void
diff --git a/lib/index.cc b/lib/index.cc
index 3f694387..299b8770 100644
--- a/lib/index.cc
+++ b/lib/index.cc
@@ -506,7 +506,7 @@ _index_mime_part (notmuch_message_t *message,
body = (char *) g_byte_array_free (byte_array, false);
if (body) {
- _notmuch_message_gen_terms (message, NULL, body);
+ _notmuch_message_gen_terms (message, "body", body);
free (body);
}
--
I think you'll find you broke non-prefixed queries. Does the test suite
still pass? If so, we need more tests. Anyway, if you add a second set
of terms, I'd be interested in how much this bloats the index. Ideally
with the performance corpus, so we can all reproduce the experiment.

d
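
To make the concern above concrete, here is a minimal toy sketch, not notmuch or Xapian code. It assumes an inverted index that maps each term to the set of documents containing it, with a prefixed term stored as the prefix string followed by the word. The names `Index`, `index_text`, `count_matches`, `index_prefixed_only` and `index_both` are hypothetical and exist only for this illustration. Under those assumptions, storing body terms only under `XBODY` leaves nothing for an unprefixed lookup to find, while storing them twice keeps both lookups working and doubles the number of body terms.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Toy inverted index (hypothetical): term -> ids of documents containing it.
using Index = std::map<std::string, std::set<int>>;

// Add one term per whitespace-separated word of `text`, lowercased and
// stored under `prefix`. An empty prefix models unprefixed body terms.
void index_text(Index &index, int doc_id, const std::string &prefix,
                const std::string &text)
{
    std::istringstream words(text);
    std::string word;
    while (words >> word) {
        for (char &c : word)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        index[prefix + word].insert(doc_id);
    }
}

// Number of documents that contain `term` (0 if the term is absent).
std::size_t count_matches(const Index &index, const std::string &term)
{
    auto it = index.find(term);
    return it == index.end() ? 0 : it->second.size();
}

// Variant A: body terms stored only under the prefix, as in the RFC patch.
Index index_prefixed_only(const std::string &body)
{
    Index index;
    index_text(index, 1, "XBODY", body);
    return index;
}

// Variant B: body terms stored twice, once unprefixed and once prefixed.
Index index_both(const std::string &body)
{
    Index index;
    index_text(index, 1, "", body);
    index_text(index, 1, "XBODY", body);
    return index;
}
```

In this model, `count_matches(index_prefixed_only(text), "patch")` finds nothing even when `text` contains the word, which mirrors the broken unprefixed query. Variant B is the "second set of terms" option, and its extra terms are the index growth that would need measuring on a real corpus.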
William Casarin
2018-10-10 16:34:26 UTC
Post by David Bremner
I think you'll find you broke non-prefixed queries. Does the test suite
still pass? If so, we need more tests.
Yeah, they seem to pass. But you're right, something seems a bit off:

./notmuch count subject:github or body:github and tag:notmuch
3271

./notmuch count github and tag:notmuch
665
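
One possible reason the two counts above are hard to compare is operator precedence. The sketch below is a toy model, not notmuch code: it treats each sub-query as a set of message ids and assumes that AND binds more tightly than OR, which is how Xapian's query parser is understood to behave (an assumption not verified in this thread). The names `Ids`, `set_or`, `set_and`, `and_binds_tighter` and `grouped` are hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <set>

// A result set is modelled as a set of message ids.
using Ids = std::set<int>;

// Messages matching either sub-query.
Ids set_or(const Ids &a, const Ids &b)
{
    Ids out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::inserter(out, out.end()));
    return out;
}

// Messages matching both sub-queries.
Ids set_and(const Ids &a, const Ids &b)
{
    Ids out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(out, out.end()));
    return out;
}

// "subject:X or body:X and tag:T" when AND binds more tightly than OR.
Ids and_binds_tighter(const Ids &subject, const Ids &body, const Ids &tag)
{
    return set_or(subject, set_and(body, tag));
}

// "(subject:X or body:X) and tag:T", with explicit parentheses.
Ids grouped(const Ids &subject, const Ids &body, const Ids &tag)
{
    return set_and(set_or(subject, body), tag);
}
```

If that assumption holds, the first count includes every message whose subject matches, whether or not it carries the tag. Grouping the first two terms in parentheses, as in `(subject:github or body:github) and tag:notmuch`, would give a fairer comparison with the unprefixed query.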
Post by David Bremner
of terms, I'd be interested in how much this bloats the index. Ideally with
the performance corpus so we can all reproduce the experiment.
Sounds good, I was wondering about that as well.

I wonder if it's all worth the effort though, since a workaround could
be:

notmuch search <query> and not subject:<query>

If it's too annoying to have a body prefix, due to index bloat or
performance issues, would doing something hacky such as translating
'body:<query>' to '<query> and not subject:<query>' make sense?

Will
--
https://jb55.com
William Casarin
2018-10-10 16:36:41 UTC
Post by William Casarin
I wonder if it's all worth the effort though, since a workaround could
be:
notmuch search <query> and not subject:<query>
If it's too annoying to have a body prefix, due to index bloat or
performance issues, would doing something hacky such as translating
'body:<query>' to '<query> and not subject:<query>' make sense?
Thinking about this some more, this is not exactly the same, since the
workaround would explicitly exclude messages whose subject matches,
whereas a body query wouldn't care what the subject was.
--
https://jb55.com
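
The difference described in the last message can be shown with a small toy model. It is only a sketch: it assumes that an unprefixed query matches a message when the term appears in either the subject or the body, and the `Message` struct and both function names are hypothetical.

```cpp
// Toy message (hypothetical): records where the search term occurs.
struct Message {
    bool in_subject;  // term appears in the subject
    bool in_body;     // term appears in the body
};

// Intended meaning of "body:<query>": the body contains the term,
// regardless of what the subject says.
bool matches_body_prefix(const Message &m)
{
    return m.in_body;
}

// The workaround "<query> and not subject:<query>", assuming the
// unprefixed query matches a term found in the subject or the body.
bool matches_workaround(const Message &m)
{
    return (m.in_subject || m.in_body) && !m.in_subject;
}
```

The two predicates disagree only for a message with the term in both places: `matches_body_prefix` accepts it and `matches_workaround` rejects it, which is exactly the case the workaround would miss.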