Thunderbird full-text search prototype a la SQLite FTS3

Full-text search using FTS3.

Full-text search with a contact constraint.

The global database SQLite file resulting from indexing all of is about 13 MB for roughly 4500 messages.  We’re providing FTS3 with the bodies (but not attachments!) of all the newsgroup messages and the subjects of the messages which initiate new threads.  For real usage, we will also need to index the subject of every message.

Note that the message bodies have not been processed at all by the Thunderbird/gloda code before handing them off to FTS3.  So quoted messages get indexed even though it’s a lot of excess data.  We’re relying on FTS3 to do all stop-words, etc.  FTS3’s Porter stemming/tokenization is in use.
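To make the setup concrete, here is a minimal sketch of this kind of indexing using Python’s sqlite3 module. It is not the gloda code: the table names (messages, messages_text), the column names, and the helper functions are made up for illustration, and it assumes the SQLite library behind Python was built with FTS3 enabled. It shows an FTS3 table using the Porter tokenizer, raw bodies inserted without any preprocessing, and a full-text query optionally narrowed by a contact constraint.

```python
import sqlite3

# Hypothetical schema for illustration only; these are not gloda's real tables.
SCHEMA = """
CREATE TABLE messages (id INTEGER PRIMARY KEY, sender TEXT);
CREATE VIRTUAL TABLE messages_text USING fts3(subject, body, tokenize=porter);
"""


def index_message(conn, msg_id, sender, subject, body):
    """Store message metadata and hand the raw subject/body to FTS3."""
    conn.execute(
        "INSERT INTO messages (id, sender) VALUES (?, ?)", (msg_id, sender)
    )
    # docid ties the full-text row to the metadata row.
    conn.execute(
        "INSERT INTO messages_text (docid, subject, body) VALUES (?, ?, ?)",
        (msg_id, subject, body),
    )


def search(conn, query, sender=None):
    """Return ids of messages matching the query, optionally for one sender."""
    sql = (
        "SELECT messages.id FROM messages_text "
        "JOIN messages ON messages.id = messages_text.docid "
        "WHERE messages_text MATCH ?"
    )
    params = [query]
    if sender is not None:
        # The contact constraint is an ordinary SQL filter on the join.
        sql += " AND messages.sender = ?"
        params.append(sender)
    sql += " ORDER BY messages.id"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Subjects are only supplied for thread-starting messages, as in the prototype.
index_message(conn, 1, "alice@example.com", "Indexing plans",
              "We are indexing message bodies")
index_message(conn, 2, "bob@example.com", None,
              "I indexed the attachments yesterday")
index_message(conn, 3, "alice@example.com", None, "Nothing relevant here")

print(search(conn, "index"))
print(search(conn, "index", sender="alice@example.com"))
```

With Porter stemming, a query for “index” should also match “indexing” and “indexed”, so the first search is expected to find messages 1 and 2, and the constrained search only message 1.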

8 thoughts on “Thunderbird full-text search prototype a la SQLite FTS3”

  1. 4500 is a bit low for daily usage. My mail folders have something like 2 million messages in total. I hope it’ll still be usable with that many mails.

  2. Right, that’s not a limitation on the number of messages it can index or easily handle. The example is to provide an idea of ‘bloat’ for disk usage for FTS3, hopefully being near and dear to many hearts.

    On the other hand, 2 million messages is a lot. I expect that would require extra leg-work to be able to handle that optimally.

    Are a lot of those messages from mailing lists? More specifically, can they be categorized into different domains, not all of which need to be accessed simultaneously? We could potentially segment the database along those lines.

  3. Yes, a lot of my messages won’t ever be searched simultaneously. They’re in different folders and different mail accounts. The maximum number of messages to search at the same time would probably be around 400’000 at the moment. That’s still quite a lot, and the current subject/sender search field already takes its time. But I’m certainly looking forward to the new search features! Making the handling of mails easier is wonderful. Thanks for the great work.

    A plot of index disk usage vs. number of mails and search time vs. number of mails would certainly be interesting. Probably interesting enough for an automated regression test like the current performance measurements for FF?

  4. Pingback: david ascher - » Gearing up

  5. Pingback: Bryan Clark » Blog Archive » Looking at User Experience for Thunderbird 3

  6. Is the code available for this prototype? Are you planning to release it as an extension for Thunderbird?

  7. Gloda-based search is currently targeted for beta 2 using a quick-search-like idiom. Search bugzilla for “gloda” to see current activities and patches.

  8. I have been playing with beta 2 for a little while now. I can get the radar vis using experimental toolbar to work, but I cannot figure out how to get a simple full text search that displays a list of results (like you posted here). Any help would be greatly appreciated.

