
Thunderbird full-text search prototype a la SQLite FTS3

Full-text search using FTS3.

Full-text search with a contact constraint.

The global database SQLite file resulting from indexing all of mozilla.dev.apps.thunderbird is about 13 MB for roughly 4500 messages.  We’re providing FTS3 with the bodies (but not attachments!) of all the newsgroup messages and the subjects of the messages which initiate new threads.  For real usage, we will also need to index the subject of every message.

Note that the message bodies have not been processed at all by the Thunderbird/gloda code before being handed off to FTS3, so quoted messages get indexed even though that is a lot of excess data.  We’re relying on FTS3 to handle stop-words, etc.  FTS3’s Porter stemming/tokenization is in use.
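
For readers who want to try the idea outside Thunderbird, here is a minimal sketch using Python's built-in sqlite3 module. It is not the gloda code: the table names (messages, messages_text), the sender column, and the sample rows are all made up for illustration, and it assumes the SQLite build behind Python was compiled with FTS3 enabled (most are). It shows an FTS3 table with subject and body columns using the Porter tokenizer, a plain full-text query, and the same query restricted to one contact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: the post does not show gloda's real tables.
# "messages" holds ordinary metadata; "messages_text" is the FTS3 index.
conn.executescript("""
    CREATE TABLE messages (id INTEGER PRIMARY KEY, sender TEXT);
    CREATE VIRTUAL TABLE messages_text
        USING fts3(subject, body, tokenize=porter);
""")

# Made-up sample data. Bodies go in unprocessed, so quoted text is indexed too.
sample = [
    (1, "alice@example.com", "Indexing prototype",
     "The prototype indexes message bodies."),
    (2, "bob@example.com", "",
     "> The prototype indexes message bodies.\nQuoted text gets indexed too."),
    (3, "alice@example.com", "",
     "Attachments are skipped."),
]
for msg_id, sender, subject, body in sample:
    conn.execute("INSERT INTO messages (id, sender) VALUES (?, ?)",
                 (msg_id, sender))
    # Reuse the message id as the FTS3 docid so the two tables can be joined.
    conn.execute(
        "INSERT INTO messages_text (docid, subject, body) VALUES (?, ?, ?)",
        (msg_id, subject, body))


def search(query, sender=None):
    """Return ids of messages matching an FTS3 query, optionally from one sender."""
    sql = (
        "SELECT messages.id FROM messages_text "
        "JOIN messages ON messages.id = messages_text.docid "
        "WHERE messages_text MATCH ?"
    )
    params = [query]
    if sender is not None:
        # The "contact constraint": an ordinary SQL filter next to the MATCH.
        sql += " AND messages.sender = ?"
        params.append(sender)
    sql += " ORDER BY messages.id"
    return [row[0] for row in conn.execute(sql, params)]


print(search("indexing"))                              # full-text search
print(search("indexing", sender="alice@example.com"))  # with a contact constraint
```

Because of the Porter stemmer, a query for "indexing" also matches messages that only contain "indexes" or "indexed", which is why the quoted reply from the second sender shows up in the unconstrained search.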

{ 6 } Comments

  1. Arthur | August 19, 2008 at 6:30 am | Permalink

    4500 is a bit low for daily usage. My mail folders have something like 2 million messages in total. I hope it’ll still be usable with that many mails.

  2. Andrew Sutherland | August 19, 2008 at 11:44 am | Permalink

    Right, that’s not a limit on the number of messages it can index or easily handle. The example is meant to give an idea of the disk-usage ‘bloat’ from FTS3, mozilla.dev.apps.thunderbird hopefully being near and dear to many hearts.

    On the other hand, 2 million messages is a lot. I expect that would require extra leg-work to be able to handle that optimally.

    Are a lot of those messages from mailing lists? More specifically, can they be categorized into different domains, not all of which need to be accessed simultaneously? We could potentially segment the database along those lines.

  3. Arthur | August 19, 2008 at 1:11 pm | Permalink

    Yes, a lot of my messages won’t ever be searched simultaneously. They’re in different folders and different mail accounts. The maximum number of messages to search at the same time would probably be around 400’000 at the moment. That’s still quite a lot, and the current subject/sender search field already takes its time. But I’m certainly looking forward to the new search features! Making the handling of mails easier is wonderful. Thanks for the great work.

    A plot of index disk usage vs. number of mails and search time vs. number of mails would certainly be interesting. Probably interesting enough for an automated regression test like the current performance measurements for FF?

  4. Silas | January 24, 2009 at 3:11 pm | Permalink

    Is the code available for this prototype? Are you planning to release it as an extension for Thunderbird?

  5. Andrew Sutherland | February 4, 2009 at 5:55 am | Permalink

    Gloda-based search is currently targeted for beta 2 using a quick-search-like idiom. Search Bugzilla for “gloda” to see current activities and patches.

  6. Matt Fowles | April 27, 2009 at 9:47 am | Permalink

    I have been playing with beta 2 for a little while now. I can get the radar vis to work using the experimental toolbar, but I cannot figure out how to get a simple full-text search that displays a list of results (like you posted here). Any help would be greatly appreciated.

    Thanks,
    Matt

{ 2 } Trackbacks

  1. david ascher - » Gearing up | August 19, 2008 at 8:53 pm | Permalink

    [...] end-users, from the mundane (a birthday field in the address book, simpler account setup) to some powerful, platform-style [...]

  2. [...] some needed changes to the Thunderbird platform it has become possible to provide efficient full text search over messages and their headers.  This will enable Thunderbird to offer a much improved search experience over [...]