The above screenshot is of a normal search query on DevMo for “customize toolbar”. I see 2.5 results, and I honestly have no interest in the first item at all. (It’s a page that only advanced DevMo authors would care about, aimed at those who refuse to squint or click on images to see bigger versions.)
The above screenshot is of the same query using DevMoXhibit. Note that more results are visible, and the first result from the normal search is completely elided because we filter by default so that only “Real” result pages are shown. (In general, I am not looking for talk pages or user pages or meta-pages.)
But enough about my interpretations of pictures, why don’t you:
Neat things we do that may not be immediately obvious:
- We flatten the score into deciles, and then within each decile range we sort based on the view count for the page. The theory is that, given equally likely results, the one that more people have looked at is probably more interesting to you, roughly speaking.
- We use a simple heuristic to figure out the page type, as mentioned above (“Real”, “Talk”, “User”, etc.)
- We try to hide everything related to the language; since we explicitly query on a language, it’s just noise. Right now, that language is always English, but the code uses a variable, so you could write the code to hook that up and expose it in the UI.
- We produce a “smart” snippet. The snippets provided by the search results naively will include “chrome” that is part of the document, which makes for a nearly useless snippet. For example, take a gander at XUL/toolbar:
  - Plain old snippet: « XUL Reference home [ Examples | Attributes | Properties | Methods | Related ] A container which typically contains a row of buttons. It is a type of box that defaults to horizontal orientation. …
  - Smart snippet: A container which typically contains a row of buttons. It is a type of box that defaults to horizontal orientation. …
- We produce a sometimes over-zealous smart snippet. If you were to keep reading both of those snippets, you would notice that the smart snippet eats a bit that the non-smart-snippet does not. That is because the smart snippet is based on looking at a version of the snippet which has HTML tags in it, and then it tries to nuke those HTML tags out of existence using simple regexps.
- This should probably work on other Deki wikis if adapted, but I don’t use any others, so YMMV.
- We actually issue two search queries because there are two result formats that can be produced. “xml” is an inexplicable mixture of too much data and too little data. Namely, it does not tell you the tags on a document, which is basically the most useful piece of info, but it does tell you every link to and from that page (which we expose, although I doubt it will be useful enough to justify it). It does give you a link to be able to get the tags, but that’s a costly operation when you have to perform it for each search result. In contrast, “search” gives you the tags; they are only space-delimited, but that’s fine. (“Inexplicable” may be a bit harsh; looking at the source, it’s just dumping the page info without further processing/lookups, but arguably it would be very useful if they made the effort to fetch that data.)
- Because of cross-site XHR issues, this is not quite as hackable as I would like. My demo server above is using mod_proxy (with a very specific constraint) to proxy the search to DevMo. When I develop locally, I have to do the same thing. Presumably if you are using Firefox 3.5 and DevMo is set up correctly, then this would not be a problem. But 1) for no good reason, I only use Firefox 3.0, and 2) I have no clue whether DevMo is emitting the headers that would enable that to work. I strongly encourage someone to look into #2 and fix it if not.
- As with BugXhibit, the sliders are totally broken for me and it’s sad, but I left them in there in the hopes that they work for someone, somewhere. Alternately, I would not complain if someone, somewhere, fixed them.
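To make the decile trick from the first bullet concrete, here is a minimal sketch in Python (the real code is not Python, and this is not lifted from it). It assumes the search score has already been normalized to the range 0 to 1, which may not match what the search API actually returns; the `Result` type and the `decile` and `rank` names are made up for illustration.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    title: str
    score: float  # assumption: relevance score normalized to [0, 1]
    views: int    # page view count


def decile(score: float) -> int:
    """Flatten a score in [0, 1] into a bucket 0..9 (1.0 lands in bucket 9)."""
    return max(0, min(int(score * 10), 9))


def rank(results: List[Result]) -> List[Result]:
    """Best decile first; within a decile, most-viewed page first."""
    return sorted(results, key=lambda r: (decile(r.score), r.views), reverse=True)


if __name__ == "__main__":
    hits = [
        Result("a", 0.91, 10),
        Result("b", 0.95, 5),
        Result("c", 0.48, 1000),
        Result("d", 0.42, 2000),
    ]
    for hit in rank(hits):
        print(decile(hit.score), hit.views, hit.title)
```

Note that "a" outranks "b" despite its slightly lower score, because both fall in the same decile and "a" has more views. That is the whole point: small score differences are treated as noise.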
The hg repo is here.
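For the curious, the smart-snippet idea from the list above can be sketched roughly like this in Python. This is not the actual implementation; in particular, the guess that page chrome arrives wrapped in `<div>` or `<table>` elements is purely an assumption for the sake of the example, as are all the names.

```python
import html
import re

# Hypothetical assumption: navigation chrome is wrapped in <div> or <table>
# elements, while the body text we want lives outside of them.
CHROME_RE = re.compile(r"<(div|table)\b[^>]*>.*?</\1>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")   # any remaining tag
WS_RE = re.compile(r"\s+")        # runs of whitespace


def smart_snippet(marked_up: str) -> str:
    """Drop presumed chrome elements, strip leftover tags, tidy whitespace."""
    without_chrome = CHROME_RE.sub(" ", marked_up)
    without_tags = TAG_RE.sub(" ", without_chrome)
    return WS_RE.sub(" ", html.unescape(without_tags)).strip()


if __name__ == "__main__":
    raw = (
        '<div class="nav">« XUL Reference home [ Examples | Attributes ]</div>'
        "<p>A container which typically contains a <b>row</b> of buttons.</p>"
    )
    print(smart_snippet(raw))
```

The same crudeness that makes this short also makes it over-zealous: any real content that happens to sit inside one of the "chrome" elements gets eaten too, and regexps do not cope with nested elements.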