build matrices for arbitrarypushlog

A TraceMonkey Push

As I mentioned in my teaser on rich thunderbird mozmill logs, in order to get the build logs and provide failure clustering you already have to do most of the stuff tinderboxpushlog does.  One of the key things is summarizing the builds in a way that is useful, where the definition of “useful” probably varies a lot between users.

While Thunderbird has an extremely boring set of build types (build, xpcshell, mozmill), my first take on summarizing them was no good.  While fixing it, I decided to feature creep (out of my hobby time allocation) and see if I could create a presentation that could handle the prolific Firefox and TraceMonkey trees.

A Firefox Push

While I am not going to claim it’s perfect, I like it.  It’s probably also important to caveat that not all TraceMonkey builds are categorized.  The mobile talos runs identify themselves by a completely different set of names from the desktop ones, and there’s not really room for columns for that.  Additionally, some builds cite a revision for “mobile-browser”, but we ignore that extra meta-data.  Although the design was intended to handle Thunderbird’s repository where each build is a tuple of the “comm-central” and “mozilla-central” revisions used, we would need that tuple for every build in the tinderbox tree, and TraceMonkey is not providing it.  (We could kick builds without the info to the “outer” push as an enhancement.)

A Thunderbird (well, comm-central) push.

As a gesture of friendship to non-Thunderbird trees, we now also process mochitest and reftest logs, although I’m failing to surface some of the details retrieved in the UI.

Anywho, you can see arbpl in action for yourself at arbpl.visophyte.org.  A cron job scrapes every 5 minutes.  The error recovery logic is not production-grade yet; the scraper can fall victim to partially written tinderbox JSON files on the tinderbox server, which means that some trees might not see updates for longer than that.  And various other things may go wrong too.  The client does not auto-refresh or use Socket.IO or anything.  If you want to run your own, hit github.  If you want to read the source, you should probably hit github too, because the production serving mode is reasonably optimized and crams all the JS into a single (gzipped) file.

teaser: Rich contextual information for Thunderbird mozmill failures

Sometimes Thunderbird mozmill unit tests fail.  When they do, it’s frequently a mystery.  My logsploder work helped reduce the mystery for my local mozmill test runs, but did nothing for tinderbox runs or developers without tool fever.  Thanks to recent renewed activity on the Thunderbird front-end, this has become more of a problem and so it was time for another round of amortized tool building towards a platonic ideal.

The exciting pipeline that leads to screenshots like you see in this post:

  • Thunderbird Mozmill tests run with the testing framework set to log semi-rich object representations to in-memory per-test buckets.
  • In the event of a test failure, the in-Thunderbird test framework:
    • Gathers information about the state of the system now that we know there is a failure and emits it.  This takes the form of canvas-based screenshots (using chrome privileges) of all open windows in the application, plus the currently focused element in the window.
    • Emits (up to) the last 10 seconds of log events from the previous test.
    • Emits all of the log events from the current test.
  • The python test driver receives the emitted data and dumps it to stdout in a series of JSON blobs that are surrounded by magical annotations.
  • A node.js daemon doing the database-based tinderboxpushlog thing (like my previous Jetpack/CouchDB work that found CouchDB to not be a great thing to directly expose to users and died, but now with node and hbase) processes the tinderbox logs for the delicious JSON blobs.
    • It also processes xpcshell runs and creates an MD5 hash that attempts to characterize the specific fingerprint of the run.  Namely, xpcshell emits lines prefixed with “TEST-” that have a regular form describing when the pending test count changes or when specific comparison operations or failures occur.  The assumptions are that tests do not comparison-check values that are effectively random, and that the comparisons will tend to be deterministic (even if there are random orderings in the test) or at least notable when not deterministic.  Given that, the trace derived from this filtering and hashing should be consistent enough that similar failures result in identical hashes.  (A sketch of the idea follows this list.)
    • Nothing is done with mochitests because: Thunderbird does not have them, they don’t appear to emit any context about their failures, and as far as I can tell, frequently the source of a mochitest failure is the fault of an earlier test that claimed it had completed but still had some kind of ripples happening that broke a later test.
  • A wmsy-based UI does the UI thing.
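
To make the fingerprinting idea from the list concrete, here is a minimal sketch (not the actual arbpl code; the normalization rules are assumptions) of deriving a failure hash from an xpcshell log under node.js:

    // sketch: derive a failure fingerprint from an xpcshell log
    var crypto = require("crypto");

    function fingerprintXpcshellLog(logText) {
      var trace = [];
      logText.split("\n").forEach(function (line) {
        // keep only the regularly-formed "TEST-" lines described above
        if (line.indexOf("TEST-") === -1)
          return;
        // strip run-to-run noise that carries no signal; both rules here are
        // assumed normalizations, not necessarily what arbpl actually does
        var normalized = line
          .replace(/^\s*\d+\s*/, "")                        // hypothetical numeric prefix
          .replace(/\/[^ ]*\/(test_[^ \/]+\.js)/g, "$1");   // collapse absolute paths
        trace.push(normalized);
      });
      // similar failures should produce identical traces and thus identical hashes
      return crypto.createHash("md5").update(trace.join("\n")).digest("hex");
    }

    // usage:
    // var fs = require("fs");
    // console.log(fingerprintXpcshellLog(fs.readFileSync("xpcshell-run.log", "utf8")));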

    The particular failure shown here is an interesting one where the exception is telling us that a popup we expected to open never opened.  But when we look at the events from the log, we can see that what happened is the popup opened and then immediately closed itself.  Given that this happened (locally) on linux, this made me suspect that the window manager didn’t let the popup acquire focus and perform a capture.  It turns out that I forgot to install the xfwm4 window manager on my new machine which my xvnc session’s xstartup script was trying to run in order to get a window manager that plays nicely with mozmill and our focus needs.  (Many window managers have configurable focus protection that converts a window trying to grab focus into an attention-requested notification.)

    This is a teaser because it will probably be a few more days before the required patch lands on comm-central, before I use RequireJS’ fancy new optimizer to make the client load process more efficient, and before I am happy that the server daemons can keep going for long stretches of time without messing up.  The server and client code is available on github, and the comm-central patch from hg.

    teaser: code completion in skywriter/ajax.org code editor using jstut and narcissus

    I’ve hooked jstut’s (formerly narscribblus’) narcissus-based parser and jsctags-based abstract interpreter up to the ajax.org code editor (ace, to be the basis for skywriter, the renamed and somewhat rewritten bespin).  Ace’s built-in syntax highlighters are based on the somewhat traditional regex-based state machine pattern and have no deep understanding of JS.  The tokenizers have very limited statefulness; they are line-centric and the only state is the tokenizer’s state at the conclusion of tokenizing the previous line.  The advantage is that they will tend to be somewhat resilient in the face of syntax errors.

    In contrast, narcissus is a recursive descent parser that explodes when it encounters a parse error (and takes all the state on the stack at the point of failure with it).  Accordingly, my jstut/narscribblus parser is exposed to ace through a hybrid tokenizer that uses the proper narcissus parser as its primary source of tokens and falls back to the regex state machine tokenizer for lines that the parser cannot provide tokens for.  I have thus far made some attempt at handling invalidation regions in a respectable fashion but it appears ace is pretty cavalier in terms of invalidating from the edit point to infinity, so it doesn’t really help all that much.
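
    In sketch form, the hybrid looks something like the following (simplified; the getLineTokens signature mirrors ace’s line-tokenizer contract as I understand it, and tokensForLine is an assumed helper on the parse result, not jstut’s actual API):

        // sketch: prefer tokens from the last narcissus parse, fall back per line
        function HybridTokenizer(parseTokenSource, regexTokenizer) {
          this.parseTokens = parseTokenSource; // per-line tokens from the last good parse
          this.fallback = regexTokenizer;      // ace-style regex state machine tokenizer
        }
        HybridTokenizer.prototype.getLineTokens = function (line, state, row) {
          var parsed = this.parseTokens.tokensForLine(row);
          if (parsed) {
            // the parser knew about this line; hand back semantically typed tokens
            return { tokens: parsed, state: state };
          }
          // the parse blew up at or before this line; degrade to the regex tokenizer
          return this.fallback.getLineTokens(line, state);
        };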

    Whenever a successful parse occurs, the abstract interpreter is kicked off, which goes and attempts to evaluate the document.  This includes special support for CommonJS require() and CommonJS AMD define() operations.  The require(“wmsy/wmsy”) in the screenshot above actually retrieves the wmsy/wmsy module (using the RequireJS configuration), parses it using narcissus, parses the documentation blocks using jstut, performs abstract interpretation and follow-on munging, and then returns the contents of that namespace (asynchronously using promises) to the abstract interpreter for the body of the text editor.  The hybrid tokenizer does keep around a copy of the last good parse to deal with code completion in the very likely case where the intermediate stages of writing new code result in parse failures.  Analysis of the delta from the last good parse is used in conjunction with the last good parse to (attempt to) provide useful code completion.
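
    The completion side, in equally hedged sketch form (lookupContext and childrenOf are illustrative stand-ins, not jstut’s real methods):

        // sketch: complete against the last good parse even when the current text won't parse
        function completeAtCursor(editorText, cursorOffset, lastGoodParse) {
          // the partial identifier typed at the cursor
          var prefixMatch = /[A-Za-z_$][\w$]*$/.exec(editorText.slice(0, cursorOffset));
          var prefix = prefixMatch ? prefixMatch[0] : "";

          // ask the last good parse what object/namespace encloses this position
          var context = lastGoodParse.lookupContext(cursorOffset);
          if (!context)
            return [];

          // offer that context's known children, filtered by the typed prefix,
          // in definition order (matching the behavior described below)
          return context.childrenOf().filter(function (child) {
            return child.name.indexOf(prefix) === 0;
          });
        }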

    The net result is that we have semantic information about many of the tokens on the screen and could do fancy syntax highlighting like Eclipse can do.  For example, global variables could be highlighted specially, types defined by third party libraries could get their own color, etc.  For the purposes of code completion, we are able to determine the context surrounding the cursor and the appropriate data types to use as the basis for completion.  For example, in the first screenshot, we are able to determine that we are completing a child of “wy” which we know to be an instance of type WmsyDomain from the wmsy namespace.  We know the children of the prototype of WmsyDomain and are able to subsequently filter on the letter “d” which we know has been (effectively) typed based on the position of the cursor.  (Note: completion items are currently not sorted but rather shown in definition order.)

    In the second example, we are able to determine that the cursor is in an object initializer, that the object initializer is the first argument of a call to defineWidget on “wy” (which we know about as previously described).  We accordingly know the type constraint on the object initializer and thus know the legal/documented key names that can be used.

    This is not working enough to point people at a live demo, but it is exciting enough to post teaser screenshots.  Of course, the code is always available for the intrepid: jstut/narscribblus, wmsy.  In a nutshell, you can hit “Alt-/” and the auto-completion code will try and do its thing.  It will display its results in a wmsy popup that is not unified with ace in terms of how focus is handled (wmsy’s bad).  Nothing you do will actually insert text, but if you click outside of the popup or hit escape it will at least go away.  The egregious deficiencies are likely to go away soon, but I am very aware and everyone else should be aware that getting this to a production-quality state you can use on multi-thousand line files with complex control flow would likely be quite difficult (although if people document their types/signatures, maybe not so bad).  And I’m not planning to pursue that (for the time being); the goal is still interactive, editable, tutorial-style examples.  And for these, the complexity is way down low.

    My thanks to the ajax.org and skywriter teams; even at this early state of external and source documentation it was pretty easy to figure out how various parts worked so as to integrate my hybrid tokenizer and hook keyboard commands up.  (Caveat: I am doing some hacky things… :))  I am looking forward to the continued evolution and improvement of an already great text editor component!

    Thunderbuddy: Weave-synchronized Thunderbird contacts on Android

    I got my first modern Android phone at the end of last week and I figured it would be a good idea to get over the activation energy hump so it’s easy to do incremental mozilla hacking on it.  Thanks to a lot of work by other people, this turned out to be pretty easy.  It would have been even easier if the Android emulator ran at anything remotely close to real time or if my Samsung Galaxy S Vibrant (Bell Canada model) running 2.1 did not have a habit of locking up all the time.  (And this is before my code touched it, mind you.  I’m half-hoping my device is a lemon.  If anyone knows how to tell a lemon from lockups-as-usual, please let me know.)

    The screenshot shows my Android device running the app.  It’s a super trivial wmsy UI showing contacts from my address book pulled down from a weave server that were put there by Thunderbird.

    Anywho, props to:

    • Phonegap, the HTML/JS way to write mobile apps!  Major awesome.
    • weaveclient-chromium, a slightly bitrotted pure JS weave client by Philipp von Weitershausen / philikon (now of the mozilla weave/sync team) which built on the weaveweb project by Anant Narayanan and pure JS crypto work by the people mentioned in the README.
    • weaver and weave-ext by Shane Caraveo, which make weave happily run in Thunderbird and have it propagate the contents of the address book.
    • The Mozilla Weave/Firefox Sync team who made it easy and practical for software like Thunderbird to partake in the encrypted synchronization revolution.
    • RequireJS, CommonJS loader to the stars.

    The relevant repos, for those interested, are:

    • weaveclient-js: This is a fork of weaveclient-chromium that ditches the chromium bits, makes things CommonJS/Asynchronous Module Definition happy, and slightly factors out the encryption so that thunderbuddy can provide ‘native’ accelerated encryption support on android.
    • thunderbuddy: The phonegap android app repo proper.  The only really notable thing at this point is the custom Java class that implements and exposes faster ‘native’ encryption methods.  (Thunderbuddy can also just be used as a webpage on any reasonably capable browser with good JS performance.)

    It’s worth noting that the goal is not actually to perform contact synchronization with Android.  There are already a ton of ways to synchronize your Thunderbird contacts with gmail and from there your phone.  The goal is to let other interesting data and meta-data propagate.  I just picked contacts for this example because Shane already had the data propagating.

    Documentation for complex things (you don’t basically already understand)

    The Problem

    One of my focuses at MoMo is to improve the plight of Thunderbird extension developers.  An important aspect of this is improving the platform they are exposed to.  Any platform usually entails a fair amount of complexity.  The trick is that you only pay for the things that are new-to-you, learning-wise.

    The ‘web’ as a platform is not particularly simple; it’s got a lot of pieces, some of which are fantastically complex (ex: layout engines).  But those bits are frequently orthogonal, can be learned incrementally, have reams of available documentation, extensive tools that can aid in understanding, and, most importantly, are already reasonably well known to a comparatively large population.  The further you get from the web-become-platform, the more new things you need to learn and the more hand-holding you need if you’re not going to just read the source or trial-and-error your way through.  (Not that those are bad ways to roll; but not a lot of people make it all the way through those gauntlets.)

    I am working to make Thunderbird more extensible in more than a replace-some-function/widget-and-hope-no-other-extensions-had-similar-ideas sort of way.  I am also working to make Thunderbird and its extensions more scalable and performant without requiring a lot of engineering work on the part of every extension.  This entails new mini-platforms and non-trivial new things to learn.

    There is, of course, no point in building a spaceship if no one is going to fly it into space and fight space pirates.  Which is to say, the training program for astronauts with all its sword-fighting lessons is just as important as the spaceship, and just buying them each a copy of “sword-fighting for dummies who live in the future” won’t cut it.

    Translating this into modern-day pre-space-pirate terminology, it would be dumb to make a super fancy extension API if no one uses it.  And given that the platform is far enough from pure-web and universally familiar subject domains, a lot of hand-holding is in order.  Since there is no pre-existing development community familiar with the framework, they can’t practically be human hands either.

    The Requirements

    I assert the following things are therefore important for the documentation to be able to do:

    • Start with an explained, working example.
    • Let the student modify the example with as little work on their part as possible so that they can improve their mental model of how things actually work.
    • Link to other relevant documentation that explains what is going on, especially reference/API docs, without the user having to open a browser window and manually go search/cross-reference things for themselves.
    • Let the student convert the modified example into something they can then use as the basis for an extension.

    The In-Process Solution: Narscribblus

    So, I honestly was quite willing to settle for an existing solution that was anywhere close to what I needed.  Specifically, the ability to automatically deep-link source code to the most appropriate documentation for the bits on hand.  It has become quite common to have JS APIs that take an object where you can have a non-trivial number of keys with very specific semantics, and my new developer friendly(-ish) APIs are no exception.

    Unfortunately, most existing JavaScript documentation tools are based on the doxygen/JavaDoc model of markup that:

    • Was built for static languages where your types need to be defined.  You can then document each component of the type by hanging doc-blocks off them.  In contrast, in JS if you have a complex Object/dictionary argument that you want to hang stuff off of, your best bet may be to just create a dummy object/type for documentation purposes.  JSDoc and friends do support a somewhat enriched syntax like “@param arg.attr”, but run into the fact that the syntax…
    • Is basically ad-hoc with limited extensibility.  I’m not talking about the ability to add additional doctags or declare specific regions of markup that should be passed through a plugin, which is pretty common.  In this case, I mean that it is very easy to hit a wall in the markup language that you can’t resolve without making an end-run around the existing markup language entirely.  As per the previous bullet point, if you want to nest rich type definitions, you can quickly run out of road.

    The net result is that it’s hard to even describe the data types you are working with, let alone have tools that are able to infer links into their nested structure.
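
    To illustrate the wall: documenting even a modestly nested options argument with doc-block tags looks something like the sketch below (purely illustrative; the names here are stand-ins, not a real API):

        /**
         * @param {Object} def                widget definition
         * @param {String} def.name           widget name
         * @param {Object} def.constraint     one level of nesting is survivable...
         * @param {String} def.constraint.type
         * ...but there is no sanctioned way to say "constraint is one of these three
         * documented structured types, each with its own keys", short of inventing
         * dummy types that exist purely for documentation purposes.
         */
        function defineWidget(def) {
          // ...
        }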

    So what is my solution?

    • Steal as much as possible from Racket (formerly PLT Scheme)’s documentation tool, Scribble.  To get a quick understanding of the brilliance of Racket and Scribble, check out the quick introduction to racket.  For those of you who don’t click through, you are missing out on examples that automatically hyperlink to the documentation for invoked methods, plus pictures capturing the results of the output in the document.
      • We steal the syntax insofar as it is possible without implementing a scheme interpreter.  The syntax amounts to @command[s-expression stuff where whitespace does not matter]{text stuff which can have more @command stuff in it and whitespace generally does matter}.  The brilliance is that everything is executed and there are no heuristics you need to guess at and that fall down.
      • Our limitation is that while Racket is a prefix language and can use reader macros and have entire documents processed in the same fashion as source code and totally understood by the text editor, such purity is somewhat beyond us.  But we do what we can.
    • Use narcissus, Brendan Eich/mozilla’s JS meta-circular interpreter thing, to handle parsing JavaScript.  Although we don’t have reader macros, we play at having them.  If you’ve ever tried to parse JavaScript, you know it’s a nightmare that requires the lexer to be aware of the parsing state thanks to the regexp syntax.  So in order for us to be able to parse JavaScript inline without depending on weird escaping syntaxes, when parsing our documents we use narcissus to make sure that we parse JavaScript as JavaScript; we just break out when we hit our closing squiggly brace.  No getting tricked by regular expressions, comments, etc.
    • Use the abstract interpreter logic from Patrick Walton‘s jsctags (and actually stole its CommonJS-ified narcissus as the basis for our hacked up one too) as the basis for abstract interpretation to facilitate being able to linkify all our JavaScript code.  The full narcissus stack is basically:
      • The narcissus lexer has been modified to optionally generate a log of all tokens it generates for the lowest level of syntax highlighting.
      • The narcissus parser has been modified to, when generating a token log, link syntax tokens to their AST parse nodes.
      • The abstract interpreter logic has been modified to annotate parse nodes with semantic links so that we can traverse the tokens to be able to say “hey, this is attribute ‘foo’ in an object that is argument index 1 of an invocation of function ‘bar'” where we were able to resolve bar to a documented node somewhere.  (We also can infer some object/class organization as a result of the limited abstract interpretation.)
      • We do not use any of the fancy static analysis stuff that is going on as of late with the DoctorJS stuff.  Automated stuff is sweet and would be nice to hook in, but the goal here is friendly documentation.
      • The abstract interpreter has been given an implementation of CommonJS require that causes it to load other source documents and recursively process them (including annotating documentation blocks onto them.)
    • We use bespin as the text editor to let you interactively edit code and then see the changes.  Unfortunately, I did not hook bespin up to the syntaxy magic we get when we are not using bespin.  I punted because of CommonJS loader snafus.  I did, however, make the ‘apply changes’ button use narcissus to syntax check things (with surprisingly useful error messages in some cases).

    Extra brief nutshell facts:

    • It’s all CommonJS code.  The web enabled version which I link to above and below runs using a slightly modified version of Gozala‘s teleport loader.  It can also run privileged under Jetpack, but there are a few unimplemented sharp edges relating to Jetpack loader semantics.  (Teleport has been modified mainly to play more like jetpack, especially in cases where its aggressive regexes go looking for jetpack libs that aren’t needed on the web.)  For mindshare reasons, I will probably migrate off of teleport for web loading and may consider adding some degree of node.js support.  The interactive functionality currently reaches directly into the DOM, so some minor changes would be required for the server model, but that was all anticipated.  (And the non-interactive “manual” language already outputs plain HTML documents.)
    • The web version uses a loader to generate the page which gets displayed in an iframe inside the page.  The jetpack version generates a page and then does horrible things using Atul‘s custom-protocol mechanism to get the page displayed but defying normal browser navigation; it should either move to an encapsulated loader or implement a proper custom protocol.

    Anywho, there is still a lot of work that can and will be done (more ‘can’ than ‘will’), but I think I’ve got all the big rocks taken care of and things aren’t just blue sky dreams, so I figure I’d give a brief intro for those who are interested.

    Feel free to check out the live example interactive tutorialish thing linked to in some of the images, and its syntax highlighted source.  Keep in mind that lots of inefficient XHRs currently happen, so it could take a few seconds for things to happen.  The type hierarchy emission and styling still likely have a number of issues, including potential failures to pop up on clicks.  (Oh, and you need to click on the source of the popup to toggle it…)

    Here’s a bonus example to look at too, keeping in mind that the first few blocks using the elided js magic have not yet been wrapped in something that provides them with the semantic context to do magic linking.  And the narscribblus repo is here.

    understanding where layout goes wrong with gecko reflow debug logs (Part 1)

    HTML and CSS provide awesome layout potential, but harnessing that potential to do your bidding can sometimes be problematic.  I suspect all people who have written HTML documents have been in a situation where they have randomly permuted the document structure and CSS of something that should work in the hopes of evolving it into something that actually does work.  Matching wits with a black box that is invulnerable to even pirate-grade profanity is generally not a pleasant experience.

    It turns out gecko has a way to let you see inside that black box.  Well, actually, multiple ways.  There’s a layout debugger (that seems broken on trunk?) that can display visual overlays for box sizes/events/reflow counts and dump info about paints/invalidations/events as they happen, as well as dumping the current content tree and frames.  Even better, gecko’s frame reflow debugging mechanism will dump most of the inputs and outputs of each stage of reflow calculations as they happen.  With some comparatively minor patches[1] we can augment this information so that we can isolate reflow decisions to their origin presentation shell/associated URL and so that we know the tag name, element id, and class information on HTML nodes subjected to reflow calculations.  A reasonably sane person would want to do this if they were planning to do a lot of potentially complicated HTML layout work and would:

    • benefit from better understanding how layout actually works,
    • not want to propagate layout cargo culting or its ilk from the code being replaced,
    • not want to waste days of their lives the next time this happens,
    • want to help locate and fix layout bugs, if bugs they be, so that all might benefit.

    Of course, with logs that effectively amount to execution traces, examining them by hand is frequently intractable unless you really know what you’re looking for or are dealing with a toy example.  My non-reduced problem case resulted in 58,107 lines, for one.  So writing a tool is a good idea, and writing it in JS using Jetpack doubly so.

    In any event, the problem is I am trying to use the flexible box model to create an area of the screen that takes up as much space as possible.  In this space I want to be able to house a virtual scrolling widget so I use “overflow: hidden”.  Regrettably, when my logic goes to populate the div, the box ends up resizing itself and now the whole page wants to scroll.  Very sad.  (Things work out okay with an explicitly sized box which is why my unit tests for the virtual scrolling widget pass…)

    Let’s formulate a query on the div of interest (which I will conceal) and then see what the first little bit of output is:

    *** Box 24 tag:div id: classes:default-bugzilla-ui-bug-page-runBox
    *** scroll 25 tag:div id: classes:default-bugzilla-ui-bug-page-runs
    scroll 25 variant 1  (parent 24) first line 406
      why: GetPrefSize
      inputs: boxAvail: 0,UC
              boxLast: 0,0
              reflowAvailable: 0,UC
              reflowComputed: 0,UC
              reflowExtra: dirty v-resize
      output: prefWidth: 0
              minWidth: 0
              reflowDims: 0,0
              prefSize: 0,0
              minSize: 0,0
              maxSize: UC,UC
      parent concluded: minSize: 0,0
                        maxSize: UC,UC
                        prefSize: 2,0
    scroll 25 variant 2  (parent 24) first line 406
      why: Layout
      inputs: boxAvail: 771,1684
              boxLast: 0,0
              reflowAvailable: 771,UC
              reflowComputed: 771,1684
              reflowExtra: dirty dirty-children h-resize v-resize
      output: prefSize: 0,0
              minSize: 0,0
              maxSize: UC,UC
              reflowDims: 771,1684
              layout: 2,0,771,1684
      parent concluded: minSize: 0,0
                        maxSize: UC,UC
                        prefSize: 0,0
                        layout: 0,0,773,1684
    

    This is the general pattern we will see in the reflows.  The parent will ask it what size it wants to be and it will usually respond with “ridiculously wide but not so tall”.  (Not in this first base case, but the next one responds with a prefsize of “1960,449”, and that’s pixels.) The parent will then perform layout and say “no, you need to be taller than you want to be”, at least until I start cramming stuff in there.
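
    One way to automate the skimming is to condense each variant record into a single line so the height jump is easy to spot; here is a minimal sketch of that idea (the regexes are assumptions about the summarized format shown above, not part of the actual tool):

        var fs = require("fs");

        // condense each per-variant record into one line: name, why, prefSize, layout
        function condenseVariants(summaryText) {
          var variants = [];
          summaryText.split("\n").forEach(function (rawLine) {
            var line = rawLine.trim(), m;
            if ((m = /^(\w+ \d+ variant \d+)/.exec(line))) {
              variants.push({ name: m[1] });
              return;
            }
            if (!variants.length)
              return;
            var cur = variants[variants.length - 1];
            if ((m = /^why:\s*(\S+)/.exec(line)))
              cur.why = m[1];
            else if ((m = /^prefSize:\s*(\S+)/.exec(line)) && !cur.prefSize)
              cur.prefSize = m[1];    // the first prefSize seen is the frame's own output
            else if ((m = /^layout:\s*(\S+)/.exec(line)) && !cur.layout)
              cur.layout = m[1];
          });
          return variants.map(function (v) {
            return [v.name, v.why, "pref=" + (v.prefSize || "-"),
                    "layout=" + (v.layout || "-")].join("  ");
          }).join("\n");
        }

        // usage against a hypothetical dump path:
        // console.log(condenseVariants(fs.readFileSync("/tmp/framedumps/summary.txt", "utf8")));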

    So we skim down the output to find out where things first went off the rails…

    scroll 25 variant 16  (parent 24) first line 20548
      why: GetPrefSize
      inputs: boxAvail: 1960,UC
              boxLast: 771,1686
              reflowAvailable: 1960,UC
              reflowComputed: 1960,UC
              reflowExtra: dirty-children h-resize v-resize
      output: prefWidth: 1960
              minWidth: 352
              reflowDims: 1960,1755
              prefSize: 1960,1755
              minSize: 352,1755
              maxSize: UC,UC
      parent concluded: minSize: 0,0
                        maxSize: UC,UC
                        prefSize: 1962,1755
    scroll 25 variant 17  (parent 24) first line 20548
      why: Layout
      inputs: boxAvail: 771,1755
              boxLast: 1960,1755
              reflowAvailable: 771,UC
              reflowComputed: 771,1755
              reflowExtra: dirty-children h-resize
      output: prefSize: 0,0
              minSize: 352,1755
              maxSize: UC,UC
              reflowDims: 771,1755
              layout: 2,0,771,1755
      parent concluded: minSize: 0,0
                        maxSize: UC,UC
                        prefSize: 0,0
                        layout: 0,0,773,1755
    

    Okay, that looks pretty likely to be the area of concern.  The parent asked it for its ideal size, so it told it, but then the parent apparently decided to enlarge itself too.  That is not what we wanted.  We would have been cool if just the scroll #25 enlarged itself (or its block child #26 that corresponds to the same content node but which I have elided because it always says the same thing as its parent #25) since some frame needs to end up holding the overflow coordinate space.

    Thus concludes part 1 of our exciting saga.  In part 2, we hopefully figure out what the problem is and how to fix it.  Lest anyone suggest the root problem is that I am completely off base and am not remotely reasonably sane for choosing this as a strategy to solve the problem… it works in chrome.  Which is not to say that my html/css is correct and firefox’s layout is wrong; it’s quite possible for layout engines to err or deal with unspecified behaviour cases in my favor, after all.  But it does make me want to understand what the layout engine is thinking and be able to do so with a minimum of effort in the future, since I doubt this is the last time I will not immediately understand the problem or that layout engines will differ in their behaviour.

    For those who want to play along at home: the raw gzip’d /tmp/framedebug file (gunzip to /tmp/framedebug) that is the entirety of the trunk firefox log with debug output, the spliced output just for the one window derived from an invocation of “cfx run splice” (it will show up under the URL in /tmp/framedumps), and the output of “cfx run summarize serial-12-0 summarize unique 22,24,25,26”.  Those unique identifiers are deterministic but arbitrary values for the given file.  We discovered them by querying on the CSS class using “cfx run summarize serial-12-0 summarize class wlib-wlib-virt-wlib-virt-container-root”.  The hg repo for the processing tool is here, the mozilla-central patches are: first and second.  A minor jetpack patch is also required for the command line stuff to work.

    1: I was initially trying to avoid patching anything.  This didn’t work out, but it did cause the initial log file splicing logic to leverage the arena allocation scheme of presentation shells to allow us to map frames back to their URLs.  Sadly, it turned out the arena allocation blocks requested from the upstream allocators are really small (4k or 1k) and all from the same source and so I had to instrument the allocation as well as adding debug output of the window/docshell/presshell linkages.  The output adds an unacceptable additional level of noise to the default DEBUG case; the right thing to do is likely to cause the reflow log debugging to emit the document URL before each logging outburst if it is different from the last outburst.

    Thunderbird Quick Filter Bar extensions, they’re a thing!

    The previously discussed Quick Filter Bar interface landed in time for Thunderbird (Lanikai) 3.1 beta 2 (whose release is real-soon-now).  Although the Quick Filter Bar already contains dangerously high levels of awesome, we made sure to make it extensible so you can cram even more awesome in.

    As an example, I have created an extension that enables you to ‘pivot’ based on the currently selected message(s).

    In its most straightforward manner of operation, you can click on an e-mail address in the message reader header and pivot by that e-mail address.  Out of the box, this will show you all the messages in the current folder sent by that user.  You can also (or only) show messages where that user is on the to/cc lines by toggling the buttons on the expando bar appropriately.

    You can also constrain the display to only show messages within some time interval of the message(s) in question.

    The more confusing way to access the pivot functionality is to simply toggle the facet on the quick filter bar.  When you toggle the filter on, we walk all of the selected messages and build up a list of the observed e-mail addresses for all of the senders and all of the recipients.  One trick is that we filter out any e-mail addresses associated with your own account in order to avoid the filter becoming too inclusive.  We save those e-mail addresses into our state.  We also walk all the messages and find the time range that the messages cover and save that.  These are used as the basis for the specific constraints you can enable from the expando bar.  Because the values are saved, the filter does not change when you change your selected messages.  You need to toggle the pivot filter off and on again (or use the ‘pivot’ option on the email address popup) to change the data we are pivoting on.
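
    In code-sketch form, that state-building step looks roughly like this (a simplified illustration; parseAddresses and the identity list are stand-ins for what the extension actually uses):

        // sketch: build the pivot state from the currently selected messages
        function buildPivotState(selectedMsgHdrs, myEmailAddresses) {
          var seen = {}, addresses = [], earliest = null, latest = null;
          selectedMsgHdrs.forEach(function (msgHdr) {
            // nsIMsgDBHdr exposes author/recipients/ccList as raw header strings and
            // date in microseconds; parseAddresses is an assumed helper, not a real API
            var addrs = parseAddresses(msgHdr.author)
                          .concat(parseAddresses(msgHdr.recipients))
                          .concat(parseAddresses(msgHdr.ccList));
            addrs.forEach(function (addr) {
              addr = addr.toLowerCase();
              // leave out our own identities so the filter does not become too inclusive
              if (myEmailAddresses.indexOf(addr) !== -1 || seen[addr])
                return;
              seen[addr] = true;
              addresses.push(addr);
            });
            if (earliest === null || msgHdr.date < earliest) earliest = msgHdr.date;
            if (latest === null || msgHdr.date > latest) latest = msgHdr.date;
          });
          // this becomes the saved filter state; the expando bar constraints
          // (sender/recipient toggles, time interval) are derived from it
          return { addresses: addresses, earliest: earliest, latest: latest };
        }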

    The extension can be found on AMO and in source form.  In a forward-looking move that is probably complicating things for now, I used the Jetpack SDK and its XUL extension support to implement the extension.  In practice, all the example uses Jetpack for is its CommonJS module system and the standard mozilla-style JS modules would have been sufficient.  All the UI stuff is done purely in XUL overlays bound to the backing code using addEventListener.

    The broad strokes for implementing a Quick Filter Bar extension are:

    1. Overlay your toolbarbutton into the quick filter bar, preferably the #quick-filter-bar-collapsible-buttons box.  If you have an expando bar you want to contribute, you probably want to put that into the #quick-filter-bar-expando.  As things currently stand, multiple expando bars being uncollapsed at the same time will probably turn out poorly, so you may need to be pragmatic and put your bar outside that box or do something clever.
    2. Import resource:///modules/quickFilterManager.js, implement your filter, and register it with QuickFilterManager.defineFilter() (whose documentation explains the interface you need to conform to).
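
    A skeletal registration might look like the following; the property names on the definition object are my recollection of quickFilterManager.js’s in-code documentation and should be treated as assumptions to verify against the source:

        Components.utils.import("resource:///modules/quickFilterManager.js");

        QuickFilterManager.defineFilter({
          name: "pivot",            // key under which this filter's state is stored
          domId: "qfb-pivot",       // id of the toolbarbutton you overlaid in
          // translate the saved filter state into nsIMsgSearchTerms on aTerms
          appendTerms: function (aTermCreator, aTerms, aFilterValue) {
            if (!aFilterValue)
              return;
            // ...turn aFilterValue.addresses / earliest / latest into search terms...
          }
        });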

    We will likely not be doing up full documentation for this extension point beyond the in-code documentation and in-tree/extension examples.  This is an advanced extension point by virtue of it touching nsIMsgDBHdrs, nsIMsgDBViews, nsIMsgSearchSessions, and friends, all of which have sharp edges.  The way forward with soft, human-friendly documented abstractions is gloda and that’s where (my) effort will be concentrated.  The awesome in this extension point is making it practical for people who already have the scars to utilize the full power of the message view/traditional search subsystem without losing a limb and do so in a previously impossibly short period of time.

    PS: I made the icon!

    work-in-progress tooling: feeding GWT SpeedTracer systemtapped mozilla performance data

    Now that Thunderbird 3.1 is string/feature-frozen it’s time to focus on performance.  This post is just to make sure that people who are also working on similar things know what I’m up to.  For example, there’s some very exciting work going on involving adding a startup timeline to mozilla-central that everyone keeping up-to-date with mozilla performance work should be aware of.  In summary, no results yet, but soon!

    The big bullet points for now are:

    • GWT SpeedTracer is a very impressive tool.
    • I have seriously cleaned up / automated my previous SystemTap script work for mozilla.  Invoking python chewchewwoowoo.py speedtracer/mozspeedtrace.stp `pgrep thunderbird-bin` does the following:
      • generates a version of the source script with any line statements fixed up based on one or more line seeks in a file.
      • builds the command line and invokes the system tap script.
      • invokes a post-processing script specified in the tap file, feeding it address-translation helpers and systemtap bulk logging per-cpu fusion.
    • mozspeedtrace.stp and its post-processor produce an HTML file with embedded JSON that the SpeedTracer Chrome extension recognizes as one of its own.  (SpeedTracer runs in Firefox through the GWT development mode, but being very new to GWT I am not yet clear on if/how to be able to wrap it up to be used as a webapp from static data.)
    • My mapping of events as recorded by my SystemTap probes to SpeedTracer events is somewhat sketchy, but part of that is due to the limited set of events and their preconfigured display output.  (I believe it turns out I can generate optional data like fake stack traces to be able to expand the set of mapped events without having to modify SpeedTracer.)  This is why basically everything in the screenshot is a yellow ‘JavaScript Callback’.
    • I have not implemented probes for all of the event types defined by SpeedTracer, I have not ported all of my previous probes yet, and there are still some new probes yet to add.  Happily, most of the hard work is already in the can.
    • The motivation behind this is very different from the startup timeline.  Thunderbird uses mozilla-1.9.2, not mozilla-central, and my primary investigatory focus is memory usage in steady-state after start-up, not the time it takes to start up.  (We do, of course, want to improve performance where possible.)  It just so happens that in order to usefully explain who is allocating memory we also have to know who is active at the time / why they are active, and that provides us with the performance timeline data.
    • This work is not likely to be useful for per-tab/webpage performance data gathering… just whole-program performance investigation like Thunderbird needs.
    • This is a work-in-progress, but the focus is on improving Thunderbird’s performance issues, not building tooling.  As such, I may have to bail on SpeedTracer in favor of command-line heuristic analysis if I can’t readily modify SpeedTracer to suit my needs.  I picked SpeedTracer partially to (briefly) investigate GWT, but from my exploration of the code thus far, the activation energy required may be too great.

    a systemtap script to attribute JS string memory costs to the creating JS functions

    ----- js_GC data pages: 96731 total vm pages: 234849
       ###      KiB
     11376      839 BF_format                                        log4moz.j:487
     15906      725 mime_emitter_writeBody                           jsmimeemitter.j:381
       148      576 MimeMessage_coerceBodyToPlaintext                mimemsg.j:341
      4460      492 gloda_ds_queryFromQuery                          datastore.j:2988
      9655      387 <MYSTERIOUS>                                     <BEYOND THE VEIL>:0
       911      237 gloda_content_quoted                             connotent.j:267
      1723      144 gloda_indexMessage                               index_msg.j:2597
       187      141 gloda_content_content                            connotent.j:290
       170      108 gloda_content_getContent                         connotent.j:166
      1917       73 gloda_identity_toString                          datamodel.j:785
      1775       69 Log4Moz_enumerateProps                           log4moz.j:220
    value |-------------------------------------------------- count
        0 |@                                                    366
        1 |@@@@                                                1371
        2 |@@@@@                                               1584
        4 |@@@@@@@@@@@@@@@@@@@@@@@@                            6863
        8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@          11784
       16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@      12794
       32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  14016
       64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                      8662
      128 |@@@@@@@@@@@@@@                                      4093
      256 |@@@@@                                               1567
      512 |@                                                    309
     1024 |                                                     105
     2048 |                                                      58
     4096 |                                                      28
     8192 |                                                      17
    16384 |                                                      12
    32768 |                                                       0
    65536 |                                                       0
    

    The script.  It tells you what JS code has been creating strings since the last GC and the overall distribution of the marginal costs of those strings, leaving out puny code that allocates less than 64k’s worth.  Marginal costs because spidermonkey has support for dependent strings: “a += b” will actually result in the storage of a being realloc’d, with the old a left as a dependent string pointing into the storage of the new a’.  I forgot to check for the js_ConcatStrings case at first and, let me tell you, the stats were not pretty.  Poor mime_emitter_writeBody was accused of allocating 11 gigabytes of strings in one intra-GC cycle!

    What is interesting about this script is that it is actually pulling all of the function name/script name/line number information directly out of a JSContext object.  What is exciting is that it was pretty easy to do thanks to systemtap’s understanding of DWARF debug symbols.  This is in stark contrast to my efforts using VProbes, where there was no such support and I had to write a gdb python script to begin to approach the same thing.  Also, because systemtap’s guru mode allows you to bail into real C code when you need to, I was able to write a function, both straightforward and efficient, that truncates the jschar (16-bit unicode) strings that make up the function names into ASCII.  It also allowed for a quick/easy/fast basename() implementation.

    It is worth noting that spidermonkey does have built-in (conditionally compiled) static probe points originally created for DTrace and that they do provide more generic object lifetime hooks.  Although they were created for DTrace, systemtap is also able to use them.  The limitation with those probes is that once any probe is active using a hook, a non-trivial cost is incurred because filtering can only be performed inside the probe and you may receive orders of magnitude more calls than you actually care about.  The cost is worst for probes that provide function names since that requires a call to js_GetStringBytes every time (although there is some caching).  Keep in mind that I’m not saying the systemtap userspace probes are free, but since they can be placed at arbitrary points, you can have probes that fire only in the cases you actually care about, only incur data-processing costs when desired, and are disturbingly easy to implement since you can just reach into the data structures you care about.

    Since everyone loves an implementation fight, DTrace, great and magical though it is, is basically incapable of pulling off the JS stack-walking trick that systemtap makes possible.  Unless DTrace has grown these features since I last checked, it lacks the DWARF magic to do such things easily and the language intentionally avoids the control-flow constructs that would allow it to do anything quite so complex.  And you can entirely forget about arbitrary C code for the unicode string truncation…

    This post is brought to you by efforts to decrease the memory footprint of Thunderbird when gloda’s indexer is active.  Because Thunderbird loves you.  But it loves your memory more.  For now.  That’s what we’re trying to fix.

    performance annotated SQLite EXPLAINation visualizations using systemtap

    For the Thunderbird 3.1 release cycle we are not just fixing UX problems but also resolving various performance issues.  Building on my previous work on a visualization of SQLite opcode control flow graphs using graphviz I give you… the same thing!  But updated to more recent versions of SQLite and integrating performance information retrieved through the use of systemtap with utrace.

    In this case we are using systemtap to extract the number of times each opcode is executed and the number of btree pages that are requested during the course of executing the opcode.  (We do not differentiate between cache hits and misses because we are looking at big-O right now.)  Because virtual tables (like those used by FTS3) result in nested SQLite queries and we do not care about analyzing the queries used by FTS3, we ignore nested calls and attribute all btree page accesses to the top-level opcode under execution.

    Because the utility of a tool is just as dependent on ease of use as its feature set, I’ve cleaned things up and made it much easier to get information out of Thunderbird/gloda with this patch which should land soon and provides the following:

    • The gloda datastore will write a JSON file with the EXPLAINed results of all its SQL queries to the path found in the preference mailnews.database.global.datastore.explainToPath.  This preference is observed, so setting it during runtime will cause it to create the file and begin explaining all subsequently created queries (see the sketch after this list).  Clearing/changing the preference closes out the current file and potentially opens a new one.
    • Gloda unit tests will automatically set the preference to the value of the environment variable GLODA_DATASTORE_EXPLAIN_TO_PATH if set.
    • A schema dump is no longer required for meta-data because we just assume that you are using a SQLite DEBUG build that tells us everything we want to know about in the ‘comment’ column.
    • grokexplain.py now uses optparse and has more internal documentation and such.
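
    For example, from a chrome-privileged context (an extension or the error console) the preference can be flipped at runtime with the standard preferences service; the output path below is just a stand-in:

        // start explaining all subsequently created gloda queries to a JSON file
        var prefs = Components.classes["@mozilla.org/preferences-service;1"]
                              .getService(Components.interfaces.nsIPrefBranch);
        prefs.setCharPref("mailnews.database.global.datastore.explainToPath",
                          "/tmp/gloda-explained.json");

        // ...run the gloda queries of interest...

        // clearing the preference closes out the current file
        prefs.clearUserPref("mailnews.database.global.datastore.explainToPath");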

    So what do the pretty pictures show?

    • Before: A gloda fulltext query search retrieves all of the results data before applying the LIMIT.  This results in a lot more data transplanted into temporary results tables than we will end up using; wasted bookkeeping.  Additionally, we incur the row lookup costs for both our messages data storage table and our fulltext messagesText table for all hits, even the ones we will not return.  (Noting that there was no escaping hitting both tables since we were using offsets() and it hits the fulltext table as part of its operation.)
    • After: We perform an initial query phase where we minimize wasted bookkeeping by only retrieving and using the bare minimum required to compute the LIMITed list of document id’s.  Additionally, by use of the FTS3 matchinfo() function instead of the offsets() function we are able to avoid performing row lookup on the messagesText table for results that will not be returned to the user.  Use of the matchinfo() function requires a custom ranking function which allows us to be somewhat more clever about boosting results based on fulltext matches too.
    • The poor man’s bar graphs in the pictures are expressing a hand-rolled logarithmic scale for the number of invocations (left bar) and number of btree pages accessed (right bar).  On the right side of each line the actual numbers are also presented in the same order.  The goal is not to convey good/bad so much as to draw the eye to hot spots.

    Notes for people who want more to read:

    • SQLite has built-in infrastructure to track the number of times an opcode is executed as well as the wall-clock time used; you do not have to use systemtap.  It’s a conditional compilation kind of thing, just -DVDBE_PROFILE and every statement you execute gets its performance data appended to vdbe_profile.out when you reset the statement.  It doesn’t do the btree pages consulted trick, but it’s obviously within its power with some code changes.
    • Our use of systemtap is primarily a question of discretionary reporting control, the ability to integrate the analysis across other abstraction layers, and various build concerns.  The JSON output is not a driving concern, just something nice we are able to do for ourselves since we are formatting the output.
    • The tool has really crossed over into the super-win-payoff category with this fix investigation.  (Having said that, I probably could have skipped some of the data-flow stuff last time around.  But these projects are both for personal interest and amusement as well as practical benefit, so what are you gonna do?  Also, that effort could pay off a bit more by propagating comments along register uses so that the LIMIT counter register r8 and the compute-once r7 in the after diagram would not require human thought.)

    References:

    • The grokexplain repo.  Used like so: python grokexplain.py --vdbe-stats=/tmp/glodaNewSearchPerf.json /tmp/glodaNewSearchExplained.json -o /tmp/glodasearch-newcheck
    • The systemtap script in its repo.  Used like so: sudo stap -DMAXSTRINGLEN=1024 sqlite-perf.stp /path/to/thunderbird-objdir/mozilla/dist/lib/libsqlite3.so > /tmp/glodaNewSearchPerf.json
    • The bug with the gloda explain logic patch and the improved SQL query logic.  I also used the tooling to fix another (now-fixed) bug, but that problem turned out to be less interesting.