Skip to content

fighting non-deterministic xpcshell unit tests through causality tracking with systemtap; step 1

It’s a story as old as time itself.  You write a unit test.  It works for you.  But the evil spirits in the tinderboxes cause the test to fail.  The trick is knowing that evil spirits are a lot like genies.  When you ask for a wish, genies will try and screw you over by choosing ridiculous values wherever you forgot to constrain things.  For example, I got my dream job, but now I live in Canada.  The tinderbox evil spirits do the same thing, but usually by causing pathological thread scheduling.

This happened to me the other day and a number of our other tests are running slower than they should, so I decided it was time for another round of incremental amortized tool-building.  Building on my previous systemtap applied to mozilla adventures by porting things to the new JS representations in mozilla-2.0 and adding some new trace events I can now generate traces that:

  • Know when the event loop is processing an event.  Because we reconstruct a nested tree from our trace information and we have a number of other probes, we can also attribute the event to higher-level concepts like timer callbacks.
  • Know when a new runnable is scheduled for execution on a thread or a new timer firing is scheduled, etc.  To help understand why this happened we emit a JS backtrace at that point.  (We could also get a native backtrace cheaply, or even a unified backtrace with some legwork.)
  • Know when high-level events occur during the execution of the unit test.  We hook the dump() implementations in the relevant contexts (xpcshell, JS components/modules, sandboxes) and then we can listen in on all the juicy secrets the test framework shouts into the wind.  What is notable about this choice of probe point is that it:
    • is low frequency, at least if you are reasonably sane about your output.
    • provides a useful correlation between what it is going on under the hood with something that makes sense to the developer.
    • does not cause the JS engine to need to avoid tracing or start logging everything that ever happens.

Because we know when the runnables are created (including what runnable they live inside of) and when they run, we are able to build what I call a causality graph because it sounds cool.  Right now this takes the form of a hierarchical graph.  Branches form when a runnable (or the top-level) schedules more than one runnable during the execution of a single runnable.  The dream is to semi-automatically (heuristics / human annotations may be required) transform the hierarchical graph into one that merges these branches back into a single branch when appropriate.  Collapsing linear runs into a single graph node is also desired, but easy.  Such fanciness may not actually be required to fix test non-determinism, of course.

The protovis-based visualization above has the following exciting bullet points describing it:

  • Time flows vertically downwards.  Time in this case is defined by a globally sequential counter incremented for each trace event.  Time used to be the number of nanoseconds since the start of the run, but there seemed to somehow be clock skew between my various processor cores that live in a single chip.
  • Horizontally, threads are spaced out and within those threads the type of events are spaced out.
    • The weird nested alpha grey levels convey nesting of JS_ExecuteScript calls which indicates both xpcshell file loads and JS component/module top-level executions as a result of initial import.  If there is enough vertical space, labels are shown, otherwise they are collapsed.
  • Those sweet horizontal bands convey the various phases of operation and have helpful labels.
  • The nodes are either events caused by top-level runnable being executed by the event loop or important events that merit the creation of synthetic nodes in the causal graph.  For example, we promote the execution of a JS file to its own link so we can more clearly see when a file caused something to happen.  Likewise, we generate new links when analysis of dump() output tells us a test started or stopped.
  • The blue edges are expressing the primary causal chain as determined by the dump() analysis logic.  If you are telling us a test started/ended, it only follows that you are on the primary causal chain.
  • If you were viewing it in a web browser, you could click on the nodes and it would console.log them and then you could see what is actually happening in there.  If you hovered over nodes they would also highlight their ancestors and descendents in various loud shades of red.

  • The specific execution being visualized had a lot of asynchronous mozStorage stuff going on.  (The right busy thread is the asynchronous thread for the database.)  The second image has the main init chain hovered, resulting in all the async fallout from the initialization process being highlighted in red.  At first glance it’s rather concerning that the initialization process is still going on inside the first real test.  Thanks to the strict ordering of event queues, we can see that everything that happens on the primary causal chain after it hops over to the async thread and back is safe because of that ordering.  The question is whether bad things could happen prior to that joining.  The answer?  In another blog post, I fear.
    • Or “probably”, given that the test has a known potential intermittent failure.  (The test is in dangerous waters because it is re-triggering synchronous database code that normally is only invoked during the startup phase before references are available to the code and before any asynchronous dispatches occur.  All the other tests in the directory are able to play by the rules and so all of their logic should be well-ordered, although we might still expect the initialization logic to likewise complete in what is technically a test phase.  Since the goal is just to make tests deterministic (as long as we do not sacrifice realism), the simplest solution may just be to force the init phase to complete before allowing the tests to start.  The gloda support logic already has an unused hook capable of accomplishing this that I may have forgotten to hook up…

Repo is my usual systemtapping repo.  Things you might type if you wanted to use the tools follow:

  • make SOLO_FILE=test_name.js EXTRA_TEST_ARGS=”–debugger /path/to/tb-test-help/systemtap/ –debugger-args ‘/path/to/tb-test-help/systemtap/mozperfish/mozperfish.stp –‘” check-one
  • python /path/to/tb-test-help/ –re-run=/tmp/chewtap-##### mozperfish/mozperfish.stp /path/to/mozilla/dist/bin/xpcshell

The two major gotchas to be aware of are that you need to: a) make your xpcshell build with jemalloc since I left some jemalloc specific memory probes in there, and b) you may need to run the first command several times because systemtap has some seriously non-deterministic dwarf logic going on right now where it claims that it doesn’t believe that certain types are actually unions/structs/whatnot.

Tagged ,

non-infuriating indentation with emacs and js2-mode with require.def asynchronous module definition CommonJS boilerplate

Classic CommonJS modules assume a synchronous execution environment (for the purposes of “require”) with a specialized loader mechanism that evaluates the module in its proper context and takes care of namespacing it.  If you want to use CommonJS modules in the browser you can either:

  • Leave the source code as it is and use an XHR-based loader that uses eval to perform the namespacing trick.  In order to deal with the synchronous require assumption you can use some combination of deferring the evaluation of the module until you think you have all the dependencies and synchronous XHR.  Commonly, regular expressions are used to figure out the dependencies, but one could also use some form of static analysis.  Examples of browser-based CommonJS loaders supporting this are teleport and yabble.
  • Wrap your source code in boilerplate that takes care of the namespacing.  This can be done via a build system or done permanently in the source.  Pretty much every browser-based CommonJS loader supports this, with RequireJS being the only one I’m going to name-check because there are too many of these suckers as is.

The synchronous idiom for module “foo” might look like this:

var bar = require("bar");
var baz = require("baz");
exports.doStuff = function() {
  return "awwww yeah.";

The asynchronous module definition for “foo” might look like this, noting that there are actually a couple of possible variations on this:

require.def("foo", ["exports", "bar", "baz"], function(exports, bar, baz) {
exports.doStuff = function() {
  return "awwww yeah.";

The thing that may jump out at you is that the asynchronous wrapping means that the body of our module actually lives inside a function definition within the argument list of a function call.  Let’s assume you enjoy the finer things in life and are using emacs and js2-mode for your javascript editing.  js2-mode will helpfully suggest indenting 14 characters because that puts us 2 characters in from the enclosing function call’s opening paren.

That indentation could drive a man crazy and was really my only reason for avoiding the asynchronous idiom.  Thankfully, emacs being what it is, I was able to make it do what I roughly what I want:

;; Check if the suggested indentation is due to require.def().  If it is, force
;;  the indentation down to zero.  We detect this case by checking whether the
;;  parse depth is 2 and the last top-level point was preceded by require.def.
(defun require-def-deindent (list index)
  (when (and (eq (nth 0 parse-status) 2)
               (let ((tl-point (syntax-ppss-toplevel-pos parse-status)))
                 (goto-char tl-point)
                 (backward-word 2)
                 (equal "require.def" (buffer-substring (point) tl-point))))
             ;; only intercede if they are suggesting what the sexprs suggest
             (let ((suggested-column (js-proper-indentation parse-status)))
               (eq (nth index list) suggested-column))
    (indent-line-to 0)
;; Uncomment the following to enable the hook if you want tab to always slam you
;;  to column 0 rather than doing the cycle thing.  (With the newline hook in
;;  place, I haven't seen the need yet.)
;(add-hook 'js2-indent-hook 'require-def-deindent)
;; Unfortunately, js2-enter-key turns off the bounce indent logic so we need to
;;  intentionally do something to get our helper invoked.  In this case, we use
;;  advice but we could also mess with the keybinding.
;; This assumes js2-enter-indents-newline is enabled / desired.
(defadvice js2-enter-key (around js2-enter-key-around)
  "Trigger require-def-deindent on enter for the newline."
  (let ((parse-status (save-excursion
                        (parse-partial-sexp (point-min) (point-at-bol))))
    (push (current-column) positions)
    (require-def-deindent positions 0)))
(ad-activate 'js2-enter-key)

If you paste the above into your .emacs and have sufficient emacs karma, hopefully the above will work for you too.

UPDATE (2011/1/1):

The AMD idiom has settled on using “define” instead of “require.def”, so here is the above code modified to this end:

;; --- CommonJS AMD define() compensation
;; Check if the suggested indentation is due to define().  If it is, force
;;  the indentation down to zero.  We detect this case by checking whether the
;;  parse depth is 2 and the last top-level point was preceded by define.
(defun require-def-deindent (list index)
  (when (and (eq (nth 0 parse-status) 2)
               (let ((tl-point (syntax-ppss-toplevel-pos parse-status)))
                 (goto-char tl-point)
                 (backward-word 1)
                 (equal "define" (buffer-substring (point) tl-point))))
             ;; only intercede if they are suggesting what the sexprs suggest
             (let ((suggested-column (js-proper-indentation parse-status)))
               (eq (nth index list) suggested-column))
    (indent-line-to 0)
;; Uncomment the following to enable the hook if you want tab to always slam you
;;  to column 0 rather than doing the cycle thing.  (With the newline hook in
;;  place, I haven't seen the need yet.)
;(add-hook 'js2-indent-hook 'require-def-deindent)
;; Unfortunately, js2-enter-key turns off the bounce indent logic so we need to
;;  intentionally do something to get our helper invoked.  In this case, we use
;;  advice but we could also mess with the keybinding.
;; This assumes js2-enter-indents-newline is enabled / desired.
(defadvice js2-enter-key (around js2-enter-key-around)
  "Trigger require-def-deindent on enter for the newline."
  (let ((parse-status (save-excursion
                        (parse-partial-sexp (point-min) (point-at-bol))))
    (push (current-column) positions)
    (require-def-deindent positions 0)))
(ad-activate 'js2-enter-key)
;; (end define compensation)
Tagged , ,

Documentation for complex things (you don’t basically already understand)

The Problem

One of my focuses at MoMo is to improve the plight of Thunderbird extension developers.  An important aspect of this is improving the platform they are exposed to.  Any platform usually entails a fair amount of complexity.  The trick is that you only pay for the things that are new-to-you, learning-wise.

The ‘web’ as a platform is not particularly simple; it’s got a lot of pieces, some of which are fantastically complex (ex: layout engines).  But those bits are frequently orthogonal, can be learned incrementally, have reams of available documentation, extensive tools that can aid in understanding, and, most importantly, are already reasonably well known to a comparatively large population.  The further you get from the web-become-platform, the more new things you need to learn and the more hand-holding you need if you’re not going to just read the source or trial-and-error your way through.  (Not that those are bad ways to roll; but not a lot of people make it all the way through those gauntlets.)

I am working to make Thunderbird more extensible in more than a replace-some-function/widget-and-hope-no-other-extensions-had-similar-ideas sort of way.  I am also working to make Thunderbird and its extensions more scalable and performant without requiring a lot of engineering work on the part of every extension.  This entails new mini-platforms and non-trivial new things to learn.

There is, of course, no point in building a spaceship if no one is going to fly it into space and fight space pirates.  Which is to say, the training program for astronauts with all its sword-fighting lessons is just as important as the spaceship, and just buying them each a copy of “sword-fighting for dummies who live in the future” won’t cut it.

Translating this into modern-day pre-space-pirate terminology, it would be dumb to make a super fancy extension API if no one uses it.  And given that the platform is far enough from pure-web and universally familiar subject domains, a lot of hand-holding is in order.  Since there is no pre-existing development community familiar with the framework, they can’t practically be human hands either.

The Requirements

I assert the following things are therefore important for the documentation to be able to do:

  • Start with an explained, working example.
  • Let the student modify the example with as little work on their part as possible so that they can improve their mental model of how things actually work.
  • Link to other relevant documentation that explains what is going on, especially reference/API docs, without the user having to open a browser window and manually go search/cross-reference things for themselves.
  • Let the student convert the modified example into something they can then use as the basis for an extension.

The In-Process Solution: Narscribblus

So, I honestly was quite willing to settle for an existing solution that was anywhere close to what I needed.  Specifically, the ability to automatically deep-link source code to the most appropriate documentation for the bits on hand.  It has become quite common to have JS APIs that take an object where you can have a non-trivial number of keys with very specific semantics, and my new developer friendly(-ish) APIs are no exception.

Unfortunately, most existing JavaScript documentation tools are based on the doxygen/JavaDoc model of markup that:

  • Was built for static languages where your types need to be defined.  You can then document each component of the type by hanging doc-blocks off them.  In contrast, in JS if you have a complex Object/dictionary argument that you want to hang stuff of of, your best bet may be to just create a dummy object/type for documentation purposes.  JSDoc and friends do support a somewhat enriched syntax  like “@param arg.attr”, but run into the fact that the syntax…
  • Is basically ad-hoc with limited extensibility.  I’m not talking about the ability to add additional doctags or declare specific regions of markup that should be passed through a plugin, which is pretty common.  In this case, I mean that it is very easy to hit a wall in the markup language that you can’t resolve without making an end-run around the existing markup language entirely.  As per the previous bullet point, if you want to nest rich type definitions, you can quickly run out of road.

The net result is that it’s hard to even describe the data types you are working with, let alone have tools that are able to infer links into their nested structure.

So what is my solution?

  • Steal as much as possible from Racket (formerly PLT Scheme)’s documentation tool, Scribble.  To get a quick understanding of the brilliance of Racket and Scribble, check out the quick introduction to racket.  For those of you who don’t click through, you are missing out on examples that automatically hyperlink to the documentation for invoked methods, plus pictures capturing the results of the output in the document.
    • We steal the syntax insofar as it is possible without implementing a scheme interpreter.  The syntax amounts to @command[s-expression stuff where whitespace does not matter]{text stuff which can have more @command stuff in it and whitespace generally does matter}.  The brilliance is that everything is executed and there are no heuristics you need to guess at and that fall down.
    • Our limitation is that while Racket is a prefix language and can use reader macros and have the entire documents be processed in the same fashion as source code and totally understood by the text editor, such purity is somewhat beyond us.  But we do what we can.
  • Use narcissus, Brendan Eich/mozilla’s JS meta-circular interpreter thing, to handle parsing JavaScript.  Although we don’t have reader macros, we play at having them.  If you’ve ever tried to parse JavaScript, you know it’s a nightmare that requires the lexer to be aware of the parsing state thanks to the regexp syntax.  So in order for us to be able to parse JavaScript inline without depending on weird escaping syntaxes, when parsing our documents we use narcissus to make sure that we parse JavaScript as JavaScript; we just break out when we hit our closing squiggly brace.  No getting tricked by regular expressions, comments, etc.
  • Use the abstract interpreter logic from Patrick Walton‘s jsctags (and actually stole its CommonJS-ified narcissus as the basis for our hacked up one too) as the basis for abstract interpretation to facilitate being able to linkify all our JavaScript code.  The full narcissus stack is basically:
    • The narcissus lexer has been modified to optionally generate a log of all tokens it generates for the lowest level of syntax highlighting.
    • The narcissus parser has been modified to, when generating a token log, link syntax tokens to their AST parse nodes.
    • The abstract interpreter logic has been modified to annotate parse nodes with semantic links so that we can traverse the tokens to be able to say “hey, this is attribute ‘foo’ in an object that is argument index 1 of an invocation of function ‘bar'” where we were able to resolve bar to a documented node somewhere.  (We also can infer some object/class organization as a result of the limited abstract interpretation.)
    • We do not use any of the fancy static analysis stuff that is going on as of late with the DoctorJS stuff.  Automated stuff is sweet and would be nice to hook in, but the goal here is friendly documentation.
    • The abstract interpreter has been given an implementation of CommonJS require that causes it to load other source documents and recursively process them (including annotating documentation blocks onto them.)
  • We use bespin as the text editor to let you interactively edit code and then see the changes.  Unfortunately, I did not hook bespin up to the syntaxy magic we get when we are not using bespin.  I punted because of CommonJS loader snafus.  I did, however, make the ‘apply changes’ button use narcissus to syntax check things (with surprisingly useful error messages in some cases).

Extra brief nutshell facts:

  • It’s all CommonJS code.  The web enabled version which I link to above and below runs using a slightly modified version of Gozala‘s teleport loader.  It can also run privileged under Jetpack, but there are a few unimplemented sharp edges relating to Jetpack loader semantics.  (Teleport has been modified mainly to play more like jetpack, especially in cases where its aggressive regexes go looking for jetpack libs that aren’t needed on the web.)  For mindshare reasons, I will probably migrate off of teleport for web loading and may consider adding some degree of node.js support.  The interactive functionality currently reaches directly into the DOM, so some minor changes would be required for the server model, but that was all anticipated.  (And the non-interactive “manual” language already outputs plain HTML documents.)
  • The web version uses a loader to generate the page which gets displayed in an iframe inside the page.  The jetpack version generates a page and then does horrible things using Atul‘s custom-protocol mechanism to get the page displayed but defying normal browser navigation; it should either move to an encapsulated loader or implement a proper custom protocol.

Anywho, there is still a lot of work that can and will be done (more ‘can’ than ‘will’), but I think I’ve got all the big rocks taken care of and things aren’t just blue sky dreams, so I figure I’d give a brief intro for those who are interested.

Feel free to check out the live example interactive tutorialish thing linked to in some of the images, and its syntax highlighted source.  Keep in mind that lots of inefficient XHRs currently happen, so it could take a few seconds for things to happen.  The type hierarchy emission and styling still likely has a number of issues including potential failures to popup on clicks.  (Oh, and you need to click on the source of the popup to toggle it…)

Here’s a bonus example to look at too, keeping in mind that the first few blocks using the elided js magic have not yet been wrapped in something that provides them with the semantic context to do magic linking.  And the narscribblus repo is here.

Tagged , , , ,

test-case-mode support for jetpack unit tests in emacs

Use Jetpack?  Occasionally write unit tests so you won’t be a complete hypocrite when criticizing other people’s code?  Think that picture up above looks more useful than this?:

error: fail: list contents ("4,5,6,7" != "3,4,5,6,7")
info: Traceback (most recent call last):
  File "resource://wmsy-jetpack-core-lib/timer.js", line 57, in notify
    this._callback.apply(null, []);
  File "resource://wmsy-jetpack-core-lib/unit-test.js", line 257, in anonymous
    timer.setTimeout(function() { onDone(self); }, 0);
  File "resource://wmsy-jetpack-core-lib/unit-test.js", line 282, in runNextTest
    self.start({test: test, onDone: runNextTest});
  File "resource://wmsy-jetpack-core-lib/unit-test.js", line 300, in start
  File "resource://wmsy-jetpack-core-lib/unit-test.js", line 57, in runTest
  File "resource://wmsy-wmsy-tests/test-vs-static.js", line 37, in anonymous, 2, 2);
  File "resource://wmsy-wmsy-lib/wmsy/viewslice-static.js", line 51, in anonymous
    this._list.slice(this.bufLow, this.bufHigh));
  File "resource://wmsy-wmsy-tests/test-vs-static.js", line 31, in anonymous
    "list contents");
  File "resource://wmsy-jetpack-core-lib/unit-test.js", line 229, in assertEqual;
  File "resource://wmsy-jetpack-core-lib/unit-test.js", line 147, in fail

Then do I have an .el for you!  This lives here.  You mileage may vary and may involve things catching on fire which can, in turn, affect your mileage.

Tagged , ,

( CouchDB services offline)

This means doccelerator and my Tinderbox scraper that chucked stuff into a CouchDB for exposure by a modified bugzilla jetpack.  Either couch went crazy or someone gave it a request that is ridiculously expensive to answer with a database the accumulated size of the tinderbox database.  I’m not aware of any trivial ways to contend with the latter.  Judging by the logs, it looks like 2 people other than myself used these services, so, my apologies to those cool, insightful, forward-looking individuals.

The specific services are probably not coming back.  doccelerator is being subsumed into something else that better meets documentation needs and will be more fully baked, more on that soon.  I got the impression at the summit that the tinderbox problem is in hand, or very close to someone’s hand; maybe one of those robot grabby-arm things is involved?  My reviewboard with bugzilla hacks instance is sticking around for the time being, but I think wheels are turning elsewhere in Mozilla on that front too, so hopefully my install is mooted before it falls over.

Lest there be any doubt, all clicky services are provided on a self-interested basis… I am happy when they benefit others, but I do these things for the benefit of my own productivity/sanity and they are hosted using my own resources.  (While I had higher hopes for doccelerator, the MoMo-resourced couchdb service provisioning never happened so doccelerator never got pushed public with nightly updates because of said clicky resource constraints.)

Tagged , , ,

ediosk: an emacs buffer switcher for the rest of us

Emacs users and would-be emacs users, are you tired of those emacs developers in their ivory towers not supporting buffer switching via touch-screen on a computer that’s not running emacs and using modern web browser technology instead of disproven parentheses-based technology?  Be tired no more!*

Thanks to Christopher Wellons and Chunye Wang’s work on emacs-httpd it is a simple matter to expose a JSON representation of the current set of frames/windows/buffers in your emacs session and provide non-REST manipulation mechanisms via a webserver implemented in elisp.

Once you have exposed an API, it is a subsequent simple matter to implement some JavaScript that understands these things and presents a nice UI.  In this case, we have used the Jetpack SDK, wmsy (the Widget Manifesting SYstem, an widgeting framework I am developing), and protovis.

The screenshot basically captures the entire feature-set:

  • A protovis-based visualization that shows the location of all of the emacs “windows” (the things that show buffers).  Emacs reports to us the pixel-space coordinates/sizes of the “frames” (GUI windows) and “windows”, so this all comes magically for free.  The downside is your emacs windows need to be in the same coordinate space, so use of multiple X displays without use of DMX will likely lead to weird overlap.
    • The selected “window” in each “frame” gets a diamond.  The active frame’s diamond gets to be black instead of gray.
    • Clicking on a “window” focuses/raises the containing “frame” and selects the “window”.
  • Buffers grouped by the directory they live in (if they have a backing file).
    • Buffers visible in windows have their background composed of the colors for all the windows they are in.
    • Buffers that are modified have their text colored red.
    • Buffers that have not been freshly displayed in a window recently have their text colored grey.
    • Clicking on a buffer displays it in what the UI believes to be the currently selected frame’s currently selected window.

* This entire paragraph is a joke**; no flaming necessary.

** ediosk is not a joke though.  I seriously have a touch-screen monitor hooked up to my windows build machine to the right of my two monitors hooked up to my linux/primary development machine.  While c-x b (icicle mode) will still be my dominant buffer switching mechanism, I expect ediosk to prove useful/amusing for cases where the number of buffers greatly exceeds my mental stack, when I am switching contexts, or when I am working in multiple branches simultaneously.

Tagged , , ,

understanding where layout goes wrong with gecko reflow debug logs (Part 1)

HTML and CSS provide awesome layout potential, but harnessing that potential to do your bidding can sometimes be problematic.  I suspect all people who have written HTML documents have been in a situation where they have randomly permuted the document structure and CSS of something that should work in the hopes of evolving it into something that actually does work.  Matching wits with a black box that is invulnerable to even pirate-grade profanity is generally not a pleasant experience.

It turns out gecko has a way to let you see inside that black box.  Well, actually, multiple ways.  There’s a layout debugger (that seems broken on trunk?) that can display visual overlays for box sizes/events/reflow counts, dump info about paints/invalidations/events as they happen as well as dumping the current content tree and frames.  Even better, gecko’s frame reflow debugging mechanism will dump most of the inputs and outputs of each stage of reflow calculations as they happen.  With some comparatively minor patches[1] we can augment this information so that we can isolate reflow decisions to their origin presentation shell/associated URL and so that we know the tag name, element id, and class information on HTML nodes subjected to reflow calculations.  A reasonably sane person would want to do this if they were planning to be doing a lot of potentially complicated HTML layout work and would a) benefit from better understanding how layout actually works, b) not want to propagate layout cargo culting or its ilk from the code being replaced, c) not want to waste days of their lives later the next time this happens, d) help locate and fix layout bugs if bugs they be so that all might benefit.

Of course, with logs that effectively amount to execution traces, examining them by hand is frequently intractable unless you really know what you’re looking for or are dealing with a toy example.  My non-reduced problem case resulted in 58,107 lines, for one.  So writing a tool is a good idea, and writing it in JS using Jetpack doubly so.

In any event, the problem is I am trying to use the flexible box model to create an area of the screen that takes up as much space as possible.  In this space I want to be able to house a virtual scrolling widget so I use “overflow: hidden”.  Regrettably, when my logic goes to populate the div, the box ends up resizing itself and now the whole page wants to scroll.  Very sad.  (Things work out okay with an explicitly sized box which is why my unit tests for the virtual scrolling widget pass…)

Let’s formulate a query on the div of interest (which I will conceal) and then see what the first little bit of output is:

*** Box 24 tag:div id: classes:default-bugzilla-ui-bug-page-runBox
*** scroll 25 tag:div id: classes:default-bugzilla-ui-bug-page-runs
scroll 25 variant 1  (parent 24) first line 406
  why: GetPrefSize
  inputs: boxAvail: 0,UC
          boxLast: 0,0
          reflowAvailable: 0,UC
          reflowComputed: 0,UC
          reflowExtra: dirty v-resize
  output: prefWidth: 0
          minWidth: 0
          reflowDims: 0,0
          prefSize: 0,0
          minSize: 0,0
          maxSize: UC,UC
  parent concluded: minSize: 0,0
                    maxSize: UC,UC
                    prefSize: 2,0
scroll 25 variant 2  (parent 24) first line 406
  why: Layout
  inputs: boxAvail: 771,1684
          boxLast: 0,0
          reflowAvailable: 771,UC
          reflowComputed: 771,1684
          reflowExtra: dirty dirty-children h-resize v-resize
  output: prefSize: 0,0
          minSize: 0,0
          maxSize: UC,UC
          reflowDims: 771,1684
          layout: 2,0,771,1684
  parent concluded: minSize: 0,0
                    maxSize: UC,UC
                    prefSize: 0,0
                    layout: 0,0,773,1684

This is the general pattern we will see to the reflows.  The parent will ask it what size it wants to be and it will usually  respond with “ridiculously wide but not so tall”.  (Not in this first base case, but the next one responds with a prefsize of “1960,449”, and that’s pixels.) The parent will then perform layout and say “no, you need to be taller than you want to be”, at least until I start cramming stuff in there.

So we skim down the output to find out where things first went off the rails…

scroll 25 variant 16  (parent 24) first line 20548
  why: GetPrefSize
  inputs: boxAvail: 1960,UC
          boxLast: 771,1686
          reflowAvailable: 1960,UC
          reflowComputed: 1960,UC
          reflowExtra: dirty-children h-resize v-resize
  output: prefWidth: 1960
          minWidth: 352
          reflowDims: 1960,1755
          prefSize: 1960,1755
          minSize: 352,1755
          maxSize: UC,UC
  parent concluded: minSize: 0,0
                    maxSize: UC,UC
                    prefSize: 1962,1755
scroll 25 variant 17  (parent 24) first line 20548
  why: Layout
  inputs: boxAvail: 771,1755
          boxLast: 1960,1755
          reflowAvailable: 771,UC
          reflowComputed: 771,1755
          reflowExtra: dirty-children h-resize
  output: prefSize: 0,0
          minSize: 352,1755
          maxSize: UC,UC
          reflowDims: 771,1755
          layout: 2,0,771,1755
  parent concluded: minSize: 0,0
                    maxSize: UC,UC
                    prefSize: 0,0
                    layout: 0,0,773,1755

Okay, that looks pretty likely to be the area of concern.  The parent asked it for its ideal size, so it told it, but then the parent apparently decided to enlarge itself too.  That is not what we wanted.  We would have been cool if just the scroll #25 enlarged itself (or its block child #26 that corresponds to the same content node but which I have elided because it always says the same thing as its parent #25) since some frame needs to end up holding the overflow coordinate space.

Thus concludes part 1 of our exciting saga.  In part 2, we hopefully figure out what the problem is and how to fix it.  Lest anyone suggest the root problem is that I am completely off base and am not remotely reasonably sane for choosing this as a strategy to solve the problem… it works in chrome.  Which is not to say that my html/css is correct and firefox’s layout is wrong; it’s quite possible for layout engines to err or deal with unspecified behaviour cases in my favor, after all.  But it does make me want to understand what the layout engine is thinking and be able to do so with a minimum of effort in the future, since I doubt this is the last time I will not immediately understand the problem or that layout engines will differ in their behaviour.

For those who want to play along at home: the raw gzip’d /tmp/framedebug file (gunzip to /tmp/framedebug) that is the entirety of the trunk firefox log with debug output, the spliced output just for the one window derived from an invocation of “cfx run splice” (it will show up under the URL in /tmp/framedumps), and the output of the output of “cfx run summarize serial-12-0 summarize unique 22,24,25,26.  Those unique identifiers are deterministic but arbitrary values for the given file.  We discovered them by using the query on the CSS class using “cfx run summarize serial-12-0 summarize class wlib-wlib-virt-wlib-virt-container-root”.  The hg repo for the processing tool is here, the mozilla-central patches are: first and second.  A minor jetpack patch is also required for the command line stuff to work.

1: I was initially trying to avoid patching anything.  This didn’t work out, but it did cause the initial log file splicing logic to leverage the arena allocation scheme of presentation shells to allow us to to map frames back to their URLs.  Sadly, it turned out the arena allocation blocks requested from the upstream allocators are really small (4k or 1k) and all from the same source and so I had to instrument the allocation as well as adding debug output of the window/docshell/presshell linkages.  The output adds an unacceptable additional level of noise to the default DEBUG case; the right thing to do is likely to cause the reflow log debugging to emit the document URL before each logging outburst if it is different from the last outburst.

Tagged ,

Thunderbird Quick Filter Bar extensions, they’re a thing!

The previously discussed Quick Filter Bar interface landed in time for Thunderbird (Lanikai) 3.1 beta 2 (whose release is real-soon-now).  Although the Quick Filter Bar already contains dangerously high levels of awesome, we made sure to make it extensible so you can cram even more awesome in.

As an example, I have created an extension that enables you to ‘pivot’ based on the currently selected message(s).

In its most straightforward manner of operation, you can click on an e-mail address in the message reader header and pivot by that e-mail address.  Out of the box, this will show you all the messages in the current folder sent by that user.  You can also (or only) show messages where that user is on the to/cc lines by toggling the buttons on the expando bar appropriately.

You can also constrain the display to only show messages within some time interval of the message(s) in question.

The more confusing way to access the pivot functionality is to simply toggle the facet on the quick filter bar.  When you toggle the filter on, we walk all of the selected messages and build up a list of the observed e-mail addresses for all of the senders and all of the recipients.  One trick is that we filter out any e-mail addresses associated with your own account in order to avoid the filter becoming too inclusive.  We save those e-mail addresses into our state.  We also walk all the messages and find the time range that the messages cover and save that.  These are used as the basis for the specific constraints you can enable from the expando bar.  Because the values are saved, the filter does not change when you change your selected messages.  You need to toggle the pivot filter off and on again (or use the ‘pivot’ option on the email address popup) to change the data we are pivoting on.

The extension can be found on AMO and in source form.  In a forward-looking move that is probably complicating things for now, I used the Jetpack SDK and its XUL extension support to implement the extension.  In practice, all the example uses Jetpack for is its CommonJS module system and the standard mozilla-style JS modules would have been sufficient.  All the UI stuff is done purely in XUL overlays bound to the backing code using addEventListener.

The broad strokes for implementing a Quick Filter Bar extension are:

  1. Overlay your toolbarbutton into the quick filter bar, preferably the #quick-filter-bar-collapsible-buttons box.  If you have an expando bar you want to contribute, you probably want to put that into the #quick-filter-bar-expando.  As things currently stand, multiple expando bars being uncollapsed at the same time will probably turn out poorly, so you may need to be pragmatic and put your bar outside that box or do something clever.
  2. Import resource:///modules/quickFilterManager.js, implement your filter, and register it with QuickFilterManager.defineFilter() (whose documentation explains the interface you need to conform to).

We will likely not be doing up full documentation for this extension point beyond the in-code documentation and in-tree/extension examples.  This is an advanced extension point by virtue of it touching nsIMsgDBHdrs, nsIMsgDBViews, nsIMsgSearchSessions, and friends, all of which have sharp edges.  The way forward with soft, human-friendly documented abstractions is gloda and that’s where (my) effort will be concentrated.  The awesome in this extension point is making it practical for people who already have the scars to utilize the full power of the message view/traditional search subsystem without losing a limb and do so in a previously impossibly short period of time.

PS: I made the icon!

Tagged , ,

work-in-progress tooling: feeding GWT SpeedTracer systemtapped mozilla performance data

Now that Thunderbird 3.1 is string/feature-frozen it’s time to focus on performance.  This post is just to make sure that people who are also working on similar things know what I’m up to.  For example, there’s some very exciting work going on involving adding a startup timeline to mozilla-central that everyone keeping up-to-date with mozilla performance work should be aware of.  In summary, no results yet, but soon!

The big bullet points for now are:

  • GWT SpeedTracer is a very impressive tool.
  • I have seriously cleaned up / automated my previous SystemTap script work for mozilla.  Invoking python speedtracer/mozspeedtrace.stp `pgrep thunderbird-bin` does the following:
    • generates a version of the source script with any line statements fixed up based on one or more line seeks in a file.
    • builds the command line and invokes the system tap script.
    • invokes a post-processing script specified in the tap file, feeding it address-translation helpers and systemtap bulk logging per-cpu fusion.
  • mozspeedtrace.stp and its post-processor produce an HTML file with embedded JSON that the SpeedTracer Chrome extension recognizes as one of its own.  (SpeedTracer runs in Firefox through the GWT development mode, but being very new to GWT I am not yet clear on if/how to be able to wrap it up to be used as a webapp from static data.)
  • My mapping of events as recorded by my SystemTap probes to SpeedTracer events is somewhat sketchy, but part of that is due to the limited set of events and their preconfigured display output.  (I believe it turns out I can generate optional data like fake stack traces to be able to expand the set of mapped events without having to modify SpeedTracer.)  This is way basically everything in the screenshot is a yellow ‘JavaScript Callback’.
  • I have not implemented probes for all of the event types defined by SpeedTracer, I have not ported all of my previous probes yet, and there are still some new probes yet to add.  Happily, most of the hard work is already in the can.
  • The motivation behind this is very different from the startup timeline.  Thunderbird uses mozilla-1.9.2, not mozilla-central, and my primary investigatory focus is memory usage in steady-state after start-up, not the time it takes to start up.  (We do, of course, want to improve performance where possible.)  It just so happens that in order to usefully explain who is allocating memory we also have to know who is active at the time / why they are active, and that provides us with the performance timeline data.
  • This work is not likely to be useful for per-tab/webpage performance data gathering… just whole-program performance investigation like Thunderbird needs.
  • This is a work-in-progress, but the focus is on improving Thunderbird’s performance issues, not building tooling.  As such, I may have to bail on SpeedTracer in favor of command-line heuristic analysis if I can’t readily modify SpeedTracer to suit my needs.  I picked SpeedTracer partially to (briefly) investigate GWT, but from my exploration of the code thus far, the activation energy required may be too great.
Tagged , ,

a systemtap script to attribute JS string memory costs to the creating JS functions

----- js_GC data pages: 96731 total vm pages: 234849
   ###      KiB
 11376      839 BF_format                                        log4moz.j:487
 15906      725 mime_emitter_writeBody                           jsmimeemitter.j:381
   148      576 MimeMessage_coerceBodyToPlaintext                mimemsg.j:341
  4460      492 gloda_ds_queryFromQuery                          datastore.j:2988
  9655      387 <MYSTERIOUS>                                     <BEYOND THE VEIL>:0
   911      237 gloda_content_quoted                             connotent.j:267
  1723      144 gloda_indexMessage                               index_msg.j:2597
   187      141 gloda_content_content                            connotent.j:290
   170      108 gloda_content_getContent                         connotent.j:166
  1917       73 gloda_identity_toString                          datamodel.j:785
  1775       69 Log4Moz_enumerateProps                           log4moz.j:220
value |-------------------------------------------------- count
    0 |@                                                    366
    1 |@@@@                                                1371
    2 |@@@@@                                               1584
    4 |@@@@@@@@@@@@@@@@@@@@@@@@                            6863
    8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@          11784
   16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@      12794
   32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  14016
   64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                      8662
  128 |@@@@@@@@@@@@@@                                      4093
  256 |@@@@@                                               1567
  512 |@                                                    309
 1024 |                                                     105
 2048 |                                                      58
 4096 |                                                      28
 8192 |                                                      17
16384 |                                                      12
32768 |                                                       0
65536 |                                                       0

The script.  It tells you what JS code has been creating strings since the last GC and the overall distribution of the marginal costs of those strings, leaving out puny code that allocates less than 64k’s worth.  Marginal costs because spidermonkey has support for dependent strings.  “a += b” will actually result in the storage of a being realloc’d with a pointing into the storage of a’.  I forgot to check for the js_ConcatStrings case at first and, let me tell you, the stats were not pretty.  Poor mime_emitter_writeBody was accused of allocating 11 gigabytes of strings in one intra-GC cycle!

What is interesting about this script is that it is actually pulling all of the function name/script name/line number information directly out of a JSContext object.  What is exciting is that it was pretty easy to do thanks to systemtap’s understanding of DWARF debug symbols.  This is in severe comparison to my efforts using VProbes where there was no such support and I had to write a gdbpython script to begin to approach the same thing.  Also, because systemtap’s guru mode allows you to bail into real C code when you need, I was able to write a function that truncates the jschar (16-bit unicode) strings that make up the function names into ASCII that is both straightforward and efficient.  It also allowed for a quick/easy/fast basename() implementation too.

It is worth noting that spidermonkey does have built-in (conditionally compiled) static probe points originally created for DTrace and that they do provide more generic object lifetime hooks.  Although they were created for DTrace, systemtap is also able to use them.  The limitations with those probes is that once any probe is active using a hook, a non-trivial cost is incurred because filtering can only be performed inside the probe and you may receive orders of magnitude more calls than you actually care about.  The cost is worst for probes that provide function names since that requires a call to js_GetStringBytes every time (although there is some caching.)  Keep in mind that I’m not saying the systemtap userspace probes are free, but since they can be placed at arbitrary points, you can have probes that fire only in the cases you actually care about, only incur data-processing costs when desired, and are disturbingly easy to implement since you can just reach into the data structures you care about.

Since everyone loves an implementation fight, DTrace, great and magical though it is, is basically incapable of pulling off the JS stack-walking trick that systemtap makes possible.  Unless DTrace has grown these features since I last checked, it lacks the DWARF magic to do such things easily and the language intentionally avoids the control-flow constructs that would allow it to do anything quite so complex.  And you can entirely forget about arbitrary C code for the unicode string truncation…

This post is brought to you by efforts to decrease the memory footprint of Thunderbird when gloda’s indexer is active.  Because Thunderbird loves you.  But it loves your memory more.  For now.  That’s what we’re trying to fix.