pecobro, the tell-you-who-reads/writes-what performance code browser

No cool pictures, but I’ve enhanced and exposed the global reader/writer understanding of javascript code.  In other words, pecobro now does a passable job at telling you what global variables a given javascript file reads from/writes to, and who else writes to/reads from those variables.  False negatives are expected (don’t rely on things to be exhaustive), and false positives are quite conceivable.

As an example starting point, take a look at messageWindow.js’s global reads or global writes.  From either of these you can click on other files in the sidebar to go to them.  There is no execution trace, so the overview diagram won’t be any help.  Be sure you have some free memory available and won’t try and hurt me if Firefox does something crazy.  Note: for long documents, some delay is expected as it attempts to apply various fix-ups.  Also note: clicking on things in the global reads/global writes tab won’t get you anywhere for now.

A quick example of a global read from that file (and who writes to that global):

gDBView

  • Top Level: commandglue.js
  • Function: ClearThreadPane (in commandglue.js)
  • Function: CreateBareDBView (in commandglue.js)
  • Function: RerootFolder (in commandglue.js)
  • Function: SwitchView (in commandglue.js)
  • Function: openFolderTab (in mailWindowOverlay.js)
  • Function: setMailTabState (in mailWindowOverlay.js)
  • Function: restorePreSearchView (in searchBar.js)
  • Function: MsgGroupBySort (in threadPane.js)

A quick example of a global write from that file (and who reads from that global):

SelectFolder

  • Function: OpenMessageByHeader (in mailContextMenus.js)
  • Function: OpenInboxForServer (in mailWindow.js)
  • Function: selectFolder (in mailWindow.js)
  • Function: openFolderTab (in mailWindowOverlay.js)
  • Function: LoadNavigatedToMessage (in messageWindow.js)
  • Function: LoadNavigatedToMessage (in msgMail3PaneWindow.js)
  • Function: OnLocationTreeSelect (in msgMail3PaneWindow.js)
  • Function: SelectServer (in msgMail3PaneWindow.js)
  • Function: loadStartFolder (in msgMail3PaneWindow.js)
  • Function: RenameFolder (in widgetglue.js)
  • Function: loadInboxForNewAccount (in accountUtils.js)
  • Function: DropOnFolderTree (in messengerdnd.js)
  • Function: CrossFolderNavigation (in msgViewNavigation.js)

The code, as always, is available at http://hg.mozilla.org/users/bugmail_asutherland.org/pecobro/

master control pecobro, the overkill performance code browser

The title isn’t true yet, but it’s disturbingly almost-true.  Pecobro’s hackish underpinnings have been “accidentally” replaced with surprisingly forward-looking code capable of supporting a much fancier feature set.  (I didn’t mean to, but I got tricked because the incremental cost to doing the ‘right thing’ was always so low.)

The trace that you can see here, by clicking on any of this text what has coloring (and using firefox 3 in a session that you don’t mind if it crashes), is of running Mark Banner (Standard8)‘s Thunderbird bloatTest on OS X with DTrace probes enabled, but without actually doing the bloat testing.  So Thunderbird starts up, opens the address book, closes it, opens the compose window, closes it, and then quits.

Here is a preliminary processed trace from a run triggering bug 296453.  Be forewarned that there is some missing information from the underlying trace and so it’s not all that it could be.  I think the XPConnect probes need to be fleshed out slightly more (and processed).

The code (both pecobro and mozilla codebase patches) is at: http://hg.mozilla.org/users/bugmail_asutherland.org/

What crazy overkill things can pecobro do now?

  • Parse all of the javascript code used in Thunderbird, including nearly all (maybe all?) of the 1.x enhancements.  The parser is still a bit hackish, especially when it comes to support of regexes, but it actually parses things.  Thanks to Chris Lambrou for posting his initial JavaScript.g antlr3 grammar (and BSD licensing it) that I used as a basis for my hacky version.  This was certainly a learning experience; I now know that javascript can never replace python in my heart…
  • Find all of the places functions can be defines in javascript code and do a pretty good job at figuring out reasonable names for them.  This even includes “(new)? Function()”.  This allows us to use the function-info DTrace probes to use line number information to identify functions.  We do this so that we can identify anonymous functions.
  • Handle XBL as well as we handle pure-javascript files.  (Parsing/AST fix-ups/syntax highlighting with AST traversal/etc.)
  • Parse/interpret disturbingly large portions of makefiles.  Sure, it ignores the rules, and I’ve only implemented the functions I require, but it does all the conditional and include stuff with variable expansion, including some key functions like foreach/filter/etc.
  • Parse jar manifests sufficiently well to figure out what file ends up at what chrome path.
  • Cache its javascript parse results at both the AST and (sorta-semantic) processing level, among other things.  Hooray for cerealizer which dares persist/de-persist that which pickle dare not.  (Nested classes.)  (Note: We don’t cache XBL stuff yet, and the ASTs are large enough that de-persisting them can be quite perceptible.)

What is not crazy, but still nice:

  • The traces are now much more reliable thanks to modifications to the Mozilla USDT probes to provide sufficient information for us to distinguish between JavaScript execution contexts (namely, JSContext).  Modifications were also made to avoid the ‘guessing’/ghost value problems that happen when native functions were called.
  • A foolish bug in the sparkbar calculation has been fixed.  aka “Now With Accuracy”!
  • If a function in a trace called another function or was called by another function, this will be displayed inline in the source display.
  • Infer ‘native’ functions (with the above additional modifications to the Mozilla USDT probes), assigning them to an object.  This ends up being Object, ChromeWindow, or other native Class type.  Some of this may be mooted by getting the XPConnect probes reliably in the mix.

What doesn’t it do?

  • Add the XPConnect js_Invoke probes; I punted on that because that was turning out to be difficult to get right.
  • It ignores .xul files for now.  Although xul files primarily appears in a caller context (as told by function-info probes), they can also be a callee when someone pulls a fast one and inlines some simple code in a script tag.  We brutally mis-attribute the call to the previous function when this happens.  This will eventually get resolved because we will need to understand .xul files for namespace reasons.  Also, people sound like they’re interested in who overlays what and the like, and that’s sorta right up our alley thanks to all the overkill stuff we do.
  • Exhaustively determine reads from/contributions to the ‘global’ (window or what not) namespace/scope.  The groundwork is there in terms of understanding the contribution of top-level statements to the namespace or its reads from it, but we don’t traverse into functions.
  • Associate functions with an object/type (ignoring the native function case).  This requires more semantic understanding.
  • Clicking on functions still doesn’t do anything.  I disabled that before my first post on pecobro due to firefox-crashing issues, and still haven’t gotten around to resolving it.
  • A usable overview visualization.  The overview diagram has become cluttered by the presence of so many ‘relevant’ (touched) files.

pecobro: the performance code browser (early stage)

At the beginning of last week, I had gotten dtrace working on a Mac Mini using the Mozilla javascript-provider probes.  Very cool stuff, but it left me with several questions about what would really be best to do next:

  • How do I best understand what I’m seeing?  (Most of the codebase is brand new to me…)
  • How do I share the data with others in a way that is both comprehensible and allows them to draw their own conclusions from the data?
  • What can I do to reduce the effort required to work on performance problems?

I was tempted to just try and just dig into the system so I could have something to show immediately, but I knew it would still take a while to see the big picture just using an editor/ctags/lxr/opengrok, even informed by dtrace.  And even then, that big picture doesn’t scale well; whatever picture I managed to formulate would be stuck inside my head…

So my solution was to try and build a tool that could help me accomplish my short-term goals soon, and have the potential to grow into a usable solution to all of the above… eventually.  The goal, in a nutshell, is to provide a code browser for javascript that is able to integrate performance information (retrieved from traces) alongside the code.  Seeing that lxr/mxr and opengrok didn’t understand javascript or XBL all that well, it also seemed feasible to try and improve on their browsing capabilities for javascript.  A far-down-the-road goal is also to be able to pull in information from the underlying C++ code as well, potentially leveraging dehydra, etc.  (This would primarily be for understanding what happens when we leave the javascript layer, not trying to be the same solution for C++ space.)

So what can it do so far?  You can go try it for yourself if you like as long as you keep your expectations very low and realize the current state does not reflect all of the bullet points below.  Also, you probably want firefox 3.0.  Or you can read my bullet points:

  • Parse custom DTrace script output!  The Mozilla DTrace probe points could probably use a little love to improve what we are able to get out.  Also, I think it’s betraying us somewhere.
  • Parse JavaScript! Sorta!  (I hacked in support for the regular expression syntax, but I haven’t corrected the ambiguity with division, so things with division break.  Also, there’s at least one or two other glitches that cause early termination.) [Yay antlr!]
  • Parse XBL!  Even with entity inlining!  Even when people put #ifdefs in the XML document! Sorta!  We don’t actually do anything intelligent with the XBL right now or with its JavaScript, but it won’t take much to get that much improved. [Yay elementtree!]
  • Visualize some stuff!  Inter-file relationship graph in the overview.  In the code and ‘Funcs’ sidebar tab you get a sparkbar where each bar represents a time interval.  The height of the par is the percentage of possible time we could have spent in that time interval.  Red means we belive that time was spent in the function itself, green means we think we spent that time in calls to other functions. [Yay visophyte!]
  • Navigate with history!  Click on the overview graph and you go to things.  Click on the file names in the ‘Files’ list and you go to the files.  I tried to make it so you could click on function names in the side bars to go to them, but jquery.scrollTo and/or firefox 3.0b5 had serious crashing issues.  [Yay jquery, jquery.history!]
  • See syntax-highlighted code with random headings intertwined (shows the parser worked) and potentially a visualization.  [Yay pygments!]

My hope in the near-term is to fix the outright bugs (parsing issues), get XBL going, and then augment the function information with more trace-derived data including more traditional call-stacks, etc.  Then the tool should be sufficiently usable that my immediate focus can change to creating automated tests to actually gather performance/execution traces so we can use the tool for what I started it for.  This may also get shelved for a while if it turns out that we need action (patches) immediately.

Mozilla JavaScript DTrace Probes, visichron style

 dtrace javascript snippet

This visualization is the result of an adapted visichron.py script (from my chronicle-recorder ‘chroniquery’ bindings) run against the output of a custom DTrace script (using the Mozilla JS providers) on OS X.  It’s a proof-of-possibility rather than anything immediately useful.

The differences from the last visichron post (using chronicle-recorder as a back-end) are:

  • Node hues are distinct based on the file the javascript was executed from.  (Saturation still varies with amount of time spent in the function.)
  • Graph layout is done using graphviz’s neato’s “ipsep” (experimental) mode.  This works fantastically well at reducing/eliminating overlap.  Having said that, a hierarchical layout may make more sense.
  • Ring colors are based on call depth (so redundantly encoded with the ring radius) rather than any knowledge about the control-flow.  Full control-flow information is not readily available and would be extremely expensive, but we could provide at least some degree of approximation using the calls made by the function as indicators.  Of course, the ring visualization at this point and for this purpose is probably better represented as non-nested (side-by-side rings of different radii; not containing each other) faux-continuous ring slices with transparency varying by amount of time spent in the function at that call-depth.  This would simplify the object model, allowing for non-insane use of the SVG backend and interactivity enhancements.

scaled indexed

chronicle-recorder and amd64, hooray!

overview: visichron.py trace python -f main -d 3

My personal laptop rolls amd64-style (rather than i686), and chronicle-recorder’s valgrind component was not working on it (“illegal instruction”). I have done some vendor-branch dancing to get chronicle-recorder’s valgrind sub-directory to use valgrind 3.3.0. A bzr branch of just the valgrind subdirectory (drop-in or build separately and make sure you invoke valgrind from that location) is available here: http://www.visophyte.org/rev_control/bzr/valgrind-chronicle/valgrind-chronicle/

A probably faster location to bzr branch from is here: http://clicky.visophyte.org/rev_control/bzr/valgrind-chronicle/valgrind-chronicle/

and a tarball of the tip is here: http://www.visophyte.org/rev_control/tarballs/valgrind-chronicle-r6.tar.bz2

I have also updated chroniquery (my python binding for chronicle-query’s JSON interface, and its own set of tools that build on it) to work with amd64. Its bzr branch is here: http://www.visophyte.org/rev_control/bzr/chroniquery/trunk/

The goal of all of this was to be able to run chronicle against Thunderbird, which I was able to do. Unfortunately, visichron (the visualizing part of chroniquery using visophyte) is not quite ready for such an undertaking at this time.  (Not to mention a few C++ symbols issues…)

snippet: visichron.py trace python -f main -d 3

However, it was powerful enough to handle visualizing the trace resulting from invoking python on a python file with just “print ‘hello world'” in it. So that’s what you see here, albeit limited to only 3 call levels deep. Click on the upper picture to see the whole thing at 2000×2000, or just look on the lower picture to get an idea of what’s going on. Just like my first post using visichron, the rings are colored based on a (naive) control-flow-taken basis. The ring colors are per-function, however. Also, the node colors are ‘hot’ colored based on how many ‘ticks’ were spent inside the functions, multiple counting for recursion.

Other interesting changes include some primitive watch functionality for chronisole’s ‘trace’ mode. Also, the previously unmentioned readchron.py now understands and prints BunchedEffect and RegEffect info. (readchron aspires to be along the lines of readelf for chronicle databases, but with more colors.)

Chronicle-Recorder Graph/Ring Visualization! Hooray!

Thanks to new and improved time management skills (and a holiday weekend doesn’t hurt), I’ve got a chronicle-recorder visualization going on via chroniquery:

visichron.py trace vfancy -f smain

Above, we have the visualization run against the ‘fancy’ program seen in the previous chroniquery posts (with one caveat, addressed later). What does it mean?

  • The circular nodes are functions in the executed program. In this case, we start from ‘smain’ and pull in all the subroutines that we detect.
  • The edges between nodes indicate that a function call occurred between the two functions sometime during the execution; it could be once, it could be many times. The color of the edge is a slightly more saturated version of the color of the node that performed the call. If they call each other, only one color wins.
  • The rings around the outside of each node indicate when it was called, specifically:
    • The ring starts and stops based on the chronicle-query timestamp. Like on a clock, time starts at due north and flows around clock-wise, with the last smidge of time just before we reach due north again.  This has its ups and downs. The reason we are using ‘smain’ instead of ‘main’ is that when we used the untouched main, the first “new” ended up taking up most of our timestamp space. So I turned main into smain and had a new main that takes the memory allocator start-up cost hit and then calls smain.
    • The thickness of the ring indicates the depth of the call-stack at the time. The thickest ring corresponds to the outermost function, the thinnest ring to the innermost function. This results in a nice nesting effect for recursive functions, even if it’s more of an ‘indirect’ recursion.
    • The color of each ring slice is based on the control flow taken during the function call. (I think this is awesome, hence the bolding.) Now, I’m making it sound fancier than it is; as a hackish first pass, we simply determine the coverage of all the instructions executed during that function call. A more clever implementation might do something when iteration is detected. A better implementation would probably move the analysis into the chronicle-query core where information on the basic blocks under execution should be available. Specific examples you can look at above:
      • print_list: The outermost calls are aqua-green because their boolean arguments are true. The ‘middle’ calls are light green because their boolean arguments are false. The fina, innermost calls are orange-ish because they are the terminal case of the linked-list traversal where we realize we have a NULL pointer and should bail without calling printf. They are also really tiny because no printf means basically no timestamps for the function.
      • nuke: Four calls are made to nuke; the first and third times (light blue) we are asking to remove something that is not there, the second and fourth times (purple) we are asking to remove something that is. I have no idea why the third removal is so tiny; either I have a bug somewhere or using the timestamps is far more foolish than I thought.

Besides the obvious shout-out to chronicle-recorder, pygraphviz and graphviz power the graph layout. Now, a good question would be whether this actually works on something more complex? Could it be? Well, you can probably see the next picture as you’re reading this, so we’ll cut this rhetorical parade short.

visichron.py trace chronicle-query -f load_all_mmapped_objects

Besides the obvious font issues, this actually looks pretty nice. But what does it tell us? Honestly, the full traversal here is excessive. All we care about is the center node, load_all_mmapped_objects, and load_dwarf2_for (due north and a teeny bit east of the center node). If we look at the calls to load_dwarf2_for, we can see that two of them have different control-flow coverages. Those happen to be the times debug symbols could not be found (which is my problem I want to debug). The first one is for ld-2.6.1.so (I’m looking at the textual output of chronisole for the same command), and the second one is for /usr/bin/python2.5. The second one should definitely not fail, because it should find the symbols in /usr/lib/debug, but it does.

Unfortunately, the trail sorta stops cold there with a bug in either chronicle-query or in chroniquery (possibly cribbed from chronomancer). load_dwarf2_for should be calling dwarf2_load, but we don’t see it. I don’t know why, but I haven’t actually looked into it either. Rest assured that a much more awesome graph awaits me in the future!

Because there’s too much text and not enough pictures, I’ll also throw in the output of the ring visualization test which uses a stepped tick-count (1 per function call) to show how things could look if we went with an artificial time base…

ring visualization example

Bzr repositories for both can be found at http://www.visophyte.org/rev_control/bzr/.

Chroniquery: Chronicle-Recorder and Python Boogie Down

Robert O’Callahan‘s Chronicle-Recorder is one of the most exciting debugging tools to come down the road ever. In order to help me track down my PyXPCOM problem that has been a stumbling block to full Thunderbird integration, I’ve laid the groundwork for a nice Python-based interface to chronicle-query.

chronisole output on loopy

You can find it in a bzr repository at http://www.visophyte.org/rev_control/bzr/chroniquery/

It depends on some mods I’ve made to chronicle-recorder which can either be found in my (st)git repository (git-svnimport’ed) at http://www.visophyte.org/rev_control/git/chronicle-recorder.git/ on the acs-base branch. I have no idea how to have someone branch the repository and get stgit to actually understand the patches, so I have also exported the patches to http://www.visophyte.org/rev_control/patches/chronicle-recorder/. They should apply cleanly to rev 13/15 of the trunk. In a nutshell, the patches make things work with a somewhat-up-to-date Ubuntu gutsy (7.10) install, add the ability to retrieve information on compilation units including their memory offsets, add the ability to get source line information (with some overkill on determining the end of the current ‘line’), and a bugfix for negative line offsets.

It’s my understanding from roc’s blog that he’s been working on a proper debugger interface, so some of these patches may be redundant or otherwise rendered moot.

The picture above is a screenshot of Firefox rendering the styled HTML output of chronisole.py run against the loopy.c program in the chroniquery examples dir. (Click on the picture to go to the HTML.) It’s like a fibonacci generator, but it’s not. If you run chronisole in a terminal without the HTML args, it will output the equivalent, but you need to have a 256-color xterm. To summarize its actions, it finds all the compilation units residing in the executable which have low_pc/high_pc info and uses that as the basis of the ranges of interest for an instruction scan. It loops over the source-lines as they are executed in sequence, grabs the lines, and looks up the values of the locals just after the source line completes execution. If each local’s value differs or was not in the previously executed line’s locals, it gets to be yellow, otherwise it is a subtle color.

the beginnings of a gdb execution trace visualization

GDB execution trace vis 1

Python code driving gdb via the mi2 interface (using pexpect and pyparsing) begets a trace file. Python code processing the trace file begets an html document (with syntax highlighting thanks to pygments.) The above is just a screenshot of the output, which you can actually browse here.

The “visualization” is pretty simple. Lines of code with a light green background were executed in the trace. 15 lines of context are shown around executed lines of code. The bars to the left show when the line of code was executed during the trace. If the code was a function call, its bar is extended until the function call completed based on conclusions drawn from knowing the stack depth at each step. There is currently no way to visually distinguish from the graphic between a line executed many times in rapid succession and a call that lasted a long time. You would need to either look at the code or cheat and use firebug.

The visualization is of a call to commit() using pysqlite2 on sqlite3 which fails because of an outstanding select that has not yet had all its rows retrieved. My conclusion is that clever generator tricks (not mine!) can bite you when any piece of your stack has concurrency issues… and pysqlite2 should report the most specific error string possible. (Going to provide a patch; still think the trunk would provide the rather generic error that bit me.) For those using trac’s version control abstraction, be warned that changeset traversal is most definitely a (logic) generator wrapping a row-results-fetching generator.

GDB execution trace vis 2

The tracing mechanism is pretty straight-forward. The tracing class is told an interesting breakpoint and an ignore count. Once the count has been reached, “step” tracing begins. The idea was to do a first pass that just counts how many times the interesting breakpoint occurs before the program tanks, then we can skip to the last few before doing the (slow) step-wise tracing in the second pass. The goal would be to approximate a light-weight tracing mechanism where we can see what the program did before whatever bad state occurred. Gah! I was just going to say that there aren’t any open source tools readily available to accomplish that goal (dyninst/dpcl still require a bit of legwork), but it turns out that Chronicle (formerly Amber) has recently been released. (For whatever reason, Robert O’Callahan’s post didn’t show up in his feed via planet mozilla for me.) OTOH, this method is still potentially useful for remote gdb target debugging on actual hardware (albeit extremely slowly), or perhaps for a case where the program must do a lot of execution before the interesting part of the trace happens, lending itself to a non-valgrind solution. I presume you would pay the simulation price for valgrind the whole way through, unless valgrind can ‘absorb’ a running process into its simulation.

I also have a visualization done using visophyte’s model abstraction (if you are going to go and invent yet another multi-backend canvas abstraction layer, you might as well use it), but without some way to view the corresponding code and explore, it is useless. The generated HTML here is more useful, but still lacking in interactivity or true context. That would be the next step. The step after that would be throwing back in a concise and interactive visualization. The same effort required for that should allow me to produce interactive visualizations from thunderbird.