The joy of integrated logging and log viewing with fancy logs

The deuxdrop messaging experiment‘s current incarnation exists as an (under development) Jetpack that runs in Firefox.  I am still trying to shake out functionality to be driven by the UI rather than headless unit tests.  While firebug has been a great help, another boon has been the logging framework and log viewing framework developed for the unit tests.  (Previous posts here and here).  Since the log is made up of structured JSON data, all the log processing logic is written in JS, and the log viewing UI is HTML/CSS/JS, it is trivial to embed the log viewer into the Jetpack itself.

If you type about:loggest in the URL bar (or better yet, create a bookmark on the bookmark bar and click that), the log viewer is displayed.  Deuxdrop’s client daemon logic (which runs in a hidden frame), uses a log reaper that runs at 1-second intervals.  If anything happens log-wise during that second, it is packaged and added to a circular-buffer style list of the last 60 seconds where anything happened.  When the log viewer starts up, it asks for and receives the data.  The result looks like the above small screenshot.  If no errors were logged during the time interval, it is automatically collapsed.

Let us experience the joy of integrated logging by looking at a real problem I recently encountered.  In the development UI (accessible via about:dddev), I brought up a list of my contacts after starting a conversation.  It looks like this right now:

The problem is that I, the user, am “Andrew Sutherland” and should not be in my own contact list.  Also, the display should not be claiming there are an undefined number of unread messages from me, but that is likely fallout from the system intentionally not maintaining such information about me, the user.

I want to quickly figure out why this is happening, so I bring up about:loggest and open the most recent time entry to see what happened when this query was issued and filled:

I can see that the query ended up issuing database requests for both Walternate (purple) and myself (green), strongly suggesting that the database index being queried on names me.

I wonder if the conversation processing logic was the code that did this… let’s check by going to the time slice where the conversation was processed, expanding it, and only screenshotting some of it:

Yes, the conversation logic did this.  It’s generating index values in the peepData table for the idxPeepAny and idxPeepRecip indices.  But I thought my unit tests covered this?  Nope.  It turns that although we test that a peep query returns the right thing both cold from the database and inductively via notifications as contact relationships are established, we don’t issue a new query after creating a conversation.  Furthermore, we only issued queries against the alphabetical index, not against idxPeepAny.  So we rectify that by augmenting the unit test:

  // - make sure that the conversation addition did not screw up our peeps list
  T.group('check peeps list after conversation join');
  lqFinalAllPeeps = moda_a.do_queryPeeps("allPeepsFinal:any", {by: 'any'});

And the test indeed now fails:

The relevant bit is in the lower right, which I blow up here with the “unexpected event” obj popup displayed above it, and the “failed expectation” obj popup below it.  The postAnno stuff is to indicate what is new in the query result.  Because it’s a freshly issued query and this is the first set of results, everything is new.  It’s probably worth noting that these errors would usually show up as a single “mismatched” error instead of an unexpected/failed pair in our tests, but the specific logger was operating in unordered set mode because we don’t care about the exact order that different query notifications occur in, we just care that they do occur.

(The structure is intended to later be enhanced to provide a nicer visualization where we only show the contents of the “state” attribute and use preAnno to indicate annotations on a representation of the most recent state for the object (from a previous log entry) and postAnno to indicate annotations on the now-current representation “state”.  For postAnno, values of 1 represent an addition, and values of 0 represent a change or event firing on the object.)

A potentially even more exciting bit of integrated logging is that about:loggest-server opens a tab that retrieves its contents from the server.  When run with the –loggest-web-debug flag, the server loads a module that cranks up the logging and does the same 1-second interval log reaping magic and exposes it for HTTP retrieval.  While this specific configuration with the high level of detailed logging is only appropriate for a developer-machine test server, it is incredibly valuable to be able to just pop open another tab and see what the server just got up to.

In any event, I leave it as an exercise to the reader to assume that I will take care of the bug now that it’s extremely clear what the problem is.  Well, extremely clear if I had taken a bigger screenshot of the conversation creation log.  Above the region captured is a header that indicates the actions are being triggered by the ‘convJoin’ task and the entry (which is visible) indicates the update_conv_data function likely kicked off the database activity.

PS: All the gibberish looking characters in the screenshots are crypto keys or other binary data that lack aliases mapping them to known objects.  Examples of successfully mapped aliases are the colored blocks.  In the case of the conversation creation gibberish, we are seeing the conversation id.  Those aliases are generated as a separate pass by the log reaper by walking the set of existing queries and using helper functions to map the items currently exposed by the queries to human names because it’s quick and easy and is O(what’s being looked at) not O(size of the database).  In the case of the conversation, there was no query against the conversation and I also have not written the helper function yet, which is why it did not get aliased.  Unit tests don’t have the problem because we create aliases for everything.

overview ownership/communication graphs for rich execution logs

relationship-overview-spgm

My last blog post covered the fancy pants logging being used for the deuxdrop project.  To summarize:

  • we have loggers
  • they are organized by ownership hierarchy
  • they provide sufficient metadata that we can reconstruct the loggers that were talking to each other
This can result in a lot of loggers.  If you look at the log output from the last blog post (caution: BIG JSON file in standalone ArbPL instance) you might notice a list of loggers that looks something like the below, except not quite as sideways or shrunk:

last generation's logger hierarchy rotated sideways

Obviously, that much raw data is not super helpful.  So now we process the hierarchy, constructing graph nodes or aggregate graph nodes for things that are interesting in their own or in aggregate.  We use d3 of the vaunted protovis lineage to visualize the network and graphviz to lay it out.

d3 has a super-polished, fun-to-use interactive force-directed graph implementation, but graphviz’s circo layout produces better results.  Given that ArbPL already has a processing step for intake, it wasn’t too much extra work to include a step where we use the same JS code as on the client to generate a dot file, pass it to circo to lay it out, then extract the layout information and store it with the log data.  Many props to Gregoire Lejeune for his node-graphviz bindings that made it so easy to do.

overview-z-joined-notif-highlighted

Although the overview graph on its own is neat, it becomes useful by showing us the involved loggers/actors in a test step by highlighting them.  It does this by listening for wmsy‘s focus change events to know what the focused test step is.  (Since mouse interaction also causes toggling, it’s easiest to appreciate the process by using the up/down arrow keys to change focus without expanding the steps.  And if you want to toggle the expanded state without involving the mouse, you can hit the enter key.)

z-joined-step-callout-causes-highlighting

For reference, the steps look like the above, and the highlighted one has the ever-so-fashionable glowing fancy focus ring.

If you would like to see this for yourself, you can check out the archived (and therefore less likely to break) standalone version of the example log file: https://clicky.visophyte.org/examples/arbpl-loggest/20110720/.  The previously super-huge JSON file is now about an order-of-magnitude smaller because I started eliding large strings that are of no human interest.

new adventures in rich (execution) logs for debugging and program understanding

Understanding what is going on inside software can be very hard, even for the developers most familiar with the software.  During my time working on Thunderbird I used a variety of techniques to try and peer inside: printf/dump/console.log, debuggers, execution analysis (dtrace, chronicle recorder, with object diffs, on timelines), logging (log4j style, with  timelines, with rich data, extra instrumentation and custom presentations, prettier and hooked up to dump on test failures), improving platform error reporting, gdb extensions, control-flow analysis of SQL queries (vanilla, augmented with systemtap perf probes), performance analysis (VProbes with custom UI, systemtap, crammed into the SpeedTracer UI, custom UI with the async work), chewing existing log facilities’ output (TB IMAPgecko layout), and asynchronous operation causality reconstruction (systemtap, JS promises).

Logging with rich, structured data easily provided the most bang-for-the-buck because it:

  1. Provided the benefit of baked-in human meaning with some degree of semantic hierarchy.
  2. Was less likely to make wrong assumptions about what data was relevant.  How many times have you had to go back to change what a printf is logging?
  3. Once sufficiently automated, required no activation energy, no extra steps, and everyone can see the results.
However, it still has numerous downsides/gotchas:
  1. Potential performance impact, especially with rich, structured data.
  2. If people don’t write log statements, you don’t get log entries.
  3. If an error propagates and no one catches it or otherwise says anything about it, your log trace stops dead.
  4. Lack of context can hide causation and leave you filtering through tons of entries trying to reconstruct what happened from shadows on the cave wall.

 

As a result, when I recently started on a new project (implemented in JS), I tried to make sure to bake logging into the system from the ground up:
  • The core classes, abstractions, and APIs generate log entries automatically so developers don’t need to fill their code with boilerplate.
  • Loggers are built around object ownership hierarchies/life-cycles to provide context and the ability to filter.  This is in contrast to log4j style logging which is usually organized around code module hierarchies, noting that log4j does provide nested diagnostic contexts.
  • The test framework is designed around writing tests in terms of expectations around the loggers.  This helps ensure interesting things get logged.  It also improves the quality of the tests by making it easier to ensure the tests are really doing what you think they are doing.
  • Logger definitions explicitly name the log events they will generate and their semantic type, some of which have special handling.  The currently supported types are: state changes, latched states, events, asynchronous jobs (with separate begin and end entries), calls (which wrap a function call, catching exceptions), and errors.  This allows specialized processing and better automated analysis without having to try and reverse engineer the meaning using regular expressions.
  • Performance is addressed by instantiating different logger classes based on needs.  For modules not under test (or without logging desired), everything turns into no-ops except for events and errors which are counted for reporting to a time-series database for system health/performance/etc analysis.  The decision making process happens at runtime and is able to see the parent logger, so heavy-weight logging can be used on a statistical sampling basis or only for specific users experiencing problems/etc.
  • Loggers can give themselves complex semantic names that can be used to reconstruct relationships between loggers when the ownership hierarchy is not available or not appropriate.  For example, we can link both sides of the connection between a client and a server by having the loggers make sure to name themselves and the other side.
  • Simple wrapper helpers exist that make it easy to wrap a function so that a custom log entry is generated and it “holds” the call in limbo from whence it can later be “released”.  This allows unit tests to break complicated behaviour into discrete steps that humans can understand.  Far better to look at one thing at a time than eight things all jumbled together (or filtered back down to one, potentially hiding important details).

 

In any event, as one might surmise from the screenshots, this is more than a dream, it’s a pastel colored reality.

What are the screenshots showing?

  1. The logger hierarchy.  The colored bits are “named things”.  The test framework has the concept of things, actors, and loggers.  Each actor corresponds to exactly one logger and is the object on which tests specify their expectations.  Actors can be owned by other actors, resulting in a hierarchy we call a family.  Each family gets tagged with a distinct identifier that allows us to associate a color with them.  Things provide a human name to a (hopefully) unique string.  Things can be owned by actors and get tagged with the family name and so can be colorized.  In the logger hierarchy, the stuff to the right of the colon is the semantic name of the logger.  So “clientConn: A client to X longtermat endpoint blah” is really (under the hood) an array of strings where “A client” is actually the crypto key so named.  There are two colors because the connection is naming both its own identifying crypto key and the server’s crypto key it is trying to talk to.
  2. An example of the display of log entries.  Each logger gets its own column to display its entries in.  The header shows the name of the logger and is colored based on that logger’s family.  The colors were not shown in the logger hierarchy because I think it would end up too busy.  Each entry is timestamped with the number of milliseconds since the start of the test.  The event names are arbitrarily in blue to help delineate them from other semantic classes.  For example, “handleMsg” is a call-type.  The “obj” bits with the dotted stuff under it means something we can click on to see more of.  The events being shown are part of a CurveCP-style connection establishment.
  3. Similar to the previous screenshot, but here you can see named thing resolution in play with arguments that are strings.
  4. And if we click on one of those “obj” bits, we get a nice nested table display of the object.  As you can see from the pretty colors, named thing resolution is also in play.  You can also see crypto keys I did not name and which accordingly look like gibberish.  It is probably worth noting that some developer participation is required to make sure to put toJSON() implementations on all the complex objects that are exposed to the logger to make sure we don’t try and serialize huge swathes of object graph.  While this is a “don’t break the system” requirement, it also makes it easy to expose the useful bits of information for debugging.

If you would like to see the actual log display UI for yourself on the log from the screenshots (and can stomach it fetching 2+ MiB of compressed JSON), you can see it at https://clicky.visophyte.org/examples/arbpl-loggest/20110712/.  While the logs normally live on the ArbitraryPushlog (ArbPL) server, links to it are currently not stable because its all-in-one hbase datastore keeps self-destructing.  I baked ArbPL into a standalone form that lives at that URL and so should ideally not break so much.  Fingers-crossed.

arbitrarypushlog now more realtimey; likes wide screens, humans

Arbitrary Pushlog (ArbPL) recap:

  • It shows revision control pushes and the tinderbox build results for specific build trees like tinderboxpushlog (TBPL).
  • It is backed by a server which has a database so your browser is not going to launch a DOS attack on the mozilla infrastructure.  On the other hand, if that server falls over…
  • It figures out what failed by parsing the failed test logs.  It passes the time savings of this onto you in the form of listing the failures.  It also secretly knows things about failures that it does not tell you.
  • For Thunderbird mozmill test runs that experience failures it scrapes the fancy logging goodness out of the logs and exposes them in a pretty UI.  See the teaser post for screenshots.  You can also check to see if we’ve fixed all the Thunderbird mozmill oranges or not and maybe see a fresh new log!
  • It’s not capable of serving as a (full) replacement for TBPL at this time.  It does not show tree status (although that is easily fixed and should be fixed sometime soon), and it currently cannot star builds.

What’s new:

  • Widescreen display friendly.  It previously assumed you were me and ran your screen in portrait mode because that is the way you can see the most code; now it spills to two columns if your window is at least 1400px wide.  (See top screenshot.)
  • TraceMonkey’s mobile build matrix is split out from its desktop build matrix because the column sets did not meaningfully overlap.  No attempt has been made to deal with the other unique aspects of the TraceMonkey tree, although at least all known TraceMonkey builds should be categorized.  (If the builds that were triggered off another repo were split out into their own tinderbox tree so they could use the same mechanism Thunderbird and other comm-central complex repos use, no special work would be required.  It sounds like the tupled repo is going away soon, so it likely should become a moot issue.)

  • It now does fancy hierarchical grouping of the failures.  The builder names’ backgrounds are in gray because the failures have been starred.  The builder names would be in orange if the failing builds had not been starred.

  • When people merge things or otherwise push more than 8 changesets, we don’t show you everything they pushed, just the top 3 and provide a link to the full log elsewhere.

  • I made an effort to help explain what is going on with builds in non-secret code.  You click on a build, you get this lovely box.  Unfortunately I apparently also forgot to expose the secret code in the popup like I was planning.

And last, but not least…!

  • ArbPL now uses socket.io magic to provide realtime-ish updates.  Your browser creates a websocket/weird flash object bridge/XHR long-poll/etc. in order to be able to get the build information that’s hot off the press.  The “ish” part of realtime-ish comes from the bit where the presses are a cron job that runs every 3 minutes.  I think I have appropriate abort/retry logic in place now to contend with the half-written JSON and hanging HTTP requests, but I’m still sticking with the safety of a cron job with a built-in death clock until the logs agree with me.

Important caveats:

  • If the server restarts, the client is smart enough to reconnect, but not smart enough to re-establish its subscriptions with the server.  Refreshing the page or clicking the ArbPL logo in the upper left to go back to the list of trees and then picking a tree will resolve this issue.  The server only restarts when I push to production, as it were, so there is some other problem if you find yourself constantly needing to refresh.
  • Builders in the failure list look like they want to do something when you click on them, but this is a lie for all non-Thunderbird mozmill builders at the current time; those are the secrets ArbPL is keeping from you.

You can use ArbPL for yourself at arbpl.visophyte.org.  You can make your own using the source on github.

As noted by mcote in his post on autolog, the automation team is working on something in this domain, autolog, and I’m hoping to work with them to collaborate / pass-the-torch / or port Thunderbird’s mozmill logging to their solution so Thunderbird can have its fancy logging cake but not have to bake it too.

If you would like to discuss ArbPL/autolog/etc., I think the mozilla.tools group is supposed to be the right place for that.  One caveat is that I can’t seem to find the mailing list proper to sign up, just the newsgroup and google group…?  (Neither https://lists.mozilla.org/listinfo or https://mail.mozilla.org/listinfo/ seem to know about it.)

build matrices for arbitrarypushlog

A TraceMonkey Push

As I mentioned in my teaser on rich thunderbird mozmill logs, in order to get the build logs and provide failure clustering you already have to do most of the stuff tinderboxpushlog does.  One of the key things is summarizing the builds in a way that is useful, where the definition of “useful” probably varies a lot between users.

While Thunderbird has an extremely boring set of build types (build, xpcshell, mozmill), my first take on summarizing them was no good.  While fixing it, I decided to feature creep (out of my hobby time allocation) and see if I could create a presentation that could handle the prolific Firefox and TraceMonkey trees.

A Firefox Push

While I am not going to claim it’s perfect, I like it.  It’s probably also important to caveat that not all Tracemonkey builds are categorized.  The mobile talos runs identify themselves by a completely different set of names from the desktop ones, and there’s not really room for columns for that.  Additionally, some builds cite a revision for “mobile-browser”, but we ignore that extra meta-data.  Although the design was intended to handle Thunderbird’s repository where each build is a tuple of “comm-central” and “mozilla-central” revision used, we really need to have that tuple for every build in the tinderbox tree, and TraceMonkey is not providing it.  (We could kick builds without the info to the “outer” push as an enhancement.)

A Thunderbird (well, comm-central) push.

As a gesture of friendship to non-Thunderbird trees, we now also process mochitest and reftest logs, although I’m failing to surface some of the details retrieved in the UI.

Anywho, you can see arbpl in action for yourself at arbpl.visophyte.org.  It cron scrapes every 5 minutes.  The error recovery logic is not production-grade yet; the scraper can fall victim to partially written tinderbox JSON files on the tinderbox server, which means that some trees might not see updates for longer than that.  And various other things may go wrong too.  The client does not auto-refresh or use Socket.IO or anything.  If you want to run your own, hit github.  If you want to read the source, you should probably hit github too, because the production serving mode is reasonably optimized and crams all the JS into a single (gzipped) file.

teaser: Rich contextual information for Thunderbird mozmill failures

Sometimes Thunderbird mozmill unit tests fail.  When they do, it’s frequently a mystery.  My logsploder work helped reduce the mystery for my local mozmill test runs, but did nothing for tinderbox runs or developers without tool fever.  Thanks to recent renewed activity on the Thunderbird front-end, this has become more of a problem and so it was time for another round of amortized tool building towards a platonic ideal.

The exciting pipeline that leads to screenshots like you see in this post:

  • Thunderbird Mozmill tests run with the testing framework set to log semi-rich object representations to in-memory per-test buckets.
  • In the event of a test failure, the in-Thunderbird test framework:
    • Gathers information about the state of the system now that we know there is a failure and emit it.  This takes the form of canvas-based screenshots (using chrome privileges) of all open windows in the application, plus the currently focused element in the window.
    • Emits (up to) the last 10 seconds of log events from the previous test.
    • Emits all of the log events from the current test.
  • The python test driver receives the emitted data and dumps it to stdout in a series of JSON blobs that are surrounded by magical annotations.
  • A node.js daemon doing the database-based tinderboxpushlog thing (like my previous Jetpack/CouchDB work that found CouchDB to not be a great thing to directly expose to users and died, but now with node and hbase) processes the tinderbox logs for the delicious JSON blobs.
    • It also processes xpcshell runs and creates an MD5 hash that attempts to characterize the specific fingerprint of the run.  Namely, xpcshell emits lines prefixed with “TEST-” that have a regular form to describe when the pending test count changes or when specific comparison operations or failures occur.  It is assumed that tests will not comparison check values that are effectively random values, that the comparisons will tend to be deterministic (even if there are random orderings in the test) or notable when not deterministic, and thus that the trace derived from this filtering and hashing will be sufficiently consistent that similar failures should result in identical hashes.
    • Nothing is done with mochitests because: Thunderbird does not have them, they don’t appear to emit any context about their failures, and as far as I can tell, frequently the source of a mochitest failure is the fault of an earlier test that claimed it had completed but still had some kind of ripples happening that broke a later test.
  • A wmsy-based UI does the UI thing.

    The particular failure shown here is an interesting one where the exception is telling us that a popup we expected to open never opened.  But when we look at the events from the log, we can see that what happened is the popup opened and then immediately closed itself.  Given that this happened (locally) on linux, this made me suspect that the window manager didn’t let the popup acquire focus and perform a capture.  It turns out that I forgot to install the xfwm4 window manager on my new machine which my xvnc session’s xstartup script was trying to run in order to get a window manager that plays nicely with mozmill and our focus needs.  (Many window managers have configurable focus protection that converts a window trying to grab focus into an attention-requested notification.)

    This is a teaser because it will probably be a few more days before the required patch lands on comm-central, I use RequireJS‘ fancy new optimizer to make the client load process more efficient, and I am happy that the server daemons can keep going for long stretches of time without messing up.  The server and client code is available on github, and the comm-central patch from hg.