{"id":775,"date":"2011-07-12T07:08:41","date_gmt":"2011-07-12T14:08:41","guid":{"rendered":"http:\/\/www.visophyte.org\/blog\/?p=775"},"modified":"2011-07-12T07:08:41","modified_gmt":"2011-07-12T14:08:41","slug":"new-adventures-in-rich-execution-logs-for-debugging-and-program-understanding","status":"publish","type":"post","link":"https:\/\/www.visophyte.org\/blog\/2011\/07\/12\/new-adventures-in-rich-execution-logs-for-debugging-and-program-understanding\/","title":{"rendered":"new adventures in rich (execution) logs for debugging and program understanding"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-776\" src=\"http:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2011\/07\/logger-hier-spgm.png\" alt=\"\" width=\"512\" height=\"358\" srcset=\"https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2011\/07\/logger-hier-spgm.png 512w, https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2011\/07\/logger-hier-spgm-300x209.png 300w\" sizes=\"auto, (max-width: 512px) 100vw, 512px\" \/><\/p>\n<p>Understanding what is going on inside software can be very hard, even for the developers most familiar with the software. \u00a0During my time working on Thunderbird I used a variety of techniques to try and peer inside: printf\/dump\/console.log, debuggers, execution analysis (<a href=\"http:\/\/www.visophyte.org\/blog\/2008\/04\/08\/pecobro-the-performance-code-browser-early-stage\/\">dtrace<\/a>, <a href=\"http:\/\/www.visophyte.org\/blog\/2008\/09\/17\/chroniquery-does-useful-things\/\">chronicle recorder<\/a>, <a href=\"http:\/\/www.visophyte.org\/blog\/2009\/04\/28\/understanding-where-unit-tests-go-wrong-with-object-diffs\/\">with object diffs<\/a>, <a href=\"http:\/\/www.visophyte.org\/blog\/2009\/04\/29\/displaying-execution-traces-on-a-simile-timeline\/\">on timelines<\/a>), logging (<a href=\"http:\/\/www.visophyte.org\/blog\/2008\/10\/20\/log4moz-and-chainsaw-via-new-xmlformattersocketappender\/\">log4j style<\/a>, <a href=\"http:\/\/www.visophyte.org\/blog\/2008\/12\/14\/logsploder-logsploding-its-way-to-your-logs-soon-also-logsplosion\/\">with \u00a0timelines<\/a>, <a href=\"http:\/\/www.visophyte.org\/blog\/2011\/01\/14\/logsploder-circa-a-year-ago\/\">with rich data, extra instrumentation and custom presentations<\/a>, <a href=\"http:\/\/www.visophyte.org\/blog\/2011\/03\/02\/teaser-rich-contextual-information-for-thunderbird-mozmill-failures\/\">prettier and hooked up to dump on test failures<\/a>), <a href=\"http:\/\/www.visophyte.org\/blog\/2009\/05\/16\/better-error-reporting-for-the-mozilla-platform\/\">improving platform error reporting<\/a>, <a href=\"http:\/\/www.visophyte.org\/blog\/2011\/03\/30\/unified-jsc-backtraces-colorized-backtraces-colorized-source-listing-for-gdb-7-3archer-trunk-updated\/\">gdb extensions<\/a>, control-flow analysis of SQL queries (<a href=\"http:\/\/www.visophyte.org\/blog\/2009\/02\/04\/visualization-of-control-flowdata-flow-analysis-of-sqlite-explain-ations\/\">vanilla<\/a>, <a href=\"http:\/\/www.visophyte.org\/blog\/2010\/04\/06\/performance-annotated-sqlite-explaination-visualizations-using-systemtap\/\">augmented with systemtap perf probes<\/a>), performance analysis (<a href=\"http:\/\/www.visophyte.org\/blog\/2009\/08\/03\/using-vmware-recordreplay-and-vprobes-for-low-time-distortion-performance-profiling\/\">VProbes with custom UI<\/a>, <a href=\"http:\/\/www.visophyte.org\/blog\/2010\/03\/19\/more-systemtap-mozilla-event-loop-awareness\/\">systemtap<\/a>, <a href=\"http:\/\/www.visophyte.org\/blog\/2010\/04\/28\/work-in-progress-tooling-feeding-gwt-speedtracer-systemtapped-mozilla-performance-data\/\">crammed into the SpeedTracer UI<\/a>, <a href=\"http:\/\/www.visophyte.org\/blog\/2010\/10\/17\/fighting-oranges-with-systemtap-probes-latency-fighting-step-2\/\">custom UI with the async work<\/a>), chewing existing log facilities&#8217; output (<a href=\"https:\/\/github.com\/asutherland\/tb-test-help\/blob\/master\/nspr_imap_log_parser.py\">TB IMAP<\/a>,\u00a0<a href=\"http:\/\/www.visophyte.org\/blog\/2010\/12\/04\/understanding-where-layout-goes-wrong-with-gecko-reflow-debug-logs-part-2\/\">gecko layout<\/a>), and asynchronous operation causality reconstruction (<a href=\"http:\/\/www.visophyte.org\/blog\/2010\/10\/12\/fighting-non-deterministic-xpcshell-unit-tests-through-causality-tracking-with-systemtap-step-1\/\">systemtap<\/a>, <a href=\"http:\/\/www.visophyte.org\/blog\/2011\/01\/05\/visualizing-asynchronous-javascript-promises-q-style-promisesb\/\">JS promises<\/a>).<\/p>\n<p>Logging with rich, structured data easily provided the most bang-for-the-buck because it:<\/p>\n<ol>\n<li>Provided the benefit of baked-in human meaning with some degree of semantic hierarchy.<\/li>\n<li>Was less likely to make wrong assumptions about what data was relevant. \u00a0How many times have you had to go back to change what a printf is logging?<\/li>\n<li>Once sufficiently automated, required no activation energy, no extra steps, and everyone can see the results.<\/li>\n<\/ol>\n<div>However, it still has numerous downsides\/gotchas:<\/div>\n<div>\n<ol>\n<li>Potential performance impact, especially with rich, structured data.<\/li>\n<li>If people don&#8217;t write log statements, you don&#8217;t get log entries.<\/li>\n<li>If an error propagates and no one catches it or otherwise says anything about it, your log trace stops dead.<\/li>\n<li>Lack of context can hide causation and leave you filtering through tons of entries trying to reconstruct what happened from shadows on the cave wall.<\/li>\n<\/ol>\n<\/div>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-777\" title=\"rawclient-a-signup-x-spgm\" src=\"http:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2011\/07\/rawclient-a-signup-x-spgm.png\" alt=\"\" width=\"660\" height=\"238\" srcset=\"https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2011\/07\/rawclient-a-signup-x-spgm.png 660w, https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2011\/07\/rawclient-a-signup-x-spgm-600x216.png 600w, https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2011\/07\/rawclient-a-signup-x-spgm-300x108.png 300w\" sizes=\"auto, (max-width: 660px) 100vw, 660px\" \/><\/p>\n<p>&nbsp;<\/p>\n<div>As a result, when I recently started on a new project (implemented in JS), I tried to make sure to bake logging into the system from the ground up:<\/div>\n<ul>\n<li>The core classes, abstractions, and APIs generate log entries automatically so developers don&#8217;t need to fill their code with boilerplate.<\/li>\n<li>Loggers are built around object ownership hierarchies\/life-cycles to provide context and the ability to filter. \u00a0This is in contrast to log4j style logging which is usually organized around code module hierarchies, noting that log4j does provide nested diagnostic contexts.<\/li>\n<li>The test framework is designed around writing tests in terms of expectations around the loggers. \u00a0This helps ensure interesting things get logged. \u00a0It also improves the quality of the tests by making it easier to ensure the tests are really doing what you think they are doing.<\/li>\n<li>Logger definitions explicitly name the log events they will generate and their semantic type, some of which have special handling. \u00a0The currently supported types are: state changes, latched states, events, asynchronous jobs (with separate begin and end entries), calls (which wrap a function call, catching exceptions), and errors. \u00a0This allows specialized processing and better automated analysis without having to try and reverse engineer the meaning using regular expressions.<\/li>\n<li>Performance is addressed by instantiating different logger classes based on needs. \u00a0For modules not under test (or without logging desired), everything turns into no-ops except for events and errors which are counted for reporting to a time-series database for system health\/performance\/etc analysis. \u00a0The decision making process happens at runtime and is able to see the parent logger, so heavy-weight logging can be used on a statistical sampling basis or only for specific users experiencing problems\/etc.<\/li>\n<li>Loggers can give themselves complex semantic names that can be used to reconstruct relationships between loggers when the ownership hierarchy is not available or not appropriate. \u00a0For example, we can link both sides of the connection between a client and a server by having the loggers make sure to name themselves and the other side.<\/li>\n<li>Simple wrapper helpers exist that make it easy to wrap a function so that a custom log entry is generated and it &#8220;holds&#8221; the call in limbo from whence it can later be &#8220;released&#8221;. \u00a0This allows unit tests to break complicated behaviour into discrete steps that humans can understand. \u00a0Far better to look at one thing at a time than eight things all jumbled together (or filtered back down to one, potentially hiding important details).<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-778\" title=\"testclient-c-add-contact-a-spgm\" src=\"http:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2011\/07\/testclient-c-add-contact-a-spgm.png\" alt=\"\" width=\"745\" height=\"240\" srcset=\"https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2011\/07\/testclient-c-add-contact-a-spgm.png 745w, https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2011\/07\/testclient-c-add-contact-a-spgm-600x193.png 600w, https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2011\/07\/testclient-c-add-contact-a-spgm-300x96.png 300w\" sizes=\"auto, (max-width: 745px) 100vw, 745px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>In any event, as one might surmise from the screenshots, this is more than a dream, it&#8217;s a pastel colored reality.<\/p>\n<p>What are the screenshots showing?<\/p>\n<ol>\n<li>The logger hierarchy. \u00a0The colored bits are &#8220;named things&#8221;. \u00a0The test framework has the concept of <em>things<\/em>, <em>actors<\/em>, and <em>loggers<\/em>. \u00a0Each actor corresponds to exactly one logger and is the object on which tests specify their expectations. \u00a0<em>Actors<\/em> can be owned by other actors, resulting in a hierarchy we call a family. \u00a0Each family gets tagged with a distinct identifier that allows us to associate a color with them. \u00a0<em>Things<\/em>\u00a0provide a human name to a (hopefully) unique string. \u00a0<em>Things<\/em>\u00a0can be owned by <em>actors<\/em>\u00a0and get tagged with the family name and so can be colorized. \u00a0In the logger hierarchy, the stuff to the right of the colon is the semantic name of the logger. \u00a0So &#8220;clientConn:\u00a0<em>A client<\/em> to <em>X longterm<\/em>at endpoint blah&#8221; is really (under the hood) an array of strings where &#8220;A client&#8221; is actually the crypto key so named. \u00a0There are two colors because the connection is naming both its own identifying crypto key and the server&#8217;s crypto key it is trying to talk to.<\/li>\n<li>An example of the display of log entries. \u00a0Each logger gets its own column to display its entries in. \u00a0The header shows the name of the logger and is colored based on that logger&#8217;s family. \u00a0The colors were not shown in the logger hierarchy because I think it would end up too busy. \u00a0Each entry is timestamped with the number of milliseconds since the start of the test. \u00a0The event names are arbitrarily in blue to help delineate them from other semantic classes. \u00a0For example, &#8220;handleMsg&#8221; is a call-type. \u00a0The &#8220;obj&#8221; bits with the dotted stuff under it means something we can click on to see more of. \u00a0The events being shown are part of a <a href=\"http:\/\/curvecp.org\/\">CurveCP<\/a>-style connection establishment.<\/li>\n<li>Similar to the previous screenshot, but here you can see <em>named thing<\/em>\u00a0resolution in play with arguments that are strings.<\/li>\n<li>And if we click on one of those &#8220;obj&#8221; bits, we get a nice nested table display of the object. \u00a0As you can see from the pretty colors, named thing resolution is also in play. \u00a0You can also see crypto keys I did not name and which accordingly look like gibberish. \u00a0It is probably worth noting that some developer participation is required to make sure to put toJSON() implementations on all the complex objects that are exposed to the logger to make sure we don&#8217;t try and serialize huge swathes of object graph. \u00a0While this is a &#8220;don&#8217;t break the system&#8221; requirement, it also makes it easy to expose the useful bits of information for debugging.<\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-779\" title=\"popup-send-to-all-spgm\" src=\"http:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2011\/07\/popup-send-to-all-spgm.png\" alt=\"\" width=\"515\" height=\"263\" srcset=\"https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2011\/07\/popup-send-to-all-spgm.png 515w, https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2011\/07\/popup-send-to-all-spgm-300x153.png 300w\" sizes=\"auto, (max-width: 515px) 100vw, 515px\" \/><\/p>\n<p>If you would like to see the actual log display UI for yourself on the log from the screenshots (and can stomach it fetching 2+ MiB of compressed JSON), you can see it at\u00a0<a href=\"https:\/\/clicky.visophyte.org\/examples\/arbpl-loggest\/20110712\/\">https:\/\/clicky.visophyte.org\/examples\/arbpl-loggest\/20110712\/<\/a>. \u00a0While the logs normally live on the ArbitraryPushlog (<a href=\"http:\/\/www.visophyte.org\/blog\/2011\/03\/22\/arbitrarypushlog-now-more-realtimey-likes-wide-screens-humans\/\">ArbPL<\/a>) server, links to it are currently not stable because its all-in-one hbase datastore keeps self-destructing. \u00a0I baked ArbPL into a standalone form that lives at that URL and so should ideally not break so much. \u00a0Fingers-crossed.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Understanding what is going on inside software can be very hard, even for the developers most familiar with the software. \u00a0During my time working on Thunderbird I used a variety of techniques to try and peer inside: printf\/dump\/console.log, debuggers, execution &hellip; <a href=\"https:\/\/www.visophyte.org\/blog\/2011\/07\/12\/new-adventures-in-rich-execution-logs-for-debugging-and-program-understanding\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[7,8],"tags":[106,108],"class_list":["post-775","post","type-post","status-publish","format-standard","hentry","category-debugging","category-program-execution","tag-arbpl","tag-loggest"],"_links":{"self":[{"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/posts\/775","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/comments?post=775"}],"version-history":[{"count":16,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/posts\/775\/revisions"}],"predecessor-version":[{"id":796,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/posts\/775\/revisions\/796"}],"wp:attachment":[{"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/media?parent=775"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/categories?post=775"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/tags?post=775"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}