{"id":586,"date":"2010-10-12T10:52:38","date_gmt":"2010-10-12T18:52:38","guid":{"rendered":"http:\/\/www.visophyte.org\/blog\/?p=586"},"modified":"2010-10-12T12:22:44","modified_gmt":"2010-10-12T20:22:44","slug":"fighting-non-deterministic-xpcshell-unit-tests-through-causality-tracking-with-systemtap-step-1","status":"publish","type":"post","link":"https:\/\/www.visophyte.org\/blog\/2010\/10\/12\/fighting-non-deterministic-xpcshell-unit-tests-through-causality-tracking-with-systemtap-step-1\/","title":{"rendered":"fighting non-deterministic xpcshell unit tests through causality tracking with systemtap; step 1"},"content":{"rendered":"<p><a href=\"http:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2010\/10\/chewtap-1432-nothing-selected.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-thumbnail wp-image-592\" title=\"chewtap-1432-nothing-selected\" src=\"http:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2010\/10\/chewtap-1432-nothing-selected-600x600.png\" alt=\"\" width=\"600\" height=\"600\" srcset=\"https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2010\/10\/chewtap-1432-nothing-selected-600x600.png 600w, https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2010\/10\/chewtap-1432-nothing-selected-300x300.png 300w, https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2010\/10\/chewtap-1432-nothing-selected-1024x1024.png 1024w, https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2010\/10\/chewtap-1432-nothing-selected.png 1032w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p>It&#8217;s a story as old as time itself.\u00a0 You write a unit test.\u00a0 It works for you.\u00a0 But the evil spirits in the tinderboxes cause the test to fail.\u00a0 The trick is knowing that evil spirits are a lot like genies.\u00a0 When you ask for a wish, genies will try and screw you over by choosing ridiculous values wherever you forgot to constrain things.\u00a0 For example, I got my dream job, but now I live in Canada.\u00a0 The tinderbox evil spirits do the same thing, but usually by causing pathological thread scheduling.<\/p>\n<p>This happened to me the other day and a number of our other tests are running slower than they should, so I decided it was time for another round of incremental amortized tool-building.\u00a0 Building on my <a href=\"http:\/\/www.visophyte.org\/blog\/tag\/systemtap\/\">previous systemtap applied to mozilla adventures<\/a> by porting things to the new JS representations in mozilla-2.0 and adding some new trace events I can now generate traces that:<\/p>\n<ul>\n<li>Know when the event loop is processing an event.\u00a0 Because we reconstruct a nested tree from our trace information and we have a number of other probes, we can also attribute the event to higher-level concepts like timer callbacks.<\/li>\n<li>Know when a new runnable is scheduled for execution on a thread or a new timer firing is scheduled, etc.\u00a0 To help understand why this happened we emit a JS backtrace at that point.\u00a0 (We could also get a native backtrace cheaply, or even a unified backtrace with some legwork.)<\/li>\n<li>Know when high-level events occur during the execution of the unit test.\u00a0 We hook the dump() implementations in the relevant contexts (xpcshell, JS components\/modules, sandboxes) and then we can listen in on all the juicy secrets the test framework shouts into the wind.\u00a0 What is notable about this choice of probe point is that it:\n<ul>\n<li>is low frequency, at least if you are reasonably sane about your output.<\/li>\n<li>provides a useful correlation between what it is going on under the hood with something that makes sense to the developer.<\/li>\n<li>does not cause the JS engine to need to avoid tracing or start logging everything that ever happens.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Because we know when the runnables are created (including what runnable they live inside of) and when they run, we are able to build what I call a causality graph because it sounds cool.\u00a0 Right now this takes the form of a hierarchical graph.\u00a0 Branches form when a runnable (or the top-level) schedules more than one runnable during the execution of a single runnable.\u00a0 The dream is to semi-automatically (heuristics \/ human annotations may be required) transform the hierarchical graph into one that merges these branches back into a single branch when appropriate.\u00a0 Collapsing linear runs into a single graph node is also desired, but easy.\u00a0 Such fanciness may not actually be required to fix test non-determinism, of course.<\/p>\n<p>The <a href=\"http:\/\/vis.stanford.edu\/protovis\/\">protovis-based<\/a> visualization above has the following exciting bullet points describing it:<\/p>\n<ul>\n<li>Time flows vertically downwards.\u00a0 Time in this case is defined by a globally sequential counter incremented for each trace event.\u00a0 Time used to be the number of nanoseconds since the start of the run, but there seemed to somehow be clock skew between my various processor cores that live in a single chip.<\/li>\n<li>Horizontally, threads are spaced out and within those threads the type of events are spaced out.\n<ul>\n<li>The weird nested alpha grey levels convey nesting of JS_ExecuteScript calls which indicates both xpcshell file loads and JS component\/module top-level executions as a result of initial import.\u00a0 If there is enough vertical space, labels are shown, otherwise they are collapsed.<\/li>\n<\/ul>\n<\/li>\n<li>Those sweet horizontal bands convey the various phases of operation and have helpful labels.<\/li>\n<li>The nodes are either events caused by top-level runnable being executed by the event loop or important events that merit the creation of synthetic nodes in the causal graph.\u00a0 For example, we promote the execution of a JS file to its own link so we can more clearly see when a file caused something to happen.\u00a0 Likewise, we generate new links when analysis of dump() output tells us a test started or stopped.<\/li>\n<li>The blue edges are expressing the primary causal chain as determined by the dump() analysis logic.\u00a0 If you are telling us a test started\/ended, it only follows that you are on the primary causal chain.<\/li>\n<li>If you were viewing it in a web browser, you could click on the nodes and it would console.log them and then you could see what is actually happening in there.\u00a0 If you hovered over nodes they would also highlight their ancestors and descendents in various loud shades of red.<\/li>\n<\/ul>\n<p><a href=\"http:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2010\/10\/chewtap-1432-init-chain-selected.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-thumbnail wp-image-593\" title=\"chewtap-1432-init-chain-selected\" src=\"http:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2010\/10\/chewtap-1432-init-chain-selected-598x600.png\" alt=\"\" width=\"598\" height=\"600\" srcset=\"https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2010\/10\/chewtap-1432-init-chain-selected-598x600.png 598w, https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2010\/10\/chewtap-1432-init-chain-selected-300x300.png 300w, https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2010\/10\/chewtap-1432-init-chain-selected-1021x1024.png 1021w, https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2010\/10\/chewtap-1432-init-chain-selected.png 1032w\" sizes=\"auto, (max-width: 598px) 100vw, 598px\" \/><\/a><\/p>\n<ul>\n<li>The specific execution being visualized had a lot of asynchronous mozStorage stuff going on.\u00a0 (The right busy thread is the asynchronous thread for the database.)\u00a0 The second image has the main init chain hovered, resulting in all the async fallout from the initialization process being highlighted in red.\u00a0 At first glance it&#8217;s rather concerning that the initialization process is still going on inside the first real test.\u00a0 Thanks to the strict ordering of event queues, we can see that everything that happens on the primary causal chain after it hops over to the async thread and back is safe because of that ordering.\u00a0 The question is whether bad things could happen prior to that joining.\u00a0 The answer?\u00a0 In another blog post, I fear.\n<ul>\n<li>Or &#8220;probably&#8221;, given that the test has a known potential intermittent failure.\u00a0 (The test is in dangerous waters because it is re-triggering synchronous database code that normally is only invoked during the startup phase before references are available to the code and before any asynchronous dispatches occur.\u00a0 All the other tests in the directory are able to play by the rules and so all of their logic should be well-ordered, although we might still expect the initialization logic to likewise complete in what is technically a test phase.\u00a0 Since the goal is just to make tests deterministic (as long as we do not sacrifice realism), the simplest solution may just be to force the init phase to complete before allowing the tests to start.\u00a0 The gloda support logic already has an unused hook capable of accomplishing this that I may have forgotten to hook up&#8230;<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Repo is my usual <a href=\"http:\/\/hg.mozilla.org\/users\/bugmail_asutherland.org\/tb-test-help\/\">systemtapping repo<\/a>.\u00a0 Things you might type if you wanted to use the tools follow:<\/p>\n<ul>\n<li>make SOLO_FILE=test_name.js EXTRA_TEST_ARGS=&#8221;&#8211;debugger \/path\/to\/tb-test-help\/systemtap\/chewchewwoowoo.py &#8211;debugger-args &#8216;\/path\/to\/tb-test-help\/systemtap\/mozperfish\/mozperfish.stp &#8211;&#8216;&#8221; check-one<\/li>\n<li>python \/path\/to\/tb-test-help\/chewchewwoowoo.py &#8211;re-run=\/tmp\/chewtap-##### mozperfish\/mozperfish.stp \/path\/to\/mozilla\/dist\/bin\/xpcshell<\/li>\n<\/ul>\n<p>The two major gotchas to be aware of are that you need to: a) make your xpcshell build with jemalloc since I left some jemalloc specific memory probes in there, and b) you may need to run the first command several times because systemtap has some seriously non-deterministic dwarf logic going on right now where it claims that it doesn&#8217;t believe that certain types are actually unions\/structs\/whatnot.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It&#8217;s a story as old as time itself.\u00a0 You write a unit test.\u00a0 It works for you.\u00a0 But the evil spirits in the tinderboxes cause the test to fail.\u00a0 The trick is knowing that evil spirits are a lot like &hellip; <a href=\"https:\/\/www.visophyte.org\/blog\/2010\/10\/12\/fighting-non-deterministic-xpcshell-unit-tests-through-causality-tracking-with-systemtap-step-1\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[7,8,4],"tags":[56,76],"class_list":["post-586","post","type-post","status-publish","format-standard","hentry","category-debugging","category-program-execution","category-visualizing","tag-protovis","tag-systemtap"],"_links":{"self":[{"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/posts\/586","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/comments?post=586"}],"version-history":[{"count":9,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/posts\/586\/revisions"}],"predecessor-version":[{"id":597,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/posts\/586\/revisions\/597"}],"wp:attachment":[{"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/media?parent=586"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/categories?post=586"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/tags?post=586"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}