{"id":363,"date":"2009-08-03T20:54:59","date_gmt":"2009-08-04T04:54:59","guid":{"rendered":"http:\/\/www.visophyte.org\/blog\/?p=363"},"modified":"2009-08-03T20:54:59","modified_gmt":"2009-08-04T04:54:59","slug":"using-vmware-recordreplay-and-vprobes-for-low-time-distortion-performance-profiling","status":"publish","type":"post","link":"https:\/\/www.visophyte.org\/blog\/2009\/08\/03\/using-vmware-recordreplay-and-vprobes-for-low-time-distortion-performance-profiling\/","title":{"rendered":"Using VMWare Record\/Replay and VProbes for low time-distortion performance profiling"},"content":{"rendered":"<p><a href=\"http:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2009\/08\/profile-performance-graph-enumerateProps.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-thumbnail wp-image-365\" title=\"profile-performance-graph-enumerateProps\" src=\"http:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2009\/08\/profile-performance-graph-enumerateProps-600x302.png\" alt=\"profile-performance-graph-enumerateProps\" width=\"600\" height=\"302\" srcset=\"https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2009\/08\/profile-performance-graph-enumerateProps-600x302.png 600w, https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2009\/08\/profile-performance-graph-enumerateProps-300x151.png 300w, https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2009\/08\/profile-performance-graph-enumerateProps-1024x515.png 1024w, https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2009\/08\/profile-performance-graph-enumerateProps.png 1060w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p>The greatest problem with performance profiling is getting as much information as possible while affecting the results as little as possible.\u00a0  For my work on <a href=\"http:\/\/www.visophyte.org\/blog\/tag\/pecobro\/\">pecobro<\/a> I used mozilla&#8217;s JavaScript DTrace probes.\u00a0 Because the probes are limited to notifications of all function invocations\/returns with no discretion and there is no support for JS backtraces, the impact on performance was heavy.\u00a0 Although I have never seriously entertained using <a href=\"http:\/\/code.google.com\/p\/chronicle-recorder\/\">chronicle-recorder<\/a> (via <a href=\"http:\/\/www.visophyte.org\/blog\/tag\/chroniquery\/\">chroniquery<\/a>) for performance investigations, it is a phenomenal tool and it would be fantastic if it were usable for this purpose.<\/p>\n<p>VMware introduced with Workstation <a href=\"http:\/\/blogs.vmware.com\/sherrod\/2007\/04\/the_amazing_vm_.html\">6<\/a>\/<a href=\"http:\/\/blogs.vmware.com\/vmtn\/2008\/08\/introducing-rep.html\">6.5<\/a> the ability to efficiently record VM execution by recording the non-deterministic parts of VM execution.\u00a0 When you hit the record button it takes a snapshot and then does its thing.\u00a0 For a 2 minute execution trace where Thunderbird is started up and gloda starts indexing and adaptively targets for 80% cpu usage, I have a 1G memory snapshot (the amount of memory allocated to the VM), a 57M vmlog file, and a 28M vmsn file.\u00a0 There is also and a 40M disk delta file (against the disk snapshot), but I presume that&#8217;s a side effect of the execution rather than a component of it.<\/p>\n<p>The record\/replay functionality is the key to being able to analyze performance while minimizing the distortion of the data-gathering mechanisms.\u00a0 There are apparently a lot of other solutions in the pipeline, many of them open source.\u00a0 VMware peeps <a href=\"http:\/\/sourceforge.net\/mailarchive\/forum.php?thread_name=BAY103-W4642373BA0DDCC8326A9A0AA420%40phx.gbl&amp;forum_name=valgrind-developers\">apparently also created a record\/replay-ish mechanism<\/a> for <a href=\"http:\/\/valgrind.org\/\">valgrind<\/a>, valgrind-rr, which <a href=\"http:\/\/weblogs.mozillazine.org\/roc\/\">roc<\/a> has thought about <a href=\"http:\/\/weblogs.mozillazine.org\/roc\/archives\/2008\/10\/interesting_dev.html\">leveraging for chronicle-recorder<\/a>.\u00a0 I have also heard of Xen solutions to the problem, but am not currently aware of any usable solutions today.\u00a0 And of course, there are many precursors to VMware&#8217;s work, but this blog post is not a literature survey.<\/p>\n<p>There are 3 ways to get data out of a VM under replay, only 2 of which are usable for my purposes.<\/p>\n<ol>\n<li>Use gdb\/the gdb remote target protocol.\u00a0 The VMware server opens up a port that you can attach to.\u00a0 The server has some built-in support to understand linux processes if you spoon feed it some critical offsets.\u00a0 Once you do that, &#8220;info threads&#8221; lists every process in the image as a thread which you can attach to.\u00a0 If you do the dance right, gdb provides perfect back-traces and you can set breakpoints and generally do your thing.\u00a0 You can even rewind execution if you want, but since that means restoring state at the last checkpoint and running execution forward until it reaches the right spot, it&#8217;s not cheap.\u00a0 In contrast, chronicle-recorder can run (process) time backwards, albeit at a steep initial cost.<\/li>\n<li>Use <a href=\"http:\/\/x86vmm.blogspot.com\/2007\/09\/presenting-vprobes.html\">VProbes<\/a>.\u00a0 Using a common analogy, dtrace is like a domesticated assassin black bear that comes from the factory understanding English and knowing how to get you a beer from the fridge as well as off your enemies.\u00a0 VProbes, in contrast, is a grizzly bear that speaks no English.\u00a0 Assuming you can convince it to go after your enemies, it will completely demolish them.\u00a0 And you can probably teach it to get you a beer too, it just takes a lot more effort.<\/li>\n<li>Use <a href=\"http:\/\/blogs.vmware.com\/vmtn\/2008\/06\/introducing-vas.html\">VAssert<\/a>.\u00a0 Just like asserts only happen in debug builds, VAsserts only happen during replay (but not during recording).\u00a0 Except for the requirement that you think ahead to VAssert-enable your code, it&#8217;s awesome because, like static dtrace probes, you can use your code that already understands your code rather than trying to wail on things from outside using gdb or the like.\u00a0 This one was not an option because it is Windows only as of WS 6.5.\u00a0 (And Windows was not an option because building mozilla in a VM is ever so slow, and, let&#8217;s face it, I&#8217;m a linux kind of guy.\u00a0 At least until someone buys me a solid gold house and a rocket car.)<\/li>\n<\/ol>\n<p><a href=\"http:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2009\/08\/profile-performance-graph-callbackDriver-doubleClicked.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-thumbnail wp-image-367\" title=\"profile-performance-graph-callbackDriver-doubleClicked\" src=\"http:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2009\/08\/profile-performance-graph-callbackDriver-doubleClicked-600x414.png\" alt=\"profile-performance-graph-callbackDriver-doubleClicked\" width=\"600\" height=\"414\" srcset=\"https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2009\/08\/profile-performance-graph-callbackDriver-doubleClicked-600x414.png 600w, https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2009\/08\/profile-performance-graph-callbackDriver-doubleClicked-300x207.png 300w, https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2009\/08\/profile-performance-graph-callbackDriver-doubleClicked-1024x708.png 1024w, https:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2009\/08\/profile-performance-graph-callbackDriver-doubleClicked.png 1387w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p>My first step in this direction has been using a combination of #1 and #2 to get javascript backtraces using a timer-interval probe.\u00a0 The probe roughly does the following:<\/p>\n<ul>\n<li>Get a pointer to the current linux kernel task_struct:\n<ul>\n<li>Assume we are uniprocessor and retrieve the value of x86_hw_tss.sp0 from the TSS struct for the first processor.<\/li>\n<li>Now that we know the per-task kernel stack pointer, we can find a pointer to the task_struct at the base of the page.<\/li>\n<\/ul>\n<\/li>\n<li>Check if the name of our task is &#8220;thunderbird-bin&#8221; and bail if it is not.<\/li>\n<li>Pull the current timestamp from the linux kernel maintained <em>xtime<\/em>.\u00a0 Ideally we could use VProbe&#8217;s <em>getsystemtime<\/em> function, but it doesn&#8217;t seem to work and\/or is not well defined.\u00a0 Our goal is to have a reliable indicator of what the real time is at this stage in the execution, because with a rapidly polling probe our execution will obviously be slower than realtime.\u00a0 <em>xtime<\/em> is pretty good for this, but ticks at 10ms out of box (Ubuntu 9.04 i386 VM-targeted build), which is a rather limited granularity.\u00a0 Presumably we can increase its tick rate, but not without some additional (though probably acceptable) time distortion.<\/li>\n<li>Perform a JS stack dump:\n<ul>\n<li>Get XPConnect&#8217;s context for the thread.\n<ul>\n<li>Using information from gdb on where XPCPerThreadData::gTLSIndex is, load the tls slot.\u00a0 (We could also just directly retrieve the tls slot from gdb.)<\/li>\n<li>Get the NSPR thread private data for that TLS slot.\n<ul>\n<li>Using information from gdb on where pt_book is located, get the pthread_key for NSPR&#8217;s per-thread data.<\/li>\n<li>Using the current task_struct from earlier, get the value of the GS segment register by looking into tls0_base and un-scrambling it from its hardware-specific configuration.<\/li>\n<li>Use the pthread_key and GS to traverse the pthread structure and then the NSPR structure&#8230;<\/li>\n<\/ul>\n<\/li>\n<li>Find the last XPCJSContextInfo in the nsTArray in the XPCJSContextStack.<\/li>\n<\/ul>\n<\/li>\n<li>Pull the JSContext out, then get its JSStackFrame.<\/li>\n<li>Recursively walk the frames (no iteration), manually\/recursively (ugh) &#8220;converting&#8221; the 16-bit characters into 8-bit strings through violent truncation and dubious use of sprintf.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>The obvious-ish limitation is that by relying on XPConnect&#8217;s understanding of the JS stack, we miss out on the most specific pure interpreter stack frames at any given time.\u00a0 This is mitigated by the fact that XPConnect is like air to the Thunderbird code-base and that we still have the functions higher up the call stack.\u00a0 This can also presumably be addressed by detecting when we are in the interpreter code and poking around.\u00a0 It&#8217;s been a while since I&#8217;ve been in that part of SpiderMonkey&#8217;s guts&#8230; there may be complications with fast natives that could require clever stack work.<\/p>\n<p>This blog post is getting rather long, so let&#8217;s just tie this off and say that I have extended <a href=\"http:\/\/www.visophyte.org\/blog\/tag\/doccelerator\/\">doccelerator<\/a> to be able to parse the trace files, spitting the output into its own CouchDB database.\u00a0 Then doccelerator is able to expose that data via <a href=\"http:\/\/www.kylescholz.com\/blog\/\">Kyle Scholz<\/a>&#8216;s <a href=\"http:\/\/code.google.com\/p\/jsviz\/\">JSViz<\/a> in an interactive force-directed graph that is related back to the documentation data.\u00a0 The second screenshot demonstrates that double-clicking on the (blue) node that is the source of the tooltip brings up our documentation on GlodaIndexer.callbackDriver.\u00a0 <a href=\"http:\/\/hg.mozilla.org\/users\/bugmail_asutherland.org\/doccelerator\/\">doccelerator hg repo<\/a>.\u00a0 <a href=\"http:\/\/hg.mozilla.org\/users\/bugmail_asutherland.org\/tb-test-help\/file\/edd5e8a44ad3\/jsstack.emt\">vprobe emmett script<\/a> in <a href=\"http:\/\/hg.mozilla.org\/users\/bugmail_asutherland.org\/tb-test-help\/\">hg repo<\/a>.<\/p>\n<p><a href=\"http:\/\/clicky.visophyte.org\/examples\/live\/doccelerator\/_design\/doccelerator\/index.html#perfTop=perfish\">See a live demo here<\/a>.\u00a0 It will eat your cpu although it will eventually back off once it feels that layout has converged.\u00a0 You should be able to drag nodes around.\u00a0 You should also be able to double-click on nodes and have the documentation for that function be shown *if it is available*.\u00a0 We have no mapping for native frames or XBL stuff at this time.\u00a0 Depending on what other browsers do when they see JS 1.8 code, it may not work in non-Firefox browsers.\u00a0 (If they ignore the 1.8 file, all should be well.)\u00a0 I will ideally fix that soon by adding an explicit extension mechanism.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The greatest problem with performance profiling is getting as much information as possible while affecting the results as little as possible.\u00a0 For my work on pecobro I used mozilla&#8217;s JavaScript DTrace probes.\u00a0 Because the probes are limited to notifications of &hellip; <a href=\"https:\/\/www.visophyte.org\/blog\/2009\/08\/03\/using-vmware-recordreplay-and-vprobes-for-low-time-distortion-performance-profiling\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[7,8,6,4],"tags":[29,64,48,73,26,72,71,69,70],"class_list":["post-363","post","type-post","status-publish","format-standard","hentry","category-debugging","category-program-execution","category-thunderbird","category-visualizing","tag-chronicle","tag-doccelerator","tag-dtrace","tag-jsviz","tag-pecobro","tag-performance","tag-profiling","tag-vmware","tag-vprobes"],"_links":{"self":[{"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/posts\/363","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/comments?post=363"}],"version-history":[{"count":8,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/posts\/363\/revisions"}],"predecessor-version":[{"id":373,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/posts\/363\/revisions\/373"}],"wp:attachment":[{"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/media?parent=363"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/categories?post=363"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/tags?post=363"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}