Skip to content

a systemtap script to attribute JS string memory costs to the creating JS functions

----- js_GC data pages: 96731 total vm pages: 234849
   ###      KiB
 11376      839 BF_format                                        log4moz.j:487
 15906      725 mime_emitter_writeBody                           jsmimeemitter.j:381
   148      576 MimeMessage_coerceBodyToPlaintext                mimemsg.j:341
  4460      492 gloda_ds_queryFromQuery                          datastore.j:2988
  9655      387 <MYSTERIOUS>                                     <BEYOND THE VEIL>:0
   911      237 gloda_content_quoted                             connotent.j:267
  1723      144 gloda_indexMessage                               index_msg.j:2597
   187      141 gloda_content_content                            connotent.j:290
   170      108 gloda_content_getContent                         connotent.j:166
  1917       73 gloda_identity_toString                          datamodel.j:785
  1775       69 Log4Moz_enumerateProps                           log4moz.j:220
value |-------------------------------------------------- count
    0 |@                                                    366
    1 |@@@@                                                1371
    2 |@@@@@                                               1584
    4 |@@@@@@@@@@@@@@@@@@@@@@@@                            6863
    8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@          11784
   16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@      12794
   32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  14016
   64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                      8662
  128 |@@@@@@@@@@@@@@                                      4093
  256 |@@@@@                                               1567
  512 |@                                                    309
 1024 |                                                     105
 2048 |                                                      58
 4096 |                                                      28
 8192 |                                                      17
16384 |                                                      12
32768 |                                                       0
65536 |                                                       0

The script.  It tells you what JS code has been creating strings since the last GC and the overall distribution of the marginal costs of those strings, leaving out puny code that allocates less than 64k’s worth.  Marginal costs because spidermonkey has support for dependent strings.  “a += b” will actually result in the storage of a being realloc’d with a pointing into the storage of a’.  I forgot to check for the js_ConcatStrings case at first and, let me tell you, the stats were not pretty.  Poor mime_emitter_writeBody was accused of allocating 11 gigabytes of strings in one intra-GC cycle!

What is interesting about this script is that it is actually pulling all of the function name/script name/line number information directly out of a JSContext object.  What is exciting is that it was pretty easy to do thanks to systemtap’s understanding of DWARF debug symbols.  This is in severe comparison to my efforts using VProbes where there was no such support and I had to write a gdbpython script to begin to approach the same thing.  Also, because systemtap’s guru mode allows you to bail into real C code when you need, I was able to write a function that truncates the jschar (16-bit unicode) strings that make up the function names into ASCII that is both straightforward and efficient.  It also allowed for a quick/easy/fast basename() implementation too.

It is worth noting that spidermonkey does have built-in (conditionally compiled) static probe points originally created for DTrace and that they do provide more generic object lifetime hooks.  Although they were created for DTrace, systemtap is also able to use them.  The limitations with those probes is that once any probe is active using a hook, a non-trivial cost is incurred because filtering can only be performed inside the probe and you may receive orders of magnitude more calls than you actually care about.  The cost is worst for probes that provide function names since that requires a call to js_GetStringBytes every time (although there is some caching.)  Keep in mind that I’m not saying the systemtap userspace probes are free, but since they can be placed at arbitrary points, you can have probes that fire only in the cases you actually care about, only incur data-processing costs when desired, and are disturbingly easy to implement since you can just reach into the data structures you care about.

Since everyone loves an implementation fight, DTrace, great and magical though it is, is basically incapable of pulling off the JS stack-walking trick that systemtap makes possible.  Unless DTrace has grown these features since I last checked, it lacks the DWARF magic to do such things easily and the language intentionally avoids the control-flow constructs that would allow it to do anything quite so complex.  And you can entirely forget about arbitrary C code for the unicode string truncation…

This post is brought to you by efforts to decrease the memory footprint of Thunderbird when gloda’s indexer is active.  Because Thunderbird loves you.  But it loves your memory more.  For now.  That’s what we’re trying to fix.

{ 3 } Comments

  1. Dan Mosedale | April 8, 2010 at 9:13 am | Permalink

    Nifty! What are high-level conclusions were you able to draw from this data, if any?
    Does log4moz always generate that many pages, or is that only in the non-default case of having gloda logging preffed on?

  2. Andrew Sutherland | April 8, 2010 at 3:37 pm | Permalink

    Conclusions are on the motivating bug:
    https://bugzilla.mozilla.org/show_bug.cgi?id=557001#c10

    That is the non-default case of gloda logging at maximum. Since we go out of our way to avoid making calls to log4moz when debugging is not enabled, I would expect much less log4moz traffic in that case.

    I should note that the page counts are the overall page counts for the process at the time of the GC (before it runs) and not deltas or anything like that.

  3. db48x | April 11, 2010 at 3:48 am | Permalink

    That’s superb. congrats