python-dev mailman archive thread-arc visualizations

What’s new? Mailman archive processing! Bezier segmenting for more accurate arcs (thank you Luc Masonobe)! Pseudo-thread-arcs! (Actual thread arcs as visible on the whoa-awesome-but-why-aren’t-you-released-software ReMail project would seem to more practically squash ridiculously-large arcs and of course leverage the other side.)


  • Node colors are per-author. The middle bottom thread demonstrates the need for a smarter distinct color logic, as I’m pretty confident that thread isn’t the result of one person with multiple personalities.
  • Nodes are sequenced time-wise from left-to-right with uniform (pixel-based) spacing. I tried placing them by time, but things clump up pretty uselessly. Non-linear time scaling is on my list.
  • Arc colors are based on the subject, whereas reconstruction is based on the headers. As such, arc colors in a thread may change if someone changes the subject.
  • Arc width is constant, though I was entertaining varying it based on the amount of new content in each e-mail. Of course, there’s always knobs to tweak.

Please note that the above image is actually the result of re-arranging the contents of two separate images; the layout currently used just stacks them all vertically with a (dynamic) uniform spacing. For this reason, the rightmost threads are actually from a separate run where I set the ‘chaos’ filter with a much higher bar. Oh, and those are threads from the July 2007 python-dev archive, although without labeling you have no reason to believe me.


And this is the result of sucking up January-July 2007, filtering on threads with a ‘chaos’ greater than 200. (‘chaos’ being an arbitrary and arguably incorrect term for the total distance of all nodes from their parent (less one). So a thread where every message is just a response to the one that chronologically preceded it will have a chaos of 0. A thread where every message is a response to the message before the one that chronologically preceded it will have a chaos of (n – 2) where n is the number of messages. Apply algebra, rinse, and repeat.

I’m going to try and get this going with Thunderbird reasonably soon, but it might take longer than I’d hope. Thunderbird w/pyxpcom didn’t build clean on my amd64-arch laptop, chronicle-recorder’s valgrind crashes on amd64, and the like means some busywork on the horizon.