{"id":68,"date":"2007-11-01T03:19:19","date_gmt":"2007-11-01T08:19:19","guid":{"rendered":"http:\/\/www.visophyte.org\/blog\/2007\/11\/01\/some-email-analysis-for-some-email-visualization\/"},"modified":"2009-04-01T08:24:58","modified_gmt":"2009-04-01T13:24:58","slug":"some-email-analysis-for-some-email-visualization","status":"publish","type":"post","link":"https:\/\/www.visophyte.org\/blog\/2007\/11\/01\/some-email-analysis-for-some-email-visualization\/","title":{"rendered":"some email analysis for some email visualization"},"content":{"rendered":"<p>An attempt to apply <a href=\"http:\/\/code.google.com\/p\/openhtmm\/\">hidden topic Markov models<\/a> to e-mail to perform topic analysis has morphed into simply deriving (aggregate) word-frequency information for <a href=\"http:\/\/en.wikipedia.org\/wiki\/Tfidf\">TF-IDF<\/a> purposes.  The e-mails I attempted to analyze from my corpus appear to have simply been too short and too few to pull a rabbit out of the (algorithmic) hat.  (I only threw in e-mails exchanged among my &#8216;village&#8217;-tagged contacts, as previously visualized.)<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2007\/11\/poor-mans-themail-but-hey-first-pass.png\" alt=\"poor-mans-themail-but-hey-first-pass.png\" \/><\/p>\n<p>Luckily, there&#8217;s a lot you can do with such information.  (In fact, I ended up using the word-frequency info to try to normalize out e-mail signatures, since I didn&#8217;t feel like doing the right thing for signatures at the time.)  The bad news is that I am not doing anything polished or good with the info yet.<\/p>\n<p>The above is a quick proof-of-it-kinda-works that apes <a href=\"http:\/\/alumni.media.mit.edu\/~fviegas\/projects\/themail\/study\/index.htm\">Themail<\/a>&#8217;s monthly-words concept.  If you&#8217;re not familiar with Themail, click the link and read the PDF.  
It is\/was a covetable research prototype that let &#8216;you&#8217; explore your e-mail history.  It&#8217;s not available for download, hence &#8216;was&#8217;, and was only ever available to participating study subjects, hence &#8216;you&#8217;.  The good news is that, as always, you can download my hacked-up version of posterity and my visterity plugin.  I wouldn&#8217;t try using them if I were you, though.<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2007\/11\/conv-index-with-terms-as-databased.png\" alt=\"conv-index-with-terms-as-databased.png\" \/><\/p>\n<p>The second screenshot is my Inbox with the &#8216;best&#8217;-scoring keyword (using traditional TF-IDF, not Themail&#8217;s revised metrics) displayed for each message for which word-histogram information is available.  Since I only ran the processing code against a subset of my contacts, only messages involving those people have a keyword displayed.<\/p>\n<p>I&#8217;m going to try to pull my old pre-Gmail e-mail into the system to get some more (personal) data to work with.  Or, if you are not a spammer, e-mail me at sombrero@alum.mit.edu so I have some more correspondence.  Conversations about why the Pet Shop Boys are the greatest band ever are preferable.  Eventually I&#8217;ll try to pull in my Gaim\/Pidgin logs, which would be more useful, but that&#8217;s arguably a different data case with special needs, and I&#8217;m already spread pretty thin focus-wise, so that will have to wait.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>An attempt to apply hidden topic Markov models to e-mail to perform topic analysis has morphed into simply deriving (aggregate) word-frequency information for TF-IDF purposes. 
The e-mails I attempted to analyze from my corpus appear to simply have been too &hellip; <a href=\"https:\/\/www.visophyte.org\/blog\/2007\/11\/01\/some-email-analysis-for-some-email-visualization\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[5,12,4],"tags":[127,44,39],"class_list":["post-68","post","type-post","status-publish","format-standard","hentry","category-email","category-posterity","category-visualizing","tag-posterity","tag-tfidf","tag-visophyte"],"_links":{"self":[{"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/posts\/68","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/comments?post=68"}],"version-history":[{"count":1,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/posts\/68\/revisions"}],"predecessor-version":[{"id":249,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/posts\/68\/revisions\/249"}],"wp:attachment":[{"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/media?parent=68"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/categories?post=68"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/tags?post=68"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}