{"id":84,"date":"2007-12-22T03:30:37","date_gmt":"2007-12-22T08:30:37","guid":{"rendered":"http:\/\/www.visophyte.org\/blog\/2007\/12\/22\/adding-stews-hackish-destructive-accumulationreduction-to-couchdb\/"},"modified":"2009-04-01T08:28:24","modified_gmt":"2009-04-01T13:28:24","slug":"adding-stews-hackish-destructive-accumulationreduction-to-couchdb","status":"publish","type":"post","link":"https:\/\/www.visophyte.org\/blog\/2007\/12\/22\/adding-stews-hackish-destructive-accumulationreduction-to-couchdb\/","title":{"rendered":"Adding stews (hackish destructive accumulation\/reduction) to CouchDB"},"content":{"rendered":"<p>As all misguidedly-lazy programmers are wont to do, I decided that it would be easier to &#8216;enhance&#8217; <a href=\"http:\/\/couchdb.com\/\">CouchDB<\/a> to meet my needs rather than to rewrite visotank to use SQLAlchemy.  Also, I wanted to understand what CouchDB was doing under the hood with views and try my hand at some Erlang.<\/p>\n<p align=\"center\"> <img decoding=\"async\" src=\"http:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2007\/12\/big-spiral.png\" alt=\"This Has Nothing To Do With Anything\" \/><\/p>\n<p>CouchDB as currently implemented maintains a lot of information for each mapped document.  There is a B-tree associated with each View Group whose keys are Document Ids and whose Values are a list of {View Id, Actual-Key-You-Mapped-In-That-View} tuples for every key mapped from that document for every view in the view group.  Next, each View has a B-tree associated with it whose keys are {Actual-Key-You-Mapped, Document Id} tuples and whose values are the Actual-Value-You-Mapped.<\/p>\n<p>This is all well and good, but is a poor fit for one of my key use-cases: reducing e-mail message traffic to date-binned summary statistics so I can render graphics.  
If I want the weekly-messages-sent count for a given &#8216;author&#8217;, <em>map(message.author, blah)<\/em> will let me filter down to messages sent by that author, but no matter what <em>blah<\/em> is, I will still get one row per message.<\/p>\n<p>Long blog post short, I have implemented a hackish first-pass reduce\/accumulate solution to my problem.  The idea is that &#8216;stews&#8217; allow you to aggregate mapped data that shares the same key.  I&#8217;m a little fuzzy on exactly what the definition of &#8216;reduce&#8217; is in the map\/reduce papers (it&#8217;s been a while, if ever), so we&#8217;ll call this &#8216;accumulate&#8217; (in the SICP\/Scheme sense). It is a hack because:<\/p>\n<ul>\n<li>It does not unify views and &#8216;stews&#8217;.  Whereas views are defined under &#8216;_design&#8217; and accessed via &#8216;_view&#8217;, stews are defined under &#8216;_pot&#8217; and accessed via &#8216;_stew&#8217;.<\/li>\n<li>Values can only be integers right now, and it&#8217;s assumed you want to add them.  (No custom JavaScript logic!)<\/li>\n<li>I have not yet dealt with modified\/removed documents.  Which is to say that if you modify or remove a stew-mapped document, your accumulated values will climb ever-skyward.<\/li>\n<li>It is in no way, shape, or form intended to be anything other than a learning experiment.  (It is my hope that <a href=\"http:\/\/damienkatz.net\/\">Damien Katz<\/a> magically solves my problems <a href=\"http:\/\/damienkatz.net\/2007\/12\/couchdb_roundup.html\">in the next release<\/a>.  Having said that, I&#8217;m not opposed to trying to actually implement a more solid feature along these lines; coding in Erlang is wicked awesome. (sounds better with a fake accent))<\/li>\n<\/ul>\n<p>It just so happens that these constraints are perfectly in line with visotank&#8217;s needs.  
With stews and otherwise limited use of views, CouchDB is less ridiculous in its view-update times, and the fully-populated (view\/stew-wise) from-scratch &#8216;messages&#8217; database tops out at 77M rather than 1.2G.<\/p>\n<p align=\"center\"> <img decoding=\"async\" src=\"http:\/\/www.visophyte.org\/blog\/wp-content\/uploads\/2007\/12\/little-spiral.png\" alt=\"This also has nothing to do with anything\" \/><\/p>\n<p>Anyways, if anyone is interested in the code (or the comments I added to the existing couch_view_group.erl logic), my bzr branch for <a href=\"http:\/\/couchdb.com\/\">CouchDB<\/a> is at:  <a href=\"http:\/\/www.visophyte.org\/rev_control\/bzr\/couchdb\/visbrero-couchdb\/\">http:\/\/www.visophyte.org\/rev_control\/bzr\/couchdb\/visbrero-couchdb\/<\/a> .  My bzr branch for <a href=\"http:\/\/code.google.com\/p\/couchdb-python\/\">couchdb-python<\/a>, adding a simple unit test for stews, is at: <a href=\"http:\/\/www.visophyte.org\/rev_control\/bzr\/couchdb-python\/visbrero\/\">http:\/\/www.visophyte.org\/rev_control\/bzr\/couchdb-python\/visbrero\/<\/a> .<\/p>\n<p><strong>Update<\/strong>!\u00a0 The bzr repository is powerful messed up, so a better choice might be my changes in patch form:\u00a0 <a href=\"http:\/\/www.visophyte.org\/rev_control\/patches\/couchdb\/visbrero-couchdb-stews-1.patch\">http:\/\/www.visophyte.org\/rev_control\/patches\/couchdb\/visbrero-couchdb-stews-1.patch<\/a><\/p>\n<p><strong>Update 2<\/strong>! 
The bzr repository accessible at <a href=\"http:\/\/clicky.visophyte.org\/rev_control\/bzr\/couchdb\/visbrero-couchdb\/\">http:\/\/clicky.visophyte.org\/rev_control\/bzr\/couchdb\/visbrero-couchdb\/<\/a> works and there&#8217;s a checkout with working copy (that you can browse) at <a href=\"http:\/\/clicky.visophyte.org\/rev_control\/bzr-checkouts\/couchdb\/visbrero-couchdb\/\">http:\/\/clicky.visophyte.org\/rev_control\/bzr-checkouts\/couchdb\/visbrero-couchdb\/<\/a> .\u00a0\u00a0 Note that these locations are not guaranteed to be valid for all time, but will be good for at least a month or two.<\/p>\n<p>I fear my (sleepy) explanation may not be sufficient, so the unit test I added to couchdb-python may speak better to this end:<\/p>\n<p><code>self.db['tom1'] = {'author': 'tom', 'subject': 'cheese'}<br \/>\nself.db['tom2'] = {'author': 'tom', 'subject': 'cats'}<br \/>\nself.db['tom3'] = {'author': 'tom', 'subject': 'mice'}<br \/>\nself.db['bob1'] = {'author': 'bob', 'subject': 'hats'}<br \/>\nself.db['jon1'] = {'author': 'jon', 'subject': 'hats'}<br \/>\nself.db['kim1'] = {'author': 'kim', 'subject': 'cats'}<br \/>\nself.db['kim2'] = {'author': 'kim', 'subject': 'cows'}<br \/>\nself.db['_pot\/test'] = {'views': {<br \/>\n'authors': 'function(doc) { map(doc.author, 1) }',<br \/>\n'subjects': 'function(doc) { map(doc.subject, 1) }'<br \/>\n}}<br \/>\nauthors = dict([(row.key, row.value) for row in self.db.view('_stew\/test\/authors')])<br \/>\nself.assertEqual(authors['tom'], 3)<br \/>\nself.assertEqual(authors['bob'], 1)<br \/>\nself.assertEqual(authors['jon'], 1)<br \/>\nself.assertEqual(authors['kim'], 2)<br \/>\nsubjects = dict([(row.key, row.value) for row in self.db.view('_stew\/test\/subjects')])<br \/>\nself.assertEqual(subjects['cheese'], 1)<br \/>\nself.assertEqual(subjects['cats'], 2)<br \/>\nself.assertEqual(subjects['mice'], 1)<br \/>\n<\/code><\/p>\n<p>Uh, the spiral visualizations have nothing to do with the post.  
They are new insofar as I have never posted them before, but they are in fact rather quite old.  They have a new aspect in that they now work with the cairo renderer, having relied upon &#8216;special&#8217; (horrible) custom renderers in the old agg backend.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As all misguidedly-lazy programmers are wont to do, I decided that it would be easier to &#8216;enhance&#8217; CouchDB to meet my needs rather than to rewrite visotank to use SQLAlchemy. Also, I wanted to understand what CouchDB was doing under &hellip; <a href=\"https:\/\/www.visophyte.org\/blog\/2007\/12\/22\/adding-stews-hackish-destructive-accumulationreduction-to-couchdb\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[15,5,11],"tags":[130],"class_list":["post-84","post","type-post","status-publish","format-standard","hentry","category-couchdb","category-email","category-software","tag-couchdb"],"_links":{"self":[{"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/posts\/84","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/comments?post=84"}],"version-history":[{"count":1,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/posts\/84\/revisions"}],"predecessor-version":[{"id":256,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/posts\/84\/revisions\/256"}],"wp:attachment":[{"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/media?parent=84"}],"wp:te
rm":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/categories?post=84"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.visophyte.org\/blog\/wp-json\/wp\/v2\/tags?post=84"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}