teaser: code completion in skywriter/ajax.org code editor using jstut and narcissus

I’ve hooked jstut’s (formerly narscribblus‘) narcissus-based parser and jsctags-based abstract interpreter up to the ajax.org code editor (ace, to be the basis for skywriter, the renamed and somewhat rewritten bespin).  Ace’s built-in syntax highlighters are based on the somewhat traditional regex-based state machine pattern and have no deep understanding of JS.  The tokenizers have very limited statefulness; they are line-centric and the only state is the state of the tokenizer at the conclusion of tokenizing the previous line.  The advantage is that they will tend to be somewhat resilient in the face of syntax errors.

In contrast, narcissus is a recursive descent parser that explodes when it encounters a parse error (and takes all the state on the stack at the point of failure with it).  Accordingly, my jstut/narscribblus parser is exposed to ace through a hybrid tokenizer that uses the proper narcissus parser as its primary source of tokens and falls back to the regex state machine tokenizer for lines that the parser cannot provide tokens for.  I have thus far made some attempt at handling invalidation regions in a respectable fashion but it appears ace is pretty cavalier in terms of invalidating from the edit point to infinity, so it doesn’t really help all that much.
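
In very rough strokes, the hybrid tokenizer behaves like the sketch below.  This is a minimal illustration rather than the actual jstut class; the helper names are mine, and I am only assuming ace’s line-oriented getLineTokens(line, state)-style tokenizer contract.

    // Sketch: prefer tokens from the narcissus parse; fall back to the regex
    // state-machine tokenizer for any line the parser could not cover (for
    // example, because the parse blew up earlier in the document).
    function HybridTokenizer(parserTokensByLine, regexTokenizer) {
      this.parserTokensByLine = parserTokensByLine; // per-line tokens from narcissus
      this.regexTokenizer = regexTokenizer;         // line-centric, error-resilient
    }

    HybridTokenizer.prototype.getLineTokens = function (line, state, row) {
      var parsed = this.parserTokensByLine[row];
      if (parsed) {
        // The parser covered this line; hand its richer tokens to the editor.
        return { tokens: parsed, state: state };
      }
      // Parse failure (or a line past the failure point): regex fallback.
      return this.regexTokenizer.getLineTokens(line, state);
    };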

Whenever a successful parse occurs, the abstract interpreter is kicked off and attempts to evaluate the document.  This includes special support for CommonJS require() and CommonJS AMD define() operations.  The require(“wmsy/wmsy”) in the screenshot above actually retrieves the wmsy/wmsy module (using the RequireJS configuration), parses it using narcissus, parses the documentation blocks using jstut, performs abstract interpretation and follow-on munging, and then returns the contents of that namespace (asynchronously using promises) to the abstract interpreter for the body of the text editor.  The hybrid tokenizer does keep around a copy of the last good parse to deal with code completion in the very likely case where the intermediate stages of writing new code result in parse failures.  Analysis of the delta from the last good parse is used in conjunction with that parse to (attempt to) provide useful code completion.
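
The flow on a successful parse, and the completion path that leans on the last good parse, look roughly like this.  Every name here (abstractInterpret, deltaFromParse, suggestionsFor, and the Promises/B-style when() helper) is a stand-in for the real jstut plumbing, not its actual API:

    // Stand-in sketch: on every successful parse, stash the parse and kick off
    // the abstract interpreter; require()/define() pull in other modules (via
    // the RequireJS configuration), so the namespace comes back via a promise.
    function onGoodParse(state, parseResult) {
      state.lastGoodParse = parseResult;
      when(abstractInterpret(parseResult), function (namespace) {
        state.lastGoodNamespace = namespace;
      });
    }

    // Completion usually happens while the buffer does *not* parse, so we diff
    // the current text against the last good parse and use its semantic
    // annotations to decide what to offer.
    function completeAt(state, currentText, row, column) {
      var delta = deltaFromParse(state.lastGoodParse, currentText, row, column);
      return suggestionsFor(state.lastGoodNamespace, delta);
    }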

The net result is that we have semantic information about many of the tokens on the screen and could do fancy syntax highlighting like Eclipse does.  For example, global variables could be highlighted specially, types defined in third-party libraries could get their own color, etc.  For the purposes of code completion, we are able to determine the context surrounding the cursor and the appropriate data types to use as the basis for completion.  For example, in the first screenshot, we are able to determine that we are completing a child of “wy”, which we know to be an instance of type WmsyDomain from the wmsy namespace.  We know the children of the prototype of WmsyDomain and are able to subsequently filter on the letter “d”, which we know has been (effectively) typed based on the position of the cursor.  (Note: completion items are currently not sorted but rather shown in definition order.)
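
Once the type of the thing before the cursor is known, the completion step itself amounts to a prefix filter over that type’s known (documented) children.  A minimal illustration, with the data shapes simplified (protoChildren is not the real jstut representation):

    // "wy" resolved to an instance of WmsyDomain, so list the children of the
    // WmsyDomain prototype and keep only those matching the typed prefix ("d").
    // Results are returned in definition order; no sorting is applied yet.
    function completionsFor(typeInfo, typedPrefix) {
      return typeInfo.protoChildren.filter(function (child) {
        return child.name.indexOf(typedPrefix) === 0;
      });
    }

    // e.g. completionsFor(wmsyDomainType, "d") => [{ name: "defineWidget" }, ...]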

In the second example, we are able to determine that the cursor is in an object initializer, that the object initializer is the first argument of a call to defineWidget on “wy” (which we know about as previously described).  We accordingly know the type constraint on the object initializer and thus know the legal/documented key names that can be used.

This is not working well enough to point people at a live demo, but it is exciting enough to post teaser screenshots.  Of course, the code is always available for the intrepid: jstut/narscribblus, wmsy.  In a nutshell, you can hit “Alt-/” and the auto-completion code will try to do its thing.  It will display its results in a wmsy popup that is not unified with ace in terms of how focus is handled (wmsy’s bad).  Nothing you do will actually insert text, but if you click outside of the popup or hit escape it will at least go away.  The egregious deficiencies are likely to go away soon, but I am very aware (and everyone else should be aware) that getting this to a production-quality state you can use on multi-thousand-line files with complex control flow would likely be quite difficult (although if people document their types/signatures, maybe not so bad).  And I’m not planning to pursue that (for the time being); the goal is still interactive, editable, tutorial-style examples.  And for these, the complexity is way down low.

My thanks to the ajax.org and skywriter teams; even at this early state of external and source documentation it was pretty easy to figure out how various parts worked so as to integrate my hybrid tokenizer and hook keyboard commands up.  (Caveat: I am doing some hacky things… :))  I am looking forward to the continued evolution and improvement of an already great text editor component!

Visualizing asynchronous JavaScript promises (Q-style Promises/B)

Asynchronous JS can be unwieldy and confusing.  Specifically, callbacks can be unwieldy, especially when you introduce error handling and start chaining asynchronous operations.  So, people frequently turn to something like the deferreds from Python’s Twisted, which provide for explicit error handling and the ability for ‘callbacks’ to return yet another asynchronous operation.

In CommonJS-land, there are proposals for deferred-ish promises.  In a dangerously concise nutshell (with a code sketch just after the list), these are:

  • Promises/A: promises have a then(callback, errback) method.
  • Promises/B: the promises module has a when(value, callback, errback) helper function.
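
Roughly, the difference in calling style looks like the following.  fetchThing() is an invented placeholder that may return either a plain value or a promise:

    // Promises/A: you need an actual promise in hand to call .then() on it.
    fetchThing().then(function (value) {
      // use value
    }, function (err) {
      // handle the error
    });

    // Promises/B: the module-level when() wraps the value, so the caller does
    // not need to know whether fetchThing() returned a plain value or a promise.
    when(fetchThing(), function (value) {
      // use value
    }, function (err) {
      // handle the error
    });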

I am in the Promises/B camp because the when construct lets you not care whether value is actually a promise or not, both now and in the future.  The bad news about Promises/B is that:

  • It is currently not duck typable (but there is a mailing list proposal to support unification that I am all for) and so really only works if you have exactly one promises module in your application.
  • The implementation will make your brain implode-then-explode because it is architected for safety and to support transparent remoting.

To elaborate on the (elegant) complexity, it uses a message-passing idiom where you send your “when” request to the promise, which is then responsible for actually executing your callback or errback.  So if value is actually a value, it just invokes your callback on the value.  If value was a promise, it queues your callback until the promise is resolved.  If value was a rejection, it invokes your rejection handler.  When a callback returns a new promise, any “when”s that were targeted at the associated promise end up retargeted to the newly returned promise.  The bad debugging news is that almost every message-transmission step is forward()ed into a subsequent turn of the event loop, which results in debuggers losing a lot of context.  (Although anything that maintains linkages between the code that created a timer and the fired timer event or other causal chaining at least has a fighting chance.)
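
Stripped of the safety machinery and the event-loop forwarding, the when() dispatch described above amounts to something like this.  isPromise, isRejected, and addHandlers are placeholder names for the module-internal checks and queueing (which is also why duck typing does not currently work across promises modules):

    // Greatly simplified sketch of the Promises/B when() dispatch.
    function when(value, callback, errback) {
      if (isPromise(value)) {
        // A promise: queue the handlers; they run once it resolves or rejects.
        // If the callback itself returns a promise, pending "when"s end up
        // retargeted onto that newly returned promise.
        return value.addHandlers(callback, errback);
      }
      if (isRejected(value)) {
        // A rejection: invoke the rejection handler with the reason.
        return errback(value.reason);
      }
      // A plain value: just invoke the callback on it.
      return callback(value);
    }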

In short, promises make things more manageable, but they don’t really make things less confusing, at least not without a little help.  Some time ago I created a modified version of Kris Kowal‘s Q library implementation that:

  • Allows you to describe what a promise actually represents using human words.
  • Tracks relationships between promises (or allows you to describe them) so that you can know all of the promises that a given promise depends/depended on.
  • Completely abandons the security/safety stuff that kept promises isolated.
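
Usage ends up looking roughly like the following; the argument names and signatures here are guesses for illustration rather than pwomise.js’s actual API:

    // Hypothetical sketch: a defer() that takes a human-readable "what"
    // description, and a when() that can also be told what it represents.
    var deferred = defer("load module wmsy/wmsy");
    var processed = when(deferred.promise, function (namespace) {
      // ... do the interesting work ...
      return namespace;
    }, null, "process wmsy/wmsy namespace");
    // The "what" descriptions plus the tracked parent/child relationships are
    // what the visualization below turns into a labeled dependency tree.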

The end goal was to support debugging/understanding of code that uses promises by converting that data into something usable like a visualization.  I’ve done this now, applying it to jstut’s (soon-to-be-formerly narscribblus’) load process to help understand what work is actually being done.  If you are somehow using jstut trunk, you can invoke document.jstutVisualizeDocLoad(/* show boring? */ false) from your JS console and see such a graph in all its majesty for your currently loaded document.

The first screenshot (show boring = true) is of a case where a parse failure of the root document occurred and we display a friendly parse error screen.  The second screenshot (show boring = false) is the top bit of the successful presentation of the same document where I have not arbitrarily deleted a syntactically important line.

A basic description of the visualization:

  • It’s a hierarchical protovis indented tree.  The children of a node are the promises it depended on.  A promise that depended in parallel(-ish) on multiple promises will have multiple children.  The special case is that if we had a “when” W depending on promise X, and X was resolved with promise Y, then W gets retargeted to Y.  This is represented in the visualization as W having children X and Y, but with Y having a triangle icon instead of a circle in order to differentiate from W having depended on X and Y in parallel from the get-go.
  • The poor man’s timeline on the right-hand side shows the time-span between when the promise was created and when it was resolved.  It is not showing how long the callback function took to run, although it will fall strictly within the shown time-span.  Time-bar widths are lower bounded at 1 pixel, so the duration of something 1-pixel wide is not representative of anything other than position.
  • Nodes are green if they were resolved, yellow if they were never resolved, red if they were rejected.  Nodes are gray if the node and its dependencies were already shown elsewhere in the graph; dependencies are not shown in such a case.  This reduces redundancy in the visualization while still expressing actual dependencies.
  • Timelines are green if the promise was resolved, maroon if it was never resolved or was rejected.  If the promise was never resolved, its timeline goes all the way to the right edge.
  • Boring nodes are elided when so configured; their interesting children are spliced into their place.  A node is boring if its “what” description starts with “auto:” or “boring:”.  The when() logic automatically annotates an “auto:functionName” if the callback function has a name.
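
The boring check itself is trivial; something along these lines, with the node shape simplified:

    // A node is "boring" if its human-readable "what" description is tagged as
    // such; boring nodes are elided and their interesting children spliced in.
    function isBoring(node) {
      var what = node.what || "";
      return what.indexOf("auto:") === 0 || what.indexOf("boring:") === 0;
    }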

You can find pwomise.js and pwomise-vis.js in the narscribblus/jstut repo.  It’s called pwomise not to be adorable but rather to make it clear that it’s not promise.js.  I have added various comments to pwomise.js that may aid in understanding.  Sometime soon I will update my demo setup on clicky.visophyte.org so that all can partake.

Documentation for complex things (you don’t basically already understand)

The Problem

One of my focuses at MoMo is to improve the lot of Thunderbird extension developers.  An important aspect of this is improving the platform they are exposed to.  Any platform usually entails a fair amount of complexity.  The trick is that you only pay for the things that are new-to-you, learning-wise.

The ‘web’ as a platform is not particularly simple; it’s got a lot of pieces, some of which are fantastically complex (ex: layout engines).  But those bits are frequently orthogonal, can be learned incrementally, have reams of available documentation, extensive tools that can aid in understanding, and, most importantly, are already reasonably well known to a comparatively large population.  The further you get from the web-become-platform, the more new things you need to learn and the more hand-holding you need if you’re not going to just read the source or trial-and-error your way through.  (Not that those are bad ways to roll; but not a lot of people make it all the way through those gauntlets.)

I am working to make Thunderbird more extensible in more than a replace-some-function/widget-and-hope-no-other-extensions-had-similar-ideas sort of way.  I am also working to make Thunderbird and its extensions more scalable and performant without requiring a lot of engineering work on the part of every extension.  This entails new mini-platforms and non-trivial new things to learn.

There is, of course, no point in building a spaceship if no one is going to fly it into space and fight space pirates.  Which is to say, the training program for astronauts with all its sword-fighting lessons is just as important as the spaceship, and just buying them each a copy of “sword-fighting for dummies who live in the future” won’t cut it.

Translating this into modern-day pre-space-pirate terminology, it would be dumb to make a super fancy extension API if no one uses it.  And given that the platform is far enough from pure-web and universally familiar subject domains, a lot of hand-holding is in order.  Since there is no pre-existing development community familiar with the framework, they can’t practically be human hands either.

The Requirements

I assert the following things are therefore important for the documentation to be able to do:

  • Start with an explained, working example.
  • Let the student modify the example with as little work on their part as possible so that they can improve their mental model of how things actually work.
  • Link to other relevant documentation that explains what is going on, especially reference/API docs, without the user having to open a browser window and manually go search/cross-reference things for themselves.
  • Let the student convert the modified example into something they can then use as the basis for an extension.

The In-Process Solution: Narscribblus

So, I honestly was quite willing to settle for an existing solution that was anywhere close to what I needed.  Specifically, the ability to automatically deep-link source code to the most appropriate documentation for the bits on hand.  It has become quite common to have JS APIs that take an object where you can have a non-trivial number of keys with very specific semantics, and my new developer friendly(-ish) APIs are no exception.

Unfortunately, most existing JavaScript documentation tools are based on the doxygen/JavaDoc model of markup that:

  • Was built for static languages where your types need to be defined.  You can then document each component of the type by hanging doc-blocks off them.  In contrast, in JS if you have a complex Object/dictionary argument that you want to hang stuff off of, your best bet may be to just create a dummy object/type for documentation purposes (as illustrated below).  JSDoc and friends do support a somewhat enriched syntax like “@param arg.attr”, but run into the fact that the syntax…
  • Is basically ad-hoc with limited extensibility.  I’m not talking about the ability to add additional doctags or declare specific regions of markup that should be passed through a plugin, which is pretty common.  In this case, I mean that it is very easy to hit a wall in the markup language that you can’t resolve without making an end-run around the existing markup language entirely.  As per the previous bullet point, if you want to nest rich type definitions, you can quickly run out of road.

The net result is that it’s hard to even describe the data types you are working with, let alone have tools that are able to infer links into their nested structure.
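
For example, the common rich-options-object pattern is about as far as the JavaDoc-style syntax comfortably stretches; the signature and keys below are invented purely for illustration:

    /**
     * Define a widget.
     * @param {Object} def The widget definition.
     * @param {String} def.name Name of the widget.
     * @param {Object} def.constraint The matching constraint.  Once you want to
     *     document the keys nested inside def.constraint (and their own nested
     *     keys), the markup runs out of road and you end up inventing dummy
     *     types just to have something to hang doc-blocks off of.
     */
    function defineWidget(def) {
      // ...
    }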

So what is my solution?

  • Steal as much as possible from Racket (formerly PLT Scheme)’s documentation tool, Scribble.  To get a quick understanding of the brilliance of Racket and Scribble, check out the quick introduction to racket.  For those of you who don’t click through, you are missing out on examples that automatically hyperlink to the documentation for invoked methods, plus pictures capturing the results of the output in the document.
    • We steal the syntax insofar as it is possible without implementing a scheme interpreter.  The syntax amounts to @command[s-expression stuff where whitespace does not matter]{text stuff which can have more @command stuff in it and whitespace generally does matter}.  The brilliance is that everything is executed and there are no heuristics you need to guess at and that fall down.
    • Our limitation is that while Racket is a prefix language that can use reader macros and have entire documents be processed in the same fashion as source code and totally understood by the text editor, such purity is somewhat beyond us.  But we do what we can.
  • Use narcissus, Brendan Eich/mozilla’s JS meta-circular interpreter thing, to handle parsing JavaScript.  Although we don’t have reader macros, we play at having them.  If you’ve ever tried to parse JavaScript, you know it’s a nightmare that requires the lexer to be aware of the parsing state thanks to the regexp syntax.  So in order for us to be able to parse JavaScript inline without depending on weird escaping syntaxes, when parsing our documents we use narcissus to make sure that we parse JavaScript as JavaScript; we just break out when we hit our closing squiggly brace.  No getting tricked by regular expressions, comments, etc.
  • Use the abstract interpreter logic from Patrick Walton‘s jsctags (we actually stole its CommonJS-ified narcissus as the basis for our hacked-up one too) to facilitate being able to linkify all our JavaScript code.  The full narcissus stack is basically:
    • The narcissus lexer has been modified to optionally generate a log of all tokens it generates for the lowest level of syntax highlighting.
    • The narcissus parser has been modified to, when generating a token log, link syntax tokens to their AST parse nodes.
    • The abstract interpreter logic has been modified to annotate parse nodes with semantic links so that we can traverse the tokens to be able to say “hey, this is attribute ‘foo’ in an object that is argument index 1 of an invocation of function ‘bar'” where we were able to resolve bar to a documented node somewhere.  (We also can infer some object/class organization as a result of the limited abstract interpretation.)
    • We do not use any of the fancy static analysis work that has been going on lately around DoctorJS.  Automated stuff is sweet and would be nice to hook in, but the goal here is friendly documentation.
    • The abstract interpreter has been given an implementation of CommonJS require that causes it to load other source documents and recursively process them (including annotating documentation blocks onto them); there is a sketch of this just after the list.
  • We use bespin as the text editor to let you interactively edit code and then see the changes.  Unfortunately, I did not hook bespin up to the syntaxy magic we get when we are not using bespin.  I punted because of CommonJS loader snafus.  I did, however, make the ‘apply changes’ button use narcissus to syntax check things (with surprisingly useful error messages in some cases).
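
To make the require() bullet above more concrete, the hook amounts to something like the following; every helper named here is a placeholder for the real jstut machinery rather than its actual API:

    // Sketch of the interpreter's require() handling: fetch the module source,
    // parse it with narcissus, recursively interpret it, and annotate the
    // resulting namespace with its documentation blocks.
    function interpreterRequire(moduleName, loader) {
      return when(loader.loadSource(moduleName), function (source) {
        var ast = parseJs(source);              // narcissus parse
        var namespace = abstractInterpret(ast); // recursive processing
        annotateDocBlocks(namespace, ast);      // attach doc blocks to exports
        return namespace;                       // what the requiring code sees
      });
    }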

Extra brief nutshell facts:

  • It’s all CommonJS code.  The web enabled version which I link to above and below runs using a slightly modified version of Gozala‘s teleport loader.  It can also run privileged under Jetpack, but there are a few unimplemented sharp edges relating to Jetpack loader semantics.  (Teleport has been modified mainly to play more like jetpack, especially in cases where its aggressive regexes go looking for jetpack libs that aren’t needed on the web.)  For mindshare reasons, I will probably migrate off of teleport for web loading and may consider adding some degree of node.js support.  The interactive functionality currently reaches directly into the DOM, so some minor changes would be required for the server model, but that was all anticipated.  (And the non-interactive “manual” language already outputs plain HTML documents.)
  • The web version uses a loader to generate the page which gets displayed in an iframe inside the page.  The jetpack version generates a page and then does horrible things using Atul‘s custom-protocol mechanism to get the page displayed while defying normal browser navigation; it should either move to an encapsulated loader or implement a proper custom protocol.

Anywho, there is still a lot of work that can and will be done (more ‘can’ than ‘will’), but I think I’ve got all the big rocks taken care of and things aren’t just blue-sky dreams, so I figured I’d give a brief intro for those who are interested.

Feel free to check out the live example interactive tutorialish thing linked to in some of the images, and its syntax-highlighted source.  Keep in mind that lots of inefficient XHRs currently happen, so it could take a few seconds for things to happen.  The type hierarchy emission and styling still likely has a number of issues, including potential failures to pop up on clicks.  (Oh, and you need to click on the source of the popup to toggle it…)

Here’s a bonus example to look at too, keeping in mind that the first few blocks using the elided js magic have not yet been wrapped in something that provides them with the semantic context to do magic linking.  And the narscribblus repo is here.