Visualizing Regular Expressions 38

Posted by Oliver on February 19, 2006

Here’s something I’ve wanted for a long time. So I finally built it. reAnimator is a tool for visualizing how regular expression engines use finite-state automata to match regular regular expression patterns against text.

This is intended to demonstrate the implementation of regular expressions. If you want to learn how to use them instead, I recommend these references instead:
* Regular-Expressions.info
* A.M. Kuchling’s Regular Expression HOWTO
* Steve Mansour A Tao of Regular Expressions
* Jeffrey Friedl’s Mastering Regular Expressions
* The Regular Expression Library’s list of resources
* New My own reWork online regular expression workbench.

The User Interface

The screen has these areas:

The Pattern

The “pattern” shows the regular expression. Click on it to set another regular expression to match against.

The Input

The “input” is the string that is matched. As you type into the input string, the color of this string indicates whether it’s a complete match (green), a partial match (blue), or a non-match (red).

The Graphs

There are two graphs, which each display a finite-state automaton (FSA) that corresponds to the regular expression in the “pattern” area. As you type into the “input” area, the graphs also update, to display the state of the match.

A deterministic FSA (a DFA) is like a board game, with a counter that is moved according to the successive letters of the input string. The counter starts at the initial state (the leftmost circle with the arrow from off the board). Each consecutive letter of the input string tells where to move the counter to next. If the counter ends up in a terminal state (a double circle), there was a match.

A nondeterministic FSA is the same except that when there’s more than one legal move, you take them both.

The nondeterministic FSA bears the most direct resemblance to the regular expression. In exchange for this simplicity in its construction, it’s more complex to evaluate: instead of keeping track of just one counter, you have to keep track of a set of them. (This is an instance of the compile-time versus execution-time trade-offs that computer science is rife with.)

If you want to learn more about finite-state automata, the wikipedia entry has some useful information, but this also seems like a good place to plug my father-in-law’s book.

Implementation details

The front end is written in OpenLaszlo, and compiled to Flash. It uses AJAX and JSON to request the FSAs and the graphs.

The back end is written in Python and C. The Python part is my PyFSA library, plus a bit of glue to turn various JavaScript objects and graph files into JSON strings. There’s also a cache so that my shared server doesn’t get quite so stressed even if the site becomes popular. (Having the front end logic on the client should help here too.)

The C portion is the wonderful GraphViz. PyFSA creates a .dot file for each FSA, and uses GraphViz to lay out the graph. The server parses the annotated .dot file into a JavaScript object, and uses JSON to download the resulting graph to the client.

An OpenLaszlo class on the client interprets the JavaScript graph description into a sequence of drawing instructions. It also saves the node positions, so that it can animate against them.

Update: Some of the support libraries are now available as open source. See JSON for OpenLaszlo, JavaScript Beziers, and Canvas with Text. OpenLaszlo and PyFSA were available previously.

Implementation choices

Client vs. server: There’s no reason that this couldn’t be a client-side-only application. I just happened to have PyFSA lying around, and didn’t feel like porting it to JavaScript. My goal for this was one day of implementation time, and I didn’t think porting PyFSA would fit into this. (It ended up taking three days anyway, because I forgot about implementing cubic beziers and JSON, and because I got carried away and added animation.)

CGI vs. FastCGI: I’ve been doing most of my server-side programming with either PHP or FastCGI and Ruby, so that pages don’t take so long to serve. This is the first Python service I’ve deployed since I started using FastCGI, and I was planning to use it. But Python doesn’t include FastCGI in its library, and the fact that there were four different third-party libraries with different APIs, none of them endorsed, and that the latest PEP to mention FastCGI was deferred five years ago, made me unwilling to take on the project of evaluating them.I’m glad to hear that I was completely off base about Python and FastCGI. See Phillip J. Eby’s comment below. I stuck with CGI, which is in the standard library.

OpenLaszlo vs. DHTML: It would be just as easy to draw the graph itself using the canvas class in DHTML. I balked at doing the animation and user interface in DHTML, though. (There’s little touches like laying the graphs out horizontally only if there’s enough room, which were only a few lines of declarative code in OpenLaszlo.) And then it wouldn’t have worked on as many browsers. I decided to wait until OpenLaszlo compiles to DHTML, for a DHTML version of this.

PyFSA vs. …: Higher-performance © implementations of FSA minimization and determinization exist. I went with my own because it’s the only one I know of that has the option of preserving source location information across transformations. I use source location information minimally in the interface, and might add more.

Credits

Thanks to Margaret Minsky and Gary Drescher for commenting on a draft of the application. I used Patrick Logan’s json-py for server-side JSON. The credits for GraphViz are here. Thanks to my former colleages at Laszlo Systems, Inc. for helping create the OpenLaszlo platform. I adapted Philip J. Scheider’s code for subdividing cubic beziers; this is a compact implementation of de Casteljau’s algorithm for Algol-like languages. Thanks to Guido and company for Python. Lastly, since people always ask, I drew the architecture diagram with Omigraffle (gee I wish I got a commission!) — which I like because I’m not much of a designer, and diagrams I draw with it are passable without much work.

Update: I changed the name to reAnimator. (It was reMatch.) Thanks to Apache RewriteRule for letting me do this without breaking the old URLs!

Trackbacks

Trackbacks are closed.

Comments

Comments are closed.

  1. alpheccar Sun, 19 Feb 2006 20:47:52 PST

    It is gorgeous !

    I wonder how you did the animation of the disks along the bezier curves. Do you just have a parametric equation for the curve and you just animate the parameter in OpenLaszlo ?

  2. PDI^2 :: Visualizzare Regexp :: February :: 2006 Sun, 19 Feb 2006 22:06:16 PST

    [...] Però volevo segnalare questo tool bellissimo che visualizza il funzionamento di un semplice sistema di regexp (forse un po’ troppo semplice, ma è fico). [...]

  3. Jon Mon, 20 Feb 2006 03:17:40 PST

    Wow! Totally cool

  4. Oliver Mon, 20 Feb 2006 14:44:18 PST

    Thanks, alpheccar! You’re exactly right about the disk animation. (Wordpress keeps cutting the end of my comment off. I’ll answer this in a separate post.)

  5. Phillip J. Eby Mon, 20 Feb 2006 23:35:41 PST

    FYI, the most recent PEP to mention FastCGI is PEP 333 (WSGI), not 222. There are now several FastCGI-supporting implementations of the WSGI API described in PEP 333, not to mention implementations of that same API for various other things including CGI, mod_python, Twisted, etc. Since they all have the same API, you can focus your evaluation on performance, documentation, or other characteristics without having to try different APIs.

  6. Justin Mason Tue, 21 Feb 2006 10:11:17 PST

    Very nice!

    I just posted the URL to the SpamAssassin users list. hopefully there won’t be any really messy regexps ;)

  7. Robert Boyd Tue, 21 Feb 2006 14:54:30 PST

    This is way cool!

    There are some weird things about the input field — I haven’t been able to get it to accept the “;” character. The pattern input field happily accepts the “;” and the graph of the state machines clearly displays it. Is there some trick to getting the input field to stop mapping the “;” to “:” ? every which way I type in a “;” it changes it to “:”

    Thanks,

    Robert

  8. Larry Yaeger Wed, 22 Feb 2006 19:30:02 PST

    Very nice! If FSAs come up in class, this will be an excellent teaching aid. Thanks!

    Just FYI, there is an (annoyingly intermittent) minor bug (at least when viewed with Safari 2.0.3 on Mac OS X 10.4.4) in the handling of at least the first and last example expressions. If the very first character you type does not satisfy the expression, deleting it will often cause the head node to flash blue, but immediately return to white, after which the graph ceases to display any colors or state changes. I initially thought this was 100% reproducible, but have now seen the graph recover from this on occasion.

  9. Don Hopkins Sat, 25 Feb 2006 22:48:15 PST

    How about writing a Relax NG visualizer, Oliver? I’d love to see a visual animation of James Clark’s derivitive algorithm. It lazily generates the FSA on demand, as it validates XML, so it can efficient support interleaving (which would otherwise demand a huge FSA to cover all possible cases).

    -Don

  10. Ferran Basora Tue, 28 Feb 2006 12:52:22 PST

    That is very cool, a great work!!

    Thanks

  11. myavuzselim Thu, 02 Mar 2006 08:20:04 PST

    Nice thing. See also jflap: http://www.jflap.org/

  12. [...] The blog entry explains how the reAnimator is built and what tools it uses. I wasn’t overly surprised to see Graphviz in the mix. Command line tools often play well with others. [...]

  13. Tom Scola Thu, 02 Mar 2006 13:28:53 PST

    Hmmm… It doesn’t seem to handle exponential blowup very well. Try these regexps:

    .*a.

    .*a..

    .*a…

    .*a….

  14. Kel Thu, 02 Mar 2006 16:03:42 PST

    This is a great tool.

    Actually seeing how the token moves through the states is such an intuitive way of thinking about regexps. Will you keep it up and available for use? Does it require much bandwidth to use?

  15. Shawn Thu, 02 Mar 2006 21:16:25 PST

    Amazing tool!!

    Suggestions for improvement:
    - Scroll bars! I can’t see the full graphs for larger expressions.
    - Support for {-quantifiers
    - Support for look ahead assertions

    Example:
    9(?!11)\d{2}
    matches: 9?!110{2}
    rather then: any three digit number starting with 9, except 911

    Thank you for the cool tools!

  16. ron Fri, 03 Mar 2006 15:58:02 PST

    Fun to use. Are you making the code available?

  17. Tom CF Fri, 03 Mar 2006 19:26:02 PST

    You’re state machine example for a*b|b*a doesn’t quite work right. It fails on the example string “abab”.

    The problem is that there are no links from the final state back to a non-final state.

  18. Tom CF Fri, 03 Mar 2006 19:58:28 PST

    Also the example state machine incorrectly accepts “a” as a valid string for the regular expression “a*b”

  19. Tom CF Fri, 03 Mar 2006 20:07:10 PST

    Never mind, wrong reg. ex. syntax.

  20. Pete Yandell Sat, 04 Mar 2006 06:03:29 PST

    This is great! I remember drawing a lot of this state diagrams by hand when studying regexps at uni. This sort of thing would have helped me to grasp the concepts a lot quicker.

  21. [...] Oliver Steele posted a link to his regular expression visualizer on his blog. You definitely have to try playing around with it! This is just great! [...]

  22. [...] Reanimator – средство “визуализации процесса применения регулярного выражения к тексту”. Это даже круче чем Regulator, описанный Зубинским в одном из недавних номеров КО. [...]

  23. Dr. Wombat Fri, 10 Mar 2006 17:55:30 PST

    Your illustration of RegEx is awexome!.

    However, i found ab|(baba)* not to work.

    ab|(cd)* does… so i think something has gone awry somewheres..

    just thought you might like to know.

  24. [...] Cute: a little flash applet to visualize a regex and watch it matching an input. [...]

  25. [...] OliverSteele hat mit dem reAnimator ein Tool zur Visualisierung der nicht-deterministischen und der daraus hergeleiteten deterministischen Automaten geschaffen, mit denen ein regulärer Ausdruck gematcht wird. [...]

  26. [...] Steve Oualline wrote a nifty little Perl program to graph regular expressions. And Oliver Steele wrote an even niftier OpenLaszlo app to show how regular expressions work. [...]

  27. Quick links on 2007-06-20 Β« solnul Wed, 20 Jun 2007 16:59:42 PDT

    [...] reAnimator: a regular expression visualizer (implementation details) [...]

  28. Fernando.com.ar Β» Blog Archive Β» RegEx Fri, 17 Aug 2007 12:14:20 PDT

    [...] Reanimator es una interesante herramienta creada por Oliver Steele que grafica como actúan las expresiones regulares usando Autómatas Finitos. Via PC++ [...]

  29. osteele.com | Deliggit.com…

    osteele.com

    … A tool for visualising regular expressions.

    CGI vs. FastCGI: I’ve been ……

  30. test 01/16/2008 Β« Strange Kite Wed, 16 Jan 2008 15:33:57 PST

    [...] Visualizing Regular Expressions at Oliver Steele  Annotated [...]

  31. ndanger.organism :: blog :: LOTD: 2008-01-16 Wed, 16 Jan 2008 23:08:32 PST

    [...] Visualizing Regular Expressions [...]

  32. links for 2008-01-17 at DeStructUred Blog Thu, 17 Jan 2008 00:20:27 PST

    [...] Visualizing Regular Expressions at Oliver Steele (tags: blog visualization regex programming) [...]

  33. My daily readings 01/17/2008 Β« Strange Kite Thu, 17 Jan 2008 09:35:11 PST

    [...] Visualizing Regular Expressions at Oliver Steele  Annotated [...]

  34. [...] Visualizing Regular Expressions at Oliver Steele This is intended to demonstrate the implementation of regular expressions. If you want to learn how to use them instead, I recommend these references instead: (tags: osteele.com 2008 mes0 dia17 at_tecp regexp regular_expressions visualization) [...]

  35. [...] Visualizing Regular Expressions at Oliver Steele (tags: Animation Data Design Development Flash Generator Math Programming Python RegularExpressions Search Visualization Visual) [...]

  36. rex Sun, 04 May 2008 21:41:19 PDT

    Dear Oliver,
    I’m a regex amateur, from China, and i kept a blog about regex, (the url of which can be found on the homepage section), focusing on regex and searching engine.

    Now i found your great works about regex, including reAnimator and reWork. May i translate them into Chinese and posted them on my blog? of course the original URL, your website and the author’s name would be shown on the translated version.

    Waiting for your response.
    rex.

  37. buy silver bullion Sun, 22 Jun 2008 10:07:39 PDT

    I didn’t get all the details but this looks like a promising tool. Congrats.

  38. fernO Wed, 21 Jan 2009 21:34:43 PST

    Wow, this is awesome! You should talk a little more about the code behind how this works, would be a great read.