Here’s something I’ve wanted for a long time. So I finally built it. reAnimator is a tool for visualizing how regular expression engines use finite-state automata to match regular regular expression patterns against text.
This is intended to demonstrate the implementation of regular expressions. If you want to learn how to use them instead, I recommend these references instead:
- Regular-Expressions.info
- A.M. Kuchling’s Regular Expression HOWTO
- Steve Mansour A Tao of Regular Expressions
- Jeffrey Friedl’s Mastering Regular Expressions
- The Regular Expression Library’s list of resources
- New My own reWork online regular expression workbench.
The User Interface
The screen has these areas:
The Pattern
The “pattern” shows the regular expression. Click on it to set another regular expression to match against.
The Input
The “input” is the string that is matched. As you type into the input string, the color of this string indicates whether it’s a complete match (green), a partial match (blue), or a non-match (red).
The Graphs
There are two graphs, which each display a finite-state automaton (FSA) that corresponds to the regular expression in the “pattern” area. As you type into the “input” area, the graphs also update, to display the state of the match.
A deterministic FSA (a DFA) is like a board game, with a counter that is moved according to the successive letters of the input string. The counter starts at the initial state (the leftmost circle with the arrow from off the board). Each consecutive letter of the input string tells where to move the counter to next. If the counter ends up in a terminal state (a double circle), there was a match.
A nondeterministic FSA is the same except that when there’s more than one legal move, you take them both.
The nondeterministic FSA bears the most direct resemblance to the regular expression. In exchange for this simplicity in its construction, it’s more complex to evaluate: instead of keeping track of just one counter, you have to keep track of a set of them. (This is an instance of the compile-time versus execution-time trade-offs that computer science is rife with.)
If you want to learn more about finite-state automata, the wikipedia entry has some useful information, but this also seems like a good place to plug my father-in-law’s book.
Implementation details

The front end is written in OpenLaszlo, and compiled to Flash. It uses AJAX and JSON to request the FSAs and the graphs.
The back end is written in Python and C. The Python part is my PyFSA library, plus a bit of glue to turn various JavaScript objects and graph files into JSON strings. There’s also a cache so that my shared server doesn’t get quite so stressed even if the site becomes popular. (Having the front end logic on the client should help here too.)
The C portion is the wonderful GraphViz. PyFSA creates a .dot file for each FSA, and uses GraphViz to lay out the graph. The server parses the annotated .dot file into a JavaScript object, and uses JSON to download the resulting graph to the client.
An OpenLaszlo class on the client interprets the JavaScript graph description into a sequence of drawing instructions. It also saves the node positions, so that it can animate against them.
Update: Some of the support libraries are now available as open source. See JSON for OpenLaszlo, JavaScript Beziers, and Canvas with Text. OpenLaszlo and PyFSA were available previously.
Implementation choices
Client vs. server: There’s no reason that this couldn’t be a client-side-only application. I just happened to have PyFSA lying around, and didn’t feel like porting it to JavaScript. My goal for this was one day of implementation time, and I didn’t think porting PyFSA would fit into this. (It ended up taking three days anyway, because I forgot about implementing cubic beziers and JSON, and because I got carried away and added animation.)
CGI vs. FastCGI: I’ve been doing most of my server-side programming with either PHP or FastCGI and Ruby, so that pages don’t take so long to serve. This is the first Python service I’ve deployed since I started using FastCGI, and I was planning to use it. But Python doesn’t include FastCGI in its library, and the fact that there were four different third-party libraries with different APIs, none of them endorsed, and that the latest PEP to mention FastCGI was deferred five years ago, made me unwilling to take on the project of evaluating them.I’m glad to hear that I was completely off base about Python and FastCGI. See Phillip J. Eby’s comment below. I stuck with CGI, which is in the standard library.
OpenLaszlo vs. DHTML: It would be just as easy to draw the graph itself using the canvas class in DHTML. I balked at doing the animation and user interface in DHTML, though. (There’s little touches like laying the graphs out horizontally only if there’s enough room, which were only a few lines of declarative code in OpenLaszlo.) And then it wouldn’t have worked on as many browsers. I decided to wait until OpenLaszlo compiles to DHTML, for a DHTML version of this.
PyFSA vs. …: Higher-performance© implementations of FSA minimization and determinization exist. I went with my own because it’s the only one I know of that has the option of preserving source location information across transformations. I use source location information minimally in the interface, and might add more.
Credits
Thanks to Margaret Minsky and Gary Drescher for commenting on a draft of the application. I used Patrick Logan’s json-py for server-side JSON. The credits for GraphViz are here. Thanks to my former colleages at Laszlo Systems, Inc. for helping create the OpenLaszlo platform. I adapted Philip J. Scheider’s code for subdividing cubic beziers; this is a compact implementation of de Casteljau’s algorithm for Algol-like languages. Thanks to Guido and company for Python. Lastly, since people always ask, I drew the architecture diagram with Omigraffle (gee I wish I got a commission!) —- which I like because I’m not much of a designer, and diagrams I draw with it are passable without much work.
Update: I changed the name to reAnimator. (It was reMatch.) Thanks to Apache RewriteRule for letting me do this without breaking the old URLs!

Comments
[...] Visualizing Regular Expressions at Oliver Steele (tags: Animation Data Design Development Flash Generator Math Programming Python RegularExpressions Search Visualization Visual) [...]
[...] Visualizing Regular Expressions at Oliver Steele This is intended to demonstrate the implementation of regular expressions. If you want to learn how to use them instead, I recommend these references instead: (tags: osteele.com 2008 mes0 dia17 at_tecp regexp regular_expressions visualization) [...]
[...] Visualizing Regular Expressions at Oliver Steele Annotated [...]
[...] Visualizing Regular Expressions at Oliver Steele (tags: blog visualization regex programming) [...]
[...] Visualizing Regular Expressions [...]
[...] Visualizing Regular Expressions at Oliver Steele Annotated [...]
osteele.com | Deliggit.com...
osteele.com
...
... A tool for visualising regular expressions.
...
... CGI vs. FastCGI: I’ve been ......
[...] Reanimator es una interesante herramienta creada por Oliver Steele que grafica como actúan las expresiones regulares usando Autómatas Finitos. Via PC++ [...]
[...] reAnimator: a regular expression visualizer (implementation details) [...]
[...] Steve Oualline wrote a nifty little Perl program to graph regular expressions. And Oliver Steele wrote an even niftier OpenLaszlo app to show how regular expressions work. [...]
[...] OliverSteele hat mit dem reAnimator ein Tool zur Visualisierung der nicht-deterministischen und der daraus hergeleiteten deterministischen Automaten geschaffen, mit denen ein regulärer Ausdruck gematcht wird. [...]
[...] Cute: a little flash applet to visualize a regex and watch it matching an input. [...]
Your illustration of RegEx is awexome!.
However, i found ab|(baba)* not to work.
ab|(cd)* does... so i think something has gone awry somewheres..
just thought you might like to know.
[...] Reanimator - ÑредÑтво “визуализации процеÑÑа Ð¿Ñ€Ð¸Ð¼ÐµÐ½ÐµÐ½Ð¸Ñ Ñ€ÐµÐ³ÑƒÐ»Ñрного Ð²Ñ‹Ñ€Ð°Ð¶ÐµÐ½Ð¸Ñ Ðº текÑту”. Ðто даже круче чем Regulator, опиÑанный ЗубинÑким в одном из недавних номеров КО. [...]
[...] Oliver Steele posted a link to his regular expression visualizer on his blog. You definitely have to try playing around with it! This is just great! [...]
This is great! I remember drawing a lot of this state diagrams by hand when studying regexps at uni. This sort of thing would have helped me to grasp the concepts a lot quicker.
Never mind, wrong reg. ex. syntax.
Also the example state machine incorrectly accepts "a" as a valid string for the regular expression "a*b"
You're state machine example for a*b|b*a doesn't quite work right. It fails on the example string "abab".
The problem is that there are no links from the final state back to a non-final state.
Fun to use. Are you making the code available?
Amazing tool!!
Suggestions for improvement:
- Scroll bars! I can't see the full graphs for larger expressions.
- Support for {-quantifiers
- Support for look ahead assertions
Example:
9(?!11)\d{2}
matches: 9?!110{2}
rather then: any three digit number starting with 9, except 911
Thank you for the cool tools!
This is a great tool.
Actually seeing how the token moves through the states is such an intuitive way of thinking about regexps. Will you keep it up and available for use? Does it require much bandwidth to use?
Hmmm... It doesn't seem to handle exponential blowup very well. Try these regexps:
.*a.
.*a..
.*a...
.*a....
[...] The blog entry explains how the reAnimator is built and what tools it uses. I wasn’t overly surprised to see Graphviz in the mix. Command line tools often play well with others. [...]
Nice thing. See also jflap: http://www.jflap.org/
That is very cool, a great work!!
Thanks
How about writing a Relax NG visualizer, Oliver? I'd love to see a visual animation of James Clark's derivitive algorithm. It lazily generates the FSA on demand, as it validates XML, so it can efficient support interleaving (which would otherwise demand a huge FSA to cover all possible cases).
-Don
Very nice! If FSAs come up in class, this will be an excellent teaching aid. Thanks!
Just FYI, there is an (annoyingly intermittent) minor bug (at least when viewed with Safari 2.0.3 on Mac OS X 10.4.4) in the handling of at least the first and last example expressions. If the very first character you type does not satisfy the expression, deleting it will often cause the head node to flash blue, but immediately return to white, after which the graph ceases to display any colors or state changes. I initially thought this was 100% reproducible, but have now seen the graph recover from this on occasion.
This is way cool!
There are some weird things about the input field -- I haven't been able to get it to accept the ";" character. The pattern input field happily accepts the ";" and the graph of the state machines clearly displays it. Is there some trick to getting the input field to stop mapping the ";" to ":" ? every which way I type in a ";" it changes it to ":"
Thanks,
Robert
Very nice!
I just posted the URL to the SpamAssassin users list. hopefully there won't be any really messy regexps ;)
FYI, the most recent PEP to mention FastCGI is PEP 333 (WSGI), not 222. There are now several FastCGI-supporting implementations of the WSGI API described in PEP 333, not to mention implementations of that same API for various other things including CGI, mod_python, Twisted, etc. Since they all have the same API, you can focus your evaluation on performance, documentation, or other characteristics without having to try different APIs.
Thanks, alpheccar! You're exactly right about the disk animation. (Wordpress keeps cutting the end of my comment off. I'll answer this in a separate post.)
Wow! Totally cool
[...] Però volevo segnalare questo tool bellissimo che visualizza il funzionamento di un semplice sistema di regexp (forse un po’ troppo semplice, ma è fico). [...]
It is gorgeous !
I wonder how you did the animation of the disks along the bezier curves. Do you just have a parametric equation for the curve and you just animate the parameter in OpenLaszlo ?