The Biofuel Economy 4

Posted by Oliver on May 09, 2008

(Or, a Cobordism of Carbon.)

Here’s my understanding of this (with the energy cost dip greatly exaggerated).

Oops! It takes a village (down) to raise an (American) child.

Anyone want to make one of these with real numbers?

Supply/Demand Springs 4

Posted by Oliver on February 07, 2008

Full size (pdf, png).

Update: This is what I call an entry-level metaphor — it’s a rough sketch of the relation between the concepts, not a productive metaphor that can be used to reason about them beyond this. It doesn’t support analytic microeconomic analysis, and it’s not even consistent at the level of the supply chain. (For example, the unit cost needs to include the component cost, whereas the illustration shows these as complementary; this is because the metaphor leaves out profit.) Nonetheless, I find it a helpful starting point before going more analytic.

It popped into my head when I was answering my son’s question about what “supply and demand” meant. (He had run across it in a Newsweek article he’s reading in his history class.) It seemed to work for him, so I decided to write it down here. We’re both so used to talking about images in words that I didn’t realize until I made this that I’d never actually put it on paper!

Visualizing Regular Expressions 38

Posted by Oliver on February 19, 2006

Here’s something I’ve wanted for a long time. So I finally built it. reAnimator is a tool for visualizing how regular expression engines use finite-state automata to match regular regular expression patterns against text.

This is intended to demonstrate the implementation of regular expressions. If you want to learn how to use them instead, I recommend these references instead:
* Regular-Expressions.info
* A.M. Kuchling’s Regular Expression HOWTO
* Steve Mansour A Tao of Regular Expressions
* Jeffrey Friedl’s Mastering Regular Expressions
* The Regular Expression Library’s list of resources
* New My own reWork online regular expression workbench.

The User Interface

The screen has these areas:

The Pattern

The “pattern” shows the regular expression. Click on it to set another regular expression to match against.

The Input

The “input” is the string that is matched. As you type into the input string, the color of this string indicates whether it’s a complete match (green), a partial match (blue), or a non-match (red).

The Graphs

There are two graphs, which each display a finite-state automaton (FSA) that corresponds to the regular expression in the “pattern” area. As you type into the “input” area, the graphs also update, to display the state of the match.

A deterministic FSA (a DFA) is like a board game, with a counter that is moved according to the successive letters of the input string. The counter starts at the initial state (the leftmost circle with the arrow from off the board). Each consecutive letter of the input string tells where to move the counter to next. If the counter ends up in a terminal state (a double circle), there was a match.

A nondeterministic FSA is the same except that when there’s more than one legal move, you take them both.

The nondeterministic FSA bears the most direct resemblance to the regular expression. In exchange for this simplicity in its construction, it’s more complex to evaluate: instead of keeping track of just one counter, you have to keep track of a set of them. (This is an instance of the compile-time versus execution-time trade-offs that computer science is rife with.)

If you want to learn more about finite-state automata, the wikipedia entry has some useful information, but this also seems like a good place to plug my father-in-law’s book.

Implementation details

The front end is written in OpenLaszlo, and compiled to Flash. It uses AJAX and JSON to request the FSAs and the graphs.

The back end is written in Python and C. The Python part is my PyFSA library, plus a bit of glue to turn various JavaScript objects and graph files into JSON strings. There’s also a cache so that my shared server doesn’t get quite so stressed even if the site becomes popular. (Having the front end logic on the client should help here too.)

The C portion is the wonderful GraphViz. PyFSA creates a .dot file for each FSA, and uses GraphViz to lay out the graph. The server parses the annotated .dot file into a JavaScript object, and uses JSON to download the resulting graph to the client.

An OpenLaszlo class on the client interprets the JavaScript graph description into a sequence of drawing instructions. It also saves the node positions, so that it can animate against them.

Update: Some of the support libraries are now available as open source. See JSON for OpenLaszlo, JavaScript Beziers, and Canvas with Text. OpenLaszlo and PyFSA were available previously.

Implementation choices

Client vs. server: There’s no reason that this couldn’t be a client-side-only application. I just happened to have PyFSA lying around, and didn’t feel like porting it to JavaScript. My goal for this was one day of implementation time, and I didn’t think porting PyFSA would fit into this. (It ended up taking three days anyway, because I forgot about implementing cubic beziers and JSON, and because I got carried away and added animation.)

CGI vs. FastCGI: I’ve been doing most of my server-side programming with either PHP or FastCGI and Ruby, so that pages don’t take so long to serve. This is the first Python service I’ve deployed since I started using FastCGI, and I was planning to use it. But Python doesn’t include FastCGI in its library, and the fact that there were four different third-party libraries with different APIs, none of them endorsed, and that the latest PEP to mention FastCGI was deferred five years ago, made me unwilling to take on the project of evaluating them.I’m glad to hear that I was completely off base about Python and FastCGI. See Phillip J. Eby’s comment below. I stuck with CGI, which is in the standard library.

OpenLaszlo vs. DHTML: It would be just as easy to draw the graph itself using the canvas class in DHTML. I balked at doing the animation and user interface in DHTML, though. (There’s little touches like laying the graphs out horizontally only if there’s enough room, which were only a few lines of declarative code in OpenLaszlo.) And then it wouldn’t have worked on as many browsers. I decided to wait until OpenLaszlo compiles to DHTML, for a DHTML version of this.

PyFSA vs. …: Higher-performance © implementations of FSA minimization and determinization exist. I went with my own because it’s the only one I know of that has the option of preserving source location information across transformations. I use source location information minimally in the interface, and might add more.

Credits

Thanks to Margaret Minsky and Gary Drescher for commenting on a draft of the application. I used Patrick Logan’s json-py for server-side JSON. The credits for GraphViz are here. Thanks to my former colleages at Laszlo Systems, Inc. for helping create the OpenLaszlo platform. I adapted Philip J. Scheider’s code for subdividing cubic beziers; this is a compact implementation of de Casteljau’s algorithm for Algol-like languages. Thanks to Guido and company for Python. Lastly, since people always ask, I drew the architecture diagram with Omigraffle (gee I wish I got a commission!) — which I like because I’m not much of a designer, and diagrams I draw with it are passable without much work.

Update: I changed the name to reAnimator. (It was reMatch.) Thanks to Apache RewriteRule for letting me do this without breaking the old URLs!

β€œStretch” Languages, or, 28 years of programming 11

Posted by Oliver on February 05, 2006

Recently I reviewed the programming languages I’ve used over the 28 years1 of my programming career. The result is shown in the chart below. (Click on the image to see it full size.)

There are some obvious trends here2. The languages are mostly getting higher level. There are a few “survivors”: languages that I’ve used over the the course of a decade, although discontiguously: C/C++, Common Lisp, and Java. Java has replaced C (except for a stint around 2000 where I went back to low-level graphics programming at Alphamask), and the scripting languages have taken over from Common Lisp — they’re slower, but they’re terse, have better libraries, and are easier to deploy.

It didn’t surprise me that there are so many languages. For one thing, I like languages, and I enjoy learning how to program by learning new ones. For another thing, times changes and programming languages get better. You’d have to be crazy to program in 1970’s BASIC these days. (That’s BASIC, not Visual Basic. The language I cut my teeth on didn’t have functions or structures.) And there just isn’t much call for Z80 assembly any more.

But why so many languages at one time?

The reason is that different languages are good at different things. When I first learned to program, I used BASIC for everything. Then I started using assembly for code that had to be fast, but I stuck with BASIC for doing my homework. (In tenth grade I wrote programs to multiply and invert matrices, so I wouldn’t be bored silly doing it by hand. It took longer, but I had more fun.)

This is what the second chart shows. Again, click on the thumbnail to see the full size version. And beware! — the colors don’t mean the same thing as in the chart above. (Sorry.)

I use four kinds of languages: utility languages, for writing tools and solving problems; application languages, for writing distributable integrated applications; system programming languages, for implementing operating systems and low-level libraries; and a fourth category of languages, which I’ll call “stretch languages”, for learning how to program. (I also dabble with bash for shell programming and R for statistics, but I don’t really think of the way I use them as programming.)

A utility language is useful for solving small problems and for exploring problem domains. It’s what you write your ten-minute program in. If you can use your utility language as your file manipulation language and your build language, all the better. (For a while I did a lot of programming in C and then Java, and I tried to use make and ant for these. Now that I’m using Ruby as my utility language, I’m a lot happier using Rake instead.)

An application language is for building larger, more full-feature programs that other people can use. (This is my own use of the term. Think “desktop application”, or “web application”.) An application language often has requirements for speed, OS integration, and deployability, that a utility language does not.

(”Application” for me used to mean desktop applications. Recently I’ve only written web applications — although I did write a MacOS app a few months ago, in order to learn about the MacOS toolbox. That’s why the “Desktop Application” category in the second chart trails off, to be replaced by languages for writing web applications, and browser-based clients that connect to them.)

Lastly, the point of a stretch language is to teach me new ways to think about programming. This is the opposite of a utility language, although a stretch language can turn into a utility language after I absorb its concepts. A utility language makes it easy to express programs that I have in mind; it gets out of the way, so I only have to think about the problem domain, and not how to program. A stretch language can make it difficult to do even simple things, because I’m still learning the concepts that are necessary to use it idiomatically or, sometimes, to use it at all.

Sometimes a stretch language becomes a utility language or an application language. This was the case with Common Lisp, which took me a while to understand, but for a while was my most productive language.

More often, the concepts that I learn from a stretch language are helpful in using other languages, even if I have to express these concepts by writing the equivalent of macro-expanded code or a VM. For example, I’ve written a continuation-based pattern matcher in Common Lisp and a logic program in Java, even though neither language supports those respective features. Designing the programs as though the features existed made a hard problem tractable.

And finally, learning concepts from one language makes it easier for me to recognize them and start using them when they’re available in a more mainstream language or one with better deployment characteristics. Those of us who were familiar with advise and the CLOS MOP had a leg up on understanding how to use AOP when it came around to Java. The modern round of scripting languages (Python 2.x and Ruby) have MOPs — if you’re familiar with them from Smalltalk, you know how to take advantage of them in Python and Ruby too. And if I ever have to use C++ again, at least I’ll know from Haskell to look for template implementations of combinator libraries.

Here are the languages that I’ve learned from over the years, and what I’ve learned from each one3. Some of the languages that aren’t in my list would be perfectly respectable stretch languages for someone else — I just happened to be familiar with their concepts by the time I got to them. For example, Python is probably a fine way to learn about OO (and my son learned a lot of OO concepts from Logo Microworlds), and I could have learned metaobject programming from Ruby if I hadn’t already learned it from Smalltalk and Common Lisp.

  • BASIC: basic programming
  • Z80: ADTs; recursion
  • Prolog: logic variables; abstracting over control flow
  • CLU: exception handling; coroutines; parameterized data types
  • Smalltalk: OOP
  • Common Lisp: functional objects; meta-object programming
  • Java: concurrency
  • Eiffel: design by contract
  • Haskell: lazy evaluation; type classes
  • LZX (OpenLaszlo): constraint-based programming

Right now my stretch language is Haskell. When I write in Haskell, I feel like a beginning programmer again. I can’t use it when I’m in a hurry, and I don’t use it when I’m more interested in the problem domain than in the craft of programming. Reading Haskell is like reading poetry (I’m very slow at it, and I can’t skim because the new-concept density is very high), and writing Haskell is like writing poetry (something else that takes me forever). As opposed to Python and Ruby, which are more prosaic, and Enterprise Java, which is more like a tax form.


1 I first learned program in seventh grade, from the TRS-80 Level I Basic reference. That makes me both (1) almost 30, and (2) innumerate.

2 In this entire essay, I’m quantifying over the languages I use and the way I use them. YMMV.

3 This list includes more languages than the charts do because it includes languages that I’ve never used commercially. I’ve only ever used Prolog and CLU for college projects, for example, but I learned a lot from them anyway.

Visualizing Subversion Project Activity 3

Posted by Oliver on January 31, 2006

Last week I wrote a couple of tools to keep track of subversion checkins:

The Subversion Log Viewer is a master-detail list of recent subversion revisions. It’s based on the OpenLaszlo contactlist example. The nicest feature is really an afterthought: at the last moment, I added faces for authors; I think this makes projects a lot friendlier. Right now it only adds the faces to the OpenLaszlo log; let me know if you’re interested in using this for your own project, and I’ll make a public API for adding faces to a repository.

The Subversion iCalendar Gateway transcodes subversion logs into iCalendar files, that you can subscribe to with Apple iCal or Mozilla Sunbird. I find it useful for a projects that I want to check in on occasionally. Unlike an RSS feed, it gives you a sense of the activity level and the change frequency, at least if you’re a spatial person like me.

Both of these point at the OpenLaszlo log by default, but they’ve got a UI for putting in any subversion repository (http: or svn: protocol only), and generating a permalink for that repository.

One caveat: It takes a long time to request a complete subversion log, so the iCalendar gateway only requests the first 500 revisions the first time you (or anyone) view a given calendar, and then the next time anyone or refreshes the same calendar, it catches up to the present 500 revisions at time.

Expialidocio.us 1

Posted by Oliver on January 08, 2006

Expialidocio.us is a tool for visualizing your del.icio.us posting activity. It displays a graph of your posting activity over time. You can select a timespan from this graph, and it will show you a tag cloud weighted by just those dates.

Expialidocio.us was inspired by a posting by Jon Udell. Coming full circle, Udell has posted since posted about this application. Since then, I’ve published the sources.

Aargh! 29

Posted by Oliver on December 24, 2005

“Aargh!” But how do you spell it?

(Click here to skip straight to the visualization.)

In the late nineties, I tried using internet search as a spelling corrector. (I think I was using AltaVista at the time. It was the latest and greatest search engine, supplanting — was it Lycos?)

At the time, for the words I tried, there were about two orders of magnitude between a misspelling and the correct word. A spelling variant, such as “color” and “colour”, were typically less than one order of magnitude.

In 2002 I used Google to figure out the most common spelling for “closable”, for use in the OpenLaszlo API. It had been “closeable”; why use a spelling that most people would guess wrong the first time, I figured. [Update: This paragraph originally said the word was "resizeable", which is a straightforward misspelling.]

Here’s what this looks like today. First, a common misspelling:

compatible170M
compatable2M1.3%

And a couple of spelling variants:

closable137K
closeable101K73%
sizable8.3M
sizeable6.8M81%

(The percentage is the ratio of the page count to the page count of the most common variant, which is the form in bold above it.)

Some other misspellings:

commit73.9M
comit0.8M1%
resizable1.74M
resizeable0.18M10%
misspell466K
mispell55K12%

And some other acceptable variants:

color434M
colour63.0M16%
gray125M
grey73M59%
judgment77M
judgement24M32%

(What’s the difference between an acceptable variant, and a misspelling? An interesting topic for another posting. Maybe.)

What got me thinking about this again, was, of all things, thinking about how to spell “aargh!” One ‘a’, two, three…? And how many ‘r’s?

This is an interesting problem, first, because so many repetition counts are attested. There’s not just “mispelling” (1s) and “misspelling” (2s), but “argh”, “aargh”, “aaargh”, etc. And second, because the space is two-dimensional: not just “argh”, “aargh”, “aaargh”, …, but also “argh”, “arrgh”, “arrrgh”, … — and the product, with “aarrgh”, “aaarrrgh”, etc.

It’s clear that a wide range of spellings are acceptable. What’s the most common?

Without further ado, I created this page to help me find the answer.

PackageMapper 3

Posted by Oliver on December 18, 2005

PackageMapper shows you a map of your FedEx, UPS, USPS package routes. Enter a carrier and a tracking number to see your package’s progress plotted on the map. Sign in to enter a list of packages and see their current locations on a table or map.

This is an itch I’ve wanted to scratch for about a year now: being able to see where all my packages are on a map. On Thanksgiving morning I discovered SimpleTracking.com, which gives you an RSS feed for a UPS or USPS package. Over Thanksgiving weekend I added FedEx and mashed it up with Google Maps, and this is the result.

I can’t guarantee that this will stay up if it starts to get much traffic, but in the meantime I’m finding it useful myself, so, enjoy.

Second grade squares 3

Posted by Oliver on December 08, 2005

I posed a second-grader the question of what nine squared was. She reasoned that ten squared is 100, and nine times ten is ten less then that, and nine times nine is nine less than that, so the answer is 81. Then I asked her what eight squared was, and she was flummoxed. She saw that it was a similar problem to the one she’d just solved, but wasn’t sure how to apply the analogy.

Here are the pictures that showed her how to figure out the answer. We drew the location of the squares on a multiplication grid:

and I introduced the idea of a “solution structure”. A solution structure is a graphical representation of the steps of a solution. This is the section that represents the relation between 92 and 102.

Two problems can have different numbers but the same structure. This is the problem structure for both problems shown together:

And then she got it.

But this leads to the arithmetic problem of 81 minus 17, which was harder, for this seven-year-old, than 100 minus 10 minus 9. There are several ways to compute the difference betweeen 81 and 17. The hard ways are to count down by 17, or to do two-digit subtraction and carry the one. The easy way is to adjust the problem to 84 minus 20, and count down two tens to 64. But how can you show that 81-17 = 84-20?

Here’s what didn’t work: explain that adding three to both the minuend and the subtrahend leaves the difference unchanged. Seven was too early for something this symbolic. We used a number line instead:

The difference is the blue bar. Moving it on the number line moves its ends by the same amount, without changing the length of the bar itself. Conversely, you can move both ends by the same amount without changing the length of the line between them.

Problem solved.

Visualizing Basic Algebra 9

Posted by Oliver on December 05, 2004

Last weekend, I shared some interesting properties of numbers with my kids.

The great thing about explaining something to a non-expert is that you have to actually understand the topic. (This is why making teaching universities and research universities the same actually makes sense.) If you hide behind a formalism, the explanation won’t work. Usually, this means that you didn’t understand why the formalism worked either.

This is why I thought “why are far away things smaller?” was such a great question. “Similar triangles” answers are brittle, and if a tiny error makes far away things come out bigger instead, you won’t catch yourself until you got to the end of the proof.

Some of the interesting properties of numbers are: that (n + 1)×(n-1)=n2-1: that the perfect squares (0,1,4,9,…) go up by successive odd numbers (1,3,5,…); and that the area of a triangular number (1+2+…+n) has a closed form. These properties are easy to show using algebra, but to a child, the algebra doesn’t make sense.

Multiplication and division are grounded in visuospatial concepts, which is why these number theoretical results are easy to understand. What would a proof that stayed grounded in visuospatial concepts look like?

Properties of Addition

Addition is associative:

and commutative:

Multiplication is Commutative

The commutative law is that a×b=b×a. If a rectangle a high and b wide represents a×b , this is the same as saying that rotation is area-preserving:

Or, in the case of three factors, volume preserving:

Distributive Law

Associativity

Difference of Squares

The perfect squares are 0, 1, 4, 9, 16, …. The differences between successive squares are the odd numbers: 1, 3, 5, 7, 9, ….

In algebra, this is because the n2 terms in (n+1)2=n2+2n+1 and in n2 cancel, leaving 2n+1. Here’s why this is true in geometry:

Product of Alternates

1×1=10×2=0
2×2=41×3=3
3×3=92×4=8
4×4=163×5=15
5×5=254×6=24

The product of two numbers adjacent to a median, is one less than the square of the median. For example, 52=25, and 6×4=25-1=24. Algebraically, this is because (n+1)(n-1)=n2-1. Geometrically:

Triangle Numbers

In closed form, the sum of the numbers from 1 to n is n(n+1)/2. One algebraic proof for this uses induction: substitute each of k+1 and k for n, and compute the difference.

There are two geometric proofs. The first requires two cases: one for even n, and one for odd. The second is simpler, but requires a bit of algebra: you need to draw two triangles, and compute twice the area; at the end, you divide by two.

First, the two-case proof, and some terminology. Cut a line at an integer mark as nearly in half as you can. The big piece is the superhalf. The little piece is the subhalf.

For an even number such as 6, the subhalf and superhalf are the same as the half (3). For an odd number such as 5, the subhalf (2) and superhalf (3) differ by one. The subhalf and superhalf of any integer sum to that integer: 3+3=6, and 2+3=5.

Now the first proof. If n is even, take the top half of the triangle and fold it over onto the bottom half. Each row has the same length, because the base increases by one going up, and the flipped top increases by one going down. That length is the width of the original base, plus one from the tip. Since there are n/2 rows, the triangle has the area of a rectangle that is n/2 high and n+1 wide.

For odd n, take the top subhalf of the triangle and fold it onto the bottom superhalf, extending each row except the bottom. The new rectangle is (n+1)/2 high (the superhalf), but only n wide. Multiply these out, and out the same as the first case.

A proof that doesn’t require two cases is to use two triangles. Consider the two triangles, which collectively have the area n×(n + 1):

This is the same, in algebraic terms, of taking the sum [1+n]+[2+(n-1)]+…+[(n-1)+2]+[n+1]=n×(n+1).

Next week: grounded fractions.

Addendum

Hans Martin Kern suggested calling this “Visualizing Basic Algebra.” That’s a much better name, and I’ve changed the title.