Fortunately
Jim Grandy wrote:
From: jgrandy
Subject: stupid Google game
Date: January 7, 2006 6:17:58 PM EST
Google for "unfortunately, yournamehere":
Lots of fun hits for "unfortunately, jim":
- unfortunately Jim’s orange dry suit made him look like a carrot
- Unfortunately Jim is no longer with us as he died of a brain tumor in 1993.
- Unfortunately, Jim did not respond. He disbelieved that it was an angel.
- Unfortunately, Jim is only one person with a limited amount of time available to
help Jane find answers to her questions.
I’ve turned this into a web page here.
I prototyped it with a screen scraper for Google, but I didn’t want to deploy a screen scraper.
Fortunately, Google has a Search API.
Unfortunately, Google’s API uses SOAP.
Fortunately, Ruby has a SOAP library.
Unfortunately, the Ruby SOAP library doesn’t work on Dreamhost.
Fortunately, the Yahoo Web Search API uses REST.
Unfortunately, Yahoo’s summaries don’t include enough right-hand context, so it’s harder to extract decent sentences from them.
Maybe I’ll go back to screen-scraping after all.
Update: Jim tells me he got the idea from Jorg Brown at Google.
Aargh!
“Aargh!” But how do you spell it?
(Click here to skip straight to the visualization.)
In the late nineties, I tried using internet search as a spelling corrector. (I think I was using AltaVista at the time. It was the latest and greatest search engine, supplanting — was it Lycos?)
At the time, for the words I tried, a misspelling was about two orders of magnitude less common than the correct word. Spelling variants, such as “color” and “colour”, typically differed by less than one order of magnitude.
In 2002 I used Google to figure out the most common spelling for “closable”, for use in the OpenLaszlo API. It had been “closeable”; why use a spelling that most people would guess wrong the first time, I figured. [Update: This paragraph originally said the word was "resizeable", which is a straightforward misspelling.]
Here’s what this looks like today. First, a common misspelling:
| compatible | 170M | |
| compatable | 2M | 1.3% |
And a couple of spelling variants:
| closable | 137K | |
| closeable | 101K | 73% |
| sizable | 8.3M | |
| sizeable | 6.8M | 81% |
(The percentage is the ratio of the page count to the page count of the most common variant, which is the form listed directly above it.)
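Here is a rough sketch of that arithmetic in Python. The helper names `parse_count` and `compare` are made up for illustration, and the inputs are the rounded counts from the tables above, so the results may differ slightly from the percentages shown there. It also reports the gap in orders of magnitude, which is the measure used earlier in this post.

```python
import math

def parse_count(text):
    """Convert a count such as '170M' or '137K' to a number (hypothetical helper)."""
    multipliers = {"K": 1_000, "M": 1_000_000}
    suffix = text[-1].upper()
    if suffix in multipliers:
        return float(text[:-1]) * multipliers[suffix]
    return float(text)

def compare(common_count, variant_count):
    """Return (percentage, orders of magnitude) of the variant relative to the common form."""
    ratio = parse_count(variant_count) / parse_count(common_count)
    return 100 * ratio, -math.log10(ratio)

# Rounded page counts copied from the tables above.
pairs = [
    ("compatible", "170M", "compatable", "2M"),
    ("closable", "137K", "closeable", "101K"),
    ("sizable", "8.3M", "sizeable", "6.8M"),
]

for common, common_count, variant, variant_count in pairs:
    pct, orders = compare(common_count, variant_count)
    print(f"{variant} vs {common}: {pct:.1f}% ({orders:.2f} orders of magnitude)")
```

On these numbers the misspelling sits nearly two orders of magnitude below the correct word, while the spelling variants stay well within one.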
Some other misspellings:
| commit | 73.9M | |
| comit | 0.8M | 1% |
| resizable | 1.74M | |
| resizeable | 0.18M | 10% |
| misspell | 466K | |
| mispell | 55K | 12% |
And some other acceptable variants:
| color | 434M | |
| colour | 63.0M | 16% |
| gray | 125M | |
| grey | 73M | 59% |
| judgment | 77M | |
| judgement | 24M | 32% |
(What’s the difference between an acceptable variant and a misspelling? An interesting topic for another posting. Maybe.)
What got me thinking about this again was, of all things, wondering how to spell “aargh!” One ‘a’, two, three…? And how many ‘r’s?
This is an interesting problem, first, because so many repetition counts are attested. There’s not just “mispelling” (1s) and “misspelling” (2s), but “argh”, “aargh”, “aaargh”, etc. And second, because the space is two-dimensional: not just “argh”, “aargh”, “aaargh”, …, but also “argh”, “arrgh”, “arrrgh”, … — and the product, with “aarrgh”, “aaarrrgh”, etc.
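To make the two-dimensional space concrete, here is a minimal sketch that enumerates it. The function name `aargh_variants` and the `max_a` / `max_r` limits are assumptions chosen for illustration; nothing here queries a search engine, it only generates the candidate spellings.

```python
from itertools import product

def aargh_variants(max_a=3, max_r=3):
    """Yield (a_count, r_count, spelling) for every point on the grid."""
    for a_count, r_count in product(range(1, max_a + 1), range(1, max_r + 1)):
        yield a_count, r_count, "a" * a_count + "r" * r_count + "gh"

# Index the spellings by (number of a's, number of r's).
grid = {(a, r): word for a, r, word in aargh_variants()}

# One row per count of 'a', one column per count of 'r'.
for a in range(1, 4):
    print("  ".join(grid[(a, r)] for r in range(1, 4)))
```

Each cell of the grid is one spelling to look up; pairing every cell with its page count would give the data needed to answer the question.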
It’s clear that a wide range of spellings are acceptable. What’s the most common?
Without further ado, I created this page to help me find the answer.
There They’re
(For Miles.)
| Possessives | Places | Contractions | Verbs |
| our | . | . | are |
| . | here | . | hear |
| . | where | we’re | were |
| their | there | they’re | . |
| its | . | it’s | . |
| your | . | you’re | . |
| his, her, my | . | . | . |
Read across the rows to see words that are easily confused with each other. Read down the columns to see the patterns.
Things to note:
* All of the contractions have apostrophes (’).
* Nothing in this table that isn’t a contraction has an apostrophe. “Its roof is red” doesn’t have an apostrophe even though “its” is a possessive. This makes “its” different from noun possessives (“The house’s roof is red”), but the same as “his”, “her”, and “their” (“Her hair is red”).
* The place words all have “here” in them: “here”, “there”, and “where”.
* Nothing that isn’t a place word has “here” in it: “Their hands are wet”, “They’re leaving”.
If you notice you’re using a word in this table, think about which kind of word it is (Possessive, Place, Contraction, or Verb), that is, which column it fits into. If you’re using “its”, would “his” also work? Or would a contraction such as “we’re” fit instead? In that case you should be using “it’s”.
Once again, I’m sorry about English spelling — I would have done it differently. It’s not my fault.
PyWordNet 2.0
After a spate of requests and a contribution from Wei-Hao Lin, I’ve finally gotten around to releasing an update of PyWordNet that works with the WordNet 2.0 database files. (WordNet 2.0 adds lexical links for derivational morphology and topical classification. This broke the PyWordNet 1.4 dictionary file parser.)
Ingrediants and Isochems
Now that product ingredient lists have become marketing bullets, here are two terms that I’ve found useful for thinking about them:
- Ingrediant (with an ‘a’): An item added to an ingredient list purely for its marketing effect, as opposed to any material qualities that it adds to the product. For example, the vitamins that are added to shampoos. (By analogy with surfactant, flavorant, colorant.)
- Isochem: A more appealing synonym for a common ingredient. For example, “evaporated cane juice” for “sugar”, or “hydrolyzed vegetable protein” for “monosodium glutamate (MSG)”.
Semiotics of Weddings
A wedding is a coercion operator from a state to an event that marks the beginning of the state. (The English word “marriage” denotes either.) The advantage of an event over a state is that it can be used as a reference for other events, symbolizing happiness, community, fertility, etc., by placing these other events at the same time and location. This is analogous to time binding in linguistics.
Nobody’s On That Train
I used to be an unrepentant nineteenth century grammarian. I never used a preposition to end a sentence; I took care never to split an infinitive; I knew “who” from “whom”; and I never, ever, used “they” for “he or she”.



