Aargh!

Posted by oliver
Sun, 12/25/2005 - 01:13

“Aargh!” But how do you spell it?


(Click here to skip straight to the visualization.)

In the late nineties, I tried using internet search as a spelling corrector. (I think I was using AltaVista at the time. It was the latest and greatest search engine, supplanting... was it Lycos?)

At the time, for the words I tried, there were about two orders of magnitude between the page counts for a misspelling and for the correct word. Spelling variants, such as “color” and “colour”, were typically less than one order of magnitude apart.
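
An "orders of magnitude" gap is the base-10 logarithm of the ratio between two page counts. Here is a minimal Python sketch of that arithmetic. The function name magnitude_gap is made up for illustration, and the counts are the 2005 figures quoted further down in this post, since the late-nineties numbers aren't given here.

```python
import math

def magnitude_gap(common_count, rare_count):
    """Return how many orders of magnitude separate two page counts."""
    return math.log10(common_count / rare_count)

# Counts are the rounded 2005 figures from the tables later in this post.
print(round(magnitude_gap(170e6, 2e6), 2))   # compatible vs. compatable
print(round(magnitude_gap(434e6, 63e6), 2))  # color vs. colour
```

With these rounded counts the misspelling comes out a little under two orders of magnitude behind, and the variant under one.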

In 2002 I used Google to figure out the most common spelling for “closable”, for use in the OpenLaszlo API. It had been “closeable”; why use a spelling that most people would guess wrong the first time, I figured. [Update: This paragraph originally said the word was “resizeable”, which is a straightforward misspelling.]

Here’s what this looks like today. First, a common misspelling:

compatible 170M
compatable 2M 1.3%

And a couple of spelling variants:

closable 137K
closeable 101K 73%
sizable 8.3M
sizeable 6.8M 81%

(The percentage is the ratio of the page count to the page count of the most common variant, which is the form listed directly above it.)
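
For the curious, the percentage column can be reproduced roughly like this. This is only a sketch: the helper name percent_of_most_common is an assumption, and it works from the rounded counts shown above, so its results can differ by a point or so from the percentages listed, which were presumably computed from unrounded counts.

```python
def percent_of_most_common(counts):
    """Map each spelling to its page count as a percentage of the largest count."""
    top = max(counts.values())
    return {word: 100 * n / top for word, n in counts.items()}

# Rounded page counts copied from the tables above.
pairs = [
    {"compatible": 170e6, "compatable": 2e6},
    {"closable": 137e3, "closeable": 101e3},
    {"sizable": 8.3e6, "sizeable": 6.8e6},
]
for counts in pairs:
    for word, pct in percent_of_most_common(counts).items():
        print(f"{word:12s} {pct:5.1f}%")
```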

Some other misspellings:

commit 73.9M
comit 0.8M 1%
resizable 1.74M
resizeable 0.18M 10%
misspell 466K
mispell 55K 12%

And some other acceptable variants:

color 434M
colour 63.0M 16%
gray 125M
grey 73M 59%
judgment 77M
judgement 24M 32%

(What’s the difference between an acceptable variant and a misspelling? An interesting topic for another posting. Maybe.)

What got me thinking about this again was, of all things, wondering how to spell “aargh!” One ‘a’, two, three…? And how many ‘r’s?

This is an interesting problem, first, because so many repetition counts are attested. There’s not just “mispelling” (one ‘s’) and “misspelling” (two), but “argh”, “aargh”, “aaargh”, etc. And second, because the space is two-dimensional: not just “argh”, “aargh”, “aaargh”, ..., but also “argh”, “arrgh”, “arrrgh”, ..., and the product, with “aarrgh”, “aaarrrgh”, etc.
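
To make the two-dimensional space concrete, here is a small Python sketch that builds a spelling from its two coordinates (number of ‘a’s, number of ‘r’s) and recovers the coordinates from a spelling. The function names aargh and coordinates are invented for illustration, and the sketch assumes exactly one ‘g’ and one ‘h’. It is not the code behind the visualization.

```python
import re
from itertools import product

def aargh(a_count, r_count):
    """Build the spelling with the given number of 'a's and 'r's."""
    return "a" * a_count + "r" * r_count + "gh"

def coordinates(word):
    """Return (a_count, r_count) for a spelling on the grid, else None."""
    m = re.fullmatch(r"(a+)(r+)gh", word.lower())
    return (len(m.group(1)), len(m.group(2))) if m else None

# Enumerate a small 3x3 corner of the grid, from "argh" up to "aaarrrgh".
grid = [aargh(a, r) for a, r in product(range(1, 4), repeat=2)]
print(grid)
print(coordinates("aaarrrgh"))
```

Each spelling in the grid would then be one search query, and its page count one cell of the picture.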

It’s clear that a wide range of spellings are acceptable. What’s the most common?

Without further ado, I created this page to help me find the answer.

Comments

[...] Looking at the number of pages using competing spellings of common words (think ‘closeable’ vs. ‘closable’) can help gauge what is the most current usage in a more accurate way than is possible with a book. Oliver Steele has some interesting comments on this. [...]

Aargh! Hmmm… « Someone and Anyone - Sun, 09/16/2007 - 16:58

[...] Oliver Steele has an interesting post on this topic here. [...]

[...] That is what the author of Aargh! did. It looks like generated plasma, with curious splashes quite far from the critical center (Aaaaaaaaaaaaaaaaarrrrrrrrrrrrrrrrrrrrrrrgh is strangely popular compared with its neighbors, and that is going to get worse once I publish). [...]

[...] This is important code, as pirates hide all over the web. But it’s pretty clunky: we have to import a library, call functions, evaluate responses, and save objects… It’d sure be handy if regular expressions were part of the language like in Ruby:   if "Oim a poirate, arrrgh!" =~ /ar+g+h/ then puts "There must be a pirate, I heard someone say '#{$&}'." else puts "No pirates detected." end [...]

Rixn - Fri, 01/13/2006 - 10:33

What a wonderful site of important non-sense. Very fun and in a way deep.

Da Count - Wed, 01/11/2006 - 05:28

While I'm not sure I agree with the "official" spelling...freakin' hilarious. I tip my eye patch to ya matie! Arrrrrrrgh!

Da Count

Vardibidian - Wed, 01/11/2006 - 03:00

Language Log looked at the various spellings for wassuupp about a year ago; the Aaarrgh chart is a great visualization of that entry.
On the other hand, you should probably be aware that Google's counts for anything over a thousand or so are wildly inaccurate; comparing a count of 2,000,000 to a count of 3,000,000 does not necessarily mean the former is actually less frequent than the latter. There's been a fair amount of work on this, but the nut is that you cannot trust even relative counts for large kghits.

Thanks,
-V.

Kirsten - Tue, 01/10/2006 - 22:42

No matter how long I've been online, it never ceases to amaze me how anything I hope might be a unique thought of mine has not only been thought by somebody else, but has been taken to the extreme as well.

I've held a few conversations on the infinite nature of the spelling of "argh"... but never got around to, um, really studying it. Very impressive.

ylime - Tue, 01/10/2006 - 18:19

Paul,

Incidentally, 'argh' is actually four-dimensional:

arggh - 216,000 (with an arggh.com)
argggh - 38,500
arggggh - 11,200
arg(6)h - 2,480 (with arggggg a suggested replacement)
arg(7)h - 900
arg(13)h - 1,200

a(6)r(6)g(6)h(6) - 549

Unfortunately, there is no visualization for that one.

Kimberly - Tue, 01/10/2006 - 17:54

Lee! I love this! Could you also - somehow - signal their computers' circuits to overload and shock them in some manner that will deprive them of the capability of speech? Just a thought.

benbenbenbenben - Tue, 01/10/2006 - 16:44

argh!

Lee - Tue, 01/10/2006 - 15:22

I think if you could establish the proportion of people who are able to spell the words 'lose' and 'loser' you would find that they were in a minority.
Not sure how easy it would be to find out since the commonly used extra-o variants are legitimate words in the correct context.
If you do find a way perhaps you could set up a network of servers which will disconnect everyone who can't spell and switch their computer off at the mains, leaving the words 'you suck at the internet' burned into their screen.
Just a thought.

The Aargh Page ~ Low Weblog - Mon, 01/09/2006 - 01:41

[...] dated 8 Jan 2006 at 23:46 in Dump en Komisch. The comments have an RSS 2.0 feed. Leave a comment yourself or trackback this post. [...]

mamling - Sun, 01/08/2006 - 22:12

The cited page http://osteele.com/words/aargh says "the former has occurs in 520 pages and the latter in almost 10,0000."
"has" should be omitted. "almost 10,0000" should be "almost 10,000".

dlg - Sun, 01/08/2006 - 13:45

@Numnuts:

Because Pirates say "Arrrrrr", and never "Arrrrrrgh". Unless they die. And then you can't ask them any more.

Numnuts - Sun, 01/08/2006 - 05:14

why not just ask a pirate how to spell it ?

Curtis Retherford - Sun, 01/08/2006 - 04:58

Yes, I think it would be more interesting if you omitted direct quotes of "argh," as MonkeySaltedNuts said. http://www.google.com/search?q=aaaaaaaaaaaaaaaaarrrrrrrrrrrrrrrrrrrrrrrg...
is only one actual use of a(17)r(23)gh, but because it was the topic of a frequently replied-to message board thread, it counts as 171 uses. Many of these are actually due to the way the Mumsnet message board creates a new web page each time there is an additional post: there is a page for when there were 3547 posts, and a whole new page for when someone posted an additional message, bringing the total to 3548 posts. Google counts both pages as uses of a(17)r(23)gh, even though they are the exact same page with one more message attached (a message completely unrelated to "a(17)r(23)gh" and its spellings).

In other words, a(17)r(23)gh has 171 results not because people tend to spell it that way, but because someone once spelled it that way on a message board that had a badly designed PHP script.

If, after you balance your data by removing direct quotes of other people's "argh"s, there are still high-frequency islands in the outlying "a" and "r" dimensions, that would be interesting. Right now, however, your data only shows which uses of "argh" happened to be in a page that was copied to a different server, or referenced, et cetera. Your data therefore has nothing to do with "argh" and everything to do with one small 2D stratum of the multi-dimensional array of internet traffic.

From what I checked of the outer high-frequency arghs, they are all one (or two at most) uses that have been re-referenced. Google omits these results as similar for that very reason, and even the ones Google doesn't omit are still often identical, or from the exact same source with minor aesthetic changes. So just using the not-similar hits that Google returns would be a good first step, but if you really want to prove anything (or draw conjectures about anything) you need to actually go through each not-similar hit and make sure it is really unique.

MonkeySaltedNuts - Sun, 01/08/2006 - 00:35

most of the uneven distribution I believe comes from duplicate articles.
For example http://www.google.com/search?q=aaaaaaaaaaaaaaarrrrrrrrrgh
says: "In order to show you the most relevant results, we have omitted some entries very similar to the 28 already displayed."

Thus while there are about 3,740 pages containing this word, only 28 of them are unique.

dlg - Sun, 01/08/2006 - 00:24

I think one possible explanation for (some of) the islands might be that they are visually balanced.

Look at the string of islands 7,13 - 11,15 - 15,19 - 17,23. In a normal font, the string of 'a's and the string of 'r's looks about equally long. You need more 'r's since the 'a' is wider.

Of course, this doesn't explain the other islands. And it's quite sketchy to begin with.

Jim - Sat, 01/07/2006 - 23:15

OOOOOoooooooooooohhhhhh! This is sooooooo cool! When I find stuff like this I scream "Yiippppppeeeee!"

Jim

Paul - Sat, 01/07/2006 - 23:07

A critical flaw in your argument is that argh is actually three dimensional.
argh: 3.4 M
arghh: 333 K
arghhh: 351 K
arghhhh: 168 K
and so on.

how bout, say, a^6r^6gh^6: 166

PeikkokinBlogaa + Aaaargh - Sat, 01/07/2006 - 22:43

[...] Aargh! [...]

Tim - Sat, 01/07/2006 - 14:30

I love this - this is what the web's for!

Some of the 'acceptable variants' you show are just the original English spellings. The subsequent variants, which have become more popular, seem to be attempts to simplify the original pronunciation or spelling (eg by dropping the 'i' in aluminium, replacing s with z where it's pronounced that way). Bit half-heartedly done though - otherwise you guyz would be buying low-sodum salt. Heh, sorry - couldn't rezist!

Elizabeth - Sun, 12/25/2005 - 01:46

Although I'm not a linguist, it seems to me that what is "acceptable" depends on context. Using your example of judgment/judgement, if you are an attorney filing a brief with a court, using the incorrect spelling will suggest to the judge that you are sloppy and inexperienced. Your credibility will be damaged, and your client's case hurt (as a former clerk, more than you might imagine). In that situation, even if the incorrect spelling is extremely common in everyday non-legal usage (and even in legal usage), there is an acceptable spelling and an unacceptable one. Of course, in designing a user interface, I could see that it could make sense to use a common, but not officially approved, spelling.