Aargh!

Posted by Oliver on December 24, 2005

“Aargh!” But how do you spell it?

(Click here to skip straight to the visualization.)

In the late nineties, I tried using internet search as a spelling corrector. (I think I was using AltaVista at the time. It was the latest and greatest search engine, supplanting — was it Lycos?)

At the time, for the words I tried, there were about two orders of magnitude between the page counts of a misspelling and the correct word. Spelling variants, such as “color” and “colour”, were typically less than one order of magnitude apart.
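The "orders of magnitude" comparison is the base-10 logarithm of the ratio between two page counts. Here is a rough sketch in Python, using counts quoted later in this post. The function name is hypothetical and only there for illustration.

```python
import math

def orders_of_magnitude(common_count, rare_count):
    """Return log10 of the ratio between two page counts."""
    return math.log10(common_count / rare_count)

# Page counts quoted later in this post.
misspelling_gap = orders_of_magnitude(170e6, 2e6)  # compatible vs. compatable
variant_gap = orders_of_magnitude(434e6, 63e6)     # color vs. colour

print(f"compatible/compatable: {misspelling_gap:.2f} orders of magnitude")
print(f"color/colour: {variant_gap:.2f} orders of magnitude")
```

With these counts the misspelling lands just under two orders of magnitude and the variant under one, in line with the pattern described above.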

In 2002 I used Google to figure out the most common spelling of “closable”, for use in the OpenLaszlo API. It had been “closeable”, but why use a spelling that most people would guess wrong the first time? [Update: This paragraph originally said the word was "resizeable", which is a straightforward misspelling.]

Here’s what this looks like today. First, a common misspelling:

compatible    170M
compatable    2M      1.3%

And a couple of spelling variants:

closable      137K
closeable     101K    73%
sizable       8.3M
sizeable      6.8M    81%

(The percentage is the ratio of the page count to the page count of the most common variant, which is the form in bold above it.)
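As a sketch of that calculation in Python (the function name is hypothetical, and the inputs are the rounded counts shown above, so the result can differ by a point from the percentages in the tables):

```python
def percent_of_most_common(counts):
    """Map each spelling to its page count as a percentage of the largest count."""
    top = max(counts.values())
    return {word: round(100 * n / top, 1) for word, n in counts.items()}

# Rounded page counts from the tables above (K = thousand).
closable = {"closable": 137_000, "closeable": 101_000}
result = percent_of_most_common(closable)
print(result)
```

The most common variant always comes out at 100, and every other spelling is scaled against it.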

Some other misspellings:

commit        73.9M
comit         0.8M    1%
resizable     1.74M
resizeable    0.18M   10%
misspell      466K
mispell       55K     12%

And some other acceptable variants:

color         434M
colour        63.0M   16%
gray          125M
grey          73M     59%
judgment      77M
judgement     24M     32%

(What’s the difference between an acceptable variant, and a misspelling? An interesting topic for another posting. Maybe.)

What got me thinking about this again was, of all things, wondering how to spell “aargh!” One ‘a’, two, three…? And how many ‘r’s?

This is an interesting problem, first, because so many repetition counts are attested. There’s not just “mispelling” (1s) and “misspelling” (2s), but “argh”, “aargh”, “aaargh”, etc. And second, because the space is two-dimensional: not just “argh”, “aargh”, “aaargh”, …, but also “argh”, “arrgh”, “arrrgh”, … — and the product, with “aarrgh”, “aaarrrgh”, etc.
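The two-dimensional space can be enumerated directly. This is a minimal sketch in Python, assuming a fixed “gh” ending and arbitrary upper bounds on the repetition counts; the function name is made up for illustration.

```python
def aargh_spellings(max_a, max_r):
    """Yield (a_count, r_count, spelling) for every point in the grid."""
    for a in range(1, max_a + 1):
        for r in range(1, max_r + 1):
            yield a, r, "a" * a + "r" * r + "gh"

# A 3x3 corner of the grid: "argh", "arrgh", ..., "aaarrrgh".
grid = list(aargh_spellings(3, 3))
for a, r, word in grid:
    print(a, r, word)
```

Each generated spelling could then be looked up in a search engine, giving one page count per grid cell.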

It’s clear that a wide range of spellings are acceptable. What’s the most common?

Without further ado, I created this page to help me find the answer.

Comments

  1. Elizabeth Sat, 24 Dec 2005 20:46:56 PST

    Although I am not a linguist, it seems to me that what is “acceptable” depends on context. Using your example of judgment/judgement, if you are an attorney filing a brief with a court, using the incorrect spelling will suggest to the judge that you are sloppy and inexperienced. Your credibility will be damaged, and your client’s case hurt (as a former clerk, more than you might imagine). In that situation, even if the incorrect spelling is extremely common in everyday non-legal usage (and even in legal usage), there is an acceptable spelling and an unacceptable one. Of course, in designing a user interface, I could see that it could make sense to use a common, but not officially approved, spelling.

  2. Tim Sat, 07 Jan 2006 09:30:16 PST

    I love this – this is what the web’s for!

    Some of the ‘acceptable variants’ you show are just the original English spellings. The subsequent variants, which have become more popular, seem to be attempts to simplify the original pronunciation or spelling (eg by dropping the ‘i’ in aluminium, replacing s with z where it’s pronounced that way). Bit half-heartedly done though – otherwise you guyz would be buying low-sodum salt. Heh, sorry – couldn’t rezist!

  3. PeikkokinBlogaa + Aaaargh Sat, 07 Jan 2006 17:43:01 PST

    [...] Aargh! [...]

  4. Paul Sat, 07 Jan 2006 18:07:43 PST

    A critical flaw in your argument is that argh is actually three dimensional.
    argh: 3.4 M
    arghh: 333 K
    arghhh: 351 K
    arghhhh: 168 K
    and so on.

    how bout, say, a6r6gh^6: 166

  5. Jim Sat, 07 Jan 2006 18:15:19 PST

    OOOOOoooooooooooohhhhhh! This is sooooooo cool! When I find stuff like this I scream “Yiippppppeeeee!”

    Jim

  6. dlg Sat, 07 Jan 2006 19:24:47 PST

    I think one possible explanation for (some of) the islands might be that they are visually balanced.

    Look at the string of islands 7,13 – 11,15 – 15,19 – 17,23. In a normal font, the string of ‘a’s and the string of ‘r’s looks about equally long. You need more ‘r’s since the ‘a’ is wider.

    Of course, this doesn’t explain the other islands. And it’s quite sketchy to begin with.

  7. MonkeySaltedNuts Sat, 07 Jan 2006 19:35:56 PST

    Most of the uneven distribution, I believe, comes from duplicate articles.
    For example http://www.google.com/search?q=aaaaaaaaaaaaaaarrrrrrrrrgh
    says: “In order to show you the most relevant results, we have omitted some entries very similar to the 28 already displayed.”

    Thus while there are about 3,740 pages containing this word, only 28 of them are unique.

  8. Curtis Retherford Sat, 07 Jan 2006 23:58:07 PST

    Yes, I think it would be more interesting if you omitted direct quotes of “argh,” as MonkeySaltedNuts said. http://www.google.com/search?q=aaaaaaaaaaaaaaaaarrrrrrrrrrrrrrrrrrrrrrrgh
    is only one actual use of a(17)r(23)gh, but because it was the topic of a frequently replied-to message board thread, it counts as 171 uses. Many of these uses are actually due to the way the Mumsnet message board creates a new webpage each time there is an additional post: there is a page for when there were 3547 posts, and a whole new page for when someone posted an additional message, bringing the total to 3548 posts. Google counts both pages as uses of a(17)r(23)gh, even though they are the exact same page with one more message attached (a message completely unrelated to “a(17)r(23)gh” and its spellings).

    In other words, a(17)r(23)gh has 171 results not because people tend to spell it that way, but because someone once spelled it that way on a message board that had a badly designed php script.

    If you balance your data by removing direct quotes of other people’s “argh”s and there are still high-frequency islands in the outlying “a” and “r” dimensions, then that would be interesting. Right now, however, your data only shows which uses of “argh” happened to be in a page that was copied to a different server, or referenced, et cetera. Your data therefore has nothing to do with “argh” and everything to do with one small 2d stratum of the multi-dimensional array of internet traffic.

    From what I checked of the outer high-frequency arghs, they are all one (or two at most) uses that have been re-referenced. Google omits these results as similar for that very reason, and even the ones Google doesn’t omit are still often identical, or from the exact same source with minor aesthetic changes. So just using the not-similar hits that Google returns would be a good first step, but if you really want to prove anything (or draw conjectures about anything) you need to actually go through each not-similar hit and make sure it is really unique.

  9. Numnuts Sun, 08 Jan 2006 00:14:12 PST

    why not just ask a pirate how to spell it ?

  10. dlg Sun, 08 Jan 2006 08:45:50 PST

    @Numnuts:

    Because Pirates say “Arrrrrr”, and never “Arrrrrrgh”. Unless they die. And then you can’t ask them any more.

  11. mamling Sun, 08 Jan 2006 17:12:34 PST

    The cited page http://osteele.com/words/aargh says “the former has occurs in 520 pages and the latter in almost 10,0000.”
    “has” should be omitted. “almost 10,0000” should be “almost 10,000”.

  12. The Aargh Page ~ Low Weblog Sun, 08 Jan 2006 20:41:17 PST

    [...] dated 8 Jan 2006 at 23:46 in Dump en Komisch. The comments have an RSS 2.0 feed. Leave a comment yourself or trackback this post. [...]

  13. Lee Tue, 10 Jan 2006 10:22:30 PST

    I think if you could establish the proportion of people who are able to spell the words ‘lose’ and ‘loser’ you would find that they were in a minority.
    Not sure how easy it would be to find out since the commonly used extra-o variants are legitimate words in the correct context.
    If you do find a way perhaps you could set up a network of servers which will disconnect everyone who can’t spell and switch their computer off at the mains, leaving the words ‘you suck at the internet’ burned into their screen.
    Just a thought.

  14. benbenbenbenben Tue, 10 Jan 2006 11:44:12 PST

    argh!

  15. Kimberly Tue, 10 Jan 2006 12:54:53 PST

    Lee! I love this! Could you also – somehow – signal their computers’ circuits to overload and shock them in some manner that will deprive them of the capability of speech? Just a thought.

  16. ylime Tue, 10 Jan 2006 13:19:52 PST

    Paul,

    Incidentally, ‘argh’ is actually four-dimensional:

    arggh – 216,000 (with an arggh.com)
    argggh – 38,500
    arggggh – 11,200
    arg(6)h – 2,480 (with arggggg a suggested replacement)
    arg(7)h – 900
    arg(13)h – 1,200

    a(6)r(6)g(6)h(6) – 549

    Unfortunately, there is no visualization for that one.

  17. Kirsten Tue, 10 Jan 2006 17:42:02 PST

    No matter how long I’ve been online, it never ceases to amaze me how anything I hope might be a unique thought of mine has not only been thought by somebody else, but has been taken to the extreme as well.

    I’ve held a few conversations on the infinite nature of the spelling of “argh”… but never got around to, um, really studying it. Very impressive.

  18. Vardibidian Tue, 10 Jan 2006 22:00:23 PST

    Language Log looked at the various spellings for wassuupp about a year ago; the Aaarrgh chart is a great visualization of that entry.
    On the other hand, you should probably be aware that Google’s counts for anything over a thousand or so are wildly inaccurate; comparing a count of 2,000,000 to a count of 3,000,000 does not necessarily mean the former is actually less frequent than the latter. There’s been a fair amount of work on this, but the nut is that you cannot trust even relative counts for large kghits.

    Thanks,
    -V.

  19. Da Count Wed, 11 Jan 2006 00:28:25 PST

    While I’m not sure I agree with the “official” spelling…freakin’ hilarious. I tip my eye patch to ya matie! Arrrrrrrgh!

    Da Count

  20. Rixn Fri, 13 Jan 2006 05:33:49 PST

    What a wonderful site of important non-sense. Very fun and in a way deep.

  21. Strings are a Domain-Specific Language - Push cx Sat, 27 May 2006 21:11:40 PDT

    [...] This is important code, as pirates hide all over the web. But it’s pretty clunky, we have to import a library and call functions and evaluating responses and save objects… It’d sure be handy if regular expressions were part of the language like in Ruby:   if “Oim a poirate, arrrgh!” =~ /ar+g+h/ then puts “There must be a pirate, I heard someone say ‘#{$&}’.” else puts “No pirates detected.” end [...]

  22. [...] That is what the author of Aargh! did. It looks like generated plasma, with curious splashes quite far from the critical center (Aaaaaaaaaaaaaaaaarrrrrrrrrrrrrrrrrrrrrrrgh is strangely popular compared with its neighbors – and that will get worse once I publish). [...]

  23. Aargh! Hmmm… « Someone and Anyone Sun, 16 Sep 2007 12:58:32 PDT

    [...] Oliver Steele has an interesting post on this topic here. [...]

  24. [...] Looking at the number of pages using competing spellings of common words (think ‘closeable’ vs. ‘closable’) can help gauge what is the most current usage in a more accurate way than is possible with a book. Oliver Steele has some interesting comments on this. [...]

  25. Niche Blueprint Bonus Thu, 08 Jan 2009 01:42:10 PST

    That’s a lot of misspelling going on. I guess I am one of the culprits too. Lol

  26. Anony McMous Sun, 11 Jan 2009 14:45:41 PST

    I reached this page after searching some version of argh on google.
    I think for your next study you should research how many As or Rs constitute an expression of anger and/or annoyance through to frustration, despair and a feeling of impending catastrophe.

    And FYI I usually like to even out the amount of letters, so like to have quite a few Gs and Hs in there too. Like this:

    AAAAAAAAAAARRRRRRRRRRRGGGGGGGGGHHHHHHHHHHHHHHHHHHH!!!!!!!!
