Tuesday, February 26, 2013

Lazy Red Foxes

If you've ever tested a mechanical typewriter, you know this sentence, which contains every letter of the English alphabet:

The quick red fox jumps over the lazy brown dog.

Although the distribution of letters differs somewhat from the language at large, they do not appear with equal probability either. Thus the information entropy of the letters is less than the maximum one would expect, and this suggests that the sentence may not be a random agglomeration.
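A quick way to see this is to compute the entropy directly. Here's a minimal Python sketch, standard library only (`letter_entropy` is just an illustrative name I've made up):

```python
import math
from collections import Counter

def letter_entropy(text):
    """Shannon entropy, in bits per letter, of the letter frequencies in text."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    n = len(letters)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

sentence = "The quick red fox jumps over the lazy brown dog."
print(f"observed: {letter_entropy(sentence):.3f} bits/letter")
print(f"maximum:  {math.log2(26):.3f} bits/letter")  # all 26 letters equally likely
```

The observed value comes in below the log2(26) ≈ 4.700-bit maximum, since letters like 'e' and 'o' repeat while others appear only once.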

Looking a little deeper, we can see that there is a certain amount of mutual information in letter sequences; e.g., 'h' is always followed by 'e' in this tiny sample.
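That claim is easy to check mechanically. A small sketch (again standard library; `follower_counts` is my own made-up helper name) tallies which letter follows which:

```python
from collections import Counter, defaultdict

def follower_counts(text):
    """For each letter, count the letters that follow it (non-letters ignored)."""
    letters = [c for c in text.lower() if c.isalpha()]
    follows = defaultdict(Counter)
    for a, b in zip(letters, letters[1:]):
        follows[a][b] += 1
    return follows

sentence = "The quick red fox jumps over the lazy brown dog."
follows = follower_counts(sentence)
print(dict(follows["h"]))  # → {'e': 2}: both occurrences of 'h' are followed by 'e'
```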

It also parses into convenient words when broken at the spaces, and these words are all found in the dictionary. Even more surprisingly, the word order matches the language's Syntax perfectly:

Noun-phrase Verb-phrase Object-phrase
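The word-breaking step above can be sketched in a few lines of Python. The `lexicon` set below is a toy stand-in for a real dictionary, just big enough for this sentence:

```python
sentence = "The quick red fox jumps over the lazy brown dog."

# Toy stand-in for "the dictionary" -- an assumed word list, not a real lexicon.
lexicon = {"the", "quick", "red", "fox", "jumps", "over", "lazy", "brown", "dog"}

# Break at the spaces, dropping case and trailing punctuation.
words = [w.strip(".,") for w in sentence.lower().split()]

print(words)
print(all(w in lexicon for w in words))  # True: every token is a dictionary word
```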

Maybe it means something? Hmm, let's just see... Each phrase seems to make sense. Based on an exhaustive search of the corpus of written knowledge, adjectives modify nouns in an appropriate manner and the verb phrase stands up to the same scrutiny. Everything is Semantically copacetic and thus we have a candidate for a meaningful utterance.

Of course, amongst all the rule fitting -- we know it when we see it -- the sentence actually does mean something. It communicates the description of an event that we can easily picture occurring.

Now let's just mess things up a bit. There are 10! (about 3.6 million) possible orderings of these words -- or rather, since "the" appears twice and swapping its two copies changes nothing, 10!/2! = 1,814,400 distinct sequences. We can reject most of these, since only a few are syntactically and semantically proper. From the reduced set of candidates for meaningfulness, consider:

The quick brown fox jumps over the lazy red dog.

Still makes good sense. Differently colored canines are well within the scope of meaningful utterance. But how about:

The lazy red dog jumps over the quick brown fox.

This makes semantic sense but lacks plausibility. Because we seldom experience a lazy thing getting one over on a quick one, it is hermeneutically surprising. (I would use "semiotically" here, but it is over-overloaded with other meanings, and I've always liked the sound of "hermeneutic." I'm also taking the surprise factor from the explanations of information entropy we started with -- low-probability and/or completely random occurrences are more surprising to behold because we expect them less.)

Therefore I propose that Hermeneutic Surprise (HS) be added to the set of Information Measures. It is probably one of those things that peaks in the middle of its range. Low HS is meaningful but of little interest: "Apples are red." And high HS may be poetic but meaningless in experience, e.g. the example from my Another Chinese Room post: "The green bunny was elected president of the atomic bomb senate."

The trouble is going to be figuring out how to measure Hermeneutic Surprise...because right now we just know it when we see it...
