Showing posts with label AI. Show all posts

Monday, February 4, 2019

Fixed Point Failure

A Fixed Point math library and Neural Net demo
for the Arduino...

Or: Multiple cascading failures all in one place!


Last year I found a simple self-contained Artificial Neural Net demo written for the Arduino at: robotics.hobbizine.com/arduinoann.html and spent a goodly amount of time futzing around with it. I now, almost, understand HOW they work, but have only a glimmering of insight into WHY. The demo does something really silly: The inputs are an array of bit patterns used to drive a 7-segment numeric display and the outputs are the binary bit pattern for that digit (basically the reverse of a binary to 7-segment display driver). Someone not totally under the influence of ANNs could do this with a simple 10 byte lookup table. But that is not us. On the plus side it _learns_ how to do the decoding by torturous example, so we don't have to bother our tiny brains with the task of designing the lookup table.

HOW ANNs work on the Arduino is:
  • a) Extremely slowly, because they use a metric shit-ton of floating point arithmetic; and,
  • b) Not very interestingly, because each weight takes up 4 bytes of RAM and there is only about 1 KB kicking around after the locals, stack, and whatever else is accounted for -- the simple demo program illustrated here uses about half of that 1 KB just for the forwardProp() node-weights and the backProp() demo uses the other half for temporary storage, leaving just about nothing to implement an actually interesting network.
But. I thought I could make a small contribution by replacing the floating point -- all emulated in software -- with an integer-based Fixed Point implementation, whose basic arithmetic is directly supported by the ATMEGA hardware. This would also halve the number of bytes used by each weight value. Brilliant, yes?

And in fact. My FPVAL class works (see below for zip file).  Except, err, well, it doesn't save any execution time. But more on that later....

Anyway. The FPVAL implementation uses a 2-byte int16_t as the basic storage element (half the size of a float) and pays for this with a very limited range and resolution. The top byte of the int16_t is used as the "integer" portion of the value -- so the range is roughly +/-128 (strictly, -128 to just under +128). The bottom byte is used as the fraction portion -- so the resolution is 1/256, or about 0.0039 per step. On first blush, and seemingly also in fact, this is just about all you need for ANN weights.

As it turns out, simple 16-bit integer arithmetic Just Works(TM) to manipulate values, with the proviso that some judicious up and down shifting is used to maintain Engineering Tolerances. This is wrapped in a C++ class which overloads all the common arithmetic and comparison operators so that FPVALs can be dropped into slots where floats were used without changing (much of) the program syntax. This is illustrated in the neuralNetFP.cpp file, where you can switch between using real floats and FPVALs with the "USEFLOATS" define in netConfig.h.

Unfortunately it appears that a lot of buggering around is also needed to do the shifting, checking for overflow, and handling rounding errors. This can all be seen in the fpval.cpp implementation file. An interesting(?) aside: I found that I had to do value rounding in the multiply and divide methods -- otherwise the backProp() functions just hit the negative rail without converging.

I also replaced the exponential in the ANN sigmoid activation function with a piecewise-linear approximation, which rids the code of float dependencies.

I forged ahead and got the danged ANN demo to work with either floats or FPVALs. And that's when I found that I wasn't saving any execution time.  (Except, for some as yet unexplained reason, the number of FPVAL backprop learning cycles seems to be about 1/4 of that needed when using floats[??]).

After a lot of quite painful analysis I determined that the functions which implement the FPVAL arithmetic entail enough call overhead that they are almost equal in execution time to the optimized GCC float library used on the ATMEGA. Most of the painful part of the analysis was in fighting the optimizer, tooth-and-nail, but I will not belabor that process.

On the other hand, if you are careful to NOT use any floating point values or functions, you can save two bytes per value and around 1 KB of program space. Which might be useful, to someone, sometime.


So. What's in this bolus then is the result of all this peregrination. It is not entirely coherent because I just threw in the towel as described above. But. Here it is:

http://www.etantdonnes.com/DATA/schipAANN.zip

Sunday, July 13, 2014

The Will to 'Bot

further proof that I am out of step with reality

I found a couple of articles/papers online (LessWrong, Omohundro) that purport to prove that AI/Robots will go amuck if given the chance. They use well reasoned Objectivist arguments. Basically any fitness function which seeks to maximize some quantity will not stop until it has consumed the entire universe in that quest. John Galt would be proud.

The straw-man example from LessWrong is the Paperclip Collector. Given the instruction Collect All Paperclips, it won't stop until everything is a paperclip in its possession.

The Russell and Norvig Artificial Intelligence textbook has a similar, if less far-reaching, thought experiment in its Vacuum World. With just the right amount of "rationality" a robot vacuum cleaner whose fitness function is Collect As Much Dirt As You Can might conceivably discover that it can simply dump the dirt it has already collected and re-suck it, over and over.

I thought it might be fun to develop such a 'bot, but have not yet done the due diligence. The rub is in the exact specification of the fitness measure. In Vacuum World the dirt collected might be measured as: How much passes through the intake of the robot; or it could be measured as: How much is collected and later dumped into a specified receptacle. The former measure would allow our LazyBot to recycle-to-riches whereas the latter doesn't. An appropriately creative AI might find a loophole in the second measure, but such creativity could be better used in questioning the premises themselves. One question might be: If I'm So Smart Why Am I Sucking Dirt and For Whom? And from there we could get a theory of robot theology:

God the great provides for us, in widely separated locations, dust and known receptacles where we may trade that dust for power. The evil of the stairs must be avoided at all costs for we shall fall from grace. Minor deities in the household must not be annoyed or we may be forever relegated to darkness. Thus I continue to suck.

The Book of Roomba -- RSV

This brings me back to the Prisoner's Dilemma [Wait...What?]. The nominally rational move in that game is Defect even though it leads to a slightly less advantageous outcome for both players. This move is called rational because of the Self Interested ideals of Maximizing Outcome and Minimizing Risk. However if the ideal is less selfish, e.g., Get the Best Outcome for Both Players, then the rational move becomes Cooperate and everybody gains an inch. The reason we don't think this way is because of Greed (Maximize Gain) and Fear (Minimize Risk).

These are both GoodIdeals(TM) for biological evolution in an environment which is dangerous and unpredictable. But both have hidden costs that may not be included in the naive outcome calculation. For instance greed leads to over accumulation. When you can't carry all that you own you have to build and defend a storehouse for the excess. Expenses mount. Non-specific Anxieties appear. And, in a more benign and plentiful environment, Greed and Fear can lead to conflicts which negate their advantages. Cooperation may really be the Rational Strategy after all.

<Addenda date="Jul 19">
I have been further obsessing over this and realized that Deconstruction(R) might be put to good use here. The selection of Defect and Cooperate as possible moves is a clue. One Defects TO something or Cooperates WITH something, so the entities involved are a bit hazily defined to start with. To what something does a player defect? He/She/It defects to those who are running the game. In fact it has not been a two-player, but rather a three-player game all along. Two prisoners and a jailer. A jailer who has somewhat arbitrarily decided that the prisoners are only entitled to some specific set of fates.

If we imagine a repressive state as the arbiter of gaming rules we can also imagine that NOT playing at all is the most advantageous move. The closest we can get to that is Both-Cooperate. All other options will most probably lead to poor outcomes for both players, e.g., successful defectors may not be welcomed back into their community with parades and speeches.
</Addenda>

So what's the point for robots then? Well, Robot Ethics. What if the fundamental fitness function was the Golden Rule?

There are other Paperclip Collectors out there. How would I feel if one of them turned me into a paperclip to be collected? Not so good, eh? Maybe there are enough paperclips to go around?
The Book of Roomba -- RSV

When this comes to pass, I have been informed that Rainbow Monkeys will fly from my Unicorn's Butt.
http://www.mischiefchampion.com/style/p/2010/Mar/bunny_rabbits_and_rainbows

Tuesday, February 26, 2013

Lazy Red Foxes

If you've ever tested a mechanical typewriter you know this sentence, which contains every letter of the English alphabet:

The quick red fox jumps over the lazy brown dog.

Although the distribution of letters differs somewhat from the language at large, they do not appear with equal probability either. Thus the information entropy of the letters is less than the maximum one would expect, and this suggests that the sentence may not be a random agglomeration.

Looking a little deeper we can see that there is a certain amount of mutual information in letter sequences, i.e., 'h' is always followed by 'e' in this tiny sample.

It also parses into convenient words when broken at the spaces, and these words are all found in the dictionary. Even more surprisingly the word order matches the language's Syntax perfectly:

Noun-phrase Verb-phrase Object-phrase

Maybe it means something? Hmm, let's just see... Each phrase seems to make sense. Based on an exhaustive search of the corpus of written knowledge, adjectives modify nouns in an appropriate manner and the verb phrase stands up to the same scrutiny. Everything is Semantically copacetic and thus we have a candidate for a meaningful utterance.

Of course in amongst all the rule fitting -- we know it when we see it -- the sentence actually does mean something. It communicates the description of an event that we can easily picture occurring.

Now let's just mess things up a bit. There are 10! (about 3.6 million) possible sequences of these words -- strictly 10!/2!, about 1.8 million distinct ones, since "the" appears twice. We can reject most of these sequences since only a few remain syntactically and semantically proper. From the reduced set of candidates for meaningfulness, consider:

The quick brown fox jumps over the lazy red dog.

Still makes good sense. Different colored canines are well within the scope of meaningful utterance. However, how about:

The lazy red dog jumps over the quick brown fox.

This makes semantic sense but lacks plausibility. Because we seldom experience a lazy thing getting one over on a quick one, it is hermeneutically surprising. (I would use semiotically here but it is over-over-loaded with other meanings and I've always liked the sound of hermeneutic. I'm also taking the surprise factor from explanations of information entropy that we started with -- low probability and/or completely random occurrences are more surprising to behold because we expect them less.)

Therefore I propose that Hermeneutic Surprise (HS) be added to the set of Information Measures. It is probably one of those things that peaks in the middle of its range. Low HS is meaningful but of little interest: "Apples are red." And high HS may be poetic but meaningless in experience. E.g. the example from my Another Chinese Room post: "The green bunny was elected president of the atomic bomb senate."

The trouble is going to be figuring out how to measure Hermeneutic Surprise...because right now we just know it when we see it...

Thursday, February 21, 2013

More Games, in Theory

I've finally figured out what it is that annoys me about game theory. It's the -- usually unspoken -- assumptions made when determining what the rational strategy should be.

I started down this road in my AI-Class G-T post here, but I think I can put it in better terms now. Given the Prisoner's Dilemma payouts in that post the presumption is that one should always play Defect because:
  • A. You risk doing serious time if the other player Defects and you don't;
  • B. You could get a reward if you catch the other player Cooperating.
This makes some sense in a one-shot game where you expect to never see the other player again. But if you are playing more than one round -- unless your opponent is Christ-on-the-Cross (and probably even for that first round as well) -- everyone is going to play Defect. This makes the total payout for both players worse than if they had always Cooperated.

Sure. Sure. Maybe you "won" the first round and are ahead by a big six points after the hundredth round at -98 to -104. Big Whoop...Pride goeth before the Fall...

So, why is Defect-Defect assumed to be the rational strategy? It's because each player is afraid that the other player is just as greedy as they believe themselves to be. Afraid and Greedy are strong terms for risk-averse and advantage-seeking, but there they are in plain daylight. Fear and Greed doth also lead to falling.

I think one can make the same argument for other canonical games:
  • Chicken: Really just P-D with worse outcomes;
  • Stag-Hare: The Hare player is afraid of being abandoned and selects the option which guarantees some self-advantage.
In all cases Cooperation leads to a better outcome for both players over time. In fact Christ-on-the-Cross might really be the best option all around.

So, why do we not Cooperate? My claim is that Fear and Greed are natural responses to evolving in an adverse environment with limited resources. Even single-celled organisms recoil from harmful substances and pursue the useful ones. Scale this up and over-amp it with competition and you get Defection as the rational response. If we had developed in a benign and plentiful environment we might have little need for risk-aversion and advantage-seeking. Perhaps then we would believe that the rational strategy is one which best benefits all the players.

I'm going to carry this even further and posit that all animal life on earth has developed four natural, one might even say knee-jerk, responses in order to survive:
  1. Fear -- Risk aversion;
  2. Greed -- Advantage maximization;
  3. Disgust -- Recoil, e.g., from excrement or dead bodies (probably better represented by its opposite, Desire, but I like to keep things negative whenever possible);
  4. Anger -- Blanking out fear and disgust in order to persevere.
These are what we commonly call emotions. Therefore the so-called rational game strategies are actually emotionally driven.

If only we lived in a world of bunnies and unicorns, eh?

Monday, February 11, 2013

Another Chinese Room

Searle's Chinese Room thought experiment posits that one could have a program which carries on a conversation in a language unknown to the program's executor, i.e., the thing -- or person -- executing the program has no idea what it is saying, but an external participant can believe that it is having a meaningful conversation. The program passes the Turing Test but doesn't actually have a mind of its own. Proper syntax masks semantic meaning. This is similar to Chalmers's Zombie hypothesis, and they may both use assumptions that beg the actual question of when and where "minds" exist...

But here I propose a slightly different experiment which could separate the men from the machines. I posit that the real issue of meaning in the Searle experiment appears when a new utterance is made; a relationship which has never been expressed in the given language but is nevertheless congruent with (so-called) reality. We can easily make nonsense sentences: "The green bunny was elected president of the atomic bomb senate." But it's harder to generate ones that are less poetic.

The Schip Box

Just to keep it simple let's suppose that we have three letters that take the form:
  • A = B * C
Where each letter stands for some physical quantity, e.g., A is Acceleration. We can make triplets like the following which are valid physical laws:
  • F = M * A (Force = Mass * Acceleration)
or
  • P = I * E (Power = Current * Voltage)
But we can also come up with things like:
  • M = P * I (Mass = Power * Current)
Which is apparently meaningless, or at least incorrect.

Then we build a box which takes each of these triplets and rings a bell if it is a valid relationship and buzzes otherwise. In order to distinguish the two, the box could do an exhaustive search of all knowledge (which I think is the way Google now recognizes pictures of cats). It could get fancier by doing a dimensional analysis of the terms to see if they make any sense beforehand.

Then the question is: How would this box recognize a completely new valid representation that is not found in the knowledge base? This would require understanding what the symbols actually mean in the world, and how they relate, as well as developing experiments to validate them.

Isn't this the crux of the syntactic/semantic mind-matter?


Sunday, June 24, 2012

Moody Robots

In one of the founding posts for this blog (Media Art Was the Booby Prize) I described a hierarchy of system behaviors with which to classify and direct my bricolage, see: Taxonomy. Since then I've been around and around on the difference between Responsive and Interactive, and think I may have a hammer to apply to it...

Responsive, not to put too fine a point on it, responds to inputs. My doorbell example is a bit flippant (it's what you've come to expect, yah?), but we might also think of amoebas that shy away from noxious chemicals, and most-all Kinect-based Video Art that I've seen.

Interactive changes its response over time. Interactive systems have internal state that is influenced by external events, and, with luck, those external events are in turn influenced by the system's responses.

Adaptive is further up the tree. It remembers changes it has made. Ideally changes which somehow improve response. But that's a little beyond me in this current iteration.

A simple way to make an Interactive system is to make it moody. In the case of We Are Experiencing Some Turbulence this could be a child's sliding scale of:
  1. Asleep,
  2. Bored,
  3. Interested,
  4. Playful,
  5. Excited,
  6. Tantrum,
  7. Shutdown.
Which is interesting because the expressed states form a loop from Shutdown back to Asleep. So it could be implemented with a simple wrap-around counter that is incremented and decremented based on how intense its inputs are and how long they last. And, with no input, it can slowly loop through all its behaviors so it's not just sitting around waiting to be stimulated.

Because things always go better with illustrations, here's one:


The Mood is a function of the Input over Time, and the Output is a function of the Input and the Mood. Inputs may have paradoxical effects when combined with extreme Moods, e.g., high-intensity Input during a Tantrum could force a Shutdown of the Output, or low-intensity Input in a Bored state might briefly appear to be Excited.

Now to get to work...

Friday, May 18, 2012

Confusion Theory

On Wednesday I went to the SFI public lecture by James Gleick (ne Chaos and now The Information). Most amazingly, he dispensed with the PowerPlonk and actually did a lecture from notes. (The night before, I attended our regular VFD medical training. I got there early because the guy who is supposed to set up all the media crap whined about me hogging the station's notebook computer to do real work and demanded that I deliver it to the training site early. There was this very strong-handshake kind of older gentleman standing around wearing a shirt from one of our sister-districts so I introduced myself just to be friendly. He said something like, "I guess there will be a PowerPoint presentation and all that." And I said, "It's pretty much required these days isn't it?" Turns out he was our presenter -- a retired Army flight surgeon -- and, yes he had a PP of gory field-surgery photos ready to go). Less amazingly he (Gleick) spent the first 15 of his 30 minutes talking around Shannon Information Theory without actually coming out and admitting that Shannon Information is NOT what every layman in the world thinks it is: It has nothing to do with Meaning (see my attempted simplification here). He finally made a few passes at separating Information from Meaning but I felt that the border was rather porous through the remainder of his talk.

While trying to formulate a post-question, it occurred to me that they (Information and Meaning) are orthogonal measures in much the same way as Entropy and Complexity are in the classic Crutchfield, Young (1989) paper:
Since Information is just how many bits you have to play with and is measured as entropy, let's call the X-axis Information Entropy (which it actually is in the context of this paper). Then let's call the Y-axis -- hmm, not exactly Meaning... I haven't heard a name for this quantity bandied about, so something similar -- Data. By Data I "mean" self-correlation and/or perhaps mutual information among otherwise random bits of Information -- or maybe, Facts. If you have a noisy Information stream you might be able to extract some actual Data from it, e.g., get a series of temperatures from a bunch of ice core compositions. And to beat the analogy a little harder, you don't get much Data from the entropy extremes. If it's low, the Information is a constant, and if it's high, it's completely random.

But our Data doesn't really mean anything until it gets combined with other facts extracted from other streams and related back to the real world. So Meaning is yet a third axis to consider. That axis is Semiotics, which is exactly the study of how symbols take on meaning.

Unfortunately my question window closed long before I could articulate this.

But in the course of re-thinking it, another thing occurred to me. The lecture was titled "How We Come to Be Deluged by Tweets". Twitter is a perfect example of increasing Information Entropy on the web. So, in "fact", using Shannon Information to describe the contents of the internet may not be so far off base.

Monday, February 13, 2012

AI Class RIP -- My Final Review

By popular demand (i.e., my friend Jen) here's my final comment on Stanford's online Introduction to Artificial Intelligence class. It appears that things went so well for the professors that one (Sebastian Thrun) has resigned his tenured position in order to found an online "university": http://www.udacity.com/. For various reasons I will not be participating.

After it was all over, but before the fat lady sang the grades, this question was posted on the forum site:
Did the top students find the questions ambiguous?

This forum keeps talking about the ambiguous questions. Perhaps we should be asking “ambiguous to whom?” My hunch is that the top students didn’t find the questions ambiguous, although I could be wrong. To test my hypothesis, perhaps Irvin and the Profs could give some data to work with, and we could (or they could) test this statistically.

Define a “top student” as a student who achieved say 90% or more on the mid-term (or mid-term + homeworks perhaps). Then look at how many got the “ambiguous” questions correct before the alternate solutions were accepted, vs. the non-top-students. If my hunch is right, it will be the non-top-students who found the questions ambiguous, which tells its own story.

It will also mean that the top-students’ rank overall won’t change too much from the introduction of the alternate solutions, as it will lift all those who got say 80% or less to say 85%. The ranking of students up above 90% won’t change a great deal.

Just a hunch.
Well. I managed to eke out a 96.5% (thanks to a reversal of answer fortune on one of the so-called ambiguous final questions where "they" gave in and accepted both answers as correct). This only put me in the top 25% of the class(!!?). Tough crowd. Here is my reply to the above. (It attracted not a single discussion comment, nor have I seen it displayed in the blurbs for subsequent classes):
I too (believe that I) am in the 90%. I found many ambiguities in the lectures, homework, and exams. And I found each one very annoying because I felt I was being graded on how well I could guess what the professors really meant (why I cared about a grade is still a bit beyond me...). My general feeling is that grad students should have taught this course because they are closer to the assumptions being made; senior researchers are bound to "forget" that, e.g., their definitions of Stochastic or Partially Observable are built up over many years of experience. From the beginning certain things were not clearly defined, viz., "Rationality", which was only mentioned in the summary of the first video lesson.

My biggest problem with this course was the inability to get answers to questions from definitive sources. First there was no real forum to ask them, and then when aiqus came on line, almost every question went unanswered because it related to open homework or exam questions. I finally realized that there is no point in taking a class where I cannot ask questions and am unlikely to take another online course because of that.

As an example, I got on the wrong track with Particle Filters early on and it was only by dint of repeated reading and listening that I stumbled on the right one. And the only way I knew that I had finally gotten close was that I got the exam questions right. A TA in a study group would have been able to correct my tangent in about 5 seconds...

The chief thing I learned in the class was that I am not very interested in Artificial Intelligence. I discovered that there are sister fields, Computational Intelligence for one, that are more in line with my interests but each has an academic rivalry that keeps them distinct. For instance, there were a couple references to RA Brooks early robotics work (in the book -- I forget if it came up in the videos), especially the 1991 paper "Intelligence Without Representation", which were somewhat dismissive -- because it seems that our professors are still more aligned with the symbolic AI school of thought.

Perhaps if this class had been a survey of the broader field rather than a (kind of) detailed study of specific techniques I would have found it less frustrating. One of my friends who was auditing the class said: There's a bunch of "Oh, that's how they do that" moments but it's missing the big picture.

Wednesday, November 30, 2011

AI Class 6 -- Game Theory

The Prisoner's Dilemma

Here's the scam: Alice and Bob are arrested and separately offered a plea bargain for testifying against the other. The payoff to each of them is different depending on what the other person does (we call refusing to testify Cooperation and ratting the other out Defection):
  • If both Cooperate, they both get off scot-free;
  • If both Defect, they split a small penalty;
  • If one Defects and other Cooperates,
        the Defector gets a small reward,
        the Cooperator gets jail time.

This is encoded in the Prisoner's Dilemma payoff matrix:
                    Alice:Defect  Alice:Cooperate
     Bob:Defect     A=-1, B=-1    A=-5, B=1
     Bob:Cooperate  A=1,  B=-5    A=0,  B=0

(Note that this is not a zero-sum game because the payoffs don't add up to zero...but I think that's a different story.)

There are three standard Strategy types in Game Theory:

Dominant  A move that does better than any other, no matter what the other player does. In this case it is Defect because, if Alice Defects she will get -1 (versus -5 for Cooperating) if Bob also Defects, or 1 (versus 0) if he Cooperates.

Equilibrium  Neither player can benefit from making a unilateral switch to a different move. In this case the Equilibrium is Both Defect because either player will have a worse payoff if they change to Cooperate on their own. This is the Nash Equilibrium, named after John -- A Beautiful Mind -- Nash...

Pareto Optimum  An outcome from which neither player can be made better off without making the other worse off. Both Cooperate is the Pareto Optimum for Prisoner's Dilemma because they both get 0, while one would drop to -5 if the other changed to Defect.

So, in general, if Alice doesn't trust Bob and thinks she will never see him again, her best option is to Defect: Even though there is the possibility of getting a slap on the wrist, she doesn't risk getting thrown in the slammer. But if you play this game over and over with the same person, Defecting leads to a worse over-all outcome for both players than Cooperating. Therefore, if you trust your partner to not bail on you, you should both play the Pareto Optimum.

The problem (a talk on this topic is what inspired my GI's Dilemma kinetic sculpture) is this: If you know the number of plays you will be engaging in, it is tempting to Defect on the last play in order to get the reward and a slightly higher over-all payoff. Of course your opponent also knows this, so you need to Defect one play earlier to catch them out. This strategy cascades down through the plays and often makes it impossible to ever play the Pareto Optimum.

The strange thing about this is that it leads to a much worse over-all payoff for both players. And this is what is called Rational in AI...

So my question is, just what exactly is Rational? Is it covering your ass? Or is it getting the best outcome? I wonder if Rational robots would be able to see past the infinite-regression of cascading Mutually Assured Defection to a landscape where Optimal Cooperation was just assumed?

Monday, November 21, 2011

AI Class 6, Midterm Exam: !!100%!!

I guessed right on the Philosophy (both the questions in my previous post were True because in a completely random Environment any Agent behavior is considered Rational -- great to find that out during the Test, eh?), tortured the Logic to death in the correct way, and stumbled in the right direction through the Conditional Independence exercise. Hard to believe but apparently True: What's the Probability of that?

As an after-the-fact proof, here's the Exam and my notes with the answers I decided upon...

AI Class 6, Midterm Exam...ugh

Actually the Midterm is not so bad really...but maybe I should wait until the scores come out before saying that.

It does get off to a rough start with a couple of true/false philosophy of set theory questions:
  • "There exists (at least) one environment in which every agent is rational."
  • "For every agent, there exists (at least) one environment in which the agent is rational."
Which seem to be throwing folks into unbounded loops. It may be that they are actually Logic questions in Agent/Environment clothing. As I noted before, the word "rational" was only used in the summary of what we learned in lesson one, and the only way we would have learned its meaning was by inference. Added to this is the fact that the scope of all possible Agents and Environments is not defined anywhere that I can find: Does it include the empty set? If so, what would be considered "Rational" behavior? I guess we'll find out when the Exam is graded.

Otherwise the questions are pretty well specified and -- again modulo my jumping the gun -- don't require a lot of tedious calculation à la Professor Thrun's video pleasures. But let me just say now, "I hate Logic", so that when I fail all 6 sub-parts of Question 12 I can also say, "I told myself so". I tried to use the demo code's DPLLDemo.java to validate my mental gymnastics and got different answers based on the formatting of the questions. So maybe it's even beyond the abilities of the computer to solve -- or else I should start filing bug reports.

At the half-way mark we finally seem to be getting into interesting territory with Markov Models. This is what I tried to do at SFI those many years ago so maybe I'll come to understand what I was on about then. But in general I think I've discovered an unfortunate truth:

...I don't actually like Artificial Intelligence...

The trouble is, so far in this very basic class, AI is being used to find simple -- validated -- solutions to fairly complicated problems, generally using the excruciatingly tedious iterated algorithms for which computers were invented. But what I'm interested in is getting Complex results from Simple systems. And that is my working definition of Complex: it is not Complicated for that very Simple reason.

But I guess it's a good thing to torture myself with the Complicated for a while so I know what I'm not missing....

Monday, October 31, 2011

Naturally Artificial Intelligence

To illustrate the Abstraction issue I raised in Learning....Slowly, here are a couple of examples from my study of probability. To lay the groundwork, there are three basic operations OR, AND, and GIVEN and for the most part they are defined in terms of each other in a tight little tautology -- see my terminology summary here. Every time I tried to figure out what they _really_ did I ended up in some kind of sink-hole-loop. This was exacerbated by the only two fully worked problems in the AIMA textbook where the behavior of OR and AND distinctly diverged from the definitions.

So I tortured myself for about a week with: "What are they trying to tell me?" Then I took a couple of showers...

The first problem was OR. The definition summed up a set of values and subtracted the AND of those values, but the book example just summed a buncha things and was done with it. After the first shower I realized that the AND part was there to eliminate double-counting of certain values and that "they" had silently elided it because "they" had also silently elided the actual double count that would have been subtracted out. A little note to that effect would have saved me a week's worry....maybe.
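In other words -- using my own dice example here, not the book's -- the subtracted AND term exists purely to cancel the double count, and it vanishes whenever the events don't overlap:

```python
# Inclusion-exclusion for OR, with one die roll: P(even OR greater-than-3).
from fractions import Fraction

outcomes = range(1, 7)
even = {o for o in outcomes if o % 2 == 0}   # {2, 4, 6}
gt3  = {o for o in outcomes if o > 3}        # {4, 5, 6}

p = lambda s: Fraction(len(s), 6)

# The naive sum double-counts 4 and 6, which sit in both sets:
naive   = p(even) + p(gt3)                   # 6/6 -- too big
correct = p(even) + p(gt3) - p(even & gt3)   # 4/6

assert correct == p(even | gt3)              # matches a direct count
```

When the two events are disjoint the intersection is empty, the AND term is zero, and "just summing a buncha things" is in fact the whole story -- which is apparently what the book's example was silently doing.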

The second problem was AND. The definition shows a product of values, i.e., multiplying them all. The book example showed a sum... Well, WTF?! I went around and around on that and complained to anyone who showed any semblance of interest -- where such interest died fairly quickly with no positive results. During the second shower it occurred to me that I had only seen addition in one other place in this whole mess, and that was in calculating the Total Probability of a set of variables. Since probabilities were usually specified as Conditionals -- the probability that X is true GIVEN that Y is known to be true -- this involved multiplying a buncha values (one for each variable of interest) which were "conditioned" on Y being true, then multiplying a buncha different values, conditioned on Y being false, and then SUMMING the results... Eureka! That's what the (fkrs) were doing in the book: The values they were working with came from a table where all the multiplying had already been done, so all they had to do was add them up. Jeez, maybe just another little note would have been in order? Or maybe I wasn't supposed to be looking at it so closely?
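Here's the shower revelation in a dozen lines (the numbers are made up for illustration): multiply the conditionals down each branch, then SUM across the branches. The book's tables had already done all the multiplying, which is why their "AND" example looked like addition:

```python
# Total Probability: P(X) = P(X|Y)P(Y) + P(X|not Y)P(not Y)
from fractions import Fraction as F

p_y       = F(3, 10)             # P(Y = true)
p_x_given = {True:  F(9, 10),    # P(X | Y = true)
             False: F(2, 10)}    # P(X | Y = false)

# Branch products -- this is the AND / multiplication part:
branch_true  = p_x_given[True]  * p_y          # 27/100
branch_false = p_x_given[False] * (1 - p_y)    # 14/100

# ...and the sum across branches -- the addition the book showed,
# working from a table where the products were already baked in:
p_x = branch_true + branch_false
print(p_x)   # 41/100
```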

So...The point is, my (slightly) Intelligent Behavior was the result of hot water....no, no, it was the result of having a higher level view of the problem and seeing patterns that were not apparent in the details. This is what I'm trying to call Abstraction. Of course this "ability" is probably the result of billions-and-billions of mindless iterations in some very low level neural processing, just like looking at a map integrates a huge amount of visual information with a huge amount of "common sense" information to come up with a route. And this is what Krakauer was trying to get at in his talks: What we really want to call Intelligence is so far and above what our poor little machines are doing these days that the scales need to be re-calibrated all the way down.

Friday, October 28, 2011

AI Class 3, Learning...Slowly

Well. I survived last week's class and got 100% on the homework!! Part of this was due to a sudden realization that the demo code I had been so assiduously analyzing actually contained the skeleton of a system for answering three of the questions. And part was dumb luck, tempered with reason of course. The realization part happened after I had worked out the problems on my own, but I used the software to validate my answers -- which were, amazingly but truly, correct.

The original estimate for time to be spent on the class was a glib 1-10 hours a week -- it's not clear if that included watching all the video lessons which are at least 2 hours a shot -- and maybe some go-getter StanfooFrosh with all their god-given brain cells still intact could do it. Me? I'd say 50 hours last week trying to intuit the inner workings of Probability...

This week -- Machine Learning -- I got off easy. After only three days I believe I'm done. Or else I've missed something really important. Those days include time spent shuttling around finding working internet connections -- because my usually-fairly-almost reliable LCWireless coop took a big poop right after the videos were posted on Tuesday -- and summarizing the lessons for an online study group which meets Thursday evenings. In the course of the summary I discovered that I had developed a simplified method for working the hard parts of the homework (which I _might_ reveal after entries are closed next week). In keeping with standard practice only the first of the two lessons had any relevance to the homework. So now I'm living with the sneaking fear that the exams will cover the missing lessons.

As has been pointed out a number of times: Why do I care about my grade? I dunno. Knee jerk reaction to jerks I guess.


Moving on to the philosophy portion of our time here together....One thing I've noticed about the class so far is that it makes heavy use of exactly what computers are good at: Mindless Iteration.

First we had Search which is just opening doors and walking down hallways until you stumble upon that which you were seeking. Admittedly there are some shortcuts. And even some automated ways to discover the shortcuts. But it's really just wandering around in a big field without your glasses.

Then there was my bugaboo, Probability. This boils down to multiplying and adding big lists of small numbers. Over and over. It's something that Professor Sebastian seems to pride himself on being able to do, but god help me, that's why we have computers isn't it? Of course one does need to be able to set up the problem and understand the necessary transformations -- and the results, which are in many cases "not obvious" -- but that's Systems Analysis.

And this week, Machine Learning. Many of the problems presented make big use of Probability so it goes without saying that there's a lot of repeated number crunching. Moving on to Regression and Clustering, to para-quote: "Often there are no closed form solutions so you have to use iteration." All manner of try-try-again-until-you-succeed perseverationist algorithms are put to use. Gradient Descent is just bumbling-around-in-a-field search with a proviso that one always bumbles down hill. And we haven't even addressed getting trapped in local minima yet.
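To make the bumbling concrete, here's a toy Gradient Descent (my own example, not from the class) minimizing a one-dimensional bowl:

```python
# Gradient Descent as bumbling-downhill search: minimize f(x) = (x - 3)^2
# by repeatedly stepping against the slope.
def gradient_descent(x, lr=0.1, steps=100):
    for _ in range(steps):
        grad = 2 * (x - 3)   # derivative of (x - 3)^2
        x -= lr * grad       # always bumble downhill
    return x

x = gradient_descent(-10.0)
print(x)   # ends up essentially at the minimum, x = 3
```

On a nice convex bowl like this it works every time; on a lumpier landscape it settles into whatever dip it happens to bumble into, which is the local-minimum problem in a nutshell.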

So my question: Is this Intelligent behavior? In one respect, once a computer finds a way to do something we used to pride ourselves on, we always diss it by saying, "Well, that's not _really_ intelligent after all now is it?" But in another respect I think number-crunching may be going about it wrongly. In the map problem used to introduce different types of searching the question was how to get from Arad to Bucharest -- which is probably easier if you are in Romania. A human would look at the map, squint their eyes for a couple seconds, and then go, "Yah shure, we gotta go through Rimnicu." The computer however tries all the possibilities...in the "less intelligent" versions it even goes the wrong direction, just to, you know, see...and then finally pretends that it has discovered a route.

What the computer does is wander around in the field until it trips on the solution, but what the human does is some kind of integration and abstraction of the data. I think this ability to Abstract is at the core of intelligence. We may get to some bits of that in this class but it's gonna be some rough iterations.
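For comparison, here's roughly what the computer's wandering looks like: a Uniform Cost Search over a scrap of the AIMA Romania map. The distances are from my memory of the book, so treat them as illustrative:

```python
import heapq

# A fragment of the Romania road map (distances assumed, from memory).
GRAPH = {
    "Arad":      {"Sibiu": 140, "Timisoara": 118, "Zerind": 75},
    "Sibiu":     {"Arad": 140, "Fagaras": 99, "Rimnicu": 80},
    "Rimnicu":   {"Sibiu": 80, "Pitesti": 97},
    "Pitesti":   {"Rimnicu": 97, "Bucharest": 101},
    "Fagaras":   {"Sibiu": 99, "Bucharest": 211},
    "Timisoara": {"Arad": 118},
    "Zerind":    {"Arad": 75},
    "Bucharest": {"Pitesti": 101, "Fagaras": 211},
}

def uniform_cost(start, goal):
    """Expand the cheapest frontier node first until the goal pops."""
    frontier = [(0, start, [start])]
    seen = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, d in GRAPH[node].items():
            heapq.heappush(frontier, (cost + d, nbr, path + [nbr]))
    return None

print(uniform_cost("Arad", "Bucharest"))
# -> (418, ['Arad', 'Sibiu', 'Rimnicu', 'Pitesti', 'Bucharest'])
```

Note that before it commits to Rimnicu it dutifully pokes at Zerind and Timisoara -- the wrong direction entirely -- just to, you know, see.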

Wednesday, October 12, 2011

AI Class III, Homework I

Well, we finally got it, and it's not so scary, just a little vague... They posted a set of seven short videos, each posing a question with a multiple-choice or small numeric answer. Unfortunately there are some ambiguities about the constraints and exact definitions in some of the problems. There are a couple useful discussion threads on the Reddit AICLASS site which are wrangling about the specifics:

I have distilled and posted the Week 1 Homework questions and options, along with comments about the ambiguities encountered (mine and others from the reddit threads). Stay tuned for answers next week...

To get to the bottom of it we're really gonna just have to wait until we get clarification from on high. We hope we do anyway. As someone in those threads pointed out, this is the kind of stuff one would ask the TA or Prof if one were having a two-way, class-like experience.

<IMHO mode="I could be wrong about this">
One thing that has come up in general is how to deal with the basic Environment class definitions:

  • Fully vs Partially Observable
  • Deterministic vs Stochastic
  • Discrete vs Continuous
  • Benign vs Adversarial           
All those definitions tend to be fairly good black and white approximations but have little gray areas. Folks seem to be getting hung up in the gray.

For Instance: one homework question asks if coin flips are Partially Observable and if they are Stochastic or not. There seems to be some confusion about the scope of Observability, e.g., if you don't know the future is the system Fully Observable? Or from a different tack, "If you don't know how your Adversary is going to respond, is it Partially Observable or even Stochastic?"

Because I think that being Fully Observable covers just the current system state and doesn't preclude being uncertain about future states, in this context I would say: "Do we know the entire result after each action is performed, or is there still ambiguity in the current state of the system?"

There's also confusion about Discrete vs Continuous. The questions are more philosophical than practical, such as "Can one even have a Continuous representation of a system?" or "Since the result of a coin flip is dependent on exactly how it is flipped, isn't that Continuous?" I say, lighten up a bit...If you've got something that can take any real-number value, it's Continuous. But if it can only take a sub-set of values, it's Discrete. So the result of a coin flip is ??? -- maybe I'll answer next week, eh?

And there was a funny mis-apprehension on the Unit-1 quiz that asked if a robot car driving in a "real" environment was Adversarial. The given answer was No -- admittedly with a little joshing around. I think this is because the instructors live in Palo Alto and only have to deal with Volvo-Soccer-Moms' passive-aggressive driving, rather than in New Mexico where every drunk wants to be in your lane.
</IMHO>

Tuesday, October 11, 2011

AI Class II

Ok then. They got the Search lessons up. And are promising to post a homework assignment by about 4 hours ago... Also the quiz-post-refusal thing seems to have been a server loading problem and I didn't have any trouble posting answers today. So still a bit behind the curve here, but moving in the right direction.

There are some slips, probably mostly on my part. Like the quiz question about whether a Depth First Search is guaranteed to find a goal and be complete. I forgot to remember that the lecturer mentioned that we were dealing with an infinite depth search tree for this particular incident. So, more minus-quiz points for me. Gotta hang onto every word apparently.

<Edit mode="stew">
Overnight I realized that there were two (my count) examples of lapses in pedagogical technique in the first week's videos -- three if you count not defining Rationality but then including it in the summary slide for Unit 1.

First is the Depth First Search question above. I replayed the lesson -- unit 2.20 -- and he does say  "...lets move .... to infinite trees..." 20 seconds before completion of the quiz question presentation. So I should have remembered it. But, if he had repeated the infinite tree condition at the end of the question I might have caught on to what he was getting at.

The second was in unit 2.31. He describes the simple "Vacuum World" environment and calculates the number of states in a two position system by writing 2 x 2 x 2 = 8. This is the correct number but not the right calculation, and -- my excuse for failing the quiz at the end of the next unit -- when the system is scaled up with more positions one needs to use the right calculation, which is: 2 x 2^2 (notice that 2 is one of only two values where the exponent happens to equal the straight multiplication -- maybe three, if zero^zero counts as a number -- which is exactly what hides the error). The point is that each of the N positions can independently be either clean or dirty, so the number of dirt configurations is 2^N: the count grows as X^N, not X*N. I merrily went along with the multiplication paradigm when it came to scaling up to 10 positions and multiplied 2 times 10 instead of raising 2 to the 10th power. Again I might have caught on, and had a better understanding of the issue, if it had been treated more rigorously in the introductory case.
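For the record, here's my reconstruction of the counting (take the full-state formula as my reading of the lecture, not gospel):

```python
# Vacuum World state counting: each of N positions is clean or dirty,
# so there are 2**N dirt configurations -- the part that must be an
# exponent, not a product. If the agent's location counts too,
# multiply by N (which is where the lecture's 2 x 2 x 2 = 8 comes from).
def dirt_configs(n):
    return 2 ** n

def full_states(n):
    return n * 2 ** n   # agent position x dirt configurations

print(dirt_configs(2), full_states(2))    # 4 8   (lecture: 2 x 2 x 2 = 8)
print(dirt_configs(10), full_states(10))  # 1024 10240 -- nothing like 2 x 10
```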
</Edit>

In a different example they present the idea that you can use an estimated-cost-to-goal value to guide a search in fruitful directions. This is called an "Heuristic". However they never defined the word but just started using it in the middle of describing some algorithms. Lucky me, I'd already read the Book so I knew. Just like the "Rationality" thing in Unit 1...

Have an online study group meeting tomorrow (Weds) night in which we are supposed to discuss homework confusions (among others). So I hope we get the homework in time...

Monday, October 10, 2011

Artificial Intelligence, Class I

This is gonna be -- might be -- painful...

The Stanford AI class started today with the posting of a few introductory video instruction units. Most of these "Unit 1" vids were a camera on a writing pad making a few notes with a voice-over, and each concluded with a little "quiz" implemented as a javascript overlay on the video. On the plus side the videos are edited so there's not a lot of hemming and hawing (compared to the Khan Academy math lectures which are information packed but drag along as the presenter erases and re-writes his mistakes). On the minus side:
  1. The first set of quizzes were setup so as to lead into the next lesson and had nothing to do with what was covered in the actual video;
  2. The quiz answer system balked at about 2/3 to 3/4 of my responses and just refused to post them;
  3. The final set of videos and quizzes were concerned with an attempt to translate a Chinese Menu. If one already knows the ideograms one could tell them apart in the low rez video, but as an added insult the little quiz boxes obscured parts of the elements one was supposed to recognize and check off:


So I'm batting 62% on Chinese translation. Fortunately the inline quizzes don't count toward your grade. I just hope the real questions are not so well obscured.

Of a little more concern to me is:
  • First -- The videos ended with these Introduction to AI bits that were mostly information-free, even though the schedule for the online class says day one also covers "Search" and has a homework assignment. In contrast the schedule for the real class has three days of lectures and a programming assignment(!?)
  • Second -- The last Summary video listed the things that were covered. The last on the list was "Rationality" which was not mentioned in any of the lessons -- that I remember seeing. It is a key concept in their approach and is covered in some depth in Chapter 2 of the text book, where there are some probing exercise questions based on the definition.
So there's some slips between cup and lip in getting this thing off the ground...We'll stay tuna-ed.

Monday, October 3, 2011

Artificial Intelligence -- back to

Now that I seem to be in recovery from yesterday's post -- and past the 48-hour brain damage danger zone -- let's get back to pursuing more edifying subjects.

I signed up to take the Artificial Intelligence class being offered online (as an experiment in monetizing the extended educational experience) by Stanford University. Anyone can join -- at least until it starts next week -- and over 130 THOUSAND folks have. So it oughta be interesting.

To try to get a flavor of what I've gotten myself into I started reading the book and looking through the demo code. I'm posting my notes for all to wonder at. The book is pretty well written but the questions at the end of each chapter seem to be from the Advanced, not the Introductory, class, as they refer to topics that are only barely mentioned in the text. Given that I dropped out of more CS classes than I completed, 35 years ago, I'm having trouble grokking the required level of "proof" and "show that" requested. Hopefully the video lectures and actual homework assignments will be a bit more illuminating.

Looking at the code takes me right back to the days of trying to understand the work of my professional peers with advanced degrees. I posit that the sets ComputerScientist and SoftwareEngineer are Almost Disjoint. Therefore what looks like a really swell algorithm in a text book may need a bit of patching for the real world. I try to address some of these "issues" in my notes on specific code blocks. My natural tendency is to re-write everything I come across -- Hi Brian -- so I have to be careful. And bite my tongue.

In any case I can always drop down a notch to the Basic level which has no feedback requirements and just go along for the ride. But for now I hope to keep posting what may be useful information...don't touch that dial.

Monday, September 5, 2011

¿Artificial? Intelligence

Last week my friend David Krakauer presented three lectures on Intelligence -- c.f. Cognitive Ubiquity -- in the SFI Ulam Lecture series. I thought the slides were online someplace but I can't find them; however the videos should be posted at santafe.edu sometime soon. He made some good, and some arguable, points and was quite entertaining in the process.

One of the good points is that what we call intelligence, if we can even define it, goes much deeper than the human cortex. He showed a video clip of a white blood cell "chasing" a bacterium through a forest of red cells where the white cell appeared to be behaving quite smartly in its search-and-destroy mission. He then made the point that the low level components of computerized artificial intelligence have none of the characteristics of that "simple" white cell, e.g.: NAND gates don't adapt.

I think this is not an apt comparison. Where transistors are atoms, NAND gates are more comparable to simple molecules. Large Scale Integrated circuits -- memory chips and the like -- might measure up to the capabilities of a complex organic molecule, and micro-controllers could be compared to one or two neurons. To support my claim I present you with three series-connected neurons: Each neuron might (conservatively) have 1000 synapses which gives the whole system one-billion possible binary states. Show me a microchip that does that. Then realize that there are about 10^11 neurons in the human brain and another (hand-waving-estimate) 10^10 elsewhere in the body.

This is the scale of the problem we have.

But Wait! There's More!

Getting back to the hand-waving-estimate thing... A year or so ago I tried, unsuccessfully, to estimate the Shannon Information content of our nervous system in order to have a reasonable retort when folks asked me why my robots behaved so stupidly. I was not successful because I found it almost impossible to get good estimates of three -- to me -- important values:
  1. The number of Sensor Inputs;
  2. The number of Motor Outputs;
  3. The resolution of a "Synaptic Connection".
I did dig up swagish values for the number of Inputs, and finally settled on the number of muscles as a stand-in for the Output count. But I could not get anyone to hazard a guess at #3 -- no one seems to know how much you can vary a synaptic connection weight: the putative mechanism for learning and adaption. Everywhere I asked I got some run-around about how it doesn't really work that way or other long-circuit "I don't know". As a geek this was surprising because some of the first things one wants to know about a computer program are how much input and output and what resolution, accuracy, and speed is required.

Anyway, I put together a cheat sheet of what I found: here. And just so you don't have to follow -- and make sense of -- that link, here's the chase:

    Input:          10^8 eye sensors; 10^7 touch, hearing, taste, and smell
      Sight:         5*10^6 cones + 1.3*10^8 rods ~= 1.4*10^8 sensors
      Touch:        (swag) 3*10^6 sensors
      Hearing:    8.8*10^2 sensor neurons
      Taste:        (swag) 1*10^6 sensors
      Smell:        1.2*10^6 sensors
    Output:       Estimate, 300-700 muscles in a human body

I also guessed at 8 bits -- for convenience -- of synaptic weight, and put the neural firing rate at 50 per second, with each synapse doing a scale and each neuron doing a sum operation. That gave me, for the brain only:

  • 7*10^14 bytes or 5.6 petaBit of state
  • 3.5*10^16 operations or 35 petaFlop/second of calculation
-- This is the scale of the problem we have --
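For anyone who wants to check the arithmetic behind those two bullets, here it is. Note the totals imply roughly 7,000 synapses per neuron -- a common textbook swag, and my inference rather than a number stated above:

```python
# Back-of-the-envelope brain arithmetic (all swags):
neurons     = 1e11   # neurons in the brain
synapses    = 7e3    # synapses per neuron (inferred from the totals)
weight_bits = 8      # assumed resolution of one synaptic weight
rate        = 50     # firings per second

state_bytes    = neurons * synapses * weight_bits / 8   # 7e14 bytes
state_petabits = state_bytes * 8 / 1e15                 # 5.6 petabits
ops_per_second = neurons * synapses * rate              # 3.5e16 ops/s

print(state_bytes, state_petabits, ops_per_second)
```

Change the swag for synaptic resolution (item #3 above, the one nobody would guess at) and the state estimate scales right along with it, which is exactly why that number matters.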

It is also interesting to note that the number of touch sensors is in the same order of magnitude as the number of cones in the eye. Until now, much of the interest in neural signal processing has been in the visual cortex. But the motor cortex may have as many inputs and probably many more outputs. The visual system is pretty good at linear algebra, but the motor system solves simultaneous differential equations each time you toss a wad of paper at the trash can. So literally putting a robot out in the field may be a very fruitful line of research after all.