Wednesday, November 30, 2011

AI Class 6 -- Game Theory

The Prisoner's Dilemma

Here's the scam: Alice and Bob are arrested and separately offered a plea bargain for testifying against the other. The payoff to each of them is different depending on what the other person does (we call refusing to testify Cooperation and ratting the other out Defection):
  • If both Cooperate, they both get off Scot Free;
  • If both Defect, they split a small penalty;
  • If one Defects and other Cooperates,
        the Defector get's a small reward,
        the Cooperator gets jail time.

This is encoded in the Prisoner's Dilemma payoff matrix:
                    Alice:Defect  Alice:Cooperate
     Bob:Defect     A=-1, B=-1    A=-5, B=1
     Bob:Cooperate  A=1,  B=-5    A=0,  B=0

(Note that this is not a zero-sum game because the payoffs don't add up to zero...but I think that's a different story.)

There are three standard Strategy types in Game Theory:

Dominant  A move that does better than any other, no matter what the other player does. In this case it is Defect because, if Alice Defects she will get -1 (versus -5 for Cooperating) if Bob also Defects, or 1 (versus 0) if he Cooperates.

Equilibrium  Neither player can benefit from making a unilateral switch to a different move. In this case the Equilibrium is Both Defect because, either player will have a worse payoff if they change to Cooperate on their own. This is the Nash Equilibrium, named after John -- A Beautiful Mind -- Crowe...

Pareto Optimum  Both players agree that they are getting the best payoff they can. If either player gets a worse payoff by changing moves it is not an Optimum. Both Cooperate is the Pareto Optimum for Prisoner's Dilemma because they both get 0, but one will get -5 if the other changes to Defect.

So, in general, if Alice doesn't trust Bob and thinks she will never see him again, her best option is to Defect: Even though there is the possibility of getting a slap on the wrist, she doesn't risk getting thrown in the slammer. But if you play this game over and over with the same person, Defecting leads to a worse over-all outcome for both players than Cooperating. Therefore, if you trust your partner to not bail on you, you should both play the Pareto Optimum.

The problem (a talk on this topic is what inspired my GI's Dilemma kinetic sculpture) is this: If you know the number of plays you will be engaging in, it is tempting to Defect on the last play in order to get the reward and a slightly higher over-all payoff. Of course your opponent also knows this, so you need to Defect one play earlier to catch them out. This strategy cascades down through the plays and often makes it impossible to ever play the Pareto Optimum.

The strange thing about this is that it leads to a much worse over-all payoff for both players. And this is what is called Rational in AI...

So my question is, just what exactly is Rational? Is it covering your ass? Or is it getting the best outcome? I wonder if Rational robots would be able to see past the infinite-regression of cascading Mutually Assured Defection to a landscape where Optimal Cooperation was just assumed?

No comments:

Post a Comment