CS905N Lect 1 notes

Game Theory

Game Theory can be regarded as a multi-agent decision problem: there are many people contending for limited rewards/payoffs. They have to make certain moves on which their payoffs depend, and they must follow certain rules while making these moves. Each player is supposed to behave rationally.

Rationality: In the language of Game Theory, rationality means that each player tries to maximize his/her own payoff, irrespective of what other players are doing.
In essence, each player has to decide on a set of moves that are in accordance with the rules of the game and that maximize his/her reward.

Game Theory can be classified into two branches:

Non co-operative game theory: In this case the players act independently, without assuming anything about what other players are doing.
Co-operative game theory: Here players may co-operate with one another.

Game Theory has found applications in Economics, Evolutionary Biology, Sociology, Political Science etc., and it is now finding applications in Computer Science.

What is a game?

A game has the following components:

Set of players P = { P_i | 1 <= i <= n }
Set of rules R
Set of strategies S_i for each player P_i
Set of outcomes O
Payoff u_i(o) for each player i and for each outcome o ∈ O

Example 1{Coin Matching Game}

Coin Matching Game: Two players independently choose either Head or Tail and report it to a central authority. If both choose the same side of the coin, player 1 wins; otherwise player 2 wins.

The components of the Coin Matching Game are:

1. Set of players P
The two players who choose either Head or Tail in the Coin Matching Game form the set of players, i.e. P = {P1, P2}.

2. Set of rules R
There are certain rules that each player has to follow while playing the game, and each player can safely assume that the others are following these rules. In the Coin Matching Game each player can choose either Head or Tail. He has to act independently and make his selection only once. Player 1 wins if both selections are the same; otherwise player 2 wins. These form the rule set R for the Coin Matching Game.

3. Set of strategies S_i for each player P_i
In Matching Coins, S₁ = {H, T} and S₂ = {H, T} are the strategy sets of the two players, which means each of them can choose either Head or Tail.

4. Set of outcomes O
In Matching Coins the outcomes are {Loss, Win} for both players.

    The outcome is a function of the strategy profile selected, i.e. of the pair of strategies actually played.
    In our example the set of all strategy profiles is S₁ x S₂ = {(H,H), (H,T), (T,H), (T,T)}.
    Clearly the first and last profiles are wins for the first player, while the middle two are wins for the second player.

5. Payoff u_i(o) for each player i and for each outcome o ∈ O
This is the amount of benefit a player derives if a particular outcome happens. In general it is different for different players.
Let the payoffs in Coin Matching Game be,

u₁(Win) = 100
u₁(Loss) = 0

u₂(Win) = 100
u₂(Loss) = 0
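
The mapping from strategy profiles to outcomes and payoffs can be sketched in a few lines of Python. This is a minimal illustration rather than part of the lecture: the names `S1`, `S2`, `u1`, `u2` and `outcomes` are hypothetical, and the outcome is recorded separately for each player because the notes describe O = {Loss, Win} from each player's point of view.

```python
from itertools import product

# Strategy sets of the two players (from the notes).
S1 = ["H", "T"]
S2 = ["H", "T"]

# First payoff table from the notes.
u1 = {"Win": 100, "Loss": 0}
u2 = {"Win": 100, "Loss": 0}


def outcomes(s1, s2):
    """Return (outcome for player 1, outcome for player 2) for one profile.

    Rule R: player 1 wins if both choices match, otherwise player 2 wins.
    """
    if s1 == s2:
        return ("Win", "Loss")
    return ("Loss", "Win")


# S1 x S2: all strategy profiles, in the order (H,H), (H,T), (T,H), (T,T).
profiles = list(product(S1, S2))

for s1, s2 in profiles:
    o1, o2 = outcomes(s1, s2)
    print((s1, s2), o1, o2, u1[o1], u2[o2])
```

Running it prints one line per profile: player 1 receives 100 on (H,H) and (T,T), and player 2 receives 100 on the other two.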

Both players would like to maximize their payoffs (rationality), so both will try to win. Now let's consider a slightly different case, in which we redefine the payoffs.
Player 1 is competitive, so

u₁(Win) = 100
u₁(Loss) = 0

Player 2, however, is very concerned about seeing player 1 happy (player 1 is his little brother), so for him

u₂(Win) = 10
u₂(Loss) = 100

In this situation only player 1 would try hard to win, while player 2 will try to lose. The point to note is that each player tries to maximize his payoff, so he/she wants the outcome that gives him/her the maximum payoff.
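
Enumerating the same profiles with the little-brother payoffs shows why the two players no longer conflict. This is again a hypothetical sketch: `payoffs` maps each strategy profile to the pair (payoff of player 1, payoff of player 2).

```python
from itertools import product

# "Little brother" payoffs from the notes.
u1 = {"Win": 100, "Loss": 0}
u2 = {"Win": 10, "Loss": 100}


def outcomes(s1, s2):
    # Player 1 wins when the choices match; otherwise player 2 wins.
    return ("Win", "Loss") if s1 == s2 else ("Loss", "Win")


payoffs = {}
for s1, s2 in product("HT", repeat=2):
    o1, o2 = outcomes(s1, s2)
    payoffs[(s1, s2)] = (u1[o1], u2[o2])

print(payoffs)
# {('H', 'H'): (100, 100), ('H', 'T'): (0, 10), ('T', 'H'): (0, 10), ('T', 'T'): (100, 100)}
```

Both players get their highest payoff on the matching profiles (H,H) and (T,T), so player 2 is happy to help player 1 win.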

    Informally, we can say the players sit across a table and play the game according to the set of rules. When the game ends there is an outcome for each player, and each player derives a payoff from this outcome. For example, victory brings cricket players a payoff in terms of awards and fame, while a loss means no payoff. Because all the players are rational beings, they will try to maximize their payoffs. In non co-operative games players don't know what other players are doing, so they have to make their moves without looking at what others are doing.
    Each player chooses a strategy, i.e. the set of moves he would play.
Strategy
    It is the set of moves that a player would play in a game. Being rational, a player would choose the strategy in such a way as to maximize his/her payoff.

Zero Sum Game: In a zero sum game the sum of the payoffs of all the players is zero for each outcome of the game. This means that if one player is able to improve his payoff by using some good strategy, the payoff of the others is going to decrease.

Example 2{Tic-Tac-Toe}

In the Tic-Tac-Toe game there are two players, x and o. The outcomes are O = {x wins, o wins, draw}.

            x wins   o wins   draw
u_x              1       -1      0
u_o             -1        1      0
----------------------------------
Sum              0        0      0

Constant Sum Game: In a constant sum game the sum of the payoffs of all the players is the same constant for each outcome of the game.
Zero-sum games are true games of conflict. Any gain on a player's side comes at the expense of his opponents. Think of dividing up a pie: the size of the pie doesn't change, and it's all about redistributing the pieces between the players.

Example 3{Chess}

Consider the game of Chess. There are two players, one playing with the White pieces and one playing with the Black pieces. There are three possible outcomes: O = { Black Wins, White Wins, Draw }.

Let's define the payoffs as

            Black Wins   White Wins   Draw
u_b                  1            0      ½
u_w                  0            1      ½
Sum                  1            1      1

This is a constant sum game: the payoffs sum to 1 for every outcome. If White increases his payoff by a win, the payoff of Black goes down, and vice versa.

The above two types are called strictly competitive games: a win for one player is a loss for the other.

Non Zero Sum Game

Example 4{Cricket with advertisers}

There are three players in a Cricket match:

1. India
2. Pakistan
3. Advertisers

There are three outcomes: O = { India Wins, Pakistan Wins, There is a Tie } = {I, P, T}

If we define the payoffs as

            I     P     T
u_A        10    10   100
u_I        10     0     5
u_P         0    10     5

The payoff for advertisers remains the same whoever wins, but in case of a thrilling tie lots of people will watch the last few overs, increasing the advertisers' payoff manyfold.

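Here the payoffs sum to 20 if India wins, 20 if Pakistan wins and 110 if there is a tie. The sum is neither zero nor constant, so we have a Non Zero Sum Game.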
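
The three kinds of game seen so far can be told apart mechanically by checking the sum of payoffs for every outcome. The sketch below is hypothetical (the function name `classify` and the tuple layout are illustrative choices); it uses `fractions.Fraction` for the ½ in the Chess table so that the comparison is exact.

```python
from fractions import Fraction


def classify(payoffs):
    """Classify a game by the sum of payoffs over all players.

    payoffs maps each outcome to a tuple holding one payoff per player.
    A zero sum game is also constant sum (with constant 0); it is reported
    as "zero sum" here because that is the more specific label.
    """
    sums = {sum(p) for p in payoffs.values()}
    if len(sums) > 1:
        return "non-zero sum"
    return "zero sum" if sums == {0} else "constant sum"


half = Fraction(1, 2)  # exact 1/2, so the Draw row sums to exactly 1

tic_tac_toe = {"x wins": (1, -1), "o wins": (-1, 1), "draw": (0, 0)}        # (u_x, u_o)
chess = {"Black Wins": (1, 0), "White Wins": (0, 1), "Draw": (half, half)}  # (u_b, u_w)
cricket = {"I": (10, 10, 0), "P": (10, 0, 10), "T": (100, 5, 5)}            # (u_A, u_I, u_P)

for name, game in [("Tic-Tac-Toe", tic_tac_toe), ("Chess", chess), ("Cricket", cricket)]:
    print(name, classify(game))
```

This prints "zero sum" for Tic-Tac-Toe, "constant sum" for Chess and "non-zero sum" for the Cricket game.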

Example 5{Prisoner's Dilemma}

Two persons have committed a crime for which there is no evidence. The police catch them and put them in two separate cells. Because there is no evidence against the suspects, they cannot be proven guilty, so the police try to use one against the other. Each prisoner is given two options: either to confess to the crime or to deny it. If prisoner I confesses but prisoner II denies, then the first prisoner serves as a witness against the other and gets no punishment, while prisoner II gets the full term of 10 years, and vice versa. If both confess, both get 5 years of imprisonment each, as the police now have evidence against both of them. If both deny, the police have evidence against neither, so the maximum punishment they can get is 1 year each.

This can be represented in tabular form as follows (entries are years in prison):

I \ II      Confess   Deny
Confess         5,5   0,10
Deny           10,0    1,1

This is the standard representation of a 2-player game. Each cell has two numbers, one for each player: the first number in a cell is the penalty of player I and the second is the penalty of player II. Each row represents a strategy for player I and each column represents a strategy for player II. So the bottom-right cell means that if player I denies and player II denies, then the penalty for player I is 1 year and that of player II is also 1 year.

Now let's analyse the game from player I's perspective.

He doesn't know if player II is going to confess or deny, but he wants to decrease his punishment. So he considers two cases.

a) If player II confesses
In this case confessing gives 5 years imprisonment while denying gives 10 years
So it's better to confess

b) If player II denies
In this case confessing gives 0 years of imprisonment while denying gives 1 year
Again it's better to confess

So player I will prefer to confess, whatever player II does.

Player II will argue along similar lines and will also prefer to confess.
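
The case analysis above amounts to computing each prisoner's best response to each action of the other. A minimal sketch, assuming the penalty matrix from the notes; the names `PENALTY`, `best_response_I` and `best_response_II` are hypothetical.

```python
# Penalty matrix from the notes: (years for I, years for II) for each pair of actions.
PENALTY = {
    ("Confess", "Confess"): (5, 5),
    ("Confess", "Deny"): (0, 10),
    ("Deny", "Confess"): (10, 0),
    ("Deny", "Deny"): (1, 1),
}
ACTIONS = ["Confess", "Deny"]


def best_response_I(a2):
    """Action that minimises prisoner I's penalty when prisoner II plays a2."""
    return min(ACTIONS, key=lambda a1: PENALTY[(a1, a2)][0])


def best_response_II(a1):
    """Action that minimises prisoner II's penalty when prisoner I plays a1."""
    return min(ACTIONS, key=lambda a2: PENALTY[(a1, a2)][1])


for other in ACTIONS:
    print(f"Other prisoner plays {other}: "
          f"I should {best_response_I(other)}, II should {best_response_II(other)}")
```

Confess comes out as the best response in every case, for both prisoners.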

Let's now assume some numbers to illustrate this fact. Suppose player I assumes that player II will confess with probability 0.5. If player I also confesses with probability 0.5, each of the four cells occurs with probability 0.5 x 0.5, so his expected number of years in prison is
0.5 x 0.5 x ( 5 + 10 + 1 + 0 ) = 4 years.

Now suppose player I chooses Confess with probability 0.4 and Deny with probability 0.6, and still assumes that player II will confess with probability 0.5.

For player I the expected penalty is

  0.4 x 0.5 x 5     (I confesses, II confesses: I gets 5 years)
+ 0.6 x 0.5 x 10    (I denies, II confesses: I gets 10 years)
+ 0.4 x 0.5 x 0     (I confesses, II denies: I gets 0 years)
+ 0.6 x 0.5 x 1     (I denies, II denies: I gets 1 year)

= 1 + 3 + 0 + 0.3 = 4.3 years

We see that when he is less likely to confess (0.4 instead of 0.5), his expected penalty increases from 4 to 4.3 years.
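
Both calculations follow one pattern: weight each cell's penalty for player I by the probability of reaching that cell. A hypothetical helper `expected_penalty_I(q, p)` (q = probability that I confesses, p = probability that I believes II confesses) makes this explicit; exact fractions are used so the results come out as exactly 4 and 4.3.

```python
from fractions import Fraction

# Player I's penalty (years) in each cell of the matrix from the notes.
PENALTY_I = {
    ("Confess", "Confess"): 5,
    ("Confess", "Deny"): 0,
    ("Deny", "Confess"): 10,
    ("Deny", "Deny"): 1,
}


def expected_penalty_I(q, p):
    """Expected years for prisoner I.

    q: probability that I confesses; p: probability that I believes II confesses.
    The two choices are independent, so each cell's probability is a product.
    """
    prob_I = {"Confess": q, "Deny": 1 - q}
    prob_II = {"Confess": p, "Deny": 1 - p}
    return sum(prob_I[a1] * prob_II[a2] * years
               for (a1, a2), years in PENALTY_I.items())


half = Fraction(1, 2)
print(expected_penalty_I(half, half))            # 4
print(expected_penalty_I(Fraction(2, 5), half))  # 43/10
```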

Illustration

Now we assume

Player I confesses with probability q
Player I assumes that player II would confess with probability p

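For player I the expected penalty is

$$5pq + 0 \cdot q(1-p) + 10(1-q)p + 1 \cdot (1-q)(1-p)$$
$$= 5pq + 10p - 10pq + 1 - p - q + pq$$
$$= 1 + 9p - q(4p+1) \text{ years}$$

Since 4p + 1 > 0 for every p in [0, 1], this is a decreasing function of q. So the more likely player I is to confess, the less punishment he can expect, irrespective of what player II does.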
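
A quick numerical check of the simplification to 1 + 9p - q(4p + 1), and of the claim that the expected penalty falls as q rises. The sketch compares the direct four-term sum with the simplified form on an exact grid of probabilities; the function names are hypothetical.

```python
from fractions import Fraction


def expected_penalty_I(q, p):
    # Direct sum over the four cells of the matrix (years for prisoner I).
    return 5 * q * p + 0 * q * (1 - p) + 10 * (1 - q) * p + 1 * (1 - q) * (1 - p)


def closed_form(q, p):
    # Simplified form: 1 + 9p - q(4p + 1).
    return 1 + 9 * p - q * (4 * p + 1)


# Exact grid 0, 1/10, ..., 1 for both probabilities.
grid = [Fraction(i, 10) for i in range(11)]

# 1) The expansion is correct: both forms agree at every grid point.
assert all(expected_penalty_I(q, p) == closed_form(q, p) for q in grid for p in grid)

# 2) For each fixed p, the expected penalty strictly decreases as q grows.
for p in grid:
    values = [closed_form(q, p) for q in grid]
    assert all(a > b for a, b in zip(values, values[1:]))

print("closed form verified on the grid")
```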

Example 6{Traffic Lights}

An individual's behaviour at a traffic intersection is similar to the Prisoner's Dilemma. When a commuter arrives and faces a red light, he/she has two options:
a) Wait for light to turn Green
b) Jump the Red light

Let's call strategy a Obey and strategy b Disobey. There are two players in this game: the first player is the commuter, and all the other people at that intersection together can be considered the second player. If the commuter obeys and the others also obey, he will have to suffer a delay of d, the time required for the red light to turn green. If he disobeys but the others obey, his delay is 0. If he obeys but the others disobey, let the additional delay (due to congestion) be D over d, giving d + D. If everyone disobeys, the total delay is D.

Writing this as a standard penalty matrix (the entries are the commuter's delay):

I \ II      Obey      Disobey
Obey        d         d + D
Disobey     0         D

This game is similar to the Prisoner's Dilemma. Analysing it as in the last case: if the others obey, disobeying gives a delay of 0 < d, and if the others disobey, disobeying gives D < d + D. So the best option for the commuter is to disobey, irrespective of what the others do. This is what we see at traffic lights when there is no fine for jumping the red light.
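
The same best-response check can be run on the delay matrix. A minimal sketch; the values d = 60 and D = 300 (seconds) are hypothetical, since the notes keep d and D symbolic, and any positive values give the same conclusion.

```python
ACTIONS = ["Obey", "Disobey"]


def delay(me, others, d, D):
    """Commuter's delay for (own action, others' action), no fine (matrix from the notes)."""
    table = {
        ("Obey", "Obey"): d,
        ("Obey", "Disobey"): d + D,
        ("Disobey", "Obey"): 0,
        ("Disobey", "Disobey"): D,
    }
    return table[(me, others)]


def best_choice(others, d, D):
    """Action that minimises the commuter's delay, given what the others do."""
    return min(ACTIONS, key=lambda me: delay(me, others, d, D))


d, D = 60, 300  # hypothetical values in seconds
for others in ACTIONS:
    print(f"Others {others}: commuter should {best_choice(others, d, D)}")
```

Both lines print Disobey.
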
Now suppose we introduce fines: a commuter who disobeys can be caught by the traffic police with probability p, and the fine imposed is f. Let the penalty be c(d,f,p), a function of the delay, the fine and the probability of being caught.

I \ II      Obey           Disobey
Obey        c(d,0,0)       c(d+D,0,0)
Disobey     c(0,f,p)       c(D,f,p)

If we define c(d,f,p) = d + pf, the penalty matrix reduces to

I \ II      Obey      Disobey
Obey        d         d + D
Disobey     pf        D + pf

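If we set the fine such that pf > d, then obeying is the best strategy whatever the others do: if the others obey, obeying costs d < pf, and if the others disobey, obeying costs d + D < D + pf.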
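
The threshold pf > d can be checked with the penalty c(d,f,p) = d + pf. A minimal sketch; the numbers (d = 60, D = 300, p = 0.1 and the two fines) are hypothetical, and, as in the notes, the fine is assumed to be measured in the same units as the delay.

```python
ACTIONS = ["Obey", "Disobey"]


def cost(me, others, d, D, p, f):
    """Commuter's penalty with c(d, f, p) = d + p*f, as defined in the notes."""
    base = {
        ("Obey", "Obey"): d,
        ("Obey", "Disobey"): d + D,
        ("Disobey", "Obey"): 0,
        ("Disobey", "Disobey"): D,
    }[(me, others)]
    expected_fine = p * f if me == "Disobey" else 0  # only disobeyers risk a fine
    return base + expected_fine


def obey_is_dominant(d, D, p, f):
    """True if obeying costs strictly less whatever the others do."""
    return all(cost("Obey", o, d, D, p, f) < cost("Disobey", o, d, D, p, f)
               for o in ACTIONS)


# Hypothetical numbers: d = 60, D = 300, p = 0.1
print(obey_is_dominant(60, 300, 0.1, 500))   # False (pf = 50 < d)
print(obey_is_dominant(60, 300, 0.1, 1000))  # True  (pf = 100 > d)
```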