Game Theory
Game Theory can be regarded as the study of multi-agent decision problems, which means
there are many people contending for limited rewards/payoffs. They have to
make certain moves on which their payoffs depend. These people have to follow
certain rules while making these moves. Each player is supposed
to behave rationally.
Rationality: In the language of Game Theory,
rationality implies that each player tries to maximize his/her payoff irrespective
of what other players are doing.
In essence each player has to decide a set of moves which are in accordance
with the rules of the game and which maximize his/her rewards.
Game Theory can be classified into two branches: co-operative and non co-operative game theory.
Game Theory has found applications in Economics, Evolutionary Biology, Sociology, Political Science etc., and it is now finding applications in Computer Science.
What is a game?
Example 1{Coin Matching Game}
Coin Matching Game: Two players independently choose either Head or Tail and report their choice to a central authority. If both choose the same side of the coin, player 1 wins; otherwise player 2 wins.
A game has the following components:
1. Set of Players P.
The two players who are choosing either Head or Tail in
the Coin Matching Game form the set of players, i.e. P = {P1, P2}
2. Set of Rules R.
There are certain rules which each player has to follow
while playing the game. Each player can safely assume that the others are following
these rules. In the Coin Matching Game each player can choose either Head or
Tail. He has to act independently and make his selection only once. Player
1 wins if both selections are the same, otherwise player 2 wins. These form
the rule set R for the Coin Matching Game.
3. Set of Strategies Si for each player Pi.
For example, in Matching Coins S1 = {H, T}
and S2 = {H, T} are the strategy sets of the two players, which
means each of them can choose either Head or Tail.
4. Set of Outcomes O.
In Matching Coins it is {Loss, Win} for both players.
The outcome is a function of the strategy profile selected.
In our example S1 x S2 = {(H,H),(H,T),(T,H),(T,T)}
is the set of strategy profiles.
Clearly the first and last are wins for the first player,
while the middle two are wins for the second player.
5. Payoff ui(o) for each player i and for each outcome o ∈ O.
This is the amount of benefit a player derives if a particular
outcome happens. In general it is different for different players.
Let the payoffs in Coin Matching Game be,
u1(Win) = 100
u1(Loss) = 0
u2(Win) = 100
u2(Loss) = 0
Both players would like to maximize their payoffs (rationality), so
both will try to win. Now let's consider a slightly different case. We redefine
the payoffs as follows.
Player 1 is a competitor, so
u1(Win) = 100
u1(Loss) = 0
while player 2 is very concerned about seeing player 1 happy (player 1 is his little brother), so for him
u2(Win) = 10
u2(Loss) = 100
In this situation only player 1 would try hard to win, while player 2 would
try to lose. The point to note is that each player tries to maximize his/her payoff,
and so would like to get the outcome which gives him/her the maximum payoff.
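The components listed above can be written down directly for the Coin Matching Game. The following is only a minimal sketch: the function names outcome and payoff_table and the dictionary layout are illustrative choices, not part of the original notes.

```python
from itertools import product

# Strategy sets of the two players (H = Head, T = Tail).
S1 = ["H", "T"]
S2 = ["H", "T"]


def outcome(s1, s2):
    """Return the outcomes (for player 1, for player 2) under the matching rule."""
    if s1 == s2:
        return ("Win", "Loss")   # same side: player 1 wins
    return ("Loss", "Win")       # different sides: player 2 wins


# Payoffs from the text: the competitive case and the "little brother" case.
u1 = {"Win": 100, "Loss": 0}
u2_competitive = {"Win": 100, "Loss": 0}
u2_brother = {"Win": 10, "Loss": 100}


def payoff_table(u1, u2):
    """Map every strategy profile in S1 x S2 to the pair of payoffs."""
    table = {}
    for s1, s2 in product(S1, S2):
        o1, o2 = outcome(s1, s2)
        table[(s1, s2)] = (u1[o1], u2[o2])
    return table


for profile, payoffs in payoff_table(u1, u2_brother).items():
    print(profile, payoffs)
```

With the second payoff definition, player 2 gets 100 exactly on the profiles where player 1 wins, which is why he would try to lose.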
Informally we can say that the players sit across a table
and play the game according to the set of rules. There is an outcome for each
player when the game ends. Each player derives a payoff from this
outcome. For example, an outcome of victory brings a payoff in terms of awards
and fame to cricket players, while a loss means no payoff. Because all
the players are rational beings, they will try to maximize their payoffs.
In non co-operative games players don't know what other players are doing.
So they have to make the moves without looking at what others are doing.
Each player chooses a strategy, i.e. the set of
moves he/she would play.
Strategy
It is the set of moves that a player would play in
a game. Being rational, a player would choose the strategy in such a way as
to maximize his/her payoff.
Example 2{Tic-Tac-Toe}
In the Tic-Tac-Toe game there are two players, x and o. The outcomes are O = {x wins,
o wins, draw}
Payoff | x wins | o wins | draw |
ux | 1 | -1 | 0 |
uo | -1 | 1 | 0 |
ux + uo | 0 | 0 | 0 |
Constant Sum Game: In a constant sum game the sum of the payoffs of all the
players is the same constant for each outcome of the game. In a zero sum game
that constant is zero.
Zero-sum games are true games of conflict. Any gain on
a player's side comes at the expense of his opponents. Think of dividing up
a pie. The size of the pie doesn't change. It is all about redistribution of
the pieces between the players.
Example 3{Chess}
Consider the game of Chess. There are two players, one playing with the White
pieces and one playing with the Black pieces. There are three possible outcomes.
O = { Black Wins, White Wins, Draw }.
Let's define the payoffs as
[Payoff table for White and Black over the three outcomes; entries not recoverable]
This is a constant sum game: the sum of the payoffs is constant. If White increases
his payoff by a win, the payoff of Black goes down, and vice versa.
The above two types are called strictly competitive games. A win for one
player is a loss for the other.
Non Zero Sum Game
Example 4{Cricket with advertisers}
There are three players in a Cricket match
1. India
2. Pakistan
3. Advertisers
There are three outcomes O = { India Wins, Pakistan Wins, There is a Tie } = {I,P,T}
If we define the payoffs as
uA(I) = 10
uI(I) = 10
uP(I) = 0
uA(P) = 10
uI(P) = 0
uP(P) = 10
uA(T) = 100
uI(T) = 5
uP(T) = 5
The payoff for the advertisers remains the same whoever wins, but in case of a thrilling tie lots of people will watch the last few overs, increasing the advertisers' payoff manyfold.
The sum of the payoffs is 20 when either team wins but 110 for a tie, so here we have a Non Zero Sum Game.
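The classification used in the examples above depends only on the sum of the payoffs for each outcome, so it can be checked mechanically. The sketch below is illustrative: payoff_sums and classify are hypothetical helper names, and the payoff numbers are the ones given for Tic-Tac-Toe and for the Cricket example.

```python
def payoff_sums(payoffs):
    """payoffs: {player: {outcome: payoff}} -> {outcome: sum over all players}."""
    outcomes = next(iter(payoffs.values())).keys()
    return {o: sum(u[o] for u in payoffs.values()) for o in outcomes}


def classify(payoffs):
    """Label a game by looking at the payoff sum of every outcome."""
    sums = set(payoff_sums(payoffs).values())
    if sums == {0}:
        return "zero sum"
    if len(sums) == 1:
        return "constant sum"
    return "non zero sum"


tic_tac_toe = {
    "x": {"x wins": 1, "o wins": -1, "draw": 0},
    "o": {"x wins": -1, "o wins": 1, "draw": 0},
}
cricket = {
    "Advertisers": {"I": 10, "P": 10, "T": 100},
    "India": {"I": 10, "P": 0, "T": 5},
    "Pakistan": {"I": 0, "P": 10, "T": 5},
}

print(classify(tic_tac_toe))
print(classify(cricket), payoff_sums(cricket))
```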
Example 4{Prisoner's Dilemma}
There are two persons who have committed a crime of which there is no evidence. The police catch them and put them in two separate cells. Because there is no evidence against the suspects, they cannot be proven guilty. So the police try to use one against the other. Each prisoner is given two options: either to confess his crime or to deny it. If prisoner I confesses but prisoner II denies, then the first prisoner serves as a witness against the other and gets no punishment, while prisoner II gets the full term of 10 years, and vice versa. If both confess, both get 5 years of imprisonment each, as the police now have evidence against both of them. If both deny, the police have evidence against neither, so the maximum punishment that they can get is 1 year each.
This can be represented in tabular form as.
I \ II | Confess | Deny |
Confess | 5,5 | 0,10 |
Deny | 10,0 | 1,1 |
This is the standard representation of a 2 player game. Each cell has
two entries, one for each player. The first number in a cell is the
penalty of player I and the second number is the penalty of player II. Each
row represents a strategy for player I and each column represents a strategy
for player II. So the bottom right cell means that if player I denies and player
II denies, then the penalty for player I is 1 year and that of player II
is also 1 year.
Now let's analyse the game from player I's perspective.
He doesn't know whether player II is going to confess or deny, but he wants to minimize his punishment. So he considers two cases.
a) If player II confesses
In this case confessing gives 5 years of imprisonment while
denying gives 10 years.
So it is better to confess.
b) If player II denies
In this case confessing gives 0 years of imprisonment while
denying gives 1 year.
Again it is better to confess.
So player I will prefer to confess if he is guilty.
Player II will argue along similar lines and will also prefer to confess if guilty.
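The case analysis above amounts to finding player I's best reply to each move of player II and checking that it is the same move every time. Here is a minimal sketch using the penalties from the table; the names best_reply_for_I and dominant_move_for_I are illustrative and not from the original notes.

```python
# Years in prison as (player I, player II), indexed by (I's move, II's move).
penalty = {
    ("Confess", "Confess"): (5, 5),
    ("Confess", "Deny"): (0, 10),
    ("Deny", "Confess"): (10, 0),
    ("Deny", "Deny"): (1, 1),
}
moves = ["Confess", "Deny"]


def best_reply_for_I(move_of_II):
    """Move of player I that minimizes his own penalty against a fixed move of II."""
    return min(moves, key=lambda m: penalty[(m, move_of_II)][0])


def dominant_move_for_I():
    """Return the move that is a best reply to every move of II, if there is one."""
    replies = {best_reply_for_I(m2) for m2 in moves}
    return replies.pop() if len(replies) == 1 else None


for m2 in moves:
    print("II plays", m2, "-> best reply for I:", best_reply_for_I(m2))
print("Dominant move for I:", dominant_move_for_I())
```

Because the table is symmetric, the same check run for player II gives the same answer.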
Let's now assume some numbers to illustrate this fact. Suppose player I assumes
that player II would confess with probability 0.5. If player I also confesses
with probability 0.5, each of the four cells of the table occurs with probability
0.5 x 0.5, so the expected number of years in prison for player I is
0.5 x 0.5 x ( 5 + 10 + 1 + 0 ) = 4 years.
If player I chooses Confess with probability 0.4 and Deny with probability 0.6,
and assumes that player II would confess with probability 0.5, then the expected
number of years for player I is
  0.4 x 0.5 x 5     ( I confesses, II confesses, I gets 5 years )
+ 0.6 x 0.5 x 10    ( I denies, II confesses, I gets 10 years )
+ 0.4 x 0.5 x 0     ( I confesses, II denies, I gets 0 years )
+ 0.6 x 0.5 x 1     ( I denies, II denies, I gets 1 year )
= 4.3 years
We see that if he is less likely to confess his penalty increases.
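The two numerical cases above can be reproduced with a small helper. This is a sketch only: expected_years is a hypothetical name, and its arguments are the probabilities with which player I and player II confess.

```python
def expected_years(q, p):
    """Expected prison term of player I.

    q: probability that player I confesses
    p: probability that player II confesses
    """
    return (q * p * 5                   # both confess: I gets 5 years
            + (1 - q) * p * 10          # I denies, II confesses: 10 years
            + q * (1 - p) * 0           # I confesses, II denies: 0 years
            + (1 - q) * (1 - p) * 1)    # both deny: 1 year


print(round(expected_years(0.5, 0.5), 2))  # 4.0
print(round(expected_years(0.4, 0.5), 2))  # 4.3
```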
Illustration
Now we assume
Player I confesses with probability q
Player I assumes that player II would confess with probability p
The expected punishment for player I is
5 x qp + 0 x q(1-p) + 10 x (1-q)p + 1 x (1-q)(1-p) years
= 1 + 9p - q(4p+1) years
Since 4p+1 > 0, this is a decreasing function of q. So the more likely player I is to confess, the less punishment he will get, irrespective of what player II does.
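The claim that the expected punishment falls as q grows can also be checked numerically for several beliefs p about player II. The sketch below recomputes the four-term sum directly; the helper names and the grid of q values are illustrative assumptions.

```python
def expected_years(q, p):
    """Expected prison term of player I (q: I confesses, p: II confesses)."""
    return (5 * q * p + 0 * q * (1 - p)
            + 10 * (1 - q) * p + 1 * (1 - q) * (1 - p))


def is_decreasing_in_q(p, steps=10):
    """Check on a grid q = 0, 1/steps, ..., 1 that the penalty strictly falls."""
    values = [expected_years(k / steps, p) for k in range(steps + 1)]
    return all(later < earlier for earlier, later in zip(values, values[1:]))


for p in (0.0, 0.25, 0.5, 1.0):
    # Change in expected years when I moves from never to always confessing.
    drop = expected_years(1, p) - expected_years(0, p)
    print(p, is_decreasing_in_q(p), drop)
```

The printed change equals -(4p+1) for each p, which is negative for every probability p, so confessing more often never hurts player I.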
Example 5{Traffic Lights}
An individual's behaviour at a traffic intersection is also similar to the Prisoner's
Dilemma. When a commuter arrives and faces a red light he/she has two options.
a) Wait for the light to turn Green
b) Jump the Red light
Let's call strategy a Obey and strategy b Disobey. There are two players in this game. The first player is the commuter, and all the other people at that intersection can be considered as the second player. If the commuter obeys and the others also obey, he suffers a delay of 'd', the time required for the red light to turn green. If he disobeys but the others obey, his delay is 0. If he obeys but the others disobey, he suffers an additional delay D (due to congestion) on top of 'd'. If all disobey, the total delay is D.
Writing the commuter's delay as a standard penalty matrix
I \ II | Obey | Disobey |
Obey | d | d+D |
Disobey | 0 | D |
This game is similar to the Prisoner's Dilemma example above. If we analyse it as in
that case, the best option for the commuter is to disobey irrespective of
what the others do: 0 < d when the others obey, and D < d + D when they disobey.
This is what we see at traffic lights if there is no fine
for jumping the light.
Now suppose we introduce fines, i.e. if the commuter
disobeys he can be caught by the traffic police with probability p. The
fine imposed is equal to f. Let the penalty be c(d,f,p), i.e. a function of
the delay, the fine and the probability of being caught.
I \ II | Obey | Disobey |
Obey | c(d,0,0) | c(d + D,0,0) |
Disobey | c(0,f,p) | c(D,f,p) |
If we define c(d,f,p) = d + pf, the penalty matrix reduces to
I \ II | Obey | Disobey |
Obey | d | d + D |
Disobey | pf | D + pf |
If we set the fine such that pf > d, then obeying is the best strategy whatever the others do, since d < pf and d + D < D + pf.
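The effect of the fine can be tried out with concrete numbers. The sketch below assumes the penalty c(d,f,p) = d + pf from the text; the function names and the sample values (d = 60, D = 120, p = 0.1, f = 0 or 1000) are made up purely for illustration.

```python
def cost(delay, fine, p_caught):
    """Penalty c(d, f, p) = d + p*f: delay plus the expected fine."""
    return delay + p_caught * fine


def penalty_matrix(d, D, f, p):
    """Commuter's penalty indexed by (commuter's move, others' move)."""
    return {
        ("Obey", "Obey"): cost(d, 0, 0),
        ("Obey", "Disobey"): cost(d + D, 0, 0),
        ("Disobey", "Obey"): cost(0, f, p),
        ("Disobey", "Disobey"): cost(D, f, p),
    }


def best_move(d, D, f, p):
    """Commuter's move that is best against both moves of the others, if any."""
    m = penalty_matrix(d, D, f, p)
    best = set()
    for others in ("Obey", "Disobey"):
        best.add(min(("Obey", "Disobey"), key=lambda mine: m[(mine, others)]))
    return best.pop() if len(best) == 1 else None


print(best_move(d=60, D=120, f=0, p=0.1))     # no fine
print(best_move(d=60, D=120, f=1000, p=0.1))  # expected fine p*f = 100 > d = 60
```

In this sketch the congestion delay D does not affect the answer, because it is added to both of the commuter's options whenever the others disobey; only the comparison between pf and d matters.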