Lecture 4
Braess' Paradox, and more on Mixed Strategy

6.8L transcribed by Satyavarta

1 Introduction

In this lecture, we will discuss Braess' Paradox, mixed strategy Nash Equilibria (NE) in two-player zero-sum games, the Min-Max theorem, and Extensive Form Games.

2 Braess' Paradox

Consider the situation shown in the figure. A and B are two cities, with two connecting routes between them, one passing through C and the other through D. AC and DB are short but narrow roads, while AD and CB are wide but long.

Figure

Let X_i = the traffic (cars/sec) on road i, and D_i the delay on road i. Hence, for the situation in the figure,

                            D₁ = 5X₁ + 1
                            D₂ = X₂ + 32
                            D₃ = 5X₃ + 1
                            D₄ = X₄ + 32

Let the total traffic between the cities be 6 cars/sec.

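Each path consists of one narrow road and one wide road. Writing X₁ and X₂ here for the traffic on path 1 and path 2 respectively, the delays on the two possible paths are

D_path₁ = 6X₁ + 33
D_path₂ = 6X₂ + 33

In equilibrium, no driver can gain by switching paths, so the two path delays must be equal. With X₁ + X₂ = 6, this dictates the following distribution of traffic

X₁ = X₂ = 3
D_path₁ = D_path₂ = 6(3) + 33 = 51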

Suppose a wide road is built from C to D with constant delay zero.

Figure

From C,

the cost of CB is 32 + X

the cost of CDB is 5X + 1

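This makes the path CDB cheaper than CB for all possible values of X ∈ [0,6]: the cost of CDB is at most 5(6) + 1 = 31, while the cost of CB is at least 32. By the same argument, ACD is always cheaper than AD.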

Then all cars take the path ACDB. But this raises the delay for every car to 31 + 31 = 62. Thus, if every individual acts greedily, the delay for every individual may increase. The objective function that society seeks to minimize is the total delay, so that the scheme is beneficial to everyone. Formally, this objective function is

minimize Σ_p X_p d(p) (total delay)
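The arithmetic of the example can be checked with a short Python sketch. The function names `narrow` and `wide` are illustrative labels for the two delay formulas, and the sketch assumes that a single deviating driver is too small to change the traffic on any road.

```python
# Delay functions from the example (x = cars/sec on the road).
def narrow(x):  # AC and DB: short but narrow
    return 5 * x + 1

def wide(x):    # AD and CB: wide but long
    return x + 32

TOTAL = 6  # cars/sec travelling from A to B

# Before the new road: traffic splits evenly over A-C-B and A-D-B.
x = TOTAL // 2
delay_before = narrow(x) + wide(x)

# After the zero-delay road C-D opens: every car drives A-C-D-B.
delay_after = narrow(TOTAL) + 0 + narrow(TOTAL)

# A lone driver who leaves A-C-D-B for A-C-B (or A-D-B) still shares one
# narrow road with all 6 cars and then pays at least 32 on the wide road.
delay_deviate = narrow(TOTAL) + wide(0)

# Total delay, the sum over paths of (traffic on path) * (delay of path).
total_before = TOTAL * delay_before
total_after = TOTAL * delay_after

print(delay_before, delay_after, delay_deviate, total_before, total_after)
```

Since `delay_deviate` exceeds `delay_after`, no driver has an incentive to leave ACDB, even though every driver was better off before the road was built.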

Every individual can be charged an amount equivalent to the damage he causes to society. Then the system has a desirable equilibrium. In the example here, a toll could be levied on cars using CD to increase the effective cost of CD, so that the equilibrium occurs at minimal average delay.
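As a rough illustration of the toll idea, the sketch below searches for the smallest whole-number toll on CD at which the even split over ACB and ADB is stable again. It assumes the toll is measured in the same units as delay, that a single driver does not change the traffic on any road, and that only integer tolls are tried. The helper names are hypothetical, and this is not the marginal-cost charge described above.

```python
def narrow(x):  # delay on AC or DB with x cars/sec
    return 5 * x + 1

def wide(x):    # delay on AD or CB with x cars/sec
    return x + 32

def shortcut_cost(toll, split=3):
    """Cost for one driver who switches to A-C-D-B while traffic is split evenly."""
    return narrow(split) + toll + narrow(split)

def even_split_is_stable(toll, split=3):
    """True if no driver on A-C-B or A-D-B gains by switching to A-C-D-B."""
    return shortcut_cost(toll, split) >= narrow(split) + wide(split)

# Smallest integer toll (an assumption: only whole tolls are tried).
smallest_toll = next(t for t in range(100) if even_split_is_stable(t))
print(smallest_toll)
```

Under these assumptions the shortcut costs 32 plus the toll, so the search stops at the first toll that makes this at least as large as the delay on the two original paths.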

The cost is dependent on the individual. We shall soon study more theories on this behaviour, e.g. the Utility Theory.

3 Mixed Strategy Nash Equilibrium in two-player zero-sum games

A two-player, zero-sum game proceeds as follows:

There are two players, denoted as row player and column player.
Each player makes a move simultaneously, chosen from corresponding sets of strategies known to both players.
Then, a payoff is determined based on the pair of moves made.
For this to be a zero-sum game, payoff of row player = -(payoff of column player)

Playing a mixed strategy means that the moves made by the players (in the second step above) are not made deterministically. Rather, each strategy is a probability distribution over rows (columns),

p = (p₁, p₂, ..., p_n)

where p_i = probability of choosing the i^th row (column).

Since exactly one row (column) may be selected (``played''),

Σ_i p_i = 1, p ≥ 0.

This is in contrast to a pure strategy, where p_i = 1 for exactly one i and 0 for all others.

Hence, the following symbols may be defined for further analysis of this game.

                            A : Payoff matrix
                            p : mixed strategy for row player
                            q : mixed strategy for column player

The payoff with such a scheme is calculated as a_ij for row player (-a_ij for column player), when i is selected by row player, and j by the column player. The expected payoff for a player is thus the sum of payoffs multiplied by the probability of that payoff. Thus,

Expected payoff to row player = Σ_{i,j} p_i q_j a_ij

Then, since we are considering a zero-sum game, the expected payoffs for the players are

row player U_r = p^TAq
column player U_c = -p^TAq
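The expected payoff can be computed directly from the double sum. The sketch below uses a 2x2 "matching pennies" matrix, chosen here only as an illustration, and two arbitrary mixed strategies.

```python
def expected_payoff(p, A, q):
    """Row player's expected payoff: sum over i, j of p_i * q_j * a_ij."""
    return sum(p[i] * A[i][j] * q[j]
               for i in range(len(p))
               for j in range(len(q)))

# Illustrative payoff matrix (row player wins 1 when the choices match).
A = [[1, -1],
     [-1, 1]]
p = [0.75, 0.25]  # row player's mixed strategy
q = [0.25, 0.75]  # column player's mixed strategy

U_r = expected_payoff(p, A, q)  # payoff to the row player
U_c = -U_r                      # zero-sum: the column player gets the negative
print(U_r, U_c)
```

Both vectors sum to 1 and have nonnegative entries, as required of a mixed strategy.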

Each player wants to maximize his profit, and since he does not know what strategy the other one is using, he wants to maximize his expected profit irrespective of what the opponent plays. Hence row player wants p' to maximize min_q p'^TAq, and column player wants q' to minimize max_p p^TAq'.

For any pair of strategies (p', q')

min_q p'^TAq ≤ p'^TAq' ≤ max_p p^TAq'

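Now if a saddle point strategy pair (p^*, q^*) exists (that is, a saddle point of the function p^TAq over all mixed strategies p and q), then by definition of a saddle point,

∀ p, q:  p^TAq^* ≤ p^*^TAq^* ≤ p^*^TAq

This is also a Nash Equilibrium, since neither player can increase his own payoff by deviating unilaterally: the row player cannot get more than p^*^TAq^* against q^*, and the column player cannot get more than -p^*^TAq^* against p^*.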
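The saddle point condition can be tested numerically. The sketch below relies on the fact that p^TAq is linear in q for fixed p (and in p for fixed q), so the inner minimum and maximum are attained at a single column or row. The matrix and the strategies are illustrative choices, and exact fractions are used so that the equality test is reliable.

```python
from fractions import Fraction as F

def payoff(p, A, q):
    """Row player's expected payoff p^T A q."""
    return sum(p[i] * A[i][j] * q[j]
               for i in range(len(p)) for j in range(len(q)))

def row_guarantee(p, A):
    """min_q p^T A q, attained at a single column because the payoff is linear in q."""
    return min(sum(p[i] * A[i][j] for i in range(len(p)))
               for j in range(len(A[0])))

def col_guarantee(q, A):
    """max_p p^T A q, attained at a single row because the payoff is linear in p."""
    return max(sum(A[i][j] * q[j] for j in range(len(q)))
               for i in range(len(A)))

def is_saddle_point(p, A, q):
    """(p, q) is a saddle point exactly when the three quantities coincide."""
    return row_guarantee(p, A) == payoff(p, A, q) == col_guarantee(q, A)

A = [[1, -1], [-1, 1]]          # illustrative matching-pennies matrix
half = [F(1, 2), F(1, 2)]
skewed_p = [F(3, 4), F(1, 4)]
skewed_q = [F(1, 4), F(3, 4)]

print(is_saddle_point(half, A, half))          # uniform play on both sides
print(is_saddle_point(skewed_p, A, skewed_q))  # an arbitrary pair
```

For any pair of strategies, `row_guarantee(p, A) <= payoff(p, A, q) <= col_guarantee(q, A)` holds, which is the inequality chain stated earlier.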

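Assuming rational behaviour from the players, so that each plays the best strategy available to him, the guaranteed payoffs to the players are

row player V_r = max_p min_q p^TAq = p^*^TAq₁

column player V_c = min_q max_p p^TAq = p₁^TAq^*

where p^* and q^* attain the outer max and min, q₁ is the column player's best response to p^*, and p₁ is the row player's best response to q^*.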

Property:

V_r ≤ V_c

Proof: Since q₁ minimizes p^*^TAq over q, and p₁ maximizes p^TAq^* over p, we may write

V_r = p^*^TAq₁ ≤ p^*^TAq^*

V_c = p₁^TAq^* ≥ p^*^TAq^*

Hence,

V_r ≤ V_c
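The same inequality holds when both players are restricted to pure strategies, which gives a quick way to see it on concrete matrices. The sketch below is only that pure-strategy analogue, on two illustrative matrices. It does not compute the mixed-strategy values V_r and V_c.

```python
def pure_maximin(A):
    """max_i min_j a_ij: the most the row player can guarantee with a pure strategy."""
    return max(min(row) for row in A)

def pure_minimax(A):
    """min_j max_i a_ij: the least the column player can concede with a pure strategy."""
    return min(max(col) for col in zip(*A))

matching_pennies = [[1, -1], [-1, 1]]  # no pure saddle point
with_pure_saddle = [[4, 2], [3, 1]]    # entry 2 is a row minimum and a column maximum

for A in (matching_pennies, with_pure_saddle):
    print(pure_maximin(A), pure_minimax(A))
```

In the first matrix the two values differ, so no pure saddle point exists and the players need mixed strategies. In the second they coincide at the saddle entry.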

We will use this property to prove the following property.

Property: A saddle point (and hence, a Nash Equilibrium) exists if and only if the guaranteed payoffs of the two players are equal.

(p^*, q^*) is a Saddle Point ⇔ V_r = V_c

Proof:

⇒ Suppose (p^*, q^*) is a Saddle Point. Then,
                            V_r = max_p min_q p^TAq = p^*^TAq^*
                            V_c = min_q max_p p^TAq = p^*^TAq^*
⇒                           V_r = V_c
⇐ Suppose V_r = V_c.
From the above property,
V_r ≤ p^*^TAq^* ≤ V_c
⇒ V_r = V_c = p^*^TAq^*
Since V_r = min_q p^*^TAq and V_c = max_p p^TAq^*, this gives p^TAq^* ≤ p^*^TAq^* ≤ p^*^TAq for all p and q, so (p^*, q^*) is a saddle point.
Hence, the proof.

4 Min-Max Theorem

A relevant question to ask in such a situation is ``Does a Saddle Point always exist?'' The Min-Max Theorem says, yes, it does.

max_p min_q p^TAq = min_q max_p p^TAq

Proof: The proof for this theorem is on the lines of the concept of duality in Linear Programming, wherein these two behave as duals of each other. (The proof I illustrate here is taken from the scribe notes by Tugkan Batu, ORIE630 Mathematical Programming Fall 1998, of lectures by Jon Kleinberg.)

We write two linear programs, one for each player with expected payoff being the objective function, and show that this pair of linear programs are actually duals of each other which will prove the theorem.

The row player's objective function is max_p min_q p^TAq. Given some p', the inner problem min_q (p'^TA)q subject to Σ_j q_j = 1, q ≥ 0 appears to range over infinitely many possibilities for q. However, for a given p', the column player has an optimal pure strategy. The reason is that if the column player knows the row player's mixed strategy, he can choose the single column that minimizes the row player's expected payoff, and a weighted average over all columns can never do better than the best column. So, now we can write the row player's objective function as

max_p [ min_j Σ_{i=1}^m A_ij p_i ]

Now, we can write the row player's problem as a linear program as follows:

max Z

subject to
                                     Z - Σ_{i=1}^m A_ij p_i ≤ 0 ∀j
                                     Σ_{i=1}^m p_i = 1
                                     p ≥ 0

Similarly, the column player's problem is formulated as a linear program as follows:

min W

subject to
                                     W - Σ_{j=1}^n A_ij q_j ≥ 0 ∀i
                                     Σ_{j=1}^n q_j = 1
                                     q ≥ 0
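For a game in which the maximizing player has only two rows, the program max Z can be solved exactly without an LP solver: writing p = (t, 1 - t), the quantity min_j Σ_i A_ij p_i is a concave piecewise-linear function of t, so its maximum lies at t = 0, t = 1, or where two of the column lines cross. The sketch below uses this observation (a swapped-in shortcut, not the LP method itself) on an illustrative 2x2 matrix. It obtains the column player's value by applying the same routine to the negated transpose of A. All names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def maximin_two_rows(A):
    """Return (value, t) maximizing min_j [t*A[0][j] + (1-t)*A[1][j]] over t in [0, 1].

    A must have exactly two rows, with integer or Fraction entries.
    """
    cols = range(len(A[0]))

    def lower_envelope(t):
        return min(t * A[0][j] + (1 - t) * A[1][j] for j in cols)

    # Candidate optima: the endpoints and every crossing of two column lines.
    candidates = [Fraction(0), Fraction(1)]
    for j, k in combinations(cols, 2):
        slope_diff = (A[0][j] - A[1][j]) - (A[0][k] - A[1][k])
        if slope_diff != 0:
            t = Fraction(A[1][k] - A[1][j], slope_diff)
            if 0 <= t <= 1:
                candidates.append(t)

    best_t = max(candidates, key=lower_envelope)
    return lower_envelope(best_t), best_t

def negated_transpose(A):
    """Matrix of the same game seen from the column player's side."""
    return [[-A[i][j] for i in range(len(A))] for j in range(len(A[0]))]

A = [[2, -1],
     [-1, 1]]  # illustrative payoff matrix

V_r, t_row = maximin_two_rows(A)                      # optimum of "max Z"
neg_V_c, t_col = maximin_two_rows(negated_transpose(A))
V_c = -neg_V_c                                        # optimum of "min W"
print(V_r, t_row, V_c, t_col)
```

On this matrix the two optimal values coincide, as the Min-Max Theorem asserts. The optimal strategies are p = (t_row, 1 - t_row) and q = (t_col, 1 - t_col).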

Now, if we take e^* to be the all-ones vector, then we can write this pair of linear programs as follows:

max 1.Z + 0.p	min 1.W + 0.q
Z.e^* + p(-A) ≤ 0	W.e^* + (-A)q ≥ 0
Z.0 + p.e^* = 1	W.0 + e^*.q = 1
p ≥ 0	q ≥ 0

The generic primal/dual pair is described as follows (x̄ and y are unrestricted in sign):

Primal	Dual
min cx + c̄x̄	max yb + y'b'
x ≥ 0	y' ≥ 0
Mx + M̄x̄ = b	yM + y'M' ≤ c
M'x + M̄'x̄ ≥ b'	yM̄ + y'M̄' = c̄

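Thus, the linear programs for the row and column player strategies are instantiations of this generic form (the column player's program is the primal with x = q and x̄ = W, and the row player's program is the dual with y = Z and y' = p) with the substitution {b'=0, b=1, c=0, c̄=1, M=e^*, M̄=0, M'=-A, M̄'=e^*}. Since both programs are feasible, strong duality gives max Z = min W, that is, V_r = V_c, which proves the theorem.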

5 Extensive Form Games

Modelling Chess

B(s) = set of moves black can make.

The successive moves of the two opponents trace a path along a tree of moves: each node holds a board position, and its children are all the possible next moves. The discussion on extensive form games spills over to the next lecture.
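The tree structure can be sketched on a game far smaller than chess. The toy game below is an assumption made purely for illustration: a position s is a number of tokens, and the player to move removes 1 or 2 of them. The function `moves` plays the role of B(s), and each complete play is one root-to-leaf path of the tree.

```python
def moves(s):
    """Hypothetical analogue of B(s): legal moves in position s (take 1 or 2 tokens)."""
    return [m for m in (1, 2) if m <= s]

def plays(s):
    """Yield every root-to-leaf path (complete play) of the game tree rooted at s."""
    legal = moves(s)
    if not legal:          # leaf: no tokens left, the game is over
        yield []
        return
    for m in legal:        # each legal move leads to a child position
        for rest in plays(s - m):
            yield [m] + rest

all_plays = list(plays(3))
print(all_plays)
```

In this toy game both players have the same move set. In chess the set of legal moves depends on the board position and on which player is to move.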

File translated from TeX by TtH, version 3.13.
On 14 Sep 2002, 00:11.