Bimatrix game equilibrium via nonlinear programming
These lecture notes (PDF) from Asu Ozdaglar give a bilinear program whose solution is the mixed-strategy equilibrium for a two-player, non-zero-sum game with finite action spaces—also known as a bimatrix game. I wanted to reproduce this highly practical result in a more accessible format and offer some implementation notes.
Definitions
A bimatrix game is characterized by two $m \times n$ matrices $A$ and $B$. When Alice chooses strategy $i$ and Bob chooses strategy $j$, their payoffs are $A_{ij}$ and $B_{ij}$ respectively. Mixed-strategy probability vectors $x \in \mathbb{R}^m$ and $y \in \mathbb{R}^n$ yield expected payoffs of $x^T A y$ and $x^T B y$ respectively.
$(x^*, y^*)$ is an equilibrium if $x^T A y^* \le x^{*T} A y^*$ and $x^{*T} B y \le x^{*T} B y^*$ for all probability vectors $x$ and $y$.
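As a quick numeric illustration of these definitions (the payoff matrices below are made up for this example):

```python
import numpy as np

# Hypothetical payoff matrices: rows are Alice's strategies, columns Bob's.
A = np.array([[3.0, 0.0],
              [5.0, 1.0]])  # Alice's payoffs A_ij
B = np.array([[3.0, 5.0],
              [0.0, 1.0]])  # Bob's payoffs B_ij

# Mixed strategies: probability vectors over each player's strategies.
x = np.array([0.5, 0.5])    # Alice
y = np.array([0.25, 0.75])  # Bob

# Expected payoffs x^T A y and x^T B y.
alice_payoff = x @ A @ y    # 1.375
bob_payoff = x @ B @ y      # 2.625
```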
Result
Theorem: If $(x^*, y^*)$ is an equilibrium, then there exist $p^*$ and $q^*$ such that $(x^*, y^*, p^*, q^*)$ is an optimal solution to the following bilinear program:

$$
\begin{aligned}
\max_{x,\,y,\,p,\,q}\quad & x^T A y + x^T B y - p - q \\
\text{s.t.}\quad & A y \le p \mathbf{1} && \text{(OptA)} \\
& B^T x \le q \mathbf{1} && \text{(OptB)} \\
& \mathbf{1}^T x = 1,\quad \mathbf{1}^T y = 1 && \text{(ProbVec1)} \\
& x \ge 0,\quad y \ge 0 && \text{(ProbVec2)}
\end{aligned}
$$
Conversely, if $(\bar{x}, \bar{y}, \bar{p}, \bar{q})$ is an optimal solution of the bilinear program, then $(\bar{x}, \bar{y})$ is an equilibrium for the game.
Proof: For any feasible solution $(x, y, p, q)$ to the bilinear program, each element of $x$ is nonnegative (by ProbVec2), so we can use $x$ to combine the rows of the condition OptA to obtain a new valid inequality $x^T A y \le p$ (by ProbVec1). Applying the same logic to $y$ and OptB, we find that $x^T B y \le q$, so the objective value of any feasible solution is at most $0$.
Now consider the equilibrium probability vectors $(x^*, y^*)$ and set $p^* = x^{*T} A y^*$ and $q^* = x^{*T} B y^*$. Then the objective value is $0$, and we have only to show that this solution is feasible. To be an equilibrium, $x^*$ must earn Alice at least as good a payoff against $y^*$ as does the $i$th pure strategy: $(A y^*)_i \le x^{*T} A y^* = p^*$. This is precisely the $i$th row of OptA. OptB follows similarly.
The bilinear program is clearly feasible and bounded (as shown a moment ago). Let $(\bar{x}, \bar{y}, \bar{p}, \bar{q})$ denote the optimal solution. By Nash’s theorem on the existence of mixed-strategy equilibria, we know that an equilibrium exists, and from the first part of the proof, we know how to use this equilibrium to produce a feasible solution to the bilinear program with an objective value of zero. Thus, $\bar{x}^T A \bar{y} + \bar{x}^T B \bar{y} - \bar{p} - \bar{q} \ge 0$, which rearranges to $(\bar{x}^T A \bar{y} - \bar{p}) + (\bar{x}^T B \bar{y} - \bar{q}) \ge 0$.
By the constraints OptA and OptB, each of the terms in parentheses is less than or equal to zero; thus, the inequality on $(\bar{x}, \bar{y}, \bar{p}, \bar{q})$ can hold only when each of these terms equals zero exactly: $\bar{x}^T A \bar{y} = \bar{p}$ and $\bar{x}^T B \bar{y} = \bar{q}$.
Now consider any probability vector $x$ and use it to combine the rows of OptA: we have $x^T A \bar{y} \le \bar{p} = \bar{x}^T A \bar{y}$, which says that $\bar{x}$ is a best response to $\bar{y}$. Applying the same logic to $y$ and OptB completes the proof. ◼
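To make the theorem concrete, the following sketch encodes the objective and the feasibility conditions and checks them at a known equilibrium (the prisoner’s dilemma, whose unique equilibrium is mutual defection; the matrices and tolerance here are my own choices):

```python
import numpy as np

# Prisoner's dilemma payoffs (to be maximized); strategy 0 = cooperate, 1 = defect.
A = np.array([[3.0, 0.0],
              [5.0, 1.0]])  # Alice
B = np.array([[3.0, 5.0],
              [0.0, 1.0]])  # Bob

def objective(x, y, p, q):
    """Objective of the bilinear program: x^T A y + x^T B y - p - q."""
    return x @ A @ y + x @ B @ y - p - q

def feasible(x, y, p, q, tol=1e-9):
    """Check OptA, OptB, ProbVec1, and ProbVec2."""
    return (np.all(A @ y <= p + tol)        # OptA
            and np.all(B.T @ x <= q + tol)  # OptB
            and abs(x.sum() - 1.0) < tol    # ProbVec1
            and abs(y.sum() - 1.0) < tol
            and np.all(x >= -tol)           # ProbVec2
            and np.all(y >= -tol))

# Known equilibrium: both players defect.
x_star = np.array([0.0, 1.0])
y_star = np.array([0.0, 1.0])
p_star = x_star @ A @ y_star  # Alice's equilibrium payoff
q_star = x_star @ B @ y_star  # Bob's equilibrium payoff
```

As the proof predicts, this point is feasible and attains the global optimum of zero.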
Remarks
Solving the bilinear program is not necessarily easy. If merely solving a bimatrix game of this form is your goal, then mitigations against local optima such as iterated local search are essential. You can check whether a solution is local or global by comparing the objective value to zero, which is the global optimum guaranteed by Nash’s theorem.
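A minimal multi-start sketch along these lines, using SciPy’s SLSQP as the local solver (the variable packing, starting-point heuristics, and tolerances are my own choices, not from the notes):

```python
import numpy as np
from scipy.optimize import minimize

def solve_bimatrix(A, B, n_random_starts=20, seed=0):
    """Multi-start local search on the bilinear program.

    Variables are packed as z = [x (m entries), y (n entries), p, q].
    A best objective value near zero certifies a global optimum, and
    hence an equilibrium (x, y).
    """
    m, n = A.shape
    rng = np.random.default_rng(seed)

    def neg_obj(z):
        x, y, p, q = z[:m], z[m:m + n], z[-2], z[-1]
        return -(x @ A @ y + x @ B @ y - p - q)

    cons = [
        {"type": "eq", "fun": lambda z: z[:m].sum() - 1.0},          # ProbVec1 (x)
        {"type": "eq", "fun": lambda z: z[m:m + n].sum() - 1.0},     # ProbVec1 (y)
        {"type": "ineq", "fun": lambda z: z[-2] - A @ z[m:m + n]},   # OptA
        {"type": "ineq", "fun": lambda z: z[-1] - B.T @ z[:m]},      # OptB
    ]
    bounds = [(0.0, 1.0)] * (m + n) + [(None, None)] * 2             # ProbVec2

    def feasible(z, tol=1e-6):
        x, y, p, q = z[:m], z[m:m + n], z[-2], z[-1]
        return (np.all(A @ y <= p + tol) and np.all(B.T @ x <= q + tol)
                and abs(x.sum() - 1.0) < tol and abs(y.sum() - 1.0) < tol)

    # Starting mixtures: every pure-strategy profile, plus random points.
    starts = [(np.eye(m)[i], np.eye(n)[j]) for i in range(m) for j in range(n)]
    starts += [(rng.dirichlet(np.ones(m)), rng.dirichlet(np.ones(n)))
               for _ in range(n_random_starts)]

    best, best_val = None, -np.inf
    for x0, y0 in starts:
        # Choose p, q so the starting point is feasible.
        z0 = np.concatenate([x0, y0, [(A @ y0).max()], [(B.T @ x0).max()]])
        candidates = [z0]
        res = minimize(neg_obj, z0, method="SLSQP", bounds=bounds, constraints=cons)
        if res.success:
            candidates.append(res.x)
        for z in candidates:
            if feasible(z) and -neg_obj(z) > best_val:
                best, best_val = z.copy(), -neg_obj(z)
    return best[:m], best[m:m + n], best_val
```

Seeding the restarts with pure-strategy profiles is a cheap heuristic; for harder instances you would want a proper iterated local search or a stronger solver.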
In practice, I have used the bilinear program above when implementing the double oracle algorithm for games with complex action spaces. For example, in a modeling problem I am working on, pure strategies are subsets of a ground set, subject to additional inequality and integrality constraints.
The “first oracle” in the double-oracle algorithm computes the equilibrium of a subgame with discrete action sets, and this is where the bilinear program above applies. In the second oracle (typically the hard part), you then augment these action sets by computing a pure-strategy best response for each player against the mixed-strategy equilibrium obtained from the first oracle. You can use the double oracle algorithm to approximate either pure- or mixed-strategy equilibria by checking convergence and terminating after either the first or second oracle, respectively.
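The loop might be sketched as follows; the three oracle callables are a hypothetical interface of my own design (in my setup, `solve_subgame` would wrap the bilinear program above):

```python
def double_oracle(solve_subgame, best_resp_a, best_resp_b, init_a, init_b,
                  max_iter=100):
    """Skeleton of the double oracle loop (hypothetical interface).

    solve_subgame(S_a, S_b) -> (x, y): mixed equilibrium of the finite
        subgame restricted to the current strategy lists (first oracle).
    best_resp_a(y, S_b) -> a pure strategy for Alice that best responds
        to Bob's mixture y over S_b (second oracle).
    best_resp_b(x, S_a) -> likewise for Bob.
    """
    S_a, S_b = [init_a], [init_b]
    x, y = None, None
    for _ in range(max_iter):
        # First oracle: mixed equilibrium of the restricted subgame.
        x, y = solve_subgame(S_a, S_b)
        # Second oracle: pure best responses against that equilibrium.
        br_a = best_resp_a(y, S_b)
        br_b = best_resp_b(x, S_a)
        # Converged: neither best response enlarges the strategy sets.
        if br_a in S_a and br_b in S_b:
            break
        if br_a not in S_a:
            S_a.append(br_a)
        if br_b not in S_b:
            S_b.append(br_b)
    return S_a, S_b, x, y
```

The convergence check above terminates after the second oracle; to target pure-strategy equilibria instead, you would check for a fixed point right after the first oracle.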
In my use case, the open-source solver Ipopt gives acceptable results for the first oracle. I also learned about cyipopt, a nice set of Python bindings for Ipopt that lets you @jit your functions with Numba or JAX for performance.