# Bimatrix game equilibrium via nonlinear programming

These lecture notes (PDF) from Asu Ozdaglar give a bilinear program whose solution is the mixed-strategy equilibrium for a two-player, non-zero-sum game with finite action spaces—also known as a bimatrix game. I wanted to reproduce this highly practical result in a more accessible format and offer some implementation notes.

# Definitions

A *bimatrix game* is characterized by two matrices \(A, B \in \mathbb{R}^{n
\times m}.\) When Alice chooses strategy \(i\) and Bob chooses strategy \(j\),
their payoffs are \(A_{ij}\) and \(B_{ij},\) respectively. Mixed-strategy
probability vectors \(x\) and \(y\) yield expected payoffs of \(x^T A y\) and
\(x^T B y,\) respectively.

\((x^*, y^*)\) is an *equilibrium* if \(x^T A y^* \leq x^{*T} A y^*\) and
\(x^{*T} B y \leq x^{*T} B y^*\) for all probability vectors \(x\) and \(y.\)
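To make the definitions concrete, here is a small NumPy sketch that computes the expected payoffs \(x^T A y\) and \(x^T B y\) and checks the equilibrium condition. The payoff matrices are an illustrative prisoner's-dilemma-style example, not taken from the notes; it suffices to check pure-strategy deviations, since any mixed payoff is a convex combination of pure ones.

```python
import numpy as np

# Illustrative 2x2 example (a prisoner's-dilemma-style game).
A = np.array([[-1.0, -3.0],
              [ 0.0, -2.0]])  # Alice's payoffs
B = A.T                        # Bob's payoffs (symmetric roles)

x = np.array([0.0, 1.0])  # Alice's mixed strategy (here: pure "defect")
y = np.array([0.0, 1.0])  # Bob's mixed strategy

# Expected payoffs x^T A y and x^T B y.
alice_payoff = x @ A @ y
bob_payoff = x @ B @ y

# Equilibrium check: no pure-strategy deviation improves either payoff.
is_equilibrium = (A @ y).max() <= alice_payoff + 1e-9 and \
                 (x @ B).max() <= bob_payoff + 1e-9
```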

# Result

**Theorem:** \((\implies)\) If \((x^*, y^*)\) is an equilibrium, then there
exist \(p^*\) and \(q^*\) such that \((x^*, y^*, p^*, q^*)\) is an optimal
solution to the following bilinear program:

\[
\begin{aligned}
\max_{x,\, y,\, p,\, q} \quad & f(x, y, p, q) = (x^T A y - p) + (x^T B y - q) \\
\text{subject to} \quad & A y \leq p \mathbf{1} && \text{(OptA)} \\
& B^T x \leq q \mathbf{1} && \text{(OptB)} \\
& \textstyle\sum_i x_i = 1, \quad \textstyle\sum_j y_j = 1 && \text{(ProbVec1)} \\
& x \geq 0, \quad y \geq 0 && \text{(ProbVec2)}
\end{aligned}
\]

\((\impliedby)\) Conversely, any optimal solution of the bilinear program is an equilibrium of the game.

**Proof:** \((\implies)\) For any feasible solution to the bilinear program,
each element of \(x\) is nonnegative (by ProbVec2), so we can use \(x\) to
combine the rows of the condition OptA to obtain a new valid inequality \(x^T A
y \leq x^T p \mathbf{1} = p \sum x_i = p\) (by ProbVec1). Applying the same
logic to \(y\) and \(q,\) we find that \(f(x, y, p, q) \leq 0.\)

Now consider the equilibrium probability vectors \((x^*, y^*)\) and set \(p^* = x^{*T} A y^*\) and \(q^* = x^{*T} B y^{*}.\) Then \(f(x^*, y^*, p^*, q^*) = 0,\) and we have only to show that this solution is feasible. To be an equilibrium, \(x^*\) must earn Alice a better payoff against \(y^{*}\) than does the \(i\)th pure strategy: \(p^* = x^{*T} A y^* \geq (A y^*)_i.\) This is precisely the \(i\)th row of OptA. OptB follows similarly.

\((\impliedby)\) The bilinear program is clearly feasible and bounded (as shown a moment ago). Let \((\bar x, \bar y, \bar p, \bar q)\) denote the optimal solution. By Nash’s theorem on the existence of mixed-strategy equilibria, we know that an equilibrium exists, and from the first part of the proof, we know how to use this equilibrium to produce a feasible solution to the bilinear program with an objective value of zero. Thus, \(f(\bar x, \bar y, \bar p, \bar q) \geq 0,\) which rearranges to

\[(\bar x^T A \bar y - \bar p) + (\bar x^T B \bar y - \bar q) \geq 0.\]

By the constraints OptA and OptB, each of the terms in parentheses is less than
or equal to zero; thus, the inequality on \(f(\bar x, \bar y, \bar p, \bar q)\)
can hold only when each of these terms *equals* zero exactly.

Now consider any probability vector \(x\) and use it to combine the rows of OptA: We have \(x^T A \bar y \leq x^T \bar p \mathbf{1} = \bar p = \bar x^T A \bar y,\) which says that \(\bar x\) is a best response to \(\bar y\). Applying the same logic to \(\bar y\) and \(\bar q\) completes the proof. ◼
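The forward direction of the theorem is easy to check numerically. The sketch below uses the same illustrative prisoner's-dilemma game as above (an assumption, not from the notes), whose unique equilibrium is the pure profile where both players choose their second action, and verifies that the construction \(p^* = x^{*T} A y^*,\) \(q^* = x^{*T} B y^*\) is feasible and attains the global optimum of zero.

```python
import numpy as np

# Illustrative game with a known pure equilibrium (both play action 2).
A = np.array([[-1.0, -3.0], [0.0, -2.0]])
B = A.T
x_star = np.array([0.0, 1.0])
y_star = np.array([0.0, 1.0])

p_star = x_star @ A @ y_star  # p* = x*^T A y*
q_star = x_star @ B @ y_star  # q* = x*^T B y*

# Feasibility: OptA, OptB, ProbVec1, ProbVec2.
assert np.all(A @ y_star <= p_star + 1e-9)        # OptA
assert np.all(B.T @ x_star <= q_star + 1e-9)      # OptB
assert abs(x_star.sum() - 1) < 1e-9 and abs(y_star.sum() - 1) < 1e-9
assert np.all(x_star >= 0) and np.all(y_star >= 0)

# The objective attains the global optimum of zero.
f = (x_star @ A @ y_star - p_star) + (x_star @ B @ y_star - q_star)
```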

# Remarks

Solving the bilinear program is not necessarily easy: the objective is nonconvex, so local solvers can stall at non-equilibrium points. If merely solving a bimatrix game of this form is your goal, then mitigations against local optima, such as iterated local search or random restarts, are essential. Conveniently, you can check whether a solution is merely local or actually global by comparing its objective value to zero, the global optimum guaranteed by Nash's theorem.
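As a minimal sketch of one such mitigation, here is a multistart scheme using SciPy's SLSQP solver on the bilinear program, with the variable layout \(z = (x, y, p, q)\). The example game and the restart count are illustrative assumptions; the zero-objective check at the end is exactly the global-optimality test described above.

```python
import numpy as np
from scipy.optimize import minimize

A = np.array([[-1.0, -3.0], [0.0, -2.0]])  # illustrative game
B = A.T
n, m = A.shape

def neg_f(z):
    # Minimize -f(x, y, p, q) = (p - x^T A y) + (q - x^T B y).
    x, y, p, q = z[:n], z[n:n + m], z[-2], z[-1]
    return (p - x @ A @ y) + (q - x @ B @ y)

constraints = [
    {"type": "ineq", "fun": lambda z: z[-2] - A @ z[n:n + m]},   # OptA
    {"type": "ineq", "fun": lambda z: z[-1] - B.T @ z[:n]},      # OptB
    {"type": "eq",   "fun": lambda z: np.array(                  # ProbVec1
        [z[:n].sum() - 1.0, z[n:n + m].sum() - 1.0])},
]
bounds = [(0, 1)] * (n + m) + [(A.min(), A.max()), (B.min(), B.max())]

rng = np.random.default_rng(0)
best = None
for _ in range(20):  # random restarts against local optima
    x0 = rng.dirichlet(np.ones(n))
    y0 = rng.dirichlet(np.ones(m))
    z0 = np.concatenate([x0, y0, [x0 @ A @ y0, x0 @ B @ y0]])
    res = minimize(neg_f, z0, method="SLSQP",
                   bounds=bounds, constraints=constraints)
    if res.success and (best is None or res.fun < best.fun):
        best = res

# An objective value of (numerically) zero certifies a global optimum,
# i.e., an equilibrium.
x_eq, y_eq = best.x[:n], best.x[n:n + m]
```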

In practice, I have used the bilinear program above when implementing the double oracle algorithm for games with complex action spaces. For example, in a modeling problem I am working on, pure strategies are subsets of \(\mathbb{R}^n\) with additional inequality and integrality constraints.

The first oracle in the double-oracle algorithm computes the equilibrium of a
subgame with discrete action sets, using the bilinear program above. The second
oracle (typically the hard part) then augments these action sets by computing a
pure-strategy best response for each player against the mixed-strategy
equilibrium obtained from the first oracle. You can use the double oracle
algorithm to approximate either pure- *or* mixed-strategy equilibria by
checking convergence and terminating after either the first or second oracle,
respectively.
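For intuition, here is a minimal, self-contained sketch of the double-oracle loop. To keep it runnable, the subgame solver below handles the *zero-sum* special case with a linear program; in the bimatrix setting you would substitute the bilinear program above. The example matrix (a weighted rock-paper-scissors) and the iteration cap are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(M):
    """Row player's maximin mixed strategy and game value for payoff M."""
    n, m = M.shape
    # Variables (x_1..x_n, v); maximize v s.t. M^T x >= v 1, x on the simplex.
    c = np.zeros(n + 1); c[-1] = -1.0              # linprog minimizes
    A_ub = np.hstack([-M.T, np.ones((m, 1))])      # v - (M^T x)_j <= 0
    A_eq = np.ones((1, n + 1)); A_eq[0, -1] = 0.0  # sum x = 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(m), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    return res.x[:n], res.x[-1]

M = np.array([[ 0.0,  2.0, -1.0],
              [-2.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0]])  # weighted rock-paper-scissors

rows, cols = [0], [0]  # restricted action sets
for _ in range(10):
    sub = M[np.ix_(rows, cols)]
    x_sub, _ = solve_zero_sum(sub)       # first oracle: subgame equilibrium
    y_sub, _ = solve_zero_sum(-sub.T)    # ... and the column player's side
    x = np.zeros(M.shape[0]); x[rows] = x_sub
    y = np.zeros(M.shape[1]); y[cols] = y_sub
    br_row = int(np.argmax(M @ y))       # second oracle: pure best responses
    br_col = int(np.argmin(x @ M))
    if br_row in rows and br_col in cols:
        break                            # no new action improves: converged
    rows = sorted(set(rows) | {br_row})
    cols = sorted(set(cols) | {br_col})
```

On this game the restricted sets grow until they cover all three actions, at which point the subgame equilibrium is the full-game equilibrium and the best-response oracles return actions already in the sets.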

In my use case, the open-source solver Ipopt gives acceptable results for the
first oracle. I also learned about cyipopt, a nice set of Python bindings for
Ipopt that lets you `@jit` your functions with Numba or JAX for performance.