These lecture notes (PDF) from Asu Ozdaglar give a bilinear program whose solution is the mixed-strategy equilibrium for a two-player, non-zero-sum game with finite action spaces—also known as a bimatrix game. I wanted to reproduce this highly practical result in a more accessible format and offer some implementation notes.

Definitions

A bimatrix game is characterized by two matrices \(A, B \in \mathbb{R}^{n \times m}.\) When Alice chooses strategy \(i\) and Bob chooses strategy \(j\), their payoffs are \(A_{ij}\) and \(B_{ij},\) respectively. Mixed-strategy probability vectors \(x\) and \(y\) yield expected payoffs of \(x^T A y\) and \(x^T B y,\) respectively.

\((x^*, y^*)\) is an equilibrium if \(x^T A y^* \leq x^{*T} A y^*\) and \(x^{*T} A y \leq x^{*T} A y^*\) for all probability vectors \(x\) and \(y.\)

Result

Theorem: \((\implies)\) If \((x^*, y^*)\) is an equilibrium, then there exist \(p^*\) and \(q^*\) such that \((x^*, y^*, p^*, q^*)\) is the optimal solution to the following bilinear program:

\[\begin{aligned} \text{maximize} \quad & f(x, y, p, q) = x^T A y + x^T B y - p - q \\ \text{subject to} \quad & A y \leq p \mathbf{1} & \text{(OptA)} \\ & B^T x \leq q \mathbf{1} & \text{(OptB)} \\ & \sum x_i = \sum y_i = 1 & \text{(ProbVec1)} \\ & x \geq \mathbf{0}, y \geq \mathbf{0} & \text{(ProbVec2)} \end{aligned}\]

\((\impliedby)\) Conversely, the optimal solution of the bilinear program is an equilibrium for the game.

Proof: \((\implies)\) For any feasible solution to the bilinear program, each element of \(x\) is nonnegative (by ProbVec2), so we can use \(x\) to combine the rows of the condition OptA to obtain a new valid inequality \(x^T A y \leq x^T p \mathbf{1} = p \sum x_i = p\) (by ProbVec1). Applying the same logic to \(y\) and \(q,\) we find that \(f(x, y, p, q) \leq 0.\)

Now consider the equilibrium probability vectors \((x^*, y^*)\) and set \(p^* = x^{*T} A y^*\) and \(q^* = x^{*T} B y^{*}.\) Then \(f(x^*, y^*, p^*, q^*) = 0,\) and we have only to show that this solution is feasible. To be an equilibrium, \(x^*\) must earn Alice a better payoff against \(y^{*}\) than does the \(i\)th pure strategy: \(p^* = x^{*T} A y^* \geq (A y^*)_i.\) This is precisely the \(i\)th row of OptA. OptB follows similarly.

\((\impliedby)\) The bilinear program is clearly feasible and bounded (as shown a moment ago). Let \((\bar x, \bar y, \bar p, \bar q)\) denote the optimal solution. By Nash’s theorem on the existence of mixed-strategy equilibria, we know that an equilibrium exists, and from the first part of the proof, we know how to use this equilibrium to produce a feasible solution to the bilinear program with an objective value of zero. Thus, \(f(\bar x, \bar y, \bar p, \bar q) \geq 0,\) which rearranges to

\[(\bar x^T A \bar y - \bar p) + (\bar x^T B \bar y - \bar q) \geq 0.\]

By the constraints OptA and OptB, each of the terms in parentheses is less than or equal to zero; thus, the inequality on \(f(\bar x, \bar y, \bar p, \bar q)\) can hold only when each of these terms equals zero exactly.

Now consider any probability vector \(x\) and use it to combine the rows of OptA: We have \(x^T A \bar y \leq x^T \bar p \mathbf{1} = \bar p = \bar x^T A \bar y,\) which says that \(\bar x\) is a best response to \(\bar y\). Applying the same logic to \(\bar y\) and \(\bar q\) completes the proof. ◼

Remarks

Solving the bilinear program is not necessarily easy. If merely solving a bimatrix game of this form is your goal, then mitigations against local optima such as iterated local search are essential. You can check whether a solution is local or global by comparing the objective value to zero, which is the global optimum guaranteed by Nash’s theorem.

In practice, I have used the bilinear program above when implementing the double oracle algorithm for games with complex action spaces. For example, in a modeling problem I am working on, pure strategies are subsets of \(\mathbb{R}^n\) with additional inequality and integrality constraints.

The “first oracle” in the double-oracle algorithm has you compute the equilibrium of a subgame with discrete action sets and uses the bilinear program above. In the second oracle (typically the hard part), you then augment these action sets by computing a pure-strategy best response for each player against the mixed-strategy equilibrium obtained from the first oracle. You can use the double oracle algorithm to approximate either pure- or mixed-strategy equilibria by checking convergence and terminating after either the first or second oracle, respectively.

In my use case, the open-source solver Ipopt gives acceptable results for the first oracle. I also learned about cyipopt, a nice set of Python bindings for Ipopt that lets you @jit your functions with Numba or Jax for performance.