These lecture notes (PDF) from Asu Ozdaglar give a bilinear program whose optimal solutions are the mixed-strategy equilibria of a two-player, non-zero-sum game with finite action spaces—also known as a bimatrix game. I wanted to reproduce this highly practical result in a more accessible format and offer some implementation notes.

Definitions

A bimatrix game is characterized by two matrices $A, B \in \mathbb{R}^{n \times m}$. When Alice chooses strategy $i$ and Bob chooses strategy $j$, their payoffs are $A_{ij}$ and $B_{ij}$, respectively. Mixed-strategy probability vectors $x$ and $y$ yield expected payoffs of $x^T A y$ and $x^T B y$, respectively.
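To make the definitions concrete, here is a small sketch; the game matrices are a toy coordination game of my own, not from the notes:

```python
import numpy as np

# A toy 2x2 coordination game (my example): both players
# prefer to coordinate, but on different outcomes.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])  # Alice's payoffs
B = np.array([[1.0, 0.0],
              [0.0, 2.0]])  # Bob's payoffs

x = np.array([0.5, 0.5])  # Alice's mixed strategy
y = np.array([0.5, 0.5])  # Bob's mixed strategy

alice_payoff = x @ A @ y  # x^T A y = 0.75
bob_payoff = x @ B @ y    # x^T B y = 0.75
```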

$(x^*, y^*)$ is an equilibrium if $x^T A y^* \leq x^{*T} A y^*$ and $x^{*T} B y \leq x^{*T} B y^*$ for all probability vectors $x$ and $y$.
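This definition can be checked numerically: since a mixed strategy is a convex combination of pure strategies, it suffices to compare each player's expected payoff against their best pure-strategy deviation. A small sketch (the function name and the matching-pennies example are mine):

```python
import numpy as np

def is_equilibrium(A, B, x, y, tol=1e-9):
    """Check the equilibrium condition: because a mixed strategy is a
    convex combination of pure ones, it suffices to compare each
    player's payoff against their best pure-strategy deviation."""
    alice_ok = x @ A @ y >= np.max(A @ y) - tol
    bob_ok = x @ B @ y >= np.max(B.T @ x) - tol
    return bool(alice_ok and bob_ok)

# Matching pennies: the unique equilibrium mixes uniformly.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
B = -A
print(is_equilibrium(A, B, np.array([0.5, 0.5]), np.array([0.5, 0.5])))  # True
print(is_equilibrium(A, B, np.array([1.0, 0.0]), np.array([1.0, 0.0])))  # False
```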

Result

Theorem: $(\implies)$ If $(x^*, y^*)$ is an equilibrium, then there exist $p^*$ and $q^*$ such that $(x^*, y^*, p^*, q^*)$ is an optimal solution to the following bilinear program:

$$
\begin{aligned}
\text{maximize} \quad & f(x, y, p, q) = x^T A y + x^T B y - p - q \\
\text{subject to} \quad & A y \leq p \mathbf{1} && \text{(OptA)} \\
& B^T x \leq q \mathbf{1} && \text{(OptB)} \\
& \textstyle\sum_i x_i = \sum_j y_j = 1 && \text{(ProbVec1)} \\
& x \geq \mathbf{0},\; y \geq \mathbf{0} && \text{(ProbVec2)}
\end{aligned}
$$
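One way to attack this program numerically is to hand it to a general nonlinear solver. Below is a minimal sketch using SciPy's SLSQP on a made-up 2×2 coordination game; the solver choice, starting point, and matrices are my own assumptions, not part of the notes:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical 2x2 coordination game (my example, not from the notes).
A = np.array([[2.0, 0.0], [0.0, 1.0]])  # Alice's payoffs
B = np.array([[1.0, 0.0], [0.0, 2.0]])  # Bob's payoffs
n, m = A.shape

def split(z):
    """Unpack the decision vector z = (x, y, p, q)."""
    return z[:n], z[n:n + m], z[n + m], z[n + m + 1]

def neg_f(z):
    """Negate the objective so SciPy's minimizer maximizes f."""
    x, y, p, q = split(z)
    return -(x @ A @ y + x @ B @ y - p - q)

constraints = [
    {"type": "ineq", "fun": lambda z: split(z)[2] - A @ split(z)[1]},    # (OptA)
    {"type": "ineq", "fun": lambda z: split(z)[3] - B.T @ split(z)[0]},  # (OptB)
    {"type": "eq", "fun": lambda z: np.sum(split(z)[0]) - 1.0},          # (ProbVec1)
    {"type": "eq", "fun": lambda z: np.sum(split(z)[1]) - 1.0},
]
bounds = [(0.0, None)] * (n + m) + [(None, None)] * 2                    # (ProbVec2)

z0 = np.array([0.8, 0.2, 0.8, 0.2, 2.0, 2.0])  # a feasible starting point
res = minimize(neg_f, z0, method="SLSQP", bounds=bounds, constraints=constraints)
x, y, p, q = split(res.x)
```

By the theorem, an objective value of zero certifies that the recovered $(x, y)$ is an equilibrium; a strictly negative value signals a merely local solution.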

$(\impliedby)$ Conversely, any optimal solution of the bilinear program is an equilibrium for the game.

Proof: $(\implies)$ For any feasible solution to the bilinear program, each element of $x$ is nonnegative (by ProbVec2), so we can use $x$ to combine the rows of the condition OptA to obtain a new valid inequality $x^T A y \leq x^T p \mathbf{1} = p \sum_i x_i = p$ (by ProbVec1). Applying the same logic to $y$ and $q$, we find that $f(x, y, p, q) \leq 0$.
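A quick numerical sanity check of this bound, using random feasible points for a randomly drawn game (my own construction, just to illustrate the lemma):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))  # a random 3x4 bimatrix game
B = rng.normal(size=(3, 4))

def f(x, y, p, q):
    """Objective of the bilinear program."""
    return x @ A @ y + x @ B @ y - p - q

for _ in range(1000):
    # Random probability vectors; any p >= max(A y) and q >= max(B^T x)
    # satisfy OptA and OptB, so these points are feasible.
    x = rng.dirichlet(np.ones(3))
    y = rng.dirichlet(np.ones(4))
    p = np.max(A @ y) + rng.uniform(0.0, 1.0)
    q = np.max(B.T @ x) + rng.uniform(0.0, 1.0)
    assert f(x, y, p, q) <= 1e-12  # the lemma: f <= 0 on the feasible set
```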

Now consider the equilibrium probability vectors $(x^*, y^*)$ and set $p^* = x^{*T} A y^*$ and $q^* = x^{*T} B y^*$. Then $f(x^*, y^*, p^*, q^*) = 0$, and we have only to show that this solution is feasible. To be an equilibrium, $x^*$ must earn Alice at least as good a payoff against $y^*$ as the $i$th pure strategy does: $p^* = x^{*T} A y^* \geq (A y^*)_i$. This is precisely the $i$th row of OptA. OptB follows similarly.

$(\impliedby)$ The bilinear program is clearly feasible, and we showed a moment ago that its objective is bounded above by zero. Let $(\bar x, \bar y, \bar p, \bar q)$ denote an optimal solution. By Nash’s theorem on the existence of mixed-strategy equilibria, an equilibrium exists, and the first part of the proof shows how to turn it into a feasible solution to the bilinear program with an objective value of zero. Thus, $f(\bar x, \bar y, \bar p, \bar q) \geq 0$, which rearranges to

$$(\bar x^T A \bar y - \bar p) + (\bar x^T B \bar y - \bar q) \geq 0.$$

Combining the rows of OptA with $\bar x$ (and those of OptB with $\bar y$) as in the first part of the proof shows that each of the terms in parentheses is less than or equal to zero; thus, the inequality on $f(\bar x, \bar y, \bar p, \bar q)$ can hold only when each term equals zero exactly.

Now consider any probability vector $x$ and use it to combine the rows of OptA: we have $x^T A \bar y \leq x^T \bar p \mathbf{1} = \bar p = \bar x^T A \bar y$, which says that $\bar x$ is a best response to $\bar y$. Applying the same logic to $\bar y$ and $\bar q$ completes the proof. ◼

Remarks

Solving the bilinear program is not necessarily easy: the objective is nonconvex, so local solvers can get stuck at local optima. If merely solving a bimatrix game of this form is your goal, then mitigations against local optima such as iterated local search are essential. You can check whether a solution is local or global by comparing its objective value to zero, which is the global optimum guaranteed by Nash’s theorem.
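That check is easy to make concrete: given a candidate $(x, y)$, take $p$ and $q$ at their tightest feasible values and test whether the objective reaches zero. A sketch (the function name and the matching-pennies example are mine):

```python
import numpy as np

def is_global(A, B, x, y, tol=1e-6):
    """Test whether (x, y), with p and q at their tightest feasible
    values, attains the known global optimum f = 0 (my helper name)."""
    p = np.max(A @ y)    # smallest p satisfying A y <= p 1
    q = np.max(B.T @ x)  # smallest q satisfying B^T x <= q 1
    return x @ A @ y + x @ B @ y - p - q >= -tol

# Matching pennies: the uniform mix is the unique equilibrium.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
B = -A
print(is_global(A, B, np.array([0.5, 0.5]), np.array([0.5, 0.5])))  # True
print(is_global(A, B, np.array([1.0, 0.0]), np.array([1.0, 0.0])))  # False
```

In an iterated local search, this is the accept/restart test: keep restarting the local solver from new points until `is_global` passes.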

In practice, I have used the bilinear program above when implementing the double oracle algorithm for games with complex action spaces. For example, in a modeling problem I am working on, pure strategies are subsets of Rn\mathbb{R}^n with additional inequality and integrality constraints.

The “first oracle” in the double-oracle algorithm computes the equilibrium of a subgame with finite action sets, using the bilinear program above. In the second oracle (typically the hard part), you then augment these action sets by computing a pure-strategy best response for each player against the mixed-strategy equilibrium obtained from the first oracle. You can use the double-oracle algorithm to approximate either pure- or mixed-strategy equilibria by checking convergence and terminating after either the first or second oracle, respectively.
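The loop can be sketched for the special case of a finite zero-sum game, where the first oracle reduces to a pair of linear programs rather than the bilinear program (a simplification I am making so the example stays self-contained and the example game is my own):

```python
import numpy as np
from scipy.optimize import linprog

def subgame_equilibrium(A):
    """First oracle for the zero-sum special case: solve the subgame
    with a pair of linear programs (in the general bimatrix case this
    step would be the bilinear program instead)."""
    n, m = A.shape
    # Row player: maximize v subject to A^T x >= v 1, sum(x) = 1, x >= 0.
    res_x = linprog(c=np.r_[np.zeros(n), -1.0],
                    A_ub=np.c_[-A.T, np.ones(m)], b_ub=np.zeros(m),
                    A_eq=[np.r_[np.ones(n), 0.0]], b_eq=[1.0],
                    bounds=[(0.0, None)] * n + [(None, None)])
    # Column player: minimize u subject to A y <= u 1, sum(y) = 1, y >= 0.
    res_y = linprog(c=np.r_[np.zeros(m), 1.0],
                    A_ub=np.c_[A, -np.ones(n)], b_ub=np.zeros(n),
                    A_eq=[np.r_[np.ones(m), 0.0]], b_eq=[1.0],
                    bounds=[(0.0, None)] * m + [(None, None)])
    return res_x.x[:n], res_y.x[:m]

def double_oracle(A, rows=(0,), cols=(0,)):
    """Grow each player's action set with pure best responses until
    neither oracle finds a new action."""
    rows, cols = sorted(rows), sorted(cols)
    while True:
        x_s, y_s = subgame_equilibrium(A[np.ix_(rows, cols)])
        # Lift the subgame strategies back to the full action space.
        x = np.zeros(A.shape[0]); x[rows] = x_s
        y = np.zeros(A.shape[1]); y[cols] = y_s
        # Second oracle: pure best responses to the subgame equilibrium.
        br_row = int(np.argmax(A @ y))
        br_col = int(np.argmin(x @ A))
        if br_row in rows and br_col in cols:
            return x, y  # converged: the subgame equilibrium holds globally
        rows = sorted(set(rows) | {br_row})
        cols = sorted(set(cols) | {br_col})

# Rock-paper-scissors, whose unique equilibrium mixes uniformly.
A = np.array([[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])
x, y = double_oracle(A)
```

On rock-paper-scissors the loop grows the action sets one pure strategy at a time and terminates at the uniform equilibrium.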

In my use case, the open-source solver Ipopt gives acceptable results for the first oracle. I also learned about cyipopt, a nice set of Python bindings for Ipopt that lets you @jit your functions with Numba or JAX for performance.