Monday, May 25, 2020

Counterfactual Regret Minimization with Kuhn Poker

The expected utility of player $i$ at history $h$, when the players follow the strategy profile $\sigma$, is

$$u_i(\sigma, h) = \sum_{z \in Z, h \sqsubset z} \pi^\sigma (h, z) u_i(z)$$

where,

  • $Z$ is the set of all terminal histories
  • $h \sqsubset z$ means $h$ is a prefix of $z$
  • $\pi^\sigma (h, z)$ is the probability of reaching $z$ from $h$ when all players follow $\sigma$.

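$u_i(\sigma_{I \to a}, h)$ is the expected utility of player $i$ at $h$ when $\sigma$ is modified so that the player acting at the current information set $I$ (the one containing $h$) always takes action $a$ there.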
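As a worked example, assume standard Kuhn poker (cards J < Q < K, each player antes 1 chip, a bet or call costs 1 chip), that player 0 holds K and player 1 holds J, that $h$ is the empty action history right after the deal, and, purely for illustration, that $\sigma$ chooses pass ($p$) and bet ($b$) with probability $\tfrac12$ each everywhere. With $I$ the information set of player 0 at $h$, forcing each action gives

$$ \begin{align} u_0(\sigma_{I \to b}, h) &= \tfrac12 (+1) + \tfrac12 (+2) = 1.5 \\ u_0(\sigma_{I \to p}, h) &= \tfrac12 (+1) + \tfrac14 (-1) + \tfrac14 (+2) = 0.75 \\ \end{align} $$

After a forced bet, player 1 either folds (player 0 wins the ante) or calls and loses the showdown. After a forced pass, player 1 either checks to a showdown or bets, after which player 0 folds or calls with probability $\tfrac12$ each. Weighting by $\sigma(I, p) = \sigma(I, b) = \tfrac12$ gives $u_0(\sigma, h) = 1.125$.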

We compute $u$ recursively, where $A(I)$ is the set of actions available at $I$:

$$ \begin{align} u_i(\sigma_{I \to a}, h) &= u_i(\sigma, h a) \\ u_i(\sigma, h) &= \sum_{a \in A(I)} \sigma(I, a) u_i(\sigma_{I \to a}, h) \\ \end{align} $$



from Hacker News https://ift.tt/36iiwLP
