[Pachi] A little bit lost in the code
pasky at ucw.cz
Thu Jan 5 10:11:47 CET 2012
On Thu, Jan 05, 2012 at 09:40:18AM +0100, Jean-m. a. wrote:
> So for question 2),
> does it mean that when you are on an adversary move you
> apply the algorithm on 1-rewards to minimise its regret?
Yes! tree_node_get_value(tree, parity, ni->u.value) will return
the correct minimax value.
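A minimal sketch of what such a parity flip amounts to. This is not copied from Pachi's source: the helper name `node_value_for` and the +1/-1 parity convention are assumptions made for illustration. Values are stored as win rates in [0, 1] from one fixed player's point of view, and the opponent's view is simply 1 - value.

```c
#include <assert.h>

/* Hypothetical helper, not Pachi's implementation.
 * `value` is a win rate in [0, 1] stored from a fixed player's
 * point of view; `parity` is assumed to be +1 when that player is
 * to move at the node and -1 when the opponent is. */
static double
node_value_for(int parity, double value)
{
	assert(parity == 1 || parity == -1);
	assert(value >= 0.0 && value <= 1.0);
	/* The opponent maximises 1 - value, which is the same as
	 * minimising value: that is the minimax step. */
	return parity > 0 ? value : 1.0 - value;
}
```

Running the same bandit rule on the flipped value at opponent nodes is what makes the search behave like minimax rather than a plain maximisation at every level.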
> And even if you explore the branch that makes you lose,
> you will have a good estimate of the score for this branch, but
> the mean rewards of the root node will still be good.
Yes. This property follows from the theorem that the UCB1 bandit is
uniformly optimal, i.e. it chooses the best arm exponentially more often
than any other arm (a suboptimal arm gets only about log(n) of n pulls).
Therefore, exploration contributes only a vanishing error, on the
order of log(n)/n, to the parent's mean.
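To make the effect on the parent concrete, here is a small sketch under simplifying assumptions: a parent with just two children whose values are known exactly, and a hypothetical helper `parent_mean` that is not part of Pachi. The parent's mean is the visit-weighted average of its children, so its error relative to the best child is the value gap times the fraction of visits spent on the worse child.

```c
#include <assert.h>

/* Hypothetical helper for illustration only.
 * Returns the mean reward observed at a parent whose best child
 * (value best_value) was visited best_visits times and whose other
 * child (value other_value) was visited other_visits times. */
static double
parent_mean(double best_value, long best_visits,
            double other_value, long other_visits)
{
	long total = best_visits + other_visits;
	assert(total > 0);
	return (best_value * best_visits + other_value * other_visits) / total;
}
```

For example, `parent_mean(0.7, 990, 0.3, 10)` is about 0.696: ten exploratory visits out of a thousand pull the parent only 0.004 below the best child's 0.7, and that gap keeps shrinking as long as the exploratory visits grow much more slowly than the total.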
(Of course, in practice things are not so rosy, and there are many situations
where it may take a rather long time to converge to the best value.
Otherwise, we would have pretty much nothing to improve in our Go
Petr "Pasky" Baudis
The goal of Computer Science is to build something that will
last at least until we've finished building it.