[Pachi] A little bit lost in the code

Petr Baudis pasky at ucw.cz
Thu Jan 5 10:11:47 CET 2012


On Thu, Jan 05, 2012 at 09:40:18AM +0100, Jean-m. a. wrote:
> So for the question 2),
> Does it mean that when you are on an adversary move you
> apply the algorithm on 1-rewards to minimise its regrets?

Yes! tree_node_get_value(tree, parity, ni->u.value) will return
the correct minimax value.
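
To illustrate the idea only, here is a minimal sketch. It is not the
actual Pachi source: the helper name sketch_node_value is made up, and
it assumes the stored value is a win rate in [0, 1] seen from the root
player's side, with parity +1 on the root player's moves and -1 on the
adversary's moves.

```c
#include <assert.h>

/* Hypothetical helper, NOT Pachi's tree_node_get_value().
 * Assumption: `value` is a win rate in [0, 1] stored from the root
 * player's perspective; parity is +1 where the root player moves and
 * -1 where the adversary moves. The result is the quantity that the
 * player to move at this node wants to maximize. */
static double
sketch_node_value(int parity, double value)
{
	assert(parity == 1 || parity == -1);
	assert(value >= 0.0 && value <= 1.0);
	return parity > 0 ? value : 1.0 - value;
}
```

With a stored value of 0.75, the sketch yields 0.75 on our own move and
0.25 on the adversary's move, so the same "maximize" bandit code can
serve both players.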

> And even if you explore the branch that makes you lose,
> you will have a good estimate of the score for this branch, but
> the mean rewards of the root node will still be good.

Yes. This property follows from the theorem that the UCB1 bandit has
logarithmic regret: in n plays it tries each suboptimal arm only
O(log n) times, i.e. it chooses the best arm exponentially more often
than any other arm. Therefore, exploration contributes only a vanishing
error, on the order of (log n)/n, to the parent's mean.

(Of course, practice is not so shiny, and there are many situations
where it may take a rather long time to converge to the best value.
Otherwise, we would have pretty much nothing to improve in our Go
programs. :-)

-- 
				Petr "Pasky" Baudis
	The goal of Computer Science is to build something that will
	last at least until we've finished building it.

