[Pachi] Questions after succeeded to compile Pachi by VC2012 on Windows

Yang, Jian Jian.Yang at amd.com
Mon Aug 5 03:09:01 CEST 2013


Thanks for your time.
1) I will use +1000 or a higher value for infinity.
2) I tested it myself, not on KGS (my own level is only Tom 7K to 2D). Pachi's level improved a lot with 4 threads. I will try the spatial dictionary later. It shows 1600 playouts/thread.
    It makes sense now.
3) The GPU threads on the computer-go@ mailing list will help me understand the status better.
4) It is a huge problem to change UCTS from 1 play per simulation to many plays per simulation.
   I have changed the tree_node structure from pointers to indices to adapt it for GPU tree search and tree build-up.
5) I converted it to C++ because the VC compiler reports too many errors when used as a C compiler.
6) 8GB is the maximum memory size of a GPU graphics card so far. 16GB will appear in 1 to 2 years.
   The whole tree can be allocated on the GPU only, since the GPU has more memory bandwidth than the CPU; for example, 512-bit GDDR5 at 5.5 Gbit/s per pin gives 352 GB/s, while a CPU with 128-bit DDR3 at 2 GT/s gives 32 GB/s.
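
The pointer-to-index change in point 4 could look roughly like the sketch below. It is only an illustration: idx_node, idx_tree, NO_NODE and both functions are hypothetical names, and the fields are not those of Pachi's real struct tree_node. The idea is that nodes living in one contiguous pool and linked by indices can be copied to another address space (for example a GPU buffer) without fixing up any pointers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical index-based node; NOT Pachi's real struct tree_node. */
#define NO_NODE UINT32_MAX

struct idx_node {
    uint32_t parent;      /* index of the parent, NO_NODE for the root */
    uint32_t first_child; /* index of the first child, NO_NODE for a leaf */
    uint32_t sibling;     /* index of the next sibling, NO_NODE if last */
    int32_t  playouts;
    float    value;
};

struct idx_tree {
    struct idx_node *nodes; /* one contiguous pool, copyable as a flat buffer */
    uint32_t used, capacity;
};

/* Allocate the pool; returns 0 on success, -1 if malloc fails. */
static int idx_tree_init(struct idx_tree *t, uint32_t capacity)
{
    assert(capacity > 0 && capacity < NO_NODE);
    t->nodes = malloc((size_t) capacity * sizeof(*t->nodes));
    if (!t->nodes)
        return -1;
    t->used = 0;
    t->capacity = capacity;
    return 0;
}

/* Append a node under parent (NO_NODE creates a root).
 * Returns the new node's index, or NO_NODE when the pool is full. */
static uint32_t idx_tree_add(struct idx_tree *t, uint32_t parent)
{
    if (t->used == t->capacity)
        return NO_NODE;
    uint32_t i = t->used++;
    struct idx_node *n = &t->nodes[i];
    n->parent = parent;
    n->first_child = NO_NODE;
    n->sibling = NO_NODE;
    n->playouts = 0;
    n->value = 0.0f;
    if (parent != NO_NODE) {
        /* Push the new node at the front of the parent's child list. */
        n->sibling = t->nodes[parent].first_child;
        t->nodes[parent].first_child = i;
    }
    return i;
}
```

With 32-bit indices the links also take half the space of 64-bit pointers, at the cost of one extra addition per dereference.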

Thanks
Jian

-----Original Message-----
From: Petr Baudis [mailto:pasky at ucw.cz] 
Sent: August 3, 2013 2:51
To: Yang, Jian
Cc: pachi at v.or.cz
Subject: Re: [Pachi] Questions after succeeded to compile Pachi by VC2012 on Windows

  Hello!

On Tue, Jul 30, 2013 at 03:28:32AM +0000, Yang, Jian wrote:
> After 30 hours of work, Pachi has been converted to VC2012; it compiles and runs correctly under Windows 7.

  Awesome work! Your effort is much appreciated. :-)

> I have several questions to begin my next investigation:
> 
> 1)      I cannot find the macro definition for INFINITY.
> 
> Copying the comment from struct ucb1_policy:
> 
>   /* First Play Urgency - if set to less than infinity (the MoGo paper
>    * above reports 1.0 as the best), new branches are explored only
>    * if none of the existing ones has higher urgency than fpu. */
>   floating_t fpu;
> 
> so I used
> 
> #define FLOAT_INFINITY (1.1f)
> 
> Who can tell me the correct definition of FLOAT_INFINITY?

  INFINITY is a macro that should be provided by <math.h> / <cmath>
(I think according to C99 and C++11, respectively). I'm not sure how
VC2012 users generate infinite float constants. One hint could be

	http://msdn.microsoft.com/en-us/library/6hthw3cb%28v=vs.80%29.aspx

In practice, the way Pachi uses the value, +1000 is as good as infinite
for our purposes. :-)
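
A possible fallback for a toolchain whose <math.h> lacks INFINITY is sketched below. It assumes an IEEE 754 platform, where HUGE_VAL (available since C89) is +infinity. child_urgency() is a hypothetical helper written only to show why a large finite constant behaves the same as infinity here; it is not Pachi's code.

```c
#include <assert.h>
#include <math.h>

/* Fallback for toolchains whose <math.h> does not define the C99
 * INFINITY macro. Assumes HUGE_VAL is +infinity (true on IEEE 754). */
#ifndef INFINITY
#define INFINITY ((float) HUGE_VAL)
#endif

/* Hypothetical helper, not Pachi code: an unvisited child is ranked by
 * the first-play urgency (fpu), a visited one by its computed value. */
static float child_urgency(int playouts, float ucb_value, float fpu)
{
    assert(playouts >= 0);
    return playouts == 0 ? fpu : ucb_value;
}
```

As long as ordinary urgency values stay far below 1000, fpu = 1000 and fpu = INFINITY both rank every unvisited child above every visited one, whereas a small value such as 1.1 lets a well-scoring visited child win.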

> 2)      I tried the official Windows build and my build. Both play at about Tom-11K to Tom-12K on my laptop (4-core CPU). I have not used KGS before.
> 
> Tom 9D  == Amateur 7D to Professional 9D
> Tom 8D  == Amateur 5D to Amateur 6D
> Tom 7D  == Amateur 5D entry level
> Tom 6D  == Amateur 4D
> Tom 11K == Amateur 10K
> 
> Is this a normal level for Pachi without fbook & spatial dictionary?

  I'm not sure exactly what CPU you have in your laptop. My laptop
has 4-core Intel i3 CPU (sandy bridge) and Pachi's strength would be
perhaps between 1k and 5k (KGS) - I didn't do precise testing. In
general, 10k would be way too low, if Tom's ranks are at all similar to
KGS ranks. How many playouts per second do you use?

  However, judging the strength of a computer program is a tricky
proposition and should be done in a statistically valid way -
estimations based e.g. on "the kind of moves it makes" do not work well
at all. How did you determine its level, did it play many games on the
Tom server?
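
One simple way to make "statistically valid" concrete is to ask whether a match result differs from an even score by more than chance would explain. The sketch below uses a two-sided normal approximation at roughly the 5% level and assumes independent games with no draws; differs_from_even() is a hypothetical helper, and this is a swapped-in textbook test, not a method used by Pachi or KGS.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if the win count differs from a 50%
 * score at roughly the 5% level (two-sided normal approximation),
 * 0 otherwise. Assumes independent games and no draws. */
static int differs_from_even(int wins, int games)
{
    assert(games > 0 && wins >= 0 && wins <= games);
    double diff = wins - games / 2.0;        /* excess wins over an even score */
    double z2 = diff * diff / (games / 4.0); /* squared z-score; variance is n/4 */
    return z2 > 1.96 * 1.96;
}
```

Under these assumptions 6 wins out of 10 games is indistinguishable from an even match, while 60 out of 100 is not; a handful of games says very little about strength.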

  fbook has almost no influence on strength. Spatial dictionary can
bring it up by maybe two stones, but even without it, 10k is way too
low.

  Are you sure you are running Pachi in a single thread? Was the rest of
the machine idle according to process monitor?

  Even in single thread on a server machine, Pachi is 2k on KGS.

> 3)      I also did performance profiling of Pachi. Pachi is limited by UCB, board, and pattern operations.
> 
> Is that the same as your profiling result?

  Yes.

> 4)      From memory profiling, the major memory allocation is in tree_init. It will malloc 1GB to 1.5GB.
> 
> Is that right?  How many tree_node/board fit in 1.5GB?

  That sounds about right. I'm not sure about the number of nodes in the
tree, but you should be able to calculate that easily by dividing it
by sizeof(tree_node).
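
The division can be sketched as follows. struct example_node is a hypothetical stand-in whose fields and size are assumptions; the real figure has to come from sizeof applied to Pachi's own tree node type on the target compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in; Pachi's real tree node has different fields,
 * so its sizeof will differ from this one. */
struct example_node {
    struct example_node *parent, *sibling, *children;
    int playouts;
    float value;
};

/* Number of whole nodes that fit in a pool of pool_bytes bytes. */
static size_t nodes_that_fit(size_t pool_bytes, size_t node_size)
{
    assert(node_size > 0);
    return pool_bytes / node_size;
}
```

For instance, if a node took 64 bytes (a made-up figure), a 1.5GB pool of 1610612736 bytes would hold 25165824 nodes; calling nodes_that_fit(pool_bytes, sizeof(struct example_node)) gives the count for the stand-in type.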

> 5)      I am investigating how to use a GPU to accelerate computer Go programs. I spent 1 week on Fuego, but Fuego is based on Boost and it is very hard to make it data-parallel. So I spent another week converting Pachi to VC2012.
> 
> Have you investigated GPU acceleration before?
> 
> For example, 1 high-end GPU has 4K processors and 6GB memory. My several small data-processing algorithms show a 100x to 200x speed-up compared with an 8-core Intel i7, and 1000x compared with a single Intel i7 core.
> 
> I believe that the memory size of struct board could be compressed to ½.
> 
> But I am not sure 6GB is enough to play 2K moves at the same time.

  The problem is that the 6GB memory is way too slow for continuous
frequent access to a shared data structure; you need to have that nearer
to the GPU in the local memory, which is very small. Another issue is
that we don't have a good way to make use of many simulations run in
parallel; an important aspect of MCTS is that it is very quickly
adaptive and each simulation is run from a different place in the tree.

  I recommend that you look up GPU-related threads in the computer-go@
mailing list; you will also find some experimental results there.

> 6)      Would you kindly check my VC2012 conversion on SourceForge?
> 
> pthreadVSE2.dll is used  for VC2012.

  Good work! I see you also fixed some genuine bugs.

  What shall we do now? Are you interested in merging your work to
Pachi's source tree?

  In that case, we will have to make some compromises so that Pachi source
continues to be properly recognized as C code by other compilers and is
reasonably comfortable to work with. One issue I have is renaming all
files from .c to .cpp; is that really required? What happens if the
files keep the .c extension?

  I would recommend merging your changes piece by piece:

  (i) First, genuine bugs you have found, e.g. missing 'extern'
keywords.

  (ii) Second, portability adjustments that allow Pachi to be compiled
in the VC2012 Windows environment, e.g. the necessary util.h changes.

  (iii) Third, non-intrusive changes that make it easier to compile
Pachi as C++ code, e.g. naming enums - if it is really necessary to
compile Pachi as C++?

  I'm not as comfortable with others, like adding explicit typecasts to
*alloc() calls or removing compound type literals; this is one of the
reasons why I wonder if it is really required to compile Pachi as C++,
or if it is worth it to trade Pachi's code readability for full VC2012
compatibility.

  (iv) Possibly other more radical changes based on our further
discussion.

  Even if we decide that full VC2012 compatibility is not worth it on
Pachi's master branch, it will certainly make our life easier to make
some compromise and at least partially converge Pachi's source code to
VC2012 support. Also, many things (compound literals initialization,
explicit typecasts) could be done automatically every time before the
compilation by an automated pre-processor, if that doesn't sound like
too horrible an idea.
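
To make the trade-off concrete, here is a small sketch of the two styles. struct point and both functions are hypothetical and not taken from Pachi. The first form relies on C's implicit conversion from void * and on a C99 compound literal; the second spells out the cast and the assignments, so it is also accepted when the file is compiled as C++.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical type for illustration; not taken from Pachi. */
struct point { int x, y; };

/* C-only style: implicit void* conversion and a C99 compound literal. */
static struct point *point_new_c(int x, int y)
{
    struct point *p = malloc(sizeof(*p));
    if (p)
        *p = (struct point){ .x = x, .y = y };
    return p;
}

/* Style that also compiles as C++: explicit cast on malloc() and
 * field-by-field assignment instead of a compound literal. */
static struct point *point_new_portable(int x, int y)
{
    struct point *p = (struct point *) malloc(sizeof(*p));
    if (p) {
        p->x = x;
        p->y = y;
    }
    return p;
}
```

Both functions produce the same object; the difference is purely in how much C-specific syntax the compiler has to accept, which is the kind of rewrite an automated pre-processing step would have to perform.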

-- 
				Petr "Pasky" Baudis
	If I had more time, I would have written you a shorter
	letter.  -- Blaise Pascal


