LCZero update (2)


Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: LCZero update

Post by Daniel Shawul »

Fixed number of playouts is the same as using a fixed number of nodes or depth, so there shouldn't be any difference in strength on CPU and GPU with that setup.
Jhoravi
Posts: 291
Joined: Wed May 08, 2013 6:49 am

Re: LCZero update

Post by Jhoravi »

ID55, the new update on http://play.lczero.org/, is about a 100 Elo jump. It preferred the King's Indian Defense in slow mode!
[pgn]1. d4 Nf6
2. c4 g6
3. Nc3 Bg7
4. e4 d6
5. Nf3 c6
6. h3 O-O
7. Bg5 h6
8. Be3 Nbd7
9. Qd2 e5
10. d5 Nc5
11. Qc2 Qe7
12. g4 Bd7
13. g5 hxg5
14. Bxg5 Rae8
15. Be2 a5
16. O-O-O a4
17. Rdg1 Kh8
18. h4 Qd8
19. h5 gxh5
20. Rxh5+ Kg8
21. Bxf6 Qxf6
22. Rhg5 Qh6
23. Qd2 f5
24. Rxg7+ Qxg7
25. Rxg7+ Kxg7
26. Qg5+ Kh7
27. Qh5+ Kg7
28. Ng5 f4
29. Bg4 Bxg4
30. Qxg4 Kf6
31. Nh7+ Ke7
32. Qg5+ Kd7
33. Qg7+ Re7
34. Nxf8+ Ke8
35. Qg8 Nd3+
36. Kd2 Nxb2
37. Ne6+ Kd7
38. Qd8#[/pgn]
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: LCZero update

Post by hgm »

Daniel Shawul wrote: Fixed number of playouts is the same as using a fixed number of nodes or depth, so there shouldn't be any difference in strength on CPU and GPU with that setup.
True. But there was doubt whether the GPU version was working correctly.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: LCZero update

Post by Laskos »

Joost Buijs wrote:
Guenther wrote:
Laskos wrote:
CMCanavessi wrote:New official version released:

https://github.com/glinscott/leela-ches ... s/tag/v0.4

Finally includes a windows build with all the dlls, and a working windows CPU-Only build as well
I have a weak video card, but I didn't expect that:
http://www.talkchess.com/forum/viewtopi ... 45&start=5

CPU version is performing much better. Is LCZero using the GPU card properly?
I had a very different experience with the (finally) working CPU version. Here it was around 4 times slower on one thread, even though I only have a cheap GPU card. Maybe I will produce exact numbers again; I already deleted the CPU version after my measurement.
Over here the CPU-only version does about 400 n/s on a single core (Broadwell, 3.8 GHz). When I use my cheap GT-720 GPU with 192 CUDA cores, this figure drops to 250 n/s. On my GTX-1080Ti it runs at ~3500 n/s (when running 2 instances of the client).
It still intrigues me that, using the full CPU (4 cores), I can get speeds (NPS) achievable only with the best GPUs. Shouldn't a top GPU be an order of magnitude faster than the full CPU? On 4 cores, the CPU version seems to be at the 1800+ CCRL Elo level. Gauntlet of games at 1s/move:


Games Completed = 40 of 40 (Avg game length = 72.851 sec)
Settings = Gauntlet/64MB/1000ms per move/M 500cp for 3 moves, D 140 moves/EPD:C:\LittleBlitzer\2moves_v1.epd(32000)
Time = 3721 sec elapsed, 0 sec remaining

 1.  LCZero CPU 4 threads            22.0/40  19-15-6  (L: m=15 t=0 i=0 a=0)  (D: r=6 i=0 f=0 s=0 a=0)  (tpm=960.5 d=17.49 nps=3767)
 2.  Predateur 2.2.1 (1786)          10.0/20  10-10-0  (L: m=10 t=0 i=0 a=0)  (D: r=0 i=0 f=0 s=0 a=0)  (tpm=887.6 d=55.72 nps=3133895)
 3.  Zurichess Appenzeller (1821)     8.0/20   5-9-6   (L: m=9 t=0 i=0 a=0)   (D: r=6 i=0 f=0 s=0 a=0)  (tpm=23.0 d=4.49 nps=959911)
The average NPS shown is about 3800. The pool consists of 2 stable engines rated at about 1800 CCRL Elo.
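
As a rough cross-check of the "1800+" estimate, here is a minimal sketch that turns the gauntlet score into a performance rating. It assumes the standard logistic Elo model and simply adds the implied difference to the average rating of the two opponents. The function name is made up for illustration, and 40 games is a small sample, so the error bars are wide.

```python
import math

def elo_difference(score_fraction: float) -> float:
    """Elo difference implied by a score fraction under the logistic Elo model."""
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))

# Figures taken from the gauntlet table above.
points, games = 22.0, 40
opponent_ratings = [1786, 1821]  # CCRL ratings shown in the table, 20 games each

# Approximation: treat the pool as one opponent at the average rating.
average_opponent = sum(opponent_ratings) / len(opponent_ratings)
performance = average_opponent + elo_difference(points / games)

print(f"score: {points / games:.1%}")
print(f"average opponent: {average_opponent:.1f}")
print(f"performance estimate: {performance:.0f}")
```

A 55% score works out to roughly 35 Elo above the pool average, which is consistent with the "1800+" reading.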

I have the feeling that the new v4 client has problems uploading the games. This morning I let v4 run for some time, it produced about 30 games but only a few of them appear in the server statistics. Running the client with -debug doesn't give any extra information at all, so I really don't know what is going on.
Damir
Posts: 2801
Joined: Mon Feb 11, 2008 3:53 pm
Location: Denmark
Full name: Damir Desevac

Re: LCZero update

Post by Damir »

Here is a game against the latest network.

I was White.

History:
1. d4 d5
2. Nf3 Nf6
3. Bf4 g6
4. e3 Bg7
5. c4 O-O
6. Nc3 c6
7. Bd3 Nbd7
8. Ne5 Nxe5
9. dxe5 Ng4
10. h4 Nxe5
11. h5 Nxd3+
12. Qxd3 e5
13. Bg3 e4
14. Qd2 Qg5
15. hxg6 fxg6
16. cxd5 cxd5
17. Qxd5+ Qxd5
18. Nxd5 Bf5
19. Ne7+ Kh8
20. Nxf5 Rxf5
21. O-O-O Rb5
22. b3 a5
23. Rh4 Re8
24. Rd7 Kg8
25. Bf4 Rf8
26. Bh6 Bxh6
27. Rxh6 Rf7
28. Rxf7 Kxf7
29. Rxh7+ Kf6
30. Kc2 Rf5
31. Rxb7 Rxf2+
32. Kc3 Rxg2
33. a4 g5
34. Rb5 Rh2
35. Rxa5 g4
36. Ra8 g3
37. Rg8 g2
38. a5 Kf7
39. Rg3 Ke6
40. Kb4 Kd5
41. Kb5 Kd6
42. b4 Kc7
43. Rg7+ Kc8
44. Kb6 Rh6+
45. Kb5 Rh2
46. Rg6 Kc7
47. Kc4 Kb8
48. Kd4 Kc8
49. Kxe4 Kc7
50. Kd5 Kd8
51. b5 Kc7
52. Rg7+ Kb8
53. e4 Ka8
54. b6 Kb8
55. a6 Ka8
56. Rg6 Rh5+
57. Kc6 Rh6
58. Rxh6 g1=Q
59. Rh8+ Qg8
60. Rxg8#
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: LCZero update

Post by Joost Buijs »

Laskos wrote: Still intrigues me that using full CPU (4 cores), I can get speeds (NPS) achievable only with the best GPUs. Shouldn't top GPU be an order of magnitude faster than full CPU?
My expectation was that LCZero on GPU would run a lot faster than on CPU. On my i7-6950x (using 10 cores) the CPU version does ~2500 nps, my GTX-1080Ti does ~3500 nps, so not much difference at all.

I don't have experience with matrix multiplication on a GPU, but when I use the 1080Ti for 'public key encryption' it runs an order of magnitude faster than the 6950x, and somehow I expected LCZero to perform in the same way. Maybe the OpenCL code is not optimal yet, and there are probably other things that can be optimized as well. The project is very new, and my guess is that the code will mature over time.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: LCZero update

Post by Daniel Shawul »

@HG, Ok got it.
Joost Buijs wrote:
Laskos wrote: Still intrigues me that using full CPU (4 cores), I can get speeds (NPS) achievable only with the best GPUs. Shouldn't top GPU be an order of magnitude faster than full CPU?
My expectation was that LCZero on GPU would run a lot faster than on CPU. On my i7-6950x (using 10 cores) the CPU version does ~2500 nps, my GTX-1080Ti does ~3500 nps, so not much difference at all.

I don't have experience with matrix multiplication on a GPU, but when I use the 1080Ti for 'public key encryption' it runs an order of magnitude faster than the 6950x and somehow I expected LCZero to perform in the same way. Maybe the OpenCL code is not optimal yet, and probably there are other things that can be optimized as well, the project is very new and my guess is that the code will mature over time.
It is probably because matrix-matrix multiplication is memory-bound, not compute-bound. If you don't do much computation per byte loaded, your speedup over the CPU (using all cores) is probably not going to go above 5-6X. Moreover, DGEMM etc. have been optimized for years for vector CPU machines, so they are hard to beat.
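
To make "computation per byte loaded" concrete, here is a minimal roofline-style sketch. It assumes single-precision matrices that each cross the memory bus exactly once (a best case). The matrix sizes and the device peak and bandwidth numbers are hypothetical, chosen only for illustration and not measured from LCZero. Under these assumptions a product is bandwidth-limited when one dimension is small (for example a batch of 1) and becomes compute-limited as that dimension grows.

```python
def gemm_intensity(m: int, n: int, k: int, bytes_per_element: int = 4) -> float:
    """FLOPs per byte for C = A @ B with A (m x k) and B (k x n).

    Assumes each matrix is moved to or from memory exactly once, which is
    a best case; real kernels usually move more data.
    """
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    return flops / bytes_moved

def attainable_gflops(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Roofline bound: the lower of the compute peak and bandwidth * intensity."""
    return min(peak_gflops, intensity * bandwidth_gbs)

# Hypothetical device numbers, for illustration only.
PEAK_GFLOPS = 10000.0
BANDWIDTH_GBS = 400.0

for n in (1, 8, 256):  # n plays the role of a batch size
    ai = gemm_intensity(m=256, n=n, k=256)
    bound = attainable_gflops(ai, PEAK_GFLOPS, BANDWIDTH_GBS)
    print(f"batch {n:>3}: {ai:6.2f} FLOP/byte, bound {bound:8.1f} GFLOP/s")
```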

Daniel
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: LCZero update

Post by Joost Buijs »

Daniel Shawul wrote: It is probably because matrix-matrix multiplication is memory-bound not compute-bound. If you don't do much computation per byte loaded, your speedup over the CPU (using all cores) is probably not going to go above 5-6X. Moreover DGEMM etc have been optimized for years for vector CPU machines so they are hard to beat.
Daniel
You are right, but the performance seems to be lower than it can be.

I'm running under Windows. The project builds fine with MSVC-2017, but at the moment I can't get it to work with the Intel compiler, which is clearly a better compiler for FP work. I would also like to replace OpenBLAS with Intel MKL, just because I'm curious to know which library performs better. When I have some time this weekend I will take a closer look at it.
George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Re: LCZero update

Post by George Tsavdaris »

CMCanavessi wrote:New official version released:

https://github.com/glinscott/leela-ches ... s/tag/v0.4

Finally includes a windows build with all the dlls, and a working windows CPU-Only build as well
Is the graph perhaps starting to show a point of diminishing returns?
Is this somewhat worrisome for the project?

http://lczero.org/
After his son's birth they've asked him:
"Is it a boy or girl?"
YES! He replied.....
George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Re: LCZero update

Post by George Tsavdaris »

jpqy wrote:
CMCanavessi wrote: It IS working, you just don't know how to use it. You need to specify the network file with -w <file>
It is indeed working once it is explained well. To use it in Cutechess you need to make a play.bat file; then the engine gets loaded. Thanks for the help from Aloril and the other guys on the LCZero chat!
Hi, can you give some complete instructions (1, 2, 3, 4, etc.) for that?
A bat file containing what, for example? :D