20x256 Leela seemed quite strong

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: 20x256 Leela seemed quite strong

Post by Laskos »

jkiliani wrote: Sun May 27, 2018 10:34 am
Laskos wrote: Sun May 27, 2018 8:49 am
Uri Blass wrote: Sun May 27, 2018 2:00 am

For comparison, CCRL got only 2651 Elo for Lc0 w323 at 40/40.
Note that I do not see that they tested it at 40/4 to check whether it scales better than other engines.

I wonder what the source of the difference in rating is.
Possible reasons for the difference:
1) NN342 is stronger than NN323
2) better hardware for Leela, which CCRL did not use
3) different settings that CCRL did not use
4) different time control (40/40 rather than 40/4)

http://www.computerchess.org.uk/ccrl/40 ... 4-bit_w323
I am not sure what hardware they used. Was LCZero running on GPU? If it was the CPU v0.10 version, then the rating is pretty much what I would expect. The CPU version is MUCH weaker than the version running on a moderately strong GPU such as a GTX 1060 (like mine), and I used cuDNN Lc0, which is maybe 150 Elo points stronger than the official GPU v0.10 version. Then there are the settings. I am discovering (the sample is still small) that a setting I thought I had fiddled with wrongly, "virtual loss bug", is VERY important. When checking it yesterday, I got a small improvement in tactical test-suites with a setting of 60.0 (the maximum is 100.0). With these overall settings

Scale thinking time=2.8
Cpuct MCTS option=15
First Play Urgency Reduction=-0.30
Virtual loss bug=60.0
I got 12 wins for SF dev (4 cores) and 8 draws. I reverted to the default "virtual loss bug" of 0.0, and the result was disastrous: 19 SF dev wins and 1 draw. Seeing this, I put "virtual loss bug" at its maximum, 100.0, and as of now have an excellent result (very small sample) of 4 draws and 1 SF dev win. I will play a total of 20 games with these settings:

Scale thinking time=2.8
Cpuct MCTS option=15
First Play Urgency Reduction=-0.30
Virtual loss bug=100.0
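
For anyone applying these from a script rather than a GUI, the settings above can be rendered as UCI `setoption` commands. This is only a minimal sketch: the option names and values are copied from the listing above, and whether a given Lc0 build exposes exactly these names is an assumption. The helper name `uci_setoption_lines` is made up for illustration.

```python
# Option names and values are copied from the settings listed in the post.
# Assumption: the engine exposes them as UCI options under these exact names.
settings = {
    "Scale thinking time": 2.8,
    "Cpuct MCTS option": 15,
    "First Play Urgency Reduction": -0.30,
    "Virtual loss bug": 100.0,
}


def uci_setoption_lines(options):
    """Render each option as a UCI 'setoption name <id> value <x>' command."""
    return [f"setoption name {name} value {value}" for name, value in options.items()]


for line in uci_setoption_lines(settings):
    print(line)
```

The resulting lines would be sent to the engine's standard input before the first `go` command.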
If the result keeps staying even remotely close to this excellent 4 draws and 1 loss, then there are serious setting changes compared to default which can improve Lc0 play by hundreds of Elo points. I maximized the performance on tactical test-suites, which is much faster to do than in games, but I didn't expect that this "virtual loss bug" setting can be so important, as in test-suites it was only slightly improving the result. The games are at 5'+ 5'', openings are 3-mover balanced GM openings. In 4-5 hours I will report the result in 20 games. The samples are all small here, 20 games each, but these things are time consuming for a single user like me.

The NN used was 342, which is probably stronger than the NN323 used by CCRL, but this is surely not the most important factor here.
Virtual loss bug is a (different) type of FPU reduction, one that prunes little at the root but a lot at the leaves. Kind of strange to combine negative regular FPU reduction with positive FPU reduction by virtual loss, but if it works for you...
Thanks for explaining a bit, I was pretty much in the dark about what this "Virtual loss bug" stands for. Samples are small. With "Virtual loss bug=100", as of now I have 7 SF wins and 6 draws, in line with the earlier result for the outlandish value of 60 for this parameter. With "Virtual loss bug=0.0" (default), I had a result outside the error bars: 19 losses and 1 draw. In fact, almost all the parameters (CPUCT, FPUR, VLB) fitted to tactical test-suites at 10s/position are sort of outlandish, but real games at a comparable time control seem to show good performance, in fact significantly better than my earlier results with both default and CLOP fits. But samples are small. I might combine the last two matches, with VLB=60 and VLB=100, into a total match of 40 games, as in tactical test suites they came out equal in positions solved and almost equal in time-to-solved-position. Interesting to note that in the searches at 10s/position on test suites, the results were pretty much deterministic.
Last edited by Laskos on Sun May 27, 2018 11:17 am, edited 5 times in total.
jkiliani
Posts: 143
Joined: Wed Jan 17, 2018 1:26 pm

Post by jkiliani »

jp wrote: Sun May 27, 2018 10:59 am
jkiliani wrote: Sun May 27, 2018 10:34 am Virtual loss bug is a (different) type of FPU reduction, one that prunes little at the root but a lot at the leaves. Kind of strange to combine negative regular FPU reduction with positive FPU reduction by virtual loss, but if it works for you...
So are you saying FPU prunes a lot at the root & little at the leaves? And the combination used is kind of wide but not deep?
There is no setting for FPU, only for FPU reduction. Base FPU (i.e. the evaluation used for nodes that haven't been expanded yet) is the evaluation of their parent node. FPU reduction reduces this if set to a positive value, or increases it if set to a negative one. The adjustment is largest if the parent node is already fully explored, i.e. all other child nodes are expanded, and smallest if no other child node has been expanded so far. So regular FPU reduction prunes a lot at the root, little at the leaves.

FPU reduction by virtual loss, called "virtual loss bug" by Lc0, reduces FPU by an amount that is inversely proportional to the number of visits that node already has. So it prunes little at the root, and a lot at the leaves.

Kai's tuning results suggest that pruning at the leaves while exploring the root fully and thoroughly is optimal for tactics, which is not that surprising to me.
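
To make the contrast concrete, here is a rough sketch of the two reductions as described above. It is not Lc0's actual code: the square-root scaling in `fpu_regular`, the `vlb / parent_visits` form in `fpu_virtual_loss`, and all numeric values are assumptions chosen only to reproduce the qualitative behaviour (one reduction grows as siblings get expanded, the other shrinks as visits accumulate).

```python
import math


def fpu_regular(parent_q, fpu_reduction, visited_policy):
    """First-play value of an unexpanded child under regular FPU reduction.

    visited_policy is the summed prior of siblings already expanded
    (0.0 = none, 1.0 = all). The square-root scaling is an assumption.
    """
    return parent_q - fpu_reduction * math.sqrt(visited_policy)


def fpu_virtual_loss(parent_q, vlb, parent_visits):
    """First-play value when the reduction is inversely proportional to visits.

    The vlb / parent_visits form is an assumption based on the description.
    """
    return parent_q - vlb / max(parent_visits, 1)


parent_q = 0.1  # hypothetical parent evaluation

# Root-like node: many visits, most siblings already expanded.
root_regular = fpu_regular(parent_q, 0.2, visited_policy=0.9)
root_vl = fpu_virtual_loss(parent_q, 0.5, parent_visits=10_000)

# Leaf-like node: one visit, no siblings expanded yet.
leaf_regular = fpu_regular(parent_q, 0.2, visited_policy=0.0)
leaf_vl = fpu_virtual_loss(parent_q, 0.5, parent_visits=1)

print(f"regular FPU reduction: root {root_regular:+.3f}, leaf {leaf_regular:+.3f}")
print(f"virtual-loss reduction: root {root_vl:+.3f}, leaf {leaf_vl:+.3f}")
```

With a positive reduction, the regular form lowers unexpanded children mostly at the root, while the visit-based form lowers them mostly at the leaves. A negative `fpu_reduction`, as in Kai's settings, flips the sign of the first effect.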
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Post by Laskos »

Apart from the modifications to these parameters, common to both runs:

Scale thinking time=2.8
Cpuct MCTS option=15
First Play Urgency Reduction=-0.30

with "Virtual loss bug"=100 the result in 20 games is identical to that with "Virtual loss bug"=60: 12 SF dev wins, 8 draws. Combining these 2 matches with very large values of this parameter gives 24 SF dev wins and 16 draws. That would give lc0 cuDNN NN342 a rating of about 3350 +/- 70 CCRL 40/4' Elo points. It is higher than what I previously got with default or CLOP settings.

These parameters improve the results on tactical test suites compared to default, for example on WAC200 from 100/200 to 153/200 (10s/position, NN342).
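
For reference, the step from a match score to an Elo difference can be sketched with the usual logistic formula. This only gives the gap to the opponent. Anchoring it to the CCRL 40/4 scale additionally needs a rating for SF dev on 4 cores, which is not stated here, so the sketch leaves that out. The function name `elo_diff` is illustrative.

```python
import math


def elo_diff(wins, draws, losses):
    """Elo difference implied by a match score under the logistic model."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        raise ValueError("a 0% or 100% score gives an unbounded Elo difference")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Lc0's side of the combined 40-game match: 0 wins, 16 draws, 24 losses.
print(round(elo_diff(0, 16, 24)))  # -241
```

A score of 8/40 (20%) therefore puts Lc0 roughly 240 Elo below its opponent in this match, before any error margin is considered.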
yanquis1972
Posts: 1766
Joined: Wed Jun 03, 2009 12:14 am

Post by yanquis1972 »

kai, do you have smart pruning enabled? i rarely see it mentioned, but it's another variable.

have you run any tests with it off, or with FPU at an even higher (negative) value with it on?
Nay Lin Tun
Posts: 708
Joined: Mon Jan 16, 2012 6:34 am

Post by Nay Lin Tun »

So, I played another two blitz games with the same settings, with Stockfish updated to yesterday's version.
Result: 1 draw, 1 win for Stockfish.
Running total: 4 games played, 3 draws, 1 win for Stockfish (rating difference so far +89 in favour of Stockfish, with a big error margin).
The drawn game was exciting: Stockfish exchanged a minor piece for 3 pawns, and from move 10 on the two engines disagreed sharply, with Stockfish seeing a big advantage for its side and Leela seeing a big advantage for hers.

https://lichess.org/OspxZJ8J#111

https://ibb.co/ku8tDJ
Albert Silver
Posts: 3019
Joined: Wed Mar 08, 2006 9:57 pm
Location: Rio de Janeiro, Brazil

Post by Albert Silver »

Laskos wrote: Sun May 27, 2018 1:48 pm
Aside the common to both modification to these parameters:

Scale thinking time=2.8
Cpuct MCTS option=15
First Play Urgency Reduction=-0.30

with "Virtual loss bug"=100 the result in 20 games is identical to that with "Virtual loss bug"=60. 12 SF dev wins, 8 draws. All in all, combined these 2 matches with very large values of this parameter, 24 SF dev wins, 16 draws. That would give lc0 cuDNN NN342 a rating of about 3350 +/- 70 CCRL 40/4' Elo points. It is higher than what I previously got with default or CLOP settings.

These parameters optimize the results on tactical test suites compared to default, for example on WAC200, from 100/200 to 153/200 (10s/position NN342).
I have been running CLOP on the FPU and cPUCT settings for a few days now, with over 900 games so far (each game lasts about 5 minutes on average). I also test against 3 opponents (rated 3080-3100 CCRL) instead of the single opponent other tests seem to use. Here are the results so far:

Settings results: [image not preserved]

Win rate results: [image not preserved]
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Post by Laskos »

Thanks, Albert. I hope this leads somewhere. I spent 3 days on something inconclusive with my "extreme" tactical settings (optimized on tactical suites). Although on tactical suites at 10s/position I got conclusive results, 153/200 vs 100/200 with NN342 on WAC200 and a similar result on ECM64, in games the results were either inconclusive, or dependent on time control, or both. And very time consuming. Results with my extreme tactical settings versus default against SF dev (on 1 or 4 threads, depending on the test):

1m + 1s:
my "tactical": 5/40
default: 14/40

5m + 5s:
my "tactical": 8/40 (16 draws, 24 losses)
default: 6/40 (12 draws, 28 losses)

This last one already lasted a full day or so, and was encouraging, the performance improved dramatically with TC. Maybe at even longer time control, my settings become very much better than the default?

Seeing this, I decided to check at relatively LTC 15m + 15s:
my "tactical": 2 draws, 6 losses
default: 4 draws, 4 losses

It took more than half a day, and I abandoned the test. I am almost sure "my settings" are not dramatically better than the default ones at LTC, although the trend seemed to indicate an improvement with TC. I stopped this time-consuming exercise with weak statistics; it annoys me.
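
To put a number on "weak statistics", here is a rough sketch of the uncertainty on a 40-game score, using the 5m + 5s results above. It uses a plain normal approximation, which is crude at these sample sizes, and the function name `score_interval` and the choice of z = 1.96 are assumptions for illustration.

```python
import math


def score_interval(wins, draws, losses, z=1.96):
    """Mean score per game and a normal-approximation interval around it."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of a result that is 1 (win), 0.5 (draw) or 0 (loss).
    var = (wins * 1.0 + draws * 0.25) / n - mean ** 2
    half = z * math.sqrt(var / n)
    return mean, mean - half, mean + half


tactical = score_interval(0, 16, 24)  # 8/40 at 5m + 5s
default = score_interval(0, 12, 28)   # 6/40 at 5m + 5s

for name, (mean, low, high) in (("tactical", tactical), ("default", default)):
    print(f"{name}: {mean:.3f} (about {low:.3f} to {high:.3f})")
```

The two intervals overlap heavily, which is consistent with treating 8/40 versus 6/40 as inconclusive.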

Now I am trying to see the progress from NN342 to NN352, which stalled in their self-play games, but as of now (early) the progress is visible against an external engine in ultra-fast games. At least here I will accumulate some statistics with ultra-fast games.
Albert Silver
Posts: 3019
Joined: Wed Mar 08, 2006 9:57 pm
Location: Rio de Janeiro, Brazil

Post by Albert Silver »

Laskos wrote: Mon May 28, 2018 11:26 pm
Now I am trying to see the progress which stalled in their self-games from NN342 to NN352, but as of now (early), the progress is visible against an external engine at ultra-fast games. At least in this I will accumulate some statistic with ultra-fast games.
Well, I think it was Dann who commented that the secret to Fruit's success was the sheer lack of bugs, proving it was more important than having clever ideas for algorithms. As such, in Discord today a rather glaring issue came up that is now the source of a serious hunt. Somehow, something happened precisely during the transition to 15x192. This is evidenced in the following spreadsheet:

https://docs.google.com/spreadsheets/d/ ... =449522518

The switch was made from NN226 (10x128) to NN227 (15x192) and it is striking.
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
JohnS
Posts: 215
Joined: Sun Feb 24, 2008 2:08 am

Post by JohnS »

Albert Silver wrote: Tue May 29, 2018 12:40 am
Well, I think it was Dann who commented that the secret to Fruit's success was the sheer lack of bugs, proving it was more important than having clever ideas for algorithms. As such, in Discord today a rather glaring issue came up that is now the source of a serious hunt. Somehow, something happened precisely during the transition to 15x192. This is evidenced in the following spreadsheet:

https://docs.google.com/spreadsheets/d/ ... =449522518

The switch was made from NN226 (10x128) to NN227 (15x192) and it is striking.
Albert, which bug is this? Thanks.
jp
Posts: 1470
Joined: Mon Apr 23, 2018 7:54 am

Post by jp »

Albert Silver wrote: Tue May 29, 2018 12:40 am Well, I think it was Dann who commented that the secret to Fruit's success was the sheer lack of bugs, proving it was more important than having clever ideas for algorithms. As such, in Discord today a rather glaring issue came up that is now the source of a serious hunt. Somehow, something happened precisely during the transition to 15x192.
The question is how Fruit got rid of bugs. What Leela needs is clever ideas for bug hunting.