LCZero: Progress and Scaling. Relation to CCRL Elo

Post by **Laskos** » Mon Apr 16, 2018 10:51 am

Werewolf wrote:
Laskos wrote:
duncan wrote:
Laskos wrote: As of now, the progress does not seem to be plateauing. If it continues in the same fashion, and the framework continues its work, by the 20th of April we can expect LCZero to be about 2750 CCRL Elo points.
Still on target?
Well, my initial assessments of this engine were complete rubbish, but 2750 CCRL Elo already seems to be achieved, say on a very strong GPU like an Nvidia 1080 at a long time control (LTC) of about 1 minute per move.

As of now, it seems to be progressing slowly again, judging by my preliminary results with meager CPU means at a short time control of 1s/move, though these results may not be very representative. I will post my results tonight or tomorrow.

I had to finish my match early because I needed my PC back.

Sadly it didn't go as well as I expected.

LCZero 127 on Nvidia 1060 @ 15 sec/move
Colossus 2008b @15 sec/move

31.5/86

LCZero was dominating in the first third of the match but then went downhill really badly. It made me think there was either a bug or that the hash (or whatever it uses) needed clearing.

Still a very good performance, around 2550-2600 CCRL Elo points.
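The step from a match score to an Elo estimate can be sketched with the standard logistic Elo formula. This is a minimal illustration, not necessarily how the 2550-2600 figure was obtained; Colossus 2008b's CCRL rating is not given in the thread, so it is left as the `opponent_rating` input, and both function names are illustrative.

```python
import math


def elo_diff(points: float, games: int) -> float:
    """Elo difference implied by a match score under the logistic Elo model."""
    p = points / games
    if not 0.0 < p < 1.0:
        raise ValueError("score must be strictly between 0% and 100%")
    return 400.0 * math.log10(p / (1.0 - p))


def performance_rating(points: float, games: int, opponent_rating: float) -> float:
    """Opponent's rating plus the Elo difference implied by the score."""
    return opponent_rating + elo_diff(points, games)


# Werewolf's match: LCZero scored 31.5/86 against Colossus 2008b.
print(f"{elo_diff(31.5, 86):.1f}")  # -95.2
```

A 31.5/86 score (36.6%) corresponds to roughly -95 Elo relative to Colossus 2008b at that time control; the resulting CCRL figure depends entirely on the rating plugged in for Colossus.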

I am seeing very bad "progress" in my CPU testing of v0.6 from ID124 to ID134. The strongest "bignet" was ID124, and since then it seems to have gone downhill. I left some test batches running overnight, and the results look very bad as far as progression goes. Either my CPU testing at fast TC (1s/move and 10s/move) is unrepresentative, or the chosen opponent, the stable engine Jabba 1.0 (about 2050 CCRL), is a bad choice, but here are the results:

1/ Games at 1s/move on 1 CPU core (about 200 playouts per move on average) against Jabba 1.0, a standard, stable engine:

ID124 (the second "bignet")
172.5/500

Games Completed = 500 of 500 (Avg game length = 103.231 sec)
Settings = Gauntlet/64MB/1000ms per move/M 5500cp for 30 moves, D 150 moves/EPD:C:\LittleBlitzer\3moves_GM_04.epd(817)
Time = 14275 sec elapsed, 0 sec remaining
 1.  LCZero CPU ID124         	172.5/500	116-271-113  	(L: m=271 t=0 i=0 a=0)	(D: r=82 i=13 f=13 s=2 a=3)	(tpm=948.5 d=12.49 nps=230)
 2.  Jabba 1.0                	327.5/500	271-116-113  	(L: m=116 t=0 i=0 a=0)	(D: r=82 i=13 f=13 s=2 a=3)	(tpm=802.5 d=9.11 nps=0)

ID133
111.0/500

Games Completed = 500 of 500 (Avg game length = 89.922 sec)
Settings = Gauntlet/64MB/1000ms per move/M 5500cp for 30 moves, D 150 moves/EPD:C:\LittleBlitzer\3moves_GM_04.epd(817)
Time = 12964 sec elapsed, 0 sec remaining
 1.  LCZero CPU ID133         	111.0/500	69-347-84  	(L: m=347 t=0 i=0 a=0)	(D: r=68 i=10 f=5 s=0 a=1)	(tpm=945.7 d=12.49 nps=151)
 2.  Jabba 1.0                	389.0/500	347-69-84  	(L: m=69 t=0 i=0 a=0)	(D: r=68 i=10 f=5 s=0 a=1)	(tpm=803.7 d=8.94 nps=0)

ID134
95.0/500

Games Completed = 500 of 500 (Avg game length = 90.115 sec)
Settings = Gauntlet/64MB/1000ms per move/M 5500cp for 30 moves, D 150 moves/EPD:C:\LittleBlitzer\3moves_GM_04.epd(817)
Time = 12647 sec elapsed, 0 sec remaining
 1.  LCZero CPU ID134         	95.0/500	59-369-72  	(L: m=369 t=0 i=0 a=0)	(D: r=60 i=4 f=5 s=0 a=3)	(tpm=943.8 d=12.52 nps=172)
 2.  Jabba 1.0                	405.0/500	369-59-72  	(L: m=59 t=0 i=0 a=0)	(D: r=60 i=4 f=5 s=0 a=3)	(tpm=804.8 d=8.72 nps=0)

A clear and pronounced regression of about 140 Elo points, outside the error margins, and that after some 500,000 games from ID124 to ID134.
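The "about 140 Elo, outside error margins" claim can be checked from the win-loss-draw counts in the logs (the W-L-D order is inferred from the scores, e.g. 116 + 113/2 = 172.5). The sketch below uses a normal approximation with the delta method on the mean score; it is not necessarily what LittleBlitzer or Ordo compute internally, and the function names are illustrative.

```python
import math


def elo(p: float) -> float:
    """Logistic Elo difference corresponding to an expected score p."""
    return 400.0 * math.log10(p / (1.0 - p))


def gauntlet_elo(wins: int, losses: int, draws: int) -> tuple[float, float]:
    """Elo relative to the opponent and its standard error (delta method)."""
    n = wins + losses + draws
    mean = (wins + 0.5 * draws) / n
    # Per-game score variance E[x^2] - E[x]^2, with x in {0, 0.5, 1}.
    var = (wins + 0.25 * draws) / n - mean ** 2
    se_mean = math.sqrt(var / n)
    # Propagate through d(Elo)/dp = 400 / (ln 10 * p * (1 - p)).
    slope = 400.0 / (math.log(10) * mean * (1.0 - mean))
    return elo(mean), slope * se_mean


# 1s/move gauntlets against Jabba 1.0, as (wins, losses, draws).
e124, se124 = gauntlet_elo(116, 271, 113)
e134, se134 = gauntlet_elo(59, 369, 72)
diff = e124 - e134
# The two gauntlets are independent, so their variances add.
margin95 = 1.96 * math.sqrt(se124 ** 2 + se134 ** 2)
print(f"ID124 - ID134: {diff:.0f} +/- {margin95:.0f} Elo (95%)")
```

With the 1s/move counts this gives a difference of roughly 140 Elo against a 95% margin of roughly ±44 Elo, so the regression is well outside the error bars. Applied to the 20-game matches at 10s/move (12-6-2 and 6-11-3), it gives about 196 Elo against a margin of about ±208 Elo, which is why that result is only "almost" outside the margins.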

I am not sure whether this is representative of results on a strong GPU. But I also tested at 10s/move, or about 2000 playouts per move on my CPU, which is already not that few, and the results again show a regression, almost outside the error margins:

2/

ID124
13.0/20

Games Completed = 20 of 20 (Avg game length = 1261.311 sec)
Settings = Gauntlet/64MB/10000ms per move/M 5500cp for 30 moves, D 150 moves/EPD:C:\LittleBlitzer\3moves_GM_04.epd(817)
Time = 6754 sec elapsed, 0 sec remaining
 1.  LCZero CPU ID124         	13.0/20	12-6-2  	(L: m=6 t=0 i=0 a=0)	(D: r=2 i=0 f=0 s=0 a=0)	(tpm=7238.3 d=16.19 nps=185)
 2.  Jabba 1.0                	7.0/20	6-12-2  	(L: m=12 t=0 i=0 a=0)	(D: r=2 i=0 f=0 s=0 a=0)	(tpm=9803.3 d=11.84 nps=0)

ID134
7.5/20

Games Completed = 20 of 20 (Avg game length = 948.178 sec)
Settings = Gauntlet/64MB/10000ms per move/M 5500cp for 30 moves, D 150 moves/EPD:C:\LittleBlitzer\3moves_GM_04.epd(817)
Time = 5484 sec elapsed, 0 sec remaining
 1.  LCZero CPU ID134         	7.5/20	6-11-3  	(L: m=11 t=0 i=0 a=0)	(D: r=3 i=0 f=0 s=0 a=0)	(tpm=7094.3 d=16.44 nps=639)
 2.  Jabba 1.0                	12.5/20	11-6-3  	(L: m=6 t=0 i=0 a=0)	(D: r=3 i=0 f=0 s=0 a=0)	(tpm=9781.1 d=11.55 nps=0)

This should be more representative, but has few games.
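To put "few games" into numbers, here is a rough sample-size sketch under a normal approximation. The per-game standard deviation of 0.5 is an assumed worst case (no draws, 50% score); draws shrink it and so reduce the games needed. The helper name `games_needed` is illustrative.

```python
import math


def games_needed(margin_elo: float, score: float = 0.5,
                 sd_per_game: float = 0.5, z: float = 1.96) -> int:
    """Games needed for a +/- margin_elo confidence margin on one Elo estimate.

    Normal approximation: margin = z * slope * sd_per_game / sqrt(n), where
    slope = d(Elo)/d(score) = 400 / (ln 10 * score * (1 - score)).
    """
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return math.ceil((z * slope * sd_per_game / margin_elo) ** 2)


for margin in (200, 100, 50, 20):
    print(f"+/-{margin} Elo: about {games_needed(margin)} games")
```

Under these assumptions a ±100 Elo margin already needs about 47 games and ±20 Elo about 1160. Comparing two networks through separate gauntlets doubles the variance of the difference, so each gauntlet needs roughly twice as many games for the same margin on the difference.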

I am unable to run EPD test suites with v0.6, either through Polyglot or in any GUI; it seems the PV is not output in the standard form. I would be curious to see whether this regression comes from tactical or positional factors (I have one very positional test suite and several very tactical ones). Maybe there is some conflict in training between the policy part of the network, which gives a probability distribution over possible moves, and the value part, which gives the probability of winning from a given position.
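For reference, the AlphaGo Zero and AlphaZero papers train both heads of one network with a single combined loss. Assuming LCZero's training uses a loss of this form (the thread does not say so), with (p, v) = f_θ(s) the policy and value outputs for position s, z ∈ {-1, 0, +1} the game result, π the MCTS visit distribution and c an L2 regularisation weight:

```latex
\ell(\theta) = (z - v)^2 \;-\; \boldsymbol{\pi}^{\top} \log \mathbf{p} \;+\; c\,\lVert \theta \rVert^2
```

Here v is an expected outcome in [-1, 1] rather than a raw winning probability (it corresponds to an expected score of (v + 1)/2). Because both terms update the same shared weights θ, improving one can in principle come at the expense of the other, which is one way the suspected conflict could arise.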

Post by **Guenther** » Mon Apr 16, 2018 2:07 pm

Laskos wrote:
...

Without a bench of your current cpu or gpu we cannot say much.
It is clear though that LCZero won't play good chess below around 2000 playouts.
It is also possible that the cpu speed went down a lot with version 0.6 and the new NN.
Here my cpu is always slower than my already very slow graphics card,
despite being a decent quad-core, and it went down again with version 0.6.

Oh, I see now you have added average depths. That explains a lot.
I wouldn't trust any results with depths below 18 or 19 for LCZero.
And you lost a full 4 plies or so on average compared to previous testing!
(BTW, the nps average does not tell much, because LCZero outputs enormous nps a few moves before mate.)

Post by **Laskos** » Mon Apr 16, 2018 4:11 pm

Guenther wrote:
...

Tests are on 1 core. From the initial position, the speed after 10 seconds on 1 core is about 150 nps (it maxes out at about 350 nps after several minutes). So at least the last tests, at 10s/move, are not that meaningless: they range from 1500 playouts per move at the beginning of the game to 5000 playouts towards the end. Isn't that even more playouts than in their self-play games? Tests at 10s/move also show at least stagnation, and more probably regression. This is confirmed by an intermediate result I am getting right now: at 10s/move (1500-5000 playouts per move) I decided to run a 100-game gauntlet against the standard, stable engine Jabba 1.0, and the intermediate result confirms the previous one:

Games Completed = 41 of 100 (Avg game length = 1098.012 sec)
Settings = Gauntlet/64MB/10000ms per move/M 9000cp for 30 moves, D 150 moves/EPD:C:\LittleBlitzer\3moves_GM_04.epd(817)
Time = 12501 sec elapsed, 17990 sec remaining
 1.  Jabba 1.0                	17.5/41	15-21-5  	(L: m=21 t=0 i=0 a=0)	(D: r=2 i=2 f=1 s=0 a=0)	(tpm=9802.8 d=11.56 nps=0)
 2.  LCZero CPU ID124         	14.0/21	13- 6-2  	(L: m=6 t=0 i=0 a=0)	(D: r=0 i=2 f=0 s=0 a=0)	(tpm=7102.7 d=16.25 nps=461)
 3.  LCZero CPU ID136         	 9.5/20	 8- 9-3  	(L: m=9 t=0 i=0 a=0)	(D: r=2 i=0 f=1 s=0 a=0)	(tpm=7052.0 d=16.09 nps=235)

ID136 seems to perform clearly no better than ID124. I will let this gauntlet run longer. I am not sure why even those comparative LC0 results at 1s/move on 1 core should be of no significance; they are similar to ultra-fast games with standard engines, which are not always meaningless. Maybe the choice of adversary, Jabba 1.0, has some effect, and more varied opponents should have been used.

Post by **Guenther** » Mon Apr 16, 2018 4:27 pm

Laskos wrote:I am not sure why even those LC0 comparative results at 1s/move on 1 core are not of some significance, they are similar to ultra-fast games with standard engines, not always meaningless.

...because NN programs are not 'standard engines' and depend on much better gpu hardware (and what is done in self-training has no meaning for real play).

Actually, it is the same problem AB programs probably had until the late nineties, and I am pretty sure they too just scale relatively linearly once a certain depth/nps is reached.
Of course, nowadays cpu programs can be tested reliably at ultra-fast tc, because current hardware gets past that 'sweet spot' even at those tcs.
2000 playouts is merely called 'hard' on the 'play LC0' site, but it isn't hard.

Post by **Guenther** » Mon Apr 16, 2018 4:59 pm

[TimeControl "40/180"] Anechka/Gaia
[TimeControl "40/360"] LCZero

Played with 2:1 time odds to compensate for the weakness of my gpu relative to the cpu, with a tc chosen to give LCZero at least depth 18 on average.
Of course not enough games... due to the above factors.
Calibrated to CCRL 40/4 (Anechka/Gaia ratings slightly rounded down)
and 200 simulations in Ordo. Last match still running (70 positions).

A. Score range with scoring Probability:

file         : 20180413_0949.pgn
score window : [-32000.0, +32000.0]
notes        : 1. games and pts% are not affected by score window.
               2. Table is sorted by scoring probability in descending order

 nr                           player    games pts(%) scoProb
  1                      Anechka_008      140   48.9    0.51
  2                   LCZero_05ID123      140   51.1    0.49


B. Time and Depth average:

file         : 20180413_0949.pgn
score window : [-32000.0, +32000.0]
notes        : 1. games and pts% are not affected by score window.
               2. Table is sorted by aveDep in descending order.
               3. aveTime is the average time/move in m:s:ms
               4. sumTime is in d:h:m:s

 nr                           player    games  pts%      sumTime aveDep    aveTime
  1                   LCZero_05ID123      140  51.1  01:00:57:45  18.48  00:09:704
  2                      Anechka_008      140  48.9  00:12:44:39  10.76  00:04:921

   # PLAYER            :   RATING  ERROR  POINTS  PLAYED   (%)
   1 LCZero_05ID123    :  2237.62  50.58    71.5     140  51.1
   2 Anechka_008       :  2230.00   ----    68.5     140  48.9

White advantage = 42.79 +/- 27.27
Draw rate (equal opponents) = 20.92 % +/- 3.18

A. Score range with scoring Probability:

file         : 20180415_2335.pgn
score window : [-32000.0, +32000.0]
notes        : 1. games and pts% are not affected by score window.
               2. Table is sorted by scoring probability in descending order

 nr                           player    games pts(%) scoProb
  1                       Gaia_35-64       35   68.6    0.68
  2                   LCZero_06ID133       35   31.4    0.32


B. Time and Depth average:

file         : 20180415_2335.pgn
score window : [-32000.0, +32000.0]
notes        : 1. games and pts% are not affected by score window.
               2. Table is sorted by aveDep in descending order.
               3. aveTime is the average time/move in m:s:ms
               4. sumTime is in d:h:m:s

 nr                           player    games  pts%      sumTime aveDep    aveTime
  1                   LCZero_06ID133       35  31.4  00:05:58:41  18.20  00:09:664
  2                       Gaia_35-64       35  68.6  00:02:58:57  13.73  00:04:810

   # PLAYER            :   RATING   ERROR  POINTS  PLAYED   (%)
   1 Gaia_35-64        :  2420.00    ----    22.5      33  68.2
   2 LCZero_06ID133    :  2284.68  126.48    10.5      33  31.8

White advantage = 29.10 +/- 64.50
Draw rate (equal opponents) = 9.88 % +/- 5.76

Post by **Laskos** » Mon Apr 16, 2018 5:11 pm

Guenther wrote:
...

I am not sure why, IIRC, until Vas (2004 or so) no developers tested at ultra-fast TC. I think it was a lack of statistical knowledge and of stopping rules. A single core 20 years ago was slower than a single core today by no more than a factor of 10-15. People could safely have tested in the late 1990s at 1s/move over, say, 1000+ games instead of doing what they usually did: play, say, 40 games at LTC, see a 23:17 result, and decide that the patch passes.
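The 23:17 example can be quantified with the usual likelihood-of-superiority (LOS) approximation, which ignores draws. Reading "23:17" as 23 wins and 17 losses is an assumption, since the post does not give a draw count; the function name is illustrative.

```python
import math


def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Probability that the side with `wins` is truly the stronger one.

    Normal approximation that ignores draws:
    LOS = 0.5 * (1 + erf((W - L) / sqrt(2 * (W + L)))).
    """
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))


# A 23:17 result over 40 games ...
print(f"40 games:   LOS = {likelihood_of_superiority(23, 17):.3f}")
# ... versus the same 57.5% score over 1000 games.
print(f"1000 games: LOS = {likelihood_of_superiority(575, 425):.6f}")
```

A 23:17 result gives an LOS of only about 83%, while the same 57.5% score over 1000 games pushes it above 99.9%, which is the argument for many fast games over a few slow ones.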

1500-5000 playouts per move are not that few here, and it is irrelevant whether they are on CPU or GPU. Still, you might be right; I also have reservations about my results. I will post the result of my gauntlet later tonight, but people must take it with a grain of salt. I hope some folks with strong GPUs can track the real progress of LC0.

Post by **Laskos** » Mon Apr 16, 2018 5:31 pm

Guenther wrote:
...

The time control is better, but maybe you should test against the same engine (behavior against different opponents is not always the same), and more games are needed.

For now, I have at 1500-5000 playouts per move:

Games Completed = 60 of 100 (Avg game length = 1141.513 sec)
Settings = Gauntlet/64MB/10000ms per move/M 9000cp for 30 moves, D 150 moves/EPD:C:\LittleBlitzer\3moves_GM_04.epd(817)
Time = 17808 sec elapsed, 11872 sec remaining
 1.  Jabba 1.0                	26.0/60	21-29-10  	(L: m=29 t=0 i=0 a=0)	(D: r=4 i=2 f=2 s=0 a=2)	(tpm=9802.9 d=11.55 nps=0)
 2.  LCZero CPU ID124         	18.5/30	16-9-5  	(L: m=9 t=0 i=0 a=0)	(D: r=1 i=2 f=1 s=0 a=1)	(tpm=7263.1 d=16.24 nps=577)
 3.  LCZero CPU ID136         	15.5/30	13-12-5  	(L: m=12 t=0 i=0 a=0)	(D: r=3 i=0 f=1 s=0 a=1)	(tpm=7029.0 d=16.10 nps=212)

So, if there is progress, it is not large, and strictly speaking a regression is more likely. But all your arguments stand, and one has to take my results with a grain of salt.
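The "a regression is more likely" judgement can be quantified with a normal approximation on the two intermediate gauntlets against the same opponent. This rough sketch treats the two gauntlets as independent and assumes 30 games each is enough for the approximation; the helper names are illustrative.

```python
import math


def score_stats(wins: int, losses: int, draws: int) -> tuple[float, float]:
    """Mean score per game and the variance of that mean."""
    n = wins + losses + draws
    mean = (wins + 0.5 * draws) / n
    var_game = (wins + 0.25 * draws) / n - mean ** 2
    return mean, var_game / n


def prob_first_stronger(a: tuple[int, int, int], b: tuple[int, int, int]) -> float:
    """P(true score of A > true score of B) against a common opponent."""
    mean_a, var_a = score_stats(*a)
    mean_b, var_b = score_stats(*b)
    z = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    # Standard normal CDF expressed through erf.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Intermediate 10s/move results vs. Jabba 1.0, as (wins, losses, draws).
id124 = (16, 9, 5)
id136 = (13, 12, 5)
print(f"P(ID124 stronger than ID136) = {prob_first_stronger(id124, id136):.2f}")
```

With these counts the probability that ID124 is the stronger net comes out at roughly 0.8: a lean toward regression, not proof.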

Post by **Guenther** » Mon Apr 16, 2018 5:49 pm

Laskos wrote:
...

The time control is better, but maybe you should test against the same engine (behavior against different opponents is not always the same), and more games are needed.

Yes, but I had more matches with the first program, Anechka, which I did not post,
and I decided to add a stronger opponent.
Actually, I would like to test it against a bunch of opponents each time,
but that is not possible if I want to keep those conditions.
Also, I currently have only one PC here, unlike in the past when I actively
tested for my RWBC site...

Post by **Laskos** » Tue Apr 17, 2018 8:26 am

Guenther wrote:
...

Here is a recent result from Carlos Canavessi up to ID138; it also shows a significant regression.
https://media.discordapp.net/attachment ... height=679

AFAIK he is running a 200-game gauntlet against varied opposition on a strong GPU. The regression seems to be confirmed.

At my 1500-5000 playouts per move (10s/move), I have with ID138 compared to ID124, both against Jabba 1.0:

v0.6

ID124:
43.0/70

ID138:
35.0/70

Again, regression seems more likely. It is also possible that my fast time control tests (100-300 playouts per move) were of some worth after all; they were the first to show the regression.

Post by **mirek** » Tue Apr 17, 2018 11:16 am

Laskos wrote: Here is a recent result of Carlos Canavessi till ID138, it also shows a significant regression.

Oh no, is there any explanation for why this is happening? Is it already a reason for concern, or is it still OK? And at what point, if the trend continues, will it become a reason for concern? Shouldn't the larger net have just skyrocketed in performance? Could it be that, had the larger net been trained from the very beginning, progress would have been greater?
