Openings: Hert_250_lowdraws.pgn
TC: 60 sec + 0.6 sec
250 games
Lc0 0.23.2 256x20-T40-1541 +40 +51/=177/-22 55.80% 139.5/250
Stockfish 11 -40 +22/=177/-51 44.20% 110.5/250
1 Lc0 0.23.2 256x20-T40-1541 +52 +62/=163/-25 57.40% 143.5/250
2 Stockfish 11 -52 +25/=163/-62 42.60% 106.5/250
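As a cross-check, the Elo column in these tables can be reproduced from the points score with the standard logistic Elo formula. This is only a minimal sketch: the function name `elo_from_score` is made up for illustration and is not part of any testing tool mentioned in this thread.

```python
import math


def elo_from_score(points: float, games: int) -> float:
    """Elo difference implied by a match score under the logistic Elo model.

    Hypothetical helper for illustration; not taken from any engine-testing GUI.
    """
    score = points / games
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


# First table: Lc0 scored 139.5/250 (55.80%)
print(round(elo_from_score(139.5, 250)))  # 40

# Second table: Lc0 scored 143.5/250 (57.40%)
print(round(elo_from_score(143.5, 250)))  # 52
```

Both values match the +40 and +52 shown in the tables.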
fastgm wrote: ↑Fri Jan 31, 2020 5:56 pm
Here the results with Stockfish 11 and default Contempt=24
+52 Elo
1 Lc0 0.23.2 256x20-T40-1541 +52 +62/=163/-25 57.40% 143.5/250
2 Stockfish 11 -52 +25/=163/-62 42.60% 106.5/250
I will soon test 256x20-T40-1541 with the Kiudee settings for my rating list.

Perhaps you should wait some more days:
pohl4711 wrote: ↑Sat Feb 01, 2020 12:31 pm
Perhaps you should wait some more days:
The first 45 games of the KiudeeLaskos setting (kl = Kiudee with CPuct=1.900) have been played, and at this point it looks very promising.
Lc0 0.23.2kl t40-1541 (20x256) is at 62% vs. Stockfish 191210 (the final result of the Kiudee setting without the Laskos CPuct change was 57%), which would mean around +35 Elo more and a real destruction of Stockfish.
But 45 games do not give a really reliable result; everything can still change. We have to wait some more days, but the result is very good so far, so I will let the test go on...
The test run with 300 games will end in 5 days... if all works correctly.

I aborted that test run. After 150 games, the KiudeeLaskos setting was 2% weaker than the Kiudee setting. So, you should try Kiudee...
Alayan wrote: ↑Mon Feb 03, 2020 2:43 pm
Small sample size means a 2% difference after 150 games isn't anywhere near good enough to know which is better.
Aborting tests that don't look promising while finishing those that do also introduces some bias, as the results you end up publishing will be luckier than average.

Have to agree. 2% is 14 Elo, and the error margins after 150 games would be +/- 40 Elo or so. In my tests, and I do a lot, I have seen runs where one side was -90 Elo after 30 games, -35 Elo after 100 games, and +44 by the end of a mere 300 games. Even 1000 games will still have error margins of roughly +/- 15 Elo.
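The error margins discussed in this exchange can be estimated from a win/draw/loss record. The sketch below assumes independent games and a normal approximation with a 95% interval (z = 1.96); rating tools may use different methods, so treat the numbers as rough. The names `elo_from_fraction` and `elo_with_margin` are hypothetical helpers written for this illustration.

```python
import math


def elo_from_fraction(score: float) -> float:
    """Logistic Elo difference for a score fraction strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))


def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high) for a W/D/L record.

    Assumes independent games and a normal approximation of the mean score.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result, which is 1, 0.5 or 0 points.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / n
    std_error = math.sqrt(variance / n)
    low = elo_from_fraction(score - z * std_error)
    high = elo_from_fraction(score + z * std_error)
    return elo_from_fraction(score), low, high


# 250-game result from the first table: +51/=177/-22
elo, low, high = elo_with_margin(51, 177, 22)
print(f"{elo:+.0f} Elo, 95% interval {low:+.0f} to {high:+.0f}")
```

For the 250-game result this gives an interval of roughly +18 to +64 Elo around the +40 estimate, so a 14 Elo gap between two settings is well inside the noise at 150 games. The interval is narrower when the draw rate is high, because draws reduce the per-game variance.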