Sergio Vieri second net is out
Re: Sergio Vieri second net is out
Never mind, I misread the score. Is there a working version of this engine that we can try out in the Fritz 14 GUI, one that works properly without any editing?
Thanks
Re: Sergio Vieri second net is out
MMarco wrote: ↑Sat Jul 25, 2020 3:11 am
It's getting scary!!
Posted by SVieri:
Code:
I ran 2344 vs 2141 overnight. TC: 3m+2s, 92 threads.
Score of StockfishNNUE 2344 vs StockfishNNUE 20200722-2141: 14 - 5 - 33 [0.587]
... StockfishNNUE 2344 playing White: 12 - 0 - 14 [0.731] 26
... StockfishNNUE 2344 playing Black: 2 - 5 - 19 [0.442] 26
... White vs Black: 17 - 2 - 33 [0.644] 52
Elo difference: 60.7 +/- 56.8, LOS: 98.1 %, DrawRatio: 63.5 %
52 of 100 games finished.
Laskos wrote: ↑Sat Jul 25, 2020 4:40 am
Too few games to say anything with high confidence. It's not even clear that 2344 is stronger than 2141; a cherry-picked LOS of 98% doesn't qualify as a stopping rule. When cherry-picking, 99.9% or higher is needed to have some confidence in superiority, and even higher for a small number of games.
There are indeed too few games; however, some things indicate that this is a bit more than just statistical noise. First, the draw ratio is relatively high, and second, 3m+2s on 92 threads is equivalent to LTC on a single thread, so the game quality must be quite high and consequently the noise lower than in your typical test games.
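The numbers in SVieri's result can be checked from the 14 - 5 - 33 line alone. A minimal Python sketch, assuming the standard formulas (logistic Elo from the mean score, LOS from wins and losses via the error function, a ~95% interval from the per-game score spread), which as far as I know are what cutechess-style output uses; the exact interval calculation in the tool may differ slightly. The second function illustrates Laskos's cherry-picking point: two identical engines, LOS checked after every game versus only at the end. All function names and the simulation parameters (draw rate, runs, seed) are hypothetical.

```python
import math
import random


def elo_from_score(score):
    """Logistic Elo difference for a mean score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_stats(wins, losses, draws, z=1.959964):
    """Elo difference, ~95% error margin and LOS (in %) from W/L/D counts."""
    n = wins + losses + draws
    mu = (wins + 0.5 * draws) / n
    # Per-game score variance under the simple W/D/L (trinomial) model.
    var = (wins * (1.0 - mu) ** 2 + losses * mu ** 2 + draws * (0.5 - mu) ** 2) / n
    se = math.sqrt(var / n)
    margin = (elo_from_score(mu + z * se) - elo_from_score(mu - z * se)) / 2.0
    # LOS uses only the decisive games; draws do not enter this formula.
    los = 50.0 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return elo_from_score(mu), margin, los


def peeking_false_positives(runs, games, draw_rate, threshold, seed=1):
    """Identical engines (hypothetical draw rate): fraction of runs where LOS
    ever exceeds `threshold` when checked after every game, and fraction
    where it exceeds it at the final game only."""
    rng = random.Random(seed)
    peek_hits = final_hits = 0
    for _ in range(runs):
        w = l = 0
        los = 50.0
        crossed = False
        for _ in range(games):
            if rng.random() >= draw_rate:  # decisive game, either side equally likely
                if rng.random() < 0.5:
                    w += 1
                else:
                    l += 1
            if w + l > 0:
                los = 50.0 * (1.0 + math.erf((w - l) / math.sqrt(2.0 * (w + l))))
                if los > threshold:
                    crossed = True
        peek_hits += int(crossed)
        final_hits += int(los > threshold)
    return peek_hits / runs, final_hits / runs


if __name__ == "__main__":
    elo, margin, los = match_stats(14, 5, 33)
    print(f"Elo {elo:.1f} +/- {margin:.1f}, LOS {los:.1f} %")
    peek, final = peeking_false_positives(runs=2000, games=100, draw_rate=0.6, threshold=98.0)
    print(f"Equal engines with LOS > 98%: {peek:.1%} when peeking, {final:.1%} at the end")
```

Because the peeking check includes the final game, its rate can never be lower than the end-only rate, and for equal engines it should come out noticeably higher. That gap is why a LOS that was watched until it looked good needs a much stricter threshold than one fixed in advance.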
Re: Sergio Vieri second net is out
Milos wrote: ↑Sat Jul 25, 2020 6:15 am
There are indeed too few games; however, some things indicate that this is a bit more than just statistical noise. First, the draw ratio is relatively high, and second, 3m+2s on 92 threads is equivalent to LTC on a single thread, so the game quality must be quite high and consequently the noise lower than in your typical test games.
I don't see what noise there is other than statistical noise. Statistically, LTC and STC are the same, and there is no quantification (estimate) of any variance due to, say, "STC noise".
Re: Sergio Vieri second net is out
MMarco wrote: ↑Sat Jul 25, 2020 3:11 am
It's getting scary!!
This is looking real to me; 1134's "lucky" streak will soon be no more.
Code:
pgn file: c:/cluster.mfb/pgn/2007250215-23441134.pgn
tc/base+inc: 60+0.60
games planned: 4000
Threads: 2
Hash: 256
Current date : time (EDST)
Date: 07/25/20 : 02:31:59
Projected-> Time: 7h:6m:40s
Running -> Time: 0h:16m:52s
136 game(s) loaded
Rank Name   Rating      Δ    +    -     #      Σ     Σ%    W    L    D     W%     =%   OppR
---------------------------------------------------------------------------------------------------------
   1 2344     3515    0.0   38   38   136   74.0   54.4   38   26   72   27.9   52.9   3485
   2 1134     3485   30.2   38   38   136   62.0   45.6   26   38   72   19.1   52.9   3515
---------------------------------------------------------------------------------------------------------
Δ = delta from the next higher rated opponent
# = number of games played
Σ = total score, 1 point for win, 1/2 point for draw
LOS:
       23   11
2344        93
1134    6
136 game(s) loaded
loops scheduled: 5/190
waiting: 128
...seconds remaining: 48
Last edited by MikeB on Sat Jul 25, 2020 6:45 am, edited 2 times in total.
Re: Sergio Vieri second net is out
Laskos wrote: ↑Sat Jul 25, 2020 6:29 am
I don't see what noise there is other than statistical noise. Statistically, LTC and STC are the same, and there is no quantification (estimate) of any variance due to, say, "STC noise".
That's only because the noise model we typically use in chess is just a crude (coin-toss) approximation of the real sources of noise, based only on the W/D/L counts and the number of games.
Go ahead and run an experiment: 100 matches of 100 games each (in the best case between 2 identical engines) at TC1 and at 4xTC1, then plot the Elo difference distribution and calculate its sigma in both cases. You'd notice a larger difference in sigma than the change in draw rate alone can explain.
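One way to set up the baseline for the proposed 100 x 100-game experiment is to simulate it under the very coin-toss model being called crude: identical engines, each game an independent W/D/L outcome, so the only thing that changes between the two time controls is the draw rate. A minimal Python sketch follows; the draw rates (0.55 and 0.70), the seed and the helper names are hypothetical stand-ins, not measurements. Real TC1 and 4xTC1 sigmas would be compared against the draw-rate-only prediction it prints; a gap between the two is what the claim above predicts.

```python
import math
import random
import statistics

# d(Elo)/d(score) of the logistic curve at a 50% score: 1600 / ln(10).
ELO_PER_SCORE_AT_50 = 1600.0 / math.log(10.0)


def elo_from_score(score):
    score = min(max(score, 1e-6), 1.0 - 1e-6)  # guard against 0% or 100%
    return -400.0 * math.log10(1.0 / score - 1.0)


def simulate_matches(matches, games, draw_rate, rng):
    """Elo results of `matches` matches between two identical engines,
    each game an independent W/D/L draw (the coin-toss model)."""
    results = []
    for _ in range(matches):
        points = 0.0
        for _ in range(games):
            if rng.random() < draw_rate:
                points += 0.5
            elif rng.random() < 0.5:
                points += 1.0
        results.append(elo_from_score(points / games))
    return results


def predicted_sigma(games, draw_rate):
    """Coin-toss-model Elo sigma for equal engines; the per-game score SD
    is 0.5 * sqrt(1 - draw_rate)."""
    return ELO_PER_SCORE_AT_50 * 0.5 * math.sqrt((1.0 - draw_rate) / games)


if __name__ == "__main__":
    rng = random.Random(2020)
    # Hypothetical draw rates standing in for TC1 and 4xTC1.
    for label, d in (("TC1", 0.55), ("4xTC1", 0.70)):
        elos = simulate_matches(100, 100, d, rng)
        print(f"{label}: simulated sigma {statistics.stdev(elos):.1f} Elo, "
              f"draw-rate-only prediction {predicted_sigma(100, d):.1f} Elo")
```

Under this model the simulated sigma tracks the prediction by construction, so it only shows what "explained by the change in draw rate" means numerically; the interesting comparison is real match data against these figures.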
Re: Sergio Vieri second net is out
Milos wrote: ↑Sat Jul 25, 2020 6:39 am
That's only because the noise model we typically use in chess is just a crude (coin-toss) approximation of the real sources of noise, based only on the W/D/L counts and the number of games.
Go ahead and run an experiment: 100 matches of 100 games each (in the best case between 2 identical engines) at TC1 and at 4xTC1, then plot the Elo difference distribution and calculate its sigma in both cases. You'd notice a larger difference in sigma than the change in draw rate alone can explain.
The reason might be a higher correlation between paired, side-reversed games at LTC, which the pentanomial model counts as lower variance. I do observe that at LTC with the unbalanced opening positions (played side-reversed) that I am using, and in this case the difference in variance between LTC and STC can indeed amount to some 10%. But with balanced openings the difference in variances is much smaller, as the correlations within game pairs are generally smaller. One cannot say in general that "LTC decreases the statistical variance", as statistical variances are additive.
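Laskos's trinomial vs pentanomial point can be made concrete. With side-reversed pairs, the pentanomial model treats each pair total (0, 0.5, 1, 1.5 or 2 points) as the independent unit, so any correlation between the two games of a pair shows up in the variance, while the trinomial model ignores it. The Python sketch below compares the two per-game variance estimates; the function name and the pair counts (equal engines on unbalanced openings, where White usually wins so engine A's two games tend to offset each other) are hypothetical, invented for illustration and not taken from the thread.

```python
def per_game_variances(pairs):
    """Trinomial and pentanomial per-game score variance.

    pairs: (score with White, score with Black) for engine A on the same
    opening, each score 0, 0.5 or 1.
    """
    games = [s for pair in pairs for s in pair]
    mu = sum(games) / len(games)
    # Trinomial: every game treated as an independent W/D/L outcome.
    trinomial = sum((s - mu) ** 2 for s in games) / len(games)
    # Pentanomial: the pair is the independent unit; scale the variance of
    # the pair mean back to a per-game figure so the two are comparable.
    pair_means = [(a + b) / 2.0 for a, b in pairs]
    pentanomial = 2.0 * sum((m - mu) ** 2 for m in pair_means) / len(pairs)
    return trinomial, pentanomial


if __name__ == "__main__":
    # Hypothetical counts: equal engines, unbalanced openings, White usually wins.
    pairs = ([(1.0, 0.0)] * 30 + [(0.5, 0.5)] * 50
             + [(1.0, 0.5)] * 10 + [(0.5, 0.0)] * 10)
    tri, pen = per_game_variances(pairs)
    # prints: trinomial 0.1000, pentanomial 0.0250, ratio 0.25
    print(f"trinomial {tri:.4f}, pentanomial {pen:.4f}, ratio {pen / tri:.2f}")
```

With these definitions the pentanomial figure equals the trinomial one times (1 + ρ), where ρ is the within-pair correlation of A's two scores measured around the overall mean. Strongly offsetting pairs (ρ well below zero) shrink the variance a lot; balanced openings push ρ toward zero, which is consistent with the much smaller difference reported for them.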
Re: Sergio Vieri second net is out
Another quick test of 2344, against H6.03 using Nunn1 openings at G10s+0.2s; result: +11 =8 -1.
Here is a crazy sacrifice by SFnnue on move 13. Of course, the usual disclaimers apply; as Mike would say, yomv and ymmv.
Re: Sergio Vieri second net is out
The games you posted are out of this world, JohnS. Fascinating stuff.

Re: Sergio Vieri second net is out
MMarco wrote: ↑Fri Jul 24, 2020 1:41 pm
Can someone with good hardware test this one?
Size=256. By ribbit on Discord.
"My first little 256 network.. I tested it on Honey-XI-NN with pretty good result against stockfish-dev and Leela ... (d24 validation used, 6menTB)
ribbit_0.1 - https://rapidu.net/9571752717/nn.bin "
This ribbit net is the same net as the much-discussed 1134.
90% of coding is debugging, the other 10% is writing bugs.
Re: Sergio Vieri second net is out
The wins continue for 2344, this time against Ethereal: +13 =7 -0.
This was a nice game against the French with a nice rook sacrifice.