Sergio Vieri second net is out

M ANSARI
Posts: 3707
Joined: Thu Mar 16, 2006 7:10 pm

Re: Sergio Vieri second net is out

Post by M ANSARI »

Never mind, I misread the score. Is there a working version of this engine that we can try out in the Fritz 14 GUI, where it works properly without any editing?

Thanks
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Sergio Vieri second net is out

Post by Milos »

Laskos wrote: Sat Jul 25, 2020 6:40 am
MMarco wrote: Sat Jul 25, 2020 5:11 am It's getting scary!!

Posted by SVieri:

I ran 2344 vs 2141 overnight. TC: 3m+2s, 92 threads.

Score of StockfishNNUE 2344 vs StockfishNNUE 20200722-2141: 14 - 5 - 33 [0.587]
...      StockfishNNUE 2344 playing White: 12 - 0 - 14  [0.731] 26
...      StockfishNNUE 2344 playing Black: 2 - 5 - 19  [0.442] 26
...      White vs Black: 17 - 2 - 33  [0.644] 52
Elo difference: 60.7 +/- 56.8, LOS: 98.1 %, DrawRatio: 63.5 %
52 of 100 games finished.
Too few games to say anything with high confidence. It is not even clear that 2344 is stronger than 2141; a cherry-picked LOS of 98% doesn't qualify as a stopping rule. An LOS of 99.9% or higher is needed when cherry-picking to have some confidence in superiority, and even higher for a small number of games.
There are indeed too few games; however, some things indicate that this is a bit more than just statistical noise. First, the draw ratio is relatively high, and second, 3m+2s on 92 threads is equivalent to LTC on a single thread, so game quality must be quite high and consequently the noise lower than in typical test games.
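
For anyone who wants to check the quoted figures, here is a minimal sketch that recomputes the Elo difference and LOS from the 14 - 5 - 33 result. It assumes the usual logistic Elo formula and the common normal-approximation LOS formula based only on wins and losses; the function names are made up for illustration, and the tool that produced the output above may compute its numbers slightly differently.

```python
import math


def score_fraction(wins, losses, draws):
    """Fraction of points scored: 1 per win, 1/2 per draw."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games


def elo_difference(score):
    """Logistic Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def likelihood_of_superiority(wins, losses):
    """Normal-approximation LOS; draws do not enter this formula."""
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))


if __name__ == "__main__":
    w, l, d = 14, 5, 33  # counts taken from the quoted match output
    s = score_fraction(w, l, d)
    print(f"score {s:.3f}")
    print(f"Elo   {elo_difference(s):+.1f}")
    print(f"LOS   {100.0 * likelihood_of_superiority(w, l):.1f} %")
```

Note that the LOS here depends only on 14 wins versus 5 losses, which is why it can look impressive after just 52 games.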
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Sergio Vieri second net is out

Post by Laskos »

Milos wrote: Sat Jul 25, 2020 8:15 am
There are indeed too few games; however, some things indicate that this is a bit more than just statistical noise. First, the draw ratio is relatively high, and second, 3m+2s on 92 threads is equivalent to LTC on a single thread, so game quality must be quite high and consequently the noise lower than in typical test games.
I don't understand what noise there is other than statistical noise. Statistically, LTC and STC are the same, and there isn't any quantification (estimation) of variance due to, say, "STC noise".
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: Sergio Vieri second net is out

Post by MikeB »

MMarco wrote: Sat Jul 25, 2020 5:11 am It's getting scary!!

Posted by SVieri:

I ran 2344 vs 2141 overnight. TC: 3m+2s, 92 threads.

Score of StockfishNNUE 2344 vs StockfishNNUE 20200722-2141: 14 - 5 - 33 [0.587]
...      StockfishNNUE 2344 playing White: 12 - 0 - 14  [0.731] 26
...      StockfishNNUE 2344 playing Black: 2 - 5 - 19  [0.442] 26
...      White vs Black: 17 - 2 - 33  [0.644] 52
Elo difference: 60.7 +/- 56.8, LOS: 98.1 %, DrawRatio: 63.5 %
52 of 100 games finished.
This is looking real to me - 1134's "lucky" streak will soon be no more

pgn file: c:/cluster.mfb/pgn/2007250215-23441134.pgn
tc/base+inc: 60+0.60
games planned: 4000
Threads: 2
Hash: 256

Current date : time (EDST)
Date: 07/25/20 : 02:31:59

Projected-> Time: 7h:6m:40s
Running  -> Time: 0h:16m:52s

136 game(s) loaded
Rank Name  Rating   Δ     +    -     #     Σ    Σ%     W    L    D   W%    =%   OppR
---------------------------------------------------------------------------------------------------------

   1 2344   3515   0.0   38   38   136   74.0  54.4   38   26   72  27.9  52.9  3485
   2 1134   3485  30.2   38   38   136   62.0  45.6   26   38   72  19.1  52.9  3515
---------------------------------------------------------------------------------------------------------

  Δ = delta from the next higher rated opponent
  # = number of games played
  Σ = total score, 1 point for win, 1/2 point for draw

LOS:
      23 11
2344     93
1134   6

136 game(s) loaded

loops scheduled: 5/190

waiting: 128
  ...seconds remaining:   48
Last edited by MikeB on Sat Jul 25, 2020 8:45 am, edited 2 times in total.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Sergio Vieri second net is out

Post by Milos »

Laskos wrote: Sat Jul 25, 2020 8:29 am
I don't understand what noise there is other than statistical noise. Statistically, LTC and STC are the same, and there isn't any quantification (estimation) of variance due to, say, "STC noise".
That's only because the noise model we typically use in chess is just a crude (coin-toss) approximation (just W/D/L and number of games) of the real sources of noise.
Go ahead and run an experiment of 100 matches of 100 games each (between, in the best case, two identical engines) at TC1 and 4xTC1, and plot the Elo difference distribution (and calculate its sigma) in both cases. You'd notice a bigger difference in sigma than can be explained by the change in draw rate alone.
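
As a baseline for that experiment, here is a minimal sketch of what the plain W/D/L model alone predicts: it simulates many matches between two identical engines at two draw rates and reports the sigma of the per-match Elo difference. The draw rates 0.50 and 0.70 are made-up stand-ins for TC1 and 4xTC1, not measured values, and the function names are hypothetical. Real engine matches would have to be run to see whether the observed sigma changes by more than this.

```python
import math
import random
import statistics


def elo_from_score(score):
    """Logistic Elo difference; the score is clamped to avoid infinities."""
    score = min(max(score, 0.001), 0.999)
    return -400.0 * math.log10(1.0 / score - 1.0)


def simulate_match_elo(n_games, draw_rate, rng):
    """One match between identical engines: wins and losses are equally likely."""
    total = 0.0
    for _ in range(n_games):
        r = rng.random()
        if r < draw_rate:
            total += 0.5                      # draw
        elif r < draw_rate + (1.0 - draw_rate) / 2.0:
            total += 1.0                      # win for the first engine
        # otherwise a loss, which adds nothing
    return elo_from_score(total / n_games)


def elo_sigma(n_matches, n_games, draw_rate, seed=0):
    """Standard deviation of the Elo difference across simulated matches."""
    rng = random.Random(seed)
    elos = [simulate_match_elo(n_games, draw_rate, rng) for _ in range(n_matches)]
    return statistics.pstdev(elos)


if __name__ == "__main__":
    # Assumed draw rates, purely for illustration.
    for label, draw_rate in (("TC1", 0.50), ("4xTC1", 0.70)):
        sigma = elo_sigma(n_matches=100, n_games=100, draw_rate=draw_rate)
        print(f"{label}: draw rate {draw_rate:.2f}, sigma of Elo difference {sigma:.1f}")
```

With only 100 matches the sigma estimate is itself noisy (roughly 7% relative error), so a larger n_matches gives a steadier baseline to compare real runs against.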
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Sergio Vieri second net is out

Post by Laskos »

Milos wrote: Sat Jul 25, 2020 8:39 am
That's only because the noise model we typically use in chess is just a crude (coin-toss) approximation (just W/D/L and number of games) of the real sources of noise.
Go ahead and run an experiment of 100 matches of 100 games each (between, in the best case, two identical engines) at TC1 and 4xTC1, and plot the Elo difference distribution (and calculate its sigma) in both cases. You'd notice a bigger difference in sigma than can be explained by the change in draw rate alone.
The reason might be a higher correlation between paired side-reversed games at LTC, which in the pentanomial model is counted as lower variance. I do observe that at LTC with the unbalanced (side-reversed) opening positions I am using, and in this case the difference between LTC and STC can indeed amount to some 10% in variance. But if one uses balanced openings, the difference in variances is much smaller, as the correlations between paired games are smaller in general. One cannot say in general that "LTC decreases the statistical variance", as statistical variances are additive.
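
To make the trinomial versus pentanomial point concrete, here is a minimal sketch that computes the variance of the mean score both ways from a list of side-reversed game pairs. Each pair holds one engine's scores in the two games from the same opening; the function names and the sample data are invented for illustration and are not taken from any actual test.

```python
import statistics


def trinomial_variance_of_mean(pairs):
    """Treat all games as independent W/D/L outcomes."""
    games = [score for pair in pairs for score in pair]
    return statistics.pvariance(games) / len(games)


def pentanomial_variance_of_mean(pairs):
    """Treat each side-reversed pair as one outcome (0, 0.25, 0.5, 0.75 or 1)."""
    pair_scores = [(first + second) / 2.0 for first, second in pairs]
    return statistics.pvariance(pair_scores) / len(pair_scores)


if __name__ == "__main__":
    # Invented data: an unbalanced opening where White wins both games of a pair
    # shows up as (1.0, 0.0) or (0.0, 1.0) from one engine's point of view.
    correlated_pairs = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.5, 0.5)]
    print("trinomial  :", trinomial_variance_of_mean(correlated_pairs))
    print("pentanomial:", pentanomial_variance_of_mean(correlated_pairs))
```

When the two games of a pair are uncorrelated, the two estimates agree; the gap between them only opens up when the paired results cancel each other out, as with strongly unbalanced openings.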
JohnS
Posts: 215
Joined: Sun Feb 24, 2008 2:08 am

Re: Sergio Vieri second net is out

Post by JohnS »

Another quick test of 2344 against H6.03 using Nunn1 openings, G10s+0.2s - result +11 =8 -1.

Here is a crazy sacrifice by SFnnue on move 13. Of course, the usual disclaimers apply and, as Mike would say, yomv and ymmv.

[pgn][Event "SF-NNUE - Houdini 6.03, Nunn1, G10s + 0.2s"]
[Site "Home"]
[Date "2020.07.25"]
[Round "1"]
[White "Stockfish+NNUE"]
[Black "Houdini 6.03"]
[Result "1-0"]
[TimeControl "10+0.2"]
[Time "17:49:43"]
[Board "9"]
[Termination "adjudication by engines' scores"]
[ECO "D11"]
[Opening "QGD Slav"]

1. d4 d5 2. c4 c6
3. Nf3 {D11: QGD Slav, 3.Nf3} e6 4. cxd5 exd5
5. Nc3 Nf6 6. Bg5 Be7
7. Qc2 Nbd7 8. e3 O-O
9. Bd3 Re8 10. O-O Nf8 {End of opening}
11. h3 {+0.55/21 1.7 1825739} Ne4 {-0.29/16 0.8 1659811} 12. Bf4 {+0.57/15 0.3 340609} f5 {-0.29/15 0.5 1077857}
13. Nxd5 {+2.78/17 0.3 365358} cxd5 {+2.19/14 0.2 541228} 14. Bc7 {-2.63/19 0.6 586658} Qd7 {+2.19/15 0.4 804820}
15. Rfc1 {-2.40/18 0.3 305302} a6 {+2.04/17 1.1 2499173} 16. Qb3 {-2.44/19 0.4 428054} Qe6 {+1.94/17 0.5 1132772}
17. Rc2 {-2.32/17 0.3 291583} b5 {+1.87/18 1.0 2388891} 18. Ne5 {-1.83/17 0.6 596654} Bb7 {+1.99/15 0.3 587827}
19. a4 {-1.68/21 1.1 1189223} b4 {+1.31/16 1.0 2281160} 20. f3 {-1.84/19 0.4 414811} Ng3 {+1.69/15 0.2 526878}
21. Rac1 {-0.68/17 0.4 413539} Qh6 {+0.89/17 1.9 4376992} 22. f4 {+0.00/20 0.2 258037} Ne6 {+0.78/17 0.8 1960946}
23. Rc6 {+0.00/20 0.4 479539} Rec8 {+0.49/17 0.9 2314155} 24. Rb6 {+0.00/20 0.3 363630} Ra7 {+1.47/12 0.2 418161}
25. Kh2 {+0.00/21 0.3 296852} Ne4 {-0.22/17 0.7 1695180} 26. Bxe4 {+1.25/19 0.3 307773} fxe4 {-0.22/16 0.0 637}
27. Qd1 {+1.91/23 1.9 2128183} Re8 {-0.73/19 1.1 2793150} 28. a5 {+3.28/17 0.2 208229} Qf6 {-1.09/18 0.9 2123492}
29. Qa4 {+3.83/21 0.3 350003} Rea8 {-1.21/19 0.4 1173762} 30. f5 {+3.02/23 0.8 895388} Qxf5 {-1.21/18 0.0 2235}
31. Qd7 {+4.07/19 0.2 282314} Bf8 {-1.31/19 0.4 974990} 32. Qxe6+ {+3.94/21 0.5 615758} Qxe6 {-1.20/18 0.3 819661}
33. Rxe6 {+4.06/21 0.3 422568} Rc8 {-1.35/19 0.3 1033236} 34. Kg3 {+4.81/19 0.3 344820} g6 {-1.55/17 0.6 1706853}
35. Rf1 {+4.70/17 0.3 399405} Ba8 {-2.14/17 0.4 952201} 36. Bd6 {+5.59/18 0.3 380173} Bg7 {-2.34/15 0.2 546724}
37. Kh4 {+6.52/17 0.2 270354} Kh8 {-2.27/14 0.2 544657} 38. Kg5 {+7.02/18 0.3 316968} Kg8 {-2.03/14 0.1 320824}
39. g4 {+7.70/18 0.3 413143} b3 {-2.38/15 0.3 742604} 40. h4 {+8.11/19 0.4 558349} Bh8 {-2.84/15 0.2 553155}
41. h5 {+8.56/18 0.3 419325} gxh5 {-2.91/17 0.2 474664} 42. gxh5 {+8.93/18 0.3 455113} Rd8 {-3.47/13 0.2 646262}
43. Ng4 {+9.46/18 0.4 506316} Bg7 {-6.58/15 0.2 586254} 44. Nh6+ {+9.53/16 0.1 155076} Bxh6+ {-6.46/18 0.2 615138}
45. Kxh6 {+11.84/19 0.2 245093} Rad7 {-5.65/16 0.1 312964} 46. Bc5 {+12.79/22 0.2 318832} Rc8 {-8.61/17 0.3 950241}
47. Ref6 {+18.81/29 0.2 309342} Rdd8 {-17.45/21 0.2 743347} 48. Be7 {M+7/41 0.2 353040} Bb7 {M-6/21 0.0 107765}
49. Rg1+ {M+6/51 0.2 326491} Kh8 {M-5/11 0.0 724} 1-0[/pgn]
carldaman
Posts: 2283
Joined: Sat Jun 02, 2012 2:13 am

Re: Sergio Vieri second net is out

Post by carldaman »

The games you posted are out of this world, JohnS. Fascinating stuff :)
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Sergio Vieri second net is out

Post by Rebel »

MMarco wrote: Fri Jul 24, 2020 3:41 pm Can someone with good hardware test this one?

Size=256. By ribbit on discord.

"My first little 256 network.. I tested it on Honey-XI-NN with pretty good result against stockfish-dev and Leela ... (d24 validation used, 6menTB)

ribbit_0.1 - https://rapidu.net/9571752717/nn.bin "
This ribbit net is the same net as the much-discussed 1134.
90% of coding is debugging, the other 10% is writing bugs.
JohnS
Posts: 215
Joined: Sun Feb 24, 2008 2:08 am

Re: Sergio Vieri second net is out

Post by JohnS »

The wins continue for 2344, this time against Ethereal: +13 =7 -0.

This was a nice game against the French with a nice rook sacrifice.

[pgn][Event "SF-NNUE - Ethereal 12.25, Nunn1, G10s + 0.2s"]
[Site "Home"]
[Date "2020.07.25"]
[Round "1"]
[White "Stockfish+NNUE"]
[Black "Ethereal 12.25"]
[Result "1-0"]
[TimeControl "10+0.2"]
[Time "21:07:13"]
[Board "5"]
[Termination "adjudication by engines' scores"]
[ECO "C18"]
[Opening "French"]

1. e4 e6 2. d4 d5
3. Nc3 Bb4 4. e5 c5
5. a3 Bxc3+ 6. bxc3 Qc7 {C18: French, Winawer, classical variation}
7. Nf3 Ne7 8. a4 b6
9. Bb5+ Bd7 10. Bd3 Nbc6
11. O-O {End of opening} h6 {-0.41/16 1.2 2449026} 12. Ba3 {+0.48/18 1.5 1600506} Na5 {-0.45/17 0.3 661137}
13. Nh4 {+0.70/16 0.3 350244} O-O {-0.42/17 0.3 608708} 14. Qg4 {+0.83/17 0.3 282295} f5 {-0.43/17 0.9 1912771}
15. Qh3 {+0.66/19 1.4 1452569} Rf7 {-0.12/17 1.0 1941690} 16. Qg3 {+0.61/15 0.2 226097} Rc8 {-0.21/16 0.7 1273626}
17. Bc1 {+0.53/19 1.2 1296173} Kh7 {-0.28/16 0.6 1220973} 18. Qh3 {+0.60/16 0.2 260294} Kg8 {-0.27/18 1.1 2185186}
19. Nf3 {+0.78/19 1.4 1483556} Qd8 {+0.01/16 0.3 518432} 20. g4 {+0.95/20 0.9 971479} fxg4 {+0.05/16 0.5 967065}
21. Qxg4 {+0.75/17 0.3 277698} Nf5 {-0.55/15 0.4 765651} 22. Kh1 {+1.27/16 0.2 272051} Qe7 {-0.47/16 0.4 773740}
23. Rg1 {+2.32/15 0.2 237219} Rc6 {-0.49/15 0.3 659995} 24. Qh3 {+3.30/17 0.5 563792} Kh8 {-1.79/16 1.0 1913869}
25. Rg6 {+3.64/16 0.2 278287} Be8 {-2.46/15 0.5 1055444} 26. Bg5 {+5.78/17 0.3 318371} Qf8 {-4.62/17 1.1 2171419}
27. Rg1 {+6.92/16 0.3 357235} Rb7 {-4.40/18 1.4 2666157} 28. Bf6 {+7.75/21 0.5 611424} Bxg6 {-5.30/17 0.5 1089103}
29. Rxg6 {+7.96/20 0.2 233353} gxf6 {-5.30/18 0.4 809134} 30. Bxf5 {+8.08/21 0.3 363269} exf5 {-5.53/18 0.2 450564}
31. Rxh6+ {+8.40/21 0.4 593581} Qxh6 {-5.78/19 0.7 1519466} 32. Qxh6+ {+8.64/22 0.3 421093} Kg8 {-4.79/19 0.2 498926}
33. exf6 {+8.62/23 0.4 543051} Rf7 {-5.14/17 0.1 288948} 34. Qg6+ {+8.66/21 0.3 374197} Kf8 {-5.97/20 0.4 879663}
35. Ng5 {+8.76/21 0.2 344836} Rcxf6 {-5.95/18 0.2 514069} 36. Nh7+ {+8.75/19 0.2 332470} Rxh7 {-6.34/18 0.2 217938}
37. Qxf6+ {+8.89/19 0.5 729310} Rf7 {-6.59/18 0.2 481860} 38. Qh8+ {+8.93/19 0.4 642739} Ke7 {-7.02/17 0.2 269212}
39. Qe5+ {+8.91/20 0.3 458542} Kd8 {-7.38/17 0.2 442845} 1-0[/pgn]