Are we sure that Stockfish NNUE is better than the Normal Stockfish ?

RogerC · Post by **RogerC** » Thu Sep 10, 2020 12:53 am

Here are the results of SF 12 on just 1 CPU, 1 thread:

CCRL Blitz: SF12 is No. 1, +49 Elo above LC0
http://ccrl.chessdom.com/ccrl/404/

CEGT 40/20 (very long time control): SF12 is +32 Elo above LC0
http://www.cegt.net/40_40%20Rating%20Li ... liste.html
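
For a sense of scale, the two rating gaps above can be turned into expected scores with the standard logistic Elo formula. This is only an illustrative sketch: the function name `expected_score` is made up here, and the rating lists themselves may use a slightly different model.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score (0..1) for the side that is ahead by elo_diff, logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# Rating gaps quoted above (SF12 minus LC0, 1 CPU / 1 thread).
for label, diff in [("CCRL Blitz", 49), ("CEGT 40/20", 32)]:
    print(f"{label}: +{diff} Elo -> expected score {expected_score(diff):.1%}")
```

Under this model, +49 Elo corresponds to an expected score of roughly 57%, and +32 Elo to a little under 55%.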

So yes, SF with an NNUE net is the best chess engine at this time. The combination of classical evaluation and a neural network is the best because it evaluates very fast on a single CPU thread that consumes just a dozen watts. There is no need for heavy, costly Turing GPUs with 4352 CUDA cores drawing 250 watts.

Everybody should note that all the benchmarks above are on just 1 CPU and 1 thread. On 8 CPUs/threads, Stockfish will go deeper in the A/B search tree, with both classical and neural-net evaluation, and will be at least +50 Elo points ahead.

Also, NNUE nets are less than 2 months old. Wait for 1 year of reinforcement learning (NNUE has a structure that allows updating the net without growing in size).

It will grow in strength very fast, with tuning almost every day on the master branch in the git repository!

In just 1 month, SFdev has gained +50 Elo on 1 CPU core (+83 Elo compared to SF11 on 06/08/20, +133 Elo on 02/09/20 when SF12 was launched).

mwyoung · Post by **mwyoung** » Thu Sep 10, 2020 2:32 am

RogerC wrote: ↑Thu Sep 10, 2020 12:53 am Here are the results of SF 12 on just 1 CPU, 1 thread: [...]

Thanks for the data!

It should be noted that both testing sites are using old versions of Lc0, and are using subpar NN in the testing. Things change fast! I know it is hard to keep up for everyone.

CCRL - Lc0 0.25.1 t40-1541 RTX2080

CEGT - LCZero 0.25.1 Cuda (LS15.0)

RogerC · Post by **RogerC** » Thu Sep 10, 2020 3:42 am

mwyoung wrote: ↑Thu Sep 10, 2020 2:32 am

Thanks for the data!

It should be noted that both testing sites are using old versions of Lc0, and are using subpar NN in the testing. Things change fast! I know it is hard to keep up for everyone.

CCRL - Lc0 0.25.1 t40-1541 RTX2080

CEGT - LCZero 0.25.1 Cuda (LS15.0)

Yes, I noticed that too. The best net on LC0 0.26 is JD 0.92-130, which scores 339671 in Stefan Pohl's testing:
https://www.sp-cc.de/files/mea_1node_30x384.txt

But lc0 0.25.1 LS 15 (the best 20x256 net) scores 330153 points (less than 3% below JD 0.92-130), so the difference is not big.
https://www.sp-cc.de/files/mea_1node_20x256.txt
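
The relative gap between the two scores quoted above can be checked with a couple of lines. A minimal sketch; the variable and function names are illustrative only.

```python
jd_092_130 = 339671   # JD 0.92-130 score, as quoted above
ls_15 = 330153        # LS 15 (20x256) score, as quoted above

def percent_below(best: float, other: float) -> float:
    """How far `other` is below `best`, as a percentage of `best`."""
    return (best - other) / best * 100.0

print(f"LS 15 is {percent_below(jd_092_130, ls_15):.2f}% below JD 0.92-130")
```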

The problem is that the NN nets for LC0 are huge if you want the best! Compare that to NNUE nets: just 20 MB...

mwyoung · Post by **mwyoung** » Thu Sep 10, 2020 3:43 am

Werewolf wrote: ↑Tue Sep 08, 2020 2:31 pm
mwyoung wrote: ↑Sat Sep 05, 2020 5:49 am

Lc0 is clearly improving faster than Stockfish at this point in time. Even at 3m+2s time controls vs past matches at the same time controls.

Is this definitely true? The T60 graph has been flat for ages. I know there are other improvements outside of the nets, but this is quite a big thing.

That is why I don't use them anymore. Try these; J92-145 was just released today.
https://github.com/jhorthos/lczero-trai ... a-Training

Here is his rating chart. I am testing J92-145 now at a real time control.

SF11.5 = stockfish_20061707
   # PLAYER             :  RATING  ERROR  PLAYED   (%)      W      L      D  D(%)  CFS(%)
   1 lc0.net.J92-145    :    16.7    8.4    3093    52    789    645   1659    54      95
   2 lc0.net.J92-130    :     8.5    5.3    8000    51   1987   1798   4215    53      61
   3 lc0.net.J92-120    :     7.4    5.2    8000    51   1930   1764   4306    54      58
   4 lc0.net.J92-115    :     6.6    5.5    8000    51   1971   1823   4206    53      55
   5 lc0.net.J92-70     :     6.2    4.3   12000    51   2924   2716   6360    53      54
   6 lc0.net.J92-100    :     5.9    4.3   12000    51   2891   2693   6416    53      79
   7 lc0.net.J92-85     :     3.1    5.3    8000    50   1924   1854   4222    53      68
   8 lc0.net.J92-55     :     1.3    5.1    8000    50   1912   1882   4206    53      70
   9 SF11.5             :     0.0   ----  115093    50  26969  27002  61122    53      79
  10 lc0.net.J92-40     :    -2.3    5.5    8000    50   1862   1913   4225    53      85
  11 lc0.net.J92-25     :    -6.3    5.2    8000    49   1794   1934   4272    53      56
  12 lc0.net.J92-20     :    -6.9    5.2    8000    49   1790   1943   4267    53      75
  13 lc0.net.SV4300     :    -9.5    5.4    8000    49   1770   1982   4248    53      63
  14 lc0.net.SV4585     :   -10.7    5.2    8000    49   1748   1987   4265    53      84
  15 lc0.net.SV4619     :   -14.6    5.2    8000    48   1710   2035   4255    53     ---
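
As a rough cross-check of the table, a single row's rating can be approximated from its W/L/D counts with the logistic Elo formula. This is a sketch under assumptions: the helper names are made up, and the table's ratings were presumably fitted over all games at once by a rating tool, so they will not match this per-row estimate exactly.

```python
import math

def score_fraction(wins: int, losses: int, draws: int) -> float:
    """Points scored divided by games played (a draw counts as half a point)."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games

def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# Row 1 of the table: lc0.net.J92-145 against SF11.5.
s = score_fraction(wins=789, losses=645, draws=1659)
print(f"score {s:.1%}, approx. {elo_from_score(s):+.1f} Elo")
```

For this row the estimate comes out close to, but not identical to, the listed 16.7.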

mwyoung · Post by **mwyoung** » Thu Sep 10, 2020 4:55 am

RogerC wrote: ↑Thu Sep 10, 2020 3:42 am
The problem is that the NN nets for LC0 are huge if you want the best! Compare that to NNUE nets: just 20 MB...

I don't understand. A problem for whom? I am just a guy with a computer, like anyone else, or CCRL, or CEGT. How hard is it to keep up to date? They can keep SF up to date, but they run old versions of Lc0 and its NNs. Poor testing, IMO, for the best 2 engines on the planet.

corres · Post by **corres** » Thu Sep 10, 2020 10:16 am

As the basis of the NNUE net is the Stockfish evaluation, the playing strength of the NNUE net is restricted. During reinforcement learning the NNUE net does not get new information, so the improvement of its playing strength is also restricted. I think the strength of SF+NNUE may be nearing its peak, and from now on it is the development of classical Stockfish that can improve SF+NNUE. But that is a much slower course than you might expect from the earlier results.

RogerC · Post by **RogerC** » Thu Sep 10, 2020 8:32 pm

corres wrote: ↑Thu Sep 10, 2020 10:16 am As the basis of the NNUE net is the Stockfish evaluation, the playing strength of the NNUE net is restricted. During reinforcement learning the NNUE net does not get new information, so the improvement of its playing strength is also restricted. I think the strength of SF+NNUE may be nearing its peak, and from now on it is the development of classical Stockfish that can improve SF+NNUE. But that is a much slower course than you might expect from the earlier results.

I don't think the playing strength of the NNUE net will be restricted:

- SFdev has 204 machines, 1464 cores, 1590.98M total nps and 1403 games/minute testing lots of tuning patches that bring more Elo nearly every day. Stockfish evaluations get more accurate day after day.

- NNUE nets are based on SF11 evaluations at 8 plies for now. A new batch of learning enhancements will bring SF12 evaluations into new NNUE nets, so they will necessarily be more powerful.

- All nets are based on 8-ply evaluations; just push evaluation and learning to 12 plies and the NNUE net will necessarily be more powerful.

SF12dev has already gained +7.2 Elo in just 1 week: https://tests.stockfishchess.org/tests/ ... c2a401eb7c. A new net has also been validated and embedded since SF12.

mwyoung · Post by **mwyoung** » Thu Sep 10, 2020 8:52 pm

mwyoung wrote: ↑Tue Sep 08, 2020 6:21 pm
mwyoung wrote: ↑Tue Sep 08, 2020 5:29 pm
Milos wrote: ↑Tue Sep 08, 2020 3:10 pm
mwyoung wrote: ↑Sat Sep 05, 2020 5:49 am
Chessqueen wrote: ↑Sat Sep 05, 2020 5:21 am
Dann Corbit wrote: ↑Thu Sep 03, 2020 1:55 am Stockfish nnue has a secret weapon. The Kamehameha blast. Of course, he has to go to level 5 before he can use it. You don't just go Kamehameha blasting stuff willy-nilly.
At the very end it will be LCZero vs. Stockfish NNUE, but I predict a very close encounter of the 3rd kind: LCZero from Planet 1140b vs. Stockfish NNUE from Planet Earth. Now I am more convinced than ever.
https://tcec-chess.com/live.html
I agree. I just played 200 games of Stockfish 12 vs. Lc0 0.26.2. Stockfish 12 won by only 24 Elo in 200 games at 3m+2s. And in past testing we can see how badly Stockfish NNUE has scaled at longer time controls.

Both are the best chess engines, and the winner may only be decided by hardware and time controls.

The sprinter Stockfish 12 vs. the marathon runner Lc0. Who wins the race may depend on the distance of the race!

Lc0 is clearly improving faster than Stockfish at this point in time, even at 3m+2s time controls, compared with past matches at the same time controls.
Result:
--------------------------------------------------------------------------
  #  name          games    wins   draws  losses   score    los%  elo+/-
  1. Stockfish 12    200      16     182       2   107.0   100.0    24.4
  2. Lc0 v0.26.2     200       2     182      16    93.0     0.0   -24.4

Cross table:
--------------------------------------------------------------------------
  #  name             score   games  1  2
  1. Stockfish 12     107.0     200  x  =====1==1===1====================1======1========11========================1==========================1============================================1====1===1=========1==0================1===1=1==0====
  2. Lc0 v0.26.2       93.0     200  =====0==0===0====================0======0========00========================0==========================0============================================0====0===0=========0==1================0===0=0==1====  x

Tech:
--------------------------------------------------------------------------

Tech (average nodes, depths, time/m per move, others per game), counted for computing moves only, ignored moves with zero nodes:
  #  name            nodes/m         NPS  depth/m   time/m    moves     time
  1. Stockfish 12    125173K    26565996     42.5      4.7     54.1    255.1
  2. Lc0 v0.26.2        101K       20342     10.0      4.9     54.1    267.2
     all ---          61216K    12984844     26.3      4.8     54.1    261.2
For years ppl come up with the BS theory that A/B engines tuned in micro-bullet would be weak in LTC and for years they are so bluntly proven wrong. Impact of eval on horizon effects is minimal and it doesn't change whether you search to depth 20 or depth 100. SF-NN search is SF and SF is proven to scale better than Lc0 (and as a matter of fact any MCTS engine) in LTC. Ergo SF-NN scales better than Lc0 in LTC.
Your claims are simply BS reflecting your cluelessness in the matter. You effectively draw conclusions from STC (just because it's not micro-bullet but blitz instead) with a sample size that is a joke.
The result in the superfinal will be much worse sweep than last year. And then ppl like you would be astonished and would come up with all kind of ridiculous excuses to justify what is basically their cluelessness.
The only one who is clueless here is you. I test at longer time controls as well as short ones, with 1 core and with up to 32 threads.

And I am not talking about A/B engines tested only at micro-bullet, and I never have. I am talking about NNUE! And my sample size is huge. This is not my only test. I test nonstop.

My conclusion is what the data is showing us, and if it changes, everyone will see that too. I test openly, and on video.

"SF-NN search is SF and SF is proven to scale better than Lc0"
"The result in the superfinal will be much worse sweep than last year. And then ppl like you would be astonished and would come up with all kind of ridiculous excuses to justify what is basically their cluelessness"

For reference here are the results of last season's superfinal...

TCEC Season 18 (May 2020 – Jul 2020): Stockfish 202006170741 vs. LCZero v0.25.1-svjio-t60-3972-mlh, +23 =61 -16
Wins: 23   Losses: 16   Draws: 61
Points/Games: 53.5 / 100
Winning percentage: 53.5
Elo difference: +24
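
The `los%` column in the 200-game match and the question of sample size can both be looked at through the likelihood of superiority (LOS). A commonly used approximation depends only on wins and losses; draws drop out. The sketch below uses that approximation with a hypothetical helper name, and it is not necessarily the exact formula the match tool used.

```python
import math

def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Approximate probability that the first engine is the stronger one.

    Normal approximation: LOS = 0.5 * (1 + erf((W - L) / sqrt(2 * (W + L)))).
    Requires at least one decisive game.
    """
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

# Stockfish 12 vs. Lc0 v0.26.2, 200 games at 3m+2s: 16 wins, 2 losses.
print(f"200-game match: LOS = {likelihood_of_superiority(16, 2):.1%}")
# Season 18 superfinal from Stockfish's side: 23 wins, 16 losses.
print(f"Superfinal:     LOS = {likelihood_of_superiority(23, 16):.1%}")
```

With so many draws, the decisive games carry all the information here, which is why a 200-game match with only 18 decisive results can still give a high LOS.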

Milos, when can we expect to see Stockfish 12's epic domination at TCEC at long time controls? Will it be any time soon?

https://tcec-chess.com/live.html

  #  name        version                        points
  1. Stockfish   202008260719_nn-82215d0fd0df     14.5
  2. LCZero      v0.26.2-rc1_J92-100              14.5
  3. AllieStein  v0.8-120f959_net-15.0            12.5
  4. Stoofvlees  II a14                           11.5
  5. ScorpioNN   3.0.8.3                          10.5
  6. Ethereal    12.43                            10
  7. Fire        8_beta                           10
  8. Komodo      2576.00                           9.5

RogerC · Post by **RogerC** » Sat Sep 26, 2020 8:52 pm

SF dev: +13.9 Elo vs. SF12 in just 19 days!

During this time there have been 2 new NNUE nets and lots of A/B search improvements.

SF NNUE development is the fastest of all versions of SF:

https://github.com/glinscott/fishtest/w ... sion-Tests

syzygy · Post by **syzygy** » Sat Sep 26, 2020 10:11 pm

corres wrote: ↑Thu Sep 10, 2020 10:16 am As the basis of the NNUE net is the Stockfish evaluation, the playing strength of the NNUE net is restricted.

Alpha Zero and LC0 both started out with zero, so I don't see what could restrict NNUE apart from the size and structure of the net.

During reinforcement learning the NNUE net does not get new information, so the improvement of its playing strength is also restricted.

It does get new information from the results of the searches on which it is trained.

How do you think Alpha Zero got and LC0 gets "new information"?
