Are we sure that Stockfish NNUE is better than the Normal Stockfish ?

Madeleine Birchfield · Wed Sep 30, 2020 3:40 pm

Vinvin wrote: ↑Wed Sep 30, 2020 11:51 am "As chess is drawish by nature."
Note 1: this is a belief, not a proven fact. We have seen this kind of claim at different times:
- In the 1970s, when Russian GMs made more and more draws.
- When Rybka (around 2010) was dominating the chess scene, the number of draws was rising so much that many people said we had reached a limit where only draws were possible.
- and so on ...
But now top engines are 500 Elo above Rybka and 1000 Elo above the level of play of the 1970s.

Chess is drawish at the highest level with very long time controls. See correspondence chess.

Chess in fast time controls is not drawish at all, filled with many wins and losses. So is chess with weak players, or chess with a huge strength differential between the players.

Chessqueen · Mon Oct 05, 2020 5:34 pm

Laskos wrote: ↑Wed Sep 02, 2020 10:42 am
DrCliche wrote: ↑Wed Sep 02, 2020 8:46 am Sure enough that three draws (lol) at TCEC shouldn't measurably change our extreme confidence, yes.
Just a different worldview and perceptions. I guess the priors of both of you have a similar mean but very different widths. Both of you expected SF NNUE to be stronger than SF Classic by a similar margin, say 124 Elo points, but with different emotional and mental attitudes toward your confidence in that.

I pictured the priors here:

After these 3 TCEC draws, the posterior estimate of the difference between SF NNUE and SF Classic dropped in his case from the a priori 124 Elo points to 32 Elo points, and in your case from the same a priori 124 Elo points to 113. So he is legit in asking "Are we sure that Stockfish NNUE is better than the Normal Stockfish ?" and you are right in replying "Sure enough that three draws (lol) at TCEC shouldn't measurably change our extreme confidence, yes."
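
To make the point about prior widths concrete, here is a minimal sketch of such an update. It is not the calculation from the quoted post: the prior shapes and the draw model used there are not given, so the Gaussian priors, the sigma values (10 and 100 Elo), the BayesElo-style draw model and the `draw_elo=200` parameter are all assumptions chosen only for illustration. The names `draw_probability` and `posterior_mean` are hypothetical helpers.

```python
import math

def expected(x):
    """Logistic Elo curve: expected score for an advantage of x Elo."""
    return 1.0 / (1.0 + 10.0 ** (-x / 400.0))

def draw_probability(diff, draw_elo):
    """Draw chance under a BayesElo-style model (an assumed draw model)."""
    p_win = expected(diff - draw_elo)
    p_loss = expected(-diff - draw_elo)
    return 1.0 - p_win - p_loss

def posterior_mean(prior_mean, prior_sigma, n_draws, draw_elo=200.0):
    """Posterior mean of the Elo difference after n_draws draws.

    Uses a Gaussian prior and a brute-force sum over a 1-Elo grid.
    """
    num = 0.0
    den = 0.0
    for diff in range(-1500, 1501):
        prior = math.exp(-0.5 * ((diff - prior_mean) / prior_sigma) ** 2)
        weight = prior * draw_probability(diff, draw_elo) ** n_draws
        num += diff * weight
        den += weight
    return num / den

# Same prior mean (124 Elo), very different widths (assumed values).
narrow = posterior_mean(124.0, 10.0, 3)
wide = posterior_mean(124.0, 100.0, 3)
print(f"narrow prior: {narrow:.1f} Elo, wide prior: {wide:.1f} Elo")
```

With these assumptions the narrow prior barely moves after three draws while the wide prior is pulled noticeably toward zero, which is the qualitative effect described above. The exact figures depend entirely on the assumed widths and draw model.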

Can you post a better photo of yourself? You can probably find a better one from when you were a child.

Chessqueen · Mon Oct 05, 2020 5:37 pm

Madeleine Birchfield wrote: ↑Wed Sep 30, 2020 3:40 pm
Chess is drawish at the highest level with very long time controls. See correspondence chess.

Chess in fast time controls is not drawish at all, filled with many wins and losses. So is chess with weak players, or chess with a huge strength differential between the players.

Stockfish NNUE is NOT really dominating LCZero

==> https://tcec-chess.com/live.html

mwyoung · Mon Oct 05, 2020 5:49 pm

mwyoung wrote: ↑Tue Sep 08, 2020 5:29 pm
Milos wrote: ↑Tue Sep 08, 2020 3:10 pm
mwyoung wrote: ↑Sat Sep 05, 2020 5:49 am
Chessqueen wrote: ↑Sat Sep 05, 2020 5:21 am
Dann Corbit wrote: ↑Thu Sep 03, 2020 1:55 am Stockfish nnue has a secret weapon. The Kamehameha blast. Of course, he has to go to level 5 before he can use it. You don't just go Kamehameha blasting stuff willy-nilly.
At the very end it will be LCZero vs. Stockfish NNUE, but I predict a very close encounter of the 3rd kind: LCZero from Planet 1140b vs. Stockfish NNUE from Planet Earth. Now I am more convinced than ever.
https://tcec-chess.com/live.html
I agree. I just played 200 games of Stockfish 12 vs. Lc0 v0.26.2. Stockfish 12 won by only 24 Elo in 200 games at 3m+2s. And in past testing we can see how badly Stockfish NNUE has scaled at longer time controls.

Both are the best chess engines, and the winner may only be decided by hardware and time controls.

The sprinter Stockfish 12 vs. the marathon runner Lc0: who wins the race may depend on the distance of the race!

Lc0 is clearly improving faster than Stockfish at this point in time, even at 3m+2s compared with past matches at the same time control.
Result:
--------------------------------------------------------------------------
  #  name          games    wins   draws  losses   score    los%  elo+/-
  1. Stockfish 12    200      16     182       2   107.0   100.0    24.4
  2. Lc0 v0.26.2     200       2     182      16    93.0     0.0   -24.4

Cross table:
--------------------------------------------------------------------------
  #  name             score   games  results vs. opponent
  1. Stockfish 12     107.0     200  =====1==1===1====================1======1========11========================1==========================1============================================1====1===1=========1==0================1===1=1==0====
  2. Lc0 v0.26.2       93.0     200  =====0==0===0====================0======0========00========================0==========================0============================================0====0===0=========0==1================0===0=0==1====

Tech (average nodes, depths, time/m per move, others per game), counted for computing moves only, ignored moves with zero nodes:
--------------------------------------------------------------------------
  #  name            nodes/m         NPS  depth/m   time/m    moves     time
  1. Stockfish 12    125173K    26565996     42.5      4.7     54.1    255.1
  2. Lc0 v0.26.2        101K       20342     10.0      4.9     54.1    267.2
     all ---          61216K    12984844     26.3      4.8     54.1    261.2
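
For readers who want to check the summary line, the following sketch recomputes the Elo difference and the likelihood of superiority (LOS) from the win/draw/loss counts in the table. The logistic Elo formula and the erf-based LOS formula are the commonly used ones; whether the match tool uses exactly these is an assumption. The 95% interval is a plain normal approximation added here for illustration, and `match_summary` is a hypothetical helper name.

```python
import math

def elo_from_score(score_fraction):
    """Elo difference implied by a score fraction (logistic model)."""
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)

def match_summary(wins, draws, losses):
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    # Half-width of an approximate 95% interval on the score fraction.
    margin = 1.96 * math.sqrt(variance / games)
    # Likelihood of superiority; draws do not enter this formula.
    los = 0.5 * (1.0 + math.erf((wins - losses)
                                / math.sqrt(2.0 * (wins + losses))))
    return {
        "elo": elo_from_score(score),
        "elo_low": elo_from_score(score - margin),
        "elo_high": elo_from_score(score + margin),
        "los": los,
    }

# Counts taken from the result table above (Stockfish 12's side).
summary = match_summary(wins=16, draws=182, losses=2)
print({key: round(value, 3) for key, value in summary.items()})
```

The Elo figure matches the 24.4 in the table, and under the normal approximation the interval comes out at roughly +10 to +39 Elo. That shows what a 200-game sample can and cannot resolve at this draw rate.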
For years people have come up with the BS theory that A/B engines tuned in micro-bullet would be weak in LTC, and for years they have been bluntly proven wrong. The impact of eval on horizon effects is minimal, and it doesn't change whether you search to depth 20 or depth 100. SF-NN search is SF and SF is proven to scale better than Lc0 (and as a matter of fact any MCTS engine) in LTC. Ergo SF-NN scales better than Lc0 in LTC.
Your claims are simply BS reflecting your cluelessness in the matter. You effectively draw conclusions from STC (just because it's blitz instead of micro-bullet) with a sample size that is a joke.
The result in the superfinal will be a much worse sweep than last year. And then people like you will be astonished and will come up with all kinds of ridiculous excuses to justify what is basically their cluelessness.
The only one who is clueless here is you. I test at longer time controls as well as short ones, with anything from 1 core up to 32 threads.

And I am not talking about A/B engines tested only at micro-bullet, and I never have. I am talking about NNUE! And my sample size is huge. This is not my only test. I test non-stop.

My conclusion is what the data is showing us, and if it changes, everyone will see that too. I test openly, and on video.

"SF-NN search is SF and SF is proven to scale better than Lc0"

Milos, Lc0 looks good so far, as expected.

mwyoung · Mon Oct 05, 2020 5:56 pm

mwyoung wrote: ↑Mon Oct 05, 2020 5:49 pm
Milos, Lc0 looks good so far, as expected.

Lc0 takes the lead after 33 games.

Jouni · Mon Oct 05, 2020 6:26 pm

Funny ending where SF sees that it is losing long before Lc0 sees that it is winning! Should SF switch to classic entirely in blitz?

Cornfed · Mon Oct 05, 2020 11:47 pm

mwyoung wrote: ↑Mon Oct 05, 2020 5:56 pm

Lc0 takes the lead after 33 games.

And as quick as the blink of an eye, order is restored... see game 34.

mwyoung · Tue Oct 06, 2020 1:36 am

Cornfed wrote: ↑Mon Oct 05, 2020 11:47 pm
mwyoung wrote: ↑Mon Oct 05, 2020 5:56 pm

Lc0 takes the lead after 33 games.
And as quick as the blink of an eye, order is restored... see game 34.

I know. But that is not the point. Milos said SF would crush Lc0. Worse than last year. My testing said no.

Stockfish and Lc0, despite the hype, are very close in strength at LTC.

And the reason for this is that TCEC uses biased openings. If a match shows a win for both engines in the same line, it is a busted opening. But we already know, in TCEC's own words, that wins are more important for views. So this is expected.
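
The "busted opening" rule above can be written down as a small check on a reversed-colour game pair. This is only a sketch of that rule as stated in the post, not anything TCEC publishes. The function name `classify_opening` and the three labels are made up for illustration, and results are assumed to be PGN result strings from White's side.

```python
def classify_opening(game1_result, game2_result):
    """Classify a reversed-colour game pair played from one opening.

    Results are PGN strings from White's side: "1-0", "0-1" or "1/2-1/2".
    The engines swap colours between the two games, so the same colour
    winning twice means each engine won once from the same line.
    """
    decisive = {"1-0", "0-1"}
    if game1_result in decisive and game1_result == game2_result:
        return "busted"       # the same side of the board won both games
    if game1_result == game2_result:
        return "balanced"     # two draws
    return "informative"      # the pair separates the engines

print(classify_opening("1-0", "1-0"))          # both engines won as White
print(classify_opening("1-0", "1/2-1/2"))      # one engine won, then held
```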

Nay Lin Tun · Tue Oct 06, 2020 3:47 am

mwyoung wrote: ↑Tue Oct 06, 2020 1:36 am
Cornfed wrote: ↑Mon Oct 05, 2020 11:47 pm
mwyoung wrote: ↑Mon Oct 05, 2020 5:56 pm

Lc0 takes the lead after 33 games.
And as quick as the blink of an eye, order is restored... see game 34.
I know. But that is not the point. Milos said SF would crush Lc0. Worse than last year. My testing said no.

Stockfish and Lc0, despite the hype, are very close in strength at LTC.

And the reason for this is that TCEC uses biased openings. If a match shows a win for both engines in the same line, it is a busted opening. But we already know, in TCEC's own words, that wins are more important for views. So this is expected.

It has become pretty normal these days.

The public can easily follow Stockfish's progress through fishtest and regression tests.

Meanwhile, Leela's progress is available only on the Leela Discord, and the results from various individual testers are hard to interpret.

"If you believe SF gained +100 Elo recently, you have to believe Leela gained +100 Elo recently."

In fact, neither is true.

corres · Tue Oct 06, 2020 11:06 am

Madeleine Birchfield wrote: ↑Wed Sep 30, 2020 3:40 pm ...
Chess is drawish at the highest level with very long time controls. See correspondence chess.
Chess in fast time controls is not drawish at all, filled with many wins and losses. So is chess with weak players, or chess with a huge strength differential between the players.

I totally agree with you.
