The new NNUE net (nn-308..) seems to be weaker

corres · Post by **corres** » Sun Sep 06, 2020 12:56 am

chrisw wrote: ↑Sun Sep 06, 2020 12:37 am
mehmet123 wrote: ↑Sun Sep 06, 2020 12:31 am
Terje wrote: ↑Sat Sep 05, 2020 7:36 pm This thread is just spam, mods should remove it.
I think you are right. No score, no pgn games.
Someone tell me this is just a dream

I have been waiting for you, chrisw, together with your bad dream.

corres · Post by **corres** » Sun Sep 06, 2020 1:06 am

mehmet123 wrote: ↑Sun Sep 06, 2020 12:31 am
Terje wrote: ↑Sat Sep 05, 2020 7:36 pm This thread is just spam, mods should remove it.
I think you are right. No score, no pgn games.

Maybe you cannot read English?
The score was 3 : 1 for the nn-82215.. net in a test that consisted of 100 games. There were 96 draws (obviously).

corres · Post by **corres** » Sun Sep 06, 2020 1:19 am

Terje wrote: ↑Sat Sep 05, 2020 7:36 pm This thread is just spam, mods should remove it.

If it stings your eyes, ask them.
But you are not the person who decides what is "spam".
Nowhere is it written that attaching a PGN to a post is obligatory.

mehmet123 · Post by **mehmet123** » Sun Sep 06, 2020 1:24 am

corres wrote: ↑Sat Sep 05, 2020 9:56 pm
mehmet123 wrote: ↑Sat Sep 05, 2020 7:10 pm
corres wrote: ↑Sat Sep 05, 2020 3:15 pm I made a short test (100 games, TC 1 min + 2 sec/move) between SF+NNUE with nn-82215.. and SF+NNUE with nn-308.. and I got a result of 3 : 1 for nn-82215..
The number of games is relatively small, but watching the games the tendency is obvious.
What's the match result? Does 3:1 mean a 75% score against SV 1739 (nn-308d71810dff.nnue)?
For you maybe nothing.
But I made this test for me.

This is your first answer to my question. It was this strange answer that bothered me.

A few days ago I claimed that the SV 1705 net was stronger than the default net (SV 2257) according to my tests. In my tests SV 1705 was +2 Elo stronger than the default net. But the SV 1705 net failed on Fishtest (-3 Elo) in a 10 sec + 0.6 sec test. Then a new test was run, and in that test the SV 1705 net beat the default net on Fishtest at 60 sec + 0.6 TC (+1 Elo).

http://talkchess.com/forum3/viewtopic.p ... 8&start=90

corres · Post by **corres** » Sun Sep 06, 2020 1:49 am

mehmet123 wrote: ↑Sun Sep 06, 2020 1:24 am
corres wrote: ↑Sat Sep 05, 2020 9:56 pm
mehmet123 wrote: ↑Sat Sep 05, 2020 7:10 pm
corres wrote: ↑Sat Sep 05, 2020 3:15 pm I made a short test (100 games, TC 1 min + 2 sec/move) between SF+NNUE with nn-82215.. and SF+NNUE with nn-308.. and I got a result of 3 : 1 for nn-82215..
The number of games is relatively small, but watching the games the tendency is obvious.
What's the match result? Does 3:1 mean a 75% score against SV 1739 (nn-308d71810dff.nnue)?
For you maybe nothing.
But I made this test for me.
This is your first answer to my question. It was this strange answer that bothered me.

A few days ago I claimed that the SV 1705 net was stronger than the default net (SV 2257) according to my tests. In my tests SV 1705 was +2 Elo stronger than the default net. But the SV 1705 net failed on Fishtest (-3 Elo) in a 10 sec + 0.6 sec test. Then a new test was run, and in that test the SV 1705 net beat the default net on Fishtest at 60 sec + 0.6 TC (+1 Elo).

http://talkchess.com/forum3/viewtopic.p ... 8&start=90

A question:
From how many times 10000 games can you calculate that 1-2 Elo difference?
I know only the net labels used by the Stockfish developers, not the ones from Sergio.
So what are the SV-1705 and SV-2257 nets?

Alayan · Post by **Alayan** » Sun Sep 06, 2020 1:51 am

Choosing heads or tails and flipping a fair coin 4 times, there is a 31.25% probability of losing 0-4 or 1-3.

For an engine A with a long-term probability of winning X% of the decisive (non-draw) games against an engine B, the odds of losing 0-4 or 1-3 in a random sample of 4 decisive games are:

- 30% decisive games won, 70% lost => 65.2% of losing a 4-decisive games sample, 34.8% of winning or drawing.
- 40% decisive games won, 60% lost => 47.5% of losing a 4-decisive games sample, 52.5% of winning or drawing.
- 45% decisive games won, 55% lost => 39.1% of losing a 4-decisive games sample, 60.9% of winning or drawing.
- 50% decisive games won, 50% lost => 31.2% of losing a 4-decisive games sample, 68.8% of winning or drawing.
- 55% decisive games won, 45% lost => 24.1% of losing a 4-decisive games sample, 75.9% of winning or drawing.
- 60% decisive games won, 40% lost => 17.9% of losing a 4-decisive games sample, 82.1% of winning or drawing.
- 2 to 1 decisive game win ratio (~66.7% to ~33.3%) =>  11.1% of losing a 4-decisive games sample, 88.9% of winning or drawing.
- 70% decisive games won, 30% lost =>  8.4% of losing a 4-decisive games sample, 91.6% of winning or drawing.
- 80% decisive games won, 20% lost =>  2.7% of losing a 4-decisive games sample, 97.3% of winning or drawing.
- 90% decisive games won, 10% lost =>  0.4% of losing a 4-decisive games sample, 99.6% of winning or drawing.
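
The percentages in the list above follow from the binomial distribution. A minimal Python sketch, assuming decisive games are independent with a fixed win probability (the name `prob_lose_sample` is made up for illustration):

```python
from math import comb

def prob_lose_sample(p_win: float, n: int = 4) -> float:
    """Probability of scoring fewer wins than losses in n decisive games.

    Assumes independent games, each won with probability p_win.
    For n = 4 this is the chance of a 0-4 or 1-3 result.
    """
    return sum(
        comb(n, k) * p_win**k * (1 - p_win) ** (n - k)
        for k in range(n + 1)
        if 2 * k < n  # strictly fewer wins than losses
    )

# Reproduce the list for a 4-game sample of decisive games.
for p in (0.30, 0.40, 0.45, 0.50, 0.55, 0.60, 2 / 3, 0.70, 0.80, 0.90):
    lose = prob_lose_sample(p)
    print(f"{p:6.1%} won => {lose:5.1%} losing, {1 - lose:5.1%} winning or drawing")
```

The two columns always add up to 100%, which is a quick sanity check on any row.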

From your 3-1 results, you can conclude with high confidence that the old net has a double-digit percent chance of winning a decisive game against the new net, and that's it.

Claiming that because it's your test it's enough for you to believe the new net is weaker is missing the point.

You can believe all the BS you want, but as soon as you share it in a forum thread, you open yourself to criticism. Others just skimming through thread titles or not well-versed in statistics might give credit to an outright falsehood. It's not acceptable to spread disinformation even if you didn't mean to harm.

MikeB · Post by **MikeB** » Sun Sep 06, 2020 2:14 am

Time to play nice - let’s hit the pause button. Thanks.

corres · Post by **corres** » Sun Sep 06, 2020 9:39 am

Alayan wrote: ↑Sun Sep 06, 2020 1:51 am ...
You can believe all the BS you want, but as soon as you share it in a forum thread, you open yourself to criticism. Others just skimming through thread titles or not well-versed in statistics might give credit to an outright falsehood. It's not acceptable to spread disinformation even if you didn't mean to harm.

I take all criticism kindly, but since I was the one who stated that my 100-game test is too small to prove that nn-308... is the weaker net, every "criticism" is no more than an evil-minded attack against me. A typical example of this is the post by Terje, who mixes a political site (TCF) into a kind of technical forum.
Maybe he does not like my sentence "I am not a polcorrect man. I used to say the sincere". Yes, Terje, I am not a politically correct man, and I say what I sincerely think, whether you like it, or me, or not.
Chrisw knows this about me, and that is why he feels the need to stir up wind here.

yurikvelo · Post by **yurikvelo** » Sun Sep 06, 2020 9:44 am

corres wrote: ↑Sun Sep 06, 2020 12:20 am the result obtained will not show the strength difference between the two nets? I think it will show.

The measured result is +1.06 Elo @ STC and +4.23 Elo @ LTC.

To measure a difference as small as 1.06 Elo, 108328 games were played.
Fishtest plays games in batches of 200; 554 batches of 200 games each were played.

238 batches (200 games each) out of 554 had MORE wins for the older (weaker) net.
238/554 = 43% = the expected probability that, in your particular 200-game run, the weaker version will get more wins.

32 batches (200 games each) had [Loss-Wins > 10].
The weaker net won 32 series (200 games each) by more than 10 net wins!

In 2 runs (200 games each) the weaker net won by a margin of 20 games:
-20 +40 =140
-12 +32 =156

Impressive -35 Elo regression?!
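
The -35 figure can be checked with the plain logistic Elo formula. A minimal sketch, assuming the two batch lines read as 20 wins, 40 losses, 140 draws and 12 wins, 32 losses, 156 draws from the newer net's side (`elo_from_wdl` is a hypothetical helper, not a Fishtest function, and Fishtest's own Elo estimate may be computed differently):

```python
from math import log10

def elo_from_wdl(wins: int, losses: int, draws: int) -> float:
    """Elo difference implied by a match score under the logistic Elo model."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # fraction of points scored
    if not 0.0 < score < 1.0:
        raise ValueError("Elo difference is unbounded for a 0% or 100% score")
    return -400.0 * log10(1.0 / score - 1.0)

# The two outlier batches, seen from the newer net's side (assumed reading).
for wins, losses, draws in [(20, 40, 140), (12, 32, 156)]:
    elo = elo_from_wdl(wins, losses, draws)
    print(f"+{wins} -{losses} ={draws}: {elo:+.1f} Elo")
```

Both batches score 45%, which works out to roughly -35 Elo even though the full run measured a small gain.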

corres · Post by **corres** » Sun Sep 06, 2020 9:57 am

yurikvelo wrote: ↑Sun Sep 06, 2020 9:44 am
corres wrote: ↑Sun Sep 06, 2020 12:20 am the result obtained will not show the strength difference between the two nets? I think it will show.
The measured result is +1.06 Elo @ STC and +4.23 Elo @ LTC.

To measure a difference as small as 1.06 Elo, 108328 games were played.
Fishtest plays games in batches of 200; 554 batches of 200 games each were played.

238 batches (200 games each) out of 554 had MORE wins for the older (weaker) net.
238/554 = 43% = the expected probability that, in your particular 200-game run, the weaker version will get more wins.

32 batches (200 games each) had [Loss-Wins > 10].
The weaker net won 32 series (200 games each) by more than 10 net wins!

In 2 runs (200 games each) the weaker net won by a margin of 20 games:
-20 +40 =140
-12 +32 =156

OK, but all these test games were not played on my machine, nor with the starting positions I used, nor at my time control.
And, most importantly, no one can decide which net I must use. If somebody cannot agree with my verdict on that net, he can ask for whichever opinion he likes. That is all.
For my part, this stupid debate is over. Period.
