The new NNUE-net (nn-308..) seems being weaker

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: The new NNUE-net (nn-308..) seems being weaker

Post by corres »

chrisw wrote: Sun Sep 06, 2020 12:37 am
mehmet123 wrote: Sun Sep 06, 2020 12:31 am
Terje wrote: Sat Sep 05, 2020 7:36 pm This thread is just spam, mods should remove it.
I think you are right. No score, no pgn games.
Someone tell me this is just a dream
I was waiting for you, chrisw, together with your bad dream.
Last edited by corres on Sun Sep 06, 2020 1:22 am, edited 1 time in total.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: The new NNUE-net (nn-308..) seems being weaker

Post by corres »

mehmet123 wrote: Sun Sep 06, 2020 12:31 am
Terje wrote: Sat Sep 05, 2020 7:36 pm This thread is just spam, mods should remove it.
I think you are right. No score, no pgn games.
Maybe you cannot read English?
The score was 3 : 1 in favor of the nn-82215.. net in a test that consisted of 100 games. There were 96 draws (obviously).
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: The new NNUE-net (nn-308..) seems being weaker

Post by corres »

Terje wrote: Sat Sep 05, 2020 7:36 pm This thread is just spam, mods should remove it.
If it hurts your eyes, ask them.
But you are not the person who decides what is "spam".
Nowhere is it written that attaching a PGN to a post is obligatory.
mehmet123
Posts: 699
Joined: Sun Jan 26, 2020 10:38 pm
Location: Turkey
Full name: Mehmet Karaman

Re: The new NNUE-net (nn-308..) seems being weaker

Post by mehmet123 »

corres wrote: Sat Sep 05, 2020 9:56 pm
mehmet123 wrote: Sat Sep 05, 2020 7:10 pm
corres wrote: Sat Sep 05, 2020 3:15 pm I made a short test (100 games, TC 1 min + 2 sec/move) between SF+NNUE with nn-82215.. and SF+NNUE with nn-308.., and I got a result of 3 : 1 for nn-82215..
The number of games is relatively small, but watching the games, the tendency is obvious.
What is the match result? Does 3:1 mean a 75% score against SV 1739 (nn-308d71810dff.nnue)?
For you, maybe nothing.
But I made this test for myself.
This is your first answer to my question. It was this strange answer that bothered me.

A few days ago I claimed that the SV 1705 net was stronger than the default net (SV 2257) according to my tests. In my tests SV 1705 was +2 Elo stronger than the default net. But the SV 1705 net failed at Fishtest (-3 Elo) in the 10 sec + 0.6 sec test. Then a new test was run, and in that test the SV 1705 net beat the default net at 60 sec + 0.6 sec TC on Fishtest (+1 Elo).

http://talkchess.com/forum3/viewtopic.p ... 8&start=90
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: The new NNUE-net (nn-308..) seems being weaker

Post by corres »

mehmet123 wrote: Sun Sep 06, 2020 1:24 am
corres wrote: Sat Sep 05, 2020 9:56 pm
mehmet123 wrote: Sat Sep 05, 2020 7:10 pm
corres wrote: Sat Sep 05, 2020 3:15 pm I made a short test (100 games, TC 1 min + 2 sec/move) between SF+NNUE with nn-82215.. and SF+NNUE with nn-308.., and I got a result of 3 : 1 for nn-82215..
The number of games is relatively small, but watching the games, the tendency is obvious.
What is the match result? Does 3:1 mean a 75% score against SV 1739 (nn-308d71810dff.nnue)?
For you, maybe nothing.
But I made this test for myself.
This is your first answer to my question. It was this strange answer that bothered me.

A few days ago I claimed that the SV 1705 net was stronger than the default net (SV 2257) according to my tests. In my tests SV 1705 was +2 Elo stronger than the default net. But the SV 1705 net failed at Fishtest (-3 Elo) in the 10 sec + 0.6 sec test. Then a new test was run, and in that test the SV 1705 net beat the default net at 60 sec + 0.6 sec TC on Fishtest (+1 Elo).

http://talkchess.com/forum3/viewtopic.p ... 8&start=90
A question:
From how many times 10000 games can you calculate such a 1-2 Elo difference?
I know only the net names used by the Stockfish developers, not those from Sergio.
So what are the SV-1705 and SV-2257 nets?
Alayan
Posts: 550
Joined: Tue Nov 19, 2019 8:48 pm
Full name: Alayan Feh

Re: The new NNUE-net (nn-308..) seems being weaker

Post by Alayan »

Choosing heads or tails and flipping a fair coin 4 times, there is a 31.25% probability to lose 0-4 or 1-3.

For an engine A with a long-term probability of winning X% of the decisive (non-draw) games against an engine B, the odds of losing 0-4 or 1-3 in a random sample of 4 decisive games are:

- 30% decisive games won, 70% lost => 65.2% of losing a 4-decisive games sample, 34.8% of winning or drawing.
- 40% decisive games won, 60% lost => 47.5% of losing a 4-decisive games sample, 52.5% of winning or drawing.
- 45% decisive games won, 55% lost => 39.1% of losing a 4-decisive games sample, 60.9% of winning or drawing.
- 50% decisive games won, 50% lost => 31.2% of losing a 4-decisive games sample, 68.8% of winning or drawing.
- 55% decisive games won, 45% lost => 24.1% of losing a 4-decisive games sample, 75.9% of winning or drawing.
- 60% decisive games won, 40% lost => 17.9% of losing a 4-decisive games sample, 82.1% of winning or drawing.
- 2 to 1 decisive game win ratio (~66.7% to ~33.3%) =>  11.1% of losing a 4-decisive games sample, 88.9% of winning or drawing.
- 70% decisive games won, 30% lost =>  8.4% of losing a 4-decisive games sample, 91.6% of winning or drawing.
- 80% decisive games won, 20% lost =>  2.7% of losing a 4-decisive games sample, 97.3% of winning or drawing.
- 90% decisive games won, 10% lost =>  0.4% of losing a 4-decisive games sample, 99.6% of winning or drawing.
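Figures of this kind can be reproduced with a short binomial calculation. The following is only a minimal sketch, assuming each decisive game is an independent trial with a fixed win probability; the function name `prob_lose_sample` is made up for illustration.

```python
from math import comb


def prob_lose_sample(p_win: float, n: int = 4) -> float:
    """Probability that engine A wins fewer than half of n decisive games.

    Assumes independent games with a constant win probability p_win
    (an idealisation, not something measured in this thread).
    """
    q = 1.0 - p_win
    # Add up the binomial terms for every win count k that loses the sample,
    # i.e. k wins against n - k losses with k < n - k (0-4 and 1-3 for n = 4).
    return sum(
        comb(n, k) * p_win**k * q ** (n - k)
        for k in range(n + 1)
        if 2 * k < n
    )


if __name__ == "__main__":
    for p in (0.30, 0.40, 0.45, 0.50, 0.55, 0.60, 2 / 3, 0.70, 0.80, 0.90):
        lose = prob_lose_sample(p)
        print(f"{p:6.1%} won => {lose:5.1%} losing, {1 - lose:5.1%} winning or drawing")
```

With `p_win = 0.5` this gives the 31.25% coin-flip figure quoted at the top of the post.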
From your 3-1 results, you can conclude with high confidence that the old net has a double-digit percent chance of winning a decisive game against the new net, and that's it.

Claiming that because it's your test it's enough for you to believe the new net is weaker is missing the point.

You can believe all the BS you want, but as soon as you share it in a forum thread, you open yourself to criticism. Others just skimming through thread titles or not well-versed in statistics might give credit to an outright falsehood. It's not acceptable to spread disinformation even if you didn't mean to harm.
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: The new NNUE-net (nn-308..) seems being weaker

Post by MikeB »

Time to play nice - let’s hit the pause button. Thanks.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: The new NNUE-net (nn-308..) seems being weaker

Post by corres »

Alayan wrote: Sun Sep 06, 2020 1:51 am ...
You can believe all the BS you want, but as soon as you share it in a forum thread, you open yourself to criticism. Others just skimming through thread titles or not well-versed in statistics might give credit to an outright falsehood. It's not acceptable to spread disinformation even if you didn't mean to harm.
I take every criticism kindly, but since I was the one who stated that my 100-game test is too small to prove that nn-308... is the weaker net, every "criticism" is no more than a malicious attack against me. A typical example of this is the post by Terje, who mixes a political site (TCF) into a technical forum.
Maybe he does not like my sentence "I am not a politically correct man. I tend to speak sincerely". Yes, Terje, I am not a politically correct man, and I tend to speak sincerely, whether you like it or me, or not.
Chrisw knows this about me; this is why he feels the need to stir things up here.
Last edited by corres on Sun Sep 06, 2020 10:02 am, edited 2 times in total.
yurikvelo
Posts: 710
Joined: Sat Dec 06, 2014 1:53 pm

Re: The new NNUE-net (nn-308..) seems being weaker

Post by yurikvelo »

corres wrote: Sun Sep 06, 2020 12:20 am will the obtained result not show the strength difference between the two nets? I think it will.
measured result is +1.06 ELO @ STC and +4.23 ELO @ LTC

To measure a difference as small as 1.06 ELO, 108328 games were played.
Fishtest plays games in batches of 200. 554 batches of 200 games each were played.

238 batches (200 games each) out of 554 had MORE wins for older (weaker) NET.
238/554= 43% = expected probability that in your particular 200-game run weaker version will receive more wins.


32 batches (200 games each) had [Loss-Wins > 10]
The weaker net won 32 series (200 games each) by more than 10 net wins!

In 2 runs (200 games each) weaker net won by a margin of 20 games:
-20 +40 =140
-12 +32 =156

Impressive -35 ELO regression?!
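
The -35 figure follows from the usual logistic Elo formula applied to a single batch. A minimal sketch, assuming the batch is read as 20 wins, 40 losses and 140 draws from the new net's point of view (an interpretation of the notation above), and using a normal approximation for the error margin, which is rough when there are only a handful of decisive games. The function names are made up for illustration.

```python
from math import log, log10, sqrt


def elo_from_wld(wins: int, losses: int, draws: int) -> float:
    """Elo difference implied by a match score (logistic model)."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    return -400.0 * log10(1.0 / score - 1.0)


def elo_margin_95(wins: int, losses: int, draws: int) -> float:
    """Approximate 95% error margin of the Elo estimate, in Elo points.

    Uses the per-game score variance and a normal approximation;
    this is an assumption of the sketch, not Fishtest's own statistics.
    """
    n = wins + losses + draws
    w, d = wins / n, draws / n
    score = w + 0.5 * d
    variance = w + 0.25 * d - score**2           # variance of one game's score
    se_score = sqrt(variance / n)                # standard error of the mean score
    slope = 400.0 / (log(10) * score * (1.0 - score))  # d(Elo)/d(score)
    return 1.96 * slope * se_score


if __name__ == "__main__":
    # Two 200-game batches from the post, then the 100-game test (3 wins, 1 loss, 96 draws).
    for w, l, d in ((20, 40, 140), (12, 32, 156), (3, 1, 96)):
        print(f"+{w} -{l} ={d}: {elo_from_wld(w, l, d):+.1f} Elo, "
              f"+/- {elo_margin_95(w, l, d):.1f} (95%)")
```

For the 100-game test discussed in this thread, the error margin comes out larger than the estimated Elo difference itself.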
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: The new NNUE-net (nn-308..) seems being weaker

Post by corres »

yurikvelo wrote: Sun Sep 06, 2020 9:44 am
corres wrote: Sun Sep 06, 2020 12:20 am will the obtained result not show the strength difference between the two nets? I think it will.
measured result is +1.06 ELO @ STC and +4.23 ELO @ LTC

To measure a difference as small as 1.06 ELO, 108328 games were played.
Fishtest plays games in batches of 200. 554 batches of 200 games each were played.

238 batches (200 games each) out of 554 had MORE wins for older (weaker) NET.
238/554= 43% = expected probability that in your particular 200-game run weaker version will receive more wins.


32 batches (200 games each) had [Loss-Wins > 10]
The weaker net won 32 series (200 games each) by more than 10 net wins!

In 2 runs (200 games each) weaker net won by a margin of 20 games:
-20 +40 =140
-12 +32 =156
OK, but all these test games were not played on my machine, not from the starting positions I used, and not at my time control.
And, most importantly, no one can decide which net I must use. If somebody cannot agree with my statement about that net, he can ask for whatever opinion he likes. That is all.
From my side this stupid debate is over. Full stop.