The new NNUE-net (nn-308..) seems being weaker

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

yurikvelo
Posts: 710
Joined: Sat Dec 06, 2014 1:53 pm

Re: The new NNUE-net (nn-308..) seems being weaker

Post by yurikvelo »

OK, but all those test games were not played on my machine, not with the starting positions I used, and not at my move time.
Still, there is a 43% chance for the weaker net to get more wins.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: The new NNUE-net (nn-308..) seems being weaker

Post by corres »

yurikvelo wrote: Sun Sep 06, 2020 10:26 am
OK, but all those test games were not played on my machine, not with the starting positions I used, and not at my move time.
Still, there is a 43% chance for the weaker net to get more wins.
So use it in good health...
chrisw
Posts: 4910
Joined: Tue Apr 03, 2012 4:28 pm
Location: Anywhere but the Western Empire
Full name: Christopher Whittington

Re: The new NNUE-net (nn-308..) seems being weaker

Post by chrisw »

Alayan wrote: Sun Sep 06, 2020 1:51 am Choosing heads or tails and flipping a fair coin 4 times, there is a 31.25% probability to lose 0-4 or 1-3.

For an engine A having a long-term probability of winning X% of the decisive (non-draw) games against an engine B, odds of losing 0-4 or 1-3 in a random sample of 4 decisive games :


- 30% decisive games won, 70% lost => 65.2% of losing a 4-decisive games sample, 34.8% of winning or drawing.
- 40% decisive games won, 60% lost => 47.5% of losing a 4-decisive games sample, 52.5% of winning or drawing.
- 45% decisive games won, 55% lost => 39.1% of losing a 4-decisive games sample, 60.9% of winning or drawing.
- 50% decisive games won, 50% lost => 31.2% of losing a 4-decisive games sample, 68.8% of winning or drawing.
- 55% decisive games won, 45% lost => 24.1% of losing a 4-decisive games sample, 75.9% of winning or drawing.
- 60% decisive games won, 40% lost => 17.9% of losing a 4-decisive games sample, 82.1% of winning or drawing.
- 2 to 1 decisive game win ratio (~66.7% to ~33.3%) =>  11.1% of losing a 4-decisive games sample, 88.9% of winning or drawing.
- 70% decisive games won, 30% lost =>  8.4% of losing a 4-decisive games sample, 91.6% of winning or drawing.
- 80% decisive games won, 20% lost =>  2.7% of losing a 4-decisive games sample, 97.3% of winning or drawing.
- 90% decisive games won, 10% lost =>  0.4% of losing a 4-decisive games sample, 99.6% of winning or drawing.
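
The figures in this table follow from the binomial distribution: the sample is lost when the engine wins 0 or 1 of the 4 decisive games. Below is a minimal sketch for reproducing them, assuming each decisive game is independent and has the same win probability. The function name `prob_losing_sample` is illustrative and not taken from the thread.

```python
from math import comb


def prob_losing_sample(p_win: float, games: int = 4) -> float:
    """Probability of winning fewer than half of `games` decisive games.

    Assumes independent games with a fixed win probability `p_win`.
    For 4 games this is P(0 wins) + P(1 win), i.e. losing 0-4 or 1-3.
    """
    return sum(
        comb(games, k) * p_win**k * (1 - p_win) ** (games - k)
        for k in range(games + 1)
        if 2 * k < games  # strictly fewer wins than losses
    )


if __name__ == "__main__":
    for p in (0.30, 0.40, 0.45, 0.50, 0.55, 0.60, 2 / 3, 0.70, 0.80, 0.90):
        lose = prob_losing_sample(p)
        print(f"{p:6.1%} won => {lose:5.1%} losing, {1 - lose:5.1%} winning or drawing")
```

For a 50% win rate this gives 5/16 = 31.25%, the coin-flip figure quoted at the top of the post.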
From your 3-1 results, you can conclude with high confidence that the old net has a double-digit percent chance of winning a decisive game against the new net, and that's it.

Claiming that because it's your test it's enough for you to believe the new net is weaker is missing the point.

You can believe all the BS you want, but as soon as you share it in a forum thread, you open yourself to criticism.
Sure, but not to the extent of being accused of “spam” nor to the extent of having his post deleted.

Others just skimming through thread titles or not well-versed in statistics might give credit to an outright falsehood.
Outright falsehood is a wild exaggeration. Unless he actually made the numbers up, which is a) extremely unlikely and b) completely unprovable, even the word “falsehood” is inappropriate. Numbers are numbers, the data is what it is, results are results.

It's not acceptable to spread disinformation even if you didn't mean to harm.
Information is not disinformation. The data is what it is. He also qualified it with size of data btw.

All the comments being made, yours included, are wildly critical, unsustainable without also asserting Corres is a liar in this case, and are, in the grand scheme of things, basically a giant so-what. A very hostile and threatening thread on one person, corres, has been generated and kept going by several people. It’s not very pleasant. And for what? The status of a set of network weights?

A 10000 run says X, a 100 run says Y. This does not mean that an informational post about a 100 run should be banned.
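
To put rough numbers on how far a 100-game run and a 10000-game run can disagree, here is a minimal sketch of the approximate 95% margin of error on an observed win rate. It rests on stated simplifications: games are independent, draws are ignored (every game is treated as a win or a loss), and the normal approximation to the binomial is used. The function name `margin_of_error` is illustrative only.

```python
from math import sqrt


def margin_of_error(games: int, p_win: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% half-width of the observed win rate over `games` games.

    Normal approximation to the binomial; draws are ignored (an assumption).
    """
    return z * sqrt(p_win * (1 - p_win) / games)


if __name__ == "__main__":
    for n in (100, 10_000):
        print(f"{n:>6} games: +/- {margin_of_error(n):.1%}")
```

Under these assumptions the margin is roughly 10 percentage points for 100 games and roughly 1 percentage point for 10000 games, so two nets of similar strength can easily swap places in a short run.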

Your stats about stats of sets are well written and useful btw.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: The new NNUE-net (nn-308..) seems being weaker

Post by corres »

chrisw wrote: Sun Sep 06, 2020 11:17 am ...
All the comments being made, yours included, are wildly critical, unsustainable without also asserting Corres is a liar in this case, and are, in the grand scheme of things, basically a giant so-what. A very hostile and threatening thread on one person, corres, has been generated and kept going by several people. It’s not very pleasant. And for what? The status of a set of network weights?
...
This is how the crowd works...
Mud-slinging, even when it has no basis at all, is a good opportunity for the posters to discharge their mental tension.
Terje
Posts: 347
Joined: Tue Nov 19, 2019 4:34 am
Location: https://github.com/TerjeKir/weiss
Full name: Terje Kirstihagen

Re: The new NNUE-net (nn-308..) seems being weaker

Post by Terje »

chrisw wrote: Sun Sep 06, 2020 11:17 am
Alayan wrote: Sun Sep 06, 2020 1:51 am Choosing heads or tails and flipping a fair coin 4 times, there is a 31.25% probability to lose 0-4 or 1-3.

For an engine A having a long-term probability of winning X% of the decisive (non-draw) games against an engine B, odds of losing 0-4 or 1-3 in a random sample of 4 decisive games :


- 30% decisive games won, 70% lost => 65.2% of losing a 4-decisive games sample, 34.8% of winning or drawing.
- 40% decisive games won, 60% lost => 47.5% of losing a 4-decisive games sample, 52.5% of winning or drawing.
- 45% decisive games won, 55% lost => 39.1% of losing a 4-decisive games sample, 60.9% of winning or drawing.
- 50% decisive games won, 50% lost => 31.2% of losing a 4-decisive games sample, 68.8% of winning or drawing.
- 55% decisive games won, 45% lost => 24.1% of losing a 4-decisive games sample, 75.9% of winning or drawing.
- 60% decisive games won, 40% lost => 17.9% of losing a 4-decisive games sample, 82.1% of winning or drawing.
- 2 to 1 decisive game win ratio (~66.7% to ~33.3%) =>  11.1% of losing a 4-decisive games sample, 88.9% of winning or drawing.
- 70% decisive games won, 30% lost =>  8.4% of losing a 4-decisive games sample, 91.6% of winning or drawing.
- 80% decisive games won, 20% lost =>  2.7% of losing a 4-decisive games sample, 97.3% of winning or drawing.
- 90% decisive games won, 10% lost =>  0.4% of losing a 4-decisive games sample, 99.6% of winning or drawing.
From your 3-1 results, you can conclude with high confidence that the old net has a double-digit percent chance of winning a decisive game against the new net, and that's it.

Claiming that because it's your test it's enough for you to believe the new net is weaker is missing the point.

You can believe all the BS you want, but as soon as you share it in a forum thread, you open yourself to criticism.
Sure, but not to the extent of being accused of “spam” nor to the extent of having his post deleted.
By spam I mean it is "pointless".

The thread title implies SF devs may have made a mistake switching to a weaker net, but there is no worthwhile evidence for that to be found here. I, along with others, have already in a different thread (on leela kiudee settings vs at the time current settings) not too long ago explained to him that a sample size like this means literally nothing for 2 engines of similar strength.

Also, you're making it sound like a post being removed is a big deal when it's really not. An attempt was made at making something useful, turns out it wasn't, that's all.
corres wrote: Sun Sep 06, 2020 9:39 am
I take every criticism kindly, but since I was the one who stated that my 100-game test is too small to prove that nn-308... is the weaker net, every "criticism" is no more than an ill-intentioned attack against me. A typical example of this is the post by Terje, who mixes a political site (TCF) into a technical forum.
Maybe he does not like my sentence "I am not a politically correct man. I am used to speaking sincerely". Yes, Terje, I am not a politically correct man, and I am used to speaking sincerely, whether you like it, or me, or not.
Finally, I'm 'attacking' your post for being pointless, not you personally, please stop trying to frame it as an attack on your character.
chrisw
Posts: 4910
Joined: Tue Apr 03, 2012 4:28 pm
Location: Anywhere but the Western Empire
Full name: Christopher Whittington

Re: The new NNUE-net (nn-308..) seems being weaker

Post by chrisw »

Terje wrote: Sun Sep 06, 2020 3:52 pm
chrisw wrote: Sun Sep 06, 2020 11:17 am
Alayan wrote: Sun Sep 06, 2020 1:51 am Choosing heads or tails and flipping a fair coin 4 times, there is a 31.25% probability to lose 0-4 or 1-3.

For an engine A having a long-term probability of winning X% of the decisive (non-draw) games against an engine B, odds of losing 0-4 or 1-3 in a random sample of 4 decisive games :


- 30% decisive games won, 70% lost => 65.2% of losing a 4-decisive games sample, 34.8% of winning or drawing.
- 40% decisive games won, 60% lost => 47.5% of losing a 4-decisive games sample, 52.5% of winning or drawing.
- 45% decisive games won, 55% lost => 39.1% of losing a 4-decisive games sample, 60.9% of winning or drawing.
- 50% decisive games won, 50% lost => 31.2% of losing a 4-decisive games sample, 68.8% of winning or drawing.
- 55% decisive games won, 45% lost => 24.1% of losing a 4-decisive games sample, 75.9% of winning or drawing.
- 60% decisive games won, 40% lost => 17.9% of losing a 4-decisive games sample, 82.1% of winning or drawing.
- 2 to 1 decisive game win ratio (~66.7% to ~33.3%) =>  11.1% of losing a 4-decisive games sample, 88.9% of winning or drawing.
- 70% decisive games won, 30% lost =>  8.4% of losing a 4-decisive games sample, 91.6% of winning or drawing.
- 80% decisive games won, 20% lost =>  2.7% of losing a 4-decisive games sample, 97.3% of winning or drawing.
- 90% decisive games won, 10% lost =>  0.4% of losing a 4-decisive games sample, 99.6% of winning or drawing.
From your 3-1 results, you can conclude with high confidence that the old net has a double-digit percent chance of winning a decisive game against the new net, and that's it.

Claiming that because it's your test it's enough for you to believe the new net is weaker is missing the point.

You can believe all the BS you want, but as soon as you share it in a forum thread, you open yourself to criticism.
Sure, but not to the extent of being accused of “spam” nor to the extent of having his post deleted.
By spam I mean it is "pointless".

The thread title implies SF devs may have made a mistake switching to a weaker net, but there is no worthwhile evidence for that to be found here. I, along with others, have already in a different thread (on leela kiudee settings vs at the time current settings) not too long ago explained to him that a sample size like this means literally nothing for 2 engines of similar strength.
Yup, I entirely understand the point you are making, and also that it can be quite tiresome when certain stuff gets posted (for various reasons), but the whole essence of these sorts of special interest newsgroups is the interplay of data/ideas/bla bla etc. Otherwise, for the topic at hand, an html page of stats would suffice, and that’s not quite the idea.

Also, you're making it sound like a post being removed is a big deal when it's really not.
It is a big deal. Erasing somebody’s writing is pretty serious stuff. The rules of this forum have stood for almost 25 years, I wrote them, and they surely don’t include the idea that erasing posts is ok. Actually exactly the opposite. In particular a data post about chess entities.

An attempt was made at making something useful, turns out it wasn't, that's all.
corres wrote: Sun Sep 06, 2020 9:39 am
I take every criticism kindly, but since I was the one who stated that my 100-game test is too small to prove that nn-308... is the weaker net, every "criticism" is no more than an ill-intentioned attack against me. A typical example of this is the post by Terje, who mixes a political site (TCF) into a technical forum.
Maybe he does not like my sentence "I am not a politically correct man. I am used to speaking sincerely". Yes, Terje, I am not a politically correct man, and I am used to speaking sincerely, whether you like it, or me, or not.
Finally, I'm 'attacking' your post for being pointless, not you personally, please stop trying to frame it as an attack on your character.
I get that. The problem is more on the social plane: people often do take critique of their data/work/opinion as personal, even when it is not meant that way. When it’s one person and a lot of individuals making the criticism, it is slightly parallel to death by a thousand cuts. Each one is no big deal, but the mass is something else.
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: The new NNUE-net (nn-308..) seems being weaker

Post by MikeB »

The original post was almost like a question - of disbelief really - since he used the word "seems", which by definition makes his statement less forceful.
The OP has every right to make his post in the context of how he made it and did nothing wrong, period. Then it got a little more personal and out of control. We all can choose to hit the pause button once that is recognized - there is no need to keep this going. We can and should be a little bit more civil with each other. Also, once somebody makes their point, there is no need for others to choose sides and pile on with essentially the same point. It is clearly time to move forward and put this thread behind us.

Thank you.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: The new NNUE-net (nn-308..) seems being weaker

Post by corres »

MikeB wrote: Sun Sep 06, 2020 6:42 pm The original post was almost like a question - of disbelief really - since he used the word "seems", which by definition makes his statement less forceful.
The OP has every right to make his post in the context of how he made it and did nothing wrong, period. Then it got a little more personal and out of control. We all can choose to hit the pause button once that is recognized - there is no need to keep this going. We can and should be a little bit more civil with each other. Also, once somebody makes their point, there is no need for others to choose sides and pile on with essentially the same point. It is clearly time to move forward and put this thread behind us.

Thank you.
Really, Thank you, Mike.
It is a pity, but a lot of people read a post like a novel and read into it whatever they like to imagine. A person inclined to aggression searches for a point he can latch onto, and he keeps his opinion regardless of the real meaning of the text under discussion. I do not like this stupid and superfluous dispute, but sometimes I am forced to continue it up to a certain point.