The new NNUE-net (nn-308..) seems to be weaker

yurikvelo · Post by **yurikvelo** » Sun Sep 06, 2020 10:26 am

OK, but all those test games were not played on my machine, not from the starting positions I used, and not at my time per move.

Still, 43% for the weaker net to get more wins.

corres · Post by **corres** » Sun Sep 06, 2020 10:32 am

yurikvelo wrote: ↑Sun Sep 06, 2020 10:26 am
OK, but all those test games were not played on my machine, not from the starting positions I used, and not at my time per move.
Still, 43% for the weaker net to get more wins.

So use it in good health...

chrisw · Post by **chrisw** » Sun Sep 06, 2020 11:17 am

Alayan wrote: ↑Sun Sep 06, 2020 1:51 am Choosing heads or tails and flipping a fair coin 4 times, there is a 31.25% probability to lose 0-4 or 1-3.

For an engine A having a long-term probability of winning X% of the decisive (non-draw) games against an engine B, odds of losing 0-4 or 1-3 in a random sample of 4 decisive games :
- 30% decisive games won, 70% lost => 65.2% of losing a 4-decisive games sample, 34.8% of winning or drawing.
- 40% decisive games won, 60% lost => 47.5% of losing a 4-decisive games sample, 52.5% of winning or drawing.
- 45% decisive games won, 55% lost => 39.1% of losing a 4-decisive games sample, 60.9% of winning or drawing.
- 50% decisive games won, 50% lost => 31.2% of losing a 4-decisive games sample, 68.8% of winning or drawing.
- 55% decisive games won, 45% lost => 24.1% of losing a 4-decisive games sample, 75.9% of winning or drawing.
- 60% decisive games won, 40% lost => 17.9% of losing a 4-decisive games sample, 82.1% of winning or drawing.
- 2 to 1 decisive game win ratio (~66.7% to ~33.3%) =>  11.1% of losing a 4-decisive games sample, 88.9% of winning or drawing.
- 70% decisive games won, 30% lost =>  8.4% of losing a 4-decisive games sample, 91.6% of winning or drawing.
- 80% decisive games won, 20% lost =>  2.7% of losing a 4-decisive games sample, 97.3% of winning or drawing.
- 90% decisive games won, 10% lost =>  0.4% of losing a 4-decisive games sample, 99.6% of winning or drawing.
From your 3-1 results, you can conclude with high confidence that the old net has a double-digit percent chance of winning a decisive game against the new net, and that's it.
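
For anyone who wants to check the list above, here is a minimal sketch of the calculation. It assumes each decisive game is an independent trial with a fixed win probability, which is a simplification; the function name `p_lose_sample` is only illustrative.

```python
from math import comb

def p_lose_sample(p_win: float, n: int = 4) -> float:
    """Probability of winning fewer games than you lose in n decisive games.

    Assumes independent games with a constant win probability p_win
    (a plain binomial model; real engine matches are only roughly like this).
    """
    # Sum the binomial terms for every k (wins) with more losses than wins.
    return sum(
        comb(n, k) * p_win**k * (1 - p_win) ** (n - k)
        for k in range(n + 1)
        if 2 * k < n
    )

if __name__ == "__main__":
    for p in (0.30, 0.40, 0.45, 0.50, 0.55, 0.60, 2 / 3, 0.70, 0.80, 0.90):
        lose = p_lose_sample(p)
        print(f"{p:.1%} decisive games won => {lose:.1%} losing, "
              f"{1 - lose:.1%} winning or drawing")
```

For n = 4 the sum covers only the 0-4 and 1-3 outcomes, so a 2-2 split counts under "winning or drawing".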

Claiming that because it's your test it's enough for you to believe the new net is weaker is missing the point.

You can believe all the BS you want, but as soon as you share it in a forum thread, you open yourself to criticism.

Sure, but not to the extent of being accused of “spam” nor to the extent of having his post deleted.

Others just skimming through thread titles or not well-versed in statistics might give credit to an outright falsehood.

Outright falsehood is a wild exaggeration. Unless he actually made the numbers up, which is a) extremely unlikely and b) completely unprovable, even the word “falsehood” is inappropriate. Numbers are numbers, the data is what it is, results are results.

It's not acceptable to spread disinformation even if you didn't mean to harm.

Information is not disinformation. The data is what it is. He also qualified it with size of data btw.

All the comments being made, yours included, are wildly critical, unsustainable without also asserting Corres is a liar in this case, and are, in the grand scheme of things, basically a giant so-what. A very hostile and threatening thread on one person, corres, has been generated and kept going by several people. It’s not very pleasant. And for what? The status of a set of network weights?

A 10000 run says X, a 100 run says Y. This does not mean that an informational post about a 100 run should be banned.

Your stats about stats of sets are well written and useful btw.

corres · Post by **corres** » Sun Sep 06, 2020 12:45 pm

chrisw wrote: ↑Sun Sep 06, 2020 11:17 am ...
All the comments being made, yours included, are wildly critical, unsustainable without also asserting Corres is a liar in this case, and are, in the grand scheme of things, basically a giant so-what. A very hostile and threatening thread on one person, corres, has been generated and kept going by several people. It’s not very pleasant. And for what? The status of a set of network weights?
...

This is how the crowd works...
Mud-slinging, even when it has no basis at all, is a good opportunity for the post writers to discharge their mental tension.

Terje · Post by **Terje** » Sun Sep 06, 2020 3:52 pm

chrisw wrote: ↑Sun Sep 06, 2020 11:17 am
Alayan wrote: ↑Sun Sep 06, 2020 1:51 am Choosing heads or tails and flipping a fair coin 4 times, there is a 31.25% probability to lose 0-4 or 1-3.

For an engine A having a long-term probability of winning X% of the decisive (non-draw) games against an engine B, odds of losing 0-4 or 1-3 in a random sample of 4 decisive games :
- 30% decisive games won, 70% lost => 65.2% of losing a 4-decisive games sample, 34.8% of winning or drawing.
- 40% decisive games won, 60% lost => 47.5% of losing a 4-decisive games sample, 52.5% of winning or drawing.
- 45% decisive games won, 55% lost => 39.1% of losing a 4-decisive games sample, 60.9% of winning or drawing.
- 50% decisive games won, 50% lost => 31.2% of losing a 4-decisive games sample, 68.8% of winning or drawing.
- 55% decisive games won, 45% lost => 24.1% of losing a 4-decisive games sample, 75.9% of winning or drawing.
- 60% decisive games won, 40% lost => 17.9% of losing a 4-decisive games sample, 82.1% of winning or drawing.
- 2 to 1 decisive game win ratio (~66.7% to ~33.3%) =>  11.1% of losing a 4-decisive games sample, 88.9% of winning or drawing.
- 70% decisive games won, 30% lost =>  8.4% of losing a 4-decisive games sample, 91.6% of winning or drawing.
- 80% decisive games won, 20% lost =>  2.7% of losing a 4-decisive games sample, 97.3% of winning or drawing.
- 90% decisive games won, 10% lost =>  0.4% of losing a 4-decisive games sample, 99.6% of winning or drawing.
From your 3-1 results, you can conclude with high confidence that the old net has a double-digit percent chance of winning a decisive game against the new net, and that's it.

Claiming that because it's your test it's enough for you to believe the new net is weaker is missing the point.

You can believe all the BS you want, but as soon as you share it in a forum thread, you open yourself to criticism.
Sure, but not to the extent of being accused of “spam” nor to the extent of having his post deleted.

By spam I mean it is "pointless".

The thread title implies SF devs may have made a mistake switching to a weaker net, but there is no worthwhile evidence for that to be found here. I, along with others, have already in a different thread (on leela kiudee settings vs at the time current settings) not too long ago explained to him that a sample size like this means literally nothing for 2 engines of similar strength.
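
To put a rough number on that, here is a minimal sketch of the margin of error on a match score. It assumes two engines of equal strength, independent games, and a hypothetical 60% draw rate; the draw rate is an assumption for illustration only, not a figure from this thread, and `score_margin` is just an illustrative name.

```python
from math import sqrt

def score_margin(n_games: int, draw_rate: float = 0.6, z: float = 1.96) -> float:
    """Approximate 95% margin of error on the average score per game.

    Assumes two equally strong engines, so wins and losses split the
    non-drawn games evenly. draw_rate = 0.6 is a hypothetical value.
    Scoring: win = 1, draw = 0.5, loss = 0.
    """
    win = (1 - draw_rate) / 2
    mean = win * 1.0 + draw_rate * 0.5              # 0.5 for equal engines
    variance = win * 1.0**2 + draw_rate * 0.5**2 - mean**2
    return z * sqrt(variance / n_games)

if __name__ == "__main__":
    for n in (100, 10_000):
        print(f"{n} games: score = 50% +/- {score_margin(n):.1%}")
```

Under those assumptions the margin is roughly 6 percentage points of score for 100 games and roughly 0.6 for 10000, so a small difference between two nets cannot be told apart from noise in the shorter run.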

Also, you're making it sound like a post being removed is a big deal when it's really not. An attempt was made at making something useful, turns out it wasn't, that's all.

corres wrote: ↑Sun Sep 06, 2020 9:39 am
I take every criticism kindly, but since I was the one who stated that my 100-game test is too small to prove that nn-308... is the weaker net, every "criticism" is no more than an evil-minded attack against me. A typical example of this is the post by Terje, who mixes a political site (TCF) into a technical forum.
Maybe he does not like my sentence "I am not a politically correct man. I say what I sincerely think". Yes, Terje, I am not a politically correct man, and I say what I sincerely think, whether you like it, or me, or not.

Finally, I'm 'attacking' your post for being pointless, not you personally, please stop trying to frame it as an attack on your character.

chrisw · Post by **chrisw** » Sun Sep 06, 2020 4:36 pm

Terje wrote: ↑Sun Sep 06, 2020 3:52 pm
chrisw wrote: ↑Sun Sep 06, 2020 11:17 am
Alayan wrote: ↑Sun Sep 06, 2020 1:51 am Choosing heads or tails and flipping a fair coin 4 times, there is a 31.25% probability to lose 0-4 or 1-3.

For an engine A having a long-term probability of winning X% of the decisive (non-draw) games against an engine B, odds of losing 0-4 or 1-3 in a random sample of 4 decisive games :
- 30% decisive games won, 70% lost => 65.2% of losing a 4-decisive games sample, 34.8% of winning or drawing.
- 40% decisive games won, 60% lost => 47.5% of losing a 4-decisive games sample, 52.5% of winning or drawing.
- 45% decisive games won, 55% lost => 39.1% of losing a 4-decisive games sample, 60.9% of winning or drawing.
- 50% decisive games won, 50% lost => 31.2% of losing a 4-decisive games sample, 68.8% of winning or drawing.
- 55% decisive games won, 45% lost => 24.1% of losing a 4-decisive games sample, 75.9% of winning or drawing.
- 60% decisive games won, 40% lost => 17.9% of losing a 4-decisive games sample, 82.1% of winning or drawing.
- 2 to 1 decisive game win ratio (~66.7% to ~33.3%) =>  11.1% of losing a 4-decisive games sample, 88.9% of winning or drawing.
- 70% decisive games won, 30% lost =>  8.4% of losing a 4-decisive games sample, 91.6% of winning or drawing.
- 80% decisive games won, 20% lost =>  2.7% of losing a 4-decisive games sample, 97.3% of winning or drawing.
- 90% decisive games won, 10% lost =>  0.4% of losing a 4-decisive games sample, 99.6% of winning or drawing.
From your 3-1 results, you can conclude with high confidence that the old net has a double-digit percent chance of winning a decisive game against the new net, and that's it.

Claiming that because it's your test it's enough for you to believe the new net is weaker is missing the point.

You can believe all the BS you want, but as soon as you share it in a forum thread, you open yourself to criticism.
Sure, but not to the extent of being accused of “spam” nor to the extent of having his post deleted.
By spam I mean it is "pointless".

The thread title implies SF devs may have made a mistake switching to a weaker net, but there is no worthwhile evidence for that to be found here. I, along with others, have already in a different thread (on leela kiudee settings vs at the time current settings) not too long ago explained to him that a sample size like this means literally nothing for 2 engines of similar strength.

Yup, I entirely understand the point you are making, and also that it can be quite tiresome when certain stuff gets posted (for various reasons), but the whole essence of these sorts of special-interest newsgroups is the interplay of data/ideas/bla bla etc. Otherwise, for the topic at hand, an HTML page of stats would suffice, and that’s not quite the idea.

Also, you're making it sound like a post being removed is a big deal when it's really not.

It is a big deal. Erasing somebody’s writing is pretty serious stuff. The rules of this forum have stood for almost 25 years, I wrote them, and they surely don’t include the idea that erasing posts is ok. Actually exactly the opposite. In particular a data post about chess entities.

An attempt was made at making something useful, turns out it wasn't, that's all.

corres wrote: ↑Sun Sep 06, 2020 9:39 am
I take every criticism kindly, but since I was the one who stated that my 100-game test is too small to prove that nn-308... is the weaker net, every "criticism" is no more than an evil-minded attack against me. A typical example of this is the post by Terje, who mixes a political site (TCF) into a technical forum.
Maybe he does not like my sentence "I am not a politically correct man. I say what I sincerely think". Yes, Terje, I am not a politically correct man, and I say what I sincerely think, whether you like it, or me, or not.
Finally, I'm 'attacking' your post for being pointless, not you personally, please stop trying to frame it as an attack on your character.

I get that. The problem is more on the social plane: people often do take critique of their data/work/opinion as personal, even when it is not meant that way. When it’s one person and a lot of individuals making the criticism, it is slightly parallel to death by a thousand cuts. Each one is no big deal, but the mass is something else.

MikeB · Post by **MikeB** » Sun Sep 06, 2020 6:42 pm

The original post was almost like a question - of disbelief really, since he used the word "seems", which by definition makes his statement less forceful.
The OP has every right to make his post in the context of how he made it and did nothing wrong, period. Then it got a little more personal and out of control. We can all choose to hit the pause button once that is recognized - there is no need to keep this going. We can and should be a little bit more civil with each other. Also, once somebody makes their point, there is no need for others to choose sides and pile on with essentially the same point. It is clearly time to move forward and put this thread behind us.

Thank you.

corres · Post by **corres** » Sun Sep 06, 2020 10:09 pm

MikeB wrote: ↑Sun Sep 06, 2020 6:42 pm The original post was almost like a question - of disbelief really, since he used the word "seems", which by definition makes his statement less forceful.
The OP has every right to make his post in the context of how he made it and did nothing wrong, period. Then it got a little more personal and out of control. We can all choose to hit the pause button once that is recognized - there is no need to keep this going. We can and should be a little bit more civil with each other. Also, once somebody makes their point, there is no need for others to choose sides and pile on with essentially the same point. It is clearly time to move forward and put this thread behind us.

Thank you.

Really, thank you, Mike.
It is a pity, but a lot of people read a post like a novel and read into it whatever they like to imagine. A person inclined to aggression searches for a point he can latch onto, and he keeps his opinion regardless of the real meaning of the disputed text. I do not like this stupid and superfluous dispute, but sometimes I am forced to continue it to a certain point.
