still 43% for weaker net to get more wins
OK, but this lot of test games was not played on my machine, not from the starting positions I used, and not with my time per move.
The new NNUE-net (nn-308..) seems being weaker
Moderator: Ras
-
yurikvelo
- Posts: 710
- Joined: Sat Dec 06, 2014 1:53 pm
Re: The new NNUE-net (nn-308..) seems being weaker
-
corres
- Posts: 3657
- Joined: Wed Nov 18, 2015 11:41 am
- Location: hungary
-
chrisw
- Posts: 4910
- Joined: Tue Apr 03, 2012 4:28 pm
- Location: Anywhere but the Western Empire
- Full name: Christopher Whittington
Re: The new NNUE-net (nn-308..) seems being weaker
Sure, but not to the extent of being accused of “spam” nor to the extent of having his post deleted.
Alayan wrote: ↑Sun Sep 06, 2020 1:51 am
Choosing heads or tails and flipping a fair coin 4 times, there is a 31.25% probability to lose 0-4 or 1-3.
For an engine A having a long-term probability of winning X% of the decisive (non-draw) games against an engine B, odds of losing 0-4 or 1-3 in a random sample of 4 decisive games:
Code:
- 30% decisive games won, 70% lost => 65.2% of losing a 4-decisive games sample, 34.8% of winning or drawing.
- 40% decisive games won, 60% lost => 47.5% of losing a 4-decisive games sample, 52.5% of winning or drawing.
- 45% decisive games won, 55% lost => 39.1% of losing a 4-decisive games sample, 60.9% of winning or drawing.
- 50% decisive games won, 50% lost => 31.2% of losing a 4-decisive games sample, 68.8% of winning or drawing.
- 55% decisive games won, 45% lost => 24.1% of losing a 4-decisive games sample, 75.9% of winning or drawing.
- 60% decisive games won, 40% lost => 17.9% of losing a 4-decisive games sample, 82.1% of winning or drawing.
- 2 to 1 decisive game win ratio (~66.7% to ~33.3%) => 11.1% of losing a 4-decisive games sample, 88.9% of winning or drawing.
- 70% decisive games won, 30% lost => 8.4% of losing a 4-decisive games sample, 91.6% of winning or drawing.
- 80% decisive games won, 20% lost => 2.7% of losing a 4-decisive games sample, 97.3% of winning or drawing.
- 90% decisive games won, 10% lost => 0.4% of losing a 4-decisive games sample, 99.6% of winning or drawing.
From your 3-1 results, you can conclude with high confidence that the old net has a double-digit percent chance of winning a decisive game against the new net, and that's it.
Claiming that because it's your test it's enough for you to believe the new net is weaker is missing the point.
You can believe all the BS you want, but as soon as you share it in a forum thread, you open yourself to criticism.
Outright falsehood is a wild exaggeration. Unless he actually made the numbers up which is a) extremely unlikely and b) completely unprovable, even the word “falsehood” is inappropriate, numbers are numbers, the data is what it is, results are results.
Others just skimming through thread titles or not well-versed in statistics might give credit to an outright falsehood.
Information is not disinformation. The data is what it is. He also qualified it with size of data btw.
It's not acceptable to spread disinformation even if you didn't mean to harm.
All the comments being made, yours included, are wildly critical, unsustainable without also asserting Corres is a liar in this case, and are, in the grand scheme of things, basically a giant so-what. A very hostile and threatening thread on one person, corres, has been generated and kept going by several people. It’s not very pleasant. And for what? The status of a set of network weights?
A 10000 run says X, a 100 run says Y. This does not mean that an informational post about a 100 run should be banned.
Your stats about stats of sets are well written and useful btw.
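For readers who want to check the figures in Alayan's table, here is a minimal sketch. It assumes every decisive game is an independent trial with a fixed win probability `p`, which is a simplification; the function name is illustrative and does not come from the thread.

```python
from math import comb


def prob_at_most_one_win(p: float, n: int = 4) -> float:
    """Probability of winning at most 1 of n decisive games.

    For n = 4 this is the chance of losing the sample 0-4 or 1-3.
    Assumes independent games with a constant win probability p.
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(2))


# Same win rates as the rows of the quoted table.
for p in (0.30, 0.40, 0.45, 0.50, 0.55, 0.60, 2 / 3, 0.70, 0.80, 0.90):
    lose = prob_at_most_one_win(p)
    print(f"{p:6.1%} won => {lose:5.1%} losing, {1 - lose:5.1%} winning or drawing")
```

With `p = 0.5` the function returns 5/16 = 31.25%, the coin-flip figure quoted above.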
-
corres
- Posts: 3657
- Joined: Wed Nov 18, 2015 11:41 am
- Location: hungary
Re: The new NNUE-net (nn-308..) seems being weaker
This is how the mass works...
chrisw wrote: ↑Sun Sep 06, 2020 11:17 am
...
All the comments being made, yours included, are wildly critical, unsustainable without also asserting Corres is a liar in this case, and are, in the grand scheme of things, basically a giant so-what. A very hostile and threatening thread on one person, corres, has been generated and kept going by several people. It’s not very pleasant. And for what? The status of a set of network weights?
...
Mud-slinging, even when it has no basis at all, is a good opportunity for the post writers to discharge their mental tension.
-
Terje
- Posts: 347
- Joined: Tue Nov 19, 2019 4:34 am
- Location: https://github.com/TerjeKir/weiss
- Full name: Terje Kirstihagen
Re: The new NNUE-net (nn-308..) seems being weaker
By spam I mean it is "pointless".
chrisw wrote: ↑Sun Sep 06, 2020 11:17 am
Sure, but not to the extent of being accused of “spam” nor to the extent of having his post deleted.
Alayan wrote: ↑Sun Sep 06, 2020 1:51 am
Choosing heads or tails and flipping a fair coin 4 times, there is a 31.25% probability to lose 0-4 or 1-3.
The thread title implies SF devs may have made a mistake switching to a weaker net, but there is no worthwhile evidence for that to be found here. I, along with others, have already in a different thread (on leela kiudee settings vs at the time current settings) not too long ago explained to him that a sample size like this means literally nothing for 2 engines of similar strength.
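The sample-size point can be made concrete with an exact (Clopper-Pearson) interval on the 3-1 split of decisive games mentioned earlier in the thread. This is a minimal sketch, assuming independent decisive games with a constant win probability; the function names and the bisection solver are illustrative choices, not something taken from the thread or from any testing framework.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect_root(f, lo: float = 0.0, hi: float = 1.0, iters: int = 60) -> float:
    """Root of an increasing function f on [lo, hi], with f(lo) < 0 < f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(wins: int, n: int, alpha: float = 0.05):
    """Exact two-sided (1 - alpha) interval for the per-game win probability."""
    # Lower bound: the p at which P(X >= wins) equals alpha / 2.
    lower = 0.0 if wins == 0 else bisect_root(
        lambda p: (1 - binom_cdf(wins - 1, n, p)) - alpha / 2)
    # Upper bound: the p at which P(X <= wins) equals alpha / 2.
    upper = 1.0 if wins == n else bisect_root(
        lambda p: alpha / 2 - binom_cdf(wins, n, p))
    return lower, upper


low, high = clopper_pearson(3, 4)
print(f"95% interval for the win probability: {low:.3f} .. {high:.3f}")
```

For 3 wins out of 4 decisive games the interval runs from roughly 19% to 99%, so the result is compatible with the old net being much stronger, equal, or noticeably weaker.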
Also, you're making it sound like a post being removed is a big deal when it's really not. An attempt was made at making something useful, turns out it wasn't, that's all.
Finally, I'm 'attacking' your post for being pointless, not you personally. Please stop trying to frame it as an attack on your character.
corres wrote: ↑Sun Sep 06, 2020 9:39 am
I take every criticism kindly, but since I was the one who stated that my 100-game test is too small to prove that nn-308... is the weaker net, every "criticism" is no more than an evil-minded attack against me. A typical example of this is the post of Terje, who mixes a political site (TCF) into a technical forum.
Maybe he does not like my sentence "I am not a polcorrect man. I used to say the sincere". Yes, Terje, I am not a politically correct man, and I say what I sincerely think, whether you like it or not.
-
chrisw
- Posts: 4910
- Joined: Tue Apr 03, 2012 4:28 pm
- Location: Anywhere but the Western Empire
- Full name: Christopher Whittington
Re: The new NNUE-net (nn-308..) seems being weaker
yup, I entirely understand the point you are making, and also that it can be quite tiresome when certain stuff gets posted (for various reasons), but, the whole essence of these sorts of special interest newsgroups is interplay of data/ideas/bla bla etc. Else, for the topic on hand, an html page of stats would suffice, and that’s not quite the idea.
Terje wrote: ↑Sun Sep 06, 2020 3:52 pm
By spam I mean it is "pointless".
chrisw wrote: ↑Sun Sep 06, 2020 11:17 am
Sure, but not to the extent of being accused of “spam” nor to the extent of having his post deleted.
Alayan wrote: ↑Sun Sep 06, 2020 1:51 am
Choosing heads or tails and flipping a fair coin 4 times, there is a 31.25% probability to lose 0-4 or 1-3.
The thread title implies SF devs may have made a mistake switching to a weaker net, but there is no worthwhile evidence for that to be found here. I, along with others, have already in a different thread (on leela kiudee settings vs at the time current settings) not too long ago explained to him that a sample size like this means literally nothing for 2 engines of similar strength.
It is a big deal. Erasing somebody’s writing is pretty serious stuff. The rules of this forum have stood for almost 25 years, I wrote them, and they surely don’t include the idea that erasing posts is ok. Actually exactly the opposite. In particular a data post about chess entities.
Also, you're making it sound like a post being removed is a big deal when it's really not.
I get that. The problem is more on the social plane: people often do take critique of their data/work/opinion as personal, even when it is not meant that way. When it’s one person and a lot of individuals making the criticism, it is slightly parallel to death by a thousand cuts. Each one is no big deal, but the mass is something else.
An attempt was made at making something useful, turns out it wasn't, that's all.
Finally, I'm 'attacking' your post for being pointless, not you personally. Please stop trying to frame it as an attack on your character.
corres wrote: ↑Sun Sep 06, 2020 9:39 am
I take every criticism kindly, but since I was the one who stated that my 100-game test is too small to prove that nn-308... is the weaker net, every "criticism" is no more than an evil-minded attack against me. A typical example of this is the post of Terje, who mixes a political site (TCF) into a technical forum.
Maybe he does not like my sentence "I am not a polcorrect man. I used to say the sincere". Yes, Terje, I am not a politically correct man, and I say what I sincerely think, whether you like it or not.
-
MikeB
- Posts: 4889
- Joined: Thu Mar 09, 2006 6:34 am
- Location: Pen Argyl, Pennsylvania
Re: The new NNUE-net (nn-308..) seems being weaker
The original post was almost like a question - of disbelief really - since he used the word "seems", which by definition makes his statement less forceful.
The OP has every right to make his post in the context of how he made his post and did nothing wrong, period. Then it got a little more personal and out of control. We can all choose to hit the pause button once that is recognized - there is no need to keep this going. We can and should be a little bit more civil with each other. Also, once somebody makes their point, there is no need for others to choose sides and pile on with essentially the same point. It is clearly time to move forward and put this thread behind us.
Thank you.
-
corres
- Posts: 3657
- Joined: Wed Nov 18, 2015 11:41 am
- Location: hungary
Re: The new NNUE-net (nn-308..) seems being weaker
Really, thank you, Mike.
MikeB wrote: ↑Sun Sep 06, 2020 6:42 pm
The original post was almost like a question - of disbelief really - since he used the word "seems", which by definition makes his statement less forceful.
The OP has every right to make his post in the context of how he made his post and did nothing wrong, period. Then it got a little more personal and out of control. We can all choose to hit the pause button once that is recognized - there is no need to keep this going. We can and should be a little bit more civil with each other. Also, once somebody makes their point, there is no need for others to choose sides and pile on with essentially the same point. It is clearly time to move forward and put this thread behind us.
Thank you.
It is a pity, but a lot of people read a post like a novel and see in it whatever they like to imagine. A person inclined to aggression searches for a point he can latch onto, and he keeps his opinion regardless of the real meaning of the text being argued about. I do not like this stupid and superfluous dispute, but sometimes I am forced to continue it up to a certain point.
