A match between SF12+NNUE and Leele ver.0.26.2

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Dann Corbit, Harvey Williamson

Zenmastur
Posts: 919
Joined: Sat May 31, 2014 6:28 am

Re: A match between SF12+NNUE and Leele ver.0.26.2

Post by Zenmastur » Thu Sep 24, 2020 3:56 am

mwyoung wrote:
Thu Sep 24, 2020 2:45 am
Remember: Elo is statistics, SPRT is statistics, and LOS is statistics. "And statistics is a game of probability, and it cannot be known for certain whether statistical conclusions are correct. Whenever there is uncertainty, there is the possibility of making an error. Considering this nature of statistics science, all statistical hypothesis tests have a probability of making type I and type II errors."

But the probability is low, and statistics is a tool to be used.

Again, I'll leave you with the LOS table.

	10	50	100	200	500	1000
0	50.00%	50.00%	50.00%	50.00%	50.00%	50.00%
1	64.51%	56.77%	54.81%	53.41%	52.16%	51.53%
2	77.22%	63.34%	59.55%	56.80%	54.32%	53.06%
3	86.95%	69.55%	64.16%	60.14%	56.46%	54.58%
4	93.43%	75.24%	68.57%	63.40%	58.58%	56.09%
5	97.14%	80.32%	72.73%	66.57%	60.68%	57.60%
6	98.95%	84.71%	76.60%	69.63%	62.75%	59.10%
7	99.68%	88.40%	80.14%	72.55%	64.78%	60.58%
8	99.93%	91.41%	83.34%	75.33%	66.77%	62.05%
9	99.99%	93.80%	86.19%	77.96%	68.72%	63.50%
10	100.00%	95.64%	88.69%	80.41%	70.61%	64.93%
11		97.02%	90.85%	82.69%	72.45%	66.34%
12		98.01%	92.68%	84.80%	74.23%	67.73%
13		98.71%	94.23%	86.72%	75.95%	69.09%
14		99.19%	95.50%	88.48%	77.60%	70.43%
15		99.50%	96.54%	90.06%	79.19%	71.74%
16		99.71%	97.37%	91.48%	80.71%	73.02%
17		99.83%	98.03%	92.74%	82.16%	74.27%
18		99.91%	98.55%	93.85%	83.54%	75.49%
19		99.95%	98.94%	94.82%	84.85%	76.68%
20		99.97%	99.24%	95.67%	86.08%	77.84%
21		99.99%	99.46%	96.41%	87.25%	78.96%
22		99.99%	99.62%	97.03%	88.35%	80.05%
23		100.00%	99.74%	97.57%	89.38%	81.11%
24			99.82%	98.02%	90.34%	82.12%
25			99.88%	98.40%	91.23%	83.11%
26			99.92%	98.71%	92.07%	84.06%
27			99.95%	98.97%	92.84%	84.97%
28			99.97%	99.18%	93.55%	85.85%
29			99.98%	99.36%	94.21%	86.69%
30			99.99%	99.50%	94.81%	87.50%
31			99.99%	99.61%	95.36%	88.27%
32			100.00%	99.70%	95.86%	89.01%
33				99.77%	96.32%	89.71%
34				99.82%	96.74%	90.38%
35				99.87%	97.11%	91.02%
36				99.90%	97.45%	91.62%
37				99.93%	97.76%	92.20%
38				99.95%	98.03%	92.74%
39				99.96%	98.28%	93.26%
40				99.97%	98.50%	93.74%
41				99.98%	98.69%	94.20%
42				99.98%	98.86%	94.63%
43				99.99%	99.02%	95.04%
44				99.99%	99.15%	95.42%
45				99.99%	99.27%	95.78%
46				100.00%	99.37%	96.11%
47					99.46%	96.42%
48					99.54%	96.72%
49					99.61%	96.99%
50					99.67%	97.24%
51					99.72%	97.47%
52					99.76%	97.69%
53					99.80%	97.89%
54					99.83%	98.08%
55					99.86%	98.25%
56					99.88%	98.41%
57					99.90%	98.56%
58					99.92%	98.69%
59					99.93%	98.82%
60					99.94%	98.93%
61					99.95%	99.03%
62					99.96%	99.13%
63					99.97%	99.22%
64					99.97%	99.29%
65					99.98%	99.37%
66					99.98%	99.43%
67					99.99%	99.49%
68					99.99%	99.54%
69					99.99%	99.59%
70					99.99%	99.64%
71					99.99%	99.68%
72					100.00%	99.71%
73						99.74%
74						99.77%
75						99.80%
76						99.82%
77						99.84%
78						99.86%
79						99.88%
80						99.89%
81						99.91%
82						99.92%
83						99.93%
84						99.94%
85						99.94%
86						99.95%
87						99.96%
88						99.96%
89						99.97%
90						99.97%
91						99.98%
92						99.98%
93						99.98%
94						99.98%
95						99.99%
96						99.99%
97						99.99%
98						99.99%
99						99.99%
100						99.99%
101						99.99%
102						100.00%
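For anyone who wants to reproduce LOS numbers like these, here is a minimal Python sketch. It assumes the commonly cited normal approximation LOS = 0.5 * (1 + erf((W - L) / sqrt(2 * (W + L)))), in which draws drop out entirely. The table above does not say which model it was built with, so this sketch is not guaranteed to match its entries digit for digit; the function name `los` is just illustrative.

```python
import math


def los(wins: int, losses: int) -> float:
    """Likelihood of superiority of engine A over engine B.

    Assumes the common normal approximation
        LOS = 0.5 * (1 + erf((W - L) / sqrt(2 * (W + L))))
    Draws are ignored: they do not enter the formula at all.
    """
    decisive = wins + losses
    if decisive == 0:
        return 0.5  # no decisive games, so no evidence either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))


if __name__ == "__main__":
    # A 10-0 sweep, as in the "10 wins in 10 games" example from this thread.
    print(f"10-0:  {los(10, 0):.2%}")
    # 55 wins vs 45 losses: W - L = 10 over 100 decisive games (about 84%).
    print(f"55-45: {los(55, 45):.2%}")
```

Under this approximation a 55-45 split of decisive games gives about 84% LOS, while a 10-0 sweep comes out above 99.9%.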
I'm not sure why you would bother engaging in this argument with these trolls. There have been many, many discussions on this board about LOS, LLR, SPRT, Elo error bars, biased and unbiased estimators etc., and there are innumerable articles on the web on each of these subjects. Trying to argue a point with someone who doesn't really care what the truth of the matter is, and who is too lazy to do any research on the subject, is a waste of time and effort. End the discussion! Who cares what they believe? Let them believe whatever they like and get on with your life.

Just my two cents' worth.

Regards,

Zenmastur
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.

mwyoung
Posts: 2727
Joined: Wed May 12, 2010 8:00 pm

Re: A match between SF12+NNUE and Leele ver.0.26.2

Post by mwyoung » Thu Sep 24, 2020 4:23 am

Zenmastur wrote:
Thu Sep 24, 2020 3:56 am
I'm not sure why you would bother engaging in this argument with these trolls. There have been many, many discussions on this board about LOS, LLR, SPRT, Elo error bars, biased and unbiased estimators etc., and there are innumerable articles on the web on each of these subjects. Trying to argue a point with someone who doesn't really care what the truth of the matter is, and who is too lazy to do any research on the subject, is a waste of time and effort. End the discussion! Who cares what they believe? Let them believe whatever they like and get on with your life.

Just my two cents' worth.

Regards,

Zenmastur
You are correct, and I have seen and read all of the arguments. I was here at the beginning of CCC, and before, and I have been testing chess engines since 1980. I got into engine testing when chess testing was very corrupt, and I wanted people to know the truth and how this all works. This is not for the Andrew Grants of this world, who are trolls, but for the ones who see the truth, even if it comes in my very sarcastic style.

I honor the legacy L.K.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.

OliverBr
Posts: 669
Joined: Tue Dec 18, 2007 8:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: A match between SF12+NNUE and Leele ver.0.26.2

Post by OliverBr » Thu Sep 24, 2020 8:37 pm

mwyoung wrote:
Wed Sep 23, 2020 11:08 pm
OliverBr wrote:
Wed Sep 23, 2020 9:34 pm
mwyoung wrote:
Wed Sep 23, 2020 9:12 pm
That is because you are not very bright.
And what rank is your engine, @mwyoung?
For the third time: I am an engine tester. I do not have an engine. So say my engine has a rating of 0, and that would rank it very close to your engine.

In case you ask again, this will save us both time.

For the 4th time: I am an engine tester. I do not have an engine. So say my engine has a rating of 0, and that would rank it very close to your engine.

For the 5th time: I am an engine tester. I do not have an engine. So say my engine has a rating of 0, and that would rank it very close to your engine.

For the 6th time: I am an engine tester. I do not have an engine. So say my engine has a rating of 0, and that would rank it very close to your engine..............
@mwyoung, you are one of the most boring trolls ever, and you have absolutely no clue about chess programming.
Chess Engine OliThink: http://brausch.org/home/chess
OliThink GitHub: https://github.com/olithink

corres
Posts: 3657
Joined: Wed Nov 18, 2015 10:41 am
Location: hungary

Re: A match between SF12+NNUE and Leele ver.0.26.2

Post by corres » Thu Sep 24, 2020 8:53 pm

mwyoung uses whatever evaluation method he wants for his tests, and you make as strong a chess engine as you can.
Still, aren't you tired of this superfluous dispute?

mwyoung
Posts: 2727
Joined: Wed May 12, 2010 8:00 pm

Re: A match between SF12+NNUE and Leele ver.0.26.2

Post by mwyoung » Thu Sep 24, 2020 10:02 pm

OliverBr wrote:
Thu Sep 24, 2020 8:37 pm
@mwyoung, you are one of the most boring trolls ever, and you have absolutely no clue about chess programming.
We don't believe you. You came here trolling, and you are still trolling, so you must get something out of it. And you are projecting again.

Alayan
Posts: 529
Joined: Tue Nov 19, 2019 7:48 pm
Full name: Alayan Feh

Re: A match between SF12+NNUE and Leele ver.0.26.2

Post by Alayan » Thu Sep 24, 2020 10:27 pm

Zenmastur wrote:
Thu Sep 24, 2020 3:56 am

I'm not sure why you would bother engaging in this argument with these trolls.
Trolls, why? Mwyoung called a 200-game sample "very interesting" (he might have meant that the 92% draw rate was the interesting element; he never clarified, however), and then, when Oliver commented that the test was not sufficient to conclude which of the two engines is stronger, mwyoung jumped in to say that a tester doesn't need 5000 games to establish superiority. This whole "a tester doesn't need anywhere near as many games as engine devs" claim is what sparked the arguing in this thread.

And sure, the number of games required to establish superiority depends on the strength difference; that wasn't disputed. Of course, the bigger the strength difference, the fewer games you need to establish confidence in superiority.
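To put rough numbers on "the bigger the strength difference, the fewer games you need", here is a sketch under stated assumptions: a logistic Elo model for the expected score, a fixed assumed draw ratio, and a normal approximation on the per-game score (1, 0.5 or 0). `games_needed` is a hypothetical helper, not part of any testing tool, and it only estimates how many games it takes for the *expected* result to reach a target LOS; any single match can fall short by luck.

```python
import math
from statistics import NormalDist


def expected_score(elo_diff: float) -> float:
    """Logistic Elo model (assumed): expected score of the stronger side."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def games_needed(elo_diff: float, target_los: float = 0.95,
                 draw_ratio: float = 0.0) -> int:
    """Rough game count for the expected result to reach target_los.

    Hypothetical helper. Uses a normal approximation on the per-game score,
    with draws counted as half points and draw_ratio as the assumed
    fraction of drawn games.
    """
    if elo_diff <= 0:
        raise ValueError("elo_diff must be positive")
    if not 0.5 < target_los < 1.0:
        raise ValueError("target_los must be in (0.5, 1)")
    p = expected_score(elo_diff)
    win = p - draw_ratio / 2.0            # win probability implied by p and draws
    loss = 1.0 - win - draw_ratio         # remaining probability mass
    if win < 0 or loss < 0:
        raise ValueError("draw_ratio is inconsistent with elo_diff")
    variance = win + 0.25 * draw_ratio - p * p   # Var of the per-game score
    z = NormalDist().inv_cdf(target_los)         # one-sided normal quantile
    return math.ceil((z * math.sqrt(variance) / (p - 0.5)) ** 2)


if __name__ == "__main__":
    for elo in (5, 10, 50, 200, 400):
        print(elo, games_needed(elo), games_needed(elo, draw_ratio=0.5))
```

In this model the estimate shrinks roughly with the square of the Elo gap, and a higher draw ratio (the 200-game sample discussed here had about 92% draws) lowers the per-game variance and therefore the game count. This score-based LOS counts draws as half points, so it is a related but different approximation from the wins/losses-only formula.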

mwyoung
Posts: 2727
Joined: Wed May 12, 2010 8:00 pm

Re: A match between SF12+NNUE and Leele ver.0.26.2

Post by mwyoung » Thu Sep 24, 2020 10:50 pm

Alayan wrote:
Thu Sep 24, 2020 10:27 pm
Trolls, why? Mwyoung called a 200-game sample "very interesting" (he might have meant that the 92% draw rate was the interesting element; he never clarified, however), and then, when Oliver commented that the test was not sufficient to conclude which of the two engines is stronger, mwyoung jumped in to say that a tester doesn't need 5000 games to establish superiority. This whole "a tester doesn't need anywhere near as many games as engine devs" claim is what sparked the arguing in this thread.

And sure, the number of games required to establish superiority depends on the strength difference; that wasn't disputed. Of course, the bigger the strength difference, the fewer games you need to establish confidence in superiority.
Just to set the record straight on this thread.

"That is just false. It can be done with 10 to 20 games. What matters is the Elo difference between A and B. An example: Engine A scores 10 wins in 10 games."

I posted this on 9/8/2020, and after that the thread went dormant.

But on 9/19/2020, OliverBr commented:

mwyoung wrote: Tue Sep 08, 2020 11:20 pm
That is just false. It can be done with 10 to 20 games. What matters is the Elo difference between A and B. An example: Engine A scores 10 wins in 10 games.

by OliverBr
Sorry, I have to correct this statement, because it is absolutely wrong. I have seen test series where an engine led after more than 1000 games and was still beaten in the end.

For sure, 20 games are not enough. From a statistical standpoint alone, this makes no sense.
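The claim that even a long lead can be noise can also be sketched. The Monte Carlo below assumes independent games, a logistic Elo model and a fixed draw ratio (all simplifications; `prob_stronger_not_ahead` is a hypothetical name), and estimates how often the genuinely stronger engine is level or behind after a given number of games.

```python
import random


def prob_stronger_not_ahead(elo_diff: float, games: int,
                            draw_ratio: float = 0.0,
                            trials: int = 2000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(stronger engine is level or behind).

    Hypothetical helper. Each game is drawn independently from win/draw/loss
    probabilities implied by a logistic Elo model plus an assumed draw ratio.
    """
    p = 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))   # expected score
    win = p - draw_ratio / 2.0
    loss = 1.0 - win - draw_ratio
    if win < 0 or loss < 0:
        raise ValueError("draw_ratio is inconsistent with elo_diff")
    rng = random.Random(seed)   # seeded so runs are reproducible
    not_ahead = 0
    for _ in range(trials):
        margin = 0              # wins minus losses for the stronger engine
        for _ in range(games):
            u = rng.random()
            if u < win:
                margin += 1
            elif u < win + loss:
                margin -= 1
            # otherwise the game is a draw and the margin is unchanged
        if margin <= 0:
            not_ahead += 1
    return not_ahead / trials


if __name__ == "__main__":
    print(prob_stronger_not_ahead(5, 1000))   # small true edge, long match
    print(prob_stronger_not_ahead(400, 20))   # huge true edge, short match
```

In this model, with a true edge of only a few Elo points, a sizeable fraction of 1000-game runs still ends with the stronger engine not ahead, while with a very large edge that practically never happens within 20 games. Both sides of the argument above are consistent with this: how many games you need depends on the size of the gap.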

And today's comment by Alayan: "And sure, the number of games required to establish superiority depends on the strength difference; :lol: that wasn't disputed. :lol: Of course, the bigger the strength difference, the fewer games you need to establish confidence in superiority."

My job is done here. I have educated the trolls, and they now agree with my statement! That is nice to know.
