A match between SF12+NNUE and Leela ver. 0.26.2

Zenmastur · Post by **Zenmastur** » Thu Sep 24, 2020 5:56 am

mwyoung wrote: ↑Thu Sep 24, 2020 4:45 am Remember Elo is statistics, and SPRT is statistics, LOS is statistics. "And statistics is a game of probability, and it cannot be known for certain whether statistical conclusions are correct. Whenever there is uncertainty, there is the possibility of making an error. Considering this nature of statistics science, all statistical hypothesis tests have a probability of making type I and type II errors."

But the probability is low. And a tool to be used.

Again, I level with the LOS table.

	10	50	100	200	500	1000
0	50.00%	50.00%	50.00%	50.00%	50.00%	50.00%
1	64.51%	56.77%	54.81%	53.41%	52.16%	51.53%
2	77.22%	63.34%	59.55%	56.80%	54.32%	53.06%
3	86.95%	69.55%	64.16%	60.14%	56.46%	54.58%
4	93.43%	75.24%	68.57%	63.40%	58.58%	56.09%
5	97.14%	80.32%	72.73%	66.57%	60.68%	57.60%
6	98.95%	84.71%	76.60%	69.63%	62.75%	59.10%
7	99.68%	88.40%	80.14%	72.55%	64.78%	60.58%
8	99.93%	91.41%	83.34%	75.33%	66.77%	62.05%
9	99.99%	93.80%	86.19%	77.96%	68.72%	63.50%
10	100.00%	95.64%	88.69%	80.41%	70.61%	64.93%
11		97.02%	90.85%	82.69%	72.45%	66.34%
12		98.01%	92.68%	84.80%	74.23%	67.73%
13		98.71%	94.23%	86.72%	75.95%	69.09%
14		99.19%	95.50%	88.48%	77.60%	70.43%
15		99.50%	96.54%	90.06%	79.19%	71.74%
16		99.71%	97.37%	91.48%	80.71%	73.02%
17		99.83%	98.03%	92.74%	82.16%	74.27%
18		99.91%	98.55%	93.85%	83.54%	75.49%
19		99.95%	98.94%	94.82%	84.85%	76.68%
20		99.97%	99.24%	95.67%	86.08%	77.84%
21		99.99%	99.46%	96.41%	87.25%	78.96%
22		99.99%	99.62%	97.03%	88.35%	80.05%
23		100.00%	99.74%	97.57%	89.38%	81.11%
24			99.82%	98.02%	90.34%	82.12%
25			99.88%	98.40%	91.23%	83.11%
26			99.92%	98.71%	92.07%	84.06%
27			99.95%	98.97%	92.84%	84.97%
28			99.97%	99.18%	93.55%	85.85%
29			99.98%	99.36%	94.21%	86.69%
30			99.99%	99.50%	94.81%	87.50%
31			99.99%	99.61%	95.36%	88.27%
32			100.00%	99.70%	95.86%	89.01%
33				99.77%	96.32%	89.71%
34				99.82%	96.74%	90.38%
35				99.87%	97.11%	91.02%
36				99.90%	97.45%	91.62%
37				99.93%	97.76%	92.20%
38				99.95%	98.03%	92.74%
39				99.96%	98.28%	93.26%
40				99.97%	98.50%	93.74%
41				99.98%	98.69%	94.20%
42				99.98%	98.86%	94.63%
43				99.99%	99.02%	95.04%
44				99.99%	99.15%	95.42%
45				99.99%	99.27%	95.78%
46				100.00%	99.37%	96.11%
47					99.46%	96.42%
48					99.54%	96.72%
49					99.61%	96.99%
50					99.67%	97.24%
51					99.72%	97.47%
52					99.76%	97.69%
53					99.80%	97.89%
54					99.83%	98.08%
55					99.86%	98.25%
56					99.88%	98.41%
57					99.90%	98.56%
58					99.92%	98.69%
59					99.93%	98.82%
60					99.94%	98.93%
61					99.95%	99.03%
62					99.96%	99.13%
63					99.97%	99.22%
64					99.97%	99.29%
65					99.98%	99.37%
66					99.98%	99.43%
67					99.99%	99.49%
68					99.99%	99.54%
69					99.99%	99.59%
70					99.99%	99.64%
71					99.99%	99.68%
72					100.00%	99.71%
73						99.74%
74						99.77%
75						99.80%
76						99.82%
77						99.84%
78						99.86%
79						99.88%
80						99.89%
81						99.91%
82						99.92%
83						99.93%
84						99.94%
85						99.94%
86						99.95%
87						99.96%
88						99.96%
89						99.97%
90						99.97%
91						99.98%
92						99.98%
93						99.98%
94						99.98%
95						99.99%
96						99.99%
97						99.99%
98						99.99%
99						99.99%
100						99.99%
101						99.99%
102						100.00%
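
For readers who want to compute a likelihood of superiority themselves, here is a minimal sketch of the commonly used normal approximation, which depends only on wins and losses and ignores draws. The function name `los` is made up for illustration, and since the post does not say how the table above was calculated, this sketch is not guaranteed to reproduce its figures.

```python
import math


def los(wins: int, losses: int) -> float:
    """Likelihood of superiority (normal approximation).

    Assumption: draws carry no information about which engine is
    stronger, so only decisive games enter the formula.
    """
    decisive = wins + losses
    if decisive == 0:
        # No decisive games: no evidence either way.
        return 0.5
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))


if __name__ == "__main__":
    for w, l in [(5, 5), (6, 4), (10, 0), (60, 40)]:
        print(f"+{w} -{l}: LOS = {los(w, l):.2%}")
```

With equal wins and losses the function returns exactly 0.5, and a 10 to 0 result gives a value above 0.99.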

I'm not sure why you would bother engaging in this argument with these trolls. There have been many, many discussions on this board about LOS, LLR, SPRT, Elo error bars, biased and unbiased estimators, etc., and there are innumerable articles on the web on each of these subjects. Trying to argue a point with someone who doesn't really care what the truth of the matter is and is too lazy to do any research on the subject is a waste of time and effort. End the discussion! Who cares what they believe? Let them believe whatever they like and get on with your life.

Just my two cents worth.

Regards,

Zenmastur

mwyoung · Post by **mwyoung** » Thu Sep 24, 2020 6:23 am

Zenmastur wrote: ↑Thu Sep 24, 2020 5:56 am
mwyoung wrote: ↑Thu Sep 24, 2020 4:45 am Remember Elo is statistics, and SPRT is statistics, LOS is statistics. "And statistics is a game of probability, and it cannot be known for certain whether statistical conclusions are correct. Whenever there is uncertainty, there is the possibility of making an error. Considering this nature of statistics science, all statistical hypothesis tests have a probability of making type I and type II errors."

But the probability is low. And a tool to be used.

Again, I level with the LOS table.
I'm not sure why you would bother engaging in this argument with these trolls. There have been many, many discussions on this board about LOS, LLR, SPRT, Elo error bars, biased and unbiased estimators, etc., and there are innumerable articles on the web on each of these subjects. Trying to argue a point with someone who doesn't really care what the truth of the matter is and is too lazy to do any research on the subject is a waste of time and effort. End the discussion! Who cares what they believe? Let them believe whatever they like and get on with your life.

Just my two cents worth.

Regards,

Zenmastur

You are correct. And I have seen and read all of the arguments. I have been here since the beginning of CCC and before, and I have been testing chess engines since 1980. I got into chess engine testing when chess testing was very corrupt, and I wanted people to know the truth and how this all works. This is not for the Andrew Grants of this world, who are trolls, but for those who see the truth, even if it is in my very sarcastic style.

I honor the legacy L.K.

OliverBr · Post by **OliverBr** » Thu Sep 24, 2020 10:37 pm

mwyoung wrote: ↑Thu Sep 24, 2020 1:08 am
OliverBr wrote: ↑Wed Sep 23, 2020 11:34 pm
mwyoung wrote: ↑Wed Sep 23, 2020 11:12 pm That is because you are not very bright.
And what rank is your engine, @mwyoung?
For the third time. I am an engine tester. I do not have an engine. So say my engine has a rating of 0. And that would rank it very close to your engine.

In case you ask again. This will save us both time.

For the 4th time. I am an engine tester. I do not have an engine. So say my engine has a rating of 0. And that would rank it very close to your engine.

For the 5th time. I am an engine tester. I do not have an engine. So say my engine has a rating of 0. And that would rank it very close to your engine.

For the 6th time. I am an engine tester. I do not have an engine. So say my engine has a rating of 0. And that would rank it very close to your engine..............

@mwyoung, you are one of the most boring trolls ever. And you have absolutely no clue about chess programming.

corres · Post by **corres** » Thu Sep 24, 2020 10:53 pm

mwyoung uses whatever evaluation method he wants for his tests, and you make as strong a chess engine as you can.
Still, are you not tired of this superfluous dispute?

mwyoung · Post by **mwyoung** » Fri Sep 25, 2020 12:02 am

OliverBr wrote: ↑Thu Sep 24, 2020 10:37 pm
mwyoung wrote: ↑Thu Sep 24, 2020 1:08 am
OliverBr wrote: ↑Wed Sep 23, 2020 11:34 pm
mwyoung wrote: ↑Wed Sep 23, 2020 11:12 pm That is because you are not very bright.
And what rank is your engine, @mwyoung?
For the third time. I am an engine tester. I do not have an engine. So say my engine has a rating of 0. And that would rank it very close to your engine.

In case you ask again. This will save us both time.

For the 4th time. I am an engine tester. I do not have an engine. So say my engine has a rating of 0. And that would rank it very close to your engine.

For the 5th time. I am an engine tester. I do not have an engine. So say my engine has a rating of 0. And that would rank it very close to your engine.

For the 6th time. I am an engine tester. I do not have an engine. So say my engine has a rating of 0. And that would rank it very close to your engine..............
@mwyoung, you are one of the most boring trolls ever. And you have absolutely no clue about chess programming.

We don't believe you. You came here trolling, and you are still trolling, so you must get something out of it. And you are projecting again.

Alayan · Post by **Alayan** » Fri Sep 25, 2020 12:27 am

Zenmastur wrote: ↑Thu Sep 24, 2020 5:56 am
I'm not sure why you would bother engaging in this argument with these trolls.

Trolls, why? mwyoung called a 200-game sample "very interesting" (he might have thought that the 92% draw rate was the interesting element, but he never clarified it), and then, when Oliver commented that the test was not sufficient to conclude which of the two engines is stronger, mwyoung jumped in to say that a tester doesn't need 5000 games to conclude superiority. This whole "a tester doesn't need anywhere near as many games as engine devs" claim is what sparked the arguing in this thread.

And sure, the number of games required to establish superiority depends on the strength difference; that wasn't disputed. Of course, the bigger the strength difference, the fewer games you need to establish confidence of superiority.
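
To make the dependence on strength difference concrete, here is a rough sketch, not anyone's actual test procedure. It assumes the standard logistic Elo model, games that are decided (no draws), and a normal approximation with a one-sided threshold of z = 1.96. The names `expected_score` and `games_needed` are invented for this illustration. Real matches with many draws have a different per-game variance, so the numbers are only indicative.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score of the stronger engine under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def games_needed(elo_diff: float, z: float = 1.96) -> int:
    """Games after which the expected lead equals z standard errors.

    Assumptions: every game is a win or a loss (no draws) and the
    normal approximation to the binomial is adequate.
    """
    if elo_diff <= 0:
        raise ValueError("elo_diff must be positive")
    p = expected_score(elo_diff)
    sd = math.sqrt(p * (1.0 - p))  # per-game standard deviation of the score
    return math.ceil((z * sd / (p - 0.5)) ** 2)


if __name__ == "__main__":
    for diff in (10, 50, 100, 400):
        print(f"{diff:>4} Elo: about {games_needed(diff)} games")
```

Under these assumptions a 10 Elo gap needs thousands of games, while a gap of several hundred Elo needs only a handful.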

mwyoung · Post by **mwyoung** » Fri Sep 25, 2020 12:50 am

Alayan wrote: ↑Fri Sep 25, 2020 12:27 am
Zenmastur wrote: ↑Thu Sep 24, 2020 5:56 am
I'm not sure why you would bother engaging in this argument with these trolls.
Trolls, why? mwyoung called a 200-game sample "very interesting" (he might have thought that the 92% draw rate was the interesting element, but he never clarified it), and then, when Oliver commented that the test was not sufficient to conclude which of the two engines is stronger, mwyoung jumped in to say that a tester doesn't need 5000 games to conclude superiority. This whole "a tester doesn't need anywhere near as many games as engine devs" claim is what sparked the arguing in this thread.

And sure, the number of games required to establish superiority depends on the strength difference; that wasn't disputed. Of course, the bigger the strength difference, the fewer games you need to establish confidence of superiority.

Just to set the record straight on this thread.

"That is just false. It can be done with 10 to 20 games. What matters is the Elo difference between A vs B. An example: Engine A scores 10 wins in 10 games."

I posted this on 9/8/2020, and after that the thread was done and dormant.

But on 9/19/2020 OliverBr commented.

mwyoung wrote: ↑Tue Sep 08, 2020 11:20 pm
That is just false. It can be done with 10 to 20 games. What matters is the Elo difference between A vs B. An example: Engine A scores 10 wins in 10 games.

by OliverBr
Sorry, I have to correct this statement, because it is absolutely wrong. I have seen test series where an engine led after over 1000 games and was still beaten in the end.

For sure, 20 games are not enough. Statistically alone this makes no sense.

And today's comment by Alayan: "And sure, the number of games required to establish superiority depends on the strength difference; that wasn't disputed. Of course, the bigger the strength difference, the fewer games you need to establish confidence of superiority."

My job is done here. And I have educated the trolls. And they now agree with my statement! That is nice to know.
