A match between SF12+NNUE and Leele ver.0.26.2

AndrewGrant
Posts: 1960
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: A match between SF12+NNUE and Leele ver.0.26.2

Post by AndrewGrant »

mwyoung wrote: Tue Sep 22, 2020 6:29 pm The question is: have you ever seen an engine win all games in a 10-game test, then lose a test match of 1,000 or 10,000 games to the same engine? No!
Uh, no. I see Ethereal win 10 games straight and then fail a 1,000 game test once every few days.

I can never tell if users are trolling. Do you _really_ think a sample size of 10 games means _anything_?

If I play 100,000 games against Stockfish with Ethereal, do you not think I can find a 10-game chunk where Ethereal beats Stockfish handily?
Alayan
Posts: 550
Joined: Tue Nov 19, 2019 8:48 pm
Full name: Alayan Feh

Re: A match between SF12+NNUE and Leele ver.0.26.2

Post by Alayan »

The CCRL FRC testing of Ethereal 12.50 vs Stockfish 12 has this chunk of games right at the very end of the 300 games they played:
1 1 0 = 0 0 = 1 0 1 1 1
+6-4=2 for Ethereal. +4-1 in the last 5 games.

Overall results are +22-185=93.
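
How often a chunk like this turns up by chance can be estimated with a small Monte Carlo sketch. It assumes every game is an independent draw from fixed win/draw/loss probabilities, which real matches (paired openings, colour effects) may not satisfy. The function name and its parameters are illustrative and not taken from any testing tool.

```python
import random

def prob_hot_window(p_win, p_draw, n_games, window, min_score, trials=2000, seed=0):
    """Estimate how often a match of n_games contains at least one run of
    `window` consecutive games in which the weaker engine scores >= min_score.
    Assumption: games are independent with fixed win/draw probabilities."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Score each game from the weaker engine's side: 1 win, 0.5 draw, 0 loss.
        scores = []
        for _ in range(n_games):
            r = rng.random()
            scores.append(1.0 if r < p_win else 0.5 if r < p_win + p_draw else 0.0)
        # Slide a window over the match and track the best window score.
        running = sum(scores[:window])
        best = running
        for i in range(window, n_games):
            running += scores[i] - scores[i - window]
            best = max(best, running)
        if best >= min_score:
            hits += 1
    return hits / trials

# Frequencies from the +22-185=93 result above; a 12-game window scoring 7/12.
print(prob_hot_window(22 / 300, 93 / 300, 300, 12, 7.0))
```

Raising `n_games` (for example to 100,000) shows how the chance of finding some favourable chunk grows with the length of the match under these assumptions.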

One thing that should be said is that error bars are directly correlated to the results variance. Play e.g. 100 games N times and measure the distribution of results. When the draw rate is higher (as it is with longer TC and more threads), the standard deviation will go down, while the sample size is unchanged. But I don't know if intrinsic WDL is enough or if you need experimental data to build a good prior for error bar computations.

A simple thought experiment involves an engine X playing a drawless game. In self-play playing both sides in equal amount, the intrinsic WDL will always be 50/0/50. However, at very long TC the engine might achieve 100% win with the strong side and no variance at all, while at very short TC the weak side at the start of the game could snatch wins, creating variance. So, it seems intrinsic WDL by itself isn't enough to properly assess variance. And of course, measured WDL will be off compared to the intrinsic WDL, so this increases unreliability.
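
The link between draw rate and error bars can be sketched with the usual trinomial estimate. This treats games as independent draws from a fixed WDL, which is exactly the assumption the thought experiment above calls into question; `score_std_error` is an illustrative name.

```python
import math

def score_std_error(wins, draws, losses):
    """Mean score per game and its standard error from W/D/L counts,
    assuming independent games with fixed outcome probabilities."""
    n = wins + draws + losses
    w, d = wins / n, draws / n
    mean = w + 0.5 * d
    # Per-game variance of a score taking values 1, 0.5, 0.
    var = w + 0.25 * d - mean ** 2
    return mean, math.sqrt(var / n)

# Same sample size and same 50% score, different draw rates.
print(score_std_error(50, 0, 50))   # drawless
print(score_std_error(25, 50, 25))  # 50% draws
```

With the sample size held at 100 games, the drawn-heavy result yields the smaller standard error, but only under the independence assumption stated above.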

In the end, the fact that you might get away with fewer games for the same confidence doesn't mean multiple orders of magnitude fewer.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: A match between SF12+NNUE and Leele ver.0.26.2

Post by corres »

AndrewGrant wrote: Tue Sep 22, 2020 6:45 pm
mwyoung wrote: Tue Sep 22, 2020 6:29 pm The question is: have you ever seen an engine win all games in a 10-game test, then lose a test match of 1,000 or 10,000 games to the same engine? No!
Uh, no. I see Ethereal win 10 games straight and then fail a 1,000 game test once every few days.
I can never tell if users are trolling. Do you _really_ think a sample size of 10 games means _anything_?
If I play 100,000 games against Stockfish with Ethereal, do you not think I can find a 10-game chunk where Ethereal beats Stockfish handily?
What is the probability that, out of 1,000 games, Ethereal wins only the first 10 or only the last 10 and then loses all of the other 990?
It is very, very small.
Note:
The statistics generally used for estimating the accuracy of Elo are appropriate for many thousands of games, because the Gaussian curve is a continuous curve that is approached only with a lot of measuring points.
Gaussian statistics are not appropriate for a data set with only a few elements. But this fact is not an obstacle to drawing conclusions about the strength ranking of engines. Obviously, the lower the number of games and the smaller the strength difference, the higher the uncertainty.
In general, I publish my test method and the results, and everybody can use them as they want.
That is all.
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: A match between SF12+NNUE and Leele ver.0.26.2

Post by mwyoung »

AndrewGrant wrote: Tue Sep 22, 2020 6:45 pm
mwyoung wrote: Tue Sep 22, 2020 6:29 pm The question is: have you ever seen an engine win all games in a 10-game test, then lose a test match of 1,000 or 10,000 games to the same engine? No!
Uh, no. I see Ethereal win 10 games straight and then fail a 1,000 game test once every few days.

I can never tell if users are trolling. Do you _really_ think a sample size of 10 games means _anything_?

If I play 100,000 games against Stockfish with Ethereal, do you not think I can find a 10-game chunk where Ethereal beats Stockfish handily?
I know you have many issues.

What engine! What testing conditions! And I can do this test also. :lol: You're always full of B.S.

This is LOS statistics. This is not Mark's statistics.

"If I play 100,000 games against Stockfish with Ethereal, do you not think I can find a 10-game chunk where Ethereal beats Stockfish handily?"

Very unlikely, and that is the point. But you might be able to cherry pick 10 games. And say this was a 10 game test. I put nothing past you with your record.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: A match between SF12+NNUE and Leele ver.0.26.2

Post by Milos »

mwyoung wrote: Tue Sep 22, 2020 8:30 pm I know you have many issues.

What engine! What testing conditions! And I can do this test also. :lol: You're always full of B.S.

This is LOS statistics. This is not Mark's statistics.

"If I play 100,000 games against Stockfish with Ethereal, do you not think I can find a 10-game chunk where Ethereal beats Stockfish handily?"

Very unlikely, and that is the point. But you might be able to cherry pick 10 games. And say this was a 10 game test. I put nothing past you with your record.
Oh shut up, you clueless troll. You are ignorant beyond comprehension. I wonder why they even try to explain anything to you.
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: A match between SF12+NNUE and Leele ver.0.26.2

Post by mwyoung »

Milos wrote: Tue Sep 22, 2020 9:34 pm
mwyoung wrote: Tue Sep 22, 2020 8:30 pm I know you have many issues.

What engine! What testing conditions! And I can do this test also. :lol: You're always full of B.S.

This is LOS statistics. This is not Mark's statistics.

"If I play 100,000 games against Stockfish with Ethereal, do you not think I can find a 10-game chunk where Ethereal beats Stockfish handily?"

Very unlikely, and that is the point. But you might be able to cherry pick 10 games. And say this was a 10 game test. I put nothing past you with your record.
Oh shut up, you clueless troll. You are ignorant beyond comprehension. I wonder why they even try to explain anything to you.
Funny, when I was not the one posting here first. You came to me. :lol:

You have been called out. Now prove LOS stats wrong and invalid.
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: A match between SF12+NNUE and Leele ver.0.26.2

Post by mwyoung »

mwyoung wrote: Tue Sep 22, 2020 9:43 pm
Milos wrote: Tue Sep 22, 2020 9:34 pm
mwyoung wrote: Tue Sep 22, 2020 8:30 pm I know you have many issues.

What engine! What testing conditions! And I can do this test also. :lol: You're always full of B.S.

This is LOS statistics. This is not Mark's statistics.

"If I play 100,000 games against Stockfish with Ethereal, do you not think I can find a 10-game chunk where Ethereal beats Stockfish handily?"

Very unlikely, and that is the point. But you might be able to cherry pick 10 games. And say this was a 10 game test. I put nothing past you with your record.
Oh shut up, you clueless troll. You are ignorant beyond comprehension. I wonder why they even try to explain anything to you.
Funny, when I was not the one posting here first. You came to me. :lol:

You have been called out. Now prove LOS stats wrong and invalid.
LOS practical testing: how LOS can be used. Here we have 2 new engines, Stockfish 210920 and Ethereal 12.50. I have never tested them, and I have no clue which engine is better! How many games does it take with these two engines, with correct application of LOS, to determine which engine is stronger?

Let us find out.

Live Stream:

Live Stockfish 12 (210920) vs Ethereal 12.50 (3m+2s) LOS testing.

Testing conditions.

Hardware 2950x, RTX 2080 Ti

Ethereal 12.50
Stockfish 12 (210920)
Ponder off.
TC=3m+2s
200 Games
32 threads.
4 GB hash.
6 man TB, and the top ten 7 man TB.
Opening book 6 moves random.
OliverBr
Posts: 846
Joined: Tue Dec 18, 2007 9:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: A match between SF12+NNUE and Leele ver.0.26.2

Post by OliverBr »

mwyoung wrote: Tue Sep 22, 2020 10:03 pm 200 Games
Why are you playing 200 games? I thought 10 games are more than enough...
OliThink GitHub: https://github.com/olithink
Nice article about OliThink: https://www.chessengeria.eu/post/olithink-oldie-goldie
Chess Engine OliThink Homepage: http://brausch.org/home/chess
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: A match between SF12+NNUE and Leele ver.0.26.2

Post by mwyoung »

OliverBr wrote: Tue Sep 22, 2020 10:37 pm
mwyoung wrote: Tue Sep 22, 2020 10:03 pm 200 Games
Why are you playing 200 games? I thought 10 games is more than enough...
Again your stupidity shows. And you have no clue what LOS testing is or how to apply it. I do not know which engine is better, or whether one engine will win all of the first 10 games.

But right now, after only 8 games, it is likely that SF is the better engine.
The LOS score after 8 games is SF 95.8% and Ethereal 4.2%. :shock:

The match continues until we know for sure....100%
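
For reference, LOS (likelihood of superiority) is often computed with a normal approximation that uses only wins and losses and ignores draws. The post does not say which formula or which game tally produced the figures above, so the following is only a sketch of that common approximation; `los` is an illustrative name.

```python
import math

def los(wins, losses):
    """Likelihood of superiority under the common normal approximation.
    Draws do not enter the formula; with no decisive games it is 50%."""
    decisive = wins + losses
    if decisive == 0:
        return 0.5
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))

# Hypothetical tally: 3 wins, 0 losses, any number of draws.
print(f"{los(3, 0):.1%}")
```

Under this approximation, a tally of 3 wins and 0 losses gives roughly 95.8%, which would be consistent with the quoted figure, though the actual tally is not stated in the post. The value approaches but never reaches 100% for any finite number of games.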
Last edited by mwyoung on Tue Sep 22, 2020 10:52 pm, edited 1 time in total.
OliverBr
Posts: 846
Joined: Tue Dec 18, 2007 9:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: A match between SF12+NNUE and Leele ver.0.26.2

Post by OliverBr »

mwyoung wrote: Tue Sep 22, 2020 10:41 pm Again your stupidity shows.
What, again, is your engine? Could you please post the git link?
Thank you very much.