Lco road to 8000 or 10000! self play elo.

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Lco road to 8000 or 10000! self play elo.

Post by mar »

Laskos wrote: Fri Nov 23, 2018 8:24 pm To check the truthfulness of 2300+ Elo difference, even 3 games can be more than enough for high confidence of its falsification.
Except that there's no 2000 elo difference, the LC0 self-play elo graph is just accumulated error at best.
I wonder why people play 4, 10, 20 games between engines of similar strength and draw conclusions based on that.
Engine devs play (tens of) thousands per patch. There's no shortcut unless you have an oracle.
Martin Sedlak
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Lco road to 8000 or 10000! self play elo.

Post by Laskos »

mar wrote: Fri Nov 23, 2018 9:38 pm
Laskos wrote: Fri Nov 23, 2018 8:24 pm To check the truthfulness of 2300+ Elo difference, even 3 games can be more than enough for high confidence of its falsification.
Except that there's no 2000 elo difference, the LC0 self-play elo graph is just accumulated error at best.
I wonder why people play 4, 10, 20 games between engines of similar strength and draw conclusions based on that.
Engine devs play (tens of) thousands per patch. There's no shortcut unless you have an oracle.
No, I misread that post, I thought that he was comparing two nets of the same 30xxx run, and 3 games can be enough to show that their self-Elo is a bogus number, almost arbitrary.
alex67a
Posts: 50
Joined: Mon Sep 10, 2018 10:15 am
Location: Denmark
Full name: Alexander Spence

Re: Lco road to 8000 or 10000! self play elo.

Post by alex67a »

mar wrote: Fri Nov 23, 2018 9:38 pm
Laskos wrote: Fri Nov 23, 2018 8:24 pm To check the truthfulness of 2300+ Elo difference, even 3 games can be more than enough for high confidence of its falsification.
Except that there's no 2000 elo difference, the LC0 self-play elo graph is just accumulated error at best.
I wonder why people play 4, 10, 20 games between engines of similar strength and draw conclusions based on that.
Engine devs play (tens of) thousands per patch. There's no shortcut unless you have an oracle.
Normally you would be right.
But there is a problem: if an engine wins 3 and draws 1 out of 4 games, it is obvious that there is a big difference between the two engines.
Similar engines produce many draws; that is not happening here.

If I play 4 games between Stockfish and Arasan and the first one wins three of them, do you think that is not reliable as an approximate test?
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Lco road to 8000 or 10000! self play elo.

Post by mar »

No, I don't think that's reliable - this can happen quite easily (again, speaking of engines of similar strength, not a gap of hundreds of Elo).
It took me 10 seconds to find 4 games where two versions of my engine scored 3 1/2 - 1/2 in self-play, but after 10k games it was a wash and within error bars.
I see this all the time in tournaments with engines of similar strength: in one tournament you score 40% or less, in another 60% against the same opponent, etc.
The problem is the small number of samples, that's all.
Martin Sedlak
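To put a rough number on the "small number of samples" point, here is a minimal sketch of how the 95% error margin shrinks with the number of games. It assumes two engines of equal strength, independent games, a normal approximation (crude for very small samples), and a hypothetical 30% draw rate; none of these figures come from the thread.

```python
import math

def score_to_elo(score: float) -> float:
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_margin_95(n_games: int, draw_rate: float) -> float:
    """Approximate 95% Elo margin for a match between equal engines.

    Assumes independent games, equal win and loss probabilities, and a
    normal approximation (which is crude for very small n_games).
    """
    # Per-game result is 1, 0.5 or 0 points with mean 0.5,
    # so its variance is (1 - draw_rate) / 4.
    std_err = math.sqrt((1.0 - draw_rate) / 4.0 / n_games)
    upper = min(0.5 + 1.96 * std_err, 0.999)  # clamp so the log stays finite
    return score_to_elo(upper)

# draw_rate=0.3 is an assumed value, chosen only for illustration.
for n in (4, 20, 1000, 10000):
    print(f"{n:>6} games: +/- {elo_margin_95(n, draw_rate=0.3):.0f} Elo")
```

Under these assumptions the margin is in the hundreds of Elo for 4 games and only a few Elo for 10,000 games.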
whereagles
Posts: 565
Joined: Thu Nov 13, 2014 12:03 pm

Re: Lco road to 8000 or 10000! self play elo.

Post by whereagles »

not sure if it's small samples... i think we may be seeing some bootstrap mechanism at work

a > b > c > a ... etc

abc = NN weight sets

there: endless rise, zero progress
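A minimal sketch of that cycle, assuming a hypothetical rock-paper-scissors situation in which every net scores 55% against its predecessor. The 55% figure and the three-net cycle are illustrative assumptions, not measurements of any Lc0 net. Chaining the per-step Elo gains, as a self-play graph does, keeps rising even though the nets just rotate.

```python
import math

def score_to_elo(score: float) -> float:
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    return 400.0 * math.log10(score / (1.0 - score))

def chained_self_play_elo(step_score: float, generations: int) -> float:
    """Sum the per-step Elo gains, the way a self-play rating graph would."""
    return generations * score_to_elo(step_score)

# Hypothetical cycle a > b > c > a: each net scores 55% against the previous one.
nets = ["a", "b", "c"]
for gen in range(0, 10, 3):
    net = nets[gen % 3]  # every third generation we are back at the same net
    print(f"gen {gen}: net {net}, chained Elo {chained_self_play_elo(0.55, gen):+.0f}")
```

Every printed generation is the same net "a", yet its chained rating climbs each time around the cycle.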
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Lco road to 8000 or 10000! self play elo.

Post by Milos »

mar wrote: Fri Nov 23, 2018 10:19 pm No, I don't think that's reliable - this can happen quite easily (again, speaking of engines of similar strength, not a gap of hundreds of Elo).
It took me 10 seconds to find 4 games where two versions of my engine scored 3 1/2 - 1/2 in self-play, but after 10k games it was a wash and within error bars.
I see this all the time in tournaments with engines of similar strength: in one tournament you score 40% or less, in another 60% against the same opponent, etc.
The problem is the small number of samples, that's all.
Your self play TC was 1min per game or similar, his games were 30sec per move. Due to draw rate you might have higher error margins with 20 super fast self-play games than with 4 30sec/move games.
In his case if nets were really of equal strength draw probability could be easily 80%. So probability of 3 wins out of 4 games for one engine in case of engines of equal strength would be like 0.1%.
So 4 games could be indeed more than sufficient to prove with almost 100% certainty that engine A is stronger than engine B.
Ofc one would need to have some knowledge of statistics which doesn't seem to be your case...
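The draw-rate argument can be checked by brute force. The sketch below enumerates all win/draw/loss sequences for 4 games and adds up the probability that one given side scores at least 3.5 points (three wins and a draw, or four wins). It assumes independent games and equal win and loss probabilities; the 30% draw rate is an added assumption for contrast, not a figure from the thread.

```python
from itertools import product

def prob_score_at_least(target: float, n_games: int, p_win: float, p_draw: float) -> float:
    """Exact probability that one side scores >= target points in n_games.

    Assumes independent games with fixed win/draw/loss probabilities.
    """
    p_loss = 1.0 - p_win - p_draw
    outcomes = [(1.0, p_win), (0.5, p_draw), (0.0, p_loss)]
    total = 0.0
    for games in product(outcomes, repeat=n_games):
        score = sum(points for points, _ in games)
        if score >= target:
            prob = 1.0
            for _, p in games:
                prob *= p
            total += prob
    return total

for draw_rate in (0.8, 0.3):
    p_win = (1.0 - draw_rate) / 2.0  # equal strength: wins and losses equally likely
    p = prob_score_at_least(3.5, 4, p_win, draw_rate)
    print(f"draw rate {draw_rate:.0%}: P(score >= 3.5/4) = {p:.2%}")
```

With an 80% draw rate the result is about 0.3% for a given side, the same order of magnitude as the "like 0.1%" above. With a 30% draw rate it is several percent, so the conclusion depends heavily on the draw rate one assumes.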
Nay Lin Tun
Posts: 708
Joined: Mon Jan 16, 2012 6:34 am

Re: Lco road to 8000 or 10000! self play elo.

Post by Nay Lin Tun »

MikeB wrote: Fri Nov 23, 2018 8:38 pm
Nay Lin Tun wrote: Fri Nov 23, 2018 8:47 am
P.S. Testers say the 30xx network is still -100 to -150 Elo behind 11248.
Fake news....

anyone can verify or disprove your claims after even just a few games, I believe it's actually worse...

Rank Name             Rating   Δ     +    -     #     Σ    Σ%     W    L    D   W%    =%   OppR 
---------------------------------------------------------------------------------------------------------
   1 Lc0 v0.19.0 11261   3198   0.0   84   84    50   38.5  77.0   33    6   11  66.0  22.0  3002 
   2 Lc0 v0.19.0 31493   3002 195.7   84   84    50   11.5  23.0    6   33   11  12.0  22.0  3198
---------------------------------------------------------------------------------------------------------
Custom openings that may exaggerate Elo differences (due to the unbalanced nature of the openings).
Hmm, I read this forum post just before posting this. According to his test on decent hardware (which would closely reflect performance in TCEC or CCCC), the estimate is 200 Elo below the latest SF, whereas the best net, 11248, is known to be about 100 Elo below (speed ratio 1:1000). Also, with your slow GPU or a very short time control, you are mostly testing the strength of the policy head, because the value net (MCTS) doesn't get a good chance to correct the mistakes made by the policy head.
https://ibb.co/eq90FV
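For reference, the 33 wins, 6 losses and 11 draws in the quoted table can be turned into an Elo difference with the plain logistic formula. This is a minimal sketch with a rough normal-approximation interval. It is not the method of the rating tool that produced the table, which evidently uses a different model, so the numbers will not match exactly.

```python
import math

def elo_from_wld(wins: int, losses: int, draws: int):
    """Return (elo, low, high): logistic Elo difference with a rough 95% interval."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Sample variance of the per-game result (1, 0.5 or 0 points).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    half_width = 1.96 * math.sqrt(var / n)

    def to_elo(s: float) -> float:
        s = min(max(s, 1e-6), 1.0 - 1e-6)  # keep the log finite
        return 400.0 * math.log10(s / (1.0 - s))

    return to_elo(score), to_elo(score - half_width), to_elo(score + half_width)

elo, low, high = elo_from_wld(wins=33, losses=6, draws=11)
print(f"Elo difference: {elo:+.0f} (95% interval roughly {low:+.0f} to {high:+.0f})")
```

The point estimate comes out near 210 Elo with an interval spanning roughly 200 Elo, which contains the table's 195.7. After only 50 games the size of the gap is still quite uncertain, even though its sign is not.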
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Lco road to 8000 or 10000! self play elo.

Post by Milos »

Nay Lin Tun wrote: Sat Nov 24, 2018 12:10 am
MikeB wrote: Fri Nov 23, 2018 8:38 pm
Nay Lin Tun wrote: Fri Nov 23, 2018 8:47 am
P.S. Testers say the 30xx network is still -100 to -150 Elo behind 11248.
Fake news....

anyone can verify or disprove your claims after even just a few games, I believe it's actually worse...

Rank Name             Rating   Δ     +    -     #     Σ    Σ%     W    L    D   W%    =%   OppR 
---------------------------------------------------------------------------------------------------------
   1 Lc0 v0.19.0 11261   3198   0.0   84   84    50   38.5  77.0   33    6   11  66.0  22.0  3002 
   2 Lc0 v0.19.0 31493   3002 195.7   84   84    50   11.5  23.0    6   33   11  12.0  22.0  3198
---------------------------------------------------------------------------------------------------------
Custom openings that may exaggerate Elo differences (due to the unbalanced nature of the openings).
Hmm, I read this forum post just before posting this. According to his test on decent hardware (which would closely reflect performance in TCEC or CCCC), the estimate is 200 Elo below the latest SF, whereas the best net, 11248, is known to be about 100 Elo below (speed ratio 1:1000). Also, with your slow GPU or a very short time control, you are mostly testing the strength of the policy head, because the value net (MCTS) doesn't get a good chance to correct the mistakes made by the policy head.
https://ibb.co/eq90FV
20M nodes per move is close to TCEC performance for SFdev???
Didn't know TCEC used TC of 10''+0.1''. :lol: :lol: :lol:
chrisw
Posts: 4317
Joined: Tue Apr 03, 2012 4:28 pm

Re: Lco road to 8000 or 10000! self play elo.

Post by chrisw »

Milos wrote: Fri Nov 23, 2018 11:46 pm
mar wrote: Fri Nov 23, 2018 10:19 pm No, I don't think that's reliable - this can happen quite easily (again, speaking of engines of similar strength, not a gap of hundreds of Elo).
It took me 10 seconds to find 4 games where two versions of my engine scored 3 1/2 - 1/2 in self-play, but after 10k games it was a wash and within error bars.
I see this all the time in tournaments with engines of similar strength: in one tournament you score 40% or less, in another 60% against the same opponent, etc.
The problem is the small number of samples, that's all.
Your self play TC was 1min per game or similar, his games were 30sec per move. Due to draw rate you might have higher error margins with 20 super fast self-play games than with 4 30sec/move games.
In his case if nets were really of equal strength draw probability could be easily 80%. So probability of 3 wins out of 4 games for one engine in case of engines of equal strength would be like 0.1%.
So 4 games could be indeed more than sufficient to prove with almost 100% certainty that engine A is stronger than engine B.
Ofc one would need to have some knowledge of statistics which doesn't seem to be your case...
People with a weird knowledge of statistics may have heard of the German Tank Problem, which is not exactly the same but is similar in that it estimates a total number of things when you have only four of them (as in this case). This particular chess score problem becomes much easier when you only want to prove that A is better than B, rather than that A is better than B by 200 Elo.
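For readers who have not met it, the German Tank Problem estimate is short enough to sketch. It assumes the observed serial numbers are a uniform random sample, drawn without replacement, from 1..N; the four serial numbers below are a made-up illustration.

```python
def german_tank_estimate(serials: list[int]) -> float:
    """Minimum-variance unbiased estimate of the population size N.

    Assumes the serial numbers are a uniform random sample, drawn
    without replacement, from 1..N.
    """
    k = len(serials)   # sample size
    m = max(serials)   # largest serial number observed
    return m + m / k - 1

# Hypothetical sample of four serial numbers.
print(german_tank_estimate([19, 40, 42, 60]))
```

With four samples and a maximum of 60, the estimate is 60 + 60/4 - 1 = 74.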
Nay Lin Tun
Posts: 708
Joined: Mon Jan 16, 2012 6:34 am

Re: Lco road to 8000 or 10000! self play elo.

Post by Nay Lin Tun »

Milos wrote: Sat Nov 24, 2018 12:20 am
Nay Lin Tun wrote: Sat Nov 24, 2018 12:10 am
MikeB wrote: Fri Nov 23, 2018 8:38 pm
Nay Lin Tun wrote: Fri Nov 23, 2018 8:47 am
P.S. Testers say the 30xx network is still -100 to -150 Elo behind 11248.
Fake news....

anyone can verify or disprove your claims after even just a few games, I believe it's actually worse...

Rank Name             Rating   Δ     +    -     #     Σ    Σ%     W    L    D   W%    =%   OppR 
---------------------------------------------------------------------------------------------------------
   1 Lc0 v0.19.0 11261   3198   0.0   84   84    50   38.5  77.0   33    6   11  66.0  22.0  3002 
   2 Lc0 v0.19.0 31493   3002 195.7   84   84    50   11.5  23.0    6   33   11  12.0  22.0  3198
---------------------------------------------------------------------------------------------------------
Custom openings that may exaggerate Elo differences (due to the unbalanced nature of the openings).
Hmm, I read this forum post just before posting this. According to his test on decent hardware (which would closely reflect performance in TCEC or CCCC), the estimate is 200 Elo below the latest SF, whereas the best net, 11248, is known to be about 100 Elo below (speed ratio 1:1000). Also, with your slow GPU or a very short time control, you are mostly testing the strength of the policy head, because the value net (MCTS) doesn't get a good chance to correct the mistakes made by the policy head.
https://ibb.co/eq90FV
20M nodes per move is close to TCEC performance for SFdev???
Didn't know TCEC used TC of 10''+0.1''. :lol: :lol: :lol:

Among the testers, his hardware setup is the most similar to those of TCEC/CCCC. I think the average speeds of Lc0 and SF in the last CCCC were around 40 knps vs 80 Mnps (1:2000), which would add another 50 Elo to the gap between Lc0 and SF.