TCEC Question


Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: TCEC Question

Post by Dann Corbit »

Cornfed wrote: Mon Jun 29, 2020 2:28 am
Dann Corbit wrote: Sun Jun 28, 2020 4:16 am In less than 1000 games practically any outcome is possible amongst approximate equals.
I guess that they are very close to equal, but SF had some fortunate outcomes.
And if SF is stronger, it is not by an enormous margin, as evidenced by the draw count.
I think the proverbial 'sample size' answer just kind of begs the question.

What does "fortunate" mean? Did LZ0 stay out late partying the night before?
If I flip a fair coin 100 times, 50 heads and 50 tails as the actual outcome is not likely[*]. The possible outcomes follow a binomial distribution, which is close to a Gaussian curve, and a 1 SD wide swath holds lots of different possibilities.
After 71 games SF leads 37.5 vs 33.5 (and game 72 looks like it will end in the Fish's favor as well...). A 4 to 5 pt lead at this point is actually reasonably significant. That said, there are more games to be played and Game 72 has LCZero defending the Latvian Counter Gambit...which is bad. Has SF yet to defend it? I don't know. The Devil is in the details.

EDIT: It has, the game before and SF lost...just as LCZero looks to at the moment.
[*] OK, it is the most likely SINGLE outcome, but its probability is very close to that of 49/51 and 51/49 and 48/52 and 52/48, etc., with the probability tailing off gradually.
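To put numbers on that footnote, here is a minimal sketch that computes the exact binomial probabilities for 100 flips of a fair coin. The function name `prob_heads` is just an illustrative choice, and the only assumption is a perfectly fair coin with independent flips.

```python
from math import comb

def prob_heads(k: int, n: int = 100) -> float:
    """Probability of exactly k heads in n independent flips of a fair coin."""
    return comb(n, k) / 2**n

# The single most likely split and a few of its neighbours.
for k in (50, 49, 48, 45, 40):
    print(f"{k}/{100 - k}: {prob_heads(k):.4f}")

# Chance of landing anywhere in the 45..55 band.
within_5 = sum(prob_heads(k) for k in range(45, 56))
print(f"P(45 <= heads <= 55) = {within_5:.4f}")
```

The exact 50/50 split comes out at roughly 8%, and 49/51 is only a hair less likely, which is the "tailing off gradually" described above.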

To convince yourself, get a PRNG that generates random numbers between zero and one, draw one hundred numbers per cycle for 1000 cycles, and record the different outcomes (numbers above and below one half) that actually occur. You will see some 50/50 outcomes, but you will also see some that are off a bit and a few that are way off. Remember that the "opponents" of "above a half" and "below a half" have exactly the same strength.

For another really funny outcome, see how many numbers are exactly one half with your generator. If it is an 8 byte floating point number and the values are uniformly distributed, I would guess that zero results of exactly one half will show up in all 100,000 emitted elements. Of course, I would insist on testing for equality using == rather than the more usual within-a-tolerance comparison, because 1/2 is a special number that can be represented exactly, and that is the odd outcome I refer to.
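For anyone who wants to try it, the experiment could be sketched like this in Python. This is only one way to do it: the function name `run_cycles` and the fixed seed are my own arbitrary choices, and it uses Python's standard `random` module as the PRNG.

```python
import random
from collections import Counter

def run_cycles(cycles=1000, draws=100, seed=2020):
    """Return (Counter of 'above one half' counts per cycle, number of exact 0.5 values)."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so runs are repeatable
    above_counts = Counter()
    exact_halves = 0
    for _ in range(cycles):
        above = 0
        for _ in range(draws):
            x = rng.random()   # uniform float in [0.0, 1.0)
            if x == 0.5:       # deliberate exact comparison, no tolerance
                exact_halves += 1
            elif x > 0.5:
                above += 1
        above_counts[above] += 1
    return above_counts, exact_halves

above_counts, exact_halves = run_cycles()
for above in sorted(above_counts):
    print(f"{above:3d} above / {100 - above:3d} not above: {above_counts[above]} cycles")
print("values exactly equal to 0.5:", exact_halves)
```

Change the seed and the table shifts around, but there should always be a spread of results on both sides of 50/50.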
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: TCEC Question

Post by Dann Corbit »

An equivalent experiment, perhaps closer to the mark, would be to play SF against itself 100 times for 1000 trials (if you play game in 1 second it would take less than two days, since there are 86,400 seconds per day). (Edit: Umm, close to two and a third days, because game in one second usually gives one second to EACH engine, so we would need 200,000 seconds.)
200000/86400 = 2.3148 (the 148 repeats)

I guess you would see a rare few landslide victories, though most 100-game runs would be somewhere close to 50%.
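Short of actually burning two days of CPU time, a rough Monte Carlo stand-in might look like the sketch below. It does not run any engine: it just assumes two exactly equal opponents and a draw rate of 60%, which is a number I made up for illustration, as are the names `simulate_match` and `simulate_trials`.

```python
import random

def simulate_match(games, draw_rate, rng):
    """Score in points for one side of a match between exactly equal opponents."""
    score = 0.0
    for _ in range(games):
        r = rng.random()
        if r < draw_rate:
            score += 0.5                              # draw
        elif r < draw_rate + (1.0 - draw_rate) / 2.0:
            score += 1.0                              # win
        # otherwise a loss: nothing added
    return score

def simulate_trials(trials=1000, games=100, draw_rate=0.6, seed=1):
    """Play `trials` independent matches; draw_rate=0.6 is an assumed value."""
    rng = random.Random(seed)
    return [simulate_match(games, draw_rate, rng) for _ in range(trials)]

scores = simulate_trials()
mean_score = sum(scores) / len(scores)
landslides = sum(1 for s in scores if s >= 60 or s <= 40)
print("lowest score :", min(scores))
print("highest score:", max(scores))
print("mean score   :", round(mean_score, 2))
print("matches won or lost by 60-40 or worse:", landslides)
```

A higher assumed draw rate narrows the spread, a lower one widens it, but the mean stays near 50 either way.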
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: TCEC Question

Post by bob »

One more point. More games == smaller standard deviation of the score percentage, which means the samples will be closer to the middle value. With an infinite number of games you would, in the limit, end up exactly on the midpoint. 1000 games still has a pretty high standard deviation, and 100 a significantly larger one. 1 game? Pretty much a random outcome no matter what.
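As a rough sketch of how fast that shrinks, assume exactly equal opponents where each game is a draw with probability `draw_rate` and otherwise a coin flip. The per-game variance of the score is then (1 - draw_rate)/4, and the standard deviation of the score fraction over N games is the square root of that divided by N. The function name and the default of no draws are illustrative assumptions.

```python
from math import sqrt

def score_std_error(games: int, draw_rate: float = 0.0) -> float:
    """Standard deviation of the score fraction for exactly equal opponents."""
    per_game_variance = (1.0 - draw_rate) / 4.0   # outcomes 0, 0.5, 1 around a mean of 0.5
    return sqrt(per_game_variance / games)

for n in (1, 100, 1000, 10000):
    print(f"{n:6d} games: +/- {100 * score_std_error(n):.2f}% (1 SD, no draws)")
```

Ten times the games only buys you about a threefold reduction, since the spread goes as one over the square root of N.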
Cornfed
Posts: 511
Joined: Sun Apr 26, 2020 11:40 pm
Full name: Brian D. Smith

Re: TCEC Question

Post by Cornfed »

I do have a reasonable understanding of statistical probability...but keep in mind that these engines are playing the same openings from the same points with both Black and White, so the variables really are, in a sense, largely knowable. Sure, 1000 games are better than 100 and 10,000 games are better than 1,000...if the starting points are from different positions. If you run the same (let's say 100, as in the SuperFinal) positions over and over and over again in successive tests, odds are that the winner of the first test would be the winner of the second and third...

One would not expect the margin to be very large. The question from Leo was: "Why is SF doing so well against LCO in the latest TCEC?" That IS a fact some 73 games in. Without doing an in-depth analysis we would, to a degree, be guessing. Some guesses are better than others. My 'guess' is that SF just sees a bit further and evaluates a bit better WHERE IT COUNTS, and I would think others' answers would revolve around that to some degree.
Leo
Posts: 1080
Joined: Fri Sep 16, 2016 6:55 pm
Location: USA/Minnesota
Full name: Leo Anger

Re: TCEC Question

Post by Leo »

ernest wrote: Mon Jun 29, 2020 2:36 am
Leo wrote: Mon Jun 29, 2020 2:24 am I haven't heard anyone complaining about the fairness for a long time.
Not complaining, just asking ! 8-)

(looking at the Knodes/sec)
I wasn't saying you were complaining.
Advanced Micro Devices fan.
Leo
Posts: 1080
Joined: Fri Sep 16, 2016 6:55 pm
Location: USA/Minnesota
Full name: Leo Anger

Re: TCEC Question

Post by Leo »

SF lost the last superfinal and now it's winning this one. I wonder what has changed? I was ready to give up on SF defeating LCO in a 100 game match.
Cornfed
Posts: 511
Joined: Sun Apr 26, 2020 11:40 pm
Full name: Brian D. Smith

Re: TCEC Question

Post by Cornfed »

I've not seen anyone opine really.
But with a 6 pt lead and 15 to go, it is a foregone conclusion that The Fish have pulled LC0 down to some murky depths from which it will not surface.

It simply does what it needs to do in game play better than LC0...whatever that is.
dkappe
Posts: 1631
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: TCEC Question

Post by dkappe »

Just to throw some more fuel on the fire, the GPU server was rebooted after 26 games because admins thought there might be something amiss. Before reboot, SF +4. After reboot: SF +1. Who knows.
Fat Titz by Stockfish, the engine with the bodaciously big net. Remember: size matters. If you want to learn more about this engine just google for "Fat Titz".
Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: TCEC Question

Post by Dann Corbit »

Try this experiment:
Take stockfish.exe and copy it to purple.exe
Take stockfish.exe and copy it to gold.exe
Run one hundred games between purple and gold and one will very likely come out ahead of the other, even though they are the same engine.
You can run games at game in one second if you like, so the experiment takes only a few minutes (100 games at up to two seconds each, since each engine gets its own second).
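If you would rather see the odds than run the games, here is a sketch that computes the exact distribution of the final lead (wins minus losses, which equals the lead in points) for two identical engines. The 60% draw rate is a made-up assumption, not a measured figure for Stockfish, and `lead_distribution` is a hypothetical helper name.

```python
def lead_distribution(games: int, draw_rate: float) -> dict:
    """Exact distribution of (wins - losses) for one side between identical engines."""
    p_decisive = (1.0 - draw_rate) / 2.0   # chance of a win, and equally of a loss
    dist = {0: 1.0}                        # before any game the lead is zero
    for _ in range(games):
        nxt = {}
        for diff, p in dist.items():
            for step, q in ((1, p_decisive), (0, draw_rate), (-1, p_decisive)):
                nxt[diff + step] = nxt.get(diff + step, 0.0) + p * q
        dist = nxt
    return dist

dist = lead_distribution(games=100, draw_rate=0.6)   # 0.6 is an assumed draw rate
p_tied = dist[0]
p_lead_4_plus = sum(p for d, p in dist.items() if abs(d) >= 4)
print(f"P(match exactly tied)           = {p_tied:.3f}")
print(f"P(someone leads by 4+ points)   = {p_lead_4_plus:.3f}")
```

Under that assumption a dead-even final score is the exception, so purple or gold will usually look "stronger" purely by chance.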
MMarco
Posts: 195
Joined: Sun Apr 12, 2020 1:09 am
Full name: Marc-O Moisan-Plante

Re: TCEC Question

Post by MMarco »

Leo wrote: Sun Jun 28, 2020 1:54 am Why is SF doing so well against LCO in the latest TCEC?
Maybe simply because the Lc0 net isn't the best around. It seems to be 75 Elo lower than other big nets. Have a look here: http://talkchess.com/forum3/viewtopic.p ... 29#p849247

It is ranked 30th.