TCEC Question

Dann Corbit · Post by **Dann Corbit** » Mon Jun 29, 2020 5:30 am

Cornfed wrote: ↑Mon Jun 29, 2020 2:28 am
Dann Corbit wrote: ↑Sun Jun 28, 2020 4:16 am In less than 1000 games practically any outcome is possible amongst approximate equals.
I guess that they are very close to equal, but SF had some fortunate outcomes.
And if SF is stronger, it is not by an enormous margin, as evidenced by the draw count.
I think the proverbial 'sample size' answer just kind of begs the question.

What does "fortunate" mean? Did LZ0 stay out late partying the night before?

If I flip a fair coin 100 times, 50 heads and 50 tails as the actual outcome is not likely[*]. The possible outcomes form an (approximately) Gaussian curve, and a swath 1 SD wide (5 flips either side of 50) holds lots of different possibilities.

After 71 games SF leads 37.5 vs 33.5 (and game 72 looks like it will end in the Fish's favor as well...). A 4 to 5 pt lead at this point is actually reasonably significant. That said, there are more games to be played and Game 72 has LCZero defending the Latvian Counter Gambit...which is bad. Has SF yet to defend it? I don't know. The Devil is in the details.

EDIT: It has, the game before and SF lost...just as LCZero looks to at the moment.

[*] OK, it is the most likely SINGLE outcome, but its probability is very close to that of 49/51 and 51/49 and 48/52 and 52/48, etc., with the probability tailing off gradually.
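The footnote can be checked directly from the binomial distribution. The following is only a minimal sketch, assuming a perfectly fair coin and independent flips; the helper name `prob_heads` is made up for illustration.

```python
from math import comb


def prob_heads(k: int, n: int = 100) -> float:
    """Probability of exactly k heads in n flips of a fair coin."""
    return comb(n, k) / 2 ** n


# Compare the single most likely outcome with its neighbours.
for k in (50, 49, 48, 45, 40):
    print(f"{k}/{100 - k}: {prob_heads(k):.4f}")

# Total probability of landing within 5 flips (1 SD) of an even split.
within_one_sd = sum(prob_heads(k) for k in range(45, 56))
print(f"45..55 heads: {within_one_sd:.4f}")
```

An exact 50/50 split is the most likely single result, yet it turns up in well under one run in ten, and the neighbouring splits are almost as likely.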

To convince yourself, get a PRNG that generates random numbers between zero and one, draw one hundred numbers per trial for 1000 trials, and record the different outcomes (counts of numbers above and below one half) that actually occur. You will see some 50/50 outcomes, but you will also see some that are off a bit and a few that are way off. Remember that the "opponents" here, "above a half" and "below a half", have exactly the same strength.
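One possible way to run that experiment is sketched here, assuming Python's standard `random` module as the PRNG. The function name `run_trials` and the seed value are arbitrary choices for illustration, not anything from the thread.

```python
import random
from collections import Counter


def run_trials(trials: int = 1000, draws: int = 100, seed: int = 2020) -> Counter:
    """Tally how many of `draws` uniform numbers land above one half, per trial."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter()
    for _ in range(trials):
        above = sum(1 for _ in range(draws) if rng.random() > 0.5)
        counts[above] += 1
    return counts


counts = run_trials()
for above in sorted(counts):
    print(f"{above:3d} above / {100 - above:3d} below: {counts[above]} trials")
```

Changing the seed changes the individual tallies, but the overall shape (a hump around 50 with a few stragglers far from it) should stay the same.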

For another really funny outcome, see how many numbers are exactly one half with your generator. If it is an 8-byte floating point number and the values are uniformly distributed, I would guess that zero results of exactly one half will show up among all 100,000 emitted values. Of course, I would insist on testing for equality using == rather than the more usual tolerance-based comparison, because 1/2 is a special number that can be represented exactly, and that is the odd outcome I refer to.
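A quick sketch of that check, assuming Python's `random.random()`, which returns multiples of 2^-53 in [0, 1), so any one draw equals 0.5 with probability of roughly 1.1e-16. The function name `count_exact_halves` is made up for illustration.

```python
import random


def count_exact_halves(n: int = 100_000, seed: int = 2020) -> int:
    """Count draws that compare exactly equal (==) to one half."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n) if rng.random() == 0.5)


# 0.5 is exactly representable in binary floating point, so == is meaningful here.
print((0.5).as_integer_ratio())
print(count_exact_halves())
```

Other generators may use fewer random bits per value, in which case an exact one half becomes more likely than this estimate suggests.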

Dann Corbit · Post by **Dann Corbit** » Mon Jun 29, 2020 5:37 am

An equivalent experiment, perhaps closer to the mark, would be to play SF against itself 100 times for 1000 trials (if you play game in 1 second it would take less than two days since there are 86,400 seconds per day). (Edit: Umm, close to two and a third days, because game in one second usually gives one second to EACH engine, so we would need 200,000 seconds.)
200000/86400=2.3148 (148 repeats)
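The arithmetic is easy to re-run for other time controls. This is just a sketch that ignores engine startup and GUI overhead; `days_needed` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def days_needed(games: int, seconds_per_game: float) -> float:
    """Wall-clock days to play `games` games back to back on one machine."""
    return games * seconds_per_game / SECONDS_PER_DAY


total_games = 100 * 1000  # 1000 trials of 100 games each

print(f"1 s per game:          {days_needed(total_games, 1):.4f} days")
print(f"1 s per engine (2 s):  {days_needed(total_games, 2):.4f} days")
```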

I guess you would see a rare few landslide victories, though most 100-game runs would be somewhere close to 50%.

bob · Post by **bob** » Mon Jun 29, 2020 5:46 am

One more point. More games == smaller standard deviation of the average score (it shrinks roughly as 1/sqrt(N)). Which means the samples will be closer to the middle value. As the number of games goes to infinity, you converge on the midpoint. 1000 games still has a pretty high standard deviation. 100 is significantly larger. 1 game? Pretty much a random outcome no matter what.
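To put rough numbers on that, here is a small sketch of how the standard deviation of the average score shrinks with the number of games. It assumes independent games and a per-game standard deviation of 0.5 (the win/loss-only case with equal opponents); draws would make the per-game figure smaller. `score_sd` is a hypothetical helper name.

```python
from math import sqrt


def score_sd(n_games: int, per_game_sd: float = 0.5) -> float:
    """Standard deviation of the average score per game over n_games independent games."""
    return per_game_sd / sqrt(n_games)


for n in (1, 100, 1000, 10_000):
    print(f"{n:6d} games: SD of score fraction = {score_sd(n):.4f}")
```

Under these assumptions, ten times as many games only shrinks the spread by a factor of about 3.2.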

Cornfed · Post by **Cornfed** » Mon Jun 29, 2020 6:14 am

Dann Corbit wrote: ↑Mon Jun 29, 2020 5:30 am

I do have a reasonable understanding of statistical probability...but keep in mind that these engines are playing the same openings from the same starting points with both Black and White, so the variables really are, in a sense, largely knowable. Sure, 1000 games are better than 100 and 10,000 games are better than 1,000...if the starting points are from different positions. If you run the same (let's say 100, as in the SuperFinal) positions over and over and over again in successive tests, odds are that the winner of the first test would be the winner of the second and third...

One would not expect the margin to be very large. The question from Leo was: "Why is SF doing so well against LCO in the latest TCEC?" That IS a fact some 73 games in. Without doing an in-depth analysis, we would, to a degree, be guessing. Some guesses are better than others. My 'guess' is that SF just sees a bit further and evaluates a bit better WHERE IT COUNTS, and I would think others' answers would revolve around that to some degree.

Leo · Post by **Leo** » Mon Jun 29, 2020 4:22 pm

ernest wrote: ↑Mon Jun 29, 2020 2:36 am
Leo wrote: ↑Mon Jun 29, 2020 2:24 am I haven't heard anyone complaining about the fairness for a long time.
Not complaining, just asking !

(looking at the Knodes/sec)

I wasn't saying you were complaining.

Leo · Post by **Leo** » Mon Jun 29, 2020 4:39 pm

Cornfed wrote: ↑Mon Jun 29, 2020 6:14 am

SF lost the last superfinal and now it's winning this one. I wonder what has changed? I was ready to give up on SF defeating LCO in a 100 game match.

Cornfed · Post by **Cornfed** » Wed Jul 01, 2020 12:40 am

I've not seen anyone opine really.
But with a 6 pt lead and 15 to go, it is a foregone conclusion that The Fish have pulled LC0 down to some murky depths from which it will not surface.

It simply does what it needs to do in game play better than LC0...whatever that is.

dkappe · Post by **dkappe** » Wed Jul 01, 2020 12:52 am

Just to throw some more fuel on the fire, the GPU server was rebooted after 26 games because admins thought there might be something amiss. Before reboot, SF +4. After reboot: SF +1. Who knows.

Dann Corbit · Post by **Dann Corbit** » Wed Jul 01, 2020 12:58 am

Try this experiment:
Take stockfish.exe and copy it to purple.exe
Take stockfish.exe and copy it to gold.exe
Run one hundred games between purple and gold and one will turn out to be stronger than the other.
You can run games at game in one second if you like, so you can run the experiment in a minute and 40 seconds.
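For anyone without an engine tournament setup handy, the same effect can be simulated. This sketch does not run real engines. It assumes two identical players, independent games, and a draw rate of 70%, which is a made-up figure for illustration, and the function names are hypothetical.

```python
import random


def play_match(rng: random.Random, games: int = 100, draw_rate: float = 0.7) -> float:
    """Return purple's score in one match between two identical players."""
    purple = 0.0
    for _ in range(games):
        r = rng.random()
        if r < draw_rate:
            purple += 0.5  # draw
        elif r < draw_rate + (1 - draw_rate) / 2:
            purple += 1.0  # purple wins; otherwise gold wins and purple scores 0
    return purple


def fraction_with_a_winner(matches: int = 1000, seed: int = 2020) -> float:
    """Fraction of 100-game matches that do not end exactly 50-50."""
    rng = random.Random(seed)
    decided = sum(1 for _ in range(matches) if play_match(rng) != 50.0)
    return decided / matches


print(f"Matches with a 'stronger' side: {fraction_with_a_winner():.1%}")
```

Even with a high draw rate, most simulated matches between identical players finish with one side ahead.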

MMarco · Post by **MMarco** » Wed Jul 01, 2020 2:26 am

Leo wrote: ↑Sun Jun 28, 2020 1:54 am Why is SF doing so well against LCO in the latest TCEC?

Maybe simply because the Lc0 net isn't the best around. It seems to be 75 Elo lower than other big nets. Have a look here: http://talkchess.com/forum3/viewtopic.p ... 29#p849247

It is ranked 30th.
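To get a feel for what an Elo gap means in score terms, here is a sketch using the standard logistic Elo formula. This is not necessarily how the linked rating list computes its figures, and the function names are made up for illustration.

```python
from math import log10


def expected_score(elo_diff: float) -> float:
    """Expected score per game for the side that is elo_diff points stronger."""
    return 1 / (1 + 10 ** (-elo_diff / 400))


def elo_from_score(score: float) -> float:
    """Elo difference implied by an average score strictly between 0 and 1."""
    return 400 * log10(score / (1 - score))


print(f"75 Elo ahead -> expected score {expected_score(75):.3f}")
print(f"37.5/71 score -> about {elo_from_score(37.5 / 71):.1f} Elo")
```

The second line applies the same formula to the 37.5 vs 33.5 score quoted earlier in the thread, ignoring the statistical error on so few games.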
