Repeating games with switched colors reduces Elo error. All matches should be done like this

Ovyron
Posts: 4422
Joined: Tue Jul 03, 2007 2:30 am

Re: Repeating games with switched colors reduces Elo error. All matches should be done like this

Post by Ovyron » Wed Feb 26, 2020 3:34 am

jp wrote:
Wed Feb 26, 2020 3:08 am
He does not have bad performances in Chess960. He had one day (or 1 1/2) of bad performance in Chess960.
Yeah, I said "imagine he was only as good as his rating indicates for the openings he wants to play", meaning you have to imagine a world different from ours, where that is true, for my arguments to apply.

Like saying "imagine 1.Nf3 was lost with perfect play" wouldn't mean that it is, just that to carry on the discussion we have to get into the hypothetical.

Uri Blass
Posts: 9108
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: Repeating games with switched colors reduces Elo error. All matches should be done like this

Post by Uri Blass » Wed Feb 26, 2020 6:36 pm

<snipped>
Ovyron wrote:
Wed Feb 26, 2020 2:22 am
Alayan wrote:
Wed Feb 26, 2020 12:12 am
DragonMist, former ICCF world champion, told me that at this point he considers top-level CC dead because it's becoming near-impossible to get wins against strong well-prepared opponents.
I'm not convinced. I think that if the prize were 1000000 dollars we'd see top-level CC alive and kicking, with some amazing chess we have yet to witness, and that time travelers with software and hardware from 2025 would destroy today's top-level CC. So those winning strings exist, but nobody's life has depended on finding them, so players would rather play an easy game they can draw than get into a complex position with 5% drawing chances that they could lose, but win as well.
I think that with big prize money, two players are going to agree that when they play each other, one of them loses on purpose so that the other gets a win.

The test should be beating engines when the engines use no opening book at a long time control (let's say 24 hours per move).
I do not expect to be able to beat Stockfish under these conditions, and I am not sure it is possible in most cases (remember that Stockfish is not deterministic with many cores, so even if you win one game you cannot find one line that proves it is always possible to win with the same color).

Ovyron
Posts: 4422
Joined: Tue Jul 03, 2007 2:30 am

Re: Repeating games with switched colors reduces Elo error. All matches should be done like this

Post by Ovyron » Wed Feb 26, 2020 7:00 pm

Uri Blass wrote:
Wed Feb 26, 2020 6:36 pm
I think that with big prize money, two players are going to agree that when they play each other, one of them loses on purpose so that the other gets a win.
Maybe they're given the money on the condition that it will be tracked, and if it's found to have been shared with other participants it gets taken away. Or they aren't given money directly, but the organizers agree to buy them whatever they want up to $1000000, while monitoring whether they're buying things for an opponent that tanked games. Or something. I don't know, but the spirit of the incentive is there.

The idea is that people would try harder because of the incentive and would manage to beat people from today that seem unbeatable, without requiring collusion. Who knows, if this had been held in the days before AlphaZero was announced, Google might have participated and won the bounty (today that wouldn't work, because you can use Leela Zero to avoid those losses), or some secret chess technology that exists today might make an appearance in those games to win them (like a chip specially designed for chess that plays like Stockfish 15).

jp
Posts: 1450
Joined: Mon Apr 23, 2018 5:54 am

Re: Repeating games with switched colors reduces Elo error. All matches should be done like this

Post by jp » Fri Feb 28, 2020 3:41 am

Ovyron wrote:
Wed Feb 26, 2020 7:00 pm
Who knows, if this had been held in the days before AlphaZero was announced, Google might have participated and won the bounty
Only if they had bribed everyone else to make them not use any opening books.
When AZ was made to play SF8+cerebellum, it was >90% draws.

jp
Posts: 1450
Joined: Mon Apr 23, 2018 5:54 am

Re: Repeating games with switched colors reduces Elo error. All matches should be done like this

Post by jp » Fri Feb 28, 2020 3:47 am

mmt wrote:
Tue Feb 25, 2020 12:42 pm
Then I used the bootstrap method of Elo error estimation. Playing with switched colors reduces Elo error. Tools like Ordo do not take the switched-colors games into account and, as a result, their error is too large. Instead of taking a match with 100 games (50 pairs) and picking individual games, they should treat this match as 50 paired results, each being 0, 0.5, 1, 1.5, or 2.

I can run this test for many matches if somebody wants to see the hard numbers.
Can you first just give us the numbers for the tests you've already run?

Ovyron
Posts: 4422
Joined: Tue Jul 03, 2007 2:30 am

Re: Repeating games with switched colors reduces Elo error. All matches should be done like this

Post by Ovyron » Fri Feb 28, 2020 5:19 am

jp wrote:
Fri Feb 28, 2020 3:41 am
Only if they had bribed everyone else to make them not use any opening books.
When AZ was made to play SF8+cerebellum, it was >90% draws.
I missed this, when did AlphaZero play SF8+cerebellum?

mmt
Posts: 343
Joined: Sun Aug 25, 2019 6:33 am
Full name: .

Re: Repeating games with switched colors reduces Elo error. All matches should be done like this

Post by mmt » Fri Feb 28, 2020 9:16 am

jp wrote:
Fri Feb 28, 2020 3:47 am
Can you first just give us the numbers for the tests you've already run?
Columns: number of games in the match, Elo error from the regular bootstrap method, Elo error from the pentanomial bootstrap method (the one that takes switched sides into account)
500, 37.2, 24.3
20000, 4.9, 3.4
800, 21.6, 12.7
500, 36.1, 24.5
2000, 15.4, 9.8
990, 15.5, 10.4
990, 13.0, 8.1
998, 10.0, 5.3
986, 13.0, 7.8
150, 54.8, 33.6
500, 36.5, 22.9
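
For anyone who wants to try the comparison themselves, here is a minimal sketch of the two bootstraps. It is not the utility that produced the numbers above; the function names, the synthetic opening mix, and the choice to report the error as the standard deviation of the resampled Elo values are all assumptions made for illustration. The only difference between the two runs is the resampling unit: single games in one, switched-color pairs (totals of 0, 0.5, 1, 1.5, or 2) in the other.

```python
import math
import random


def elo_from_score(score):
    """Convert a mean score in (0, 1) to an Elo difference (logistic model)."""
    score = min(max(score, 1e-6), 1 - 1e-6)  # clamp to avoid division by zero / log(0)
    return -400.0 * math.log10(1.0 / score - 1.0)


def bootstrap_elo_error(units, games_per_unit, resamples=2000, seed=0):
    """Standard deviation of the Elo estimate over bootstrap resamples.

    units: engine A's total score per resampling unit
           (one game, or one pair of games with switched colors).
    games_per_unit: 1 for single games, 2 for pairs.
    """
    rng = random.Random(seed)
    n = len(units)
    elos = []
    for _ in range(resamples):
        sample = rng.choices(units, k=n)  # resample with replacement
        mean_score = sum(sample) / (n * games_per_unit)
        elos.append(elo_from_score(mean_score))
    mean_elo = sum(elos) / len(elos)
    variance = sum((e - mean_elo) ** 2 for e in elos) / (len(elos) - 1)
    return math.sqrt(variance)


def simulate_pairs(num_pairs, seed=1):
    """Synthetic match between equal engines (hypothetical opening mix)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_pairs):
        # Assumption: half the openings favor White strongly, half are balanced.
        if rng.random() < 0.5:
            weights = (0.80, 0.15, 0.05)  # White win, draw, Black win
        else:
            weights = (0.20, 0.60, 0.20)
        white_1, white_2 = rng.choices((1.0, 0.5, 0.0), weights=weights, k=2)
        # Engine A is White in game 1 and Black in game 2 of the same opening.
        pairs.append((white_1, 1.0 - white_2))
    return pairs


pairs = simulate_pairs(500)
games = [score for pair in pairs for score in pair]  # 1000 individual games
pair_totals = [a + b for a, b in pairs]              # 500 pentanomial results

err_games = bootstrap_elo_error(games, games_per_unit=1)
err_pairs = bootstrap_elo_error(pair_totals, games_per_unit=2)
print(f"per-game bootstrap Elo error: {err_games:.1f}")
print(f"per-pair bootstrap Elo error: {err_pairs:.1f}")
```

On synthetic data like this the per-pair error should come out smaller, because the pair total cancels most of the opening's color bias. How large the gap is on real matches depends on how unbalanced the book is.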

jp
Posts: 1450
Joined: Mon Apr 23, 2018 5:54 am

Re: Repeating games with switched colors reduces Elo error. All matches should be done like this

Post by jp » Mon Mar 02, 2020 7:58 am

Ovyron wrote:
Fri Feb 28, 2020 5:19 am
I missed this, when did AlphaZero play SF8+cerebellum?
When the paper reviewers forced the DM authors to do it.
The results were >90% draws, <5% wins.

Obviously whatever was left was losses, but DM did not give the exact numbers.

corres
Posts: 3657
Joined: Wed Nov 18, 2015 10:41 am
Location: hungary

Re: Repeating games with switched colors reduces Elo error. All matches should be done like this

Post by corres » Mon Mar 02, 2020 11:00 am

Ovyron wrote:
Tue Feb 25, 2020 7:05 pm
...
All books are meant to do is add variety, not contaminate with bad lines. That's why the only good way is having specific variety books for each engine, which take care not to have them play out-of-book positions they don't like, and this will not require switched colors (because each engine has different preferences).
In this case you are evaluating the engine together with its special book.
In general, makers of rating lists use their own books and prohibit the use of engine-specific books.
I agree with this method, because it gives a real picture of the strength of the engines.

corres
Posts: 3657
Joined: Wed Nov 18, 2015 10:41 am
Location: hungary

Re: Repeating games with switched colors reduces Elo error. All matches should be done like this

Post by corres » Mon Mar 02, 2020 11:05 am

mmt wrote:
Tue Feb 25, 2020 12:42 pm
This only applies to tests with the same opening book for both sides. It makes intuitive sense that the results will be more accurate if player A and player B play both sides of all openings. But I couldn't find any empirical results so I wrote a utility to test it out myself.

First, I compared predictions that can be made after the first n games (multiple runs ordered randomly for higher accuracy) about the rest of the match. The results of matches with switched colors give more accurate predictions about the rest of the match.

Then I used the bootstrap method of Elo error estimation. Playing with switched colors reduces Elo error. Tools like Ordo do not take the switched-colors games into account and, as a result, their error is too large. Instead of taking a match with 100 games (50 pairs) and picking individual games, they should treat this match as 50 paired results, each being 0, 0.5, 1, 1.5, or 2.

I can run this test for many matches if somebody wants to see the hard numbers. But it's clear that matches not using the switched color system are unnecessarily wasting CPU/GPU time by having to run more games to get the same accuracy as the matches with switched colors.
Many thanks for your work. The effect is commonly known, but only a few works have tried to prove it.
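
A rough way to see where the effect comes from, as a sketch rather than a proof: assume the n pairs are independent of each other and identically distributed, and write x_{k,1}, x_{k,2} for one engine's scores in the two games of pair k, with variances \sigma_1^2 and \sigma_2^2 (the symbols are introduced here for illustration only).

```latex
\hat{s} = \frac{1}{2n}\sum_{k=1}^{n}\left(x_{k,1} + x_{k,2}\right),
\qquad
\operatorname{Var}(\hat{s}) = \frac{1}{4n}\left(\sigma_1^2 + \sigma_2^2
  + 2\operatorname{Cov}(x_{k,1}, x_{k,2})\right)
```

When an opening favors one color, the same engine tends to score well with that color and badly with the other, so the covariance is negative. An error estimate that treats the games as independent drops that term and therefore comes out larger than the error of the paired match.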
