Repeating games with switched colors reduces Elo error. All matches should be done like this


mmt
Posts: 343
Joined: Sun Aug 25, 2019 8:33 am
Full name: .

Post by mmt »

This only applies to tests that use the same opening book for both sides. It makes intuitive sense that the results will be more accurate if player A and player B play both sides of every opening, but I couldn't find any empirical results, so I wrote a utility to test it myself.

First, I compared the predictions that can be made about the rest of a match after its first n games (multiple runs with the games ordered randomly, for higher accuracy). Matches played with switched colors give more accurate predictions about the rest of the match.

Then I used the bootstrap method of Elo error estimation. Playing with switched colors reduces the Elo error. Tools like Ordo do not take the switched-color pairing into account and, as a result, the error they report is too large. Instead of taking a match of 100 games (50 pairs) and picking individual games, they should treat the match as 50 paired results, each worth 0, 0.5, 1, 1.5, or 2 points.

I can run this test on many more matches if somebody wants to see hard numbers. But it's clear that matches that don't use switched colors waste CPU/GPU time, because they need more games to reach the same accuracy as matches that do.
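
For anyone who wants to try the comparison, here is a minimal Python sketch of the bootstrap idea described above. It is not the utility mentioned in the post. The match data is made up for illustration, the function names are hypothetical, and the Elo conversion assumes the usual logistic formula. It resamples the same match two ways: as 100 individual games and as 50 color-switched pairs.

```python
import math
import random

def elo_from_score(score):
    """Convert an average score in (0, 1) to an Elo difference (logistic model)."""
    score = min(max(score, 1e-6), 1 - 1e-6)  # keep the logarithm finite
    return -400.0 * math.log10(1.0 / score - 1.0)

def bootstrap_elo_std(samples, points_per_sample, n_resamples=2000, seed=0):
    """Standard deviation of the Elo estimate over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(samples)
    elos = []
    for _ in range(n_resamples):
        resample = rng.choices(samples, k=n)  # draw n samples with replacement
        score = sum(resample) / (n * points_per_sample)
        elos.append(elo_from_score(score))
    mean = sum(elos) / len(elos)
    return math.sqrt(sum((e - mean) ** 2 for e in elos) / (len(elos) - 1))

# Made-up match (an assumption, not real data): 50 openings, each played twice.
# Each tuple is (engine A's score with White, engine A's score with Black).
pairs = [(1.0, 0.0)] * 20 + [(0.5, 0.5)] * 20 + [(1.0, 0.5)] * 10

games = [g for pair in pairs for g in pair]  # 100 individual game results
pair_scores = [w + b for w, b in pairs]      # 50 results in {0, 0.5, 1, 1.5, 2}

err_games = bootstrap_elo_std(games, points_per_sample=1)
err_pairs = bootstrap_elo_std(pair_scores, points_per_sample=2)

print(f"Elo error, resampling single games: {err_games:.1f}")
print(f"Elo error, resampling game pairs:   {err_pairs:.1f}")
```

In this invented data many openings favor White, so the two games of a pair tend to offset each other and the pair-level error comes out smaller. With balanced openings and nearly independent games the two estimates should be much closer.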
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: Repeating games with switched colors reduces Elo error. All matches should be done like this

Post by Ovyron »

That's strange. If engine A plays the Sicilian really well and engine B plays it really badly, you would expect engine B to end up with a higher Elo if it never played the Sicilian at all than if it were forced to play it with switched colors. If engine B eventually reaches the same Elo, and gets there faster by playing it, that's mysterious, because you would expect engine B's Elo to be higher if it never had to play book positions that make it under-perform.
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Repeating games with switched colors reduces Elo error. All matches should be done like this

Post by mwyoung »

mmt wrote: Tue Feb 25, 2020 1:42 pm This only applies to tests with the same opening book for both sides. It makes intuitive sense that the results will be more accurate if player A and player B play both sides of all openings. But I couldn't find any empirical results so I wrote a utility to test it out myself.

First, I've compared predictions that can be made after the first n games (multiple runs ordered randomly for higher accuracy) about the rest of the match. The results of matches with switched colors give more accurate predictions about the rest of the match.

Then I've used the bootstrap method of Elo error estimation. Playing with switched colors reduces Elo error. The tools like Ordo do not take the switched colors games into account and as a result, their error is too large. Instead of taking a match with 100 games (50 pairs) and picking individual games they should treat this match as 50 games, each having a result of 0, 0.5, 1, 1.5, or 2.

I can run this test for many matches if somebody wants to see the hard numbers. But it's clear that matches not using the switched color system are unnecessarily wasting CPU/GPU time by having to run more games to get the same accuracy as the matches with switched colors.
I find it better to use repeated openings with color switching. I do not use a set of positions but a good opening book, though it is still a book of games. And today's engines do, from time to time, find errors in it that lose straight out of the opening. Playing reverse colors cancels out that opening to a draw result.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
Deberger
Posts: 91
Joined: Sat Nov 02, 2019 6:42 pm
Full name: ɹǝƃɹǝqǝᗡ ǝɔnɹꓭ

Re: Repeating games with switched colors reduces Elo error. All matches should be done like this

Post by Deberger »

Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: Repeating games with switched colors reduces Elo error. All matches should be done like this

Post by Ovyron »

mwyoung wrote: Tue Feb 25, 2020 7:57 pm Playing reverse colors cancels out that opening to a draw result.
It's not cancelled; it's in there, contaminating the results you'd have gotten if a good book line had been played instead. Imagine you play Stockfish v Crafty and they draw that opening even though Stockfish could have easily won with a better line.

All books are meant to do is add variety, not contaminate the results with bad lines. That's why the only good way is to have a specific variety book for each engine, one that takes care not to leave the engine in out-of-book positions it doesn't like. This does not require switched colors (because each engine has different preferences).

NOTE: This isn't specific to engines; it's why you don't have humans playing chess with switched colors.
Alayan
Posts: 550
Joined: Tue Nov 19, 2019 8:48 pm
Full name: Alayan Feh

Re: Repeating games with switched colors reduces Elo error. All matches should be done like this

Post by Alayan »

When an engine is rated, it should not be rated on a tiny subset of chess positions that it likes, which is what it gets with an opening book designed to exclude most of the lines it does worse in.

An engine should instead be tested over a wide range of positions, in order to measure its general chess ability. This gives much better insight into the engine's ability to analyze general positions. The opening books used in most rating lists favor regular opening lines over offbeat lines, but that still gives much more diversity than starting from the initial position.

Switching colors when forcing opening lines onto engines is standard procedure. This has been done at fishtest, TCEC, etc. for a long time.
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Repeating games with switched colors reduces Elo error. All matches should be done like this

Post by mwyoung »

Ovyron wrote: Tue Feb 25, 2020 8:05 pm
mwyoung wrote: Tue Feb 25, 2020 7:57 pm Playing reverse colors cancels out that opening to a draw result.
It's not cancelled, it's on there contaminating the results you'd have gotten if a good book line instead was played. Imagine you play Stockfish v Crafty and they draw that opening even though Stockfish could have easily won with a better line.

All books are meant to do is adding variety, not contaminating with bad lines, that's why the only good way is having specific variety books for each engine, that takes care about not having them play out-of-book positions they don't like, and this will not require switched colors (because each engine has different preferences).

NOTE: This isn't specific to engines, that's why you don't have humans playing chess with switched colors.
I think I said that. Not cancelled, cancels out to a draw result. 1-1 = draw.
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Repeating games with switched colors reduces Elo error. All matches should be done like this

Post by Michel »

mmt wrote: Tue Feb 25, 2020 1:42 pm This only applies to tests with the same opening book for both sides. It makes intuitive sense that the results will be more accurate if player A and player B play both sides of all openings. But I couldn't find any empirical results so I wrote a utility to test it out myself.

First, I've compared predictions that can be made after the first n games (multiple runs ordered randomly for higher accuracy) about the rest of the match. The results of matches with switched colors give more accurate predictions about the rest of the match.

Then I've used the bootstrap method of Elo error estimation. Playing with switched colors reduces Elo error. The tools like Ordo do not take the switched colors games into account and as a result, their error is too large. Instead of taking a match with 100 games (50 pairs) and picking individual games they should treat this match as 50 games, each having a result of 0, 0.5, 1, 1.5, or 2.

I can run this test for many matches if somebody wants to see the hard numbers. But it's clear that matches not using the switched color system are unnecessarily wasting CPU/GPU time by having to run more games to get the same accuracy as the matches with switched colors.
Congratulations. You reinvented the pentanomial model :) https://github.com/glinscott/fishtest/c ... def14a68f6
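
As a rough illustration of what a five-outcome (pentanomial) treatment changes, here is a small Python sketch. It is not fishtest's code. The counts are hypothetical, and the split of "pair score 1" into draw+draw versus win+loss is an assumption made only so that the per-game tally stays consistent with the per-pair tally. It computes the standard error of the match score once from single games and once from pairs.

```python
import math

def score_and_stderr(counts, values):
    """Mean per-game score and its standard error from outcome counts.

    counts[i] is how many samples had the per-game score values[i].
    """
    n = sum(counts)
    mean = sum(c * v for c, v in zip(counts, values)) / n
    var = sum(c * (v - mean) ** 2 for c, v in zip(counts, values)) / n
    return mean, math.sqrt(var / n)

# Hypothetical counts, not taken from any real match.
# Pentanomial view: pairs with a pair score of 0, 0.5, 1, 1.5, 2.
penta_counts = [1, 8, 60, 25, 6]            # 100 pairs = 200 games
penta_values = [0.0, 0.25, 0.5, 0.75, 1.0]  # pair score divided by 2

# Trinomial view of the same 200 games: losses, draws, wins.
# Assumes 40 of the 60 "score 1" pairs were draw+draw and 20 were win+loss.
tri_counts = [30, 113, 57]
tri_values = [0.0, 0.5, 1.0]

penta_mean, penta_se = score_and_stderr(penta_counts, penta_values)
tri_mean, tri_se = score_and_stderr(tri_counts, tri_values)

print(f"score {tri_mean:.4f}, standard error from single games: {tri_se:.4f}")
print(f"score {penta_mean:.4f}, standard error from game pairs:  {penta_se:.4f}")
```

Both views give the same score, and only the error estimate differs. The pair-based error is smaller here because the invented win+loss pairs offset each other. If the two games of a pair were positively correlated, the pair-based error would come out larger.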
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
mmt
Posts: 343
Joined: Sun Aug 25, 2019 8:33 am
Full name: .

Re: Repeating games with switched colors reduces Elo error. All matches should be done like this

Post by mmt »

Great!
mmt
Posts: 343
Joined: Sun Aug 25, 2019 8:33 am
Full name: .

Re: Repeating games with switched colors reduces Elo error. All matches should be done like this

Post by mmt »

Michel wrote: Tue Feb 25, 2020 10:12 pm Congratulations. You reinvented the pentanomial model :) https://github.com/glinscott/fishtest/c ... def14a68f6
I never claimed it's anything new. The point of the post was that I ran some tests to confirm that it works in practice (prompted by the disagreement in this thread: http://talkchess.com/forum3/viewtopic.p ... 27#p830327). It's great that you got it into the code, though. My next step will be to test my idea of getting additional information from draws (and, to a lesser extent, from wins and losses).