Could we use partial scoring for draws to improve evaluation of engine vs engine matches and tournaments?

mmt
Posts: 343
Joined: Sun Aug 25, 2019 8:33 am
Full name: .

Could we use partial scoring for draws to improve evaluation of engine vs engine matches and tournaments?

Post by mmt »

There have been various proposals over the years to change the rules of chess to reduce the number of draws. Correspondence players want something done https://en.chessbase.com/post/how-many- ... for-a-draw and there is a precedent in other games like Janggi (Korean chess) https://en.wikipedia.org/wiki/Janggi#Mi ... eous_rules.

I'm not proposing any rule changes, but we could look at the draws in engine vs engine games and determine who had an advantage near the end (e.g. if both engines scored the positions before the draw as >0.5 for white, then white should get 0.6-0.7 points instead of 0.5). The idea is that 90% of the time spent playing engine matches goes to draws and we get no information from them. If one side can consistently get an advantage, this should be rewarded. If chess engines optimize towards this scoring, it becomes a worthless measure. But the current engines don't, so it's possible.
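A minimal sketch of the draw adjustment described above, assuming both engines' evaluations are available in pawns from white's point of view for every position of the game. The function name, the 10-position window, the 0.5-pawn threshold and the 0.1-point bonus are all illustrative assumptions, not validated values.

```python
from statistics import mean

def partial_draw_score(white_engine_evals, black_engine_evals,
                       threshold=0.5, window=10, bonus=0.1):
    """Return (white_points, black_points) for a game that ended in a draw.

    Both lists hold evaluations in pawns from White's point of view,
    ordered from the start of the game to the final position.
    All default parameter values are hypothetical.
    """
    # Average each engine's opinion over the last `window` positions.
    w = mean(white_engine_evals[-window:])
    b = mean(black_engine_evals[-window:])
    if w > threshold and b > threshold:        # both agree White was better
        return 0.5 + bonus, 0.5 - bonus
    if w < -threshold and b < -threshold:      # both agree Black was better
        return 0.5 - bonus, 0.5 + bonus
    return 0.5, 0.5                            # no agreement: ordinary draw
```

Requiring both engines to agree is one way to avoid crediting a side purely on its own optimistic evaluation; it does not protect against engines that are tuned to report inflated scores.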
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: Could we use partial scoring for draws to improve evaluation of engine vs engine matches and tournaments?

Post by Ovyron »

mmt wrote: Wed Feb 05, 2020 2:28 am e.g. if both engines scored the positions before the draw as >0.5 for white, then white should get 0.6-0.7 points instead of 0.5
No, it's the other way around, if the position is drawn because no engine can win, then engines should be punished for misevaluating the positions. The engines that should be getting those score bonuses should be the ones that rightly show a score of 0.00 first, or show the lowest score.

An idea I tried back in 2008 was to simply award 0.45 points for draws. The idea is that engines would be encouraged to go for a win even at the risk of losing, because a 1.0 win and a 0.0 loss would give more points than 2 * 0.45 draws, but it was ditched because engines weren't aware of this when playing.
mmt
Posts: 343
Joined: Sun Aug 25, 2019 8:33 am
Full name: .

Re: Could we use partial scoring for draws to improve evaluation of engine vs engine matches and tournaments?

Post by mmt »

Ovyron wrote: Wed Feb 05, 2020 9:08 am
mmt wrote: Wed Feb 05, 2020 2:28 am e.g. if both engines scored the positions before the draw as >0.5 for white, then white should get 0.6-0.7 points instead of 0.5
No, it's the other way around, if the position is drawn because no engine can win, then engines should be punished for misevaluating the positions. The engines that should be getting those score bonuses should be the ones that rightly show a score of 0.00 first, or show the lowest score.
This doesn't make sense to me. What if both engines had a position as 1.2 for white? They agree that white has played better up to this point and the chances (let's say from self-play) are that it's 80% win/15% draw/5% loss. I would say that the draw could be counted as some points for white, even if that 15% happened. But I could be wrong and this disagreement can be solved with data instead of theory:
Take actual games between two engines with a known Elo difference and see who had the lead in the draws. Then compare the average lead in those draws to the Elo difference. Do this with multiple tournaments and you will see not just whether better engines had leads but also the correlation, which could help you figure out how to score the draws.
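The test above could be sketched roughly like this. It assumes that for each match we know the Elo difference and, for every drawn game, the stronger engine's lead near the end in pawns. The data layout and the function names are hypothetical; the Elo expectation is the standard logistic formula.

```python
import math

def elo_expected_score(elo_diff):
    """Expected score per game for the side rated elo_diff points higher."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def pearson(xs, ys):
    """Plain Pearson correlation; needs at least two distinct values per list."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def draw_lead_vs_elo(matches):
    """matches: list of (elo_diff, draw_leads), where draw_leads has one entry
    per drawn game: the stronger engine's lead near the end, in pawns.
    Returns the correlation between Elo difference and mean draw lead."""
    elo_diffs = [diff for diff, _ in matches]
    mean_leads = [sum(leads) / len(leads) for _, leads in matches]
    return pearson(elo_diffs, mean_leads)
```

A correlation close to zero across many matches would suggest that draw leads carry little information about strength; a clearly positive one would support weighting draws, and `elo_expected_score` gives the per-game score to compare any adjusted draw value against.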
Ovyron wrote: Wed Feb 05, 2020 9:08 am An idea I tried back in 2008 was to simply award 0.45 points for draws. The idea is that engines would be encouraged to go for a win even at the risk of losing, because a 1.0 win and a 0.0 loss would give more points than 2 * 0.45 draws, but it was ditched because engines weren't aware of this when playing.
https://en.wikipedia.org/wiki/Bilbao_Ch ... ters_Final A 3-1-0 soccer-like scoring system has been used in this tournament starting in 2008. It was also used in the 2nd London Classic in 2010. I think it's a superior method.
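To make the arithmetic concrete, here is a small sketch comparing the three schemes mentioned in this thread on the same pair of games: one win plus one loss versus two draws. The dictionary keys and the function name are made up for illustration.

```python
# (win, draw, loss) points under each scheme discussed in the thread.
SCORING = {
    "classical": (1.0, 0.5, 0.0),
    "draw_0.45": (1.0, 0.45, 0.0),
    "3-1-0": (3.0, 1.0, 0.0),
}

def total_points(wins, draws, losses, system):
    """Total points for a win/draw/loss record under the named scheme."""
    w, d, l = SCORING[system]
    return wins * w + draws * d + losses * l

for name in SCORING:
    risky = total_points(1, 0, 1, name)   # one win and one loss
    safe = total_points(0, 2, 0, name)    # two draws
    print(f"{name}: win+loss = {risky}, two draws = {safe}")
```

Under classical scoring the two records are worth the same, while both the 0.45-draw scheme and 3-1-0 make the win-plus-loss record worth more, the latter by a much larger margin.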
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: Could we use partial scoring for draws to improve evaluation of engine vs engine matches and tournaments?

Post by Ovyron »

mmt wrote: Wed Feb 05, 2020 10:21 am This doesn't make sense to me. What if both engines had a position as 1.2 for white?
If the position is 0.00 then they're both wrong and should be punished. The development of both engines had them producing a wrong score for a drawn position; a more accurate engine would rightly show 0.00, so that one should get the bonus.

If the position really has some 80% win/15% draw/5% loss performance then WHITE should be punished for failing to win it and BLACK should be rewarded for managing to save it against 80/20 odds, but this would be regardless of their scores.

I hold that rewarding white for managing to reach such a position is backwards, because it wasn't able to win it, so the award would go to the one that was able to defend such a hard-to-defend position (but I do agree with the idea of punishments/rewards for draws.)
mmt
Posts: 343
Joined: Sun Aug 25, 2019 8:33 am
Full name: .

Re: Could we use partial scoring for draws to improve evaluation of engine vs engine matches and tournaments?

Post by mmt »

The idea is practical, not about reward or punishment. If all better engines (or even engine self-play using longer time controls) on average have better "draw scores," then this is valuable data that we can use to see which engine is better sooner, without playing 1000 games to get lower uncertainty. I'm guessing _all_ engines will show the behavior of higher "draw scores" for higher Elo engines in direct matchups but I could be wrong.

The soccer (football) analogy would be that the game has ended in a 0-0 tie but one side controlled the ball, had more shots on goal, while the other side has never truly threatened it. If you know no other info, on which side would you bet for the 2nd game between them?

It might also be possible to get additional information out of wins - e.g. if the win is quicker then you get more points. It's intuitive - SF 11 will crush me quicker than a player 200 Elo better than me who doesn't know my Elo.
hgm
Posts: 27796
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Could we use partial scoring for draws to improve evaluation of engine vs engine matches and tournaments?

Post by hgm »

One should never award a result on the basis of the evaluation. That would just encourage engines to lie about their evaluation.
mmt
Posts: 343
Joined: Sun Aug 25, 2019 8:33 am
Full name: .

Re: Could we use partial scoring for draws to improve evaluation of engine vs engine matches and tournaments?

Post by mmt »

hgm wrote: Wed Feb 05, 2020 12:24 pm One should never award a result on the basis of the evaluation. That would just encourage engines to lie about their evaluation.
I mentioned this in my first post.
If chess engines optimize towards this scoring, it becomes a worthless measure.
But actually now I'm thinking that it might be possible even if they do lie - by self-playing them and playing them against opponents at various time limits - "you think you're ahead by 2.2? Prove it."
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: Could we use partial scoring for draws to improve evaluation of engine vs engine matches and tournaments?

Post by Ovyron »

mmt wrote: Wed Feb 05, 2020 1:16 pm "you think you're ahead by 2.2? Prove it."
But why do I need to prove it if I can just show very high evals for everything (even if I'm getting mated...) to maximize my reward in case it's a draw? Basically, engine developers currently use different scales for their engines, and changing the scale would trivially make them score more points on draws with everything else being the same. Even Stockfish has a single value in its source code for this, so just doubling it would make it score twice the bonus on draws.
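The scale problem can be shown with a toy rule. This assumes a hypothetical draw bonus that grows linearly with the reported evaluation; nobody in this thread has proposed exact numbers, so the 0.05 points per pawn and the 1.2 eval are placeholders.

```python
def naive_draw_bonus(reported_eval, points_per_pawn=0.05):
    """Hypothetical rule: draw bonus grows linearly with the reported eval (pawns)."""
    return points_per_pawn * reported_eval

honest_eval = 1.2                  # illustrative value, not measured data
inflated_eval = 2 * honest_eval    # same engine, same moves, eval scale doubled

honest_bonus = naive_draw_bonus(honest_eval)
inflated_bonus = naive_draw_bonus(inflated_eval)
print(honest_bonus, inflated_bonus)
```

The moves played are identical in both cases, yet the second bonus is twice the first, which is why any rule that reads the engine's own number directly is easy to game.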
mmt
Posts: 343
Joined: Sun Aug 25, 2019 8:33 am
Full name: .

Re: Could we use partial scoring for draws to improve evaluation of engine vs engine matches and tournaments?

Post by mmt »

Ovyron wrote: Wed Feb 05, 2020 1:36 pm
mmt wrote: Wed Feb 05, 2020 1:16 pm "you think you're ahead by 2.2? Prove it."
But why do I need to prove it if I can just show very high evals for everything (even if I'm getting mated...) to maximize my reward in case it's a draw?
Once we know that engines optimize for this type of scoring, we will no longer use engine scores at all and instead make them show that they can beat other engines or themselves. For each engine, we can find out what probability of winning its score represents.
Ovyron wrote: Wed Feb 05, 2020 1:36 pm Basically, engine developers currently use different scales for their engines, and changing the scale would trivially make them score more points on draws with everything else being the same. Even Stockfish has a single value in its source code for this, so just doubling it would make it score twice the bonus on draws.
It wouldn't work with the idea above.
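A rough sketch of how a per-engine calibration could look, assuming we have (evaluation, final result) pairs collected from that engine's own games or self-play. The function name, the sample format and the 0.5-pawn bin width are assumptions for illustration only.

```python
import math
from collections import defaultdict

def calibrate(samples, bin_width=0.5):
    """Map an engine's reported evals to the results it actually achieved.

    samples: iterable of (eval_in_pawns, result), with result 1.0, 0.5 or 0.0
    from the evaluating engine's point of view.
    Returns {bin_lower_edge: mean result in that bin}.
    """
    totals = defaultdict(lambda: [0.0, 0])    # bin -> [sum of results, count]
    for ev, result in samples:
        key = math.floor(ev / bin_width) * bin_width
        totals[key][0] += result
        totals[key][1] += 1
    return {k: s / n for k, (s, n) in sorted(totals.items())}
```

Because the table is built from what the engine actually scored, an engine that doubles its reported numbers just ends up with the same results attached to larger eval bins.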

Can anybody point me to a good-sized set of tournament engine games with scores at each ply using current engines? Or matches?
Edit: LC0 Discord has a good set of LC0 games.
jp
Posts: 1470
Joined: Mon Apr 23, 2018 7:54 am

Re: Could we use partial scoring for draws to improve evaluation of engine vs engine matches and tournaments?

Post by jp »

mmt wrote: Wed Feb 05, 2020 10:21 am A 3-1-0 soccer-like scoring system has been used in this tournament starting in 2008. It was also used in the 2nd London Classic in 2010. I think it's a superior method.
In soccer and Bilbao, the scoring is designed to encourage going for wins. It's not designed to be better at determining the Elo ratings of the players.