Repetition detection structure.

gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: Repetition detection structure.

Post by gladius »

Uri Blass wrote: Maybe the 50-move rule could convince Garbochess to push a pawn, but I am unsure about it, and even assuming Garbo pushes the h-pawn I am not sure Garbo could win it.
Garbochess 0.21 didn't implement the 50 move rule, so it wouldn't push the pawn, and would probably happily walk into the draw. It seems like the GUI resign threshold is set a bit too low in this case, as engines are notorious for thinking they have big endgame advantages, when they don't really exist :).
Graham Banks
Posts: 41423
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: Repetition detection structure.

Post by Graham Banks »

hgm wrote:Well, I just gave the proof above, didn't I?
You're correct. The game will be edited to a draw. Apologies for the error.

Regards, Graham.
gbanksnz at gmail.com
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Repetition detection structure.

Post by hgm »

gladius wrote: It seems like the GUI resign threshold is set a bit too low in this case, as engines are notorious for thinking they have big endgame advantages, when they don't really exist :).
Well, this is basically my point, to which I have not had a clear answer so far. Why was the zero awarded? Was this because of the scores the engines print? If so, I can easily fix it and earn a couple of dozen Elo points by having Joker always print a score of +100.

But I think CCRL would need to do some heavy soul searching if they really want to give engines that print a fixed +100 as score a higher rating on their list than engines that print, say, the material balance...
Graham Banks
Posts: 41423
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: Repetition detection structure.

Post by Graham Banks »

hgm wrote:
gladius wrote: It seems like the GUI resign threshold is set a bit too low in this case, as engines are notorious for thinking they have big endgame advantages, when they don't really exist :).
Well, this is basically my point, to which I have not had a clear answer so far. Why was the zero awarded? Was this because of the scores the engines print? If so, I can easily fix it and earn a couple of dozen Elo points by having Joker always print a score of +100.

But I think CCRL would need to do some heavy soul searching if they really want to give engines that print a fixed +100 as score a higher rating on their list than engines that print, say, the material balance...
There is a good argument not to allow the GUI to interfere at all with resignations or draw acceptance and to allow games to play out to mate or a draw by stalemate, insufficient material or threefold repetition.
However, at longer time controls, that could lead to interminably long and boring games. Therefore testers have the choice to allow the GUI to use the etiquette (draw) and resignation options.
User adjudication is rare and only used when the result is a foregone conclusion. In the case you cited as a draw, an inexcusable error was made, which emphasises the utmost care that must be taken.
With regard to resign thresholds, I think there will always be some cases where resignation is premature, because I've seen cases where one engine scores a position as +9 whereas the other engine sees it as a draw.
gbanksnz at gmail.com
gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: Repetition detection structure.

Post by gladius »

hgm wrote:
gladius wrote: It seems like the GUI resign threshold is set a bit too low in this case, as engines are notorious for thinking they have big endgame advantages, when they don't really exist :).
Well, this is basically my point, to which I have not had a clear answer so far. Why was the zero awarded? Was this because of the scores the engines print? If so, I can easily fix it and earn a couple of dozen Elo points by having Joker always print a score of +100.

But I think CCRL would need to do some heavy soul searching if they really want to give engines that print a fixed +100 as score a higher rating on their list than engines that print, say, the material balance...
I've got to think it was adjudication by the GUI when it saw both engines "agreed" that the game was a loss (in this case, around 4 pawns). You are probably right that you could get an Elo boost by printing incorrect scores, but that's not a great solution.

Of course, the disadvantage of no adjudication is that games can take much longer, as the engines will happily play on until mated. Setting the threshold to something like 8 pawns seems reasonable. This can still get it wrong of course, but the probability of both engines being wrong for a few turns about an 8 pawn advantage is pretty low.
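
To make the "both engines agree for a few turns" idea concrete, here is a minimal sketch in Python. The function name, the per-move score lists, and the default of three consecutive moves are assumptions for illustration only; this is not how Arena or any other GUI actually implements adjudication.

```python
from typing import Optional, Sequence


def adjudicate(white_scores: Sequence[float],
               black_scores: Sequence[float],
               threshold: float = 8.0,
               moves_required: int = 3) -> Optional[str]:
    """Return '1-0', '0-1', or None (keep playing).

    Each list holds the scores one engine printed, in pawns, from that
    engine's own point of view, oldest first (hypothetical input format).
    """
    if len(white_scores) < moves_required or len(black_scores) < moves_required:
        return None  # not enough history to judge yet

    recent_white = white_scores[-moves_required:]
    recent_black = black_scores[-moves_required:]

    # White wins only if White sees a big plus AND Black sees a big minus,
    # on every one of the last few moves.
    if (all(s >= threshold for s in recent_white)
            and all(s <= -threshold for s in recent_black)):
        return "1-0"

    # Mirror case for Black.
    if (all(s <= -threshold for s in recent_white)
            and all(s >= threshold for s in recent_black)):
        return "0-1"

    return None
```

With the default threshold of 8 pawns, a game where both engines hover around 4 pawns is left alone, and so is one where only one engine believes in the advantage.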
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Repetition detection structure.

Post by hgm »

What you say is all true, but my basic objection is exactly against the statement you put in quotes, that a score of -4 is interpreted as an admission by the engine that the game is lost. In Joker it is no such thing; it just indicates that it has one Rook less than the opponent on the board. And I see no reason why it should be anything different, as no one has clearly stated that by printing something there that the GUI does not find agreeable, you will forfeit the game.

If one wants to adjudicate games for practical reasons (which could be very valid), a game should never be adjudicated on the score alone. Games where you have a Queen and the opponent has no King safety can never be considered lost, not even at -18. So one should always take these features into account, or at least not adjudicate a loss for a side that still has a Queen.
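
As a rough sketch of the minimum safeguard described above (Python; the function name, the piece-letter string, and the 8-pawn default are all hypothetical, and a real check would also have to look at King safety):

```python
def may_adjudicate_loss(loser_score: float,
                        loser_pieces: str,
                        threshold: float = 8.0) -> bool:
    """Decide whether a loss may be adjudicated against one side.

    loser_score  -- score in pawns from the would-be loser's point of view
    loser_pieces -- that side's remaining pieces as letters, e.g. "KQPP"
                    (an assumed format, chosen only for this example)
    """
    # The score alone is never enough: it must be bad AND the side
    # must have no Queen left to swindle a perpetual with.
    if loser_score > -threshold:
        return False
    if "Q" in loser_pieces.upper():
        return False
    return True
```

So a side at -18 that still has its Queen is never adjudicated lost, while a side at -9 with only a Rook and Pawns can be.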

My policy is never to resign, and in the past two months micro-Max 4.8 has earned 1.5 points because of this policy: In CCT it drew Twisted Logic from a -79.96 score (= "checkmated in 3" in uMax), because of a 3-fold repetition, and against Simontachi in ChessWar F it won from a -79.98 score because Simontachi forfeited. So why should I ever resign?

Also scores are not supposed to take the time on the clock into account, while the remaining time could convey a decisive advantage to a side that is at -4 (e.g. if it is 4 Pawns behind in a Queens ending, but has 20 times more time left for the remaining 10 moves.)
Uri Blass
Posts: 10281
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Repetition detection structure.

Post by Uri Blass »

hgm wrote: What you say is all true, but my basic objection is exactly against the statement you put in quotes, that a score of -4 is interpreted as an admission by the engine that the game is lost. In Joker it is no such thing; it just indicates that it has one Rook less than the opponent on the board. And I see no reason why it should be anything different, as no one has clearly stated that by printing something there that the GUI does not find agreeable, you will forfeit the game.

If one wants to adjudicate games for practical reasons (which could be very valid), a game should never be adjudicated on the score alone. Games where you have a Queen and the opponent has no King safety can never be considered lost, not even at -18. So one should always take these features into account, or at least not adjudicate a loss for a side that still has a Queen.

My policy is never to resign, and in the past two months micro-Max 4.8 has earned 1.5 points because of this policy: In CCT it drew Twisted Logic from a -79.96 score (= "checkmated in 3" in uMax), because of a 3-fold repetition, and against Simontachi in ChessWar F it won from a -79.98 score because Simontachi forfeited. So why should I ever resign?

Also scores are not supposed to take the time on the clock into account, while the remaining time could convey a decisive advantage to a side that is at -4 (e.g. if it is 4 Pawns behind in a Queens ending, but has 20 times more time left for the remaining 10 moves.)
It is obvious that if you care only about rating, it is better never to resign, but people have other reasons to resign.

One reason is that testers prefer engines that resign, to save computer time and get almost the same result.
I know that Leo wants programs to resign, and when programs do not resign he uses WinBoard to adjudicate games based on the score.

If you decide always to print +100 you may earn rating points, but it is also possible that testers will simply decide not to test your engine in order to save computer time, so it is not a good solution. (What you can do in practice is divide your score by 1.5; I guess that in that case you can be sure that testers are not going to stop testing your engine, and you may earn some rating points.)

I know that +4 does not mean winning, but engines usually win positions at +4 pawns, so adjudicating these positions saves a lot of time and in practice does not change the rating by more than 10 Elo points, assuming that we are not talking about weak or buggy engines (the statistical error is usually higher).

Uri
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Repetition detection structure.

Post by hgm »

If testers want to put a requirement on the engine in order to test it without wasting too much time, they should simply require it to implement the resign feature. Not abuse the score reported by that engine in a sneaky way for purposes it was not meant for.

I do not buy the time-saving argument anyway: It won't take many moves to win from a score of +8. And if it does, the score of +8 is not real, as with the perpetual examples here, which is all the more reason not to adjudicate. The main waste of testing time is games that are dead draws, but where both engines count themselves rich or have a large contempt setting. Such games drag on forever, especially if the engines are 50-move aware and push Pawns or sac material every 49 moves to keep the game going. So if you want to save testing time you should exclude engines that use the 50-move rule in their search! :lol: At least my engines have the courtesy to simply accept the draw if they cannot make progress in 50 moves, rather than maliciously prolonging the game.

Requiring engines to resign because they can be checkmated in three (or one) doesn't make any sense at all. Engines should move instantly in such a situation. (And if they don't, that would be a good reason to exclude them!)

If testers think it is a waste of time to continue games with scores above a certain limit, they should simply reduce the time the engines get to finish the game, not make silly adjudications. If the score is really +8, it should be winnable at a 1-min sudden-death time control; the inaccuracy that might creep in due to faster play should be negligible compared to +8. And if the +8 is really a draw by a perpetual, the engines should have no problem playing it out at 1 min, as it is all in their hash tables. Even if they don't recognize it as a draw, they usually go for the perpetual. E.g. in the Joker-Amateur game, if Joker did not keep checking, the two white Queens would perform a near-instant checkmate on the black King, which should be well within the horizon (or at least a forced Queen trade would be within the horizon, which at this disadvantage would have a very negative score).
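
A minimal sketch of what such a tester-side rule could look like, in Python. The function name, the argument layout, and the defaults (8 pawns, 60 seconds) are assumptions made up for this example; no existing GUI option is implied.

```python
from typing import Tuple


def cap_clocks(white_clock: float, black_clock: float,
               white_score: float, black_score: float,
               limit: float = 8.0,
               cap_seconds: float = 60.0) -> Tuple[float, float]:
    """Shorten the remaining time instead of adjudicating.

    Clocks are remaining seconds; scores are in pawns from each engine's
    own point of view. When both engines agree the score is beyond the
    limit, each side keeps at most cap_seconds of sudden-death time.
    """
    white_winning = white_score >= limit and black_score <= -limit
    black_winning = white_score <= -limit and black_score >= limit
    if not (white_winning or black_winning):
        return white_clock, black_clock  # nothing changes

    # Never add time: a side already below the cap keeps what it has.
    return min(white_clock, cap_seconds), min(black_clock, cap_seconds)
```

The result is still decided on the board, so a perpetual or a swindle counts for what it is; only the amount of testing time the game can consume is bounded.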
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Repetition detection structure.

Post by bob »

hgm wrote: What you say is all true, but my basic objection is exactly against the statement you put in quotes, that a score of -4 is interpreted as an admission by the engine that the game is lost. In Joker it is no such thing; it just indicates that it has one Rook less than the opponent on the board. And I see no reason why it should be anything different, as no one has clearly stated that by printing something there that the GUI does not find agreeable, you will forfeit the game.

If one wants to adjudicate games for practical reasons (which could be very valid), a game should never be adjudicated on the score alone. Games where you have a Queen and the opponent has no King safety can never be considered lost, not even at -18. So one should always take these features into account, or at least not adjudicate a loss for a side that still has a Queen.

My policy is never to resign, and in the past two months micro-Max 4.8 has earned 1.5 points because of this policy: In CCT it drew Twisted Logic from a -79.96 score (= "checkmated in 3" in uMax), because of a 3-fold repetition, and against Simontachi in ChessWar F it won from a -79.98 score because Simontachi forfeited. So why should I ever resign?

Also scores are not supposed to take the time on the clock into account, while the remaining time could convey a decisive advantage to a side that is at -4 (e.g. if it is 4 Pawns behind in a Queens ending, but has 20 times more time left for the remaining 10 moves.)
I think this is all pretty interesting in light of the discussion we had about winboard_F and its apparent policy of terminating a game when the rules of chess do not allow it. In the case we were discussing, false draw claims, and possibly illegal moves, were grounds for terminating the game on the spot, even though a program _could_ recover, either by playing on if the draw is disallowed, or by playing a different move if its first move was somehow illegal.

My take has always been that the GUI enforces rules, and only rules. Claiming a game is lost just because both programs say -5 is ridiculous, because I have seen _many_ games where Crafty was at -9.0 and still drew, and some where it was at +9.0 and also drew. So the scores are really "private data" that is displayed for the benefit of observers, but should not be used in adjudicating games.
Graham Banks
Posts: 41423
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: Repetition detection structure.

Post by Graham Banks »

hgm wrote:If testers want to put a requirement on the engine in order to test it without wasting too much time, they should simply require it to implement the resign feature. Not abuse the score reported by that engine in a sneaky way for purposes it was not meant for.
For what it's worth, Arena adjudicated that game, not me. In fact I didn't even run the tournament, now that I look further.
There is nothing sneaky about our testing.
gbanksnz at gmail.com