A RandomGame() benchmark

hgm
Posts: 28457
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: A RandomGame() benchmark

Post by hgm »

lucasart wrote:Another possibility is simply to have an en-passant square set where the ep capture can't be played because no pawn attacks it. For example, in the start position after 1. e4, the ep square is e3, but it can't be captured by a black pawn, so it doesn't count from the perspective of the 3-repetition rule.
Oh, but for performance's sake you should always take that into account. I never set the e.p. square when no e.p. capture is actually possible (except in micro-Max, of course). Setting it anyway would massively reduce your number of hash hits.
lucasart wrote:That's why I chose the conservative approach: basic zobrist with only piece placement + turn of play (leave ep and castling out) for use in 3-rep detection.

For an engine, none of this matters at all, and fixing it is certainly not going to be measurable. In fact, the only measurable effect it will have is that it slows down the program significantly, hence makes it weaker.
Fixing proper setting of the e.p. square usually causes a very significant speedup (time-to-depth-wise), in engines with hash tables.
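To make that concrete, here is a minimal Python sketch of setting the e.p. square only when an enemy pawn stands next to the pawn that just made the double push. The board representation (a dict keyed by zero-based (file, rank)) and the function name are hypothetical, chosen for illustration only, and the check is pseudo-legal: it ignores pins that could still make the capture illegal.

```python
# Hypothetical minimal board: dict mapping (file, rank) -> piece letter,
# files and ranks numbered 0..7, uppercase = white, lowercase = black.

def ep_square_after_double_push(board, file, to_rank, white_moved):
    """Return the e.p. square as (file, rank) only if an enemy pawn
    stands beside the pawn that just double-pushed, else None.

    Assumption: pseudo-legal test only, pins are not considered.
    """
    enemy_pawn = "p" if white_moved else "P"
    for neighbour in (file - 1, file + 1):
        if 0 <= neighbour <= 7 and board.get((neighbour, to_rank)) == enemy_pawn:
            # The e.p. square is the square the pawn skipped over.
            skipped_rank = to_rank - 1 if white_moved else to_rank + 1
            return (file, skipped_rank)
    return None


# After 1. e4 from the start position no black pawn stands on d4 or f4.
start_after_e4 = {(4, 3): "P"}
start_after_e4.update({(f, 6): "p" for f in range(8)})
print(ep_square_after_double_push(start_after_e4, 4, 3, True))   # None

# With a black pawn on d4 the capture on e3 becomes possible.
with_black_d4 = {(4, 3): "P", (3, 3): "p"}
print(ep_square_after_double_push(with_black_d4, 4, 3, True))    # (4, 2)
```

Because the e.p. square is left unset in the first case, the position hashes to the same key as the same placement reached without a double push, which is where the extra hash hits come from.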
lucasart wrote:There's also an annoying exception to the 50-move rule: when the half-move counter hits 100 and it's mate, then the result is mate, not draw. Again, I don't care in DiscoCheck and it certainly has no impact that can be measured.
I have seen engines lose in competition because of this bug. (In particular, Arasan in ChessWar, blundering away a dead draw by allowing itself to be mated on the 50th move.)
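The ordering that avoids this bug can be sketched in a few lines of Python. The function and its three inputs are hypothetical (a real engine would get them from its move generator and position state); the only point is that the "no legal moves" test comes before the half-move clock test.

```python
def game_result(in_check, has_legal_moves, halfmove_clock):
    """Classify a position, testing for mate before the 50-move rule.

    Returns "checkmate", "stalemate", "fiftymoves", or None if play goes on.
    Assumption: halfmove_clock counts plies since the last capture or pawn move.
    """
    if not has_legal_moves:
        # A mate delivered on the 100th ply still counts as mate.
        return "checkmate" if in_check else "stalemate"
    if halfmove_clock >= 100:
        return "fiftymoves"
    return None


print(game_result(in_check=True, has_legal_moves=False, halfmove_clock=100))  # checkmate
print(game_result(in_check=True, has_legal_moves=True, halfmove_clock=100))   # fiftymoves
```

An engine that tests the clock first would score the first case as a draw, and so could walk into a mate on the 50th move believing it is safe.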

I am pretty sure that not paying attention to castling through check would also have an insignificant impact on rating. The point is more that it looks so silly when the engine does it. Doing it once in a thousand games does far more damage to the reputation of your engine than increasing its rating by 50 Elo would make up for, although the 0.05% score loss because of it corresponds to only 0.35 Elo.

The point is that to cause a significant loss in strength at the 95% confidence level in a test of a million games, something needs to happen at least once every 500 games, assuming it would be so disastrous that it instantly blunders away half a point. Most of us don't test to the 0.1% level, however, and for those that can afford only 10,000 games per trial, only catastrophes that happen more frequently than once every 50 games stick out above the noise.
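The arithmetic behind these numbers can be reproduced with a small Python sketch. The assumptions are mine and are only meant to match the figures quoted above: a per-game score standard deviation of 0.5 (the worst case, with no draws), two standard errors for roughly 95% confidence, and the usual logistic Elo model.

```python
import math

def detectable_blunder_interval(games, z=2.0, sigma_per_game=0.5, loss_per_blunder=0.5):
    """Largest number of games between blunders that still shows up in a test.

    Assumptions: sigma_per_game=0.5 is the no-draws worst case, and
    z=2.0 stands in for the 95% confidence level.
    """
    standard_error = sigma_per_game / math.sqrt(games)
    min_score_loss = z * standard_error          # smallest visible score drop
    return loss_per_blunder / min_score_loss

def elo_from_score_loss(loss):
    """Elo deficit of scoring (0.5 - loss) against an equal opponent (logistic model)."""
    p = 0.5 - loss
    return 400 * math.log10((1 - p) / p)

print(round(detectable_blunder_interval(1_000_000)))   # 500
print(round(detectable_blunder_interval(10_000)))      # 50
print(round(elo_from_score_loss(0.0005), 2))           # 0.35
```

With draws in the mix the per-game standard deviation is smaller than 0.5, so real tests resolve somewhat rarer blunders than this worst case suggests.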

Now the question is: would you want your engine to do something absolutely moronic every 50 games? If the answer is 'no', then 'significance' of the Elo change involved with it is not really a relevant measure...
lucasart
Posts: 3243
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: A RandomGame() benchmark

Post by lucasart »

hgm wrote:
lucasart wrote:Another possibility is simply to have an en-passant square set where the ep capture can't be played because no pawn attacks it. For example, in the start position after 1. e4, the ep square is e3, but it can't be captured by a black pawn, so it doesn't count from the perspective of the 3-repetition rule.
Oh, but for performance's sake you should always take that into account. I never set the e.p. square when no e.p. capture is actually possible (except in micro-Max, of course). Setting it anyway would massively reduce your number of hash hits.
Thank you for the tip! I'll experiment with that :D
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
tpetzke
Posts: 686
Joined: Thu Mar 03, 2011 4:57 pm
Location: Germany

Re: A RandomGame() benchmark

Post by tpetzke »

lucasart wrote:But the FIDE rule actually says that a game is drawn by 3-repetition if:
(i) the position has been repeated 3 times. Position here means just piece placement + turn of play. ...
Actually, no. The FIDE rules don't say that. The game is not automatically a draw. One player can just make a draw claim in such a situation.

So we should not mention FIDE rules here because the general implementation of engines and also GUIs doesn't follow the FIDE rules here anyway.

Thomas...
syzygy
Posts: 5872
Joined: Tue Feb 28, 2012 11:56 pm

Re: A RandomGame() benchmark

Post by syzygy »

lucasart wrote:Unfortunately it's not as simple as enriching zobrist keys with castling rights and en passant square. The only 100% correct way to check is basically to generate all the legal en passant and castling moves in each of the potentially repeated positions and compare the move lists.
Castling rights are not a problem at all. There is no need to check whether legal castling is possible in the current position. It is sufficient that the right has not been lost, and all non-broken engines already keep track of that.
lucasart wrote:I think (ii) is a completely ridiculous and useless rule. But I don't make the rules, FIDE does.
It would be much more annoying if the position after a double pawn push with no legal ep captures would be "different" for the purpose of the 3-time repetition rule, because that would mean either accepting a loss in hashing efficiency or complicating the check for 3-time repetition.
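As a rough illustration, repetition detection then reduces to comparing position keys. The Python sketch below uses plain tuples as keys instead of Zobrist hashes, and all names are hypothetical; it assumes each key already contains the castling rights still held and an e.p. square only when a capture is actually possible.

```python
def position_key(placement, white_to_move, castling_rights, ep_square):
    """Hypothetical position key: a tuple stands in for a Zobrist hash.

    castling_rights: rights not yet lost, e.g. "KQkq" (no legality check needed).
    ep_square: set only when an e.p. capture is actually possible, else None.
    """
    return (placement, white_to_move, castling_rights, ep_square)

def is_threefold(key_history, halfmove_clock):
    """True if the last key in key_history has occurred at least three times.

    key_history holds one key per ply, current position last. Only positions
    since the last irreversible move (halfmove_clock plies back) can repeat.
    """
    current = key_history[-1]
    window = key_history[-(halfmove_clock + 1):]
    # Step back two plies at a time: same side to move only.
    same_side = window[::-1][::2]
    return sum(1 for key in same_side if key == current) >= 3


a = position_key("placement A", True, "KQkq", None)
b = position_key("placement B", False, "KQkq", None)
a_lost_right = position_key("placement A", True, "Qkq", None)

print(is_threefold([a, b, a, b, a], halfmove_clock=4))             # True
print(is_threefold([a, b, a, b, a_lost_right], halfmove_clock=4))  # False
```

The second call shows why no castling legality check is needed: the identical placement with a lost castling right is simply a different key.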
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Yet another million

Post by sje »

This time, from an old iBook with a 700 MHz PowerPC G3 (32-bit, single core):

Code:

0.153868   checkmate
0.194382   fiftymoves
0.565738   insufficient
0.025217   repetition
0.060795   stalemate

mean length: 334.31
limit: 1000000
usage: 2366.02
frequency: 422.65
period: 0.00236602
This Mac from the year 2002 can produce a billion random games a month. But would that be worth the cost of electricity?
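For anyone who wants to check the figures, a short Python sketch reproduces them. It assumes that "usage" is the run time in seconds (which matches frequency = limit / usage) and takes a month as 30 days.

```python
games = 1_000_000          # "limit" from the output above
seconds = 2366.02          # "usage", assumed to be seconds of run time

frequency = games / seconds            # games per second
period = seconds / games               # seconds per game
games_per_month = frequency * 30 * 24 * 3600   # assuming a 30-day month

print(round(frequency, 2))   # 422.65
print(round(period, 8))      # 0.00236602
print(games_per_month > 1e9) # True
```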