A RandomGame() benchmark

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

A RandomGame() benchmark

Post by sje »

Symbolic's Position class has an instance method named RandomGame() which produces a game composed of randomly selected moves. The game is as long as it needs to be to encounter a mate or draw condition. After the game is produced, the moves are then played in reverse with various consistency checks made with each move retraction.

On a single thread on a 3.4 GHz Intel Core i7-2600, the program can generate about 6,400 random games per second. That's about 2.2 billion games per day if all four cores were used.

The RandomGame() method also determines which of the five different terminations occurred. This data is useful as a statistical check that the termination detectors are working, particularly the somewhat tricky repetition detector.
User avatar
Steve Maughan
Posts: 1315
Joined: Wed Mar 08, 2006 8:28 pm
Location: Florida, USA

Re: A RandomGame() benchmark

Post by Steve Maughan »

Out of interest, what is the average length of a random game?

Steve
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: A RandomGame() benchmark

Post by sje »

From memory, the average length is about 330 ply. There is a large variance.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

After a million games

Post by sje »

After a million games:

Code: Select all

0.153455 checkmate
0.193411 fiftymoves
0.566669 insufficient
0.025198 repetition
0.061267 stalemate

mean length: 334.153
lucasart
Posts: 3243
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: A RandomGame() benchmark

Post by lucasart »

sje wrote: This data is useful as a statistical check that the termination detectors are working, particularly the somewhat tricky repetition detector.
The 3-repetition FIDE rule is actually a nightmare to implement correctly. I suppose almost everyone (at least in engine perhaps not in GUI) does it by using just zobrist keys. In he vast majority of cases, that's fine. But the FIDE rule actually says that a game is drawo by 3-repetition if:
(i) the position has been repeted 3-time. position here means just piece placement + turn of play.
(ii) the legal moves for the turn to play are the same in the 3 occurrences. Unfortunately it's not as simple as enriching zobrist keys with castling rights and en passant square. The only 100% correct way to check is basically to generate all the legal en passant and castling moves in each of the potentially repeated positions and compare the move lists.

I think (ii) is a completely ridiculous and useless rule. But I don't make the rules, FIDE does.

I didn't implement that silly rule in DiscoCheck (for obvious performance reason). Just checking (i). And I'm gessing everyone does either (i) or (i) and a pseudo-(ii) by using zobrist en-passant square and castling rights.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

After ten million games

Post by sje »

Code: Select all

0.15315   checkmate 
0.193467  fiftymoves 
0.567002  insufficient 
0.0251866 repetition 
0.0611952 stalemate 

mean length: 334.358
User avatar
hgm
Posts: 28464
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: A RandomGame() benchmark

Post by hgm »

lucasart wrote:Unfortunately it's not as simple as enriching zobrist keys with castling rights and en passant square.
You refer to the possibility that e.p. capture will leave you in check?

I agree that there is no need in the engine to check that. Just as there is no need to wait for a third repeat, rather than score draw at the second, in search. Because the 'repeated' position will have fewer moves, so the possibility to break out of the repeat are even less than they were initially. A position with e.p. rights always must be the first of a repeat loop.

I don't expect checking this would have a measurable impact on performance. Generating all legal moves is needlessly expensive. It suffices to test in case of a repeat to test if the e.p. square was set in the initial position, if there is a Pawn that could move to it, and if there is, perform the capture and test if that puts you in check. The real cost is that if you reject the repeat for that, you have to search on one more ply. But of course that will almost never happen.

In Xiangqi it is horrendously more expensive to score repeats, as they typically are a win for one side on the other, depending on if you were constantly attacking unprotected pieces ('perpetual chasing'). For that, you have to test in every position of the repeat loop (both sides to move) whether the opponent has legal recaptures when you captured something that the previous move attacked. Yet the impact on speed is almost immeasurable. Repeats are quite rare in the tree.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Another million

Post by sje »

This time, from a single thread on a 3.1 GHz AMD Athlon II X4:

Code: Select all

0.152886   checkmate
0.193492   fiftymoves
0.567434   insufficient
0.025461   repetition
0.060727   stalemate

mean length: 334.392
limit: 1000000
usage: 191.004
frequency: 5235.49
period: 0.000191004
Note: the mates have the first and equal priority, then comes fiftymoves, insufficient, and repetition.

The fiftymoves test comes before repetition because of its faster speed of testing. I don't know what FIDE says about this and I don't really care too much.

Some thirty years ago there was a Chess Champion Mark V (and VI) computer which took great pains to correctly implement FIDE repetition draw recognition. It would even display the Laws of Chess paragraph number when making a claim. I wish the computer's developers had spent more time on bug removal, like the one where the program's king would occasionally disappear sometime after 150 moves.

http://www.schach-computer.info/wiki/in ... ion_Mark_V

I still have one of these, but I can't remember the last time I used it; maybe twenty years ago. I should try to see if it still works.
lucasart
Posts: 3243
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: A RandomGame() benchmark

Post by lucasart »

hgm wrote:
lucasart wrote:Unfortunately it's not as simple as enriching zobrist keys with castling rights and en passant square.
You refer to the possibility that e.p. capture will leave you in check?

I agree that there is no need in the engine to check that. Just as there is no need to wait for a third repeat, rather than score draw at the second, in search. Because the 'repeated' position will have fewer moves, so the possibility to break out of the repeat are even less than they were initially. A position with e.p. rights always must be the first of a repeat loop.

I don't expect checking this would have a measurable impact on performance. Generating all legal moves is needlessly expensive. It suffices to test in case of a repeat to test if the e.p. square was set in the initial position, if there is a Pawn that could move to it, and if there is, perform the capture and test if that puts you in check. The real cost is that if you reject the repeat for that, you have to search on one more ply. But of course that will almost never happen.
Another possibility is simply to have an en-passant square set and the ep capture can't be played because no pawn attacks it. For example in the start position after 1. e4, the ep square is e3, but it can't be captures by a black pawn, so it doesn't count from the perspective of the 3-repetition rule. That's why I chose the conservative approach: basic zobrist with only piece placement + turn of play (leave ep and castling out) for use in 3-rep detection.

For an engine, none of this matter at all, and fixing it is certainly not going to be measurable. In fact, the only measurable effect it will have is that is slows down the program significantly, hence makes it weaker.

There's also an annoying exception to the 50-repetition rule: when the half move counter hits 100 and it's mate, then the result is mate not draw. Again, I don't care in DiscoCheck and it certainly has no impact that can be measured.

The problem with these silly rules is for GUI developpers. The GUI is the ultimate referee in engine/engine matches, so it's important that it implements the rules 100% correctly.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: A RandomGame() benchmark

Post by michiguel »

lucasart wrote:
hgm wrote:
lucasart wrote:Unfortunately it's not as simple as enriching zobrist keys with castling rights and en passant square.
You refer to the possibility that e.p. capture will leave you in check?

I agree that there is no need in the engine to check that. Just as there is no need to wait for a third repeat, rather than score draw at the second, in search. Because the 'repeated' position will have fewer moves, so the possibility to break out of the repeat are even less than they were initially. A position with e.p. rights always must be the first of a repeat loop.

I don't expect checking this would have a measurable impact on performance. Generating all legal moves is needlessly expensive. It suffices to test in case of a repeat to test if the e.p. square was set in the initial position, if there is a Pawn that could move to it, and if there is, perform the capture and test if that puts you in check. The real cost is that if you reject the repeat for that, you have to search on one more ply. But of course that will almost never happen.
Another possibility is simply to have an en-passant square set and the ep capture can't be played because no pawn attacks it. For example in the start position after 1. e4, the ep square is e3, but it can't be captures by a black pawn, so it doesn't count from the perspective of the 3-repetition rule. That's why I chose the conservative approach: basic zobrist with only piece placement + turn of play (leave ep and castling out) for use in 3-rep detection.
I am not sure I understand why you leave castling rights out. It is easy to include them.

For ep, yes, at least, it should be taken into account whether there is potential pawn to capture the square. The positions after 1.e4 e5 or after 1.e3 e6 2. e4 e5 are the same, and not considering the same hurts a tiny bit the hashtable efficiency (transpositions are missed).

For an engine, none of this matter at all, and fixing it is certainly not going to be measurable. In fact, the only measurable effect it will have is that is slows down the program significantly, hence makes it weaker.

There's also an annoying exception to the 50-repetition rule: when the half move counter hits 100 and it's mate, then the result is mate not draw. Again, I don't care in DiscoCheck and it certainly has no impact that can be measured.
Technically, this is not an exception, it is a matter of what rule has higher priority. Always, mate ends the game in chess. Since this has no exceptions, it has higher priority. For instance, In OTB chess, if checkmate is on the board it will not even matter if you lost on time.

I suggest to fix it because it is much more common than you may imagine. The strength increase will be lost in the statistical errors, but it is really frustrating to see your engine getting mated at the 50th move.

The problem with these silly rules is for GUI developpers. The GUI is the ultimate referee in engine/engine matches, so it's important that it implements the rules 100% correctly.
For a chess player, all these rules make a lot of sense. They are not silly.

Miguel