EPD destruction tests

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: EPD destruction tests

Post by bob »

hgm wrote: Thu Feb 20, 2020 10:58 am
Dann Corbit wrote: Thu Feb 20, 2020 12:20 am I notice something else -- the rule does not mention a castle move which is also not reversible.
The 50-move rule is not about reversibility, but about progress. Castling doesn't bring you any closer to winning the game, other than in the sense that it sometimes can be a good move. But we do not reset the counter on good moves; that would be too subjective. Pushing a Pawn brings it closer to its promotion square, though. Or, more importantly, failing to push any Pawn for an extended time means you are NOT getting any closer to promotion.

That Pawn moves are irreversible is in a sense just a coincidence. Although there would be a real problem in reformulating the 50-move rule when Pawns could also move backwards. Some Chess variants (such as Chu Shogi) have pieces with a decisive promotion that are fully reversible. You would not want to reset the counter all the time when a player is just moving a Pawn back and forth between two squares. A solution would be to compare each position with the position 50 moves earlier, and count the total Pawn advance. If there has been no capture or promotion (which would reset the counter), all Pawns must still be there, and if on the average they would not have advanced any while the reversible counter is also above 50, you could declare draw. But this is rather hard to check. Not taking Pawn advance into account at all would be unsatisfactory, though: many games on the way of being won would then suddenly become draws.
Actually a really good point. I can't see any safe way of not hashing in castle status. And then I can't see an obvious (simple) way to solve missing the repetition. IE what about Kf2 ... Ke1. I will have to think about this for a minute. Maybe the answer is no castling in the repetition list. I'll be damned.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: EPD destruction tests

Post by bob »

OK. In Crafty, the castle status is always factored in when using the Hash Signature. Which means hash probes/stores, repetition tests, and even book moves. I believe this will do what I intended. I don't reset the 50 move counter, but for repetition checking that is purely an optimization since I only need to check positions since the last irreversible move, and in this case I could go back 50 moves unnecessarily. But no harm since the positions with and without castling still have different signatures, so there's no repetition until two positions with same exact hash signature, which does factor in castle and EP status (and side to move of course).
chrisw
Posts: 4315
Joined: Tue Apr 03, 2012 4:28 pm

Re: EPD destruction tests

Post by chrisw »

bob wrote: Fri Feb 21, 2020 1:12 am
hgm wrote: Thu Feb 20, 2020 10:58 am
Dann Corbit wrote: Thu Feb 20, 2020 12:20 am I notice something else -- the rule does not mention a castle move which is also not reversible.
The 50-move rule is not about reversibility, but about progress. Castling doesn't bring you any closer to winning the game, other than in the sense that it sometimes can be a good move. But we do not reset the counter on good moves; that would be too subjective. Pushing a Pawn brings it closer to its promotion square, though. Or, more importantly, failing to push any Pawn for an extended time means you are NOT getting any closer to promotion.

That Pawn moves are irreversible is in a sense just a coincidence. Although there would be a real problem in reformulating the 50-move rule when Pawns could also move backwards. Some Chess variants (such as Chu Shogi) have pieces with a decisive promotion that are fully reversible. You would not want to reset the counter all the time when a player is just moving a Pawn back and forth between two squares. A solution would be to compare each position with the position 50 moves earlier, and count the total Pawn advance. If there has been no capture or promotion (which would reset the counter), all Pawns must still be there, and if on the average they would not have advanced any while the reversible counter is also above 50, you could declare draw. But this is rather hard to check. Not taking Pawn advance into account at all would be unsatisfactory, though: many games on the way of being won would then suddenly become draws.
Actually a really good point. I can't see any safe way of not hashing in castle status. And then I can't see an obvious (simple) way to solve missing the repetition. IE what about Kf2 ... Ke1. I will have to think about this for a minute. Maybe the answer is no castling in the repetition list. I'll be damned.
Wrt draw detection, you can avoid hashing the four castling status bits if you keep two counters.
One is a normal 50 move counter, reset by pawn move or capture.
The other is the same, but also reset by any touch to the castling status bits. Eg the position after moving Ra1a2 and destroying an existing long castling is now irreversibly different, so you wouldn’t want to track back past it into history looking for repetition draws. Likewise the move of a castling-able king should reset the counter.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: EPD destruction tests

Post by bob »

I could not find a match prior to the castling move (or a move that loses castling rights.) At the point the right is lost, the hash signature is updated with an XOR for a rights-random-number, and nothing before that move could possibly match with something past it unless a real collision occurs...

I'm just thinking about this to be sure that I have done that which makes the most sense.

Another example, null-move. Many said to xor in a null-move constant so that repetitions and hash hits after a null can't match positions prior to the null. I decided that didn't make any sense. All null-move does is alter side to move, and since that is hashed as well, any match after the null move should be just as useful as any match prior to the null-move... Castling is obviously handled differently in my code...
chrisw
Posts: 4315
Joined: Tue Apr 03, 2012 4:28 pm

Re: EPD destruction tests

Post by chrisw »

bob wrote: Fri Feb 21, 2020 3:51 am I could not find a match prior to the castling move (or a move that loses castling rights.) At the point the right is lost, the hash signature is updated with an XOR for a rights-random-number, and nothing before that move could possibly match with something past it unless a real collision occurs...

I'm just thinking about this to be sure that I have done that which makes the most sense.

Another example, null-move. Many said to xor in a null-move constant so that repetitions and hash hits after a null can't match positions prior to the null. I decided that didn't make any sense. All null-move does is alter side to move, and since that is hashed as well, any match after the null move should be just as useful as any match prior to the null-move... Castling is obviously handled differently in my code...
Well, I am going to get round the problem (bug?) in loading epds with the half reversible move counter already set to 40 or something. I had been just overriding and setting to zero, on the grounds the epd had no hash history, and I use that counter to calculate how far into the hash history to be comparing for repetitions. Problem with zeroing it though, is that one loses draw by no progress state from the EPD.

So, solution is going to be:
Maintain the reversible count from the EPD to use for 50move rule.
Initialise another counter which gets reset same as usual but includes reset by touching castle status. And use this counter to know how much hash history to check for draws.

Disadvantage: maintain another counter.
Advantage: fewer history hash compares. get to keep the original EPD set counter.

It would be nice then to dump castling state from the hash completely, but I think that will lead to errors across the search tree. Probably not an option.
User avatar
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: EPD destruction tests

Post by hgm »

You can also hash the history. Then you only have to check one history item (or in case of a collison, which should be rare if the table is large enough, sometimes two). Then you don't need to update any counter, and effectively compare to the entire history. In Crazyhouse or Shogi you would have to do that anyway, as there are no irreversible moves there.

The extra counter also is a high risk for being a case of a cure that is worse than the disease. It only prevents searching a few extra history positions that could not possibly match (because the castling rights are part of the hash key). How often would that happen during a game? Once both sides castled, the counter would become a useless copy of the 50-move counter for the rest of the game, its updating and restoring a pure waste of time.
chrisw
Posts: 4315
Joined: Tue Apr 03, 2012 4:28 pm

Re: EPD destruction tests

Post by chrisw »

hgm wrote: Fri Feb 21, 2020 1:55 pm You can also hash the history. Then you only have to check one history item (or in case of a collison, which should be rare if the table is large enough, sometimes two). Then you don't need to update any counter, and effectively compare to the entire history. In Crazyhouse or Shogi you would have to do that anyway, as there are no irreversible moves there.

The extra counter also is a high risk for being a case of a cure that is worse than the disease. It only prevents searching a few extra history positions that could not possibly match (because the castling rights are part of the hash key). How often would that happen during a game? Once both sides castled, the counter would become a useless copy of the 50-move counter for the rest of the game, its updating and restoring a pure waste of time.
Already thought of that one. If EPDs are not checked, and the EPD 50 move count is very high, you'll be running back through a hash history of many many entries. That's fine, if you don't mind the cycles, but then the size of memory most programmers would set, maybe 256 or 350 or so hash entries total, including forwards into the search and back into the game history, to avoid running over the edge of data, needs not just to be [assumed max length of game] plus [assumed max search depth], but should be that rogue 80 or 100 entry or even the 400 in addition. It's better defensive programming (future proof and robust) to have a 50 move counter (resets on pawns/captures) accurate for draw detection, which can be any positive value, because it's not being used to create a list pointer; and a secondary reversible move counter(resets on pawns/captures/castling status touches) which is robust against EPD abnormally high movenums.

8/2p2kb1/np1p2p1/pN1Pp1Pp/2P1P2P/1P6/P4BK1/8 w - - 80 100
8/2p2kb1/np1p2p1/pN1Pp1Pp/2P1P2P/1P6/P4BK1/8 w - - 400 500

I don't go with "How often would that happen during a game" or "could not possibly match" or "only prevents searching a few extra history positions". If code is based on assumptions of what is, and potentially precarious and subject to being forgotten about, so some later change elsewhere apparently disconnected, causes a difficult to track bug. But hey, bugs are inevitable, they say, well, not if you program defensively they're not. Second counter is another example of something that uses microscopically small number of cycles compared to one move-unmove cycle and reduces risk and does correct number of compare loops as opposed to random number of extra compares. Good payoff, imo.
User avatar
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: EPD destruction tests

Post by hgm »

Well, always searching the history all the way back to the beginning of the game is not wrong (and in fact what you have to do in Crazyhouse to not be wrong). But because for some of the history comparison is pointless, it is an unnecessary performance drain. But so is keeping track of a duplicat counter. It might just be a few cycles per node, but you do it millions of time on every move of the game. An entire node doesn't actually take that many cycles: cycles go in ns, nodes in microseconds. That is still 0.1%. Searching back to the last reset of the 50-move counter is a smaller performance drain than searching always back to the beginning, and just as reliable, as moves from before the reset cannot be the repetition you are searching for. Nothing precarious there.

The risk that you find a false repetition because of a key collision is a tiny bit larger by searching some extra moves, as even the moves from before the loss of castling rights could match by collision. This might negatively impact Elo. None of that counts as a bug, and you cannot avoid to do one or the other anyway. It is just a matter which impacts Elo most: a bit extra confusion by some extra imagined repetitions plus the performance drain by sometimes searching past rights-destruction, or the performance drain by keeping an extra counter. Searching through extra history moves will take longer than incrementing a counter, but you won't do it the entire game and in every node, so on average it could very well be cheaper.

BTW, most engines are entirely developed based on assumptions. Namely that the statistics of whatever happens in the test games they are playing to tweek search and evaluation is representative for all possible chess games. Most engine developers would just test the patch that prevented the pointless part of the history search by means of an extra counter for Elo gain over allowing it, and assume that the algorithm that gave the highest Elo is indeed the best solution, and stick with that.

Engine code is typically based on the assumption that hash keys are a reliable method to determine if positions are equal. Under that assumption searching the extra part of history his harmless apart from the performance drain. If the assumption is false, your engine is broken to begin with, whether you search the extra part of history or not. If you somehow manage to prove that your hash keys are reliable, then indeed you have proven that the keys from before the destruction of castling rights cannot possibly match, and it is no longer an assumption.

Having a counter that resets on rights destruction relieves you from the necessity to work castling rights into the hash key for the purpose of repetition detection. But if you want the hash key to be also useful for TT access, you would still need to have the rights in there; there is usually no easy way to know if hash entries are from before or after the rights destruction.

Maximum search depth is usually not assumed but enforced. Maximum game length is more tricky, as you don't want to be too generous in table space for reasons of cache contention. But normally you would clear the history cache at game level on any irreversible move. And if the game gets longer than the cache allows (which is especially a problem in Shogi, as there are no irreversible moves, so that you always have to keep the entire game history), you can just double the cache size, and reload the game into it.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: EPD destruction tests

Post by bob »

That tweaked my interest. Later tonight (assuming no interruptions) I am going to compare two versions of Crafty. One with castle status for hash table probes, one without. I suspect all that will change will be node counts, but I am now curious as to what this actually saves in terms of accuracy.
User avatar
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: EPD destruction tests

Post by lucasart »

chrisw wrote: Wed Feb 19, 2020 2:35 pm Some of these are fine, and some are broken.
What should engines do? What do engines do?
Disclaimer/admission: my C code fails because of not much integrity testing at all, but is on the todo list. While my test engine doesn't return in any sensible time in the case of 30 queens. Meanwhile, in Python my EPD checker is robust up to 2,000,000,000 epds thrown at it.

Code: Select all

rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1; startpos
rnbqkbn1/ppppppppr/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1; extra piece on rank
rnbqkbnr/ppppNNpp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1; white 18 pieces
rnbqkbnr/pppppppp/8/8/8/8/PPPPPnPP/RNBQKBNR w KQkq - 0 1; black 17 pieces
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR K KQkq - 0 1; colour
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 400 500; movenums
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 10 5; movenums
4r1k1/p1qr1p2/2pb1Bp1/1p5p/3P1n1R/1B3P2/PP3PK1/2Q4R w - -; no movenums 
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQAb - 0 1; castle status
    rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR    w   KQkq  -   0    1 ; extra white space
rnbqkbnr.pppppppp.8.8.8.8.PPPPPPPP.RNBQKBNR w KQkq - 0 1; slash chars
nrbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1; castle status
  rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1; leading spaces
rnbqkbnr/ppppippp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1; bad character
rnbqk0nr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1; bad numeric
rnbqkbnr/pppppppp/8/6/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1; square count too low
N7/8/2KQ2rp/6k1/4p3p/2p4P/4PP2/5N2 w - - 0 1; square count too high
rnkqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1; 3 kings
rnbq1bnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1; 1 king
nnnnknnn/nnnnnnnn/8/8/8/8/NNNNNNNN/NNNNKNNN w - - 0 1; 30 knights
rnbqkbnr/nnnnnnnn/8/8/8/8/NNNNNNNN/RNBQKBNR w KQkq - 0 1; 18 knights
rnbqkbnr/bbbbbbbb/8/8/8/8/BBBBBBBB/RNBQKBNR w KQkq - 0 1; 18 bishops
rnbqkbnr/rrrrrrrr/8/8/8/8/RRRRRRRR/RNBQKBNR w KQkq - 0 1; 18 rooks
rnbqkbnr/qqqqqqqq/8/8/8/8/QQQQQQQQ/RNBQKBNR w KQkq - 0 1; 18 queens
qqqqkqqq/qqqqqqqq/8/8/8/8/QQQQQQQQ/QQQQKQQQ w - - 0 1; 30 queens
8/PPPPPPPP/rnbqkbnr/8/8/RNBQKBNR/pppppppp/8 w - - 0 1; lots of promotions
blablabl/blablabl/blablabl/blablabl/blablabl/blablabl/blablabl/blablabl w - - 0 1; garbage
hello have a nice day w - - 0 1; garbage


1R6/1brk2p1/4p2p/p1P1Pp2/P7/6P1/1P4P1/2R3K1 w - - 0 1 bm b8b7
4r1k1/p1qr1p2/2pb1Bp1/1p5p/3P1n1R/1B3P2/PP3PK1/2Q4R w - - 0 1 bm c1f4
r1b2rk1/ppq1bppp/2p1pn2/8/2NP4/2N1P3/PP2BPPP/2RQK2R w K - 0 1 bm e1g1
r3r1k1/1bppq2p/1p1n1bp1/3P1p2/p4N1P/P1PB1P2/1P1Q1BP1/1K1R3R w - - 0 1 bm g2g4
r1bqrk2/pp1n1n1p/3p1p2/P1pP1P1Q/2PpP1NP/6R1/2PB4/4RBK1 w - - 0 1 bm h5f7
rn1q1rk1/p2p1pb1/bp3np1/2pP2Bp/7P/P1N2N2/1PQ1PPP1/R3KB1R w KQ - 0 1 bm g2g4
N7/8/2KQ2rp/6k1/3p3p/2p4P/4PP2/5N2 w - - 0 1 bm f2f4
r1bqk2r/ppn1bppp/2n5/2p1p3/8/2NP1NP1/PP1BPPBP/R2Q1RK1 b kq - 0 1 bm e8g8
8/2p2kb1/np1p2p1/pN1Pp1Pp/2P1P2P/1P6/P4BK1/8 w - - 0 1 bm b5c7
4r1rk/p3qpp1/1pnp1n1p/5P2/P1PPP3/4Q2P/2BB2R1/6RK w - - 0 1 bm g2g7
rn1q1rk1/pp2bppp/1n2p1b1/8/2pPP3/1BN1BP2/PP2N1PP/R2Q1RK1 w - - 0 1 bm b3c2
r4r1k/1p4qp/b1pNRbp1/2P5/3p1P1P/PQ4P1/3R2BK/8 w - - 0 1 bm d2d1
2b2r1k/4q2p/3p2pQ/2pBp3/8/6P1/1PP2P1P/R5K1 w - - 0 1 bm a1a7
r2q1r2/1b2bpkp/n3p1p1/2ppP1P1/p6R/1PN1BQR1/NPP2P1P/4K3 w - - 0 1 bm f3f6
r1b1k2r/ppppqppp/8/2bP4/3p4/6P1/PPQPPPBP/R1B2RK1 b kq - 0 1 bm e8g8
rnbqkb1r/p3pppp/1p6/2ppP3/3N4/2P5/PPP1QPPP/R1B1KB1R w KQkq - 0 1 bm e5e6
FEN string is input, and it's not the engine's reponsability to perform input validation (supposed to come from a UI, not typed by hand). With that said, we at least want to spot the problem easily. So in Demolito, I handle as follows:
- release mode (DNDEBUG defined): segfault, probably... I don't care.
- debug mode (DNDEBUG not defined): i expect an assert to fire immediately, ie. after "position fen ...", not millions of nodes into the search several moves later. Here, I do care, because i don't want to spend hours debugging, because of a stupid typo in a FEN, looking for a bug that doesn't exist...

I tested every single one of your FEN in Demolito (debug compile). All of them are now caught by an assert, except these, which are correct, and that Demolito will handle just fine:

Code: Select all

rnbqkbnr/ppppNNpp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1; white 18 pieces
rnbqkbnr/pppppppp/8/8/8/8/PPPPPnPP/RNBQKBNR w KQkq - 0 1; black 17 pieces
=> No problem. It's not a requirement that position can be reached from start pos (or any Chess960 / shuffle chess start pos).

position fen rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 10 5; movenums
=> No problem. Last token is useless for engines. It's not even parsed, only 50 move ply counter is relevant.

4r1k1/p1qr1p2/2pb1Bp1/1p5p/3P1n1R/1B3P2/PP3PK1/2Q4R w - -; no movenums 
=> No problem. Demolito uses 0 as default value if 50 move counter is missing.

    rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR    w   KQkq  -   0    1 ; extra white space
=> No problem. The UCI spec does indeed forbid this. But that is a rule for the GUI; aimed at amateur engines. Proper code uses a proper tokenizer: strtok_r() for Demolito, stringstream operator >> for C++ engines like Stockfish.

nrbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1; castle status
=> No problem. This is a valid Chess960 position, and Demolito will handle it fine as such.

nnnnknnn/nnnnnnnn/8/8/8/8/NNNNNNNN/NNNNKNNN w - - 0 1; 30 knights
rnbqkbnr/nnnnnnnn/8/8/8/8/NNNNNNNN/RNBQKBNR w KQkq - 0 1; 18 knights
rnbqkbnr/bbbbbbbb/8/8/8/8/BBBBBBBB/RNBQKBNR w KQkq - 0 1; 18 bishops
rnbqkbnr/rrrrrrrr/8/8/8/8/RRRRRRRR/RNBQKBNR w KQkq - 0 1; 18 rooks
rnbqkbnr/qqqqqqqq/8/8/8/8/QQQQQQQQ/RNBQKBNR w KQkq - 0 1; 18 queens
qqqqkqqq/qqqqqqqq/8/8/8/8/QQQQQQQQ/QQQQKQQQ w - - 0 1; 30 queens
=> No problem. Even with 30 queens, search works just fine w/o qsearch explosion.

8/PPPPPPPP/rnbqkbnr/8/8/RNBQKBNR/pppppppp/8 w - - 0 1; lots of promotions
=> No problem. It's not a requirement that position can be reached from start pos (or any Chess960 / shuffle chess start pos).
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.