Transposition Tables

hgm · Post by **hgm** » Tue Oct 02, 2007 3:33 pm

nczempin wrote:So perhaps even for uMax it would be "legal" (cost of 0 characters) to keep them out of the move generation, but allow it to be a legal move (without resigning).

After some thought, I decided that this would violate my rules.

The claim is that micro-Max is the smallest Chess _program_ (or that with best Elo per character, depending on which version we are talking about). Not just the smallest subroutine that comes up with moves.

So the character counts must include thuse of an interface that allows you to play chess with it, that can read, parse and execute user moves, and that somehow informs the user about the computer move. And preferably set up a position. It can be a minimal interface, but it has to be there, and the characters of it must be fully included in the count.

To solve the under-promotion problem in that context (by accepting it as an input move) takes 32 characters. It would than still be surprised by it, and never play one himself.

The fact that there also exists a Winboard-compatible version of uMax, is only a concession to user-friendliness. The WB versions of uMax have no official status, and their characters are not counted. The only guarantee is that, under certain time control settings (in particular fixed-time per move setting) it plays exactly the same moves the original stand-alone uMax would play, when playing on an otherwise unloaded computer. That ratings are in practice determined with other time controls(e.g. 40/40') and not at the type of time control the stand-alone uMax plays I don't really consider as a problem. As long as the opponents have to play the same time control, I do not expect the relative ratings to be dependent on such tiny details as playing 40/40' or 120/120'. So I assume the ratings as, e.g. CCRL determines them would equally apply to the stand-alone version of uMax.

So the 1433 characters of uMax 1.6 include the input line for under-promotions, and the Winboard version of it will not resign if one is played. OTOH, the 1961 characters of uMax 4.8 do not include an input line for under-promotions in its interface, so the WB version of it does resign on an underpromotion. And like I explained, it does it a bit too liberally, as even delaying the resign to give uMax the opportunity to capture it sometimes leads to resigns where the stand-alone uMax would have simply won the game (capturing the promoted piece after a check).

bob · Post by **bob** » Tue Oct 02, 2007 5:56 pm

nczempin wrote:
Uri Blass wrote: I think that I can score better by having move generator that does not consider bishop underpromotions.

If we think only about performance than the speed up by not implementing underpromotions to bishop is probably more important than one game out of million when underpromotion to bishop may help you to get a better result.

Uri
Not generating sub-promotions in the search doesn't preclude you from allowing them as a legal move. I consider legality checking of made moves as the domain of the interface (as in the code that impliments UCI or WB), not the engine itself.

So perhaps even for uMax it would be "legal" (cost of 0 characters) to keep them out of the move generation, but allow it to be a legal move (without resigning).

I will run the test later today and post the results. But I do not believe that generating under-promotions is going to have any significant effect on overall speed. I always try =Q first. In most positions that is met by the opponent capturing the queen the next ply (a futile promotion). =R, =B and =N result in instant transposition table hits, so there is no extra work to deal with, except for the case where some of those generate a check, and others do not.

I don't think you will find this will affect your game result at all. When our cluster gets back up, I will try a couple of 20K game matches to test this hypothesis, but I don't expect any difference. Except for cases where the under-promotion is the key to winning or avoiding losing..

bob · Post by **bob** » Wed Oct 03, 2007 3:22 am

hgm wrote:
bob wrote:Again, different goals. I want GM players to play my program. They won't if it drags draws out forever or plays every game to mate. This has been well-documented in the past in discussions with GM players when they talk about playing computers on chess servers. If you don't care about such games, fine. I do, however, in that I am trying to do something reasonably similar to what humans do when they play the game, including "having class and showing manners". It is something worth trying (hint).
All side-tracking. You brought the issue of resigning up, and if you knew your goal to be different from mine you also knew that this was irrelevant. If you don't think your engine needs to be capable to perform a decent checkmate, because the opponents are all GM and thus will have the class and manners to resign before the actual mate, you are badly mistaken. As soon as they even get a whiff of the fact that your engine might not be able to win KQK, they will play on in such lost positions laughing themselves silly about those stupid computers.
I am unaware of my engine making _any_ mistakes dealing with checkmate. It never announces a shorter mate than actually exists. If it has enough time it will always find the shortest mate, although in games, due to the way extensions work, it sometimes finds a longer mate than optimal since it isn't aware a shorter mate is possible, and when time runs out it stops and plays what it has...

So I would claim that my engine is quite capable of "performing a decent checkmate". I just consider it beyond rude to reach a KR vs K ending against a GM and make him play the mate out. I don't have any problems whatsoever in winning KR vs K or KQ vs KR, without any tablebases whatsoever. So I doubt they every use that as a reason to call my program "stupid" although there are plenty of non-tactical reasons to do so at times...

It isn't "silly". It works in a _very_ straightforward way, it makes looking at tree dumps much easier since bounds don't vary depending on the ply.. It is known to work correctly with hashing, something yours has not convinced me of yet. Etc. You have a _very_ strange definition of "silly". It would seem you mean "anything that works quite simply and provably correctly is silly". Because this is provably correct and it is definitely simple.
That is a very unsound inference. Just because one thing that works modarately simply and provably correct is silly, anything else sharing those properties should be silly as well?

I don't consider what has worked perfectly for at least the 30 years I have had hash tables as being "silly". You think it silly to muck with scores stored in the table (and very few mate scores or bounds get in there in 99% of real chess searches) and you think it OK to muck with the alpha/beta window _everywhere_ instead, which makes analyzing the tree dumps more complicated and confusing, and very possibly introduces a hashing issue dealing with incorrect bounds (which might hurt a lot or a little...)

Besides, I do not consider it straightforward at all, as you store different values in the hash table than what you return as search scores. This mucking with hash scores I don't like, and indeed leads to frequent errors for people first implementing hash tables. And what makes it truly silly, of course, it that it only works for mates. So that if the engine gets into a KQKR ending, it might be showing the same indecisive behavior of preferring to capture the Rook in 2 moves rather than in 1 move, because it can see that on the QxR the bare K can make a step towards the center, and thus wants to keep that move over the horizon by only playing QxR on the last ply of the search. Thus postponing the conversion forever or until the 50-move rule finally convinces it that a capture is needed. That is what I call silly.
Now we are in the twilight. Alpha/beta _is_ minimax, but with backward pruning included. No idea what on earth the above means.
It means what it says. Alpha-beta is a pruning technique that can be applied to a wide variety of search types, including, but not limited to minimax.

I'm not going down such a ridiculous road. Alpha/Beta _demands_ a minimax tree. To say otherwise is just not worthy of discussion. It is a technique that produces the _same_ result as minimax while searching far fewer nodes. It does not work with best-first. It doesn't work with 3 or more player games. It is a pure 2-player (hence min-max) paradigm. I don't know where you are going with that, and really don't want to know since it doesn't make any sense at all...

It is absolutely defined in the domain of minimax, which is depth-first.
More nonsense. Minimax is defined as an abstract prescription for propagating leave scores towards the root. It says _nothing_ about any temporal aspects of this. Depth-first is a prescription to walk a tree sequentially. I would not call retrograde analysis as used for EGTB building 'depth first', but it is undeniably minimax.

OK, I'm not going to argue if you won't at least go read up and understand what you are talking about. Minimax _is_ a depth-first search paradigm. To be precise, it is a methodlolgy to build a tree in depth-first order to reduce the storage complexity to O(d) rather than having to have space complexity of w^d as best-first approaches have). Alpha/Beta was _specifically_ derived from minimax, first explained by Newell Shaw and Simon, so I have absolutely no idea where you intend to go with this, but wherever it is, it is not a place based on facts...

Jeez, if we don't have a common and accepted vocabulary, no wonder it is so hard to communicate. I use the normal computer science definition of the above myself. One which you can find in _any_ AI book.
This is why I clearly and concisely define every every term I use, and mostly avoid using any term whatsoever, but give pseudo-code in stead. But alas, to little avail...

Sorry, but I am not interested in learning a new vocabulary. I have lived in the current CS/AI terminology for 40 years now. It has been good enough in the past that I don't see a need to go off and define everything differently making it impossible to communicate without having a special dictionary handy.

Again, a vocabulary problem. "stand pat" is the option you have in the quiescence search. But nowhere else. And this doesn't seem to be talking about the q-search at all unless you have switched the subject.
Well, I guess it is more a problem of you being incapable of understanding pseudo-code. Otherwise you would have noticed that the pseudo-code where you saw that allegedly unnecessary if-statement was for a Search() implementation that did both main search and QS, and thus needed a test for stand-pat cutoff...
Bob wrote:
hgm wrote:
Bob wrote: Clearly the second IF is not free.
the second IF actually saves a lot of unnecessary move generations in QS.
We are talking about the first (unnecessary) if...
Hmm, must be something wrong with my eyes, then...

The first move made is made at ply=1, the second move made is at ply=2. I have not seen anyone number them any differently than that...
Well, you have seen now. But you Americans also seem to count floors funny, so I can imagine what makes you do it that way.

I could not care less how we count, as long as it is noted (as was explicitly stated) that in all the examples I gave he root is ply=0, i.e. before making any plies. If you rather call a mate-in-3 if it happens at ply=6, then of course all the other branches would be cut at ply=4 in your terminology.

I don't want you to generate any opponent moves at ply=5. plies 1,3,5 are computer-on-move. But a mate in 3 delivers mate at ply=5, but I need to get to ply=6 to see that the opponent has no legal moves...

Are we again talking apples and oranges???
Like I said, just confusion through counting differently.

My final point. What I have in my transposition table has been working unchanged for 30+ years, first implemented in 1976 after talking with David Slate at the Houston ACM event that year. I have twiddled with the replacement strategy, but I still store scores and bounds the same way I did 30 years ago, because it works correctly and simply... Since I have a working hash algorithm, I'm not getting anything from this discussion. You have decided to do it your way, so you are not getting anything either. I don't see any point in continuing to stir the water here as a result...

bob · Post by **bob** » Wed Oct 03, 2007 3:24 am

Uri Blass wrote:1)Note that the good engines have no problem to force mate.
They sometime may spend a lot of time on mate in 1 but this is not something that change the result of the game.

2)preferring to capture the Rook in 2 moves rather than in 1 move is not a practical problem based on my experience even without delayed bonus.

If it happens it certainly does not happen often otherwise I could see it in games of movei that even does not store mates correctly and may report mate in 8 when the shortest mate is mate in 10 thanks to wrong handling of it in the hash but it does not prevent movei to win the game.

I plan to fix the problem of wrong mate scores but I do not find it as something that changes the result and if it changes the results it probably happens in less than 1 out of 100 games.

3)Movei has no problem of spending long time on mate in 1 because after having finishing an iteration with mate score it does not start another iteration.

You can claim that in theory it may lead to not finding mate because it always is going to play a move that leads to mate in 7 instead of mate in 6
until it has no mate without repetition that it evaluates as draw but practically it never happened in my experience.

Uri

I have a solution for that as implemented in Crafty. If I announce a mate in N, on the next search I will not stop until either (a) I find a mate in N-1, or (b) time runs out. (b) never happens so I never get into that mate in N, mate in N type of loop...

bob · Post by **bob** » Wed Oct 03, 2007 3:39 am

hgm wrote:
bob wrote:How can it be "defective" when it implements a rule of chess that is clearly defined? There are positions where a bishop under-promotion is the only move that wins or draws. They are rare, but they are not non-existent.
"Defective from a performance point of view" means that you would score better by not implementing the feature, and take your loss (e.g. by printing 'resign') in the situation where you would have needed the feature. Because the speedup of the search by not implementing the feature would bring you more Elo than the occasional loss when you needed it would have gained you.

For under-promotions this is not quite the case, but under conditions where not only the speed, but also the size is important, under-promotion is truly defective: these rules offer resigning as a fully legal alternative.

If the rules of Chess clearly define it or not, is immaterial.

So it is your belief that not doing underpromotions is actually going to let you perform _better_? I will test that hypothesis as soon as our cluster is back up... It is easy enough to test. I'll just comment out any under promotions at all, which means I will lose games on time if my opponent underpromotes, and that I will never underpromote in Crafty, period...

I don't believe it for a minute, because underpromotions do not cost much at all. Here's one example of a couple of positions. A tactical one, a complicated one, and an endgame. All three are given with all underpromotions allowed, then no underpromotions of any kind allowed. Let's see what the trees and time look like:

First a middlegame tactical position... first output is normal crafty, second is crafty with just =Q promotions.

log.001: time=24.26 mat=0 n=49646102 fh=93% nps=2.0M
log.002: time=24.23 mat=0 n=49639609 fh=93% nps=2.0M

Second a wild tactical position:

log.001: time=22.81 mat=-10 n=82909342 fh=98% nps=3.6M
log.002: time=23.04 mat=-10 n=82867471 fh=98% nps=3.6M

And finally an endgame where promotions are more common:

log.001: time=53.15 mat=0 n=174614773 fh=91% nps=3.3M
log.002: time=52.63 mat=0 n=174612577 fh=91% nps=3.3M

So best case saved about 1/2 second out of 50. Barely 1%. In a simple ending. In the other more normal positions, it saves zero.

Do you really believe that that tiny speedup is going to produce any measurable Elo gain? Do you also believe that not handling underpromotions is never going to cause you to lose a game or miss winning a game?

I think which approach is right is quite easy to recognize.

At least, IMHO..

hgm · Post by **hgm** » Wed Oct 03, 2007 2:48 pm

bob wrote:
hgm wrote:For under-promotions this is not quite the case, but under conditions where not only the speed, but also the size is important, under-promotion is truly defective: these rules offer resigning as a fully legal alternative.
So it is your belief that not doing underpromotions is actually going to let you perform _better_?

What exactly does 'not' mean in your 30+ year-old CS/AI dictionary?

bob wrote:So best case saved about 1/2 second out of 50. Barely 1%. In a simple ending. In the other more normal positions, it saves zero.

Do you really believe that that tiny speedup is going to produce any measurable Elo gain? Do you also believe that not handling underpromotions is never going to cause you to lose a game or miss winning a game?

This is indeed a truly remarkable statement, for someone that just argued that the presence of a single if statement, which he (mistakenly) thought would cause an overhead of 0.1% in a search routine, was such a ridiculous performance breaker that he could not possibly be expected to understand the pseudo-code.

bob · Post by **bob** » Wed Oct 03, 2007 4:40 pm

hgm wrote:
bob wrote:How can it be "defective" when it implements a rule of chess that is clearly defined? There are positions where a bishop under-promotion is the only move that wins or draws. They are rare, but they are not non-existent.
"Defective from a performance point of view" means that you would score better by not implementing the feature, and take your loss (e.g. by printing 'resign') in the situation where you would have needed the feature. Because the speedup of the search by not implementing the feature would bring you more Elo than the occasional loss when you needed it would have gained you.

For under-promotions this is not quite the case, but under conditions where not only the speed, but also the size is important, under-promotion is truly defective: these rules offer resigning as a fully legal alternative.

If the rules of Chess clearly define it or not, is immaterial.

Aha, so you choose to play the game you want to play, rather than the game that _everyone_ else plays?

bob · Post by **bob** » Wed Oct 03, 2007 4:43 pm

hgm wrote:This whole idea that a single conditional jump would 'wreck performance' is of course so silly that it it is a shameful display that Bob even dares to hide behind it, as an excuse for not reading the posts he reacts to. We are talking here about a single if per move generation. he move generation will obviously takes a thousand times longer. And that from a person who has repeatedly stated here that he does not care about the efficiency of his move genrator, as generating moves is only about 10% of the search time. And now something that might take 0.1% of that time (had it not been needed for QS anyway) would suddenly wreck performance to an extent that no one would use this?

I never have heard such a feeble and laughable excuse in my life. Not even when I was in Kindergarten.

I suppose you can point to a post where I said "a single conditional jump wrecks performance"????

A single conditional jump certainly affects performance. Although here I believe we are talking about far more than a single conditional jump...

bob · Post by **bob** » Wed Oct 03, 2007 4:59 pm

hgm wrote:
bob wrote:
hgm wrote:For under-promotions this is not quite the case, but under conditions where not only the speed, but also the size is important, under-promotion is truly defective: these rules offer resigning as a fully legal alternative.
So it is your belief that not doing underpromotions is actually going to let you perform _better_?

What exactly does 'not' mean in your 30+ year-old CS/AI dictionary?

Fortunately, I stick with "standard definitions" of the words I use... So it means _exactly_ what it usually means and I don't need to pause and give a definition since I am not changing it.

bob wrote:So best case saved about 1/2 second out of 50. Barely 1%. In a simple ending. In the other more normal positions, it saves zero.

Do you really believe that that tiny speedup is going to produce any measurable Elo gain? Do you also believe that not handling underpromotions is never going to cause you to lose a game or miss winning a game?
This is indeed a truly remarkable statement, for someone that just argued that the presence of a single if statement, which he (mistakenly) thought would cause an overhead of 0.1% in a search routine, was such a ridiculous performance breaker that he could not possibly be expected to understand the pseudo-code.

That's because you want to intentionally draw some tangential conclusion that was not intended. I'm going to make it as clear as possible:

(1) including under-promotions has _zero_ impact on your program's Elo, particularly since Elo is an integer value. I had to pick a simple endgame to see any effect in speed at all. And if you look at the times I posted and the nodes searched, the majority of that 1% time change apparently was unrelated to the underpromotion removal because the nodes searched did not change significantly. What it was, I don't care at the moment.

(2) Not including under-promotions is going to result in lost games here and there. I have seen them in 5-round ACM events, I have seen them in WCCC events. And even though the effect is small, it is not nearly as small as the tiny gain one gets by omitting them.

Therefore, including them will win a game here and there. Excluding them will not win a thing. Which, then, do you choose? for me the choice is obvious. And since I am really trying to write a program to play the game of chess as defined by the official rules, it is a no-brainer. I'd like to exclude en passant captures, they require extra handling that is far more invasive and costly than the code to handle under-promotions. I'd even like to exclude castling. More special case code to generate a two-piece move, not to mention dealing with castling status throughout the tree. But the rules of chess define the moves and I go by that...

For under-promotions, I don't worry about a single branch. Because there is no branch needed to exclude them. I just commented that code _out_ in my test. For hashing, you are doing more than a single jump. You are modifying the bounds at every point in the tree. Which is going to have a L1 hit (if lucky) or a L2 hit penalty at every last ply. On my 2ghz laptop, where I search about 2M nodes per second, that turns into about 1000 clocks / node. I don't want to add X clocks to that, even if X is somewhere in the 4-20 range, if I can avoid it altogether.

So can we get back to reality, and stop making outrageous statements about what I am claiming, and stick to _exactly_ what I am saying. I notice you seem to not like it when I provide _real_ data. I ran a longer test last night using a set of positions I often run just as a sanity check after making a significant change. Omitting under-promotions had no effect whatsoever in almost all of them. Only in simple endgames was there any difference and the difference was always well under the 1% in my example I posted here.

You can, if you want, actually _run_ these tests for yourself, and then you won't be guessing, postulating, pontificating, and exaggerating about what will happen. Then you will actually _know_. In the case of under-promotions, I am not guessing about what it costs. I actually _know_. Simply because I ran the test to see what it does...

Can we now move on since it is obvious that omitting them is a bad idea with no gain whatsoever, and a very real problem in some games?

bob · Post by **bob** » Wed Oct 03, 2007 5:30 pm

Vempele wrote:
Colin wrote:I planned to XOR in this value in every node, just after any pawn has moved 2 squares, regardless of wether it can actually be taken by an adjacent pawn.
Because its really just saying that in the position a pawn has just moved 2, in contrast to a position with all pieces in the same squares where the last move was not a pawn 2 forward, which will have an entirely different hash key.

Thanks for any help
You don't want identical positions to have different hash keys depending on the last move. It's not a transposition table if that happens...

It would still work, just much less efficiently. And it would solve one serious problem dealing with different paths to the same hash entry sometimes producing wrong values. It would add another serious effect in that hash hits would drop dramatically which would kill performance for very little gain in the other area.

Note that transposition means interchanging moves. that is
Nf3 Nc6 Nc3 Nf6 should be treated the same as Nc3 Nc6 Nf3 Nf6 because the positions are the same. But transposition tables catch another case that can cause some issues:

Bc1-d2, Nf6, Bd2-e3, Ng8, Be3-f4 and the simple Bc1-f4, since the two positions are identical. But the paths traversed are not the same. And it is possible that the missing path information might let you miss a repetition along the way.

Transposition Tables

Re: A way to avoid sub-promotions, yet play legal chess

Re: A way to avoid sub-promotions, yet play legal chess

Re: Transposition Tables

Re: Transposition Tables

Re: Transposition Tables

Re: Transposition Tables

Re: Transposition Tables

Re: Transposition Tables

Re: Transposition Tables

Re: Transposition Tables