Repetition detection structure.

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Uri
Posts: 473
Joined: Thu Dec 27, 2007 9:34 pm

Re: Repetition detection structure.

Post by Uri »

Repetition detection is very dangerous. The program might think it has found a draw, but on the next move it will find itself in a losing position because of zugzwang.

For example, in this position Toga CMLX 1.4 beta5d thinks it's a draw, but on the next move Kc6 wins for White because Black is in zugzwang.

[D]8/2K1k3/7p/4p1p1/4P1P1/5P1P/8/8 w - - 0 57

Programs need zugzwang detection instead of repetition detection.
Uri Blass
Posts: 10296
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Repetition detection structure.

Post by Uri Blass »

Uri wrote: Repetition detection is very dangerous. The program might think it has found a draw, but on the next move it will find itself in a losing position because of zugzwang.

For example, in this position Toga CMLX 1.4 beta5d thinks it's a draw, but on the next move Kc6 wins for White because Black is in zugzwang.

[D]8/2K1k3/7p/4p1p1/4P1P1/5P1P/8/8 w - - 0 57

Programs need zugzwang detection instead of repetition detection.
I believe that most programs do not use null move pruning in pawn endgames, so this is not a problem.
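The usual guard, in sketch form (hypothetical names of my own, not Toga's or anyone's actual code):
Code:

typedef struct {
    int side;                 /* 0 = white, 1 = black */
    int piece_material[2];    /* non-pawn, non-king material per side */
} Position;

/* refuse the null move whenever the side to move has nothing but
   king and pawns: zugzwang is common in pawn endings, and a null
   move there lets the search "prove" results that do not exist */
static int null_move_allowed(const Position *pos)
{
    return pos->piece_material[pos->side] > 0;
}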

I do not know Toga CMLX and it may be a buggy program.
All the Toga versions that I have find Kc6 with a winning score.

I thought that maybe the problem was the game history, where Kc6 leads to a repetition; but Kb6 is also winning, so I guess that Toga CMLX is buggy and is the only top program that cannot find a winning score for White.

Programs need zugzwang detection, but I do not agree that they need it instead of repetition detection.

Repetition detection clearly helps a program avoid searching irrelevant lines, and in games, searching lines that lead to a repetition is usually a waste of time.

I believe it may be better to use a different strategy than the one used today of always evaluating the first repetition as 0.00 (that convention is used because it is simple, and changing it is not going to improve the playing strength much), but ignoring the repetition entirely seems stupid.

I think that when a program detects a first repetition, it is logical, at least in part of the cases, to stop searching, because the line is not interesting.
The program can have better rules about when to stop searching and when to search repetitions, but not detecting the repetition at all seems to be a stupid idea.
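A minimal sketch of the convention described above (my own illustration with assumed names, not any particular engine's code):
Code:

#define DRAW_SCORE 0    /* the 0.00 convention */

typedef struct Position Position;
/* assumed helper: did the current position already occur once in
   the game history or on the current search path? */
extern int is_repetition(const Position *pos, int ply);

int search(Position *pos, int alpha, int beta, int depth, int ply)
{
    /* first repetition: stop searching the line and call it a draw;
       a finer rule could keep searching repetitions that still look
       interesting for the side to move, as argued above */
    if (ply > 0 && is_repetition(pos, ply))
        return DRAW_SCORE;

    /* ... normal alpha-beta search would continue here ... */
    (void)alpha; (void)beta; (void)depth;
    return 0;   /* placeholder for the real search result */
}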

Uri
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Repetition detection structure.

Post by bob »

Uri wrote: Repetition detection is very dangerous. The program might think it has found a draw, but on the next move it will find itself in a losing position because of zugzwang.

For example, in this position Toga CMLX 1.4 beta5d thinks it's a draw, but on the next move Kc6 wins for White because Black is in zugzwang.

[D]8/2K1k3/7p/4p1p1/4P1P1/5P1P/8/8 w - - 0 57

Programs need zugzwang detection instead of repetition detection.
There's something wrong with the program then, as a problem had to have happened at an earlier move. It is pretty easy to see that if you can force a repetition once, you can force it again to force the draw. That's why we get away with counting two-fold repetitions as draws inside the search, but not in the real game.

The only time I have seen this cause a problem is due to luck. You check your opponent and he can go in either of two directions. One draws, one wins. But he doesn't know that, so he goes the drawing way first, comes back to the initial position for the first repetition, and then goes the other way to win. But that is the only exception I have seen, and I ignore it completely... and have not seen it cause a problem.
User avatar
hgm
Posts: 27807
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Repetition detection structure.

Post by hgm »

bob wrote:Let me rephrase my comment. "What exactly are you talking about?"
Well, read back and you will know...

In short: you are claiming that 1/x is always smaller than 10/y because 1 < 10, no matter what x and y are ("regardless of investment"). That says much for your abilities in arithmetic.

As for the relative speed: count instructions, and you will know. Or is counting too difficult too?

I think I explained everything crystal clear, and if it is still beyond you, you will simply have to live with that...
User avatar
hgm
Posts: 27807
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Repetition detection structure.

Post by hgm »

Uri Blass wrote: I decided now to look at Joker's games, and here is one game that Joker lost
Thanks for looking at the games, Uri. I am currently so busy that I haven't had time to do that yet. The Cheese game does not look like a 50-move case to me. It is Joker there that finally decides to make the capture, and Joker already had a zero score when it committed to that. So recognizing 50-move draws would not have made it act differently.

The GarboChess game is a complete mystery to me: it is suddenly adjudicated a win while the score had already been +4 for ages, and GarboChess obviously does not know how to win this. There is no irreversible move here. If the 50-move rule plays a role, it can only be because the adjudicator is not aware of it!

Game 1 is a mystery to me. You can see that most of the time Joker has a zero score, (most likely because it sees a forced rep-draw), and then it plays a ridiculous move with zero score, and on the next move the zero score suddenly disappears. I can only conclude that there must be a severe search bug in Joker. It also happens when it allows the promotion.

Only the case of Buzz is left as a clear example where a program benefited from recognizing the 50-move rule, rather than from adjudication errors or search errors. And it is quite possible that Buzz does need to rely on the 50-move rule, as it is notorious for its stoic evaluation (almost nothing but material).

But all the cases seem the reverse of what is relevant: the question is whether Joker would score any better if it scored 50-move draws in its evaluation. Whether the opponent does it or not is something I have no control over...
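For concreteness, "scoring 50-move draws" could be as little as this kind of adjustment (my own sketch with assumed names, not Joker or Buzz code):
Code:

/* Recognize the 50-move rule in the search and, optionally, pull
   the score toward the draw as the reversible-move counter
   approaches 100 half-moves, so the engine acts before it is
   too late. 'fifty' counts half-moves since the last capture
   or pawn move. */
int adjust_for_fifty(int score, int fifty)
{
    if (fifty >= 100)
        return 0;                     /* draw by the 50-move rule */
    if (fifty > 80)                   /* start fading 20 half-moves early */
        score = score * (100 - fifty) / 20;
    return score;
}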
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Repetition detection structure.

Post by bob »

hgm wrote:
bob wrote:Let me rephrase my comment. "What exactly are you talking about?"
Well, read back and you will know...

In short: you are claiming that 1/x is always smaller than 10/y because 1 < 10, no matter what x and y are ("regardless of investment"). That says much for your abilities in arithmetic.

As for the relative speed: count instructions, and you will know. Or is counting too difficult too?

I think I explained everything crystal clear, and if it is still beyond you, you will simply have to live with that...
I didn't say any such thing. To go back to the _beginning_, I said this:



Post subject: Re: Repetition detection structure. Posted: Sat Feb 23, 2008 3:12 pm
hgm wrote:
Joker uses a small hash table (256 entries) for repetition detection. Collisions are resolved by a simple linear search (wrapping around) starting from the primary entry. But typically you find the entry on the first or second try, as the table is always at least half empty (and typically 90% empty).

I store a code in the table identifying the number and type of repeats (how many in the game history, how many in the tree), so that the search can make the appropriate decisions. And I store the level at which the position was first encountered in the tree. (And of course the hash key.)

On an irreversible move at game level I clear the entire table.

As the table is so small that it always resides in the L1 cache, this must be faster than the linear-array method.


here is what you have to beat:

  %   cumulative    self                self     total
 time   seconds    seconds      calls  s/call   s/call  name
 0.51    205.48       1.07  113029036    0.00     0.00  RepetitionCheck

I played a few pretty quick games and picked the first one that ended by the 50-move rule, which is likely the worst case for repetition-list scanning.

RepetitionCheck() accounted for 0.51% of the total CPU time used (half a percent), which is pretty insignificant...

here is everything from the most expensive down to RepetitionCheck():
Code:

  %   cumulative    self                self     total
 time   seconds    seconds      calls  s/call   s/call  name
16.93      35.40     35.40  499704723    0.00     0.00  Evaluate
 8.60      53.38     17.98  563432081    0.00     0.00  MakeMove
 8.24      70.61     17.24  802244362    0.00     0.00  EvaluatePassedPawns
 8.12      87.59     16.98     115810    0.00     0.00  Search
 7.62     103.52     15.93 1006239194    0.00     0.00  Attacked
 6.11     116.30     12.79  371224516    0.00     0.00  Quiesce
 6.10     129.06     12.76  339036879    0.00     0.00  Swap
 4.73     138.96      9.90  116047645    0.00     0.00  GenerateCaptures
 4.57     148.52      9.56  563432081    0.00     0.00  UnmakeMove
 3.75     156.35      7.84  112735079    0.00     0.00  HashProbe
 2.74     162.09      5.74  413850292    0.00     0.00  NextMove
 2.73     167.80      5.72   49404108    0.00     0.00  EvaluateMobility
 2.66     173.36      5.56  370399537    0.00     0.00  AttacksTo
 2.48     178.54      5.18  421418977    0.00     0.00  SearchControl
 2.25     183.24      4.70   31303371    0.00     0.00  GenerateCheckEvasions
 2.13     187.69      4.45  125674880    0.00     0.00  EvaluateRooks
 1.76     191.36      3.67    6584992    0.00     0.00  EvaluatePawns
 1.51     194.52      3.17   12631503    0.00     0.00  GenerateNonCaptures
 0.97     196.55      2.03  125674880    0.00     0.00  EvaluateKnights
 0.88     198.39      1.84   36017477    0.00     0.00  NextEvasion
 0.86     200.20      1.81  125674880    0.00     0.00  EvaluateKings
 0.83     201.93      1.73   96027327    0.00     0.00  HashStore
 0.61     203.21      1.28  115866513    0.00     0.00  PinnedOnKing
 0.58     204.42      1.21        375    0.00     0.00  InitializeHashTables
 0.51     205.48      1.07  113029036    0.00     0.00  RepetitionCheck

I then said that if you want to optimize, you _always_ start at the top of the profile list. Maybe you don't make any changes to that module; that depends on a lot of things. But you do _start_ there, and work your way down the list. How can you know how much effort is involved in speeding up A or B if you just start at B and don't look at A? Who would care about improving 0.51% when there are 20 modules that are much bigger than that?

So again, you _start_ at the _top_. Don't give me any of this crap about effort and return on effort; that's all well and good, but it is _senseless_ to start at the bottom, particularly if you look at what the original poster asked:
Hi,

I examined the Crafty and Fruit code and found that they both use a simple array (accessed like a stack) to store the hash keys for the repetition detection code.

I understand that most of the time only a couple of positions need to be verified (up to the last irreversible move), but when we approach the 50-move limit, we will need to scan most of the array at every node.

Is it possible to use something else, like a small hash table or even a binary tree?

I guess what I'm asking is: are there other ways to do repetition detection, or does everyone use a simple array?

MP

So is there something faster to use? The answer appears to be "no", based on the results I provided above and on later testing, where it turns out that the average list length to search is one.
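For reference, here is a sketch of the list approach under discussion (assumed names and layout of my own, not the actual Crafty or Fruit sources), showing how the same code naturally handles the 50-move rule:
Code:

#include <stdint.h>

/* Zobrist keys of all positions in the game plus the current search
   path, indexed by ply; filled by MakeMove/UnmakeMove (not shown).
   'fifty' counts the reversible half-moves since the last capture or
   pawn move, so it also bounds how far back a repetition can be. */
uint64_t rep_list[1024];

int RepetitionCheck(uint64_t key, int ply, int fifty)
{
    int i;

    if (fifty >= 100)            /* the 50-move rule, handled here too */
        return 1;

    /* only positions with the same side to move can repeat, hence the
       step of 2; right after an irreversible move fifty is 0 and the
       loop body never runs, which is why the average scan is so short */
    for (i = ply - 4; i >= ply - fifty; i -= 2)
        if (rep_list[i] == key)
            return 1;
    return 0;
}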

You then decided that my approach had extra overhead because I was handling another chess rule, the 50-move rule, in the same code. And since you don't deal with that rule, that makes my dealing with it extra work. And the discussion drifted farther and farther away from the original topic, all of _your_ doing.

So do read back, and look at who said what, and when. I've never said effort is not important. But I absolutely did say that you start at the top and work your way down. You do not start at the bottom and work your way up. If you believe that is a good idea, go for it. 99%+ of the programming world does not agree.

As far as counting instructions goes, you do know I do that all the time? And you _might_ have noticed that I generally provide _real_ data when I answer such questions, not guesswork. I gave specific profile info. I then gave specific performance data for the repetition list. You have _never_ given any details about how you handle duplicates/collisions, or answered the questions I asked about the parallel search implementation issues that you simply wave your hands over and say "it will work".

So please, grow up, and think before you write such nonsense. I _always_ do. Sometimes you even learn something by doing so: I was aware of the almost zero cost of repetition checks in Crafty, but I was not aware that the average list search length was 1 until I ran the test to produce some data. If you think your approach is significantly faster than a loop that searches one entry on average, more power to you. It doesn't compute for me, however... You often remind me of another poster here who starts a discussion, digs himself into a hole, then changes the topic mid-thread, only to dig himself into another hole...

Your comment would have been appropriate had your approach been significantly faster. Done properly, using the 50-move-rule counter, it does not appear to be any faster (or slower) at all, until you consider the parallel search issues, which are much more complicated with hash-table repetition detection than with a simple list.
hristo

Re: Repetition detection structure.

Post by hristo »

hgm wrote:
bob wrote:Let me rephrase my comment. "What exactly are you talking about?"
Well, read back and you will know...

In short: you are claiming that 1/x is always smaller than 10/y because 1 < 10, no matter what x and y are ("regardless of investment"). That says much for your abilities in arithmetic.
H.G.,
this was not Bob's argument; even though you could infer the above, it was certainly not the central part of his argument.

(I must admit that sometimes you guys argue about strange things -- even though both of you are certifiably smart.)

For instance, you said: "As the table is so small that it always resides in the L1 cache, this must be faster than the linear-array method," which presumes that the linear array doesn't fit in the cache, or that in the process of searching one would have to hit L2 or main memory. The evidence presented by Bob shows that L1 would be sufficient in his case -- searching only a single entry, on average, should fit in L1 just fine.

Both of you started this discussion in a rather cordial tone, and then you both zeroed in on a particular aspect of what the other side was saying that could be argued against.

For the purpose of chess programming, however, I believe that your approach (if it is a bit faster, on average, than the list method) might yield better chess-engine results for single-CPU implementations. Based on pure speculation, having 0.1% extra time for eval and search (etc.) would be more beneficial than handling the fringe cases of the 50-move rule.
Besides, your implementation seems rather cute, just like you. ;-)

In the case of SMP, it is not immediately obvious that your approach would be better. I'm sorry, but in this case I dare not speculate. I have spent way too much time debugging threads -- so, "show me the money/profiler output"! ;-)
hgm wrote: As for the relative speed: count instructions, and you will know. Or is counting too difficult too?
Even if your code is smaller and executes a bit faster, this is not sufficient to estimate its 'cost' in execution time. For instance, if your code must be called more often (than the simple list search) and from different portions of the evaluation (or of the program execution), it is possible that _it_ [your code] will cause extra cache misses... so just counting instructions is (quite often) not enough.

(BTW, why the vitriolic remark?)


Bob,
I agree that optimization must be considered from the top ("look into the highest time-consuming functions and algorithms first"). However, it is perfectly reasonable to consider optimizations at other levels (including the bottom), so long as such optimizations can be regression-tested and are opaque with respect to the rest of the application. Always optimizing at the top can unnecessarily keep one from gathering low-hanging fruit elsewhere.

Regards to both of you,
Hristo
hgm wrote: I think I explained everything crystal clear, and if it is still beyond you, you will simply have to live with that...
User avatar
hgm
Posts: 27807
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Repetition detection structure.

Post by hgm »

hristo wrote: For instance, you said: "As the table is so small that it always resides in the L1 cache, this must be faster than the linear-array method," which presumes that the linear array doesn't fit in the cache, or that in the process of searching one would have to hit L2 or main memory. The evidence presented by Bob shows that L1 would be sufficient in his case -- searching only a single entry, on average, should fit in L1 just fine.
It was never my intention to imply this. Obviously a chess game will not contain more than a couple of hundred moves, so even keeping the entire game in a linear array (rather than clearing it on every irreversible move), at 4 or 8 bytes per move or so, will not overflow the L1 cache. I only added the remark because for a hash table this is not obvious, as a hash table typically does contain a lot of empty space. So I felt it necessary to point out that in this particular implementation of a hash table the problem doesn't occur.
hristo wrote: For the purpose of chess programming, however, I believe that your approach (if it is a bit faster, on average, than the list method) might yield better chess-engine results for single-CPU implementations. Based on pure speculation, having 0.1% extra time for eval and search (etc.) would be more beneficial than handling the fringe cases of the 50-move rule.
Besides, your implementation seems rather cute, just like you. ;-)
Well, I just pointed out why I thought it was worthwhile paying attention to the rep-draw detection method (very small gain, but for even smaller effort). Of course the effort was really zero, as I did not optimise this after first doing it in a sub-optimal way, but let the linear search start from the hash index when I first wrote it down, as a matter of habit.

In practice the implementation is more complicated than the idealized case I describe here (constructed only for a 1:1 comparison with the linear-array method), as Joker does not use the first-rep = draw method, and wants to know how many repetitions there were, and of which kind (game or tree). So in the linear-search method I would have to count the number of occurrences of every position in there. In the hash table, I only have to store a counter with each entry, which is incremented or decremented every time the entry is encountered; a zero value for the counter means the entry is empty. Under those conditions the linear-search method isn't competitive at all.

But even in the simplified case, hitting a zero stored key on the first access (the common case) terminates the search after a single compare, while the common case for the linear search is having to compare the key of one entry and then do a compare against a counter to see if there are more entries. So I still think it is a good example of a practically free way to speed up a small task by a factor of two.
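In sketch form the counting variant looks like this (a reconstruction for this post with hypothetical names, not the actual Joker code; it assumes removals exactly reverse insertions, i.e. UnMake undoes Make, so freeing a slot can never break an older probe chain):
Code:

#include <stdint.h>
#include <string.h>

#define REP_BITS 8
#define REP_SIZE (1 << REP_BITS)     /* 256 entries, fits in L1 */
#define REP_MASK (REP_SIZE - 1)

static uint64_t rep_key[REP_SIZE];
static int      rep_cnt[REP_SIZE];   /* 0 means: slot empty */

/* Add (+1) or remove (-1) one occurrence of 'key'; returns the
   count after the update, so the caller sees immediately whether
   the position is a repeat, and how often it occurred. */
int rep_update(uint64_t key, int delta)
{
    unsigned i = (unsigned)key & REP_MASK;

    while (rep_cnt[i] != 0 && rep_key[i] != key)
        i = (i + 1) & REP_MASK;      /* linear probe, wrap around */
    rep_key[i] = key;
    return rep_cnt[i] += delta;
}

/* on an irreversible move at game level, clear the entire table */
void rep_clear(void)
{
    memset(rep_cnt, 0, sizeof rep_cnt);
}

MakeMove would call rep_update(key, +1) and treat a return of 2 or more as a repeat; UnMake calls rep_update(key, -1). In the common case of a fresh position the while condition fails on the first probe, so the whole test is a single compare.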
hristo wrote: In the case of SMP, it is not immediately obvious that your approach would be better. I'm sorry, but in this case I dare not speculate. I have spent way too much time debugging threads -- so, "show me the money/profiler output"! ;-)
Well, it will obviously depend on the SMP implementation. The way I plan to implement SMP (which will of course be much more efficient than the way Bob does it :wink: ), it wouldn't matter at all. The structure will never have to be passed from one thread to another. In fact nothing would ever be passed from one thread to another, except through hiding messages in the hash table. (Which should not cause any communication overhead, as hash probes are supposed to be cache misses in the first place. If the data is currently owned by another core, the bus-snooping unit will substitute it for the memory data, providing a shorter access time rather than a longer one.)
hristo wrote: Even if your code is smaller and executes a bit faster, this is not sufficient to estimate its 'cost' in execution time. For instance, if your code must be called more often (than the simple list search) and from different portions of the evaluation (or of the program execution), it is possible that _it_ [your code] will cause extra cache misses... so just counting instructions is (quite often) not enough.

(BTW, why the vitriolic remark?)
True, but that is not the case here. And extra cache misses might not show up in the profile either. The best way to guard against them is to keep your working set small and lay out your core data to minimize collisions. The vitriolic remark is because I get increasingly annoyed at being asked to explain the same thing for the umpteenth time, just because others pretend they did not read what I wrote before, while I actually have tons of work to do that would lead to the production of something useful. Did you already see my Gothoglot adapter that allows WinBoard to play on the Gothic-Chess server?
hristo

Re: Repetition detection structure.

Post by hristo »

hgm wrote:Did you already see my Gothoglot adapter that allows WinBoard to play on the Gothic-Chess server?
Yes I did. Impressive work!
Watched Joker trash TSCPG. :-)

Regards,
Hristo
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Repetition detection structure.

Post by wgarvin »

hgm wrote:
bob wrote:Let me rephrase my comment. "What exactly are you talking about?"
Well, read back and you will know...

In short: you are claiming that 1/x is always smaller than 10/y because 1 < 10, no matter what x and y are ("regardless of investment"). That says much for your abilities in arithmetic.

As for the relative speed: count instructions, and you will know. Or is counting too difficult too?

I think I explained everything crystal clear, and if it is still beyond you, you will simply have to live with that...
An honest question for hgm.

I've seen a lot of posts from you written in this adversarial style, and I'm just wondering, why do you write like this? If the goal is to discuss things productively and share ideas and learn from each other, what is gained by the adversarial tone? It seems like most of your posts (or your posts to Bob at least) contain statements that have no purpose except to be inflammatory.

I enjoy reading all the posts here, a lot of them contain neat ideas or useful knowledge about chess and chess programming. But these recurring flamewars between you and bob are perplexing.

If the idea you're promoting makes sense on its own, it shouldn't need to be paired with a personal attack to try and discredit arguments from the other side. It seems like it has no effect except to get both you and bob riled up and cause you both to waste your valuable time in some sort of Internet pissing match.