SEE on non-capture moves in main search

hgm · Post by **hgm** » Mon Apr 02, 2007 2:31 pm

bob wrote:That would stretch the clock cycle time for _everything" since the clock cycle time has to be long enough for the most complex circuit in the thing to settle. MVV/LVA requires two minor cycles in a hardware implementation. First cycle is find the most valuable attacked piece (victim), second cycle is to find the least valuable piece defending that piece. If you add another "thing" then you increase the basic clock cycle time by 50% which is murderous for speed.

I suppose that in the underlined place in the qoute you really wanted to say 'attacking'. OK, I think I am starting to see your point: you get to know the attacker only afterwards, and have to select the victim without the benefit of knowing it, as you don't want to have 64 copies of the LVA-finding circuitry acting on each square in parallel.

bob wrote: I have taken current Crafty, and for non-losing captures (as proven by SEE) I then re-order them using MVV/LVA. I do this everywhere I would normally sort captures by SEE. I did not try to make it as efficient as possible since I am not (currently) worried about the time cost. I am only looking at tree sizes for identical fixed-depth searches, one using my normal SEE ordering, one using the MVV/LVA ordering for non-losing captures...

OK. As long as you realize that this theoretically still is sub-optimal.

bob · Post by **bob** » Mon Apr 02, 2007 5:56 pm

hgm wrote:
bob wrote:That would stretch the clock cycle time for _everything" since the clock cycle time has to be long enough for the most complex circuit in the thing to settle. MVV/LVA requires two minor cycles in a hardware implementation. First cycle is find the most valuable attacked piece (victim), second cycle is to find the least valuable piece defending that piece. If you add another "thing" then you increase the basic clock cycle time by 50% which is murderous for speed.
I suppose that in the underlined place in the qoute you really wanted to say 'attacking'. OK, I think I am starting to see your point: you get to know the attacker only afterwards, and have to select the victim without the benefit of knowing it, as you don't want to have 64 copies of the LVA-finding circuitry acting on each square in parallel.

You can't do it all in parallel. You have to use some sort of "tree" to reduce it down to a single answer. Which is a tree of depth 6 for 64 squares... Each square could find its own answer easily, but then they have to "get together" and figure out which is the "best answer" and that adds sequential logic and stretches the clock cycle time. That's where MVV/LVA came from. Two cycles. In Belle Ken first found the most valuable piece under attack, then he used almost the same hardware the next cycle to find the least valuable piece attacked from that square (which would be the least valuable piece attacking that square)...

bob wrote: I have taken current Crafty, and for non-losing captures (as proven by SEE) I then re-order them using MVV/LVA. I do this everywhere I would normally sort captures by SEE. I did not try to make it as efficient as possible since I am not (currently) worried about the time cost. I am only looking at tree sizes for identical fixed-depth searches, one using my normal SEE ordering, one using the MVV/LVA ordering for non-losing captures...
OK. As long as you realize that this theoretically still is sub-optimal.

I think it is sub-optimal no matter what.

bob · Post by **bob** » Mon Apr 02, 2007 5:57 pm

Tord Romstad wrote:
bob wrote:There are probably some interactions with LMR since changing the ordering changes the history values and such which changes the pruning in odd ways.
Yes. For this reason, I think it is best to disable LMR while doing move ordering tests.

Tord

It's a catch-22. Since LMR is used normally, removing it has a _different_ effect on things...

sje · Post by **sje** » Mon Apr 02, 2007 6:44 pm

bob wrote:You can't do it all in parallel. You have to use some sort of "tree" to reduce it down to a single answer. Which is a tree of depth 6 for 64 squares... Each square could find its own answer easily, but then they have to "get together" and figure out which is the "best answer" and that adds sequential logic and stretches the clock cycle time. That's where MVV/LVA came from. Two cycles. In Belle Ken first found the most valuable piece under attack, then he used almost the same hardware the next cycle to find the least valuable piece attacked from that square (which would be the least valuable piece attacking that square)...

Yes, but not necessarily with a depth of six. If I recall correctly, Belle used a priority decoder that compared more than two values at once.

With 21 best-of-four comparators doing 64->16->4->1, it's a three stage combinatorial circuit. With nine best-of-eight comparators, the job can be done in two stages: 64->8->1.

hgm · Post by **hgm** » Mon Apr 02, 2007 6:53 pm

bob wrote:I think it is sub-optimal no matter what.

Yes, but that what does it prove? Seems to me you are searching out-of-window already...

hgm · Post by **hgm** » Mon Apr 02, 2007 6:59 pm

sje wrote:
bob wrote:You can't do it all in parallel. You have to use some sort of "tree" to reduce it down to a single answer. Which is a tree of depth 6 for 64 squares... Each square could find its own answer easily, but then they have to "get together" and figure out which is the "best answer" and that adds sequential logic and stretches the clock cycle time. That's where MVV/LVA came from. Two cycles. In Belle Ken first found the most valuable piece under attack, then he used almost the same hardware the next cycle to find the least valuable piece attacked from that square (which would be the least valuable piece attacking that square)...
Yes, but not necessarily with a depth of six. If I recall correctly, Belle used a priority decoder that compared more than two values at once.

With 21 best-of-four comparators doing 64->16->4->1, it's a three stage combinatorial circuit. With nine best-of-eight comparators, the job can be done in two stages: 64->8->1.

You are confusing me now. It seems that 'find MVV' would need exactly the same circuitry to single out the highest victim. Or was the position represented as a piece list, rather than a board array?

Peter Fendrich · Post by **Peter Fendrich** » Mon Apr 02, 2007 7:19 pm

I don't get why you turn LMR off during move ordering tests.
Why not turn off killers or hash tables?
They are all both dependent of the move ordering and will also change the move ordering.
If there is a combined effect from LMR and different move ordering methods you want to catch that right?
/Peter

bob · Post by **bob** » Mon Apr 02, 2007 8:40 pm

sje wrote:
bob wrote:You can't do it all in parallel. You have to use some sort of "tree" to reduce it down to a single answer. Which is a tree of depth 6 for 64 squares... Each square could find its own answer easily, but then they have to "get together" and figure out which is the "best answer" and that adds sequential logic and stretches the clock cycle time. That's where MVV/LVA came from. Two cycles. In Belle Ken first found the most valuable piece under attack, then he used almost the same hardware the next cycle to find the least valuable piece attacked from that square (which would be the least valuable piece attacking that square)...
Yes, but not necessarily with a depth of six. If I recall correctly, Belle used a priority decoder that compared more than two values at once.

With 21 best-of-four comparators doing 64->16->4->1, it's a three stage combinatorial circuit. With nine best-of-eight comparators, the job can be done in two stages: 64->8->1.

There's no way to do best of 8 at once. Such a decoder has its own internal 2-way comparisons and a tree. When you are laying out a chip, you don't do decoders and the like, you do comparators and 2-1 multiplexors, as everything is simply transistors here and there.

But the bottom line is that it _still_ stretches the clock cycle so that becomes the slowest part of the pipeline and limits speed.

bob · Post by **bob** » Mon Apr 02, 2007 8:44 pm

hgm wrote:
sje wrote:
bob wrote:You can't do it all in parallel. You have to use some sort of "tree" to reduce it down to a single answer. Which is a tree of depth 6 for 64 squares... Each square could find its own answer easily, but then they have to "get together" and figure out which is the "best answer" and that adds sequential logic and stretches the clock cycle time. That's where MVV/LVA came from. Two cycles. In Belle Ken first found the most valuable piece under attack, then he used almost the same hardware the next cycle to find the least valuable piece attacked from that square (which would be the least valuable piece attacking that square)...
Yes, but not necessarily with a depth of six. If I recall correctly, Belle used a priority decoder that compared more than two values at once.

With 21 best-of-four comparators doing 64->16->4->1, it's a three stage combinatorial circuit. With nine best-of-eight comparators, the job can be done in two stages: 64->8->1.
You are confusing me now. It seems that 'find MVV' would need exactly the same circuitry to single out the highest victim. Or was the position represented as a piece list, rather than a board array?

Belle had a FSM for each square. It then had a "find-victim" operation that caused each square to find out if the piece on it was under attack. A priority tree picked the largest value. Then for that square another "find victim" cycle was done but inversely which found pieces that attacked the square and chose the smaller.

Each square-FSM was connected to its neighbors exactly as the squares were connected on the chess board, so there was some propagation issues for diagonals and sliders...

In belle each square had a state that included the piece standing on that square... So it was basically something like 64 processors, one for each square, and they all could "do their thing" at the same time...

bob · Post by **bob** » Mon Apr 02, 2007 8:46 pm

JP wrote:I don't get why you turn LMR off during move ordering tests.
Why not turn off killers or hash tables?
They are all both dependent of the move ordering and will also change the move ordering.
If there is a combined effect from LMR and different move ordering methods you want to catch that right?
/Peter

While I agree with you, I see his point, as there is a "ripple-effect" where a change in ordering produces a change in LMR pruning, which produces a change in hash table hits/misses which ...

But overall I try to test what I am working on, but "in situ" so that how it influences other things will be measured...

SEE on non-capture moves in main search

Re: SEE on non-capture moves in main search

Re: SEE on non-capture moves in main search

Re: SEE on non-capture moves in main search

Re: SEE on non-capture moves in main search

Re: SEE on non-capture moves in main search

Re: SEE on non-capture moves in main search

Re: SEE on non-capture moves in main search

Re: SEE on non-capture moves in main search

Re: SEE on non-capture moves in main search

Re: SEE on non-capture moves in main search