Which will be faster when incrementally doing/undoing moves:
1.
void do_move(const int move) {
    int to_from;
    ...
    to_from = move & 0xFFF;                 /* combined 12-bit from/to index */
    key       ^= random_rook_key[to_from];  /* one lookup updates the hash   */
    bitboard1 ^= set_bits[to_from];         /* entry has both bits set       */
    bitboard2 ^= set_bits[to_from];
    ...
}
2.
void do_move(const int move) {
    int to;
    int from;
    ...
    to   = move & 0x3F;
    from = (move >> 6) & 0x3F;
    key ^= random_rook_key[to];
    key ^= random_rook_key[from];
    bitboard1 |= set_bit[to];      /* set the to-square bit     */
    bitboard1 &= clear_bit[from];  /* clear the from-square bit */
    bitboard2 |= set_bit[to];
    bitboard2 &= clear_bit[from];
    ...
}
In case 1 we have fewer bit operations, but we need much larger lookup tables (size = 4096 x 8 bytes = 32 KB).
In case 2 we have more bit operations, but we need smaller lookup tables (size = 64 x 8 bytes = 0.5 KB).
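For reference, a minimal sketch of how the combined 4096-entry tables of case 1 could be precomputed, assuming the same to/from packing as case 2 (to in bits 0-5, from in bits 6-11); the per-square tables zobrist_rook[] and set_bit[] and the helper name init_tables() are illustrative, not from the original:

#include <stdint.h>

extern uint64_t set_bit[64];       /* set_bit[sq] = 1ULL << sq          */
extern uint64_t zobrist_rook[64];  /* per-square Zobrist keys (assumed) */

uint64_t set_bits[4096];           /* both the from- and to-bit set     */
uint64_t random_rook_key[4096];    /* key for from xor key for to       */

void init_tables(void) {
    for (int from = 0; from < 64; ++from) {
        for (int to = 0; to < 64; ++to) {
            const int to_from = (from << 6) | to;
            set_bits[to_from]        = set_bit[from] | set_bit[to];
            random_rook_key[to_from] = zobrist_rook[from] ^ zobrist_rook[to];
        }
    }
}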
Actually I like your first one better, but doing it SIMD-wise with SSE2.
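A rough sketch of what that could look like, assuming bitboard1 and bitboard2 are packed into one 128-bit slot and the 4096-entry table is widened to __m128i entries (all names here are illustrative, not from the original):

#include <emmintrin.h>  /* SSE2 intrinsics */

/* 4096-entry table: each __m128i holds the to/from mask in both halves */
static __m128i set_bits128[4096];

/* bitboard1 and bitboard2 packed into a single 128-bit slot */
static __m128i boards12;

void do_move_sse2(const int move) {
    const int to_from = move & 0xFFF;
    /* one xor toggles the from- and to-bits in both bitboards at once */
    boards12 = _mm_xor_si128(boards12, set_bits128[to_from]);
}

The hash-key update stays scalar, but the two bitboard updates collapse into a single 128-bit xor, at the cost of a 64 KB instead of 32 KB table.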
For approach 2, you would be better off always using xor with set_bit.
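That would look roughly like the sketch below (same assumed names as in the question). It works because for a non-capture the from-bit is known to be set and the to-bit is known to be clear, so xor clears the one and sets the other, and the clear_bit[] table disappears entirely:

#include <stdint.h>

extern uint64_t key, bitboard1, bitboard2;
extern uint64_t random_rook_key[64];  /* per-square Zobrist keys   */
extern uint64_t set_bit[64];          /* set_bit[sq] = 1ULL << sq  */

void do_move(const int move) {
    const int to   = move & 0x3F;
    const int from = (move >> 6) & 0x3F;

    key ^= random_rook_key[to];
    key ^= random_rook_key[from];

    /* xor sets the (clear) to-bit and clears the (set) from-bit */
    bitboard1 ^= set_bit[to];
    bitboard1 ^= set_bit[from];
    bitboard2 ^= set_bit[to];
    bitboard2 ^= set_bit[from];
}

A nice side effect is that do_move and undo_move become symmetric: undoing can reuse exactly the same xor sequence.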
It depends on how many cache lines you can save on average with the denser tables but twice as many lookups. It does not necessarily depend on the distribution of those cache lines in memory (though there are issues there as well: pages, the hardware prefetcher).
Memory versus computation is often hard to predict and will largely depend on your other memory and cache requirements. As long as your L1 data cache is not saturated, using more memory is likely faster. But if it becomes polluted, it might be the other way around.
Another idea for recent x64 CPUs like the Core 2 Duo is to use bit-test intrinsics to set, reset, or complement bits by bit index with one instruction; btx reg64, reg64 has a latency of one cycle on a Core 2 Duo.
Hopefully the compiler will use btc reg64, reg64 here, since the btx version with a memory operand is much more expensive, because it allows processing large bit arrays and does not reduce the bit index mod 32/64...
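A sketch of that idea, assuming MSVC's _bittestandcomplement64 intrinsic on an x64 target (the function name and the idea of working on locals to coax the compiler toward the register form are illustrative; GCC/Clang would need their own patterns):

#include <intrin.h>   /* MSVC: _bittestandcomplement64 */

extern __int64 bitboard1, bitboard2;

void complement_squares(const int to, const int from) {
    /* work on locals so the compiler has a chance to emit
       btc reg64, reg64 instead of the slow memory-operand form */
    __int64 b1 = bitboard1;
    __int64 b2 = bitboard2;
    _bittestandcomplement64(&b1, to);
    _bittestandcomplement64(&b1, from);
    _bittestandcomplement64(&b2, to);
    _bittestandcomplement64(&b2, from);
    bitboard1 = b1;
    bitboard2 = b2;
}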
In a perft or "beancounter" framework, doMove/undoMove performance is more critical than in approaches where one usually spends much more time in leaf nodes; it is also a futility-pruning versus lazy-eval issue.
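To illustrate why doMove/undoMove dominates there, a minimal perft skeleton (generate_moves, do_move, undo_move are a hypothetical interface; a real engine's names and move list handling will differ):

#include <stdint.h>

extern int  generate_moves(int *moves);   /* fills moves[], returns count */
extern void do_move(const int move);
extern void undo_move(const int move);

/* at every node the only work besides move generation is
   do_move/undo_move, so their cost drives the node rate    */
uint64_t perft(const int depth) {
    int moves[256];
    uint64_t nodes = 0;

    if (depth == 0)
        return 1;

    const int n = generate_moves(moves);
    for (int i = 0; i < n; ++i) {
        do_move(moves[i]);
        nodes += perft(depth - 1);
        undo_move(moves[i]);
    }
    return nodes;
}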