Moving to Magic Bitboards... any advice?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

asanjuan
Posts: 214
Joined: Thu Sep 01, 2011 5:38 pm
Location: Seville, Spain

Moving to Magic Bitboards... any advice?

Post by asanjuan »

Hi all.
My engine is bitboard-based, but not using magics, nor rotated BB. I think it searches with a reasonable branching factor, but to my surprise and after reading some posts, i found that is quite slow in nps (2.5 Mnps without eval in a single intel core2).
Currently, my search only reaches 270knps. And as i said, it seems to me that is quite slow.
Maybe i'm using too much the well known msb and lsb routines to generate moves for sliding pieces, even in eval.

After reading the wiki, finally understood how magic bitboard works, and i'm planning to implement it into Rhetoric (my engine).

any advice from the experts?
(and please, don't say patience :) )
tpetzke
Posts: 686
Joined: Thu Mar 03, 2011 4:57 pm
Location: Germany

Re: Moving to Magic Bitboards... any advice?

Post by tpetzke »

Hi,

Code: Select all

Maybe i'm using too much the well known msb and lsb routines to generate moves for sliding pieces, even in eval.
Never guess, profile and look whether this is your hotspot. If it is then moving to magic bitboards won't help you because the need to localize the positions of set bits in a bitboard does not go away just by changing the bitboard arithmetic.

Thomas...
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Moving to Magic Bitboards... any advice?

Post by Don »

asanjuan wrote:Hi all.
My engine is bitboard-based, but not using magics, nor rotated BB. I think it searches with a reasonable branching factor, but to my surprise and after reading some posts, i found that is quite slow in nps (2.5 Mnps without eval in a single intel core2).
Currently, my search only reaches 270knps. And as i said, it seems to me that is quite slow.
Maybe i'm using too much the well known msb and lsb routines to generate moves for sliding pieces, even in eval.

After reading the wiki, finally understood how magic bitboard works, and i'm planning to implement it into Rhetoric (my engine).

any advice from the experts?
(and please, don't say patience :) )
2.5 Mnps seems fast to me, Komodo does much less. Of course Komodo has an evaluation function ...
What is your hardware and how many Mnps do the other programs do on this hardware?
asanjuan
Posts: 214
Joined: Thu Sep 01, 2011 5:38 pm
Location: Seville, Spain

Re: Moving to Magic Bitboards... any advice?

Post by asanjuan »

My perft function alone, doing only move generation, reaches 2.4 Mnps
The simple alpha-beta doing eval on quiescence reaches 600 knps, but aplying some heuristics, like lazy eval, futility, razoring, hash lookup, nullmove... the node speed is reduced to 270 knps for search.
My machine is a 2,4Mhz intel core 2.

On my hardware Fruit makes about 900knps on pure search, houdini about 1.5 Mnps...

Reading the posts of this forum, people say that they reach 16Mnps on perft tests, which is very impressive in my opinion.
This is why i'm trying to improve move generation speed.

... or am i missing something?
asanjuan
Posts: 214
Joined: Thu Sep 01, 2011 5:38 pm
Location: Seville, Spain

Re: Moving to Magic Bitboards... any advice?

Post by asanjuan »

Never guess, profile and look whether this is your hotspot
I've never used a profile tool. I have always optimized code by "intuition" with excellent results :D

But, ok, is a very good answer, so thank you very much. I'll try to profile.
Do you know any good profiler for windows?
(at this moment is when my ignorance is revealed :? )
User avatar
Evert
Posts: 2929
Joined: Sat Jan 22, 2011 12:42 am
Location: NL

Re: Moving to Magic Bitboards... any advice?

Post by Evert »

asanjuan wrote: On my hardware Fruit makes about 900knps on pure search, houdini about 1.5 Mnps...

Reading the posts of this forum, people say that they reach 16Mnps on perft tests, which is very impressive in my opinion.
This is why i'm trying to improve move generation speed.
What person A calls a node may not be the same as what person B calls a node, so be careful when comparing raw numbers.
As to perft, you can get a massive speedup by doing legal-only move generation and simply return the number of legal moves directly at the leafs (as opposed to trying them all on the board and testing whether they're legal). This doesn't really pay off in a regular search (at least in my experience it doesn't although the penalty is not as bad as I would have thought). Finally, you can get better results by using transposition tables to speed up the perft calculation.

In other words, just looking at the numbers produced by different programs is meaningless unless you know exactly what those programs are doing. To test the speed of your move generator (plus make/unmake) a better measure is time-to-depth. My recommendation would be to compare with Crafty, which does nothing special to speed up perft (I'm sure there are other programs you can use for exactly the same thing for the same reason, but I did not try any of them).

EDIT: forgot to say: make sure you compile the program you're comparing against (in my suggestion, Crafty) with the same compiler and compiler options as you're using for your own program. If they're not optimal, then at least they're the same in both cases and you're not comparing apples and oranges.
micron
Posts: 155
Joined: Mon Feb 15, 2010 9:33 am
Location: New Zealand

Re: Moving to Magic Bitboards... any advice?

Post by micron »

A minor point of terminology. There are no 'magic bitboards'. There are many methods to generate slider attacks with bitboards, some of which use 'magic' multiplication.

As a precursor to magic slider attack generation, the classic 4-ray, loop-free, branch-free method has some advantages, including being easy to understand. It has the same interface.

Code: Select all

BitBoard BishopAttacks( int sq, BitBoard occ ) 
{
  int         blockingSq; 
  BitBoard    attacks, partAttacks, blockers;

  attacks = gRayNW[sq]; 
  blockers = (attacks & occ) | 0x8000000000000000ULL; 
  blockingSq = GetLowestBit( blockers ); 
  attacks ^= gRayNW[blockingSq]; // mask off beyond blocking square
  
  partAttacks = gRayNE[sq]; 
  blockers = (partAttacks & occ) | 0x8000000000000000ULL; 
  blockingSq = GetLowestBit( blockers ); 
  partAttacks ^= gRayNE[blockingSq]; 
  attacks |= partAttacks; 

  partAttacks = gRaySE[sq]; 
  blockers = (partAttacks & occ) | 1ULL; 
  blockingSq = GetHighestBit( blockers ); // note Get*Highest*Bit() here
  partAttacks ^= gRaySE[blockingSq]; 
  attacks |= partAttacks; 

  partAttacks = gRaySW[sq]; 
  blockers = (partAttacks & occ) | 1ULL; 
  blockingSq = GetHighestBit( blockers ); // note Get*Highest*Bit() here
  partAttacks ^= gRaySW[blockingSq]; 
  return attacks | partAttacks; 
}
The auxiliary bitboard arrays gRayNW[64]... are easily precomputed.
RookAttacks() is identical, but with orthogonal rays gRayN[]...

These slider attack generators have respectable performance. Replacing the function bodies with magic gives an increase in NPS that is noticeable and worthwhile, but not huge.
tpetzke
Posts: 686
Joined: Thu Mar 03, 2011 4:57 pm
Location: Germany

Re: Moving to Magic Bitboards... any advice?

Post by tpetzke »

I've never used a profile tool. I have always optimized code by "intuition" with excellent results
Then your intuition is better than mine. Some things are really counter intuitive. For an easy start I would you gprof, the profiler from gcc. It is included in the mingw suite and very easy to use and free. It will tell you in what functions your program spends most of the time, not necessarily why your program spends your time there.

From a first look I would also say the Intel VTune Amplifier is a very powerful tool, it will tell you per statement in your function what it contributes to the run time of the program. Unfortunately it is not free but there is an evaluation version available.

And don't run your profiler in a VMware image, it won't work. I know it is obvious, but stupid me tried even that.

Thomas...
User avatar
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Moving to Magic Bitboards... any advice?

Post by lucasart »

asanjuan wrote:Hi all.
My engine is bitboard-based, but not using magics, nor rotated BB. I think it searches with a reasonable branching factor, but to my surprise and after reading some posts, i found that is quite slow in nps (2.5 Mnps without eval in a single intel core2).
Currently, my search only reaches 270knps. And as i said, it seems to me that is quite slow.
Maybe i'm using too much the well known msb and lsb routines to generate moves for sliding pieces, even in eval.

After reading the wiki, finally understood how magic bitboard works, and i'm planning to implement it into Rhetoric (my engine).

any advice from the experts?
(and please, don't say patience :) )
Compared to magics, I've found rotated BB to be almost as fast. If you're just looking at a board + move generation code (ie. a pure perft) magics won't make a big difference. The only advantage of magics is that the occupancy is a single bitboard (i/o 4 different ones to maintain dynamically with rotated BB). It is not only faster but more elegant, and allows for set operations in O(1): example you want to remove all white pawns from thr occupance, just a single &~ and that's it, while rotated BB would inevitably end up in an ugly loop... I found that useful for the eval, hence the switch from rotated BB to magics in my engine.

If your engine performs perft calculations at an average of 2.5 Mnps, then that's quite slow indeed (mine averages around 25-30 Mnps and it's surely far from optimal). I don't think the bottle neck is that you don't use magics. I can't give you any precise advice, as I haven't looked at your code, but the best way to improve it is to run the compiler profiler and see what you can optimize. Of course it's good to rely on your intuition when you code to make sure your code is efficient, but when you have such a complex combination of interacting functions to optimize, forget your intuition and let the profiler teach you :D. I've spent a fair amount of time profilng my perft, and it was definitely worth it.

Basically what I'm saying is that magics or rotated BB or whatever else is just a set of primitive operations, and the inefficieny is probably not in these operations but in the conception of the board and/or move generation that calls these primitive operations. Perhaps you should parse through the Stockfish code to see how a real artist does this! It should give you some idea on how to organize the board/movegen code efficiently.

Another area of improvement is the move sorting machinery. Let's say that your move ordering is the following:
1/ hash move
2/ non losing captures(*) by MVV/LVA or SEE
3/ killer moves
4/ losing captures by MVV/LVA or SEE
5/ quiet moves by history score
(*) "captures" include en passant captures and promotions

You need to make use of early beta cutoffs to implement a lazy move generation scheme. Examples:
* if the hash move triggers a beta cutoff, you can save the whole move generation (just have a function that tests the legality of the hash move). when the hash move exist it should be legal (with infinitesimal chances that it's not) and it will very often be the best move.
* if a non losing capture triggers a beta cutoff, you'll only have generated captures, and you'll save all the quiet move generation
* if a killer move generates a beta cutoff (again test the killer legality before, as it will often be illegal), same story, you can save the quiet move generation
* same story if a losing capture generates the cutoff

My movesorting machinery is the real bottleneck of my engine, and I plan to do what I describe above at some point, when the program is mature enough that raw speed really matters (if I ever reach that stage).
tpetzke
Posts: 686
Joined: Thu Mar 03, 2011 4:57 pm
Location: Germany

Re: Moving to Magic Bitboards... any advice?

Post by tpetzke »

basically what I'm saying is that magics or rotated BB or whatever else is just a set of primitive operations, and the inefficieny is probably not in these operations but in the conception of the board and/or move generation that calls these primitive operations.
I tend to agree, but if lsb or msb is implemented very slow it will make a big difference, as it called so so often. The profiler will tell you

I recently go a nice speedup by investing some time in my popcount method.

Thomas...