wgarvin wrote:syzygy wrote:Linus wrote:So my gut feel is that getting rid of the bitfield as early as possible, and turning all bitfield accesses into regular load/shift/mask/store operations is always the right thing to do.
Yeah... he was talking about what the gcc compiler writers should do in their optimizer, but oddly enough it sounds like good advice for us to follow too: if you want the fastest engine, don't use the compiler's bitfields, but do your own shifting and masking instead.
Of course you should use macros or inline functions to keep that nastiness just in one place, and have clear readable code everywhere else.
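To make that concrete, here's a minimal sketch of what I mean (hypothetical names, not from my engine): the shifting and masking lives behind a couple of inline helpers, and callers never touch the raw bits.

```cpp
#include <cstdint>

// Hypothetical helpers: the from-square occupies bits 0..5 of a 16-bit
// word. All the mask/shift nastiness stays right here.
constexpr uint16_t FSQ_MASK = 0x3f;

inline int unpack_fsq(uint16_t m) {
    return m & FSQ_MASK;
}

inline uint16_t pack_fsq(uint16_t m, unsigned fsq) {
    // clear the field, then drop the new value in
    return static_cast<uint16_t>((m & ~unsigned(FSQ_MASK)) | (fsq & FSQ_MASK));
}
```

Callers just write `unpack_fsq(m)` and never see a shift or a mask.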
OK I decided to experiment, here are my results (results are expressed in leaf/sec on a varied perft() benchmark):
1/ struct with compiler handled bitfields
Code: Select all
struct move_t {
uint16_t fsq:6, tsq:6;
uint16_t prom:2; // 0=Knight...3=Queen
uint16_t flag:2; // 0=normal,1=ep,2=prom,3=castling
bool operator== (move_t m) const {
return fsq == m.fsq && tsq == m.tsq && flag == m.flag;
}
int get_prom() const {
assert(flag == PROMOTION);
return prom + KNIGHT;
}
void set_prom(int piece) {
assert(KNIGHT <= piece && piece <= QUEEN);
prom = piece - KNIGHT;
}
};
speed:
69563522 leaf/sec
2/ class with encapsulation
A preparatory step to modify the class to go from compiler bitfields to manual ones (see 3/). As expected there is absolutely zero overhead:
Code: Select all
class move_t {
uint16_t fsq:6, tsq:6;
uint16_t prom:2; // 0=Knight...3=Queen
uint16_t flag:2; // 0=normal,1=ep,2=prom,3=castling
public:
bool operator== (move_t m) const {
return fsq == m.get_fsq() && tsq == m.get_tsq() && flag == m.get_flag();
}
int get_fsq() const { return fsq; }
int get_tsq() const { return tsq; }
int get_flag() const { return flag; }
void set_fsq(unsigned _fsq) { fsq = _fsq; }
void set_tsq(unsigned _tsq) { tsq = _tsq; }
void set_flag(unsigned _flag) { flag = _flag; }
int get_prom() const {
assert(flag == PROMOTION);
return prom + KNIGHT;
}
void set_prom(int piece) {
assert(KNIGHT <= piece && piece <= QUEEN);
prom = piece - KNIGHT;
}
};
speed:
68930908 leaf/sec
3/ manual bitfields
it's quite ugly, and the compiler screams about uninitialized variables, so I had to add a default ctor (with infinitesimal and unmeasurable performance cost, so don't worry about that). At least the ugliness is kept inside the class, and the rest of the code doesn't have to be ugly:
Code: Select all
class move_t {
/* 16 bit field:
* fsq = 0..5
* tsq = 6..11
* prom = 12,13 (0=Knight..3=Queen)
* flag = 14,15 (0=NORMAL...3=CASTLING)
* */
uint16_t mask;
public:
move_t(): mask(0) {} // silence compiler warnings
bool operator== (move_t m) const { return mask == m.mask; }
int get_fsq() const { return mask & 0x3f; }
int get_tsq() const { return (mask >> 6) & 0x3f; }
int get_flag() const { return (mask >> 14) & 3; }
void set_fsq(unsigned fsq) { mask &= 0xffc0; mask ^= fsq; }
void set_tsq(unsigned tsq) { mask &= 0xf03f; mask ^= (tsq << 6); }
void set_flag(unsigned flag) { mask &= 0x3fff; mask ^= (flag << 14); }
int get_prom() const {
assert(get_flag() == PROMOTION);
return ((mask >> 12) & 3) + KNIGHT;
}
void set_prom(int piece) {
assert(KNIGHT <= piece && piece <= QUEEN);
mask &= 0xcfff;
mask ^= (piece - KNIGHT) << 12;
}
};
speed:
93625373 leaf/sec
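As a standalone sanity check of that 16-bit layout (fsq = bits 0..5, tsq = 6..11, prom = 12..13, flag = 14..15), here's a hedged sketch with free functions mirroring the setters above (hypothetical names; note that once the field is cleared, XOR and OR are interchangeable, so I use OR for clarity):

```cpp
#include <cstdint>

// Same layout as the move_t class above:
// fsq = bits 0..5, tsq = 6..11, prom = 12..13, flag = 14..15
inline uint16_t encode_move(unsigned fsq, unsigned tsq,
                            unsigned flag, unsigned prom) {
    return static_cast<uint16_t>(fsq | (tsq << 6) | (prom << 12) | (flag << 14));
}

inline int decode_fsq(uint16_t m)  { return m & 0x3f; }
inline int decode_tsq(uint16_t m)  { return (m >> 6) & 0x3f; }
inline int decode_prom(uint16_t m) { return (m >> 12) & 3; }
inline int decode_flag(uint16_t m) { return (m >> 14) & 3; }
```

Every field round-trips cleanly, which is exactly the load/shift/mask/store sequence Linus was talking about, written out by hand.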
I am shocked by the poor performance of GCC here! My struct/class with compiler-handled bitfields didn't incur any undue sign extension, so it should have been just as efficient as the manually handled bitfields. But admittedly you guys were right: GCC sucks at bitfields, big time!
But I still don't like doing this. It's like fixing your code instead of fixing the compiler. I don't see what prevents the GNU guys from doing it right in GCC, so that people don't have to write this kind of code. This is so 80s... We're in 2012.
I wonder if other compilers do better here. I pushed the patches 2/ and 3/ on my github:
https://github.com/lucasart/chess
PS: 94 M leaf/sec with a 16-bit move_t on an Intel Dual Core E5300 @ 2.6 GHz (one thread, no hash table cheating) is pretty damned fast perft

Theory and practice sometimes clash. And when that happens, theory loses. Every single time.