- I compiled it with -std=c11 and -std=c99 and got the same performance: about 50 million leaves/sec (remember that perft counts leaves, not nodes, though the difference is small anyway)
- I compiled the same code with -std=c++11, and got 63 million leaves/sec
- I then modified the code to enforce type safety. Instead of File, Rank, Square, Color and Piece being unsigned int, they became typed enums. This forced me to modify quite a bit of code and introduce operators, but it always pays off in the end. For example, when I write square(f, r) instead of square(r, f), I get a compile error instead of a segfault or an assert() failure *much* later (especially as Rank and File have the same 0..7 range). This is a recipe from Glaurung/Stockfish, which I recommend to everyone using C++!
- Now my board code does 100 million leaves/sec on my perft benchmark (*)
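To illustrate the typed-enum idea, here is a minimal sketch. It is not the code from types.h/operator.h: the enum names, the C++11 `enum class` choice and the a1 = 0 rank-major square layout are all assumptions made for illustration. It shows how mixing up Rank and File becomes a compile error, and why the operators are trivial enough to be inlined away.

```cpp
#include <cassert>

// Hypothetical types (not copied from the repository): scoped enums with a
// fixed underlying type, so any value 0..63 is a valid Square.
enum class File : int { A, B, C, D, E, F, G, H };
enum class Rank : int { R1, R2, R3, R4, R5, R6, R7, R8 };
enum class Square : int { A1 = 0, H8 = 63 };

// Trivial operators: a cast plus integer arithmetic, which an optimizing
// compiler inlines, so they should cost nothing at run time.
inline File& operator++(File& f) { return f = File(int(f) + 1); }
inline Rank& operator++(Rank& r) { return r = Rank(int(r) + 1); }

// Assumed layout: a1 = 0, b1 = 1, ..., h8 = 63 (rank-major).
// Swapping the arguments, e.g. square(File::E, Rank::R1), does not compile.
constexpr Square square(Rank r, File f) { return Square(8 * int(r) + int(f)); }
constexpr File file_of(Square s) { return File(int(s) % 8); }
constexpr Rank rank_of(Square s) { return Rank(int(s) / 8); }

// Round-trip check over the whole board; returns the number of squares visited.
inline int count_squares() {
    int n = 0;
    for (Rank r = Rank::R1; r <= Rank::R8; ++r)
        for (File f = File::A; f <= File::H; ++f) {
            Square s = square(r, f);
            assert(file_of(s) == f && rank_of(s) == r);
            ++n;
        }
    return n;
}
```

Plain (unscoped) enums with a fixed underlying type plus overloaded operators, as in Stockfish, give the same protection for argument order, since an int never converts implicitly to Rank or File.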
In theory there should be no performance gain (nor any loss, since these trivial operators are inlined). So I don't understand where this massive speed gain is coming from. Are there some special techniques in C++11 that allow compilers to accelerate C code? (I'm not using any C++ features except typed enums and the operators to manipulate them.)
That's 100 million leaves/sec on an Intel dual-core from 3 years ago (a rather low-end machine even at the time, for which I paid only 300 pounds).
So I'm both excited (performance doubled), and frustrated (not understanding why)...
By the way, if anyone is interested, my code is on github:
https://github.com/lucasart/chess.git
It is an attempt to write a simplified cutechess-cli-like program. If you download only these files:
board.h, board.cc, move.cc, movegen.cc, test.cc
bitboard.h, bitboard.cc
operator.h
types.h
main.cc
you basically have all you need to start writing a chess engine. You get all the board code, which is very fast and optimized, so you can focus on writing higher-level, search-related code rather than reinventing the wheel.
Of course, an interface doesn't need such speed. But I had highly optimized code from my engine DiscoCheck, so I wasn't going to throw it away and redo it all (it is a lot more work than it seems, and it puts most people off writing an engine from scratch).
(*) Perft benchmark: {fen, depth, #leaves}
{"rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1", 6, 119060324ULL}
{"r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq -", 5, 193690690ULL}
{"8/2p5/3p4/KP5r/1R3p1k/8/4P1P1/8 w - -", 7, 178633661ULL}
{"r2q1rk1/pP1p2pp/Q4n2/bbp1p3/Np6/1B3NBn/pPPP1PPP/R3K2R b KQ - 0 1", 6, 706045033ULL}
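For readers unfamiliar with perft, here is a minimal sketch of the recursion. It assumes a generic position type with hypothetical gen_moves/play/undo methods (not the repository's actual API), and uses a toy tree with a fixed branching factor instead of real chess so the counts are predictable. It also shows why counting leaves rather than nodes makes little difference.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a real board: every position has exactly
// `branching` legal moves, so the size of the tree is easy to predict.
struct ToyPosition {
    int branching;
    std::vector<int> gen_moves() const { return std::vector<int>(branching, 0); }
    void play(int) {}  // a real board would make the move here
    void undo(int) {}  // ...and take it back here
};

struct PerftCount {
    std::uint64_t leaves = 0;  // positions exactly `depth` plies deep: what perft reports
    std::uint64_t nodes = 0;   // every position visited, root and leaves included
};

// Walk every move sequence down to `depth` plies and count the positions reached.
template <typename Pos>
void perft(Pos& pos, int depth, PerftCount& c) {
    ++c.nodes;
    if (depth == 0) {
        ++c.leaves;
        return;
    }
    for (int m : pos.gen_moves()) {
        pos.play(m);
        perft(pos, depth - 1, c);
        pos.undo(m);
    }
}
```

With 30 moves per position and depth 4 (ToyPosition p{30}), this gives 810,000 leaves and 837,931 nodes, only about 3.4% more, because the last ply dominates the tree. Summing the #leaves column above gives 1,197,429,708 leaves, so at 100 million leaves/sec one pass over the four positions takes roughly 12 seconds.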
