C++ optimizations

Discussion of chess software programming and technical issues.

Moderator: Ras

lucasart
Posts: 3242
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

C++ optimizations

Post by lucasart »

- Initially I had a C99 board code (play, generate moves, perft and all the usual stuff)
- I compiled it with -std=c11 and -std=c99 and got the same performance ~ 50 million leaves/seconds (remember that perft counts leaves, not nodes, it's not very different anyway)
- I compiled the same code with -std=c++11, and got 63 million leaves/sec
- I then modified the code to enforce type safety. Instead of having File, Rank, Square, Color, Piece be unsigned int, they became typed enum. This forced me to modify quite a bit of code and introduce operators, but it always pays off in the end. For example when I write square(f, r) instead of square(r, f) I get a compile error, instead of some segfault or an assert() fail *much* later(especially as Rank and File have the same 0..7 range). This is a recipie from Glaurung/Stockfish, which I recommend everyone (if you use C++) to use!
- Now my board code does 100 million leaves / seconds on my perft benchmark (*)

In theory there should be no performance gain (or loss due to inling of these trivial operators). So I don't understand where this massive speed gain is coming from. Are there some special techniques from C++11 that allows compilers to accelerate C code (I'm not using any C++ features, except typed enum and operators to manipulate them).

100 million leaves / sec on an Intel duo core from 3y ago (a rahter low end machine even at the time, that I paid only 300 pounds for).

So I'm both excited (performance doubled), and frustrated (not understanding why)...

By the way, if anyone is interested, my code is on github:
https://github.com/lucasart/chess.git
It is an attempt to do a simplified cutechess-cli like program. If you download these files only:
board.h, board.cc, move.cc, movegen.cc, test.cc
bitboard.h, bitboard.cc
operator.h
types.h
main.cc
you basically have all you need to start writing a chess engine. So you have all the board code, that is very fast and optimized, and you can focus on writing more high-level code, that is actually search related, rather than reinventing wheels.

Of course, for an interface, you don't need such speed. But I had highly optimized code from my engine DisoCheck, so I wasn't going to throw it away and redo it all (it is a lot more work than it seems, and throw most people off writing an engine from scratch).

(*) Perft benchmark: {fen, depth, #leaves}

Code: Select all

{"rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1", 6, 119060324ULL}
{"r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq -", 5, 193690690ULL}
{"8/2p5/3p4/KP5r/1R3p1k/8/4P1P1/8 w - -", 7, 178633661ULL}
{"r2q1rk1/pP1p2pp/Q4n2/bbp1p3/Np6/1B3NBn/pPPP1PPP/R3K2R b KQ - 0 1", 6, 706045033ULL}
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
Vinvin
Posts: 5303
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: C++ optimizations

Post by Vinvin »

lucasart wrote:...
- Now my board code does 100 million leaves / seconds on my perft benchmark (*)

In theory there should be no performance gain (or loss due to inling of these trivial operators). So I don't understand where this massive speed gain is coming from. Are there some special techniques from C++11 that allows compilers to accelerate C code (I'm not using any C++ features, except typed enum and operators to manipulate them).

100 million leaves / sec on an Intel duo core from 3y ago (a rahter low end machine even at the time, that I paid only 300 pounds for).

So I'm both excited (performance doubled), and frustrated (not understanding why)...
...
I dream about all top engines get à +100% speed up like this ! :D
Or at least Stockfish 8-)
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: C++ optimizations

Post by mcostalba »

Congratulations !

Actually I have tested with -std=c++11 but unfortunatly no luck here :-(

I have checked your sources: a nice legal move generator

It is known that for perft a legal generator is faster. A pseudo-legal move generator is IMHO suitable in a full engine, where the most part of generated moves are not tried becuase a cut-off occurs earlier or becuase are simply pruned. But when you have to try all the generated moves, like in perft, testing for legality while in the generation loop seems the optimal approach.

Regarding perft, another trick that I have seen is to not serialize the moves of the last ply but simply to popcount the corresponding bitboard.
hMx
Posts: 61
Joined: Wed Mar 08, 2006 9:40 pm
Location: Germany, Berlin

Re: C++ optimizations

Post by hMx »

lucasart wrote:- Initially I had a C99 board code (play, generate moves, perft and all the usual stuff)
- I compiled it with -std=c11 and -std=c99 and got the same performance ~ 50 million leaves/seconds (remember that perft counts leaves, not nodes, it's not very different anyway)
- I compiled the same code with -std=c++11, and got 63 million leaves/sec
- I then modified the code to enforce type safety. Instead of having File, Rank, Square, Color, Piece be unsigned int, they became typed enum. This forced me to modify quite a bit of code and introduce operators, but it always pays off in the end. For example when I write square(f, r) instead of square(r, f) I get a compile error, instead of some segfault or an assert() fail *much* later(especially as Rank and File have the same 0..7 range). This is a recipie from Glaurung/Stockfish, which I recommend everyone (if you use C++) to use!
- Now my board code does 100 million leaves / seconds on my perft benchmark (*)

In theory there should be no performance gain (or loss due to inling of these trivial operators). So I don't understand where this massive speed gain is coming from. Are there some special techniques from C++11 that allows compilers to accelerate C code (I'm not using any C++ features, except typed enum and operators to manipulate them).

100 million leaves / sec on an Intel duo core from 3y ago (a rahter low end machine even at the time, that I paid only 300 pounds for).

So I'm both excited (performance doubled), and frustrated (not understanding why)...
Just a wild guess...
from the enum type the compiler may deduce a small possible numeric range, and as a consequence omit some range checks on later array indexing?

Cheers
lucasart
Posts: 3242
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: C++ optimizations

Post by lucasart »

mcostalba wrote:Congratulations !

Actually I have tested with -std=c++11 but unfortunatly no luck here :-(

I have checked your sources: a nice legal move generator
Thank you :)
mcostalba wrote: It is known that for perft a legal generator is faster. A pseudo-legal move generator is IMHO suitable in a full engine, where the most part of generated moves are not tried becuase a cut-off occurs earlier or becuase are simply pruned. But when you have to try all the generated moves, like in perft, testing for legality while in the generation loop seems the optimal approach.

Regarding perft, another trick that I have seen is to not serialize the moves of the last ply but simply to popcount the corresponding bitboard.
Actually I have an interface in mind, so I don't really care about perft or even move generation in itself. The only reason why I have it is as a test of non regrssion (and incidentally speed) on the entire board code.
What you want an interface to do, is not even to generate or count legal moves, but rather:
(i) know if there are legal moves in the current positions (ideally withouth wasting time generating them all)
(ii) what is the current board "stats": None, CheckMate, StaleMate, DrawBy50Move etc.
For (i), I have a function has_moves(), which in most cases doesn't even generate moves (I only optimized the critical execution path with has_piece_moves()).
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
jdart
Posts: 4411
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: C++ optimizations

Post by jdart »

Range checking should be off anyway, in an optimized build.

--Jon
lucasart
Posts: 3242
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: C++ optimizations

Post by lucasart »

jdart wrote:Range checking should be off anyway, in an optimized build.

--Jon
I agree 100%

Besides, relying on range checking (or garbage collection) is admitting your own incompetence. That's why incompetent programmers prefer Java to C :wink:
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
User avatar
velmarin
Posts: 1600
Joined: Mon Feb 21, 2011 9:48 am

Re: C++ optimizations

Post by velmarin »

lucasart wrote: you basically have all you need to start writing a chess engine. So you have all the board code, that is very fast and optimized, and you can focus on writing more high-level code, that is actually search related, rather than reinventing wheels.

Of course, for an interface, you don't need such speed. But I had highly optimized code from my engine DisoCheck, so I wasn't going to throw it away and redo it all (it is a lot more work than it seems, and throw most people off writing an engine from scratch).
:D

You are very generous, nice initiative.
No regret ...
Today's changes ...
GPL, :?: :?:
lucasart
Posts: 3242
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: C++ optimizations

Post by lucasart »

velmarin wrote:
lucasart wrote: you basically have all you need to start writing a chess engine. So you have all the board code, that is very fast and optimized, and you can focus on writing more high-level code, that is actually search related, rather than reinventing wheels.

Of course, for an interface, you don't need such speed. But I had highly optimized code from my engine DisoCheck, so I wasn't going to throw it away and redo it all (it is a lot more work than it seems, and throw most people off writing an engine from scratch).
:D

You are very generous, nice initiative.
No regret ...
Today's changes ...
GPL, :?: :?:
Absolutely no regrets on using the GPL. On the contrary, it was my intention since the beginning. I just forgot to put it initially.

Besing, this code distributed without the GPL is illegal (strictly speaking):
* Umko is GPL
* the magic bitboard generation code (in bitboard.cc) is pretty much copy-pasted from Umko (with a few trivial changes)
* So my entire project must be released under the GPL

And that's whole point of the GPL:
* I took something from the free software community
* In exchange, I have to give back to the community whatever I make out of it.
* Fair deal, no ?

As for the code from DiscoChec, since I am the copyright holder, I can do whatever I want (if I exclude the Umko magic bitboard part).

Anyway, we're hobbyist programmers, not lawyers, but I do take pride in propagating software freedom through the GPL :D
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
Gian-Carlo Pascutto
Posts: 1260
Joined: Sat Dec 13, 2008 7:00 pm

Re: C++ optimizations

Post by Gian-Carlo Pascutto »

lucasart wrote:So I don't understand where this massive speed gain is coming from.
Profiling would shed light.