What is profiling? Profiling is instrumenting your code to see how much time your CPU(s) spend in different parts of your code.
Profiling is very important when you care about performance, because humans are notoriously bad at guessing where performance bottlenecks are. If you want to optimize your program, you REALLY want to know where to focus your effort.
Have you heard the saying "premature optimization is the root of all evil"? It's true, but when is it not premature? After you discover a performance bottleneck through profiling!
So how do you profile your own engine? 4 easy steps!
1. Implement a bench mode if you don't already have one. Have the engine just search a few positions and exit cleanly (this is important - you can't rely on Ctrl-C, because profiles are dumped at the end of execution).
2. Install the necessary tools (this is for Ubuntu/Debian, I'm sure you can figure it out for other distros) -
sudo apt-get install python python-pip graphviz
sudo pip install gprof2dot
3. Compile your engine with profiling enabled -
g++ (or gcc) -Os -g -pg <your normal options except don't do further optimization>
4. Run the bench, then turn the resulting profile into a call graph image -
./<your engine> bench
gprof <your engine> | gprof2dot -s | dot -Tpng -o profile.png
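A minimal sketch of the bench mode from step 1, assuming a C++ engine. Everything named here is hypothetical: SearchPosition is a stand-in for your real search, and RunBench, Run, the two positions and the depth of 6 are only illustrative choices. The point it shows is that the program searches a fixed workload and then exits normally, instead of waiting to be killed.

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for the engine's real search. It only burns a little
// CPU and returns a fake node count; replace it with your own search call.
std::uint64_t SearchPosition(const std::string& fen, int depth) {
    std::uint64_t nodes = 0;
    for (int d = 1; d <= depth; ++d) {
        nodes += static_cast<std::uint64_t>(fen.size()) *
                 static_cast<std::uint64_t>(d);
    }
    return nodes;
}

// Searches a fixed set of positions to a fixed depth, so that every run does
// the same work and profiles are comparable. Returns the total node count.
std::uint64_t RunBench(int depth) {
    static const std::vector<std::string> kPositions = {
        "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
        "r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 1",
    };
    const auto start = std::chrono::steady_clock::now();
    std::uint64_t total_nodes = 0;
    for (const std::string& fen : kPositions) {
        total_nodes += SearchPosition(fen, depth);
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    std::cout << "bench: " << total_nodes << " nodes in " << elapsed.count()
              << " s\n";
    return total_nodes;
}

// Command dispatch. Returns the process exit code, so main() can be just:
//   int main(int argc, char** argv) {
//       return Run(std::vector<std::string>(argv, argv + argc));
//   }
// Returning from main() is a normal exit, which is when the profile is dumped.
int Run(const std::vector<std::string>& args) {
    if (args.size() > 1 && args[1] == "bench") {
        RunBench(6);
        return 0;
    }
    std::cerr << "usage: <your engine> bench\n";
    return 1;
}
```

In a real engine the bench should run long enough (at least a few seconds) for the profile to collect a meaningful number of samples.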
----------------------------------------------------------------------------------------------
As an example, this is the profile of Giraffe: http://matthewlai.ca/tmp/profile.png
Each box represents a function. For example, the Search::Search() box says
Search::Search
94.85%
(0.11%)
1056108x
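The first percentage is the total time spent in the function and everything it calls, the percentage in parentheses is the time spent in the function itself, and the last number is how many times it was called. So Search::Search and its children account for 94.85% of the run time, only 0.11% is spent in Search::Search itself, and it was called 1056108 times.
Then we see that 73.91% is spent in QSearch, and 85% in the eval function (regular search calls eval as well). If we follow the call graph all the way down, we see that almost all of that time is spent in the Forward call of the linear layers in the neural network, which in turn spends all of its time in Eigen's matrix-vector product function, as we would expect.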
What are some of the things I can learn from this graph, if this is the first time I profiled?
1. I don't do pseudo-legal move generation, and if we look at Board::GenerateAllLegalMoves, we see that only 1.39% of the time is spent there anyway. There is absolutely no need to do pseudo-legal move generation (and make the code unnecessarily complex) in Giraffe. Incremental move generation is obviously even further out of the question.
2. I determine whether a move is checking or not by applying and unapplying. This is a highly inefficient way to do that, and other engines have very complicated ways of doing this. But should I spend time optimizing this in Giraffe? No. CheckLegal only takes 1.22% of the time (and it's called from absolutely everywhere).
3. SEE also takes almost no time at all. Neither does the SEE-maps thing that I was doing, which I had thought was quite slow.
4. The only place I should really be spending time optimizing is the neural network (and feature generation, a bit), since they take up 86% of the time.
----------------------------------------------------------------------------------------------
On Windows: This is supposedly possible on Windows as well if you use GCC (MinGW or Cygwin), but I never tried it: http://yzhong.co/profiling-with-gprof-u ... -window-7/
If you use Microsoft tools, MSVC has its own profiler, but I have no idea how that works, and I believe you have to pay a lot of money to get that with the Enterprise edition.
Another popular (free) option is Very Sleepy: http://www.codersnotes.com/sleepy/
I believe gprof2dot also supports visualizing Very Sleepy profiles.