Profiling a mediocre engine


JVMerlino
Posts: 1397
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Profiling a mediocre engine

Post by JVMerlino »

I did some profiling (using Very Sleepy) on the latest version of Myrddin and came up with some results that I was hoping to get some feedback/comparison on from the community. The data came from a 3-minute search from the initial position.

59.5% of the time is spent in qsearch
37.0% of the time is spent in move generation
37.7% of the time is spent in evaluation
19.7% of the time is spent performing a BitScan
8.1% of the time is spent probing various hashes (transposition, eval, pawn)
6.0% of the time is spent determining the attackers for a given square
5.1% of the time is spent in make/unmake
3.8% of the time is spent getting the next move in the ordered move list
3.6% of the time is spent determining if a King is in check
Move generation spends more than half of its time generating pawn moves

Obviously there's a lot of overlap here, which is why the percentages add up to well over 100% but still make sense. :-)
But does any of this raise big red flags with you? Thanks for your time....
jm
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Profiling a mediocre engine

Post by mvanthoor »

Try a position such as KiwiPete. It's much less quiet than the starting position, so your time spent in qsearch should go up. Judging from this run, the engine seems to be spending too much time in move generation; it should be well under a third of the total.
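
For reference, the KiwiPete FEN (perft "position 2" on the Chess Programming Wiki) is:

Code:

r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 1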
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
Bo Persson
Posts: 257
Joined: Sat Mar 11, 2006 8:31 am
Location: Malmö, Sweden
Full name: Bo Persson

Re: Profiling a mediocre engine

Post by Bo Persson »

It points at where an effort might make a difference.

Obviously you shouldn't start by rewriting make/unmake or move ordering in assembly. But if you can improve move ordering so that it produces a better order (rather than just making the ordering code 1% faster), that might reduce the time spent in qsearch.
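
For example, even a plain MVV-LVA score for captures (a sketch only, not Myrddin's actual code) usually shrinks the tree far more than shaving cycles off the sorting loop:

Code:

/* MVV-LVA: prefer capturing the most valuable victim with the least
   valuable attacker. Piece values are only illustrative. */
static const int value[6] = { 100, 320, 330, 500, 900, 20000 }; /* P N B R Q K */

int mvv_lva(int victim, int attacker)
{
    return value[victim] * 10 - value[attacker];
}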

Otherwise, move generation and BitScan look like places that could use some trimming.

In the old Chess 0.5 from Byte magazine, the portable bitscan routine took over 40% of the runtime. Rewriting it with non-portable bit-ops made it come down to essentially zero. That makes me see 19.7% here as a tad much.
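
For comparison, this is roughly what the difference looks like in C (a generic sketch, not the Chess 0.5 code; both functions assume a non-empty bitboard):

Code:

#include <stdint.h>

/* Portable but slow: one loop iteration per trailing zero bit. */
int bitscan_portable(uint64_t bb)
{
    int sq = 0;
    while ((bb & 1) == 0) {
        bb >>= 1;
        sq++;
    }
    return sq;
}

/* Non-portable but essentially free: compiles to a single instruction
   on modern hardware (GCC/Clang builtin shown here). */
int bitscan_fast(uint64_t bb)
{
    return __builtin_ctzll(bb);
}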
Tearth
Posts: 70
Joined: Thu Feb 25, 2021 5:12 pm
Location: Poland
Full name: Pawel Osikowski

Re: Profiling a mediocre engine

Post by Tearth »

Statistics for Inanis (I've preserved the order), using my set of benchmark positions:

36% of the time is spent in qsearch
18% of the time is spent in move generation
14% of the time is spent on evaluation
0% of the time is spent performing a BitScan
17% of the time is spent probing various hashes (transposition, eval, pawn)
12% of the time is spent determining the attackers for a given square
11% of the time is spent in make/unmake
0% of the time is spent getting the next move in the ordered move list
4% of the time is spent determining if a King is in check

A few notes:
- the majority of time spent in evaluation is on calculating mobility (which also covers king safety, as I do both at the same time - see the sketch after these notes)
- bit scan is based on tzcnt instruction, so it's pretty much instant
- "% of the time is spent getting the next move in the ordered move list" is included in "% of the time is spent in move generation", because I have a staged move generator and everything is tightly coupled
- I didn't notice any time spent probing the pawn hash table; I suppose it sits in the cache, so the effect is negligible
- make_move is two times slower than undo_move
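
A rough sketch of what I mean by doing mobility and king safety in the same pass (illustrative C only, not the actual Inanis code):

Code:

#include <stdint.h>

/* Given a piece's attack bitboard (however it was generated), accumulate
   mobility and attacks on the enemy king zone in one pass instead of two. */
void eval_piece_attacks(uint64_t attacks, uint64_t own_pieces,
                        uint64_t enemy_king_zone,
                        int *mobility, int *king_attacks)
{
    *mobility     += __builtin_popcountll(attacks & ~own_pieces);
    *king_attacks += __builtin_popcountll(attacks & enemy_king_zone);
}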
j.t.
Posts: 263
Joined: Wed Jun 16, 2021 2:08 am
Location: Berlin
Full name: Jost Triller

Re: Profiling a mediocre engine

Post by j.t. »

What's a good program for accurate profiling? I use Valgrind with Callgrind and KCachegrind, but often it's not clear where exactly most of the time is spent, especially with release builds.
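
For context, the workflow I mean is roughly the standard one (./engine bench is just a placeholder for whatever benchmark command the engine offers):

Code:

valgrind --tool=callgrind ./engine bench
kcachegrind callgrind.out.<pid>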

Anyway, the limited information I got is:

- 53% is used by the evaluation
- 18% by the search
- 8% SEE
- 5% doMove
- 5% inCheck
- 2% generate captures
- 1% generate pawn captures
- 1% get hash entry for search
- 1% quiesce
- 1% generate quiet moves
tcusr
Posts: 325
Joined: Tue Aug 31, 2021 10:32 pm
Full name: tcusr

Re: Profiling a mediocre engine

Post by tcusr »

j.t. wrote: Fri Apr 22, 2022 5:31 pm What's a good program for accurate profiling? I use Valgrind with Callgrind and KCachegrind, but often it's not clear where exactly most of the time is spent, especially with release builds.

Anyway, the limited information I got is:

- 53% is used by the evaluation
- 18% by the search
- 8% SEE
- 5% doMove
- 5% inCheck
- 2% generate captures
- 1% generate pawn captures
- 1% get hash entry for search
- 1% quiesce
- 1% generate quiet moves
add -g and -fno-omit-frame-pointer to keep track of inlined functions in release mode.
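
For example, with GCC or Clang something like this keeps optimizations on while still giving the profiler usable symbol and frame information (file names are just placeholders):

Code:

gcc -O2 -g -fno-omit-frame-pointer -o engine *.c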
JVMerlino
Posts: 1397
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Re: Profiling a mediocre engine

Post by JVMerlino »

First of all, thanks to Tearth and Bo Persson, I switched from BitScanForward64() to _tzcnt_u64() and got an OVERALL 6% increase in engine speed. Thank you for the handful of elo! :D :wink:
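
For anyone curious, the pattern is essentially this (a sketch of the idea, not the exact Myrddin code; _tzcnt_u64 needs BMI1, so compile with the appropriate architecture flags):

Code:

#include <immintrin.h>   /* _tzcnt_u64 */

/* Pop and return the index of the lowest set bit; assumes bb != 0. */
static inline int pop_lsb(unsigned long long *bb)
{
    int sq = (int)_tzcnt_u64(*bb);  /* single TZCNT instruction */
    *bb &= *bb - 1;                 /* clear that bit */
    return sq;
}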

Unfortunately, I have discovered that asserts in the code can wildly affect the stats produced by Very Sleepy. After making the above change, BitScan was still showing 3.3% overall usage. However, more than 3/4 of that was spent on the assert. :oops: So, after making the above BitScan change and then converting all asserts to noops, here are the results for KiwiPete:
67.0% of the time is spent in qsearch
37.0% of the time is spent in move generation
31.3% of the time is spent in evaluation
0.6% of the time is spent performing a BitScan - THANKS AGAIN!
9.5% of the time is spent probing various hashes (transposition, eval, pawn)
7.8% of the time is spent determining the attackers for a given square
6.3% of the time is spent in make/unmake
5.5% of the time is spent getting the next move in the ordered move list - big increase here...hmmm....almost certainly bad move ordering :(
2.5% of the time is spent determining if a King is in check
And 41.4% of move generation time is spent on pawn moves
Last edited by JVMerlino on Fri Apr 22, 2022 7:36 pm, edited 1 time in total.
tcusr
Posts: 325
Joined: Tue Aug 31, 2021 10:32 pm
Full name: tcusr

Re: Profiling a mediocre engine

Post by tcusr »

JVMerlino wrote: Fri Apr 22, 2022 7:13 pm First of all, thanks to Tearth and Bo Persson, I switched from BitScanForward64() to _tzcnt_u64() and got an OVERALL 6% increase in engine speed. Thank you! :D

Unfortunately, I have discovered that asserts in the code can wildly affect the stats produced by Very Sleepy. After making the above change, BitScan was still showing 3.3% overall usage. However, more than 3/4 of that was spent on the assert. :oops: So, after making the above BitScan change and then converting all asserts to noops, here are the results for KiwiPete:
58.6% of the time is spent in qsearch
35.6% of the time is spent in move generation
30.7% of the time is spent in evaluation
0.6% of the time is spent performing a BitScan - THANKS AGAIN!
10.7% of the time is spent probing various hashes (transposition, eval, pawn)
7.6% of the time is spent determining the attackers for a given square
5.8% of the time is spent in make/unmake - 5.9% relative increase
6.0% of the time is spent getting the next move in the ordered move list - big increase here...hmmm....
2.5% of the time is spent determining if a King is in check
And 40.7% of move generation time is spent on pawn moves
If you're using assert() from assert.h, you can disable it by adding -DNDEBUG to the compile options.
If you use your own assert, do something like this:

Code:

#ifdef NDEBUG
#define MY_ASSERT(condition) ((void)0)
#else
#define MY_ASSERT(condition) \
    do { if (!(condition)) { /* report the failure and abort */ } } while (0)
#endif
JVMerlino
Posts: 1397
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Re: Profiling a mediocre engine

Post by JVMerlino »

tcusr wrote: Fri Apr 22, 2022 7:20 pm If you're using assert() from assert.h, you can disable it by adding -DNDEBUG to the compile options.
If you use your own assert, do something like this:

Code:

#ifdef NDEBUG
#define MY_ASSERT(condition) ((void)0)
#else
#define MY_ASSERT(condition) \
    do { if (!(condition)) { /* report the failure and abort */ } } while (0)
#endif
Ah, thanks for that. I just put:

Code:

#define assert(x)
...right after including assert.h in my main header file.
abulmo2
Posts: 467
Joined: Fri Dec 16, 2016 11:04 am
Location: France
Full name: Richard Delorme

Re: Profiling a mediocre engine

Post by abulmo2 »

In Dumb 1.9, I gathered the time spent per module:

Code:

board                  68.44 %
move                   14.05 %
search                  8.74 %
eval                    6.39 %
util                    0.85 %
D lib                   1.42 %
C lib & kernel call     0.11 %
The board module does all the computation related to move generation, check/pin detection, Zobrist key computation, board make/unmake, SEE, etc.
The move module does the move list manipulation, move scoring and sorting.
The search module contains the search function and the transposition table storing & probing.
The eval module contains the (simple) evaluation function.

The times were measured using the integrated benchmark in Dumb at depth 20 and the Linux perf tool:

Code:

perf record dumb -b 20
perf report | ddemangle >  perf-b20.txt
Dumb is a simple engine (1,200 lines of code in the D language, which I hope are quite readable), but it is about 100 Elo stronger than Myrddin on the CCRL 40/15 list.

So in Dumb, most of the time is spent in the board module. That does not mean move generation is slow in Dumb, but rather that the rest is very simple and fast to compute. The evaluation is just material plus a piece-square table, updated incrementally on each move. Calling the evaluation function is so cheap that there is no need for a transposition table in the qsearch function, which makes the time spent in the search module less important.
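
The incremental update amounts to something like this (a C sketch of the idea only; Dumb itself is written in D and these names are made up):

Code:

enum { NO_PIECE = -1 };

int pst[12][64];      /* piece-square table, indexed by piece then square */
int piece_value[12];  /* material value per piece (both colours)          */

/* Adjust the running evaluation when a move is made, instead of
   recomputing material + PST from scratch at every node. */
void update_eval(int *score, int piece, int from, int to,
                 int captured, int capture_sq)
{
    *score += pst[piece][to] - pst[piece][from];
    if (captured != NO_PIECE)
        *score -= piece_value[captured] + pst[captured][capture_sq];
}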

Dumb's code & releases are available on GitHub.
Richard Delorme