Profiling a mediocre engine


JVMerlino
Posts: 1397
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Profiling a mediocre engine

Post by JVMerlino »

I did some profiling (using Very Sleepy) on the latest version of Myrddin and came up with some results that I was hoping to get some feedback/comparison on from the community. The data came from a 3-minute search from the initial position.

59.5% of the time is spent in qsearch
37.0% of the time is spent in move generation
37.7% of the time is spent in evaluation
19.7% of the time is spent performing a BitScan
8.1% of the time is spent probing various hashes (transposition, eval, pawn)
6.0% of the time is spent determining the attackers for a given square
5.1% of the time is spent in make/unmake
3.8% of the time is spent getting the next move in the ordered move list
3.6% of the time is spent determining if a King is in check
Move generation spends more than half of its time generating pawn moves

Obviously there's a lot of overlap here, which is why the percentages add up to well over 100% but still make sense. :-)
But does any of this raise big red flags with you? Thanks for your time....
jm
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Profiling a mediocre engine

Post by mvanthoor »

Try a position such as KiwiPete. It's much less quiet than the starting position, so your time spent in qsearch should go up. Judging from this run, the engine seems to be spending too much time in move generation; it should be well under a third of the total.
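
For reference, the KiwiPete FEN (perft "position 2" on the Chess Programming Wiki) is:

Code:

r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 1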
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
Bo Persson
Posts: 257
Joined: Sat Mar 11, 2006 8:31 am
Location: Malmö, Sweden
Full name: Bo Persson

Re: Profiling a mediocre engine

Post by Bo Persson »

It points at where an effort might make a difference.

Obviously you shouldn't start by rewriting make/unmake or move ordering in assembly. But if you can improve move ordering so that it produces a better order (rather than just making the ordering code 1% faster), that might reduce the time spent in qsearch.
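
For example, even a plain MVV-LVA score for captures (a sketch only, not Myrddin's actual code) usually shrinks the tree far more than shaving cycles off the sorting loop:

Code:

/* MVV-LVA: prefer capturing the most valuable victim with the least
   valuable attacker. Piece values are only illustrative. */
static const int value[6] = { 100, 320, 330, 500, 900, 20000 }; /* P N B R Q K */

int mvv_lva(int victim, int attacker)
{
    return value[victim] * 10 - value[attacker];
}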

Otherwise, move generation and BitScan look like places that could use some trimming.

In the old Chess 0.5 from Byte magazine, the portable bitscan routine took over 40% of the runtime. Rewriting it with non-portable bit-ops made it come down to essentially zero. That makes me see 19.7% here as a tad much.
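
For comparison, this is roughly what the difference looks like in C (a generic sketch, not the Chess 0.5 code; both functions assume a non-empty bitboard):

Code:

#include <stdint.h>

/* Portable but slow: one loop iteration per trailing zero bit. */
int bitscan_portable(uint64_t bb)
{
    int sq = 0;
    while ((bb & 1) == 0) {
        bb >>= 1;
        sq++;
    }
    return sq;
}

/* Non-portable but essentially free: compiles to a single instruction
   on modern hardware (GCC/Clang builtin shown here). */
int bitscan_fast(uint64_t bb)
{
    return __builtin_ctzll(bb);
}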
Tearth
Posts: 70
Joined: Thu Feb 25, 2021 5:12 pm
Location: Poland
Full name: Pawel Osikowski

Re: Profiling a mediocre engine

Post by Tearth »

Statistics for Inanis (I've preserved the order), using my set of benchmark positions:

36% of the time is spent in qsearch
18% of the time is spent in move generation
14% of the time is spent on evaluation
0% of the time is spent performing a BitScan
17% of the time is spent probing various hashes (transposition, eval, pawn)
12% of the time is spent determining the attackers for a given square
11% of the time is spent in make/unmake
0% of the time is spent getting the next move in the ordered move list
4% of the time is spent determining if a King is in check

A few notes:
- the majority of time spent in evaluation is on calculating mobility (which also covers king safety, as I do both at the same time - see the sketch after these notes)
- bit scan is based on tzcnt instruction, so it's pretty much instant
- "% of the time is spent getting the next move in the ordered move list" is included in "% of the time is spent in move generation", because I have a staged move generator and everything is tightly coupled
- I didn't notice any time spent probing the pawn hash table; I suppose it sits in the cache, so the effect is negligible
- make_move is two times slower than undo_move
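
A rough sketch of what I mean by doing mobility and king safety in the same pass (illustrative C only, not the actual Inanis code):

Code:

#include <stdint.h>

/* Given a piece's attack bitboard (however it was generated), accumulate
   mobility and attacks on the enemy king zone in one pass instead of two. */
void eval_piece_attacks(uint64_t attacks, uint64_t own_pieces,
                        uint64_t enemy_king_zone,
                        int *mobility, int *king_attacks)
{
    *mobility     += __builtin_popcountll(attacks & ~own_pieces);
    *king_attacks += __builtin_popcountll(attacks & enemy_king_zone);
}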
j.t.
Posts: 263
Joined: Wed Jun 16, 2021 2:08 am
Location: Berlin
Full name: Jost Triller

Re: Profiling a mediocre engine

Post by j.t. »

What's a good program for accurate profiling? I use Valgrind with Callgrind and KCachegrind, but often it's not clear where exactly most of the time is spent, especially with release builds.
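
For context, the workflow I mean is roughly the standard one (./engine bench is just a placeholder for whatever benchmark command the engine offers):

Code:

valgrind --tool=callgrind ./engine bench
kcachegrind callgrind.out.<pid>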

Anyway, the limited information I got is:

- 53% is used by the evaluation
- 18% by the search
- 8% SEE
- 5% doMove
- 5% inCheck
- 2% generate captures
- 1% generate pawn captures
- 1% get hash entry for search
- 1% quiesce
- 1% generate quiet moves
tcusr
Posts: 325
Joined: Tue Aug 31, 2021 10:32 pm
Full name: tcusr

Re: Profiling a mediocre engine

Post by tcusr »

j.t. wrote: Fri Apr 22, 2022 5:31 pm What's a good program for accurate profiling? I use Valgrind with Callgrind and KCachegrind, but often it's not clear where exactly most of the time is spent, especially with release builds.

Anyway, the limited information I got is:

- 53% is used by the evaluation
- 18% by the search
- 8% SEE
- 5% doMove
- 5% inCheck
- 2% generate captures
- 1% generate pawn captures
- 1% get hash entry for search
- 1% quiesce
- 1% generate quiet moves
add -g and -fno-omit-frame-pointer to keep track of inlined functions in release mode.
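
For example, with GCC or Clang something like this keeps optimizations on while still giving the profiler usable symbol and frame information (file names are just placeholders):

Code:

gcc -O2 -g -fno-omit-frame-pointer -o engine *.c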
JVMerlino
Posts: 1397
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Re: Profiling a mediocre engine

Post by JVMerlino »

First of all, thanks to Tearth and Bo Persson, I switched from BitScanForward64() to _tzcnt_u64() and got an OVERALL 6% increase in engine speed. Thank you for the handful of elo! :D :wink:
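
For anyone curious, the pattern is essentially this (a sketch of the idea, not the exact Myrddin code; _tzcnt_u64 needs BMI1, so compile with the appropriate architecture flags):

Code:

#include <immintrin.h>   /* _tzcnt_u64 */

/* Pop and return the index of the lowest set bit; assumes bb != 0. */
static inline int pop_lsb(unsigned long long *bb)
{
    int sq = (int)_tzcnt_u64(*bb);  /* single TZCNT instruction */
    *bb &= *bb - 1;                 /* clear that bit */
    return sq;
}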

Unfortunately, I have discovered that asserts in the code can wildly affect the stats produced by Very Sleepy. After making the above change, BitScan was still showing 3.3% overall usage. However, more than 3/4 of that was spent on the assert. :oops: So, after making the above BitScan change and then converting all asserts to noops, here are the results for KiwiPete:
67.0% of the time is spent in qsearch
37.0% of the time is spent in move generation
31.3% of the time is spent in evaluation
0.6% of the time is spent performing a BitScan - THANKS AGAIN!
9.5% of the time is spent probing various hashes (transposition, eval, pawn)
7.8% of the time is spent determining the attackers for a given square
6.3% of the time is spent in make/unmake
5.5% of the time is spent getting the next move in the ordered move list - big increase here...hmmm....almost certainly bad move ordering :(
2.5% of the time is spent determining if a King is in check
And 41.4% of move generation time is spent on pawn moves
Last edited by JVMerlino on Fri Apr 22, 2022 7:36 pm, edited 1 time in total.
tcusr
Posts: 325
Joined: Tue Aug 31, 2021 10:32 pm
Full name: tcusr

Re: Profiling a mediocre engine

Post by tcusr »

JVMerlino wrote: Fri Apr 22, 2022 7:13 pm First of all, thanks to Tearth and Bo Persson, I switched from BitScanForward64() to _tzcnt_u64() and got an OVERALL 6% increase in engine speed. Thank you! :D

Unfortunately, I have discovered that asserts in the code can wildly affect the stats produced by Very Sleepy. After making the above change, BitScan was still showing 3.3% overall usage. However, more than 3/4 of that was spent on the assert. :oops: So, after making the above BitScan change and then converting all asserts to noops, here are the results for KiwiPete:
58.6% of the time is spent in qsearch
35.6% of the time is spent in move generation
30.7% of the time is spent in evaluation
0.6% of the time is spent performing a BitScan - THANKS AGAIN!
10.7% of the time is spent probing various hashes (transposition, eval, pawn)
7.6% of the time is spent determining the attackers for a given square
5.8% of the time is spent in make/unmake - 5.9% relative increase
6.0% of the time is spent getting the next move in the ordered move list - big increase here...hmmm....
2.5% of the time is spent determining if a King is in check
And 40.7% of move generation time is spent on pawn moves
If you're using assert() from assert.h, you can disable it by adding -DNDEBUG to the compile options.
If you use your own assert, do something like this:

Code:

#ifdef NDEBUG
#define MY_ASSERT(condition) ((void)0)
#else
#define MY_ASSERT(condition) \
    do { if (!(condition)) { /* report the failure and abort */ } } while (0)
#endif
JVMerlino
Posts: 1397
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Re: Profiling a mediocre engine

Post by JVMerlino »

tcusr wrote: Fri Apr 22, 2022 7:20 pm If you're using assert() from assert.h, you can disable it by adding -DNDEBUG to the compile options.
If you use your own assert, do something like this:

Code:

#ifdef NDEBUG
#define MY_ASSERT(condition) ((void)0)
#else
#define MY_ASSERT(condition) \
    do { if (!(condition)) { /* report the failure and abort */ } } while (0)
#endif
Ah, thanks for that. I just put:

Code:

#define assert(x)
...right after including assert.h in my main header file.
abulmo2
Posts: 467
Joined: Fri Dec 16, 2016 11:04 am
Location: France
Full name: Richard Delorme

Re: Profiling a mediocre engine

Post by abulmo2 »

In Dumb 1.9, I gathered the time spent per module:

Code:

board                  68.44 %
move                   14.05 %
search                  8.74 %
eval                    6.39 %
util                    0.85 %
D lib                   1.42 %
C lib & kernel call     0.11 %
The board module does all the computation related to move generation, check/pin detection, Zobrist key computation, board make/unmake, SEE, etc.
The move module does the move list manipulation, move scoring and sorting.
The search module contains the search function and the transposition table storing & probing.
The eval module contains the (simple) evaluation function.

The times were measured using the integrated benchmark in Dumb at depth 20 and the Linux perf tool:

Code:

perf record dumb -b 20
perf report | ddemangle >  perf-b20.txt
Dumb is a simple engine (1,200 lines of code in the D language, which I hope are quite readable), but it is about 100 Elo stronger than Myrddin on the CCRL 40/15 list.

So in Dumb, most of the time is spent in the board module. That does not mean move generation is slow in Dumb, but rather that the rest is very simple and fast to compute. The evaluation is just material plus a piece-square table, updated incrementally on each move. Calling the evaluation function is so cheap that there is no need for a transposition table in the qsearch function, which makes the time spent in the search module less important.
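
The incremental update amounts to something like this (a C sketch of the idea only; Dumb itself is written in D and these names are made up):

Code:

enum { NO_PIECE = -1 };

int pst[12][64];      /* piece-square table, indexed by piece then square */
int piece_value[12];  /* material value per piece (both colours)          */

/* Adjust the running evaluation when a move is made, instead of
   recomputing material + PST from scratch at every node. */
void update_eval(int *score, int piece, int from, int to,
                 int captured, int capture_sq)
{
    *score += pst[piece][to] - pst[piece][from];
    if (captured != NO_PIECE)
        *score -= piece_value[captured] + pst[captured][capture_sq];
}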

Dumb's code & releases are available on GitHub.
Richard Delorme