So here are my tests.
in this position
[d]r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 1
at depth 3 we have 97862 nodes visited
My first engine's(Chenglite) flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
45.45 0.05 0.05 100362 0.00 0.00 MakeMove
45.45 0.10 0.05 48 1.04 2.29 Perft
9.09 0.11 0.01 107164 0.00 0.00 IsSquareAttacked
0.00 0.11 0.00 100362 0.00 0.00 AddMove
0.00 0.11 0.00 2088 0.00 0.00 GenerateMoves
0.00 0.11 0.00 2 0.00 0.00 GetTimeMs
0.00 0.11 0.00 1 0.00 0.00 InitSearch
0.00 0.11 0.00 1 0.00 0.00 ParseFen
0.00 0.11 0.00 1 0.00 110.00 PerftTest
% the percentage of the total running time of the
time program used by this function.
which is pretty satisfying
Chenglite's design:
0x88 board representation
VICE - like move generator, but copy-make make move approach instead of make/unmake and NO PIECE LIST
my second engine's(Tighty) flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
77.85 3.55 3.55 4280842 0.00 0.00 B // eval position to score moves for move ordering
13.38 4.16 0.61 100507 0.00 0.00 G // generate moves
5.37 4.41 0.24 1 0.24 4.55 F
1.86 4.49 0.09 8530175 0.00 0.00 M //make move is called twice - first time in movegen, second - in perft
1.32 4.55 0.06 8530175 0.00 0.00 U //unmake move
0.11 4.55 0.01 X
0.11 4.56 0.01 frame_dummy
0.00 4.56 0.00 1 0.00 0.00 A // parse fen
% the percentage of the total running time of the
time program used by this function.
almost 5 times slower...
Tighty's design is similar to micro-Max
it uses 3 nested loops in move gen, just like microMax but has separated make/unmake move routines and move list
Tighty is a king capture engine just like micro-Max so it doesn't have is_square_attacked() routine.
The first thing confused me the most at this point is number of movegen calls - 2088 in Chenglite vs 100507 in Tighty, WTF?????
please ignore eval() calls in Tighty's movegen, it takes the most of time, but doesn't matter at the moment, also make/unmake move is called twice - first in movegen for under promotion(just like in micro-Max Tighty is OR'ing piece in target square if this is a promoted pawn AFTER the move is made) and move scoring purposes which is quite inefficient practically but allows to ignore all known move-ordering techniques and sort the move list before the very first search iteration. So I'll emphasize the question again - movegen calls - 2088 in Chenglite vs 100507 in Tighty - how on earth is that possible???
And at the end the engine I'm currently working at(MCP - minimalist chess program) flat profile:
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
68.85 1.26 1.26 673873 0.00 0.00 square_att
13.11 1.50 0.24 99950 0.00 0.00 generate_moves
9.84 1.68 0.18 1 0.18 1.83 perft
5.46 1.78 0.10 100362 0.00 0.00 make_move
2.73 1.83 0.05 4237213 0.00 0.00 add_move
0.00 1.83 0.00 1 0.00 0.00 parse_fen
% the percentage of the total running time of the
time program used by this function.
it's at least 3 times faster than Tighty, but still 2 times slower than Chenglite which I consider to be strange...
MCP design.
0x88 board representation
3 nested loops in movegen, like micro-Max, BUT NO make/unmake move in movegen(underpromotions are now generated separately from making moves) no eval for move scoring. IMPORTANT: MCP compare to Tighty and micro-Max is NOT a KING CAPTURE engine, it has is_square_attacked function written in the style of move gen - same 3 nested loops(over pieces, directions and squares). It has PIECE LIST(the very first piece list in my chess programming life) which is the same as VICE's piece list - piecec[piece][pieceNum[piece]] (pieceNum array is used to iterate over pieces of the same piece type). BUT generate moves is called 99950 times!!!! it's less than 100507 times in tighty (I believe this is because is_square_attacked saves some movegen calls) but still much more than in Chenglite - 2088 calls.
I REALLY WONDER WHY is that happening. I mean could someone please kindly explain to me why the difference in move gen (and is_square_attecked) calls is so HUGE? What am I doing wrong and what particularly influences the number of function calls?
