help - gprof output

Discussion of chess software programming and technical issues.

Moderator: Ras

Chan Rasjid
Posts: 588
Joined: Thu Mar 09, 2006 4:47 pm
Location: Singapore

help - gprof output

Post by Chan Rasjid »

Hello,

I am trying to trim things down as my cowrie nps is far below those of others; on my computer (Debian linux athlon64 1 core) in the early positions from startpos:-
1) Crafty 23.3 - 1.3/ 2.0 M nps; I suspect the nps count has error.
2) Robbolit - 1M nps
3) Stockfish - 800,000/900,000 nps.
4) Spark - 600,000/700,000 nps.
5) my Cowrie - 300,000 - 350,000 nps.
6) Gaviota - 250,000 - 300,000 nps.

The is from gcc gprof:
Flat profile:

Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
38.16 7.90 7.90 18218398 0.00 0.00 eval
9.62 9.89 1.99 11308265 0.00 0.00 gen_qs
8.29 11.60 1.72 bitcnt
7.25 13.10 1.50 1833258 0.00 0.00 gen_quiet_moves
6.24 14.39 1.29 13428251 0.00 0.00 probehash
4.71 15.37 0.98 2037 0.00 0.01 search_pvs
4.59 16.32 0.95 8654368 0.00 0.00 search_quiesce
3.92 17.13 0.81 bitScanForward
2.85 17.72 0.59 13824695 0.00 0.00 swapBestMove
2.56 18.25 0.53 unmake
2.27 18.72 0.47 10573508 0.00 0.00 updatehash
1.96 19.12 0.41 makemove
1.74 19.48 0.36 init_material_table
1.40 19.77 0.29 723374 0.00 0.00 gen_evade_check
0.97 19.97 0.20 ev_pawn_shelter_w
0.80 20.14 0.17 is_incheck
0.65 20.27 0.14 xyRookAttack
0.58 20.39 0.12 2919456 0.00 0.00 swapMoveFront
0.51 20.50 0.11 2 0.05 0.05 allocatehash
0.29 20.56 0.06 2 0.03 0.03 initTT
0.15 20.59 0.03 455647 0.00 0.00 set_move_score
0.12 20.61 0.03 is_sq_attacked_brq
0.10 20.63 0.02 1 0.02 8.17 uci
0.07 20.65 0.02 257295 0.00 0.00 eval_bishop_ah7_trap
Can someone confirm the following:
1) % time of a function exclude the time spent within child calls that are profiled and listed. eg, search_pvs() and search_quiesce() exclude time spent in probehash();
2) the 6.24% of probehash() seems costly; it only probes TT 4 times and extract entry data only; it is more costly than the full search search_pvs() ! Is there anything suspicious?

Thanks,
Rasjid
Gian-Carlo Pascutto
Posts: 1260
Joined: Sat Dec 13, 2008 7:00 pm

Re: help - gprof output

Post by Gian-Carlo Pascutto »

Chan Rasjid wrote: 1) % time of a function exclude the time spent within child calls that are profiled and listed. eg, search_pvs() and search_quiesce() exclude time spent in probehash();
Yup. If you want more details about that kind of data, there are far better profilers than gprof. (perf inside modern kernels, callgrind, ...)
2) the 6.24% of probehash() seems costly; it only probes TT 4 times and extract entry data only; it is more costly than the full search search_pvs() ! Is there anything suspicious?
Sounds perfectly normal. Random memory access is extremely slow in a modern CPU, think hundreds of cycles.
Chan Rasjid
Posts: 588
Joined: Thu Mar 09, 2006 4:47 pm
Location: Singapore

Re: help - gprof output

Post by Chan Rasjid »

Thanks Gian,

I am not yet at the level to do fine grain profiling.

Ok, I have some idea of what to do about speed.

Rasjid.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: help - gprof output

Post by bob »

Chan Rasjid wrote:Hello,

I am trying to trim things down as my cowrie nps is far below those of others; on my computer (Debian linux athlon64 1 core) in the early positions from startpos:-
1) Crafty 23.3 - 1.3/ 2.0 M nps; I suspect the nps count has error.
What does that mean? In crafty, it is simply total nodes searched divided by total time used. Hard to have any error in that. I also don't quite understand the "1.3 / 2.0" numbers. What are they?

2) Robbolit - 1M nps
3) Stockfish - 800,000/900,000 nps.
4) Spark - 600,000/700,000 nps.
5) my Cowrie - 300,000 - 350,000 nps.
6) Gaviota - 250,000 - 300,000 nps.

The is from gcc gprof:
Flat profile:

Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
38.16 7.90 7.90 18218398 0.00 0.00 eval
9.62 9.89 1.99 11308265 0.00 0.00 gen_qs
8.29 11.60 1.72 bitcnt
7.25 13.10 1.50 1833258 0.00 0.00 gen_quiet_moves
6.24 14.39 1.29 13428251 0.00 0.00 probehash
4.71 15.37 0.98 2037 0.00 0.01 search_pvs
4.59 16.32 0.95 8654368 0.00 0.00 search_quiesce
3.92 17.13 0.81 bitScanForward
2.85 17.72 0.59 13824695 0.00 0.00 swapBestMove
2.56 18.25 0.53 unmake
2.27 18.72 0.47 10573508 0.00 0.00 updatehash
1.96 19.12 0.41 makemove
1.74 19.48 0.36 init_material_table
1.40 19.77 0.29 723374 0.00 0.00 gen_evade_check
0.97 19.97 0.20 ev_pawn_shelter_w
0.80 20.14 0.17 is_incheck
0.65 20.27 0.14 xyRookAttack
0.58 20.39 0.12 2919456 0.00 0.00 swapMoveFront
0.51 20.50 0.11 2 0.05 0.05 allocatehash
0.29 20.56 0.06 2 0.03 0.03 initTT
0.15 20.59 0.03 455647 0.00 0.00 set_move_score
0.12 20.61 0.03 is_sq_attacked_brq
0.10 20.63 0.02 1 0.02 8.17 uci
0.07 20.65 0.02 257295 0.00 0.00 eval_bishop_ah7_trap
Can someone confirm the following:
1) % time of a function exclude the time spent within child calls that are profiled and listed. eg, search_pvs() and search_quiesce() exclude time spent in probehash();
2) the 6.24% of probehash() seems costly; it only probes TT 4 times and extract entry data only; it is more costly than the full search search_pvs() ! Is there anything suspicious?

Thanks,
Rasjid
Chan Rasjid
Posts: 588
Joined: Thu Mar 09, 2006 4:47 pm
Location: Singapore

Re: help - gprof output

Post by Chan Rasjid »

Crafty:
White(1): go
time limit 1.50 (+0.00) (9.00)
depth time score variation (1)
15-> 1.47 0.35 1. Nf3 Nf6 2. Nc3 Nc6 3. e4 e5 4. d4
exd4 5. Nxd4 Bc5 6. Nxc6 bxc6 7. e5
Qe7 8. Bf4
time=1.60 mat=0 n=1921951 fh=91% nps=1.2M
extensions=5K qchecks=21K reduced=269K pruned=279K
predicted=0 evals=989K 50move=0 EGTBprobes=0 hits=0
SMP-> splits=0 aborts=0 data=0/64 elap=1.60
White(1): Nf3
time used: 1.60
Black(1): time 5839
time remaining: 58.39 (Crafty).
Black(1): otim 5772
time remaining: 57.72 (opponent).
Black(1): Nf6
time used: 2.24
time limit 1.45 (+0.00) (8.70)
depth time score variation (1)
15-> 1.09 0.18 2. Nc3 Nc6 3. e4 e5 4. Bb5 Bc5 5. O-O
O-O 6. Nxe5 Nxe5 7. d4 Bd6 8. dxe5
Bxe5 9. Bg5
time=1.45 mat=0 n=1879017 fh=90% nps=1.3M
extensions=5K qchecks=19K reduced=249K pruned=341K
predicted=0 evals=901K 50move=2 EGTBprobes=0 hits=0
SMP-> splits=0 aborts=0 data=0/64 elap=1.45
White(2): Nc3
time used: 1.45
Black(2): time 5692
time remaining: 56.92 (Crafty).
Black(2): otim 5731
time remaining: 57.31 (opponent).
Black(2): d5
time used: 0.38
time limit 1.42 (+0.00) (8.52)
depth time score variation (1)
13-> 1.10 0.24 3. e3 e6 4. Bd3 Bc5 5. O-O O-O 6. Ng5
Na6 7. a3 g6 8. Qe2 d4 9. Nce4 Nxe4
10. Nxe4
time=1.45 mat=0 n=2010725 fh=91% nps=1.4M
extensions=7K qchecks=30K reduced=281K pruned=415K
predicted=0 evals=936K 50move=0 EGTBprobes=0 hits=0
SMP-> splits=0 aborts=0 data=0/64 elap=1.45
White(3): e3
time used: 1.45
Black(3): time 5548
time remaining: 55.48 (Crafty).
Black(3): otim 5591
time remaining: 55.91 (opponent).
Black(3): Nc6
time used: 1.36
time limit 1.38 (+0.00) (8.28)
depth time score variation (1)
14-> 1.81 0.11 4. d4 e6 5. Bd3 Bd6 6. O-O O-O 7. a3
Bd7 8. Bd2 a6 9. Qe2 Ng4 10. h3 Nf6
time=1.81 mat=0 n=2719874 fh=93% nps=1.5M
extensions=8K qchecks=50K reduced=362K pruned=598K
predicted=0 evals=1.2M 50move=1 EGTBprobes=0 hits=0
SMP-> splits=0 aborts=0 data=0/64 elap=1.81
White(4): d4
time used: 1.81
Black(4): time 5365
time remaining: 53.65 (Crafty).
Black(4): otim 5548
time remaining: 55.48 (opponent).
Black(4): g6
time used: 0.40
time limit 1.34 (+0.00) (8.04)
depth time score variation (1)
14-> 1.32 0.46 5. Ne5 Bg7 6. Nxc6 bxc6 7. Bd3 O-O
8. O-O Rb8 9. Rb1 Bg4 10. f3 Be6 11.
Bd2 Qd6 <HT>
time=1.37 mat=0 n=2136327 fh=94% nps=1.6M
extensions=6K qchecks=31K reduced=306K pruned=598K
predicted=0 evals=833K 50move=0 EGTBprobes=0 hits=0
SMP-> splits=0 aborts=0 data=0/64 elap=1.37
White(5): Ne5
time used: 1.37
Black(5): time 5228
time remaining: 52.28 (Crafty).
Black(5): otim 5369
time remaining: 53.69 (opponent).
Black(5): Bg7
time used: 1.76
time limit 1.30 (+0.00) (7.80)
depth time score variation (1)
15 1.03 0.49 6. Nxc6 bxc6 7. Be2 O-O 8. h4 Rb8 9.
h5 gxh5 10. Bxh5 Nxh5 11. Qxh5 h6 12.
Rh4 Bf6 13. Rf4
time=1.36 mat=0 n=2313718 fh=95% nps=1.7M
extensions=7K qchecks=36K reduced=321K pruned=667K
predicted=0 evals=840K 50move=2 EGTBprobes=0 hits=0
SMP-> splits=0 aborts=0 data=0/64 elap=1.36
White(6): Nxc6
time used: 1.36
Black(6): time 5091
time remaining: 50.91 (Crafty).
Black(6): otim 5134
time remaining: 51.34 (opponent).
Black(6): bxc6
time used: 2.32
time limit 1.30 (+0.00) (7.80)
depth time score variation (1)
14 0.53 0.35 7. Be2 O-O 8. g4 Rb8 9. O-O Ne4 10.
Nxe4 dxe4 11. f4 exf3 12. Bxf3 e5 13.
Bxc6 exd4 14. exd4 Bxd4+
14-> 0.67 0.35 7. Be2 O-O 8. g4 Rb8 9. O-O Ne4 10.
Nxe4 dxe4 11. f4 exf3 12. Bxf3 e5 13.
Bxc6 exd4 14. exd4 Bxd4+
time=1.34 mat=0 n=1997341 fh=94% nps=1.5M
extensions=2K qchecks=8K reduced=255K pruned=429K
predicted=0 evals=931K 50move=0 EGTBprobes=0 hits=0
SMP-> splits=0 aborts=0 data=0/64 elap=1.34
White(7): Be2
time used: 1.34
Black(7): time 4957
time remaining: 49.57 (Crafty).
Black(7): otim 5049
time remaining: 50.49 (opponent).
Black(7): O-O
time used: 0.83
time limit 1.27 (+0.00) (7.62)
depth time score variation (1)
14 1.88 0.39 8. O-O Re8 9. f4 Qd6 10. g4 Be6 11.
g5 Nd7 12. Bg4 Bxg4 13. Qxg4 h5 14.
gxh6 Bxh6
time=1.88 mat=0 n=2867780 fh=94% nps=1.5M
extensions=2K qchecks=7K reduced=405K pruned=698K
predicted=0 evals=1.2M 50move=2 EGTBprobes=0 hits=0
SMP-> splits=0 aborts=0 data=0/64 elap=1.88
White(8): O-O
time used: 1.88
Black(8): time 4769
time remaining: 47.69 (Crafty).
Black(8): otim 5006
time remaining: 50.06 (opponent).
Black(8): Re8
time used: 0.40
time limit 1.22 (+0.00) (7.32)
depth time score variation (1)
14-> 1.13 0.35 9. f4 Rb8 10. g4 Nd7 11. b3 e6 12.
Ba3 c5 13. Qd2 cxd4 14. exd4 Qh4 15.
g5 Bb7
time=1.33 mat=0 n=2334299 fh=93% nps=1.8M
extensions=2K qchecks=7K reduced=310K pruned=642K
predicted=0 evals=970K 50move=4 EGTBprobes=0 hits=0
SMP-> splits=0 aborts=0 data=0/64 elap=1.33
White(9): f4
time used: 1.33
Black(9): time 4636
time remaining: 46.36 (Crafty).
Black(9): otim 4456
time remaining: 44.56 (opponent).
Black(9): a5
time used: 5.47
time limit 1.18 (+0.00) (7.08)
depth time score variation (1)
12-> 1.01 0.49 10. b3 Qd6 11. Bb2 Be6 12. Qd2 Bf5
13. a4 Rab8 14. Ba3 Qe6 15. Bc5 Ng4
time=1.20 mat=0 n=1962523 fh=93% nps=1.6M
extensions=2K qchecks=4K reduced=213K pruned=402K
predicted=0 evals=947K 50move=0 EGTBprobes=0 hits=0
SMP-> splits=0 aborts=0 data=0/64 elap=1.20
White(10): b3
time used: 1.20
Black(10): time 4516
time remaining: 45.16 (Crafty).
Black(10): otim 4421
time remaining: 44.21 (opponent).
Black(10): c5
time used: 0.32
time limit 1.15 (+0.00) (6.90)
depth time score variation (1)
9 0.12 0.17 11. Bb5 Bd7 12. Bxd7 Qxd7 13. Rb1 Ng4
14. Nxd5 Qxd5 15. Qxg4 cxd4
9 0.16 0.35 11. Rb1 cxd4 12. exd4 Ne4 13. Nxe4
dxe4 14. Be3 Be6 15. Qd2
9-> 0.17 0.35 11. Rb1 cxd4 12. exd4 Ne4 13. Nxe4
dxe4 14. Be3 Be6 15. Qd2
10 0.18 0.31 11. Rb1 cxd4 12. exd4 Ne4 13. Nxe4
dxe4 14. Be3 Be6 15. Bb5 Rf8
10-> 0.22 0.31 11. Rb1 cxd4 12. exd4 Ne4 13. Nxe4
dxe4 14. Be3 Be6 15. Bb5 Rf8
11 0.27 0.15 11. Rb1 cxd4 12. exd4 Bf5 13. Re1 Qd6
14. Be3 e6 15. Nb5 Qc6 16. Bd3
11 0.42 0.34 11. Re1 Bf5 12. Bb2 Qd6 13. Nb5 Qb6
14. dxc5 Qxc5 15. Bd4 Qc6 16. Bd3
11-> 0.49 0.34 11. Re1 Bf5 12. Bb2 Qd6 13. Nb5 Qb6
14. dxc5 Qxc5 15. Bd4 Qc6 16. Bd3
12 0.63 0.30 11. Re1 cxd4 12. exd4 Ne4 13. Nxe4
dxe4 14. Be3 Ba6 15. Qd2 e6 16. c4
Rb8
12-> 0.89 0.30 11. Re1 cxd4 12. exd4 Ne4 13. Nxe4
dxe4 14. Be3 Ba6 15. Qd2 e6 16. c4
Rb8
time=1.22 mat=0 n=2386496 fh=92% nps=2.0M
extensions=4K qchecks=7K reduced=278K pruned=633K
predicted=0 evals=963K 50move=0 EGTBprobes=0 hits=0
SMP-> splits=0 aborts=0 data=0/64 elap=1.22
White(11): Re1
time used: 1.22
Black(11): time 4393
time remaining: 43.93 (Crafty).
The above shows the nps for Crafty increasing from move 1 to move 10.


Robolitto move 17, nps 1054000 :

1280475435.292 Engine->Adapter: info depth 15 seldepth 35 score cp 9 time 5495 nodes 5796409 nps 1054000 pv a7a5 c3a4 f6e4 e2d3 c8f5 d3e4 f5e4 a4c5 d8d6 c5e4 d5e4 b2b3 a5a4 c1b2 a4a3 b2c3 c6c5 d1d2 c5d4 c3d4 g7d4 d2d4 d6d4 e3d4
1280475435.292 Engine->Adapter: bestmove a7a5 ponder c3a4

Crafty move 17 nps=2.1M:
Black(17): Qd6
time used: 0.86
time limit 0.98 (+0.00) (5.88)
depth time score variation (1)
15 0.87 -0.78 18. Bc4 c5 19. g4 Bc8 20. Qf3 Bxd4+
21. Bxd4 Qxd4+ 22. Kh1 Bxg4 23. Qxf7+
Kh8 24. Rac1 Rf8 25. c3 Rxf7 26. cxd4
Bf3+ 27. Kg1
time=1.07 mat=0 n=2277004 fh=93% nps=2.1M
extensions=23K qchecks=55K reduced=288K pruned=660K
predicted=0 evals=749K 50move=1 EGTBprobes=0 hits=0
SMP-> splits=0 aborts=0 data=0/64 elap=1.07

It seems unlikely for top programs to have an nps difference by factor of 2:1.

Rasjid
User avatar
Volker Annuss
Posts: 180
Joined: Mon Sep 03, 2007 9:15 am

Re: help - gprof output

Post by Volker Annuss »

Chan Rasjid wrote:It seems unlikely for top programs to have an nps difference by factor of 2:1.
Why?
kempelem

Re: help - gprof output

Post by kempelem »

Top programs have big evaluation functions evaluate interior nodes and they have very agresive pruring.

Some of this programs discard the moves before evaluate! or even before make_move!! (skiping make and unmake) but counting it

Others cant do this... :wink:

escuse my english
Chan Rasjid
Posts: 588
Joined: Thu Mar 09, 2006 4:47 pm
Location: Singapore

Re: help - gprof output

Post by Chan Rasjid »

Volker Annuss wrote:
Chan Rasjid wrote:It seems unlikely for top programs to have an nps difference by factor of 2:1.
Why?
I don't mean it too seriously; but a factor of x 2 seems too good. Top programs are generally well designed and they should be near in speed. What can bring down the nps by 50% is evaluation. So the SECRET of Robbo, SF should then be evaluation as both are about as "slow" which means they have very much more in their evaluation than Crafty.
Some of this programs discard the moves before evaluate! or even before make_move!! (skiping make and unmake) but counting it
Normally these skipped moves, hash hits are not counted.

Rasjid
Gian-Carlo Pascutto
Posts: 1260
Joined: Sat Dec 13, 2008 7:00 pm

Re: help - gprof output

Post by Gian-Carlo Pascutto »

I don't mean it too seriously; but a factor of x 2 seems too good. Top programs are generally well designed and they should be near in speed. What can bring down the nps by 50% is evaluation. So the SECRET of Robbo, SF should then be evaluation as both are about as "slow" which means they have very much more in their evaluation than Crafty.
No.
Normally these skipped moves, hash hits are not counted.
The amount of work per node still affects NPS greatly. More pruning will lower the NPS: you do more work and you generate less successor positions.
Chan Rasjid
Posts: 588
Joined: Thu Mar 09, 2006 4:47 pm
Location: Singapore

Re: help - gprof output

Post by Chan Rasjid »

Gian-Carlo Pascutto wrote:
Normally these skipped moves, hash hits are not counted.
The amount of work per node still affects NPS greatly. More pruning will lower the NPS: you do more work and you generate less successor positions.
Nps is inversely proportional to the "average work done within each node visited".
1) 70-80% of nodes are QS nodes and the heaviest workload is eval(); this seems to be the single most significant item that determines nps.
2) pruning nodes don't seem to significantly affect nps; the work done executing the pruning codes would be incurred whether a move be pruned or not pruned. So there should be little change in the average work done with the node.
3) (maybe) In general, all "aggressive" or other search tricks like lmr, SE, futility don't affect nps much; only it reaches depth much higher.

Rasjid