help - gprof output

Discussion of chess software programming and technical issues.

Moderator: Ras

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: help - gprof output

Post by bob »

Chan Rasjid wrote:
Crafty:
White(1): go
time limit 1.50 (+0.00) (9.00)
depth time score variation (1)
15-> 1.47 0.35 1. Nf3 Nf6 2. Nc3 Nc6 3. e4 e5 4. d4
exd4 5. Nxd4 Bc5 6. Nxc6 bxc6 7. e5
Qe7 8. Bf4
time=1.60 mat=0 n=1921951 fh=91% nps=1.2M
extensions=5K qchecks=21K reduced=269K pruned=279K
predicted=0 evals=989K 50move=0 EGTBprobes=0 hits=0
SMP-> splits=0 aborts=0 data=0/64 elap=1.60
White(1): Nf3
time used: 1.60
Black(1): time 5839
time remaining: 58.39 (Crafty).
Black(1): otim 5772
time remaining: 57.72 (opponent).
Black(1): Nf6
time used: 2.24
time limit 1.45 (+0.00) (8.70)
depth time score variation (1)
15-> 1.09 0.18 2. Nc3 Nc6 3. e4 e5 4. Bb5 Bc5 5. O-O
O-O 6. Nxe5 Nxe5 7. d4 Bd6 8. dxe5
Bxe5 9. Bg5
time=1.45 mat=0 n=1879017 fh=90% nps=1.3M
extensions=5K qchecks=19K reduced=249K pruned=341K
predicted=0 evals=901K 50move=2 EGTBprobes=0 hits=0
SMP-> splits=0 aborts=0 data=0/64 elap=1.45
White(2): Nc3
time used: 1.45
Black(2): time 5692
time remaining: 56.92 (Crafty).
Black(2): otim 5731
time remaining: 57.31 (opponent).
Black(2): d5
time used: 0.38
time limit 1.42 (+0.00) (8.52)
depth time score variation (1)
13-> 1.10 0.24 3. e3 e6 4. Bd3 Bc5 5. O-O O-O 6. Ng5
Na6 7. a3 g6 8. Qe2 d4 9. Nce4 Nxe4
10. Nxe4
time=1.45 mat=0 n=2010725 fh=91% nps=1.4M
extensions=7K qchecks=30K reduced=281K pruned=415K
predicted=0 evals=936K 50move=0 EGTBprobes=0 hits=0
SMP-> splits=0 aborts=0 data=0/64 elap=1.45
White(3): e3
time used: 1.45
Black(3): time 5548
time remaining: 55.48 (Crafty).
Black(3): otim 5591
time remaining: 55.91 (opponent).
Black(3): Nc6
time used: 1.36
time limit 1.38 (+0.00) (8.28)
depth time score variation (1)
14-> 1.81 0.11 4. d4 e6 5. Bd3 Bd6 6. O-O O-O 7. a3
Bd7 8. Bd2 a6 9. Qe2 Ng4 10. h3 Nf6
time=1.81 mat=0 n=2719874 fh=93% nps=1.5M
extensions=8K qchecks=50K reduced=362K pruned=598K
predicted=0 evals=1.2M 50move=1 EGTBprobes=0 hits=0
SMP-> splits=0 aborts=0 data=0/64 elap=1.81
White(4): d4
time used: 1.81
Black(4): time 5365
time remaining: 53.65 (Crafty).
Black(4): otim 5548
time remaining: 55.48 (opponent).
Black(4): g6
time used: 0.40
time limit 1.34 (+0.00) (8.04)
depth time score variation (1)
14-> 1.32 0.46 5. Ne5 Bg7 6. Nxc6 bxc6 7. Bd3 O-O
8. O-O Rb8 9. Rb1 Bg4 10. f3 Be6 11.
Bd2 Qd6 <HT>
time=1.37 mat=0 n=2136327 fh=94% nps=1.6M
extensions=6K qchecks=31K reduced=306K pruned=598K
predicted=0 evals=833K 50move=0 EGTBprobes=0 hits=0
SMP-> splits=0 aborts=0 data=0/64 elap=1.37
White(5): Ne5
time used: 1.37
Black(5): time 5228
time remaining: 52.28 (Crafty).
Black(5): otim 5369
time remaining: 53.69 (opponent).
Black(5): Bg7
time used: 1.76
time limit 1.30 (+0.00) (7.80)
depth time score variation (1)
15 1.03 0.49 6. Nxc6 bxc6 7. Be2 O-O 8. h4 Rb8 9.
h5 gxh5 10. Bxh5 Nxh5 11. Qxh5 h6 12.
Rh4 Bf6 13. Rf4
time=1.36 mat=0 n=2313718 fh=95% nps=1.7M
extensions=7K qchecks=36K reduced=321K pruned=667K
predicted=0 evals=840K 50move=2 EGTBprobes=0 hits=0
SMP-> splits=0 aborts=0 data=0/64 elap=1.36
White(6): Nxc6
time used: 1.36
Black(6): time 5091
time remaining: 50.91 (Crafty).
Black(6): otim 5134
time remaining: 51.34 (opponent).
Black(6): bxc6
time used: 2.32
time limit 1.30 (+0.00) (7.80)
depth time score variation (1)
14 0.53 0.35 7. Be2 O-O 8. g4 Rb8 9. O-O Ne4 10.
Nxe4 dxe4 11. f4 exf3 12. Bxf3 e5 13.
Bxc6 exd4 14. exd4 Bxd4+
14-> 0.67 0.35 7. Be2 O-O 8. g4 Rb8 9. O-O Ne4 10.
Nxe4 dxe4 11. f4 exf3 12. Bxf3 e5 13.
Bxc6 exd4 14. exd4 Bxd4+
time=1.34 mat=0 n=1997341 fh=94% nps=1.5M
extensions=2K qchecks=8K reduced=255K pruned=429K
predicted=0 evals=931K 50move=0 EGTBprobes=0 hits=0
SMP-> splits=0 aborts=0 data=0/64 elap=1.34
White(7): Be2
time used: 1.34
Black(7): time 4957
time remaining: 49.57 (Crafty).
Black(7): otim 5049
time remaining: 50.49 (opponent).
Black(7): O-O
time used: 0.83
time limit 1.27 (+0.00) (7.62)
depth time score variation (1)
14 1.88 0.39 8. O-O Re8 9. f4 Qd6 10. g4 Be6 11.
g5 Nd7 12. Bg4 Bxg4 13. Qxg4 h5 14.
gxh6 Bxh6
time=1.88 mat=0 n=2867780 fh=94% nps=1.5M
extensions=2K qchecks=7K reduced=405K pruned=698K
predicted=0 evals=1.2M 50move=2 EGTBprobes=0 hits=0
SMP-> splits=0 aborts=0 data=0/64 elap=1.88
White(8): O-O
time used: 1.88
Black(8): time 4769
time remaining: 47.69 (Crafty).
Black(8): otim 5006
time remaining: 50.06 (opponent).
Black(8): Re8
time used: 0.40
time limit 1.22 (+0.00) (7.32)
depth time score variation (1)
14-> 1.13 0.35 9. f4 Rb8 10. g4 Nd7 11. b3 e6 12.
Ba3 c5 13. Qd2 cxd4 14. exd4 Qh4 15.
g5 Bb7
time=1.33 mat=0 n=2334299 fh=93% nps=1.8M
extensions=2K qchecks=7K reduced=310K pruned=642K
predicted=0 evals=970K 50move=4 EGTBprobes=0 hits=0
SMP-> splits=0 aborts=0 data=0/64 elap=1.33
White(9): f4
time used: 1.33
Black(9): time 4636
time remaining: 46.36 (Crafty).
Black(9): otim 4456
time remaining: 44.56 (opponent).
Black(9): a5
time used: 5.47
time limit 1.18 (+0.00) (7.08)
depth time score variation (1)
12-> 1.01 0.49 10. b3 Qd6 11. Bb2 Be6 12. Qd2 Bf5
13. a4 Rab8 14. Ba3 Qe6 15. Bc5 Ng4
time=1.20 mat=0 n=1962523 fh=93% nps=1.6M
extensions=2K qchecks=4K reduced=213K pruned=402K
predicted=0 evals=947K 50move=0 EGTBprobes=0 hits=0
SMP-> splits=0 aborts=0 data=0/64 elap=1.20
White(10): b3
time used: 1.20
Black(10): time 4516
time remaining: 45.16 (Crafty).
Black(10): otim 4421
time remaining: 44.21 (opponent).
Black(10): c5
time used: 0.32
time limit 1.15 (+0.00) (6.90)
depth time score variation (1)
9 0.12 0.17 11. Bb5 Bd7 12. Bxd7 Qxd7 13. Rb1 Ng4
14. Nxd5 Qxd5 15. Qxg4 cxd4
9 0.16 0.35 11. Rb1 cxd4 12. exd4 Ne4 13. Nxe4
dxe4 14. Be3 Be6 15. Qd2
9-> 0.17 0.35 11. Rb1 cxd4 12. exd4 Ne4 13. Nxe4
dxe4 14. Be3 Be6 15. Qd2
10 0.18 0.31 11. Rb1 cxd4 12. exd4 Ne4 13. Nxe4
dxe4 14. Be3 Be6 15. Bb5 Rf8
10-> 0.22 0.31 11. Rb1 cxd4 12. exd4 Ne4 13. Nxe4
dxe4 14. Be3 Be6 15. Bb5 Rf8
11 0.27 0.15 11. Rb1 cxd4 12. exd4 Bf5 13. Re1 Qd6
14. Be3 e6 15. Nb5 Qc6 16. Bd3
11 0.42 0.34 11. Re1 Bf5 12. Bb2 Qd6 13. Nb5 Qb6
14. dxc5 Qxc5 15. Bd4 Qc6 16. Bd3
11-> 0.49 0.34 11. Re1 Bf5 12. Bb2 Qd6 13. Nb5 Qb6
14. dxc5 Qxc5 15. Bd4 Qc6 16. Bd3
12 0.63 0.30 11. Re1 cxd4 12. exd4 Ne4 13. Nxe4
dxe4 14. Be3 Ba6 15. Qd2 e6 16. c4
Rb8
12-> 0.89 0.30 11. Re1 cxd4 12. exd4 Ne4 13. Nxe4
dxe4 14. Be3 Ba6 15. Qd2 e6 16. c4
Rb8
time=1.22 mat=0 n=2386496 fh=92% nps=2.0M
extensions=4K qchecks=7K reduced=278K pruned=633K
predicted=0 evals=963K 50move=0 EGTBprobes=0 hits=0
SMP-> splits=0 aborts=0 data=0/64 elap=1.22
White(11): Re1
time used: 1.22
Black(11): time 4393
time remaining: 43.93 (Crafty).
The above shows the nps for Crafty increasing from move 1 to move 10.


Robolitto move 17, nps 1054000 :

1280475435.292 Engine->Adapter: info depth 15 seldepth 35 score cp 9 time 5495 nodes 5796409 nps 1054000 pv a7a5 c3a4 f6e4 e2d3 c8f5 d3e4 f5e4 a4c5 d8d6 c5e4 d5e4 b2b3 a5a4 c1b2 a4a3 b2c3 c6c5 d1d2 c5d4 c3d4 g7d4 d2d4 d6d4 e3d4
1280475435.292 Engine->Adapter: bestmove a7a5 ponder c3a4

Crafty move 17 nps=2.1M:
Black(17): Qd6
time used: 0.86
time limit 0.98 (+0.00) (5.88)
depth time score variation (1)
15 0.87 -0.78 18. Bc4 c5 19. g4 Bc8 20. Qf3 Bxd4+
21. Bxd4 Qxd4+ 22. Kh1 Bxg4 23. Qxf7+
Kh8 24. Rac1 Rf8 25. c3 Rxf7 26. cxd4
Bf3+ 27. Kg1
time=1.07 mat=0 n=2277004 fh=93% nps=2.1M
extensions=23K qchecks=55K reduced=288K pruned=660K
predicted=0 evals=749K 50move=1 EGTBprobes=0 hits=0
SMP-> splits=0 aborts=0 data=0/64 elap=1.07

It seems unlikely for top programs to have an nps difference by factor of 2:1.

Rasjid
Crafty has always been slower prior to castling, because the development code is executed everywhere. Once castling happens in the search, the development stuff does not get executed below that point. So as pieces are developed, that starts to happen more frequently. As far as the difference in NPS between robo* and Crafty, not a clue and not interested. :)
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: help - gprof output

Post by bob »

Chan Rasjid wrote:
Volker Annuss wrote:
Chan Rasjid wrote:It seems unlikely for top programs to have an nps difference by factor of 2:1.
Why?
I don't mean it too seriously; but a factor of x 2 seems too good. Top programs are generally well designed and they should be near in speed. What can bring down the nps by 50% is evaluation. So the SECRET of Robbo, SF should then be evaluation as both are about as "slow" which means they have very much more in their evaluation than Crafty.
Some of this programs discard the moves before evaluate! or even before make_move!! (skiping make and unmake) but counting it
Normally these skipped moves, hash hits are not counted.

Rasjid
There are lots of things to comment on, but for just two, a factor of 2x is completely unremarkable. Compare hiarcs to fritz. Of course one can compare rybka to anything but Rybka's numbers are bogus and way low.

Second, crafty certainly does pruning in the last 4 plies, where it does do make and unmake (otherwise an evaluation would be worthless) but no recursive search calls are done
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: help - gprof output

Post by bob »

Chan Rasjid wrote:
Gian-Carlo Pascutto wrote:
Normally these skipped moves, hash hits are not counted.
The amount of work per node still affects NPS greatly. More pruning will lower the NPS: you do more work and you generate less successor positions.
Nps is inversely proportional to the "average work done within each node visited".
1) 70-80% of nodes are QS nodes and the heaviest workload is eval(); this seems to be the single most significant item that determines nps.
2) pruning nodes don't seem to significantly affect nps; the work done executing the pruning codes would be incurred whether a move be pruned or not pruned. So there should be little change in the average work done with the node.
3) (maybe) In general, all "aggressive" or other search tricks like lmr, SE, futility don't affect nps much; only it reaches depth much higher.

Rasjid
That is my feeling. Our NPS doesn't change much over time, unless we do something new like the mobility look-up rather than calculation loop, which speeds up eval significantly, and hence NPS. I've not noticed any great change in Crafty's NPS, at least for as long as we have been doing the cluster testing and using the 8-core cluster node to play in tournaments. We've been averaging 20M or so for a long time, with a jump a version or two back with the mobility change...
Chan Rasjid
Posts: 588
Joined: Thu Mar 09, 2006 4:47 pm
Location: Singapore

Re: help - gprof output

Post by Chan Rasjid »

ok, my nps is not too low and bad after all;

I said it was 350,000 as compared to that of Robbo's and SF's 1M nps; that was because my current program does eval() at all internal nodes "blindly"; it was just a simple plugging in of eval() just for testing; taking it away brought my nps to back to about 600,000 - 700,000 nps.

I think I have too much of useless stuff in my current eval(). Also my program has no serious optimization yet; there should be ways to speed things up. I will take Crafty's as benchmark.

Rasjid
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: help - gprof output

Post by bob »

Chan Rasjid wrote:ok, my nps is not too low and bad after all;

I said it was 350,000 as compared to that of Robbo's and SF's 1M nps; that was because my current program does eval() at all internal nodes "blindly"; it was just a simple plugging in of eval() just for testing; taking it away brought my nps to back to about 600,000 - 700,000 nps.

I think I have too much of useless stuff in my current eval(). Also my program has no serious optimization yet; there should be ways to speed things up. I will take Crafty's as benchmark.

Rasjid
I have a version of Crafty that does an eval at all interior nodes. I was playing around with this in 23.2/23.3 when I was redoing the LMR and futility stuff. Interestingly, doing the interior node eval didn't seem to affect NPS by very much at all. Mainly because that is only 10% of the nodes I suppose. But I did not find it helped enough to keep it at the time, but saved it for later testing. You can make better LMR decisions if you know the eval before/after any particular move. But the gain there was offset by the speed loss so it was a break-even. And for break-even, I always choose "simpler"...
Karlo Bala
Posts: 373
Joined: Wed Mar 22, 2006 10:17 am
Location: Novi Sad, Serbia
Full name: Karlo Balla

Re: help - gprof output

Post by Karlo Bala »

Chan Rasjid wrote:ok, my nps is not too low and bad after all;

I said it was 350,000 as compared to that of Robbo's and SF's 1M nps; that was because my current program does eval() at all internal nodes "blindly"; it was just a simple plugging in of eval() just for testing; taking it away brought my nps to back to about 600,000 - 700,000 nps.

Rasjid
I doubt that the main reason for slowing the engine is an internal node evaluation. As previously mentioned, 70-80% of nodes are in the qsearch, so that the elimination of evals from all the internal nodes can not bring you more than 20-30% acceleration. I think that the evaluation of all internal nodes can not slow down the engine more than 10-15%.
Best Regards,
Karlo Balla Jr.
Karlo Bala
Posts: 373
Joined: Wed Mar 22, 2006 10:17 am
Location: Novi Sad, Serbia
Full name: Karlo Balla

Re: help - gprof output

Post by Karlo Bala »

bob wrote: ...
Mainly because that is only 10% of the nodes I suppose.
...
Internal evaluation of the node also save inchek test, and generating attacks within eval may be used to generate moves.
Best Regards,
Karlo Balla Jr.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: help - gprof output

Post by bob »

Karlo Bala wrote:
bob wrote: ...
Mainly because that is only 10% of the nodes I suppose.
...
Internal evaluation of the node also save inchek test, and generating attacks within eval may be used to generate moves.
How do you save the incheck test? I don't deal with check in the evaluation, that's a dynamic (rather than static) property best left to search. Generating attacks is not a high-cost thing if you are using bitboards.
Karlo Bala
Posts: 373
Joined: Wed Mar 22, 2006 10:17 am
Location: Novi Sad, Serbia
Full name: Karlo Balla

Re: help - gprof output

Post by Karlo Bala »

bob wrote:
Karlo Bala wrote:
bob wrote: ...
Mainly because that is only 10% of the nodes I suppose.
...
Internal evaluation of the node also save inchek test, and generating attacks within eval may be used to generate moves.
How do you save the incheck test? I don't deal with check in the evaluation, that's a dynamic (rather than static) property best left to search. Generating attacks is not a high-cost thing if you are using bitboards.
If you generate the mobility on a standard way (create an attacks mask, and then counting bits), then you can save the attacks masks for both sides, and use them to detect illegal moves, or moves that gives check. Not much but could be useful.
Best Regards,
Karlo Balla Jr.
Chan Rasjid
Posts: 588
Joined: Thu Mar 09, 2006 4:47 pm
Location: Singapore

Re: help - gprof output

Post by Chan Rasjid »

Karlo Bala wrote:
Chan Rasjid wrote:ok, my nps is not too low and bad after all;

I said it was 350,000 as compared to that of Robbo's and SF's 1M nps; that was because my current program does eval() at all internal nodes "blindly"; it was just a simple plugging in of eval() just for testing; taking it away brought my nps to back to about 600,000 - 700,000 nps.

Rasjid
I doubt that the main reason for slowing the engine is an internal node evaluation. As previously mentioned, 70-80% of nodes are in the qsearch, so that the elimination of evals from all the internal nodes can not bring you more than 20-30% acceleration. I think that the evaluation of all internal nodes can not slow down the engine more than 10-15%.
analysis from startpos:
//with internal eval()
1280527256.876 Engine->Adapter: info depth 9 score cp +28 time 45 nodes 12741088 nps 282282 pv 1.e4 Nf6 2.f3 c5 3.Nc3 Nc6 4.b3 e6

//w/o internal eval()
1280527476.814 Engine->Adapter: info depth 9 score cp +28 time 23 nodes 12741088 nps 547626 pv 1.e4 Nf6 2.f3 c5 3.Nc3 Nc6 4.b3 e6
1) internal eval() called at all ply.
2) I have futility evaluation at start of eval(int beta){ }; beta is passed infi so no lazy evaluation for internal
eval(). Normal futility hit rates is only about 5-15%; so should not be much different from QS eval().

If the nps drop should be about 10-15% with internal nodes evaluation, then I am not sure why I have about 50%. This 50% is confirmed in playing games with other engines.

Rasjid