help - gprof output

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Karlo Bala
Posts: 373
Joined: Wed Mar 22, 2006 10:17 am
Location: Novi Sad, Serbia
Full name: Karlo Balla

Re: help - gprof output

Post by Karlo Bala »

Chan Rasjid wrote:
analysis from startpos:

//with internal eval()
1280527256.876 Engine->Adapter: info depth 9 score cp +28 time 45 nodes 12741088 nps 282282 pv 1.e4 Nf6 2.f3 c5 3.Nc3 Nc6 4.b3 e6

//w/o internal eval()
1280527476.814 Engine->Adapter: info depth 9 score cp +28 time 23 nodes 12741088 nps 547626 pv 1.e4 Nf6 2.f3 c5 3.Nc3 Nc6 4.b3 e6


1) internal eval() is called at every ply.
2) I have a futility evaluation at the start of eval(int beta){ }; beta is passed as infinity, so there is no lazy evaluation for the internal eval(). The normal futility hit rate is only about 5-15%, so it should not be much different from the QS eval().

If the nps drop with internal-node evaluation should only be about 10-15%, then I am not sure why I see about 50%. This 50% is confirmed in games played against other engines.

Rasjid
Strange, because Professor Hyatt verified my assumption.
Can you post the code snippet in which you call eval() and qsearch()?
Best Regards,
Karlo Balla Jr.
Chan Rasjid
Posts: 588
Joined: Thu Mar 09, 2006 4:47 pm
Location: Singapore

Re: help - gprof output

Post by Chan Rasjid »

I think I found why the nps slowdown was 50% in my case when internal nodes are evaluated: I call eval() just after make() within the full search, and again at the top of qssearch() at every node, so horizon nodes are evaluated twice, and they constitute about 60% of all nodes on average!

The mess is basically because I don't know well how to do futility pruning when material is far above beta within QS, or its equivalent at depth 1/2 within the full search. I am now testing a fix and will post the result when done; it should match your estimated 10-15%.
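
In outline, the buggy call pattern was something like this (scaffolding heavily simplified; the helper prototypes are placeholders, not my actual code):

/* Double evaluation: search() evaluates right after make(), and
   qssearch() evaluates the very same position again on entry. */
extern int  gen_moves(int *moves);  /* placeholder helpers */
extern void make(int move);
extern void unmake(int move);
extern int  eval(int beta);         /* static eval, lazy cutoff vs beta */
#define INFINITE 30000

int qssearch(int alpha, int beta);

int search(int alpha, int beta, int depth)
{
    int moves[256], n = gen_moves(moves);
    for (int i = 0; i < n; i++) {
        make(moves[i]);
        int e = eval(INFINITE);               /* evaluation #1 */
        (void)e;                              /* kept for ordering/pruning */
        int score = (depth <= 1)
                  ? -qssearch(-beta, -alpha)  /* horizon node */
                  : -search(-beta, -alpha, depth - 1);
        unmake(moves[i]);
        if (score > alpha) alpha = score;
        if (alpha >= beta) break;
    }
    return alpha;
}

int qssearch(int alpha, int beta)
{
    int stand_pat = eval(beta);               /* evaluation #2, same position */
    if (stand_pat >= beta)
        return stand_pat;
    /* ... generate and search captures ... */
    return alpha;
}

Every horizon node passes through both eval() calls, which is where the extra 50% went.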

Rasjid
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: help - gprof output

Post by bob »

Chan Rasjid wrote:
Karlo Bala wrote:
Chan Rasjid wrote: OK, my nps is not so low and bad after all.

I said it was 350,000 compared to Robbo's and SF's 1M nps; that was because my current program calls eval() at all internal nodes "blindly". It was just a simple plugging-in of eval() for testing, and taking it away brought my nps back to about 600,000-700,000.

Rasjid
I doubt that the main reason for the slowdown is internal-node evaluation. As previously mentioned, 70-80% of nodes are in the qsearch, so eliminating evals from all the internal nodes cannot bring you more than a 20-30% acceleration. I think that evaluating all internal nodes cannot slow the engine down by more than 10-15%.
analysis from startpos:
//with internal eval()
1280527256.876 Engine->Adapter: info depth 9 score cp +28 time 45 nodes 12741088 nps 282282 pv 1.e4 Nf6 2.f3 c5 3.Nc3 Nc6 4.b3 e6

//w/o internal eval()
1280527476.814 Engine->Adapter: info depth 9 score cp +28 time 23 nodes 12741088 nps 547626 pv 1.e4 Nf6 2.f3 c5 3.Nc3 Nc6 4.b3 e6
1) internal eval() is called at every ply.
2) I have a futility evaluation at the start of eval(int beta){ }; beta is passed as infinity, so there is no lazy evaluation for the internal eval(). The normal futility hit rate is only about 5-15%, so it should not be much different from the QS eval().

If the nps drop with internal-node evaluation should only be about 10-15%, then I am not sure why I see about 50%. This 50% is confirmed in games played against other engines.

Rasjid
The initial position is not very representative of typical trees. It takes several moves before you begin to see any q-search captures, which makes the early trees all interior nodes with just one layer of q-search nodes (nothing more than a direct call to Evaluate() followed by a return). But as you get into the game, the q-search really starts to grow, and it is commonly 90% or so of total tree size in the middlegame. If the slowdown is 50% throughout the game, that suggests something else is going on...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: help - gprof output

Post by bob »

Chan Rasjid wrote: I think I found why the nps slowdown was 50% in my case when internal nodes are evaluated: I call eval() just after make() within the full search, and again at the top of qssearch() at every node, so horizon nodes are evaluated twice, and they constitute about 60% of all nodes on average!

The mess is basically because I don't know well how to do futility pruning when material is far above beta within QS, or its equivalent at depth 1/2 within the full search. I am now testing a fix and will post the result when done; it should match your estimated 10-15%.

Rasjid
Makes sense. An idea:

I was using eval(after a move) - eval(before a move) to determine, at a simplistic level, whether the move was good or bad; positive numbers mean the move improves the score, for example. Since you compute the score _after_ a move, you can pass it to the next ply to be used as the "before" score there, and it becomes the static evaluation at the first q-search node.
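
A rough sketch of the idea, assuming negamax with eval() scored from the side to move's point of view (the extra parameter and helper names are only illustrative):

/* Thread the post-move eval down the stack so the child never calls
   eval() again; it arrives as that ply's "before" / stand-pat score. */
extern int  gen_moves(int *moves);  /* placeholder helpers */
extern void make(int move);
extern void unmake(int move);
extern int  eval(int beta);
#define INFINITE 30000

int qssearch(int alpha, int beta, int stand_pat);

int search(int alpha, int beta, int depth, int static_eval)
{
    int moves[256], n = gen_moves(moves);
    for (int i = 0; i < n; i++) {
        make(moves[i]);
        int child_eval = eval(INFINITE);      /* computed once, after the move */
        int gain = -child_eval - static_eval; /* + means the move helps us */
        (void)gain;                           /* can drive ordering/pruning */
        int score = (depth <= 1)
                  ? -qssearch(-beta, -alpha, child_eval)  /* pass, don't re-eval */
                  : -search(-beta, -alpha, depth - 1, child_eval);
        unmake(moves[i]);
        if (score > alpha) alpha = score;
        if (alpha >= beta) break;
    }
    return alpha;
}

int qssearch(int alpha, int beta, int stand_pat)
{
    /* stand_pat came from the parent; no second eval() call here */
    if (stand_pat >= beta)
        return stand_pat;
    /* ... generate and search captures ... */
    return alpha;
}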
Karlo Bala
Posts: 373
Joined: Wed Mar 22, 2006 10:17 am
Location: Novi Sad, Serbia
Full name: Karlo Balla

Re: help - gprof output

Post by Karlo Bala »

Chan Rasjid wrote: I think I found why the nps slowdown was 50% in my case when internal nodes are evaluated: I call eval() just after make() within the full search, and again at the top of qssearch() at every node, so horizon nodes are evaluated twice, and they constitute about 60% of all nodes on average!

The mess is basically because I don't know well how to do futility pruning when material is far above beta within QS, or its equivalent at depth 1/2 within the full search. I am now testing a fix and will post the result when done; it should match your estimated 10-15%.

Rasjid
Yes, I made the same mistake... :wink:
Best Regards,
Karlo Balla Jr.
Gian-Carlo Pascutto
Posts: 1243
Joined: Sat Dec 13, 2008 7:00 pm

Re: help - gprof output

Post by Gian-Carlo Pascutto »

Chan Rasjid wrote: 1) 70-80% of nodes are QS nodes, and the heaviest workload is eval(); this seems to be the single most significant factor in determining nps.
Depends entirely on program design.
2) pruning doesn't seem to significantly affect nps; the work done executing the pruning code is incurred whether or not a move is pruned, so there should be little change in the average work done per node.
As I already said: the number of successor positions you generate is very important. This has nothing to do with the work for the pruning code itself. If, for a fixed amount "x" of work, you generate 35 successor positions versus only 5, isn't it obvious that you can do more total work in the former case than in the latter? (Of course, there's no guarantee this extra work is useful, but it does generate more NPS.)
3) (maybe) In general, "aggressive" search tricks like LMR, SE, and futility don't affect nps much; they only let the search reach much greater depth.
Things like futility do affect NPS for reasons already stated. For the others it depends on the program.
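
For reference, a typical frontier-node futility test looks something like this (the margin and helper names are only illustrative):

/* When static_eval plus a margin cannot reach alpha, the move is
   discarded before make(), eval() and qssearch() ever run, so the
   average work per counted node -- and hence NPS -- changes. */
extern int  gen_moves(int *moves);  /* placeholder helpers */
extern int  is_capture(int move);
extern int  in_check(void);
extern void make(int move);
extern void unmake(int move);
extern int  eval(int beta);
extern int  qssearch(int alpha, int beta);
#define INFINITE        30000
#define FUTILITY_MARGIN 125   /* illustrative value */

int frontier_search(int alpha, int beta)  /* a depth == 1 node */
{
    int moves[256], n = gen_moves(moves);
    int static_eval = eval(INFINITE);
    for (int i = 0; i < n; i++) {
        if (!in_check() && !is_capture(moves[i])
            && static_eval + FUTILITY_MARGIN <= alpha)
            continue;                     /* futile: skip all the work */
        make(moves[i]);
        int score = -qssearch(-beta, -alpha);
        unmake(moves[i]);
        if (score > alpha) alpha = score;
        if (alpha >= beta) break;
    }
    return alpha;
}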
Chan Rasjid
Posts: 588
Joined: Thu Mar 09, 2006 4:47 pm
Location: Singapore

Re: help - gprof output

Post by Chan Rasjid »

Karlo Bala wrote:
Chan Rasjid wrote: I think I found why the nps slowdown was 50% in my case when internal nodes are evaluated: I call eval() just after make() within the full search, and again at the top of qssearch() at every node, so horizon nodes are evaluated twice, and they constitute about 60% of all nodes on average!

The mess is basically because I don't know well how to do futility pruning when material is far above beta within QS, or its equivalent at depth 1/2 within the full search. I am now testing a fix and will post the result when done; it should match your estimated 10-15%.

Rasjid
Yes, I made the same mistake... :wink:
Analysis mode:
fen -r1b1kb1r/ppp2ppp/2n5/3q4/4p3/2P2N2/P1PPBPPP/R1BQ1RK1 w kq - 0 9

//with internal eval()
1280609442.897 Engine->Adapter: info depth 9 score cp -19 time 14 nodes 8427731 nps 565505 pv 1.Nd4 Nxd4 2.cxd4 Bd6 3.c3 O-O 4.d3 Bf5 5.dxe4 Bxe4 6.f3 Bf5 7.Bg5
(slowdown 20%)

1280609495.590 Engine->Adapter: info depth 10 score cp -17 time 67 nodes 37398347 nps 553262 pv 1.Nd4 Nxd4 2.cxd4 Bd6 3.c3 O-O 4.d3 Bf5 5.Be3 Qc6 6.Qd2 a6 7.f3 exd3 8.Bxd3
(slowdown 17%)

//w/o internal eval()
1280609714.888 Engine->Adapter: info depth 9 score cp -19 time 14 nodes 10254758 nps 700461 pv 1.Nd4 Nxd4 2.cxd4 Bd6 3.c3 O-O 4.d3 Bf5 5.dxe4 Bxe4 6.f3 Bf5 7.Bg5

1280609749.085 Engine->Adapter: info depth 10 score cp -17 time 48 nodes 32383280 nps 663089 pv 1.Nd4 Nxd4 2.cxd4 Bd6 3.c3 O-O 4.d3 Bf5 5.Be3 Qc6 6.Qd2 a6 7.f3 exd3
Now, with the bug cleared, the slowdown from evaluating internal nodes is about 15-20%. This is within expectations, as my current evaluation is rather heavy.

Karlo, thanks for alerting me to this bug.

Rasjid
Chan Rasjid
Posts: 588
Joined: Thu Mar 09, 2006 4:47 pm
Location: Singapore

Re: help - gprof output

Post by Chan Rasjid »

Gian-Carlo Pascutto wrote:
Chan Rasjid wrote: 2) pruning doesn't seem to significantly affect nps; the work done executing the pruning code is incurred whether or not a move is pruned, so there should be little change in the average work done per node.
As I already said: the number of successor positions you generate is very important. This has nothing to do with the work for the pruning code itself. If, for a fixed amount "x" of work, you generate 35 successor positions versus only 5, isn't it obvious that you can do more total work in the former case than in the latter? (Of course, there's no guarantee this extra work is useful, but it does generate more NPS.)
3) (maybe) In general, "aggressive" search tricks like LMR, SE, and futility don't affect nps much; they only let the search reach much greater depth.
Things like futility do affect NPS for reasons already stated. For the others it depends on the program.
About what determines nps, I still fail to see the connection with the "number of successor positions"; I am still inclined to see it as a question of the "average work done per node" and of which part of the tree gets searched. My analysis is this:

Visualize a simplified, "averaged-out" chess tree whose nodes are nothing more than a fixed number of lines of code that get executed when a node is visited. Given a fixed time allocation of T seconds, alpha-beta traverses this tree. Take a particular node with 30 moves: the search could either examine all of them, or prune and skip 15 of them. If it prunes, the time saved by not searching those 15 moves allows it to search 15 other moves in other parts of the tree that it could not otherwise have reached. In the end, the program stops searching only after it uses up the full allocated T seconds. In this view, pruning merely steers the search away from certain parts of the tree but does not change the final number of nodes visited; i.e., the nps should remain unchanged.
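
As a back-of-the-envelope version of this model (the numbers are made up; it just restates the assumption):

#include <stdio.h>

/* If every node costs a fixed c seconds, a search given T seconds
   visits T / c nodes no matter which nodes pruning steers it to,
   so nps = 1 / c stays constant. */
int main(void)
{
    double T = 10.0;        /* allocated time, seconds      */
    double c = 2e-6;        /* assumed fixed cost per node  */
    double nodes = T / c;   /* nodes visited, pruned or not */
    printf("nodes = %.0f  nps = %.0f\n", nodes, nodes / T);
    return 0;
}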

I am not sure if this analysis is correct.

Rasjid