I did this experiment this week.
Starting from a bare alpha-beta search only config in Weini (without even any sort of move ordering) and only a PST eval, Weini is -500elo versus fairy-max, but is crunching 1Mnps using 1 thread.
On the same hardware, I gradually add some sorting stuff, null move, pvs, aspiration, lmr, lmp, extension ... still with only a PST eval. The good news is that Weini strength is raising as I add more search tricks, and at the end easilly wins fairy-max. The best additions seems to be mvv-lva, TT, and recapture extension for now. Not everything is activated now but Weini speed has fallen now to 600knps.
For information Weini 0.0.24 with better evaluation is now only at 400knps (I count nodes+qnodes).
I easilly understand how speed decreases when evaluation becomes more complex. I have difficulties to understand why it also decreases (a lot) when activating more search features. Of course, they are more branches in the code, TT must be probed, ...
I use perf-stat, perf-top, valgrind, visual studio, gnuprof, ... but I cannot catch real hot-spots now, especially in the optimized compilation ...
Any advices will be appreciated.
about speed et profiling tools
Moderators: hgm, Rebel, chrisw
-
- Posts: 1871
- Joined: Sat Nov 25, 2017 2:28 pm
- Location: France
-
- Posts: 2488
- Joined: Tue Aug 30, 2016 8:19 pm
- Full name: Rasmus Althoff
Re: about speed et profiling tools
You could use gprof on the 1M NPS version and then on the 400k NPS version and compare the data to see where the additional time is being spent now. For example, you could give a node limit to the move time and tell Weini to search let's say a 100 million nodes and then stop.
Rasmus Althoff
https://www.ct800.net
https://www.ct800.net
-
- Posts: 1871
- Joined: Sat Nov 25, 2017 2:28 pm
- Location: France
Re: about speed et profiling tools
This is more or less what I do but profilers cannot give the exact behavior of the optimized (-O3) compiled binary. I mean I cannot access a useful line by line profiling. And of course the not-optimized code seems to differ quite a lot in term of hot spot.Ras wrote: ↑Thu Sep 20, 2018 9:16 pm You could use gprof on the 1M NPS version and then on the 400k NPS version and compare the data to see where the additional time is being spent now. For example, you could give a node limit to the move time and tell Weini to search let's say a 100 million nodes and then stop.
I think what is happening is that the more nodes are cut, the more the movegenerator is the bottleneck (I am not using bitboard for move generation yet...).
-
- Posts: 4052
- Joined: Thu May 15, 2008 9:57 pm
- Location: Berlin, Germany
- Full name: Sven Schüle
Re: about speed et profiling tools
It is not necessarily a speed issue.
A possible explanation might also be that the stronger search implies a higher ratio of QS nodes vs. full-width nodes, and if you almost never call eval() at full-width nodes then QS nodes on average take significantly more time. In my imagination a weaker search frequently visits "bad" nodes that can be refuted very easily by a simple capture and therefore with a very small QS subtree.
You could try to analyze this by collecting statistics about node counts.
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)
-
- Posts: 2488
- Joined: Tue Aug 30, 2016 8:19 pm
- Full name: Rasmus Althoff
Re: about speed et profiling tools
I would still go with the non-optimised binaries for profiling because that still shows where the program is spending most of its time. For the move generator, just implement perft - that gives you an idea how many NPS you could have without search and eval. If that is something like 10 times the node rate in the real search, then this won't be a bottleneck.
And of course Sven's explanation makes also a lot of sense. There's something else here: once your eval becomes more fine grained, this itself will lower the speed. Let's look at three cases where your coarse eval gave beta, that would have been three cut-offs. Now with the finer one, let's assume you get beta-1, beta and beta+1. That will only be two cut-offs. On the other hand, you get more accuracy. That will likely become a trade-off.
If I take a look at my engine, which is in the same range as yours, I get around 1900 kNPS when analysing in the start position (single thread only). With perft, I get around 11000 kNPS. No bitboards either, and my hardware is quite old (Phenom 2 1100T).
Rasmus Althoff
https://www.ct800.net
https://www.ct800.net
-
- Posts: 1871
- Joined: Sat Nov 25, 2017 2:28 pm
- Location: France
Re: about speed et profiling tools
Perft is implemented, Weini generates around 5Mnps (valid nodes), 10Mnps (pseudo nodes), on a very old corei7 and is very far from your engine in term of real search speed.Ras wrote: ↑Fri Sep 21, 2018 1:10 amI would still go with the non-optimised binaries for profiling because that still shows where the program is spending most of its time. For the move generator, just implement perft - that gives you an idea how many NPS you could have without search and eval. If that is something like 10 times the node rate in the real search, then this won't be a bottleneck.
And of course Sven's explanation makes also a lot of sense. There's something else here: once your eval becomes more fine grained, this itself will lower the speed. Let's look at three cases where your coarse eval gave beta, that would have been three cut-offs. Now with the finer one, let's assume you get beta-1, beta and beta+1. That will only be two cut-offs. On the other hand, you get more accuracy. That will likely become a trade-off.
If I take a look at my engine, which is in the same range as yours, I get around 1900 kNPS when analysing in the start position (single thread only). With perft, I get around 11000 kNPS. No bitboards either, and my hardware is quite old (Phenom 2 1100T).
-
- Posts: 1563
- Joined: Thu Jul 16, 2009 10:47 am
- Location: Almere, The Netherlands
Re: about speed et profiling tools
I use Intel VTune, from which I feel that it gives me the most information. It only runs on Intel proc. not on AMD, for me no problem because I don't use AMD anyway.
My engine typically does ~2500 kn/s on the starting position (single core on a 4 GHz. Broadwell), perft runs at ~125.000 kn/s. I count each move.do() as a node, most engines count a call to search() as a node which usually gives a higher number because of re-searches.
Most time consuming functions are evaluation, SEE and probing the hash table, move generation only takes a few percent of the total time.
My engine typically does ~2500 kn/s on the starting position (single core on a 4 GHz. Broadwell), perft runs at ~125.000 kn/s. I count each move.do() as a node, most engines count a call to search() as a node which usually gives a higher number because of re-searches.
Most time consuming functions are evaluation, SEE and probing the hash table, move generation only takes a few percent of the total time.
-
- Posts: 1871
- Joined: Sat Nov 25, 2017 2:28 pm
- Location: France
Re: about speed et profiling tools
Using this test positionSven wrote: ↑Thu Sep 20, 2018 10:08 pmIt is not necessarily a speed issue.
A possible explanation might also be that the stronger search implies a higher ratio of QS nodes vs. full-width nodes, and if you almost never call eval() at full-width nodes then QS nodes on average take significantly more time. In my imagination a weaker search frequently visits "bad" nodes that can be refuted very easily by a simple capture and therefore with a very small QS subtree.
You could try to analyze this by collecting statistics about node counts.
[d] rnbq1rk1/1p1n1ppp/2pbp3/3p2P1/p4P2/1P1BPN1P/PBPP4/RN1QK2R w KQ - 1 10
Activating the major search trick but only a PST eval ==> 450knps
the values of nodes / qnodes is :
Code: Select all
#INFO - 2018-09-21 19:19:59-228: visitednodes 104361
#INFO - 2018-09-21 19:19:59-228: visitedqnodes 372858
#INFO - 2018-09-21 19:19:59-228: visitedrealqnodes 109763
For what it's worth, here's the seldepth for all depth
Code: Select all
#INFO - 2018-09-21 19:19:59-228: 0 seldepth 8
#INFO - 2018-09-21 19:19:59-228: 1 seldepth 6
#INFO - 2018-09-21 19:19:59-228: 2 seldepth 13
#INFO - 2018-09-21 19:19:59-228: 3 seldepth 12
#INFO - 2018-09-21 19:19:59-228: 4 seldepth 15
#INFO - 2018-09-21 19:19:59-228: 5 seldepth 22
#INFO - 2018-09-21 19:19:59-228: 6 seldepth 21
#INFO - 2018-09-21 19:19:59-228: 7 seldepth 25
#INFO - 2018-09-21 19:19:59-228: 8 seldepth 27
Code: Select all
% function
21.54 GetTheat
10.42 ApplyMove
5.21 Sort moves
5.21 Move generator
5.21 Eval
Code: Select all
#INFO - 2018-09-21 19:22:41-773: visitednodes 64
#INFO - 2018-09-21 19:22:41-773: visitedqnodes 17410543
#INFO - 2018-09-21 19:22:41-773: visitedrealqnodes 8737286
Code: Select all
#INFO - 2018-09-21 19:22:41-773: 0 seldepth 25
#INFO - 2018-09-21 19:22:41-773: 1 seldepth 30
#INFO - 2018-09-21 19:22:41-773: 2 seldepth 32
The hotspots are then
Code: Select all
% function
42.00 Eval
16.00 Move generator
8.00 Qsearch
-
- Posts: 1871
- Joined: Sat Nov 25, 2017 2:28 pm
- Location: France
Re: about speed et profiling tools
And here is the full statistic output of Weini 0.0.24 on the same position
Code: Select all
#INFO - 2018-09-21 21:38:38-910: FEN : rnbq1rk1/1p1n1ppp/2pbp3/3p2P1/p4P2/1P1BPN1P/PBPP4/RN1QK2R w KQ- - 1 10
#INFO - 2018-09-21 21:38:38-910: HASH : 14581085400006350364
#INFO - 2018-09-21 21:38:38-910: ===================== Position display (end) ====================
#INFO - 2018-09-21 21:38:38-910: Next search time 1200
#INFO - 2018-09-21 21:38:38-910: SearchSync begin
#INFO - 2018-09-21 21:38:38-910: SearchSync waiting for threads
#INFO - 2018-09-21 21:38:38-910: SearchSync all threads ok
1 -24 0 113 (1/6), pv 0-0
2 -21 0 592 (2/6), pv b3a4 a8a4
3 -1 1 5456 (3/11), pv 0-0 a4a3 b2c3
4 -28 1 8288 (4/14), pv b3a4 a8a4 0-0 a4b4
5 -16 3 14487 (5/16), pv b3a4 a8a4 0-0 d7c5 b1c3
6 -18 5 23042 (6/16), pv b3a4 d7c5 0-0 c5a4 b2d4 b8d7
7 -22 9 39048 (7/17), pv b3a4 d7c5 b2d4 c5d3+ c2d3 c6c5 d4c3
8 -27 17 74902 (8/22), pv 0-0 d8b6 b2d4 c6c5 d4b2 a4b3 c2b3 b8c6
9 -27 29 120297 (9/23), pv 0-0 d8b6 b2c3 c6c5 f1e1 b8c6 a2a3 f7f6 g5f6
#INFO - 2018-09-21 21:38:39-414: Window re-search -33 -21 (-21, b3 a4)
10 -19 75 321398 (10/27), pv b3a4 d8b6 b2e5 d7e5 f4e5 d6c5 b1c3 b8d7 0-0 b6c7
#INFO - 2018-09-21 21:38:40-109: >> TC endeed in AlphaBeta
#INFO - 2018-09-21 21:38:40-109: nodes per s : 424 knps (n:128978, q:380056 (74%), t:1200ms)
#INFO - 2018-09-21 21:38:40-109: End of search
#INFO - 2018-09-21 21:38:40-109: Searcher return move b3 a4(6000) | with score -19 at depth 9
#INFO - 2018-09-21 21:38:40-109: Best move is b3 a4 with score -19 (depth 9/40)
#INFO - 2018-09-21 21:38:40-109: betacut 47767
#INFO - 2018-09-21 21:38:40-109: betacutroot 1
#INFO - 2018-09-21 21:38:40-109: qbetacut 64105
#INFO - 2018-09-21 21:38:40-109: qearlybetacut 203860
#INFO - 2018-09-21 21:38:40-109: deltaalphacut 911
#INFO - 2018-09-21 21:38:40-109: seecut 309
#INFO - 2018-09-21 21:38:40-109: seecutAlphaBeta 2808
#INFO - 2018-09-21 21:38:40-109: nullmoveverification 0
#INFO - 2018-09-21 21:38:40-109: nullmovecutafterverification 0
#INFO - 2018-09-21 21:38:40-109: nullmovecutbeta 10842
#INFO - 2018-09-21 21:38:40-109: nullmovecuttry 22456
#INFO - 2018-09-21 21:38:40-109: nullmovecutskipped 358
#INFO - 2018-09-21 21:38:40-109: reductionAndPruningReset 1694
#INFO - 2018-09-21 21:38:40-109: extensionReset 0
#INFO - 2018-09-21 21:38:40-109: staticNullMovebetacut 0
#INFO - 2018-09-21 21:38:40-109: staticNullMovebetacutQtry 0
#INFO - 2018-09-21 21:38:40-109: staticNullMovebetacutQ 0
#INFO - 2018-09-21 21:38:40-109: staticNullMovebetacutH 37436
#INFO - 2018-09-21 21:38:40-109: failedReduction 0
#INFO - 2018-09-21 21:38:40-109: futilitycut 0
#INFO - 2018-09-21 21:38:40-109: qfutilitycut 70915
#INFO - 2018-09-21 21:38:40-109: futilityMovePruned 79843
#INFO - 2018-09-21 21:38:40-109: moveCountPruned 261148
#INFO - 2018-09-21 21:38:40-109: lmralphabeta 25374
#INFO - 2018-09-21 21:38:40-109: lmralphabetafail 182
#INFO - 2018-09-21 21:38:40-109: lmrsearch 245
#INFO - 2018-09-21 21:38:40-109: lmrsearchfail 0
#INFO - 2018-09-21 21:38:40-109: ttARHitExact 146
#INFO - 2018-09-21 21:38:40-109: ttARHitAlpha 1313
#INFO - 2018-09-21 21:38:40-109: ttARHitBeta 5237
#INFO - 2018-09-21 21:38:40-109: ttARHitAlphaOver 1930
#INFO - 2018-09-21 21:38:40-109: ttARHitBetaOver 7047
#INFO - 2018-09-21 21:38:40-109: ttARHitUsed 0
#INFO - 2018-09-21 21:38:40-109: ttARCollision 1551
#INFO - 2018-09-21 21:38:40-109: ttAllHit 15956
#INFO - 2018-09-21 21:38:40-109: ttHitExact 0
#INFO - 2018-09-21 21:38:40-109: ttHitAlpha 3
#INFO - 2018-09-21 21:38:40-109: ttHitBeta 51
#INFO - 2018-09-21 21:38:40-109: ttHitAlphaOver 55
#INFO - 2018-09-21 21:38:40-109: ttHitBetaOver 174
#INFO - 2018-09-21 21:38:40-109: ttHitUsed 4106
#INFO - 2018-09-21 21:38:40-109: ttCollision 1551
#INFO - 2018-09-21 21:38:40-109: ttARQHitExact 43
#INFO - 2018-09-21 21:38:40-109: ttARQHitAlpha 7520
#INFO - 2018-09-21 21:38:40-109: ttARQHitBeta 4516
#INFO - 2018-09-21 21:38:40-109: ttARQHitAlphaOver 12201
#INFO - 2018-09-21 21:38:40-109: ttARQHitBetaOver 6310
#INFO - 2018-09-21 21:38:40-109: ttARQHitUsed 0
#INFO - 2018-09-21 21:38:40-110: ttARQCollision 20472
#INFO - 2018-09-21 21:38:40-110: ttQAllHit 31285
#INFO - 2018-09-21 21:38:40-110: ttQHitExact 0
#INFO - 2018-09-21 21:38:40-110: ttQHitAlpha 16
#INFO - 2018-09-21 21:38:40-110: ttQHitBeta 91
#INFO - 2018-09-21 21:38:40-110: ttQHitAlphaOver 362
#INFO - 2018-09-21 21:38:40-110: ttQHitBetaOver 226
#INFO - 2018-09-21 21:38:40-110: ttQHitUsed 12045
#INFO - 2018-09-21 21:38:40-110: ttQCollision 20472
#INFO - 2018-09-21 21:38:40-110: ttClear 0
#INFO - 2018-09-21 21:38:40-110: ttQClear 0
#INFO - 2018-09-21 21:38:40-110: ttEClear 0
#INFO - 2018-09-21 21:38:40-110: ttELClear 0
#INFO - 2018-09-21 21:38:40-110: ttEPClear 0
#INFO - 2018-09-21 21:38:40-110: leafNodes 106125
#INFO - 2018-09-21 21:38:40-110: visitednodes 128561
#INFO - 2018-09-21 21:38:40-110: visitedqnodes 380056
#INFO - 2018-09-21 21:38:40-110: visitedrealqnodes 161266
#INFO - 2018-09-21 21:38:40-110: windowsuccess 5
#INFO - 2018-09-21 21:38:40-110: windowfails 1
#INFO - 2018-09-21 21:38:40-110: pvsRootsuccess 397
#INFO - 2018-09-21 21:38:40-110: pvsRootfails 6
#INFO - 2018-09-21 21:38:40-110: pvsAlphaBetasuccess 5563
#INFO - 2018-09-21 21:38:40-110: pvsAlphaBetafails 198
#INFO - 2018-09-21 21:38:40-110: analysisnodes 471263
#INFO - 2018-09-21 21:38:40-110: analysedlazynodes 0
#INFO - 2018-09-21 21:38:40-110: ttEvalLazyHits 0
#INFO - 2018-09-21 21:38:40-110: ttEvalHits 132206
#INFO - 2018-09-21 21:38:40-110: ttEvalPawnHits 313647
#INFO - 2018-09-21 21:38:40-110: ttEvalLazyCollision 0
#INFO - 2018-09-21 21:38:40-110: ttEvalCollision 14932
#INFO - 2018-09-21 21:38:40-110: ttEvalPawnCollision 150
#INFO - 2018-09-21 21:38:40-110: terminalnodes 0
#INFO - 2018-09-21 21:38:40-110: openingBookHits 0
#INFO - 2018-09-21 21:38:40-110: positionCopy 2224710
#INFO - 2018-09-21 21:38:40-110: ttsort 15899
#INFO - 2018-09-21 21:38:40-110: ttsorttry 15899
#INFO - 2018-09-21 21:38:40-110: pvsort 0
#INFO - 2018-09-21 21:38:40-110: pvsorttry 0
#INFO - 2018-09-21 21:38:40-110: pvrootsort 0
#INFO - 2018-09-21 21:38:40-110: pvrootsorttry 0
#INFO - 2018-09-21 21:38:40-110: iidsort 0
#INFO - 2018-09-21 21:38:40-110: probCutTry 2069
#INFO - 2018-09-21 21:38:40-110: probCutSuccess 523
#INFO - 2018-09-21 21:38:40-110: countCheckExtensionRoot 0
#INFO - 2018-09-21 21:38:40-110: countCheckExtensionSearch 10
#INFO - 2018-09-21 21:38:40-110: countCheckExtensionAlphaBeta 16244
#INFO - 2018-09-21 21:38:40-110: countEndGameExtensionSearch 0
#INFO - 2018-09-21 21:38:40-110: countVeryEndGameExtensionSearch 0
#INFO - 2018-09-21 21:38:40-110: countPVExtensionSearch 0
#INFO - 2018-09-21 21:38:40-110: countSingleReplyExtensionSearch 0
#INFO - 2018-09-21 21:38:40-110: countSingularExtensionAlphaBeta 0
#INFO - 2018-09-21 21:38:40-110: countNearPromotionExtensionSearch 0
#INFO - 2018-09-21 21:38:40-110: countNearPromotionExtensionAlphaBeta 5607
#INFO - 2018-09-21 21:38:40-110: countReCaptureExtensionSearch 6
#INFO - 2018-09-21 21:38:40-110: countReCaptureExtensionAlphaBeta 5106
#INFO - 2018-09-21 21:38:40-110: tcEBFAbort 0
#INFO - 2018-09-21 21:38:40-110: tcForcedAbort 1
#INFO - 2018-09-21 21:38:40-110: tcIdealAbort 0
#INFO - 2018-09-21 21:38:40-110: *********************************
#INFO - 2018-09-21 21:38:40-110: Nodes per depths:
#INFO - 2018-09-21 21:38:40-110: 0 nodes 42
#INFO - 2018-09-21 21:38:40-110: 0 qnodes 71
#INFO - 2018-09-21 21:38:40-110: 1 nodes 131
#INFO - 2018-09-21 21:38:40-110: 1 qnodes 461
#INFO - 2018-09-21 21:38:40-110: 2 nodes 641
#INFO - 2018-09-21 21:38:40-110: 2 qnodes 4815
#INFO - 2018-09-21 21:38:40-110: 3 nodes 1695
#INFO - 2018-09-21 21:38:40-110: 3 qnodes 6593
#INFO - 2018-09-21 21:38:40-110: 4 nodes 2978
#INFO - 2018-09-21 21:38:40-110: 4 qnodes 11509
#INFO - 2018-09-21 21:38:40-110: 5 nodes 5040
#INFO - 2018-09-21 21:38:40-110: 5 qnodes 18002
#INFO - 2018-09-21 21:38:40-110: 6 nodes 8876
#INFO - 2018-09-21 21:38:40-110: 6 qnodes 30172
#INFO - 2018-09-21 21:38:40-110: 7 nodes 16811
#INFO - 2018-09-21 21:38:40-110: 7 qnodes 58091
#INFO - 2018-09-21 21:38:40-110: 8 nodes 28086
#INFO - 2018-09-21 21:38:40-110: 8 qnodes 92211
#INFO - 2018-09-21 21:38:40-110: 9 nodes 76453
#INFO - 2018-09-21 21:38:40-110: 9 qnodes 244945
#INFO - 2018-09-21 21:38:40-110: EBF 1.82548
#INFO - 2018-09-21 21:38:40-110: 2 BF 3.90665
#INFO - 2018-09-21 21:38:40-110: 2 QBF 8.2351
#INFO - 2018-09-21 21:38:40-110: 3 BF 3.59707
#INFO - 2018-09-21 21:38:40-110: 3 QBF 3.78173
#INFO - 2018-09-21 21:38:40-110: 4 BF 2.15543
#INFO - 2018-09-21 21:38:40-110: 4 QBF 1.54604
#INFO - 2018-09-21 21:38:40-110: 5 BF 1.72437
#INFO - 2018-09-21 21:38:40-110: 5 QBF 1.65241
#INFO - 2018-09-21 21:38:40-110: 6 BF 1.72642
#INFO - 2018-09-21 21:38:40-110: 6 QBF 1.61914
#INFO - 2018-09-21 21:38:40-110: 7 BF 1.82634
#INFO - 2018-09-21 21:38:40-110: 7 QBF 1.79636
#INFO - 2018-09-21 21:38:40-110: 8 BF 1.77884
#INFO - 2018-09-21 21:38:40-110: 8 QBF 1.74819
#INFO - 2018-09-21 21:38:40-110: 9 BF 2.13256
#INFO - 2018-09-21 21:38:40-110: 9 QBF 2.05343
#INFO - 2018-09-21 21:38:40-110: 0 seldepth 6
#INFO - 2018-09-21 21:38:40-110: 1 seldepth 6
#INFO - 2018-09-21 21:38:40-110: 2 seldepth 11
#INFO - 2018-09-21 21:38:40-110: 3 seldepth 14
#INFO - 2018-09-21 21:38:40-110: 4 seldepth 16
#INFO - 2018-09-21 21:38:40-110: 5 seldepth 16
#INFO - 2018-09-21 21:38:40-110: 6 seldepth 17
#INFO - 2018-09-21 21:38:40-110: 7 seldepth 22
#INFO - 2018-09-21 21:38:40-110: 8 seldepth 23
#INFO - 2018-09-21 21:38:40-110: 9 seldepth 27
#INFO - 2018-09-21 21:38:40-110: 10 seldepth 30
#INFO - 2018-09-21 21:38:40-111: TT hit rate (used) 3.19381% (25.7333%)
#INFO - 2018-09-21 21:38:40-111: TT collision rate 1.20643%
#INFO - 2018-09-21 21:38:40-111: col rate (theoretical) 3.00343%
#INFO - 2018-09-21 21:38:40-111: QTT hit rate (used) 3.16927% (38.5009%)
#INFO - 2018-09-21 21:38:40-111: QTT collision rate 5.38657%
#INFO - 2018-09-21 21:38:40-111: col rate (theoretical) 16.1178%
#INFO - 2018-09-21 21:38:40-111: ETT hit rate 28.0535%
#INFO - 2018-09-21 21:38:40-111: ETT collision rate 3.16851%
#INFO - 2018-09-21 21:38:40-111: col rate (theoretical) 5.41326%
#INFO - 2018-09-21 21:38:40-111: ETTP hit rate 92.5057%
#INFO - 2018-09-21 21:38:40-111: ETTP collision rate 0.0442403%
#INFO - 2018-09-21 21:38:40-111: col rate (theoretical) 7.66514%
#INFO - 2018-09-21 21:38:40-111: pvs success rate 96.5631%
#INFO - 2018-09-21 21:38:40-111: pvs success rate (root) 98.5112%
#INFO - 2018-09-21 21:38:40-111: window success rate 83.3333%
#INFO - 2018-09-21 21:38:40-111: lmr success rate 99.2827%
#INFO - 2018-09-21 21:38:40-111: lmr success rate (root) 100%
#INFO - 2018-09-21 21:38:40-111: nullmove success rate 48.2811%
#INFO - 2018-09-21 21:38:40-111: nullmove skipped rate 1.59423%
#INFO - 2018-09-21 21:38:40-111: tt sort success rate 100% (should be 100 %)
#INFO - 2018-09-21 21:38:40-111: Qtt sort success rate 100% (should be 100 %)
-
- Posts: 4052
- Joined: Thu May 15, 2008 9:57 pm
- Location: Berlin, Germany
- Full name: Sven Schüle
Re: about speed et profiling tools
I suggest to compare numbers for equal fixed depth only. Also your comparison goes to the extreme by removing *all* search features at once. Smaller steps would be good.xr_a_y wrote: ↑Fri Sep 21, 2018 7:47 pm[...]Sven wrote: ↑Thu Sep 20, 2018 10:08 pmIt is not necessarily a speed issue.
A possible explanation might also be that the stronger search implies a higher ratio of QS nodes vs. full-width nodes, and if you almost never call eval() at full-width nodes then QS nodes on average take significantly more time. In my imagination a weaker search frequently visits "bad" nodes that can be refuted very easily by a simple capture and therefore with a very small QS subtree.
You could try to analyze this by collecting statistics about node counts.
Activating the major search trick but only a PST eval ==> 450knps
the values of nodes / qnodes is :
Note that a "real" qnodes is a qnodes that is not cut by the stand pat test.Code: Select all
#INFO - 2018-09-21 19:19:59-228: visitednodes 104361 #INFO - 2018-09-21 19:19:59-228: visitedqnodes 372858 #INFO - 2018-09-21 19:19:59-228: visitedrealqnodes 109763
[...]
Now using the alpha/beta bare search (not going to the same depth ...) ==> 1.5Mnps
[...]Code: Select all
#INFO - 2018-09-21 19:22:41-773: visitednodes 64 #INFO - 2018-09-21 19:22:41-773: visitedqnodes 17410543 #INFO - 2018-09-21 19:22:41-773: visitedrealqnodes 8737286
So that searching more qnode seems faster than less here.
I took the same test position, started with normal current Jumbo, removed search features step by step while keeping the same full evaluation function, and got the following numbers for a fixed-depth search to 10 plies (single-core, VS build which is slower than GCC for me):
Code: Select all
search features nodes qsNodes time nps qs%
normal 2.123.407 1.369.755 2,04 1.040.886 65%
no pruning 10.081.246 7.896.069 7,90 1.276.107 78%
no killer, no IID 9.829.471 7.722.409 7,76 1.266.684 79%
no nullmove 182.208.589 141.437.003 130,19 1.399.559 78%
no TT 915.899.029 749.467.322 670,14 1.366.728 82%
So there are different cases: Removing all pruning features makes the program faster in terms of NPS while increasing the QS ratio. Of course the resulting program is a lot weaker than before. Removing killer moves and IID has almost no effect at this stage (for this test position it even gives a small improvement in "time to depth"). Removing nullmove increases NPS again without increasing the QS ratio (and time to depth goes up drastically here). And finally removing TT increases QS ratio, keeps NPS almost constant, and makes the program much weaker again.
It is hard to say what exactly causes what. For some search features introducing them lowers NPS, for others it doesn't. At least it seems to be confirmed (for this test position) that a stronger search does not necessarily correlate with a higher QS ratio, despite my expectation.
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)