Q about the CCRL ponder hit statistics

Mike S. · Post by **Mike S.** » Thu Sep 06, 2007 3:46 am

The CCRL displays the ponder hit percentage, for pairs of engines which have been tested against each other. Of course, the logic is simple:

100 * N ponder hits / N moves = PH%

I noticed that the CCRL PH% for engine A against B, is always the same as for B against A. Which means that the ponder hit numbers of A+B are combined for the calculation.

Is my assumption correct, that a PH of 60% in (for example) one game with 50 calculated full moves could mean theoretically:

1. A 30, B 30
2. A 40, B 20
3. A 12, B 48

Of course, the 'natural' tendency should be that the stronger engine predicts better, in general (as a general view of quality, independent of engine similarities). But it is only an assumption. Am I correct that I cannot find an individual PH% for ONE engine, in the CCRL statistics?

(They have so many great statistics to select, maybe I missed something.)

Anyway, I am aware of the problem that an engine which has played more games against similar engines (and/or against engines which are more similar), will be favoured compared to an engine which had less/fewer similar opponents, if we would simply look at the engine's individual ponder hits. Maybe that value could be 'combined somehow' with the current PH% which include the opponent's hits? I am not a good mathematician or statistician, or none at all, and I don't know if this is possible in a way that it makes sense. Nevertheless, I think it would be interesting which engine is the best 'predictor' and if a ranking of this would be consistent with the normal ranking (as to be expected?).

Kirill Kryukov · Post by **Kirill Kryukov** » Thu Sep 06, 2007 4:14 am

Hi Mike!

Mike S. wrote:The CCRL displays the ponder hit percentage, for pairs of engines which have been tested against each other. Of course, the logic is simple:

100 * N ponder hits / N moves = PH%

I noticed that the CCRL PH% for engine A against B, is always the same as for B against A. Which means that the ponder hit numbers of A+B are combined for the calculation.

Is my assumption correct, that a PH of 60% in (for example) one game with 50 calculated full moves could mean theoretically:

1. A 30, B 30
2. A 40, B 20
3. A 12, B 48

Yes, this is correct. We count both A->B and B->A ponder hits together for better confidence. Actually, now that we have discovered some engines which don't print (or don't always print) the expected move, I am considering counting each side's ponder hits separately, as otherwise the combined figure is broken.
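To make the arithmetic concrete, here is a minimal Python sketch of the combined figure next to a per-side one. The function names are made up for illustration, and it assumes each engine makes one prediction per full move, so a game of 50 full moves has 100 predictions in total. That matches the example above but is not taken from the actual CCRL scripts.

```python
def combined_ph(hits_a, hits_b, full_moves):
    """Combined ponder hit percentage for one game.

    Assumption: each engine makes one prediction per full move,
    so the denominator is 2 * full_moves.
    """
    return 100.0 * (hits_a + hits_b) / (2 * full_moves)


def per_side_ph(hits_a, hits_b, full_moves):
    """Ponder hit percentage of each engine counted separately."""
    return 100.0 * hits_a / full_moves, 100.0 * hits_b / full_moves


# The three splits from the example, each in a game of 50 full moves.
for hits_a, hits_b in [(30, 30), (40, 20), (12, 48)]:
    print(hits_a, hits_b,
          combined_ph(hits_a, hits_b, 50),
          per_side_ph(hits_a, hits_b, 50))
```

All three splits give the same combined 60%, while the per-side values differ widely (60/60, 80/40 and 24/96).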

Mike S. wrote:Of course, the 'natural' tendency should be that the stronger engine predicts better, in general (as a general view of quality, independent of engine similarities). But it is only an assumption.

This is not obvious to me, but this will be interesting to check.

Mike S. wrote:Am I correct that I cannot find an individual PH% for ONE engine, in the CCRL statistics?

Yes, correct. Not at the current moment at least.

Mike S. wrote:(They have so many great statistics to select, maybe I missed something.)

Anyway, I am aware of the problem that an engine which has played more games against similar engines (and/or against engines which are more similar), will be favoured compared to an engine which had less/fewer similar opponents, if we would simply look at the engine's individual ponder hits.

Quality of prediction could be evaluated by comparing it with the opponent's predictions in the same game. For example, if two engines both predict 65% of each other's moves, we can't say whether they predict so well, or if they are just similar. But if engine A predicts 65% and engine B only 55% (in an A vs B game), then we can say that A is 10 percentage points better. We can accumulate +(10*number_of_moves) for engine A and -(10*number_of_moves) for engine B, and so on.
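A rough sketch of that accumulation in Python, just to show the bookkeeping. The tuple layout and the function name are hypothetical, and the numbers in the usage lines are illustrative, not real CCRL data.

```python
from collections import defaultdict


def accumulate_prediction_scores(games):
    """Accumulate a relative prediction score per engine.

    `games` is an iterable of (engine_a, engine_b, ph_a, ph_b, moves),
    where ph_a is the percentage of B's moves that A predicted, ph_b the
    reverse, and moves the number of moves in the game (hypothetical layout).
    """
    scores = defaultdict(float)
    for engine_a, engine_b, ph_a, ph_b, moves in games:
        # The difference in PH%, weighted by game length, is credited to
        # one engine and debited from the other.
        diff = (ph_a - ph_b) * moves
        scores[engine_a] += diff
        scores[engine_b] -= diff
    return dict(scores)


# Illustrative numbers only.
games = [
    ("A", "B", 65.0, 55.0, 50),
    ("B", "C", 60.0, 70.0, 40),
]
scores = accumulate_prediction_scores(games)
print(scores)
```

Because every game adds and subtracts the same amount, the scores always sum to zero, so they only rank engines relative to each other within the tested pool.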

Mike S. wrote:Maybe that value could be 'combined somehow' with the current PH% which include the opponent's hits? I am not a good mathematician or statistician, or none at all and I don't know if this is possible in a way that it makes sense. Nevertheless, I think it would be interesting which engine is the best 'predictor' and if a ranking of this would be consistent with the normal ranking (as to be expected?).

Yes, this is interesting to check!

Another reason why we need to count A->B and B->A correlation separately is that our current (symmetric) counting allows for some kinds of cheating. You can make a clone and pretend it is original by printing a random expected move instead of the one you really expect. Such a clone will be hard to detect with our current ponder hit analysis.
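As a sketch of why separate counting helps here: with directional counts, a pair where one side predicts very well and the other almost never does stands out, even though the combined percentage looks ordinary. The function names and the 30-point threshold below are arbitrary assumptions for illustration, and the hit counts are invented.

```python
def directional_ph(hits, predictions):
    """Ponder hit percentage in one direction (0.0 if no predictions)."""
    return 100.0 * hits / predictions if predictions else 0.0


def looks_asymmetric(hits_ab, preds_ab, hits_ba, preds_ba, gap=30.0):
    """Flag a pair whose two directional PH% values differ by at least
    `gap` percentage points. The threshold is an arbitrary assumption."""
    ph_ab = directional_ph(hits_ab, preds_ab)
    ph_ba = directional_ph(hits_ba, preds_ba)
    return abs(ph_ab - ph_ba) >= gap


# Invented example: the original predicts 45 of the clone's 50 moves,
# while the clone prints random expected moves and hits only 2 of 50.
combined = directional_ph(45 + 2, 50 + 50)
print(combined)                          # combined figure over both directions
print(looks_asymmetric(45, 50, 2, 50))   # directional counts expose the gap
print(looks_asymmetric(30, 50, 30, 50))  # an ordinary symmetric pair
```

In this invented case the combined figure is 47%, which would not attract attention on its own, while the directional values are 90% and 4%.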

Best,
Kirill
