Stats and bench on Stockfish development site


JohnS
Posts: 215
Joined: Sun Feb 24, 2008 2:08 am

Stats and bench on Stockfish development site

Post by JohnS »

Most new versions include something like this:

LLR: 2.95 (-2.94,2.94)
Total: 39818 W: 8174 L: 7956 D: 23688

bench: 3453941

Can anyone explain what they mean and how they are calculated? I understand the second line, but what conditions are used for the games?

Thanks.
phenri
Posts: 284
Joined: Tue Aug 13, 2013 9:44 am

Re: Stats and bench on Stockfish development site

Post by phenri »

Ajedrecista
Posts: 1971
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Stats and bench on Stockfish development site.

Post by Ajedrecista »

Hello John:
JohnS wrote: Most new versions include something like this:

LLR: 2.95 (-2.94,2.94)
Total: 39818 W: 8174 L: 7956 D: 23688

bench: 3453941

Can anyone explain what they mean and how they are calculated? I understand the second line, but what conditions are used for the games?

Thanks.
I am not an expert on the subject but I will try my best.

You are referring to this sequential test. LLR stands for Log-Likelihood Ratio; it is the statistic tracked by an SPRT (Sequential Probability Ratio Test). Stage I of that test had these testing conditions:

Code: Select all

sprt @ 15+0.05 th 1
This means a sequential probability ratio test at a time control of 15" + 0.05"/move (a Fischer time control) per player, with one thread (th 1) for each engine. Besides this TC of 15+0.05, Stage I fixes some SPRT parameters: alpha = 0.05 (5%) and beta = 0.05 (5%), which are the probabilities of type I and type II errors of the test, respectively.

The code that calculates LLR can be found here, between lines 54 and 121. The numbers in parentheses (-2.94, 2.94) are the lower and upper bounds. If LLR < -2.94, the patch is discarded; if LLR > 2.94, the patch is accepted (in this case, it is tested again at a longer TC at Stage II: 60+0.05); while LLR stays between the bounds, the test keeps running.

Looking at the code (lines 95 and 96), these bounds are easily calculated: (lower bound) = ln[beta/(1 - alpha)] = ln(0.05/0.95) = -ln(19) ~ -2.9444; (upper bound) = ln[(1 - beta)/alpha] = ln(0.95/0.05) = ln(19) ~ 2.9444. In this case LLR > (upper bound), so this patch will be tested again at a longer TC. Comparing LLR with these bounds works as a stopping rule.
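As a quick check of the arithmetic above, here is a minimal Python sketch of the two bounds. The function name sprt_bounds is my own choice for illustration; it is not taken from the testing framework's code.

```python
import math

def sprt_bounds(alpha, beta):
    """Return the (lower, upper) LLR bounds of an SPRT.

    alpha: accepted probability of a type I error
    beta:  accepted probability of a type II error
    """
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    return lower, upper

lower, upper = sprt_bounds(alpha=0.05, beta=0.05)
print(f"{lower:.4f} {upper:.4f}")  # -2.9444 2.9444
```

With alpha = beta the bounds are symmetric around zero, which is why the test line shows (-2.94,2.94).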

SPRT has different parameters depending on each stage: alpha and beta remain constant, but elo0 and elo1 parameters (measured in BayesElo units) vary:

Code: Select all

Always: elo0 < elo1.

Stage I:
  elo0 = -1.5
  elo1 =  4.5

Stage II:
  elo0 = 0
  elo1 = 6
Stage II is more restrictive and therefore more difficult to pass: a lot of patches accepted at Stage I fail at Stage II; Stage I is a kind of fast filter of bad patches.

I can give you some additional numbers in the example you posted:

Code: Select all

Parameters found at LLR_parameters.txt file:
 
alpha:        0.0500
beta:         0.0500
 
bayeselo_0:  -1.5000
bayeselo_1:   4.5000
 
----------------------------
 
Lower bound for LLR: -2.9444
Upper bound for LLR:  2.9444
 
----------------------------
 
Games:      39818
 
Wins:        8174 (20.53 %).
Loses:       7956 (19.98 %).
Draws:      23688 (59.49 %).
 
bayeselo:     2.9443
drawelo:    238.0870
 
----------------------------
 
  LLR(wins):      224.7448
  LLR(loses):    -219.5174
  LLR(draws):      -2.2820
 
         LLR:       2.9454
I got these results with my own programme (I copied the underlying mathematics from the piece of code I mentioned before). Of course, LLR = LLR(wins) + LLR(loses) + LLR(draws).

Draws count negatively in LLR here because (elo0 + elo1)/2 > 0, that is, elo0 + elo1 > 0. Draws do not affect LLR if (elo0 + elo1)/2 = 0, that is, elo0 = -elo1. Logically, draws count positively in LLR if (elo0 + elo1)/2 < 0.

I said before that Stage II is more restrictive than Stage I, and this is due to the parameters elo0 and elo1. SPRT(-1.5, 4.5) gave an LLR ~ 2.9454; if we take the same number of wins, draws and losses but work with SPRT(0, 6), then LLR ~ -0.1136, much less than the other LLR value and still between the bounds, so the test would continue.
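The figures above can be reproduced with a short Python sketch. It assumes the usual BayesElo win/draw/loss model with drawelo estimated from the sample itself; the function names are mine, chosen for illustration, and this is not a copy of the framework's code.

```python
import math

def bayeselo_to_proba(elo, drawelo):
    """Win/loss/draw probabilities under the BayesElo model (assumed here)."""
    win = 1.0 / (1.0 + 10.0 ** ((-elo + drawelo) / 400.0))
    loss = 1.0 / (1.0 + 10.0 ** ((elo + drawelo) / 400.0))
    return {"win": win, "loss": loss, "draw": 1.0 - win - loss}

def proba_to_bayeselo(p):
    """Invert the model: (bayeselo, drawelo) from observed frequencies."""
    elo = 200.0 * math.log10(
        p["win"] / p["loss"] * (1.0 - p["loss"]) / (1.0 - p["win"]))
    drawelo = 200.0 * math.log10(
        (1.0 - p["loss"]) / p["loss"] * (1.0 - p["win"]) / p["win"])
    return elo, drawelo

def llr_terms(wins, losses, draws, elo0, elo1):
    """Per-outcome contributions to LLR for the hypotheses elo0 vs elo1."""
    n = wins + losses + draws
    observed = {"win": wins / n, "loss": losses / n, "draw": draws / n}
    _, drawelo = proba_to_bayeselo(observed)   # drawelo comes from the sample
    p0 = bayeselo_to_proba(elo0, drawelo)      # probabilities under H0
    p1 = bayeselo_to_proba(elo1, drawelo)      # probabilities under H1
    counts = {"win": wins, "loss": losses, "draw": draws}
    return {k: counts[k] * math.log(p1[k] / p0[k]) for k in counts}

stage1 = llr_terms(8174, 7956, 23688, elo0=-1.5, elo1=4.5)
stage2 = llr_terms(8174, 7956, 23688, elo0=0.0, elo1=6.0)
llr1 = sum(stage1.values())  # the post reports about 2.9454
llr2 = sum(stage2.values())  # the post reports about -0.1136
print(stage1, llr1)
print(stage2, llr2)
```

The draw term of stage1 comes out negative, in line with the remark about (elo0 + elo1)/2 > 0.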

------------------------

Regarding bench, here is an interesting thing on the subject. Bench is run with a command-line instruction. Open a command prompt in the folder that contains the executable and type the following:

Code: Select all

C:\Documents and Settings\[...]\StockFish\stockfish-4-win\stockfish-4-win\Windows>stockfish_4_32bit bench 128 1 12 default depth
(You have to type stockfish_4_32bit bench 128 1 12 default depth; the text before the word 'bench' is the name of the executable without '.exe'). 128 means 128 MB of hash; 1 means one thread; 12 means depth 12. When the benchmark finishes, something like this should appear:

Code: Select all

===========================
Total time (ms) : 11469
Nodes searched  : 4132352
Nodes/second    : 360306
The bench is the number of nodes searched: 4132352 for my copy of SF 4, which differs from the bench reported at this site:

Code: Select all

Author: Marco Costalba
Date: Tue Aug 20 09:01:25 2013 +0200
Timestamp: 1376982085

Stockfish 4

Stockfish bench signature is: 4132374
My copy is probably corrupted. :(
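For anyone who wants to script this comparison, here is a minimal Python sketch that pulls the node count out of captured bench output and compares it with a reported signature. The helper name bench_signature is hypothetical, and the sample text is simply the output pasted above; nothing here runs the engine.

```python
import re

def bench_signature(output):
    """Extract the 'Nodes searched' count from captured bench output."""
    match = re.search(r"Nodes searched\s*:\s*(\d+)", output)
    if match is None:
        raise ValueError("no 'Nodes searched' line found")
    return int(match.group(1))

# Output copied from the run shown above.
sample = """\
===========================
Total time (ms) : 11469
Nodes searched  : 4132352
Nodes/second    : 360306
"""

local = bench_signature(sample)
reported = 4132374  # signature quoted from the development site
print(local, reported, local == reported)
```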

Bench is done with these 16 positions (lines 35 to 50 of benchmark.cpp file):

Code: Select all

rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 10
8/2p5/3p4/KP5r/1R3p1k/8/4P1P1/8 w - - 0 11
4rrk1/pp1n3p/3q2pQ/2p1pb2/2PP4/2P3N1/P2B2PP/4RRK1 b - - 7 19
rq3rk1/ppp2ppp/1bnpb3/3N2B1/3NP3/7P/PPPQ1PP1/2KR3R w - - 7 14
r1bq1r1k/1pp1n1pp/1p1p4/4p2Q/4Pp2/1BNP4/PPP2PPP/3R1RK1 w - - 2 14
r3r1k1/2p2ppp/p1p1bn2/8/1q2P3/2NPQN2/PPP3PP/R4RK1 b - - 2 15
r1bbk1nr/pp3p1p/2n5/1N4p1/2Np1B2/8/PPP2PPP/2KR1B1R w kq - 0 13
r1bq1rk1/ppp1nppp/4n3/3p3Q/3P4/1BP1B3/PP1N2PP/R4RK1 w - - 1 16
4r1k1/r1q2ppp/ppp2n2/4P3/5Rb1/1N1BQ3/PPP3PP/R5K1 w - - 1 17
2rqkb1r/ppp2p2/2npb1p1/1N1Nn2p/2P1PP2/8/PP2B1PP/R1BQK2R b KQ - 0 11
r1bq1r1k/b1p1npp1/p2p3p/1p6/3PP3/1B2NN2/PP3PPP/R2Q1RK1 w - - 1 16
3r1rk1/p5pp/bpp1pp2/8/q1PP1P2/b3P3/P2NQRPP/1R2B1K1 b - - 6 22
r1q2rk1/2p1bppp/2Pp4/p6b/Q1PNp3/4B3/PP1R1PPP/2K4R w - - 2 18
4k2r/1pb2ppp/1p2p3/1R1p4/3P4/2r1PN2/P4PPP/1R4K1 b - - 3 22
3q2k1/pb3p1p/4pbp1/2r5/PpN2N2/1P2P2P/5PP1/Q2R2K1 b - - 4 26
I hope that this post will be helpful for you.

Regards from Spain.

Ajedrecista.
Paloma
Posts: 1167
Joined: Thu Dec 25, 2008 9:07 pm
Full name: Herbert L

Re: Stats and bench on Stockfish development site.

Post by Paloma »

Thank you for this explanation
JohnS
Posts: 215
Joined: Sun Feb 24, 2008 2:08 am

Re: Stats and bench on Stockfish development site.

Post by JohnS »

Thanks to Paul and Jesus for the explanations.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Stats and bench on Stockfish development site.

Post by mcostalba »

Ajedrecista wrote: My copy is probably corrupted. :(
Just run 'stockfish bench'
Eelco de Groot
Posts: 4567
Joined: Sun Mar 12, 2006 2:40 am

Re: Stats and bench on Stockfish development site.

Post by Eelco de Groot »

Ajedrecista wrote:
Regarding bench, here is an interesting thing on the subject. Bench is run with a command-line instruction. Open a command prompt in the folder that contains the executable and type the following:

Code: Select all

C:\Documents and Settings\[...]\StockFish\stockfish-4-win\stockfish-4-win\Windows>stockfish_4_32bit bench 128 1 12 default depth
(You have to type stockfish_4_32bit bench 128 1 12 default depth; the text before the word 'bench' is the name of the executable without '.exe'). 128 means 128 MB of hash; 1 means one thread; 12 means depth 12. When the benchmark finishes, something like this should appear:

Code: Select all

===========================
Total time (ms) : 11469
Nodes searched  : 4132352
Nodes/second    : 360306
The bench is the number of nodes searched: 4132352 for my copy of SF 4, which differs from the bench reported at this site:

Code: Select all

Author: Marco Costalba
Date: Tue Aug 20 09:01:25 2013 +0200
Timestamp: 1376982085

Stockfish 4

Stockfish bench signature is: 4132374
My copy is probably corrupted. :(
I think that at the time of that thread it was not yet possible to run the bench command from inside the program; Marco added that possibility later. So now you just look for stockfish.exe in your download and double-click it (or right-click -> Open). Then, after this message appears:
Stockfish 050913 by Tord Romstad, Marco Costalba and Joona Kiiski
(the number depends on the date the code was compiled)
type bench and hit the Enter key.

That way you don't have to open a separate command window in the folder that contains stockfish.exe. A very useful change for us simple Windows users, thanks to Marco :) !

Eelco