Naum 4 t1: 3068
Naum 4 t2: 3083
Naum 4 t4: 3105
Rybka 3 t1: 3116
Rybka 3 t2: 3145
Rybka 3 t4: 3197
Naum's scores map very well to their respective CEGT ratings, and Rybka 3 t1 scores just a hair over Naum 4 t4, which is what we would expect if their CEGT ratings are correct.
To design the suite, I ran five engines for 10 seconds each over 658 positions I ripped from high-level ICCF games. I then discarded every position that all engines failed within the 10 seconds, and every position that all engines solved easily in under 1 second.
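The discard rule above can be sketched as a small C predicate. This is only an illustration: the function name, the parameter names, and the assumption that a failed position is recorded with a time equal to the limit (10 seconds here) are mine, not taken from the actual tooling.

```c
#include <stdbool.h>

/* Hypothetical helper: decide whether a position is worth keeping.
   times[e] is engine e's solve time in seconds. A failure is assumed
   to be recorded as a time equal to `limit` (an assumption). */
bool keep_position(const double times[], int n_engines,
                   double easy, double limit) {
    bool all_failed = true;   /* every engine hit the time limit     */
    bool all_easy = true;     /* every engine solved in under `easy` */
    for (int e = 0; e < n_engines; e++) {
        if (times[e] < limit) all_failed = false;
        if (times[e] >= easy) all_easy = false;
    }
    return !(all_failed || all_easy);
}
```

With the thresholds from this post, the call would be keep_position(times, 5, 1.0, 10.0).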
I then shuffled the positions randomly 2x10^7 times until the total times for each engine over the first 30 positions matched their CEGT ratings very well through a logarithmic regression. The code that tells me when my positions are usable:
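#include <math.h>

/* RMS error (in Elo) between the known rating gaps and the gaps implied by
   the solve times. x[i] is the total rated time of engine i. ENGINES,
   ratings[] and SPEEDFACTOR (Elo per doubling of speed) are defined elsewhere. */
double LogError(double x[]) {
    int i, j;
    double diffTime, diffRating, sum = 0.0;

    for (i = 0; i < ENGINES - 1; i++)
        for (j = i + 1; j < ENGINES; j++) {
            diffRating = ratings[i] - ratings[j];
            diffTime = log(x[j] / x[i]) / log(2.0);   /* doublings of solve time */
            sum += pow(diffRating - diffTime * SPEEDFACTOR, 2);
        }
    /* Divide by the number of engine pairs. The old (ENGINES-1)*(ENGINES/2)
       truncated for odd ENGINES (8 instead of 10 for five engines). */
    return sqrt(sum / (ENGINES * (ENGINES - 1) / 2.0));
}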
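For anyone who wants to try the same thing, here is a rough sketch of how the shuffle-and-score loop around a scoring function like LogError might look. It is not my actual program: N_ENGINES, shuffle, subset_totals and search_subset are made-up names, the times[][] layout is an assumption, and rand() is only a stand-in for a better random generator.

```c
#include <stdlib.h>
#include <string.h>

#define N_ENGINES 5   /* hypothetical: number of engines used for tuning */

/* Fisher-Yates shuffle of the position indices order[0..n-1]. */
void shuffle(int order[], int n) {
    for (int i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);   /* slight modulo bias, fine for a sketch */
        int tmp = order[i];
        order[i] = order[j];
        order[j] = tmp;
    }
}

/* Sum each engine's time over the first `subset` positions of order[].
   times[p][e] is the time of engine e on position p (assumed layout). */
void subset_totals(double times[][N_ENGINES], const int order[],
                   int subset, double totals[N_ENGINES]) {
    for (int e = 0; e < N_ENGINES; e++) totals[e] = 0.0;
    for (int k = 0; k < subset; k++)
        for (int e = 0; e < N_ENGINES; e++)
            totals[e] += times[order[k]][e];
}

/* Shuffle `iters` times and keep the subset with the lowest error.
   err is a scoring function with the same signature as LogError.
   Requires subset <= n_pos. best[] receives the chosen position indices.
   Returns the best error, or -1.0 if allocation fails or iters is 0. */
double search_subset(double times[][N_ENGINES], int n_pos, int subset,
                     long iters, double (*err)(double totals[]), int best[]) {
    int *order = malloc((size_t)n_pos * sizeof *order);
    double totals[N_ENGINES];
    double best_err = -1.0;
    if (order == NULL) return -1.0;
    for (int p = 0; p < n_pos; p++) order[p] = p;
    for (long it = 0; it < iters; it++) {
        shuffle(order, n_pos);
        subset_totals(times, order, subset, totals);
        double e = err(totals);
        if (best_err < 0.0 || e < best_err) {
            best_err = e;
            memcpy(best, order, (size_t)subset * sizeof *order);
        }
    }
    free(order);
    return best_err;
}
```

Seed the generator with srand() before calling search_subset, and pass LogError as the err argument with subset = 30.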
As an example of one pairwise term, Rybka solves these positions with a rated time of ~34, and Shredder with ~122. This theoretically equates to an Elo difference of 55*log(122/34)/log(2) = 101. The difference between their CEGT ratings is 112, so the error for this pair is 112 - 101 = 11.
The suite is tuned for engines rated between 3000 and 3200 (CEGT) at 10 seconds of analysis on a Q6600 3.0GHz. YMMV for different engines on different hardware. It becomes difficult to test engines rated more than 200 Elo apart due to the lack of precision in Arena's "rated times", which are whole numbers 0 through 10 in this case. For engines 330 Elo apart we'd expect a 2^(330/55) = 64x difference in time, which would require testing each of the 658 positions with every engine for a minimum of 64 seconds. This can only be done with days of analysis time, which I can't afford. But that would be the ideal way to go. For Rybka, Naum, Shredder, Stockfish, and Zappa II (all x64 and t=4), this function maps their "rated time" to their CEGT ratings with R^2 = .9984:
RATING = 3513 - 89.8 ln x
where x is the rated time.
The methodology still needs some refinement but I think I am on the right track.
The 30 positions in EPD format:
2r5/pp1bkpp1/2rnp2p/3p4/3P4/P2BP1P1/1P1KNPP1/2R4R w - - fmvn 22; hmvc 13; bm Rxc6;
r2b2k1/pp6/4rp1p/2p5/4P3/5N2/PP4PP/3R1RK1 w - - fmvn 22; hmvc 0; bm Nh4;
7r/1p1kbppp/p1b1p3/4P3/5P2/P3B3/1P2BKPP/2R5 w - - fmvn 22; hmvc 2; bm Bc5;
r5k1/pp4pp/2p5/3rNb2/3P3P/P7/1P4P1/3RR1K1 w - - fmvn 22; hmvc 3; bm b4;
r2r2k1/pq3pb1/2p1n1p1/2B1p1Pp/4N2P/5P2/PPP2Q2/2KRR3 w - - fmvn 22; hmvc 0; bm Bd6;
2rq2k1/1p4pp/3p2r1/3Ppp2/bP6/P5P1/2PQBP1P/R3K2R w KQ - fmvn 22; hmvc 0; bm c4;
4qr1k/3b2pp/p1rpp3/1p5n/4P1B1/2N1QP2/PPP4P/1K1R2R1 w - - fmvn 22; hmvc 3; bm e5;
2kr3r/1b3p2/pq2p2p/2b1P1p1/PpB3Q1/1Pn5/5PPP/R1B1NRK1 w - - fmvn 22; hmvc 0; bm Qh5;
r7/1b1nkp1p/1r1ppp1b/8/1P2PP2/3Q2P1/P1P4P/2KRR3 w - - fmvn 22; hmvc 1; bm a3;
3r2k1/1ppr1qpp/2n1b3/p2np3/8/PP1P1NP1/1BQ1PKBP/2RR4 w - - fmvn 22; hmvc 0; bm Qc5;
3rr1k1/pb3ppp/8/1p1n4/8/PB3N2/1P3PPP/3R1RK1 w - - fmvn 22; hmvc 2; bm Rfe1;
2r2rk1/3pb1pp/p2npP2/7P/1p1NPP2/3qB3/PPP5/1K1R1R2 w - - fmvn 22; hmvc 0; bm cxd3;
4k3/1r2b2p/p1q1pnr1/2p5/1pp2B2/2N2PPQ/PP2P2P/R2R2K1 w - - fmvn 22; hmvc 0; bm Ne4;
6k1/pq2ppb1/1p4pp/n7/3P3B/3QPN1P/P4PP1/6K1 w - - fmvn 22; hmvc 0; bm Qb5;
r4r1k/1p3pp1/3b1n2/p2p4/3N4/P1B1P2q/1P2QP1P/R3K1R1 w Q - fmvn 22; hmvc 1; bm Qf3;
r1br2k1/1p1n1pp1/p1nBp2p/q7/4N3/6Q1/2PRB1PP/4K2R w K - fmvn 22; hmvc 7; bm Bc7;
3rr1k1/ppp2ppp/8/3P1b2/2Pq4/PQ6/4BPPP/R3R1K1 w - - fmvn 22; hmvc 0; bm Bf1;
rn3rk1/pppb4/3p2pp/3P4/1PP1R3/P4BP1/1b3P1P/4Q1K1 w - - fmvn 22; hmvc 0; bm Qe3;
5rk1/p5p1/1ppr3p/1Q1nqp2/1PB5/P2RP3/5PPP/5RK1 w - - fmvn 22; hmvc 0; bm Qa4;
2b1rr1k/5ppp/2p2n2/ppq1PPB1/2p3P1/2P5/P3Q1BP/4RR1K w - - fmvn 22; hmvc 0; bm Bh4;
3r1r2/1p3pkp/p1bRp3/4Pp1B/8/1NN5/P1P3PP/6K1 w - - fmvn 22; hmvc 1; bm Rd1;
1q1r2k1/p2nbppp/b1p1p3/8/P1pPP3/2B3P1/2QN1PBP/3R2K1 w - - fmvn 22; hmvc 2; bm Ba1;
r3r3/pp3pk1/2b2qp1/7p/5n1P/2N3Q1/PP3PP1/R2R1BK1 w - - fmvn 22; hmvc 1; bm f3;
1r3r1k/3q2pp/B2p4/1P1Nppb1/2Q1P3/8/1P3PPP/5RK1 w - - fmvn 22; hmvc 0; bm exf5;
r3kb1r/2qn1ppp/p3pn2/5N2/2P5/2P1BP2/1N2Q1PP/R2R2K1 w kq - fmvn 22; hmvc 7; bm Na4;
r4r2/pp3pk1/2b1p1p1/P2pP2p/3P3P/3B3R/2PK1PP1/R7 w - - fmvn 22; hmvc 0; bm Rb1;
4r1k1/5pp1/3b4/1p1p1b2/3P3p/2P1N1Pq/1P3P1P/R1BQ2K1 w - - fmvn 22; hmvc 0; bm Qf1;
r1r3k1/4qpbp/2bp1np1/ppn1p3/2P1P3/1P2BNNP/P1BQ1PP1/2R1R1K1 w - - fmvn 22; hmvc 0; bm Bxc5;
3rr3/p1qn1kp1/1p3b1p/8/3pQ3/1P3N2/PB3PPP/3R1RK1 w - - fmvn 22; hmvc 0; bm Qd5+;
r6r/ppp2kp1/1b1qb2p/5p2/8/5NN1/PPQ2PPP/R4RK1 w - - fmvn 22; hmvc 6; bm Rfe1;