Results
Rank Name               Elo    +    - games score oppo. draws
   1 Umko 1.2 (x2)     2899   44   41   200   81%  2673   28%
   2 Fruit 2.1         2700   24   23   625   64%  2587   28%
   3 GNU Chess 5.1     2659   26   26   450   55%  2627   32%
   4 Sloppy 0.2.2      2631   27   27   425   49%  2634   29%
   5 Pepito 1.59       2587   32   32   300   53%  2559   30%
   6 Greko 8.2         2519   30   30   350   47%  2539   30%
   7 Pawny 0.3.1       2488   24   24   600   45%  2520   21%
   8 DoubleCheck 2.3.1 2397   33   32   300   59%  2335   22%
   9 Sungorus 1.4      2357   22   22   675   44%  2410   19%
  10 Jazz 5.01         2349   23   23   600   45%  2388   24%
  11 DoubleCheck 2.3   2340   21   21   725   47%  2366   22%
  12 Beowulf 2.4       2286   32   33   300   45%  2329   18%
  13 GNU Chess 5.08    2191   37   38   250   30%  2346   18%
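To help read the Elo, oppo. and score columns together, here is a minimal Python sketch that turns a rating difference into an expected score. It assumes the textbook logistic Elo formula and treats the average opponent rating as if it were a single opponent; BayesElo's own model (which accounts for draws and the first-move advantage) is different, so the numbers will only roughly agree. The function names are mine, not part of any tool. For the Fruit 2.1 row it gives roughly 66% against the 64% actually scored.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score (0..1) for a player rated elo_diff above the opponent.

    Textbook logistic Elo formula; an assumption, not BayesElo's model.
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_diff_from_score(score: float) -> float:
    """Inverse of expected_score: rating difference implied by a score in (0, 1)."""
    return -400.0 * __import__("math").log10(1.0 / score - 1.0)


# A few rows copied from the table: (name, Elo, average opponent Elo, score).
rows = [
    ("Fruit 2.1", 2700, 2587, 0.64),
    ("Sloppy 0.2.2", 2631, 2634, 0.49),
    ("Sungorus 1.4", 2357, 2410, 0.44),
]

for name, elo, oppo, observed in rows:
    predicted = expected_score(elo - oppo)
    print(f"{name:<14} predicted {predicted:.1%}  observed {observed:.0%}")
```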
* only open source and portable programs: I'm not interested in proprietary and/or Windows-only programs. Ideally they are licensed under the GNU GPL; otherwise, no license at all or a license that doesn't impose "excessive" copyright terms is acceptable.
* 1 min + 1 sec increment: for any given amount of CPU time, it's better to play ten times more games than to play games that are ten times longer.
* 64 MB hash, no EGTB: 64 MB is certainly enough for such rapid games. As for EGTB, any good program will show almost zero Elo increase with EGTB.
* book: performance.bin by Marc Lacrosse, limited to 10 moves (20 half-moves).
* 64-bit versions only: I don't see any good reason to double the testing work by testing both the 32-bit and 64-bit versions of a given engine.
* interface: cutechess-cli. This is a command-line interface, which has two benefits compared to GUIs:
- it allows several games to be run concurrently. For example, if engines A and B don't have an SMP search, then I can run 2 games in parallel on my 2-CPU hardware. When A and/or B are SMP, games must be played one at a time, so that the SMP engines can use both CPUs.
- it is very fast and doesn't cause programs to lose on time at such a quick time control.
* SMP-capable programs play with 2 CPUs: it is not a trivial task for engine developers to parallelize the search algorithm, so it's only fair to give them that advantage over non-SMP programs.
* pondering off: using pondering with concurrent games or multi-threaded programs is a bad idea, as a pondering engine may significantly reduce the CPU allocation of its opponent.
* Elo calculator: BayesElo, certainly better than EloStat for many reasons. The list is calibrated with Fruit 2.1 at 2700 Elo.
* no automatic resigning for "weak" engines: some programs are buggy and may not be able to win a dead-won endgame, so they should be penalised accordingly. Of course, some engines (typically xboard ones) have a resign feature hardcoded in the program, so I let them resign as they please.
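The argument for a fast time control (more games rather than longer games) can be made concrete with a rough Python sketch of how the rating error margin shrinks with the number of games. It assumes independent games, a normal approximation for a 95% interval, and the textbook logistic Elo curve, none of which is how BayesElo actually derives its + and - columns; the function name is hypothetical. Under these assumptions the margin falls with the square root of the number of games, so ten times more games cuts it by a factor of about 3.2. For the Fruit 2.1 row (625 games, 64% score, 28% draws) the estimate lands close to the tabulated +24/-23.

```python
import math


def elo_error_margin(games: int, score: float, draw_ratio: float,
                     z: float = 1.96) -> float:
    """Approximate half-width (in Elo) of a confidence interval on a rating.

    Assumes independent games and the logistic Elo curve; z=1.96 corresponds
    to roughly 95% under a normal approximation. This is an illustration,
    not BayesElo's algorithm.
    """
    # Per-game variance of the result when wins=1, draws=0.5, losses=0.
    variance = score * (1.0 - score) - draw_ratio / 4.0
    std_err = math.sqrt(variance / games)
    # Slope of Elo difference with respect to score (delta method).
    slope = 400.0 / (math.log(10.0) * score * (1.0 - score))
    return z * std_err * slope


base = elo_error_margin(games=625, score=0.64, draw_ratio=0.28)
more = elo_error_margin(games=6250, score=0.64, draw_ratio=0.28)
print(f"625 games:  +/- {base:.1f} Elo")
print(f"6250 games: +/- {more:.1f} Elo")
```

The sketch says nothing about how much stronger an engine plays at a longer time control; it only shows what the extra games buy in precision.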