I've started a new private rating list. The testing conditions are as follows:
* only open source and multi-platform programs: I'm not interested in proprietary and/or Windows-only programs. I compile the programs myself (with GCC 4.6) whenever possible, to make sure I'm running an efficient build.
* time control: 1 min + 1 sec increment. For any given amount of CPU time, it's certainly better to play ten times more games than to play games that are ten times longer. Any serious engine developer will agree with that.
* book: performance.bin, limited to 10 moves (i.e. 20 half-moves)
* 64 bit versions only: who uses a 32 bit CPU these days anyway? Perhaps many people still use a 32 bit OS on 64 bit hardware. Anyway, I don't see any good reason to double the testing work by testing both the 32 and 64 bit versions of a given engine.
* interface: cutechess-cli. This is a command line interface, which has two benefits compared to GUIs:
- it allows multi-threaded testing. For example, if engines A and B don't have an SMP search, then I can run 2 games in parallel on my 2 CPU machine. When A and/or B are SMP, games must be played one at a time, so that the SMP engines can use both CPUs.
- it is very fast and doesn't cause programs to lose on time at such a quick time control.
* Elo calculator: BayesElo, which is certainly better than EloStat for many reasons.
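To put a rough number on the "more games" argument in the time control item, here is a minimal sketch of how the error margin of a rating shrinks with the number of games. It assumes the plain logistic Elo formula and a normal approximation, not BayesElo's actual model, and the function names and the 20% draw ratio are my own illustrative choices.

```python
import math

def elo_from_score(p):
    """Elo difference implied by a score fraction p (plain logistic Elo model)."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_margin(games, score=0.5, draw_ratio=0.2, z=1.96):
    """Approximate 95% half-width, in Elo, of a rating after `games` games.

    Assumption: independent games with a fixed win/draw/loss distribution.
    """
    wins = score - draw_ratio / 2.0
    # Per-game variance of the result (1, 0.5 or 0): E[x^2] - E[x]^2
    variance = wins + 0.25 * draw_ratio - score * score
    std_err = math.sqrt(variance / games)
    upper = elo_from_score(score + z * std_err)
    lower = elo_from_score(score - z * std_err)
    return (upper - lower) / 2.0

if __name__ == "__main__":
    for n in (500, 5000):
        print(n, "games: about +/-", round(elo_margin(n)), "Elo")
```

Under these assumptions, ten times more games narrows the margin by a factor of about sqrt(10), i.e. a bit more than 3. This only covers the statistical side of the argument; it says nothing about how playing strength changes with longer time controls.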
Here's what I have so far, but of course this list will evolve:
Rank Name              Elo    +    - games score oppo. draws 
   1 Fruit 2.1        2740   44   41   225   85%  2450   16% 
   2 Pawny 0.3.1      2519   32   32   300   52%  2506   19% 
   3 Sungorus 1.4     2402   24   24   500   49%  2415   19% 
   4 Jazz 5.01        2391   24   24   500   47%  2417   22% 
   5 DoubleCheck 2.3  2382   20   20   675   47%  2408   20% 
   6 Beowulf 2.4      2343   33   33   250   48%  2359   19% 
   7 GNUChess 5.08    2239   37   39   200   31%  2380   19% 
- there's no real information to be gained by testing an SMP engine on both 1 CPU and 2 CPUs: all you'll see is that the 2 CPU version is (almost) twice as fast, with the corresponding Elo gain.
- it is not a trivial task for engine developers to parallelize the search algorithm, so it's only fair to give them that advantage over non-SMP programs.
Any suggestions as to which program to test next are welcome. As you can see, I'm slowly working my way from the bottom of the list to the top. I'll include stronger engines step by step, so that the ratings make sense. If I just throw Stockfish in there, it will destroy all the other participants, and its rating will not be precisely determined. I need to fill the gap first.
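The point about a much stronger engine not being precisely rated can be illustrated with a small sketch. It again assumes the plain logistic Elo formula rather than BayesElo's model, and the function names are hypothetical. It shows how many Elo points correspond to a one percentage point change in score, depending on how lopsided the score already is.

```python
import math

def elo_from_score(p):
    """Elo difference implied by a score fraction p (plain logistic Elo model)."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_per_percent(p):
    """Elo change caused by a one percentage point change in score around p."""
    return elo_from_score(p + 0.005) - elo_from_score(p - 0.005)

if __name__ == "__main__":
    for p in (0.50, 0.85, 0.95, 0.99):
        print(f"score {p:.0%}: 1% of score is worth about "
              f"{elo_per_percent(p):.0f} Elo")
```

Near a 50% score, one percent of score is worth about 7 Elo under this model, while near 95% it is worth more than five times as much. The same noise in the results therefore translates into a far wider rating uncertainty for an engine that crushes everyone, which is why filling the gap with intermediate engines helps.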
Lucas
PS: as a speed benchmark, my machine runs "stockfish bench" in a little less than 7 seconds (SF version 2.1.1).
 . It has been playing on FICS for a while.
