I know that a test between a bunch of engines is better than against a previous version of the same engine (in this case differences tend to be a little bigger IIRC).
Hugo wrote:
Hello
This test format does not give a real estimation of the playing strength.
First mistake: a brother fight makes less sense. A minimum of 10 different opponents makes more sense.
Second mistake: only 1000 games at that fast time control is 10 times too few. I would recommend a minimum of 10,000 games.
I tested this Stockfish 111026 vs. 17 different opponents at 5+3 with ponder ON. Each engine ran on one core, 64-bit. The result was 2922 after 679 games.
The original Stockfish 2.1.1 was 2930.
I did wonder about that result, because I was VERY impressed by this 111026 engine when using it on a quad at playchess for games with XXL time control. I lost only 1 game and could win some impressive games vs. Houdini. That's why I expected a clear plus over SF 2.1.1.
Regards, Clemens Keck
I know that regarding the number of games: the more the merrier. I am not a true engine tester and I simply wanted to add my grain of salt: I have neither the time nor the hardware (especially the time) for 10000 games. So, in a few moments I will upload only 400 games in single-thread mode with an error bar of about ±30 (it is huge indeed). Sorry. But the rating difference I get is +33 with 400 games (and was +37 with 1000 games), so there is a kind of stability... although my tests are surely biased in some way, as I said before in this topic.
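For anyone who wants to see roughly how the error bar shrinks with more games, here is a minimal Python sketch. It is only an approximation under stated assumptions: it uses the usual logistic Elo formula and a normal approximation for the score, and the draw ratio of 40% is a hypothetical value I picked for illustration, not something measured in my matches. The function names are my own too, and rating tools may compute their intervals differently.

```python
import math


def score_from_elo(elo_diff):
    """Expected score (0..1) implied by an Elo difference, logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score):
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_margin(elo_diff, games, draw_ratio, z=1.96):
    """Approximate half-width, in Elo, of a ~95% interval around elo_diff.

    draw_ratio is an ASSUMED fraction of drawn games (hypothetical input).
    """
    s = score_from_elo(elo_diff)
    win_ratio = s - draw_ratio / 2.0          # wins needed to reach score s
    # Variance of one game result (1, 0.5 or 0): E[X^2] - (E[X])^2
    variance = win_ratio + 0.25 * draw_ratio - s * s
    std_err = math.sqrt(variance / games)     # standard error of the mean score
    low = elo_from_score(s - z * std_err)
    high = elo_from_score(s + z * std_err)
    return (high - low) / 2.0


if __name__ == "__main__":
    # +33 Elo is the difference reported above; 0.40 is an assumed draw ratio.
    for n in (400, 1000, 10000):
        print(n, "games: about +/-", round(elo_margin(33.0, n, 0.40), 1), "Elo")
```

The margin falls roughly with the square root of the number of games, so going from 400 to 10000 games (25 times more) only narrows the error bar by a factor of about 5.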
I test the 32-bit version, and maybe 111026 is more optimized for 32-bit than for 64-bit... only an amateur's guess. Anyway, thanks for the tips. You will have work very soon with the Critter 1.4 and Komodo 4 releases!
Regards from Spain.
Ajedrecista.