(1000 games) SF 111026 vs. SF 2.1.1 (both of them 32-bit).

Ajedrecista
Posts: 1985
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: (1000 games) SF 111026 vs. SF 2.1.1 (both of them 32-bit).

Post by Ajedrecista »

Hello Clemens:
Hugo wrote:Hello

This test format does not give a real estimate of playing strength.
First mistake: a brother fight makes less sense. A minimum of 10 different opponents makes more sense.
Second mistake: only 1000 games at such a fast time control is ten times too few. I would recommend a minimum of 10,000 games.
I tested this Stockfish 111026 against 17 different opponents at 5+3 with ponder ON, each engine on one core and 64-bit. The result was 2922 after 679 games.
The original Stockfish 2.1.1 was at 2930.
I wondered about that result, because I was VERY impressed by this 111026 engine when using it on a quad at Playchess for games with XXL time control. I lost only 1 game and could win some impressive games vs. Houdini. That is why I expected a clear plus over SF 2.1.1.


Regards, Clemens Keck
I know that a test against a bunch of engines is better than a test against a previous version of the same engine (in the latter case, differences tend to be a little bigger IIRC).

I also know that, regarding the number of games, the more the merrier. I am not a true engine tester and I simply wanted to add my grain of salt: I have neither the time nor the hardware (especially time) for 10000 games. So, in a few moments I will upload only 400 games in single thread mode, with an error bar of about ± 30 Elo (it is huge indeed). Sorry. But the rating difference I get is +33 with 400 games (and was +37 with 1000 games), so there is a kind of stability... although my tests are surely biased in some way, as I said before in this topic.

I tested the 32-bit version, and maybe 111026 is more optimized for 32-bit than for 64-bit... only an amateur's guess. :roll: Anyway, thanks for the tips. You will have work very soon with Critter 1.4 and Komodo 4 releases!

Regards from Spain.

Ajedrecista.
Last edited by Ajedrecista on Sat Dec 10, 2011 5:31 pm, edited 1 time in total.
MM
Posts: 766
Joined: Sun Oct 16, 2011 11:25 am

Re: (1000 games) SF 111026 vs. SF 2.1.1 (both of them 32-bit).

Post by MM »

Ajedrecista wrote:You will have work very soon with Critter 1.4 and Komodo 4 releases!

Regards from Spain.

Ajedrecista.
Hi, when?

Regards
MM
Ajedrecista
Posts: 1985
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

400 games in single thread mode.

Post by Ajedrecista »

Hello again:
mcostalba wrote:
Ajedrecista wrote: I tested 32-bit (not 64-bit) and AFAIK my computer (Intel Pentium D930 of 2006) also does not support POPCNT. Both engines used 2 cores (instead of one core each with two parallel matches, because in that setup the speed of 111026 was more than double the speed of 2.1.1, as if 2.1.1 were using one thread and 111026 two... this was very unfair and biased), as Engines.lbe.txt says (this is why I included it).
I'd suggest to retest in single thread mode...
I ran a quick test with the same opponents and got the following result:

Code: Select all

SF 111026     --->  219/400  (+164 -126 =110)
SF 2.1.1 JA   --->  181/400  (+126 -164 =110)
The error bar is huge (around ± 30 Elo), so this test should not be taken too seriously. The difference is now around +33 Elo. Test_2 is ready to download here:

Test_2.rar (2.63 KB)

I think SF needs a few more improvements before the next official release, because +33 or +37 does not seem realistic (I surely introduce some bias in the way I test, as Clemens stated... I knew it, but I wanted to add my grain of salt). Good luck with SF development.
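
As a cross-check, the +33 figure is consistent with the usual logistic Elo formula applied to the 219/400 score. Here is a minimal sketch in Python; the function name `elo_diff` is only illustrative and is not part of LittleBlitzer or any other testing tool:

```python
import math


def elo_diff(wins: int, losses: int, draws: int) -> float:
    """Elo difference implied by a match score under the logistic Elo model."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # fraction of points scored
    if not 0.0 < score < 1.0:
        raise ValueError("Elo difference is undefined for a 0% or 100% score")
    return -400.0 * math.log10(1.0 / score - 1.0)


# SF 111026 in the 400-game single-thread match: +164 -126 =110
print(round(elo_diff(164, 126, 110), 1))
```

Swapping the win and loss counts gives the same value with the opposite sign, which is the result from the point of view of SF 2.1.1 JA.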

Regards from Spain.

Ajedrecista.
Ajedrecista
Posts: 1985
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re:

Post by Ajedrecista »

Hello Maurizio:
MM wrote:
Ajedrecista wrote:You will have work very soon with Critter 1.4 and Komodo 4 releases!

Regards from Spain.

Ajedrecista.
Hi, when?

Regards
Although this is a little off-topic: Don Dailey (Komodo programmer) said that the Komodo team will try to release Komodo 4 MP (multi-processor) around mid-December; Richard Vida (Critter programmer) also said that he will try to release Critter 1.4 this month.

Regards from Spain.

Ajedrecista.
MM
Posts: 766
Joined: Sun Oct 16, 2011 11:25 am

Re:

Post by MM »

:o thanks

Regards
MM
Ajedrecista
Posts: 1985
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

(3000 games): SF 111026 (w32) vs. IH 47c+ GH (w32).

Post by Ajedrecista »

Hello again:

I had access to faster hardware (Intel i5-760 at 2.8 GHz) and I ran another test with SF 111026. As usual, it is not a professional test. I know that Clemens Keck will not like this test (sorry), but I have improved some things:

a) This was not a brother fight. OTOH I have not tested SF 111026 against a bunch of engines... It was a direct match against IH 47c+ GH (w32).

b) The time control was a bit slower (2" + 0.1" / move) with 3000 games. Now depths and kN/s are greater than in previous tests (faster hardware also helped, of course).

c) Four simultaneous games in single core mode... it took me a while, anyway.

A set of 4000 openings (EPD) by Dann Corbit was used. Here are the results:

Code: Select all

LittleBlitzer 2.5:

StockFish 111026 w32     1355/3000  (+891 -1181 =928)
IvanHoe 47c+ GH w32      1645/3000  (+1181 -891 =928)
So, about -33.7 ± 10.6 Elo (with almost 31% draws, very stable throughout the match). More than 10% of the draws were by adjudication, so maybe the number of moves for adjudication should be 250 instead of 150 in future tests (if any). The evolution of the match was a little strange from my POV: SF started very well (tied with IH), then stayed between -20 and -25 Elo (fully expected) from about 200 games on; but after more than 1000 games (IIRC) the difference went to -40, and from 2000+ games it started a slow recovery to the final -33.7... Something always happens in my amateur tests (sigh).
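
The error bar can be approximated from the win/loss/draw counts alone. The sketch below assumes a normal approximation to the per-game score and a 95% interval (z = 1.96); other tools may use a slightly different convention (for example 2 sigma), so the margin need not match the ± 10.6 quoted above to the decimal. The function names are hypothetical:

```python
import math


def score_to_elo(score: float) -> float:
    """Convert a score fraction (strictly between 0 and 1) to an Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, losses: int, draws: int, z: float = 1.96):
    """Return (elo, margin), using a normal approximation of the mean score.

    Assumption: z = 1.96 (95% interval). The interval endpoints must stay
    strictly between 0 and 1, which holds for any reasonably long, non-lopsided match.
    """
    games = wins + losses + draws
    w, l, d = wins / games, losses / games, draws / games
    score = w + 0.5 * d
    # Per-game variance of the outcome (win = 1, draw = 0.5, loss = 0).
    variance = w * (1.0 - score) ** 2 + d * (0.5 - score) ** 2 + l * score ** 2
    std_error = math.sqrt(variance / games)
    low = score_to_elo(score - z * std_error)
    high = score_to_elo(score + z * std_error)
    return score_to_elo(score), (high - low) / 2.0


# SF 111026 vs. IH 47c+ GH, 3000 games: +891 -1181 =928
elo, margin = elo_with_margin(891, 1181, 928)
print(f"{elo:+.1f} +/- {margin:.1f} Elo, draw ratio {928 / 3000:.1%}")
```

Under the same assumptions, the earlier 400-game match (+164 -126 =110) gives a margin close to the ± 30 Elo mentioned for that test, which shows how slowly the error bar shrinks with the number of games.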

I do not know if this difference is realistic at this time control, given that IPPOLIT family engines perform very well at short time controls. I only try to add a grain of salt, but of course this test should not be taken seriously: it is only an attempt.

I downloaded this IH version from here. The link is given in the Chess2U forum.

More details of this test can be downloaded from Mediafire:

Test_3.rar (511.36 KB)

I renamed the SF and IH executables for simplicity in the Engines.lbe and Engines.lbe.txt files. This should have no impact on the match.

Good luck in the development of StockFish!

Regards from Spain.

Ajedrecista.