(1000 games) SF 111026 vs. SF 2.1.1 (both of them 32-bit).

Ajedrecista
Posts: 1981
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Post by Ajedrecista »

Hello to everybody:

Using LittleBlitzer 2.5 (not the latest version), I ran a match of 1000 games in tournament mode (2500 ms per 40 moves, repeating). I suppose this is a strange time control, but the results are interesting anyway. I used 4000 openings (EPD) by Dann Corbit.

I know that the SF team does not like intermediate releases (like this 'old' 111026 compile), but the increase in speed is so noticeable (as everybody knows) that I encourage them to investigate the reason(s) for this speed gain (which should be worth a few Elo). Other changes between version 2.1.1 and the current sources must be worth more Elo, so I think this Elo gain is not due to speed alone.

SF 111026    --->  553/1000  (+411 -305 =284)
SF 2.1.1 JA  --->  447/1000  (+305 -411 =284)

A nice +37 Elo improvement... although I am sure that my engine testing skills are very bad (this was my first test), so the result is surely biased. The match results (and more) can be downloaded:

http://www.mediafire.com/?xb2mzp41u9z56h6
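
For reference, the +37 figure follows from the score alone. Below is a minimal sketch in Python, assuming the usual logistic Elo model and independent games (both are assumptions, not something LittleBlitzer reports). The function names are made up for illustration.

```python
import math

def elo_from_score(score):
    """Logistic Elo difference implied by a score fraction strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))

def match_elo(wins, losses, draws, z=1.96):
    """Return (elo, elo_low, elo_high) for a match between two engines.

    Assumption: games are independent, so the standard error of the mean
    score comes from the per-game variance of the results 1, 0.5 and 0.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    second_moment = (wins + 0.25 * draws) / n
    std_err = math.sqrt((second_moment - score ** 2) / n)
    return (elo_from_score(score),
            elo_from_score(score - z * std_err),
            elo_from_score(score + z * std_err))

# Result of the match above: +411 -305 =284 for SF 111026.
elo, low, high = match_elo(wins=411, losses=305, draws=284)
print(f"{elo:+.1f} Elo, 95% interval {low:+.1f} to {high:+.1f}")
```

With the numbers above this gives roughly +37 Elo, with a 95% interval of very roughly 18 Elo on either side, so a true gain near +20 would still be compatible with this match.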

Long live Stockfish!

Regards from Spain.

Ajedrecista.
Frank Quisinsky
Posts: 6811
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: (1000 games) SF 111026 vs. SF 2.1.1 (both of them 32-bit

Post by Frank Quisinsky »

Hi,

As far as I know, many of our programmers use such time controls and test the same way you do. It's interesting information, yes!

But I believe we can't say it is 37 Elo stronger than the previous version.

Look at the PHQ setting in SWCR.
PHQ = version 2.1.1 with many changes in settings that three people tried out for some weeks.

This PHQ setting is a bit weaker than version 2.1.1 at very fast time controls, but 19 Elo stronger so far at longer time controls. It produced 8% short won games (very aggressive), while version 2.1.1 produced 5.5%. PHQ's remis quote is also smaller, among other things. In my opinion, PHQ plays the most interesting tactical chess so far.

But all in all ...
After all that I have read about the test versions of Stockfish, it seems clearly improved. Nice to see that the SF project is going on. I am very interested in Stockfish (one of my favorites) and will test the new version as soon as possible, once an official version is available.

Thanks for your test!

Best
Frank
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: (1000 games) SF 111026 vs. SF 2.1.1 (both of them 32-bit

Post by mcostalba »

Ajedrecista wrote: but the increase in speed is so noticeable (as everybody knows)
Actually I am not sure that the speed has increased. We didn't do anything special regarding raw speed, so at first glance I'd think SF 111026 is a faster binary than the official SF due to a more optimized compilation.

"As everybody knows" ;-) Jim's compiles do not take advantage of POPCNT and mss3 support. This is Jim's choice, mainly because he doesn't have access to a machine that supports POPCNT; that's the reason there isn't another specially built SF_x86-64sse_JA.exe apart from the base (and universally compatible) SF_x86-64_JA.exe.

If Jim is willing to test this hypothesis, he could simply grab the current sources, do a trial compile and see if it is faster than 2.1.1. If it is not, then the difference in your test is due to different compilation. If instead it really is faster, then it is probably about time to do a new release...
Frank Quisinsky
Posts: 6811
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: (1000 games) SF 111026 vs. SF 2.1.1 (both of them 32-bit

Post by Frank Quisinsky »

Hi Marco,

if you need a test of a pre-release version of Stockfish under SWCR conditions, I can do that after my IvanHoe event. My IvanHoe event needs around 8 more days from today.

Thanks again for the work you do on Stockfish, and my regards to the other Stockfish team members.

Best
Frank
Ajedrecista
Posts: 1981
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: (1000 games) SF 111026 vs. SF 2.1.1 (both of them 32-bit

Post by Ajedrecista »

Hello:
mcostalba wrote:
Ajedrecista wrote: but the increase in speed is so noticeable (as everybody knows)
Actually I am not sure that the speed has increased. We didn't do anything special regarding raw speed, so at first glance I'd think SF 111026 is a faster binary than the official SF due to a more optimized compilation.

"As everybody knows" ;-) Jim's compiles do not take advantage of POPCNT and mss3 support. This is Jim's choice, mainly because he doesn't have access to a machine that supports POPCNT; that's the reason there isn't another specially built SF_x86-64sse_JA.exe apart from the base (and universally compatible) SF_x86-64_JA.exe.

If Jim is willing to test this hypothesis, he could simply grab the current sources, do a trial compile and see if it is faster than 2.1.1. If it is not, then the difference in your test is due to different compilation. If instead it really is faster, then it is probably about time to do a new release...
@Frank: I agree with you: +37 seems a little exaggerated... maybe +20 if we are lucky. Bear in mind that my match could be biased in some way (I do not know). If you look at Evolution_of_the_match.txt, you will see that 2.1.1 made an important comeback, and maybe the final result would be closer with more games played. The way I calculate the uncertainties seems right: looking here I get ~ +3.1 ± 4.52 (very close).

AFAIK SF 2.1.1 PHQ is a modified 2.1.1; OTOH the 111026 version (and others) includes improvements from newer sources, so it is not another 2.1.1 modification: it includes many changes from the source uploaded to GitHub.

I suppose you are referring to the draw ratio when you say remis quote... I like more draws and fewer losses, so in view of the results I prefer IH 46fB to IH 46f (just a matter of taste). I guess that most people prefer just the opposite (fewer drawn games).

And finally I agree again: SF has improved a little (by how many Elo?) and I like this engine, not for its evals (sometimes yes) but for the main lines it gives.

@Marco: I wrote 'everybody' because many people at Chess2U, ImmortalChess... commented on that. Those people agree that an optimized compilation is the reason for this speed gain.

I tested 32-bit (not 64-bit) and AFAIK my computer (Intel Pentium D930 of 2006) also does not support POPCNT. Both engines used 2 cores, as Engines.lbe.txt says (this is why I included it). I did not give each engine one core and run two matches in parallel, because then the speed of 111026 was more than double that of 2.1.1, as if 2.1.1 were using one thread and 111026 two... this was very unfair and biased.

Good luck with the project!

Regards from Spain.

Ajedrecista.
Last edited by Ajedrecista on Sat Dec 10, 2011 2:31 pm, edited 1 time in total.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: (1000 games) SF 111026 vs. SF 2.1.1 (both of them 32-bit

Post by mcostalba »

Ajedrecista wrote: I tested 32-bit (not 64-bit) and AFAIK my computer (Intel Pentium D930 of 2006) also does not support POPCNT. Both engines used 2 cores, as Engines.lbe.txt says (this is why I included it). I did not give each engine one core and run two matches in parallel, because then the speed of 111026 was more than double that of 2.1.1, as if 2.1.1 were using one thread and 111026 two... this was very unfair and biased.
I'd suggest retesting in single-thread mode...
Ajedrecista
Posts: 1981
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: (1000 games) SF 111026 vs. SF 2.1.1 (both of them 32-bit

Post by Ajedrecista »

Hello again:
mcostalba wrote:
Ajedrecista wrote: I tested 32-bit (not 64-bit) and AFAIK my computer (Intel Pentium D930 of 2006) also does not support POPCNT. Both engines used 2 cores, as Engines.lbe.txt says (this is why I included it). I did not give each engine one core and run two matches in parallel, because then the speed of 111026 was more than double that of 2.1.1, as if 2.1.1 were using one thread and 111026 two... this was very unfair and biased.
I'd suggest retesting in single-thread mode...
This is what I wanted, but I was unable to because of the huge difference in speeds. If I manage to test in single-thread mode, I will let you know.

Regards from Spain.

Ajedrecista.
Frank Quisinsky
Posts: 6811
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: (1000 games) SF 111026 vs. SF 2.1.1 (both of them 32-bit

Post by Frank Quisinsky »

Hi,

I hope that with newer SF versions we can produce the style that the PHQ versions have with the setting changes we made (PHQ stands for three persons, for three ideas).

It's so nice to see PHQ in tactical positions.

Perhaps, I don't know, we can produce such a style with the same settings and the newer sources. It must be tested in detail :-)

Stefan Pohl tested the same version you tested against more opponents, with a comparable result. I believe Stefan posted his results in this forum. Let me search ...

http://talkchess.com/forum/viewtopic.ph ... 00&t=41308

Stefan's results are always very interesting to me. I don't know how Stefan works, but I can reproduce each of his results with my own testing. Around the same results you have ...

Best
Frank
Hugo
Posts: 782
Joined: Tue Dec 01, 2009 11:10 am

Re: (1000 games) SF 111026 vs. SF 2.1.1 (both of them 32-bit

Post by Hugo »

Hello

This test format does not give a real estimate of the playing strength.
First mistake: a brother fight (two versions of the same engine) makes less sense. A minimum of 10 different opponents makes more sense.
Second mistake: only 1000 games at such a fast time control is 10 times too few. I would recommend a minimum of 10,000 games.
I tested this Stockfish 111026 vs. 17 different opponents at 5+3 with ponder ON, each engine on one core and 64-bit. The result was 2922 after 679 games.
The original Stockfish 2.1.1 was 2930.
I was surprised by that result, because I was VERY impressed by this 111026 engine when using it on a quad at playchess for games with XXL time controls. I lost only 1 game and could win some impressive games vs. Houdini. That's why I expected a clear plus over SF 2.1.1.
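
As a rough illustration of the sample-size point, here is a sketch of how the purely statistical error margin shrinks with the number of games. It assumes independent games, a near-even match, the logistic Elo model and a draw ratio of 28.4% (taken from the 1000-game match at the top of the thread). `elo_margin` is a hypothetical helper, and it says nothing about the effect of the number of opponents.

```python
import math

def elo_margin(n_games, draw_ratio=0.284, z=1.96):
    """Approximate 95% Elo margin for a near-even match of n_games games.

    Assumptions: independent games, expected score close to 50%,
    logistic Elo model. The default draw ratio is only an example value.
    """
    # Variance of one game result (1, 0.5 or 0) around a 50% score.
    variance = (1.0 - draw_ratio) / 4.0
    std_err = math.sqrt(variance / n_games)
    # Slope of 400*log10(p/(1-p)) at p = 0.5, in Elo per unit of score.
    elo_per_score = 1600.0 / math.log(10.0)
    return z * std_err * elo_per_score

for n in (1000, 5000, 10000):
    print(n, "games: about +/-", round(elo_margin(n), 1), "Elo")
```

Under these assumptions the margin is about ±18 Elo for 1000 games, ±8 for 5000 and ±6 for 10,000; it falls only with the square root of the number of games.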


Regards, Clemens Keck
Frank Quisinsky
Posts: 6811
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: (1000 games) SF 111026 vs. SF 2.1.1 (both of them 32-bit

Post by Frank Quisinsky »

Hi Clemens,

that's right, but each comparison between versions of the same engine gives you a good first look very quickly. No more, no less!

I have written about it in the CSS Forum for around 1 year.
I need around 24-26 opponents for my 40-game SWCR matches. With more than 26 opponents, the statistical curve does not give me a big advantage. But the difference with fewer than 26 will be proportionally big. I established that with database simulations.

More opponents are, in my opinion, much more important than playing more games against fewer opponents. Unfortunately, as far as I know, this calculation isn't included in ELOstat and Bayesian.

Play a match of 5,000 games between two engines and you will get an error bar of ±8. The correct figure is ... with only one opponent, the error bar should be ±64 after 5,000 games. And it gets better with more and more opponents, not with more games.

Example: 10,000 games between only two engines.
The error bar is now ±62, versus ±64 after 5,000 games ... according to my simulation.

Again, for a fast comparison between different versions of the same engine, I think this test is a good indicator.

Best
Frank

It's a pity that so many people just accept what a program outputs without engaging their own brain in engine-vs-engine testing :-)