Stockfish 2.1 running for the IPON

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Stockfish 2.1 running for the IPON

Post by Laskos »

IWB wrote:Hi

Yes, I have that rate of time losses, and 1% is actually nothing I worry about.

In general ALL engines have time losses (or at least the majority of them). Some engines more, some less. I play so many games that I am getting used to it. :-)

There are different kinds of time losses. One kind, like here for Stockfish, is something I do not care about at all - because it is 0.6% AND the engine continues to play, which is very important in my automatic setup. So I consider this a feature of the engine that is part of its playing strength. The other kind is a loss on time combined with a crash that prevents the GUI from continuing, so I have to stop and restart parts of the tourney again and again.
I accept time losses as something normal (I am not a developer, why should I care? It happens to humans at 5+3 too), but I start to worry beyond a certain level, and that level is different for a 2600 and a 3000 Elo engine.
In short, I don't see why a tester should worry about time losses as long as the game series continues.

Things are different if I am betatesting or if I would develop an engine of course ...

Bye
Ingo
This must be a GUI issue. I used LittleBlitzer 2.72 to play ridiculously fast games 0.3s + 0.03s with SF 2.1 and SF 2.01. Average game length is ~2.9s, average time per move ~0.023s. Not a single loss on time or illegal move in 2100 games!


Games Completed = 2100 of 2100 (Avg game length = 2.889 sec)
Settings = RR/16MB/300ms+30ms/M 10000cp for 1000 moves, D 200 moves/PGN:D:\LittleBlitzer\swcr.pgn(3395)
Time = 7389 sec elapsed, 0 sec remaining
1. Stockfish 2.1 JA 64bit 1247.5/2100 883-488-729 (L: m=488 t=0 i=0 a=0) (tpm=23.4 d=8.8 nps=915073)
2. Stockfish 2.0.1 JA 64bit 852.5/2100 488-883-729 (L: m=883 t=0 i=0 a=0) (tpm=22.9 d=8.2 nps=847557)


Maybe it would be a good idea to switch to another GUI; if ~1% of the games are lost on time there could be more to it, for example illegal-move losses, or legally finished games skewed by flaws in the GUI's time management.

Kai
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: Stockfish 2.1 running for the IPON

Post by IWB »

Hi
Laskos wrote: ...
Maybe it would be a good idea to switch to another GUI; if ~1% of the games are lost on time there could be more to it, for example illegal-move losses, or legally finished games skewed by flaws in the GUI's time management.
I know that the Classic is a bit "lazy" with timing around 0.00. Hiarcs added an extra feature for this.
Besides that, right now I am playing on the CB GUI, and I see that Stockfish is playing very close to 0.00 while most other engines keep some time left.
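A minimal sketch of the kind of safety margin ("move overhead") an engine's time manager can keep so it does not run its clock all the way down to 0.00 against a slow GUI; the function name, default values and moves-to-go estimate are illustrative assumptions, not Stockfish's actual code.

Code: Select all

def allocate_move_time(remaining_ms: int, increment_ms: int,
                       moves_to_go: int = 30, overhead_ms: int = 30) -> int:
    """Time budget for the next move, in milliseconds.

    overhead_ms is a safety margin for GUI/communication delays: the larger
    it is, the more time is left on the clock when the move is sent, and the
    less likely the engine is to be flagged by a lazy GUI."""
    budget = remaining_ms / moves_to_go + increment_ms - overhead_ms
    # Never plan to use more than is actually on the clock (minus the margin),
    # and always spend at least a token amount of time.
    budget = min(budget, remaining_ms - overhead_ms)
    return max(1, int(budget))

# Example: 10 s left, 100 ms increment -> about 403 ms for this move.
print(allocate_move_time(10_000, 100))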

But keep in mind, I did not complain (0.6% is nothing to worry about); I was asked. And I can't switch, as there is nothing else that offers what I need.

The GUI has to...
1. ... load settings for an engine (no book / 1 thread ... )
2. ... load that engine from a central network drive.
3. ... start multiple instances on one computer
4. ... play in one single tourney
5. ... play over a network (in that tourney)
6. ... be usable (I hate config files!)
7. ... produce an estimated Elo output during the tourney (for all running instances on every computer)

I played around a bit with LittleBlitzer lately, and unfortunately it can't handle all the needs I have, and the usage is ...

But again, Stockfish gained a bit from this rather than losing :-)

Bye
Ingo
lucasart
Posts: 3243
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Stockfish 2.1 running for the IPON

Post by lucasart »

Laskos wrote: This must be a GUI issue. I used LittleBlitzer 2.72 to play ridiculously fast games 0.3s + 0.03s with SF 2.1 and SF 2.01. Average game length is ~2.9s, average time per move ~0.023s. Not a single loss on time or illegal move in 2100 games!
Thank you so much. I've tried LittleBlitzer 2.72 and it's amazing!
nthom
Posts: 112
Joined: Thu Mar 09, 2006 6:15 am
Location: Australia

Re: Stockfish 2.1 running for the IPON

Post by nthom »

lucasart wrote:
Laskos wrote: This must be a GUI issue. I used LittleBlitzer 2.72 to play ridiculously fast games 0.3s + 0.03s with SF 2.1 and SF 2.01. Average game length is ~2.9s, average time per move ~0.023s. Not a single loss on time or illegal move in 2100 games!
Thank you so much. I've tried LittleBlitzer 2.72 and it's amazing!
Nice to see it getting some good use. I initially wrote it to make my engine LittleThought better, which it definitely did, but then my brain ran out of improvements to test :)
zamar
Posts: 613
Joined: Sun Jan 18, 2009 7:03 am

Re: Stockfish 2.1 running for the IPON

Post by zamar »

Thanks for your tests Ingo!
IWB wrote: Initial Elo increase of 7 Elo. I consider this mainly as an engine code cleaning/bug fix.
I don't quite agree. If you compare the results of versions 1.7, 1.8, 1.9, 2.0 and 2.1, you can see that the Elo is rising by around 10 points per release. So SF is climbing slowly but surely. But clearly we are getting closer to a local optimum and it's getting harder and harder to improve the engine.
Joona Kiiski
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Stockfish 2.1 running for the IPON

Post by Laskos »

zamar wrote:Thanks for your tests Ingo!
IWB wrote: Initial Elo increase of 7 Elo. I consider this mainly as an engine code cleaning/bug fix.
I don't quite agree. If you compare the results of versions 1.7, 1.8, 1.9, 2.0 and 2.1, you can see that the Elo is rising by around 10 points per release. So SF is climbing slowly but surely. But clearly we are getting closer to a local optimum and it's getting harder and harder to improve the engine.
We need to wait for other results. For fun I played 3000-game gauntlets with SF 2.1 and SF 2.01 against 5 strong opponents. The games are extremely fast, 0.3s + 0.03s, and SF generally performs miserably at this short TC; I would not recommend this TC to anybody as an absolute strength probe, but fortunately I only need the difference between SF 2.1 and SF 2.01. Here are the LittleBlitzer results, with not a single loss on time or illegal move:

SF 2.1 gauntlet

Games Completed = 3000 of 10000 (Avg game length = 3.489 sec)
Settings = Gauntlet/16MB/300ms+30ms/M 10000cp for 1000 moves, D 200 moves/PGN:K:\LittleBlitzer-2.51\swcr.pgn(3395)
Time = 12820 sec elapsed, 29914 sec remaining
1. Stockfish 2.1 JA 64bit 1028.5/3000 702-1645-653 (L: m=1645 t=0 i=0 a=0) (D: r=326 i=134 f=167 s=17 a=9) (tpm=23.8 d=8.8 nps=936543)
2. Komodo64 1.3 JA 297.0/600 214-220-166 (L: m=220 t=0 i=0 a=0) (D: r=67 i=26 f=63 s=8 a=2) (tpm=31.9 d=7.5 nps=860190)
3. Houdini 1.5a x64 493.5/600 450-63-87 (L: m=63 t=0 i=0 a=0) (D: r=31 i=18 f=33 s=3 a=2) (tpm=29.8 d=9.5 nps=1405853)
4. Deep Rybka 4.1 x64" 410.0/600 338-118-144 (L: m=118 t=0 i=0 a=0) (D: r=67 i=39 f=31 s=5 a=2) (tpm=29.6 d=6.3 nps=44345)
5. Critter 1.01 64-bit 288.5/600 214-237-149 (L: m=237 t=0 i=0 a=0) (D: r=100 i=29 f=18 s=0 a=2) (tpm=23.4 d=9.3 nps=1030109)
6. Ivanhoe B47cBx64-1 482.5/600 429-64-107 (L: m=64 t=0 i=0 a=0) (D: r=61 i=22 f=22 s=1 a=1) (tpm=30.9 d=10.1 nps=1015145)


SF 2.01 gauntlet

Games Completed = 3000 of 10000 (Avg game length = 3.295 sec)
Settings = Gauntlet/16MB/300ms+30ms/M 10000cp for 1000 moves, D 200 moves/PGN:K:\LittleBlitzer-2.51\swcr.pgn(3395)
Time = 12140 sec elapsed, 28328 sec remaining
1. Stockfish 2.0.1 JA 64bit 826.0/3000 544-1892-564 (L: m=1892 t=0 i=0 a=0) (D: r=303 i=120 f=118 s=15 a=8) (tpm=23.1 d=8.2 nps=843158)
2. Komodo64 1.3 JA 366.0/600 294-162-144 (L: m=162 t=0 i=0 a=0) (D: r=75 i=14 f=45 s=8 a=2) (tpm=31.9 d=7.5 nps=838500)
3. Houdini 1.5a x64 536.5/600 508-35-57 (L: m=35 t=0 i=0 a=0) (D: r=18 i=18 f=17 s=2 a=2) (tpm=29.6 d=9.2 nps=1372005)
4. Deep Rybka 4.1 x64" 437.5/600 374-99-127 (L: m=99 t=0 i=0 a=0) (D: r=60 i=42 f=21 s=3 a=1) (tpm=29.5 d=6.3 nps=43432)
5. Critter 1.01 64-bit 325.5/600 255-204-141 (L: m=204 t=0 i=0 a=0) (D: r=93 i=33 f=12 s=2 a=1) (tpm=23.4 d=9.4 nps=1022524)
6. Ivanhoe B47cBx64-1 508.5/600 461-44-95 (L: m=44 t=0 i=0 a=0) (D: r=57 i=13 f=23 s=0 a=2) (tpm=30.9 d=10.0 nps=997745)


Elo SF 2.1 3140
Elo SF 2.01 3111


+29 +/- 11 Elo points (95% error margins) for SF 2.1 against SF 2.01 in separate gauntlets.
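For readers who want to redo the arithmetic, here is a minimal sketch of how an Elo difference and a rough 95% margin can be derived from a win/draw/loss count with the plain logistic model and a normal approximation. It is applied to the 2100-game head-to-head match quoted earlier (883-488-729); the +29 figure above was obtained differently (from two separate gauntlets), and a BayesElo computation would not give exactly the same numbers.

Code: Select all

import math

def elo_from_wdl(wins, losses, draws):
    """Elo difference and ~95% error margin from a W/L/D count
    (logistic model, normal approximation)."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n                 # score fraction
    elo = -400.0 * math.log10(1.0 / score - 1.0)     # logistic Elo difference
    # Per-game score variance, then standard error of the mean score.
    var = (wins * (1.0 - score) ** 2
           + losses * (0.0 - score) ** 2
           + draws * (0.5 - score) ** 2) / n
    se = math.sqrt(var / n)
    # Convert the +/- 1.96 sigma score interval into Elo.
    lo = -400.0 * math.log10(1.0 / (score - 1.96 * se) - 1.0)
    hi = -400.0 * math.log10(1.0 / (score + 1.96 * se) - 1.0)
    return elo, (hi - lo) / 2.0

# SF 2.1 vs SF 2.01 head-to-head from the earlier post: 883 wins, 488 losses, 729 draws.
print(elo_from_wdl(883, 488, 729))   # roughly +66 Elo, +/- about 12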

Kai
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Stockfish 2.1 running for the IPON

Post by mcostalba »

Laskos wrote: Elo SF 2.1 3140
Elo SF 2.01 3111


+29 +/- 11 Elo points (95% error margins) for SF 2.1 against SF 2.01 in separate gauntlets.
I think it is the final BayesElo pass that takes the score down. For instance, here we have a nice +24 after 1700 games:

http://cegt.siteboard.eu/f6t59-testing- ... h-2-1.html

But when calculated with BayesElo we get just +13 Elo, from 3131 to 3144.

http://www.husvankempen.de/nunn/40_4_Ra ... liste.html
Frank Quisinsky
Posts: 7190
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: Bayesian / ELOStat ...

Post by Frank Quisinsky »

Current SWCR with BayesElo:

Code: Select all

   Rank Name                            Elo    +    - games score oppo. draws 
  4   11 Stockfish 1.9.1 JA x64         2906   18   17  1280   73%  2729   33% 
  -   12 Stockfish 1.8.0 JA x64         2906   18   18  1200   75%  2715   33% 
  -   13 Stockfish 2.0.1 JA x64         2905   19   19  1019   70%  2763   38% NEW -  1
  -   17 Stockfish 1.7.1 JA x64         2896   19   19  1120   76%  2706   34% 
  -   18 Stockfish 1.9.1 JA w32         2893   21   20   960   78%  2689   31% 
  -   20 Stockfish 2.0.1 JA w32         2873   20   20  1000   76%  2680   34% 
  -   22 Stockfish 1.7.1 JA w32         2872   18   18  1200   75%  2687   31% 
  -   24 Stockfish 1.8.0 JA w32         2861   20   20   920   74%  2695   37% 
  -   29 Stockfish 1.6.3 JA w32         2824   18   18  1080   71%  2678   36% 
  -   36 Stockfish 1.6.0 JA w32         2799   19   19   960   68%  2685   39% 
  -   58 Stockfish 1.5.1 JA w32         2729   20   19   840   59%  2669   43% 
With ELOstat ...

Code: Select all

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws
 12 Stockfish 1.9.1 JA w32         : 2909   20  20   960    77.7 %   2692   31.0 %
 13 Stockfish 2.0.1 JA x64         : 2907   17  17  1019    69.5 %   2763   38.0 %
 15 Stockfish 1.8.0 JA x64         : 2904   17  17  1200    74.8 %   2715   32.5 %
 16 Stockfish 1.7.1 JA x64         : 2902   18  17  1120    75.7 %   2705   33.7 %
 18 Stockfish 1.9.1 JA x64         : 2901   16  16  1280    72.9 %   2728   33.4 %
 20 Stockfish 1.7.1 JA w32         : 2884   18  17  1200    75.3 %   2691   30.7 %
 21 Stockfish 2.0.1 JA w32         : 2884   19  18  1000    76.3 %   2680   33.9 %
 22 Stockfish 1.8.0 JA w32         : 2877   19  18   920    73.7 %   2698   37.0 %
 27 Stockfish 1.6.3 JA w32         : 2839   17  17  1080    71.3 %   2681   36.1 %
 33 Stockfish 1.6.0 JA w32         : 2815   18  17   960    67.5 %   2688   39.2 %
 55 Stockfish 1.5.1 JA w32         : 2738   18  18   840    59.3 %   2672   42.5 %
40-minute games with ponder = on.
Around double the time compared with IPON.

1. The 32-bit versions show bigger jumps. It seems clear that Jim tries out different compiler settings.

2. Stockfish is stronger at shorter time controls. At longer time controls your engine loses around 10-15 Elo compared to the others. This is easy to see in my older blitz list, where I got the same results Ingo did.

But more important is the program used for the calculations. With a higher draw rate, BayesElo will give you a lower rating.
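Here is a minimal sketch of why the two programs can disagree, assuming the standard BayesElo win/draw/loss model with a "drawelo" parameter (the value of roughly 97 and the omission of the white-advantage term are simplifying assumptions). For the same rating difference the two models predict different score/draw splits, so the ratings fitted to the same games differ, and the effect grows with the draw rate.

Code: Select all

def logistic(x):
    """Plain Elo expectation: score probability against an opponent x Elo weaker."""
    return 1.0 / (1.0 + 10.0 ** (-x / 400.0))

def bayeselo_expected(delta, drawelo=97.0):
    """Win/draw/loss probabilities in a BayesElo-style model
    (white advantage ignored for simplicity)."""
    p_win = logistic(delta - drawelo)
    p_loss = logistic(-delta - drawelo)
    p_draw = 1.0 - p_win - p_loss
    return p_win, p_draw, p_loss

delta = 100.0
w, d, l = bayeselo_expected(delta)
print("plain logistic expected score:", round(logistic(delta), 3))     # ~0.640
print("BayesElo-model expected score:", round(w + 0.5 * d, 3),         # ~0.630
      "with draw rate", round(d, 3))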

And from version to version Stockfish has lost tactical strength. This can be seen if you analyze the average number of moves per game.

How many games did Stockfish win between move numbers:
30-40
40-50
50-60
60-70

You can do the same with the drawn games.

More points in endgames, fewer points in the late middlegame with the newer versions.

These analyses are possible because I played without a resign factor; more statistics are possible (see the counting sketch below).
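A minimal sketch of the kind of counting described above, assuming the python-chess library and a PGN database like the SWCR download; the file name and the substring match on the engine name are placeholders.

Code: Select all

# Bucket Stockfish's wins and draws by the move number at which the game ended.
from collections import Counter
import chess.pgn

def result_buckets(pgn_path, engine="Stockfish", width=10):
    wins, draws = Counter(), Counter()
    with open(pgn_path, encoding="latin-1") as f:
        while True:
            game = chess.pgn.read_game(f)
            if game is None:
                break
            white = engine in game.headers.get("White", "")
            black = engine in game.headers.get("Black", "")
            if not (white or black):
                continue
            result = game.headers.get("Result", "*")
            last_move = (sum(1 for _ in game.mainline_moves()) + 1) // 2
            bucket = (last_move // width) * width      # 30 stands for moves 30-39
            if (white and result == "1-0") or (black and result == "0-1"):
                wins[bucket] += 1
            elif result == "1/2-1/2":
                draws[bucket] += 1
    return wins, draws

wins, draws = result_buckets("swcr.pgn")
for start in sorted(wins | draws):
    print(f"moves {start}-{start + 9}: {wins[start]} wins, {draws[start]} draws")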

So far:
The most interesting Stockfish version is, quite clearly for me, version 1.7.1.
It plays the most powerful computer chess. Newer versions are a little bit stronger but have lost a lot of the great Stockfish / Glaurung playing style.

Version 1.9.1 is interesting for 32-bit; here Jim found a good setting for his compile (I think).

Furthermore, it is important to test against many different engines. I think the Stockfish team tested against only a handful of engines. Testing against many different opponents is more important than having many games.

So, perhaps the internal testing should be changed!

Best
Frank
Frank Quisinsky
Posts: 7190
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: Bayesian / ELOStat ...

Post by Frank Quisinsky »

Hi Marco,

It makes no sense to pick out good results from list A or B.

Ingo switched the processor from Intel to AMD 4-5 months ago, from a Q6600 to a 6-core AMD system. That could account for 4-5 points.

CEGT doesn't have many games for all the Stockfish versions I have. Furthermore, they play without ponder! Ponder mode can be a reason too.

The time control in SWCR is longer than in CEGT. CCRL has the longest time control so far.

I think finding out the reasons for the differences is very complicated. It makes more sense to investigate why Stockfish has lost so much of its great style.

Download the SWCR database and make statistics with it. A good question is: against which opponents does Stockfish have the most problems (scoring fewer points than its Elo would suggest)? Find 10 opponents Stockfish has problems with and, when you release a new version, test against these 10 engines with 100 games per match instead of testing against IvanHoe, Rybka and Houdini (a small sketch of such a per-opponent statistic follows below).

Use 40 moves in 3 minutes and 40 in 10, and you can see very easily what is happening.
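A minimal sketch of that per-opponent statistic: the actual score against each opponent minus the score the Elo difference would predict. The rating numbers and the results dictionary are made-up placeholders; with a real PGN database they would be filled from the games.

Code: Select all

import math

def expected_score(elo_diff):
    """Expected score from the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# Placeholder data: opponent -> (opponent Elo, wins, draws, losses) for Stockfish.
stockfish_elo = 2905
results = {
    "Engine A": (2790, 60, 25, 15),
    "Engine B": (2850, 40, 30, 30),
    "Engine C": (2700, 70, 20, 10),
}

for name, (opp_elo, w, d, l) in sorted(results.items()):
    games = w + d + l
    actual = (w + 0.5 * d) / games
    expected = expected_score(stockfish_elo - opp_elo)
    # Negative differences flag the "problem opponents" described above.
    print(f"{name}: actual {actual:.3f}, expected {expected:.3f}, "
          f"diff {actual - expected:+.3f}")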

Best
Frank

PS: Did you fix the KQ vs KR problem? SF 2.0.1 can't deliver mate with KQ vs KR.

But all in all, thanks for your new versions. I think learning code is more important for your team than trying to squeeze more Elo out of a new version. And I believe that LMR in combination with the time control is not a possible reason for the extra 10-15 Elo from version 1.7.1 to 2.0.1.

With more or less LMR you get more points against Houdini or IvanHoe, or fewer points against Houdini or IvanHoe, but the same results against engines in the middle of the SWCR rating list. If I compare the late middlegame statistics to other strong engines, Stockfish gives away more points against weaker opponents than IvanHoe or Rybka does. Clearly more points. The endgame is great, even without endgame tablebases.

It makes no sense to test Stockfish against Rybka, Houdini and IvanHoe, because you will not see a possible plus for a new version there.

My opinion!
Frank Quisinsky
Posts: 7190
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: Bayesian / ELOStat ...

Post by Frank Quisinsky »

Hi,

I forgot ...

After 760 games played with the x64 version on my 2 test Q9950 systems (I have 6; 4 are in use for SWCR), version 2.1.0 is 6 Elo stronger than versions 1.8.0, 1.9.1 and 2.0.1. Same conditions as in SWCR.

Best
Frank