first CEGT result of stockfish is not good

mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: first CEGT result of stockfish is not good

Post by mcostalba »

Werner wrote:
mcostalba wrote:In your tests, do you take into account the percentage of games won or lost on time out of the total number of wins and losses?

Do you have some kind of trigger or warning to catch this kind of (seemingly) artifact?

Thanks
Marco
Hi Marco,
a) I do not count games lost on time. Normally I delete these games, and if there are too many I stop the test and contact the author. I try to avoid it, of course.
b) The games sent to the CEGT admin for the list are always checked for:
- duplicate games (doublettes),
- losses on time,
- line results.
Sometimes the admin runs these checks on the whole database too. I think Leo posts such reports in his forum.

At the moment I am running the Stockfish matches inside the Chessbase GUI. No losses on time so far. So if there are problems with Arena 2.01, I will not switch. Perhaps I should run a test with the Shredder GUI?
Thanks for the answer. It will help me to test better. Testing is really very difficult for me, much more than coding. Perhaps it is because I am not experienced in this, but it seems to me a very tricky and complex subject: it is very easy to get wrong results. Very slippery.

BTW, do you do all this filtering manually? Don't current GUIs have a feature that filters games according to these rules automatically?
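
The checks Werner lists (losses on time, duplicate games) can be partly automated outside the GUI. Below is a minimal Python sketch, not a description of any CEGT tool. It assumes the GUI marks games lost on time with the standard PGN tag Termination "time forfeit" (not every GUI does; some record it only in a comment), and it treats two games as duplicates when their moves are identical once comments are stripped. The function names and the 5% warning threshold are hypothetical.

```python
import re

TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]$')
COMMENT_RE = re.compile(r"\{[^}]*\}")


def split_games(pgn_text):
    """Split PGN text into a list of (tags, movetext) pairs."""
    games, tags, moves = [], {}, []
    for raw in pgn_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TAG_RE.match(line)
        if match:
            if moves:  # a tag line after move text starts a new game
                games.append((tags, " ".join(moves)))
                tags, moves = {}, []
            tags[match.group(1)] = match.group(2)
        else:
            moves.append(line)
    if tags or moves:
        games.append((tags, " ".join(moves)))
    return games


def filter_games(games, max_time_loss_ratio=0.05):
    """Drop time forfeits and duplicates; return kept games and a report."""
    kept, seen = [], set()
    time_losses = duplicates = 0
    for tags, movetext in games:
        # Assumption: losses on time are marked with the standard PGN tag.
        if tags.get("Termination", "").lower() == "time forfeit":
            time_losses += 1
            continue
        # Engine comments (eval, depth, time) differ between runs, so
        # compare the bare moves only.
        key = " ".join(COMMENT_RE.sub(" ", movetext).split())
        if key in seen:
            duplicates += 1
            continue
        seen.add(key)
        kept.append((tags, movetext))
    total = len(games)
    report = {
        "total": total,
        "time_losses": time_losses,
        "duplicates": duplicates,
        # Hypothetical trigger: flag the run if too many games ended on time.
        "warn": total > 0 and time_losses / total > max_time_loss_ratio,
    }
    return kept, report
```

Calling filter_games(split_games(text)) on the contents of a PGN file returns the games worth keeping plus a small report, so a tester can stop a match early when the "warn" flag is set instead of finding the time losses at the end of the week.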
Werner
Posts: 2994
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle

Re: first CEGT result of stockfish is not good

Post by Werner »

mcostalba wrote:BTW, do you do all this filtering manually? Don't current GUIs have a feature that filters games according to these rules automatically?
Hi Marco,
I filter manually every week with CB light or Scid.

And here is the next result:

1   Fruit 2.3.5m x64 2CPU      +13/-12/=25 51.00%   25.5/50
2   Stockfish 1.3 JA x64 2CPU  +12/-13/=25 49.00%   24.5/50
Werner
zamar
Posts: 613
Joined: Sun Jan 18, 2009 7:03 am

Re: first CEGT result of stockfish is not good

Post by zamar »

Thank you for posting the results here, Werner!

Fruit has always been a tough opponent for Stockfish, so I don't believe there is anything seriously wrong with your tests.

Looking at the CEGT archives for Stockfish 1.2's performance against the Fruit family:

Stockfish 1.2 1CPU: (2832)
Opponent                     Elo  Games   +   =   -  Score%  Perf
Fruit 2.3.5m p15 w32 1CPU   2897     50   6  18  26    30.0  2750
Toga II 1.4.3 JDb19a 1CPU   2871     50  10  15  25    35.0  2764
Grapefruit 1.0 1CPU         2858     50  12  19  19    43.0  2810
Fruit 2.3.3f Beta           2838     50  17  25   8    59.0  2900
Fruit 05/11/03              2820     50  14  18  18    46.0  2793

Here the old Stockfish is clearly losing to the Fruit family.

Stockfish 1.2 2CPU: (2912)
Opponent                     Elo  Games   +   =   -  Score%  Perf
Cyclone 3.1 4CPU            2921     50  10  24  16    44.0  2880
Cyclone 2.3 2CPU            2919     50  14  24  12    52.0  2933
Toga II 1.4 Beta5c 2CPU     2892     50  16  22  12    54.0  2919
Fruit 2.4 Beta A x64 1CPU   2885     50  12  28  10    52.0  2899

Here matches are more balanced, but old Stockfish is still underperforming a bit.

Stockfish 1.2 4CPU: (2965)
Opponent                     Elo  Games   +   =   -  Score%  Perf
Fruit 2.3.5m x64 4CPU p15   2971     50  12  23  15    47.0  2950
Fruit 2.4 Beta A x64 4CPU   2958     50  15  27   8    57.0  3006

Here the old Stockfish does (a bit surprisingly) a little better, but the number of games is too low to draw any conclusion.

So, assuming there is some improvement from Stockfish 1.2 to 1.3, your results are well within the error bounds :)
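
To put a rough number on "error bounds" for a 50-game match, here is a small Python sketch. The thread does not say how CEGT computes its error margins, so this is only an illustration: it uses the usual logistic Elo formula and a normal approximation for the 95% interval, and the function names are made up for the example.

```python
import math


def score_fraction(wins, draws, losses):
    """Fraction of points scored (win = 1, draw = 0.5, loss = 0)."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


def elo_from_score(score):
    """Elo difference implied by a score fraction (logistic model)."""
    # Clamp so that 0% and 100% scores do not divide by zero.
    score = min(max(score, 1e-6), 1 - 1e-6)
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(wins, draws, losses, z=1.96):
    """Return (low, estimate, high) Elo difference for one match."""
    games = wins + draws + losses
    score = score_fraction(wins, draws, losses)
    # Per-game variance of the result around the mean score.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    margin = z * math.sqrt(variance / games)
    return (
        elo_from_score(score - margin),
        elo_from_score(score),
        elo_from_score(score + margin),
    )


# Stockfish 1.3 JA x64 2CPU vs Fruit 2.3.5m x64 2CPU: +12/-13/=25
low, estimate, high = elo_interval(wins=12, draws=25, losses=13)
print(f"{estimate:+.0f} Elo, 95% interval {low:+.0f} to {high:+.0f}")
```

For Werner's +12/-13/=25 result the estimate is only a few Elo below Fruit, while the interval spans very roughly 70 Elo in each direction, so a single 50-game match cannot separate two engines this close.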
Joona Kiiski
Werner
Posts: 2994
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle

Re: first CEGT result of stockfish is not good

Post by Werner »

Hi Marco,
the first 1CPU result is ok.
Hope there are no problems with the scaling.

1   Stockfish 1.3 x64 1CPU     +17/-15/=18 52.00%   26.0/50
2   Fruit 2.3.5m p15 w32 1CPU  +15/-17/=18 48.00%   24.0/50
Werner
El Gringo
Posts: 118
Joined: Tue Nov 27, 2007 4:01 pm

Re: first CEGT result of stockfish is not good

Post by El Gringo »

Werner wrote:
mcostalba wrote:In your tests, do you take into account the percentage of games won or lost on time out of the total number of wins and losses?

Do you have some kind of trigger or warning to catch this kind of (seemingly) artifact?

Thanks
Marco
Hi Marco,
a) I do not count games lost on time. Normally I delete these games, and if there are too many I stop the test and contact the author. I try to avoid it, of course.
b) The games sent to the CEGT admin for the list are always checked for:
- duplicate games (doublettes),
- losses on time,
- line results.
Sometimes the admin runs these checks on the whole database too. I think Leo posts such reports in his forum.

At the moment I am running the Stockfish matches inside the Chessbase GUI. No losses on time so far. So if there are problems with Arena 2.01, I will not switch. Perhaps I should run a test with the Shredder GUI?

Hi Marco, Werner,

I'm playing with Stockfish 1.3 JA under the Shredder GUI: no time losses in the past 43 games.

Best
Johan
CEGT team