first CEGT result of stockfish is not good

Uri Blass
Posts: 10900
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

first CEGT result of stockfish is not good

Post by Uri Blass »

Werner
Posts: 2994
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle

Re: first CEGT result of stockfish is not good

Post by Werner »

Hi,
I hope this is only statistical noise! To be sure, I will switch the GUI to Arena 2.01 for the next matches.
Here is a short update:
Stockfish 1.3 x64 1CPU - Fruit 2.3.5m w32 1CPU = 15-15 (2897)
Stockfish 1.3 x64 2CPU - Fruit 2.3.5m x64 2CPU = 23,5-21,5 (2922)
Stockfish 1.3 x64 4CPU - Deep Sjeng WC 2008 x64 4CPU = 11-7 (2874) :cry:
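To put a rough number on how much noise matches this short can carry, here is a minimal Python sketch. It assumes the usual logistic Elo model and treats every game as a win or a loss, which gives an upper bound on the error because draws only reduce the variance. The function names are made up for illustration, and the match scores are the ones listed above.

```python
import math


def elo_from_fraction(p: float) -> float:
    """Elo difference implied by a score fraction p under the logistic model."""
    p = min(max(p, 1e-6), 1.0 - 1e-6)  # keep the logarithm finite
    return -400.0 * math.log10(1.0 / p - 1.0)


def elo_interval(points: float, games: int, z: float = 1.96):
    """Return (low, estimate, high) Elo difference for a match score.

    Uses a binomial standard error, i.e. it pretends there are no draws.
    Draws make the true error smaller, so this interval is conservative.
    """
    p = points / games
    se = math.sqrt(p * (1.0 - p) / games)
    return (
        elo_from_fraction(p - z * se),
        elo_from_fraction(p),
        elo_from_fraction(p + z * se),
    )


matches = [
    ("1CPU vs Fruit", 15.0, 30),
    ("2CPU vs Fruit", 23.5, 45),
    ("4CPU vs Deep Sjeng", 11.0, 18),
]
for label, points, games in matches:
    low, est, high = elo_interval(points, games)
    print(f"{label}: {est:+.0f} Elo (roughly {low:+.0f} to {high:+.0f})")
```

Under this conservative bound all three intervals include zero, so none of these matches alone says much about which engine is stronger.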
Werner
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: first CEGT result of stockfish is not good

Post by mcostalba »

Werner wrote:Hi,
I hope this is only statistical noise! To be sure, I will switch the GUI to Arena 2.01 for the next matches.
Here is a short update:
Stockfish 1.3 x64 1CPU - Fruit 2.3.5m w32 1CPU = 15-15 (2897)
Stockfish 1.3 x64 2CPU - Fruit 2.3.5m x64 2CPU = 23,5-21,5 (2922)
Stockfish 1.3 x64 4CPU - Deep Sjeng WC 2008 x64 4CPU = 11-7 (2874) :cry:
Stockfish is too slow for Arena... it often loses on time. :)

Werner, as a testing expert, I would like to ask you a question about games lost on time.

In my internal tests I came up with a trick to avoid losing on time, or losing/winning by accident.

I call it "Last Seconds Noise Filtering". It works like this: when only a few seconds remain before the time limit and one of the two engines has a very high score (and the other a very low one), the game is adjudicated to the engine with the advantage. This avoids situations where, in the final rush, the winning engine blunders and loses or draws a practically won game even though it played better.
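As a rough illustration, the rule described above could be sketched like this in Python. The thresholds (5 seconds, 500 centipawns), the function name, and the convention that each engine reports its score from its own point of view are all assumptions made for the example, not the values used in the actual tests.

```python
from typing import Optional


def adjudicate_last_seconds(
    clock_white_s: float,
    clock_black_s: float,
    score_white_cp: int,
    score_black_cp: int,
    time_threshold_s: float = 5.0,  # assumed: "very few seconds"
    score_threshold_cp: int = 500,  # assumed: "very high" / "very low"
) -> Optional[str]:
    """Return "1-0" or "0-1" if the game should be adjudicated, else None.

    Each score is the engine's own evaluation in centipawns, from its own
    point of view, so a winning engine reports a large positive number and
    a losing engine a large negative one.
    """
    # The filter only applies in the final rush.
    if min(clock_white_s, clock_black_s) > time_threshold_s:
        return None
    # Both engines must agree: one very high, the other very low.
    if score_white_cp >= score_threshold_cp and score_black_cp <= -score_threshold_cp:
        return "1-0"
    if score_black_cp >= score_threshold_cp and score_white_cp <= -score_threshold_cp:
        return "0-1"
    return None


# White has 3 seconds left but both engines agree White is winning.
print(adjudicate_last_seconds(3.0, 40.0, 650, -700))
```

Requiring both evaluations to agree is one way to read "one very high and the other very low"; it keeps a single optimistic engine from winning by adjudication.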

And now the question.

In your tests, do you take into account the percentage of losses/wins on time in the total number of losses/wins?

Do you have some kind of trigger or warning to avoid this kind of (seemingly) artifact?

Thanks
Marco
Werner
Posts: 2994
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle

Re: first CEGT result of stockfish is not good

Post by Werner »

mcostalba wrote:In your tests, do you take into account the percentage of losses/wins on time in the total number of losses/wins?

Do you have some kind of trigger or warning to avoid this kind of (seemingly) artifact?

Thanks
Marco
Hi Marco,
a) I do not count games lost on time. Normally I delete these games, and if there are too many I stop the test and contact the author. I try to avoid them, of course.
b) The games sent to the CEGT admin for the list are always checked for
- doublettes (duplicate games), - losses on time, - line results; and sometimes the admin runs these checks on the whole database too. I think Leo posts such reports in his forum.

At the moment I am running the Stockfish matches inside the Chessbase GUI. No losses on time so far. So if there are problems with Arena 2.01 I will not switch. Perhaps I will run a test with the Shredder GUI?
Werner
Ryan Benitez
Posts: 719
Joined: Thu Mar 09, 2006 1:21 am
Location: Portland Oregon

Re: first CEGT result of stockfish is not good

Post by Ryan Benitez »

mcostalba wrote:
Stockfish is too slow for Arena... it often loses on time. :)

Werner, as a testing expert, I would like to ask you a question about games lost on time.

In my internal tests I came up with a trick to avoid losing on time, or losing/winning by accident.

I call it "Last Seconds Noise Filtering". It works like this: when only a few seconds remain before the time limit and one of the two engines has a very high score (and the other a very low one), the game is adjudicated to the engine with the advantage. This avoids situations where, in the final rush, the winning engine blunders and loses or draws a practically won game even though it played better.

And now the question.

In your tests, do you take into account the percentage of losses/wins on time in the total number of losses/wins?

Do you have some kind of trigger or warning to avoid this kind of (seemingly) artifact?

Thanks
Marco
In my experience, losses on time are rarely the fault of the engine. To compensate for Windows, try adding a small time buffer so that the engine believes it has slightly less time. Best would be to use Linux and Cutechess-cli, though.
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: first CEGT result of stockfish is not good

Post by Dr.Wael Deeb »

Werner wrote:
Hi Marco,
a) I do not count games lost on time. Normally I delete these games, and if there are too many I stop the test and contact the author. I try to avoid them, of course.
b) The games sent to the CEGT admin for the list are always checked for
- doublettes (duplicate games), - losses on time, - line results; and sometimes the admin runs these checks on the whole database too. I think Leo posts such reports in his forum.

At the moment I am running the Stockfish matches inside the Chessbase GUI. No losses on time so far. So if there are problems with Arena 2.01 I will not switch. Perhaps I will run a test with the Shredder GUI?
This is what I have been saying for years and years now... Games lost on time must be deleted and not included in the rating list... A chess engine must not lose on time; this is typical for humans...
Dr.D
No one can hit as hard as life. But it ain’t about how hard you can hit. It’s about how hard you can get hit and keep moving forward. How much you can take and keep moving forward….
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: first CEGT result of stockfish is not good

Post by Dr.Wael Deeb »

Ryan Benitez wrote:
In my experience, losses on time are rarely the fault of the engine. To compensate for Windows, try adding a small time buffer so that the engine believes it has slightly less time. Best would be to use Linux and Cutechess-cli, though.
Exactly my point 8-)
Dr.D
Uri Blass
Posts: 10900
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: first CEGT result of stockfish is not good

Post by Uri Blass »

Dr.Wael Deeb wrote:
This is what I have been saying for years and years now... Games lost on time must be deleted and not included in the rating list... A chess engine must not lose on time; this is typical for humans...
Dr.D
You can take engines that lose on time (more often than 1 game out of 1000) out of the rating list, but
deleting losses on time is not fair, because then authors can tell their engine to lose on time instead of resigning in order to earn rating points.

Uri
Uri Blass
Posts: 10900
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: first CEGT result of stockfish is not good

Post by Uri Blass »

Ryan Benitez wrote:
In my experience, losses on time are rarely the fault of the engine. To compensate for Windows, try adding a small time buffer so that the engine believes it has slightly less time. Best would be to use Linux and Cutechess-cli, though.
I think cases where Windows is guilty of losses on time rarely happen, except at very fast time controls that CEGT does not use.

I also think it is the authors' responsibility to make sure that
the engine believes it has slightly less time, where "slightly" is at least 1% of the remaining time (you are not going to lose even 1 Elo by playing 1% faster in the first moves, especially since playing 1% faster in the first moves means you have more time later).
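A minimal sketch of such a safety margin, in Python. The 1% figure comes from the paragraph above; the fixed 50 ms floor, the default of 30 moves to go, and the function name are assumptions for the example and are not taken from any particular engine.

```python
def move_time_budget(
    remaining_ms: int,
    moves_to_go: int = 30,    # assumed planning horizon
    margin_percent: int = 1,  # "at least 1% of the remaining time"
    min_margin_ms: int = 50,  # assumed floor for GUI/OS latency
) -> int:
    """Time in milliseconds the engine may spend on the next move.

    The engine pretends its clock is shorter by a safety margin, then
    spreads the rest evenly over the moves it still expects to play.
    """
    margin_ms = max(remaining_ms * margin_percent // 100, min_margin_ms)
    usable_ms = max(remaining_ms - margin_ms, 0)
    return usable_ms // max(moves_to_go, 1)


# One minute left: 600 ms is held back, the rest is split over 30 moves.
print(move_time_budget(60_000))
```

Because the margin is recomputed from the remaining time on every move, it shrinks as the clock runs down, and the fixed floor takes over when very little time is left.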

Uri
Werner
Posts: 2994
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle

Re: first CEGT result of stockfish is not good

Post by Werner »

Uri Blass wrote:You can take engines that lose on time (more often than 1 game out of 1000) out of the rating list, but
deleting losses on time is not fair, because then authors can tell their engine to lose on time instead of resigning in order to earn rating points.
Uri
Hi Uri,
good idea :wink:
but I think it would not work, because the test will be stopped, and of course I have a look at all games lost on time.
And I forgot to mention: if an engine has a won position and the other engine loses on time, I do not delete this game; it counts.
Werner