Self-play at different time controls

Ferdy
Posts: 4846
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Self-play at different time controls

Post by Ferdy »

Adam wrote:I am trying to figure out a way to extract the time data so that we can see how it is distributed through the games.
I got the script to extract the time per move in a game; I don't know if this is what you mean. I selected 3 games at TC 1'+1" to compare with the other TC.

Image

Image

Image
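
For anyone who wants to reproduce this kind of extraction, here is a minimal sketch, not the script used above. It assumes the PGN carries cutechess-style move comments such as {+0.31/14 0.85s}, that every move has such a comment (book moves without a time would shift the colors), and the function name move_times is made up for illustration.

```python
import re

# Assumption: each move is followed by a comment like "{+0.31/14 0.85s}",
# where the token ending in "s" is the time spent on that move in seconds.
TIME_RE = re.compile(r"\{[^{}]*?(\d+(?:\.\d+)?)s\b[^{}]*\}")


def move_times(pgn_movetext):
    """Return (white_times, black_times) in seconds, in move order.

    Assumes White moved first and every move has a timed comment.
    """
    times = [float(t) for t in TIME_RE.findall(pgn_movetext)]
    # Even positions belong to White, odd positions to Black.
    return times[0::2], times[1::2]


if __name__ == "__main__":
    sample = "1. e4 {+0.20/12 0.50s} e5 {-0.15/13 1.25s} 2. Nf3 {+0.22/12 0.75s}"
    print(move_times(sample))
```

The two lists can then be plotted against move number for each game.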
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Self-play at different time controls

Post by Adam Hair »

My thought was to use the average or median from all of the games instead of plotting each game.
Ferdy
Posts: 4846
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Self-play at different time controls

Post by Ferdy »

Adam Hair wrote:My thought was to use the average or median from all of the games instead of plotting each game.
Here is the average time in seconds for 300 games per engine. I have records for each color but am only showing the white side. Only moves up to move 120 are included.
The moves before 40 really matter. The 1+1 still has the stamina to continue beyond that move.

Image
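
The averaging over games can be sketched roughly as below. This is an illustration under assumptions, not the code behind the graph: the input is taken to be one list of per-move times (seconds, one color) per game, and per_move_summary is a hypothetical name. Games that ended before a given move simply drop out of that move's sample.

```python
from statistics import mean, median


def per_move_summary(games, max_moves=120):
    """Summarize time used at each move number across many games.

    games: list of lists; each inner list holds one side's time per move
           (seconds) for a single game.
    Returns a list of (move_number, mean_time, median_time, sample_count).
    """
    summary = []
    for i in range(max_moves):
        # Only games that actually reached move i+1 contribute a sample.
        samples = [g[i] for g in games if len(g) > i]
        if not samples:
            break
        summary.append((i + 1, mean(samples), median(samples), len(samples)))
    return summary


if __name__ == "__main__":
    for row in per_move_summary([[1.0, 2.0, 3.0], [3.0, 4.0]]):
        print(row)
```

Keeping the sample count next to each average helps when reading the late moves, where few games are still running.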
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Self-play at different time controls

Post by Adam Hair »

Thanks! The graph really explains the results.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Self-play at different time controls

Post by Laskos »

Adam Hair wrote:Thanks! The graph really explains the results.
The explanation is a bit harder. For all three top engines, 1+1 and 2+0 are in competition as time controls for a testing framework, as far as strength versus time used goes. One of two things follows:
1/ The time management at 2+0 can be improved.
2/ Use the extra games that 2+0 allows compared to 1+1 for the same time used, without much strength loss.
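
A rough way to put numbers on "same time used" is sketched below. It only computes an upper bound on clock time per game (both sides using their whole base time plus every increment), and the game length of 80 moves per side is a made-up figure for illustration, not taken from these matches. Actual time used depends on time management and adjudication.

```python
def max_game_seconds(base_s, inc_s, moves_per_side):
    """Upper bound on total clock time for one game, both sides combined.

    base_s: base time per side in seconds
    inc_s: increment per move in seconds
    moves_per_side: number of moves each side plays (an assumption)
    """
    return 2 * (base_s + inc_s * moves_per_side)


if __name__ == "__main__":
    moves = 80  # hypothetical game length, moves per side
    print("2'+0\":", max_game_seconds(120, 0, moves), "s at most")
    print("1'+1\":", max_game_seconds(60, 1, moves), "s at most")
```

On this bound the two controls break even at 60 moves per side; longer games cost more at 1'+1" while 2'+0" stays capped at 240 s.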
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Self-play at different time controls

Post by Laskos »

Another thing to mention is that adjudication rules play a role here. I don't know which you used, but I think the average game without adjudication would be 5-10 moves longer. If time management knew what the rules are, I think 1+1 and 2+0 would be even closer matched in strength, with some time saving for 2+0.
mclane
Posts: 18899
Joined: Thu Mar 09, 2006 6:40 pm
Location: US of Europe, germany
Full name: Thorsten Czub

Re: Self-play at different time controls

Post by mclane »

Self play is not measuring strength. Self play measures differences, but more differences is not more strength. So it can happen that if you have a version that wins against all other versions of the same chess engine, this winner is not really stronger against other chess programs. It's unimportant if your game base is 500 or 5000000 games; the problem is the same.

The engine that is the winner is not the stronger engine.

Therefore self play is wasting time.
What seems like a fairy tale today may be reality tomorrow.
Here we have a fairy tale of the day after tomorrow....
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Self-play at different time controls

Post by Adam Hair »

mclane wrote:Self play is not measuring strength. Self play measures differences, but more differences is not more strength. So it can happen that if you have a version that wins against all other versions of the same chess engine, this winner is not really stronger against other chess programs. It's unimportant if your game base is 500 or 5000000 games; the problem is the same.

The engine that is the winner is not the stronger engine.

Therefore self play is wasting time.
If self play were a waste of time, then most of the top active engines would not be making any progress.

Besides, this is not a comparison of various engines at various time controls. We do not know if SF would score better than Houdini at 1'+1" than at 40/1'. But we can see that SF plays at a higher level at 1'+1" than at 40/1'.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Self-play at different time controls

Post by Adam Hair »

Laskos wrote:Another thing to mention is that adjudication rules play a role here. I don't know which you used, but I think the average game without adjudication would be 5-10 moves longer. If time management knew what the rules are, I think 1+1 and 2+0 would be even closer matched in strength, with some time saving for 2+0.
Hopefully it only plays a small role in this case. The resign threshold was 300cp for 3 moves, and the games were adjudicated as draws if 250 moves were played and the eval was less than +/- 50cp. There should have been no resign threshold. I copied and pasted the cutechess script from another test I was doing and forgot to remove the resign switch.

Given that these are UCI engines, I do not think they are sent the adjudication rules nor would they know what to do with the information.
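
To make the two rules concrete, here is a simplified sketch of how I read them. It is not cutechess's own code: the function names are invented, and it assumes scores are in centipawns from the point of view of the side that reported them.

```python
def resign_index(scores_cp, threshold_cp=300, movecount=3):
    """Index of the move at which the resign rule would trigger, or None.

    scores_cp: one side's reported evals, one per move, from its own view.
    The rule fires once the eval has been at or below -threshold_cp for
    `movecount` consecutive moves.
    """
    streak = 0
    for i, score in enumerate(scores_cp):
        streak = streak + 1 if score <= -threshold_cp else 0
        if streak >= movecount:
            return i
    return None


def is_draw_adjudicated(move_number, score_cp, min_move=250, margin_cp=50):
    """True if at least min_move moves were played and the eval is within
    the +/- margin_cp band."""
    return move_number >= min_move and abs(score_cp) < margin_cp


if __name__ == "__main__":
    print(resign_index([-100, -320, -350, -400, -500]))
    print(is_draw_adjudicated(250, 20))
```

With a 250-move limit the draw rule should rarely trigger, so the resign rule is the one more likely to shorten games.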
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Self-play at different time controls

Post by Adam Hair »

Laskos wrote:
Adam Hair wrote:Thanks! The graph really explains the results.
The explanation is a bit harder. For all three top engines, 1+1 and 2+0 are in competition as time controls for a testing framework, as far as strength versus time used goes. One of two things follows:
1/ The time management at 2+0 can be improved.
2/ Use the extra games that 2+0 allows compared to 1+1 for the same time used, without much strength loss.
The difference between 1+1 and 2+0 is muted somewhat by the high draw rate (71.5%). However, I think you are correct that 2'+0" is a better choice than 1'+1" for testing. Perhaps even better would be 2' plus some small increment (for example a 50 ms increment, like the Stockfish framework uses).