Adam Hair wrote:My thought was to use the average or median from all of the games instead of plotting each game.
Here is the average time in seconds per move for 300 games per engine. I have records for each color, but I am only showing the white side. Only moves up to move 120 are included.
The moves before move 40 are what really matter. The 1+1 engine still has the stamina to continue beyond that point.
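In case it helps anyone reproduce the curve, here is a rough sketch of the averaging (the variable names are mine, not from the actual script):

```python
# A minimal sketch, assuming per-move clock usage has already been parsed
# out of the PGNs; the names (games, MAX_MOVE) are illustrative.
MAX_MOVE = 120  # only moves up to move 120, as above

def average_time_per_move(games):
    """games[g][m] = seconds the white engine spent on move m+1 of game g.
    Returns the mean time for each move number, averaged over the games
    that actually reached that move."""
    averages = []
    for move in range(MAX_MOVE):
        times = [g[move] for g in games if len(g) > move]
        if not times:
            break  # no game lasted this long
        averages.append(sum(times) / len(times))
    return averages

# One curve per engine / time control, e.g.:
#   import matplotlib.pyplot as plt
#   avg = average_time_per_move(games_1_plus_1)   # 300 games at 1'+1"
#   plt.plot(range(1, len(avg) + 1), avg, label="1+1")
#   plt.xlabel("move number"); plt.ylabel("average seconds per move")
#   plt.legend(); plt.show()
```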
Adam Hair wrote:Thanks! The graph really explains the results.
The explanation is a bit harder. For all three top engines, 1+1 and 2+0 are in competition as the best testing-framework choice, as far as strength versus time used goes. One of two things follows:
1/ The time management at 2+0 can be improved.
2/ For the same total time used, one can play the larger number of games that 2+0 allows compared to 1+1, with not much strength loss (see the rough clock-budget arithmetic below).
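For concreteness, a back-of-the-envelope sketch of the clock budgets (my numbers, with moves counted per side):

```python
# A back-of-the-envelope sketch (my numbers, not Laskos's): the maximum
# number of seconds one engine can spend in a game of n moves per side.
def budget_1_plus_1(n_moves):
    # 1 minute base plus a 1 second increment on every move
    return 60 + 1 * n_moves

def budget_2_plus_0(n_moves):
    # 2 minutes flat, no increment, regardless of game length
    return 120

for n in (40, 60, 80, 120):
    print(f"{n:3d} moves: 1+1 -> {budget_1_plus_1(n)} s, 2+0 -> {budget_2_plus_0(n)} s")
#  40 moves: 1+1 -> 100 s, 2+0 -> 120 s
#  60 moves: 1+1 -> 120 s, 2+0 -> 120 s   (equal budgets)
#  80 moves: 1+1 -> 140 s, 2+0 -> 120 s
# 120 moves: 1+1 -> 180 s, 2+0 -> 120 s
```

The budgets cross at 60 moves per side: beyond that, 1+1 games keep getting more expensive while 2+0 games stay capped, which is why 2+0 yields more games for the same total testing time.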
Another thing to mention is that adjudication rules play a role here. I don't know which ones you used, but I think the average game without adjudication would be 5-10 moves longer. If the time management knew the adjudication rules, I think 1+1 and 2+0 would be even more closely matched in strength, with some time saving for 2+0.
Self play does not measure strength. Self play measures differences, but a bigger difference is not more strength. So it can happen that a version which wins against all other versions of the same chess engine is not really stronger against other chess programs. It does not matter whether your game base is 500 or 5,000,000 games; the problem is the same.
The engine that wins the self-play match is not necessarily the stronger engine.
Therefore self play is a waste of time.
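For what it's worth, the "difference" in question is just the Elo gap implied by a match score; a minimal sketch (the function name is mine):

```python
# A minimal sketch: a self-play match score maps to an Elo *difference*
# between the two versions, nothing more.
import math

def elo_difference(score):
    # score = (wins + 0.5 * draws) / games, from one side's point of view
    return -400 * math.log10(1 / score - 1)

print(round(elo_difference(0.55)))  # ~35: the point estimate is the same
                                    # from 500 games or 5,000,000; sample
                                    # size only narrows the error bar
```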
What seems like a fairy tale today may be reality tomorrow.
Here we have a fairy tale of the day after tomorrow....
mclane wrote:Self play does not measure strength. Self play measures differences, but a bigger difference is not more strength. So it can happen that a version which wins against all other versions of the same chess engine is not really stronger against other chess programs. It does not matter whether your game base is 500 or 5,000,000 games; the problem is the same.
The engine that wins the self-play match is not necessarily the stronger engine.
Therefore self play is a waste of time.
If self play were a waste of time, then most of the top active engines would not be making any progress.
Besides, this is not a comparison of various engines at various time controls. We do not know whether SF would score better against Houdini at 1'+1" than at 40/1'. But we can see that SF plays at a higher level at 1'+1" than at 40/1'.
Laskos wrote:Another thing to mention is that adjudication rules play a role here. I don't know which you used, but I think the average game without adjudication would be 5-10 moves longer. If time management knew what are the rules, I think 1+1 and 2+0 would be even closer matched in strength, with some time saving for 2+0.
Hopefully adjudication only played a small role in this case. The resign threshold was 300 cp for 3 moves, and games were adjudicated as draws if 250 moves had been played and the eval was within +/- 50 cp. There should have been no resign threshold at all; I copied and pasted the cutechess script from another test I was doing and forgot to remove the resign switch.
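For illustration, this is roughly what those rules amount to (the thresholds are the ones stated above; the function itself is a sketch, not the actual cutechess-cli logic behind its -resign and -draw switches):

```python
# A sketch of the adjudication rules as stated; illustrative only.
def adjudicate(move_number, evals_cp):
    """evals_cp: the engine's evaluations in centipawns, one per move it
    has played so far, from its own point of view (most recent last)."""
    # Resign rule: score at or below -300 cp for 3 consecutive moves
    if len(evals_cp) >= 3 and all(e <= -300 for e in evals_cp[-3:]):
        return "resign"
    # Draw rule: 250 moves played and the latest eval within +/- 50 cp
    if move_number >= 250 and evals_cp and abs(evals_cp[-1]) < 50:
        return "draw"
    return None  # play on
```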
Given that these are UCI engines, I do not think they are sent the adjudication rules nor would they know what to do with the information.
Laskos wrote:
Adam Hair wrote:Thanks! The graph really explains the results.
The explanation is a bit harder. For all three top engines, 1+1 and 2+0 are in competition as the best testing-framework choice, as far as strength versus time used goes. One of two things follows:
1/ The time management at 2+0 can be improved.
2/ For the same total time used, one can play the larger number of games that 2+0 allows compared to 1+1, with not much strength loss.
The difference between 1+1 and 2+0 is muted somewhat by the high draw rate (71.5%). However, I think you are correct that 2'+0" is a better choice than 1'+1" for testing. And perhaps even better would be 2' plus some small increment (perhaps a 50 millisecond increment, like the Stockfish framework uses).
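To see how much a 71.5% draw rate mutes a gap, here is a hypothetical illustration (the 60% decisive-game split is made up, not from this match): hold the decisive-game split fixed and watch the implied Elo difference shrink as the draw rate rises.

```python
# A hypothetical illustration: with the decisive-game split held fixed,
# a rising draw rate pulls the overall score toward 0.5 and compresses
# the implied Elo difference.
import math

def implied_elo(score):
    # Elo difference implied by a score in (0, 1)
    return -400 * math.log10(1 / score - 1)

decisive_split = 0.60  # the stronger side wins 60% of decisive games
for draw_rate in (0.0, 0.5, 0.715):
    score = draw_rate * 0.5 + (1 - draw_rate) * decisive_split
    print(f"draw rate {draw_rate:5.1%}: score {score:.4f} -> ~{implied_elo(score):.0f} Elo")
# draw rate  0.0%: score 0.6000 -> ~70 Elo
# draw rate 50.0%: score 0.5500 -> ~35 Elo
# draw rate 71.5%: score 0.5285 -> ~20 Elo
```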