http://www.top-5000.nl/selfplay.htm
It's quite odd to see how capricious the percentages are as the time control increases.
Results of an engine-engine selfplay match
meant for discussion purposes

Engine-one   ProDeo 1.74
Engine-two   ProDeo 1.74 with an EVAL change in King Safety

Blitz  5 seconds   all   10,000 games   49.8 %
Blitz 10 seconds   all   10,000 games   50.6 %
Blitz 20 seconds   all    7,777 games   50.7 %
Blitz 40 seconds   all   10,000 games   50.3 %
Blitz 80 seconds   all    8,700 games   51.3 %

Remarks
1. The EVAL change seems to work better as the time control increases.
2. Blitz-80, although a full percent better than Blitz-40, still falls within the error margin of 6 elo according to ELOSTAT, so in theory an improvement is still not proven.
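
To make the error-margin remark concrete, here is a rough sketch that converts each score percentage into an elo difference and puts an approximate 95% interval around it. It uses the standard logistic elo formula and a plain normal approximation, which is not necessarily what ELOSTAT computes. The draw ratio is not given in the post, so `draw_ratio` is a hypothetical parameter; the default of 0.0 is the worst case and gives the widest margin.

```python
import math

def elo_from_score(p):
    """Elo difference implied by a score fraction p (0 < p < 1), logistic model."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def score_margin(p, games, draw_ratio=0.0, z=1.96):
    """Approximate 95% margin on the score fraction (normal approximation).

    draw_ratio is NOT given in the post; 0.0 is an assumption and the
    worst case, since draws reduce the per-game variance.
    """
    variance = p * (1.0 - p) - draw_ratio / 4.0
    return z * math.sqrt(variance / games)

# (time control in seconds, games, score fraction) taken from the results above
results = [(5, 10000, 0.498), (10, 10000, 0.506), (20, 7777, 0.507),
           (40, 10000, 0.503), (80, 8700, 0.513)]

for seconds, games, p in results:
    m = score_margin(p, games)
    lo, hi = elo_from_score(p - m), elo_from_score(p + m)
    print(f"Blitz {seconds:>2}: {elo_from_score(p):+5.1f} elo, "
          f"95% interval {lo:+5.1f} .. {hi:+5.1f}")
```

With no draws assumed, the margin for 8,700 games comes to roughly one percentage point of score, so a one-percent gap between two matches sits right at the edge of what this many games can resolve. A realistic draw ratio would narrow the interval somewhat.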
Graphs (see the link above)
The graphs below were made with a PGN utility and show the progress of each match. After every 100 games a datapoint is created and imported into Excel.
From the 5 graphs one might conclude that the first 1000 games in a match are pretty meaningless, due to the random nature of a match between two engines of almost equal strength.
5000 games looks like a reasonable number to conclude that there is an improvement, though not its exact elo.
The PGN tool will be made available later.
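
Until then, the idea of one datapoint per 100 games can be sketched as follows. This is not the author's tool, only a minimal illustration. It assumes standard PGN headers in which each game has a `White` tag followed by a `Result` tag, and the function names `scores_from_pgn` and `progress` are made up for this example.

```python
import re

# Matches the two header tags we need, e.g. [White "ProDeo 1.74"] or [Result "1-0"]
TAG = re.compile(r'\[(White|Result)\s+"([^"]*)"\]')

def scores_from_pgn(pgn_text, engine):
    """Yield 1, 0.5 or 0 per game from the point of view of `engine`.

    Assumes every game has a White tag followed by a Result tag;
    unfinished games (result "*") are skipped.
    """
    white = None
    for tag, value in TAG.findall(pgn_text):
        if tag == "White":
            white = value
        elif value in ("1-0", "0-1", "1/2-1/2"):
            white_score = {"1-0": 1.0, "0-1": 0.0, "1/2-1/2": 0.5}[value]
            yield white_score if white == engine else 1.0 - white_score

def progress(scores, step=100):
    """Return (games played, cumulative score %) after every `step` games."""
    points, total = [], 0.0
    for n, s in enumerate(scores, start=1):
        total += s
        if n % step == 0:
            points.append((n, 100.0 * total / n))
    return points
```

The list returned by `progress` can be written out as two columns and imported into a spreadsheet to draw the same kind of graph.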

