Whiskers wrote: ↑Wed Apr 03, 2024 8:45 am
I've played against the same 6 engines throughout almost all of Patricia's development (round robin with each engine playing 3000 games).
The bold is mine.
Gauntlet would be a different story.
Correct?
90% of coding is debugging, the other 10% is writing bugs.
Yes, sorry, these tests were all round robins; every engine played 3000 games (enough to get the EAS score down to ±10k, which is good enough for my purposes), split evenly across all engines. I switched to gauntlets after self-testing started giving me very suspicious results.
1. When I create an Elo pool 200 Elo weaker than Rebel and run a round robin, the EAS score will be extremely high.
2. When I create an Elo pool 200 Elo weaker than Rebel and run a gauntlet, the EAS score will likely also be extremely high, but all the opponents will struggle to meet the minimum requirement of won games and draws and will be blacklisted by the EAS tool.
So please, choose an Elo pool of Patricia's current strength and run the gauntlet.
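For anyone unsure about the difference, here is a rough Python sketch of how the two formats distribute games. It assumes Patricia plus 6 opponents and 500 games per pairing (which gives the 3000 games per engine mentioned above). The opponent names and helper functions are made up for illustration and are not part of the EAS tool or any tournament manager.

```python
from collections import Counter
from itertools import combinations

def round_robin_pairings(engines):
    """Every engine meets every other engine."""
    return list(combinations(engines, 2))

def gauntlet_pairings(candidate, opponents):
    """Only the candidate plays; the opponents never meet each other."""
    return [(candidate, opp) for opp in opponents]

def games_per_engine(pairings, games_per_pairing):
    """Count how many games each engine ends up playing."""
    counts = Counter()
    for a, b in pairings:
        counts[a] += games_per_pairing
        counts[b] += games_per_pairing
    return counts

# Hypothetical pool: Patricia plus 6 placeholder opponents.
opponents = [f"Opponent{i}" for i in range(1, 7)]
engines = ["Patricia"] + opponents

rr = games_per_engine(round_robin_pairings(engines), 500)
ga = games_per_engine(gauntlet_pairings("Patricia", opponents), 500)

print("round robin:", dict(rr))
print("gauntlet:   ", dict(ga))
```

In the round robin every engine plays 3000 games, most of them against the other opponents. In the gauntlet each opponent only plays its 500 games against Patricia, so every win and draw it needs has to come from those games.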
Whiskers wrote: ↑Wed Apr 03, 2024 9:53 am
A gauntlet? Or a round-robin?
I only use gauntlets and make sure no opponent engine gets blacklisted. An easy way to do that is to collect the PGNs; the more games, the more reliable the result. 20,000 games is my minimum.