I'm planning to test my engine by having it play against itself with different parameters (personalities).
I could set this up using OOP and all, but I want to keep things "dirty simple" with plain procedures and global variables.
Two questions:
1) Is it a good idea to start up two UCI engines from a third program? (I still need to write everything concerning the UCI communication.)
2) How do you guys do things like this?
Testing engines
-
Dann Corbit
- Posts: 12808
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Testing engines
EricLang wrote:
I'm planning to test my engine by having it play against itself with different parameters (personalities).
I could set this up using OOP and all, but I want to keep things "dirty simple" with plain procedures and global variables.
Two questions:
1) Is it a good idea to start up two UCI engines from a third program? (I still need to write everything concerning the UCI communication.)
Usually people just use an existing tournament organizer instead of writing their own tools for this.
EricLang wrote:
2) How do you guys do things like this?
Most people use a pre-existing tool. For instance, you can organize a tournament using Arena or ChessGUI.
Dr. Hyatt has a (highly enviable) cluster on which he can get results for a huge number of games very rapidly.
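If you do end up writing the third program yourself, the UCI side is mostly line-based text over stdin/stdout. Here is a minimal sketch in Python of what such a driver could look like. The names `UciEngine`, `position_command`, `parse_bestmove` and `play_moves` are made up for illustration, and the sketch assumes the engines answer `uci`, `isready` and `go movetime` as the UCI protocol describes. It only relays moves: it does not know the rules, so detecting mate, repetition or the 50-move rule is left out, which is one reason an existing tournament manager is usually less work.

```python
import subprocess


def position_command(moves):
    """Build the UCI 'position' command for a game given as a list of moves."""
    if not moves:
        return "position startpos"
    return "position startpos moves " + " ".join(moves)


def parse_bestmove(line):
    """Extract the move from a line such as 'bestmove e2e4 ponder e7e5'."""
    parts = line.split()
    if len(parts) < 2 or parts[0] != "bestmove":
        raise ValueError(f"not a bestmove line: {line!r}")
    return parts[1]


class UciEngine:
    """Hypothetical thin wrapper around one UCI engine process."""

    def __init__(self, command):
        # command is an argument list, e.g. ["./myengine"]
        self.proc = subprocess.Popen(
            command, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
            text=True, bufsize=1)

    def send(self, line):
        self.proc.stdin.write(line + "\n")
        self.proc.stdin.flush()

    def wait_for(self, prefix):
        """Read engine output until a line starting with prefix arrives."""
        while True:
            line = self.proc.stdout.readline()
            if not line:
                raise RuntimeError("engine closed its output")
            if line.strip().startswith(prefix):
                return line.strip()

    def handshake(self):
        self.send("uci")
        self.wait_for("uciok")
        self.send("isready")
        self.wait_for("readyok")

    def set_option(self, name, value):
        # This is how a "personality" parameter would be passed to the engine.
        self.send(f"setoption name {name} value {value}")

    def best_move(self, moves, movetime_ms=100):
        self.send(position_command(moves))
        self.send(f"go movetime {movetime_ms}")
        return parse_bestmove(self.wait_for("bestmove"))

    def quit(self):
        self.send("quit")
        self.proc.wait()


def play_moves(white, black, max_plies=200, movetime_ms=100):
    """Alternate the two engines; stop when one reports no move.

    Assumption: an engine without a legal move answers '(none)' or '0000'.
    Draw rules are NOT checked here.
    """
    moves = []
    engines = (white, black)
    for ply in range(max_plies):
        move = engines[ply % 2].best_move(moves, movetime_ms)
        if move in ("(none)", "0000"):
            break
        moves.append(move)
    return moves
```

To use it, you would start two `UciEngine` instances with the path of your own engine binary, call `handshake()` on each, give each one different settings via `set_option` (the option names depend entirely on your engine), and pass them to `play_moves`.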
-
hgm
- Posts: 28429
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Testing engines
Having different versions of a program play each other is a notoriously ineffective way to tune an engine. Even gross errors like move-generator bugs remain unnoticed sometimes. The 'optimized' version often plays worse against other opponents.
Most people evaluate their engines (program changes as well as tuning) by playing a gauntlet against a wide variety of opponents, which bracket the engine in strength by a range of about 200-300 Elo on either side. For normal Chess I usually play Nunn matches, i.e. games starting from the 10 Nunn positions, so that you can play each opponent 20 times (each position with both colors) with the guarantee of no duplicate games. With 25 different opponents this gives you 500 games, which results in an accuracy of about 25 Elo points (95% confidence).
I use PSWBTM + WinBoard for this. (UCI engines would then have to use Polyglot, which you can make WinBoard do automatically by installing the engine in PSWBTM with the engine-following WinBoard option "WBopt /%sIsUCI=true".)
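To make the arithmetic above concrete, here is a rough sketch in Python of how the game count and the error margin can be estimated. It assumes the usual logistic Elo formula and a normal approximation with games treated as independent; the function names and the 200/100/200 result are invented for illustration, not taken from any real gauntlet.

```python
import math


def elo_from_score(score):
    """Elo difference implied by a score fraction (must be strictly between 0 and 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, low, high) using a normal approximation; z=1.96 is about 95%."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / n
    std_error = math.sqrt(variance / n)
    return (elo_from_score(score),
            elo_from_score(score - z * std_error),
            elo_from_score(score + z * std_error))


# 10 Nunn positions, each played with both colors, against 25 opponents.
games = 10 * 2 * 25

# Invented example: an even result over those 500 games.
elo, low, high = elo_with_margin(wins=200, draws=100, losses=200)
print(games, round(elo), round(low), round(high))
```

With this made-up even result the interval comes out at a bit under 30 Elo on either side, which is in the same ballpark as the figure quoted above; more draws would narrow it somewhat.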
-
EricLang
Re: Testing engines
The term "Nunn matches" is new to me. What is that? And "the 10 Nunn positions"...
-
Dann Corbit
- Posts: 12808
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Testing engines
EricLang wrote:
The term "Nunn matches" is new to me. What is that? And "the 10 Nunn positions"...
Named after GM Nunn.
Here are the 10 Nunn positions compressed with bzip2:
http://cap.connx.com/EPD/nunn.epd.bz2
Another version with ECO classification:
http://cap.connx.com/EPD/NUNNTEST.EPD.bz2
Here are the 6 Nunn v2 positions:
http://cap.connx.com/EPD/nunn2.epd.bz2
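If you want to feed such EPD files to a UCI engine from your own program, something like the following Python sketch could work. It assumes each non-empty line of the decompressed file is one EPD record whose first four fields are the board, side to move, castling rights and en passant square; `load_epd` and `epd_to_uci_position` are hypothetical helper names. EPD carries no move counters, so the sketch simply supplies "0 1". The example record is the ordinary starting position, not one of the Nunn positions.

```python
import bz2


def epd_to_uci_position(epd_line):
    """Turn one EPD record into a UCI 'position fen ...' command."""
    fields = epd_line.split()
    if len(fields) < 4:
        raise ValueError(f"not an EPD record: {epd_line!r}")
    board, side, castling, en_passant = fields[:4]
    # EPD omits the halfmove clock and fullmove number; "0 1" is filled in here.
    return f"position fen {board} {side} {castling} {en_passant} 0 1"


def load_epd(path):
    """Read all non-empty lines from a bzip2-compressed EPD file."""
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return [line.strip() for line in handle if line.strip()]


# Illustrative record only: the normal starting position with an id opcode.
example = 'rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - id "start";'
print(epd_to_uci_position(example))
```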
-
hgm
- Posts: 28429
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Testing engines
Note that there are also PGN files around for the Nunn positions (I don't have the link now), which contain the sequence of opening moves leading to the position. I always use those, rather than EPD files, as they also work on engines that do not support setting up a position. (And especially in the strength range of a beginning engine, there are quite a few opponents that suffer from that.)