Olithink 5.1.8 released because of better ChessDM vs Crafty

OliverBr · Post by **OliverBr** » Wed Oct 29, 2008 11:29 am

Download here:

http://home.arcor.de/dreamlike/chess/

This is a funny case, because the only difference from 5.1.7 is the removal of two lines of code, and this led to a much better Death Match result against Crafty:

olithink517 - crafty20.14 : 285.5/1000 214-143-643 28%
olithink518 - crafty20.14 : 315.0/1000 243-144-613 31%

Question to all of you: Do you think 1000 games are enough to see the improvement? (I don't see any risk of "inbreeding" here, because crafty is genetically very different to olithink.)
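
One rough way to approach the "are 1000 games enough" question is to compare the score difference with its standard error. The sketch below assumes the two result lines above read as wins-draws-losses from Olithink's side, and that all games are independent (which repeated starting positions may violate). The function name `score_stats` is only illustrative.

```python
import math

def score_stats(wins, draws, losses):
    """Return (mean score per game, standard error of that mean)."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of a result that is 1, 0.5 or 0
    var = (wins * 1.0 + draws * 0.25) / n - mean ** 2
    return mean, math.sqrt(var / n)

# Results quoted above, read as wins-draws-losses (an assumption)
m517, se517 = score_stats(214, 143, 643)
m518, se518 = score_stats(243, 144, 613)

diff = m518 - m517
# Standard error of the difference between two independent matches
se_diff = math.sqrt(se517 ** 2 + se518 ** 2)
z = diff / se_diff

print(f"score difference: {diff:.4f} +/- {se_diff:.4f} (z = {z:.2f})")
```

Under these assumptions the gap works out to roughly 1.6 standard errors, which is suggestive but short of the usual two-standard-error rule of thumb, so more games would be needed to be reasonably sure.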

OliverBr · Post by **OliverBr** » Wed Oct 29, 2008 12:07 pm

Hi Gabor,
thank you for your reply. Can you give me advice on how I can easily calculate the Elo from the results of the 1000 games against Crafty?

BubbaTough · Post by **BubbaTough** » Wed Oct 29, 2008 2:57 pm

OliverBr wrote:I don't see any risk of "inbreeding" here, because crafty is genetically very different to olithink

In my opinion the risk of testing against just crafty is almost as large as with self-testing. You are tuning your engine to beat crafty (from the start positions you are using). Thus, you risk creating an engine particularly suited to beating crafty but perhaps worse against other engines. If crafty is bad against pawn storms, for example, you may end up building an engine that does all sorts of overly risky pawn storms that would have been punished by other engines (just an example, no idea how crafty is with pawn storms). If you are going to play 1000 games, I would recommend 200 against each of 5 engines instead, for more reliable results, even though it reduces your number of starting positions.

Regarding "how many" games are needed to detect an improvement, I leave that to the experts. I will say that I consider it dependent on what you are using the testing for. If it is for judging a feature or deciding what to use in ChessWars, perfect testing is not critical (1000 games sounds great to me). When sending an engine to a testing group who is going to spend a lot of time on your engine, I would be more conservative in my criteria as a sign of respect for their time investment.

-Sam

Dann Corbit · Post by **Dann Corbit** » Wed Oct 29, 2008 3:00 pm

OliverBr wrote:Hi Gabor,
thank you for your reply. Can you give me advice on how I can easily calculate the Elo from the results of the 1000 games against Crafty?

Here is a batch file (which can just as easily become a shell script) and the command script that I use to have BayesElo create Elo rating lists.

C:\tmp\prob>type ..\belostat.bat
copy %1.pgn test.pgn
bayeselo < belo.txt

C:\tmp\prob>type ..\belo.txt
readpgn test.pgn
elo
mm
offset 2500
exactdist
ratings >rating.txt
x
x

I really like BayesElo because I can give it millions of games and it never dies a horrible death.
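
For a single head-to-head match, a quick cross-check is the plain logistic Elo formula, which turns a score fraction into a rating difference. This is only a sketch: BayesElo fits its own model (including draws), so its numbers will differ somewhat, and `elo_diff` is just an illustrative name.

```python
import math

def elo_diff(score):
    """Elo difference implied by a score fraction in (0, 1), logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# Score fractions taken from the two results quoted at the top of the thread
old = elo_diff(285.5 / 1000)
new = elo_diff(315.0 / 1000)
print(f"5.1.7: {old:+.0f}, 5.1.8: {new:+.0f}, gain: {new - old:+.0f} Elo")
```

A negative value means Olithink is rated below Crafty by that many points; the difference between the two values is the estimated gain from one version to the next.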

Ovyron · Post by **Ovyron** » Wed Oct 29, 2008 9:52 pm

BubbaTough wrote:Thus, you risk creating an engine particularly suited to beating crafty but perhaps worse against other engines.

The exception is creating an engine that is particularly suited to getting better results against Rybka. I claim that the new generation of chess engines, which made a big jump in strength while people complained about their worse playing style, were tweaked against Rybka 1-2; Rybka 3 then had an easier time beating them than she should have, because the tweaks no longer worked. Still, these engines got stronger.

krazyken · Post by **krazyken** » Thu Oct 30, 2008 12:59 am

OliverBr wrote:Hi Gabor,
thank you for your reply. Can you give me advice on how I can easily calculate the Elo from the results of the 1000 games against Crafty?

The best way to do it is to download BayesELO, and it will happily calculate everything for you.

Ovyron · Post by **Ovyron** » Thu Oct 30, 2008 2:18 am

krazyken wrote:the best way to do it is to download BayesELO

Here are the instructions on how to use it. As it's a command-line program, just type the following commands, pressing Enter after each:

readpgn x.pgn
elo
mm
exactdist
offset 2500
prior 0.1
ratings>allgames.txt

Replace x with the name of your PGN file and 2500 with the starting rating. The prior line is optional; use it when the PGN contains engines of very different strength (without it, strong engines will be underrated and weak engines overrated if they don't have enough games).
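
If you run this often, the command list can be generated rather than typed. The sketch below only builds the text, using the commands from this thread in the order given above; the helper name `bayeselo_script` and its parameters are made up for illustration, and the two trailing `x` lines are copied from the batch example earlier in the thread.

```python
def bayeselo_script(pgn, offset=2500, prior=None, out="allgames.txt"):
    """Build the text to feed to BayesElo's standard input."""
    lines = [f"readpgn {pgn}", "elo", "mm", "exactdist", f"offset {offset}"]
    if prior is not None:
        # Optional, for PGNs that mix engines of very different strength
        lines.append(f"prior {prior}")
    lines.append(f"ratings >{out}")
    # Two "x" lines, as in the batch example earlier in the thread
    lines += ["x", "x"]
    return "\n".join(lines) + "\n"

print(bayeselo_script("x.pgn", prior=0.1), end="")
```

Save the output to a file such as belo.txt and redirect it into the program, as the batch file earlier in the thread does with `bayeselo < belo.txt`.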
