A poor man's testing environment

hgm · Post by **hgm** » Sat Jan 05, 2013 11:44 am

lucasart wrote:How about calling your log file log_%pid.log ? That solves the problem, no ?

It makes it a bit had to later relate a log file to a specific game.

In WinBoard's intrinsic tournament manager each game gets a unique number, which will be included in the PGN as a tag. Usually you learn of bad things (blunders, crashes) through the PGN, and then you just have to look at the game number and retrieve the corresponding log file (which you would typically save as "tourneyXXXdebugs/game%d.debug").

ilari · Post by **ilari** » Sat Jan 05, 2013 1:40 pm

Rebel wrote: Tried "-concurrency 4"
Code: Select all
 c:\cc\cutechess-cli -engine name=MAIN1 cmd=yourengine.exe dir=C:\a\main1 proto=uci -engine name=WORK1 cmd=yourengine.exe dir=c:\a\work1 proto=uci -each tc=inf -draw 160 100 -resign 5 500 -rounds 1000 -repeat -pgnout c:\cc\all.pgn -pgnin c:\cc\1.pgn -pgndepth 20 -concurrency 4
What happens is that indeed 4 threads are started but... processor activity is only 25% playing only 1 game. Aborting the match kills only one thread leaving the other 6 executables idle in the task manager and I must remove them manually. Totally unusable for me.

Perhaps a WB2UCI problem?

Anyway, it's not very encouraging to modify my starter-page with a "-concurrency" option advice.

The main question is: why are you using WB2UCI? Does cutechess-cli's Xboard/Winboard support lack features that WB2UCI provides?

Some past versions of cutechess-cli had bugs in the concurrency implementation but the latest version should work fine. I'll have to try WB2UCI myself to find out if the problem is there.

Don · Post by **Don** » Sat Jan 05, 2013 2:05 pm

lucasart wrote:
Don wrote:
hgm wrote:
lucasart wrote:Cutechess-cli is perfectly capable of handling concurrent writing into the resulting PGN if that's your point.
Indeed, WinBoard would also have no problem with that. (It uses file locking to synchronize the different playing agents.)

But I think his problem is not with the GUI, but that his engine keeps a log file with a fixed name in their installation directory. So if you run it 8 times concurrently, they would all write to the same log file, which would become an absolute mess (as writes to log files typically are not buffered, to not lose data in case of a crash).
Komodo has the same issue. It would be good if testing tools provided some sort of symbolic shortcut or macro that replaces %i (for example) with testing instance number. So then you could say, ./myProgram -log %i.log and the tester would make the substitution before invoking the program. That would not be rocket science.
How about calling your log file log_%pid.log ? That solves the problem, no ?

No. I cannot have tens of thousands of log files. My tester is cross platform too and I don't think there is a platform specific way to get the windows equivalent of the pid.

It is possible for the engine to always create a unique file but I would have the same problem with having tens of thousands of log files.

This is easy to solve though so I'm not really looking for a solution. I can build in support to my automated tester and this is pretty simple.

There are also tricks to make it work with all testers (at least in linux) by using locks. I would probably use the flock() call. You have locks such as lock1 lock2, etc. and the engine on startup tries lock1, lock2 and so on until it has success. When the engine terminates the lock is automatically freed. I don't know how this works in Windows but I don't intend to use this solution anyway but it is a possibility.

jdart · Post by **jdart** » Sat Jan 05, 2013 7:09 pm

Rebel wrote: Separate improvements that might bite each other when combining them, don't get me started

This is really what CLOP is for. It is not really that hard to set up although the documentation is very sparse.

However, I have gotten variable results from CLOP. Some of the suggested values look reasonable but some do not. For example CLOP thought that queenside castling should get a bigger bonus than kingside castling. That is counter-intuitive to chess players, although it might actually be true.

And I have not yet done the step of verifying that CLOP-suggested changes actually improve scores in my standard game-playing setup, i.e. running a constant set of tuned parameters against a set of opponents in a long game match to verify that the tuned values are better than the base values.

--Jon

Don · Post by **Don** » Sat Jan 05, 2013 8:04 pm

jdart wrote:
Rebel wrote: Separate improvements that might bite each other when combining them, don't get me started
This is really what CLOP is for. It is not really that hard to set up although the documentation is very sparse.

However, I have gotten variable results from CLOP. Some of the suggested values look reasonable but some do not. For example CLOP thought that queenside castling should get a bigger bonus than kingside castling. That is counter-intuitive to chess players, although it might actually be true.

And I have not yet done the step of verifying that CLOP-suggested changes actually improve scores in my standard game-playing setup, i.e. running a constant set of tuned parameters against a set of opponents in a long game match to verify that the tuned values are better than the base values.

--Jon

One thing about CLOP is that it is not immune to sample error. I doubt most people are running it with the huge number of samples that are required. For fine tuning a chess program with automated testing just to gain a couple of ELO requires tens of thousands of games and CLOP does not magically allow this to happen with only a few hundred.

However one benefit of CLOP is that you can tune many features simultaneously. Also it will help you find good values without "fishing" so that is nice.

But you can also do tuning with a technique that H.G. posted on this forum a few years ago which he called orthogonal multi-testing. It's brilliant.

jdart · Post by **jdart** » Sat Jan 05, 2013 8:39 pm

I am using at least 20-30,000 games.

There is also some risk though of CLOP finding a local rather than a global maximum. This is documented.

--Jon

Don · Post by **Don** » Sat Jan 05, 2013 9:13 pm

jdart wrote:I am using at least 20-30,000 games.

There is also some risk though of CLOP finding a local rather than a global maximum. This is documented.

--Jon

That's probably a pretty good number, but clearly the number of games is related to how far OFF the values could be. Larry and I have used numbers like that to choose initial values when we implement things with many parameters as a starting point for further manual tuning.

Don · Post by **Don** » Sat Jan 05, 2013 9:25 pm

hgm wrote:
lucasart wrote:How about calling your log file log_%pid.log ? That solves the problem, no ?
It makes it a bit had to later relate a log file to a specific game.

In WinBoard's intrinsic tournament manager each game gets a unique number, which will be included in the PGN as a tag. Usually you learn of bad things (blunders, crashes) through the PGN, and then you just have to look at the game number and retrieve the corresponding log file (which you would typically save as "tourneyXXXdebugs/game%d.debug").

That gives me yet another idea. My tester assigns unique game numbers to each game too and I could make some fixed number of log files and use something like gameID % N as the name of the log file. Each program would have to have it's own log file so I could have log_17_white.log and log_17_black.log or something like that.

One issue that makes this dicey is that if one game is taking too long the same log file may not be available when another program needs to reuse it. So I would basically have to provide something like 100 log files to be highly certain that it will be available for the next cycle.

There is yet another solution too. My tester actually serializes the PGN output so that game 27 is not stored until all the previous games are complete. This is because I use the PGN as a checkpoint - it serves as permanent state for stopping and restarting. But I could do the same exact thing with the log files. The log file could be written to a temporary file and then combined at a later point in time into a single log file. I would only have to provide a script that is called by the tester each time a pgn file is taken from memory and appended to the official pgn file. I could make this a permanent mechanism - a script that is called just before (or after) storing a game on disk.

Kempelen · Post by **Kempelen** » Sun Jan 06, 2013 6:23 pm

Rebel wrote:I like to present a page for starters how to test a chess engine with limited hardware. I am interested in some feedback for further improvement.

http://www.top-5000.nl/tuning.htm

Hi Ed,

Very valuable an nice insides in your page. Thanks you very much.

Reading it without going into deep think, I came with a few questions for you?

a) At the end of the page you provide a set of pgns that all sum 6060 different possitions. I suppose you dont play all positions against all opponents and two side, beacuase it will be, assuming 6 opponent, 72720 games, and that are a lot which I suspect you dont play them. So, do you choose them randomly?? or what exactly are you doing?

b) You mention the disadventage of playing eng-eng matches, being you loose "control" over the style of the program. This is one aspect of my engine I have always missed and would like to improve. One way is "manally one by one" see games and tune by hand, but do you think is there a mix method, or an authomatic one that could take also eng-eng result to improve result and style in the same time? I said this because I suspect only hand tune is not enought to make a reasonable strong engine even if it has a fantastic style.

Regards
Fermin

S.Taylor · Post by **S.Taylor** » Sun Jan 06, 2013 6:49 pm

[quote="Rebel"]I like to present a page for starters how to test a chess engine with limited hardware. I am interested in some feedback for further improvement.

Use 2 old XT's to test with.

Those were good old sturdy machines. Not much could go wrong with them.

A poor man's testing environment

Re: A poor man's testing environment

Re: A poor man's testing environment

Re: A poor man's testing environment

Re: A poor man's testing environment

Re: A poor man's testing environment

Re: A poor man's testing environment

Re: A poor man's testing environment

Re: A poor man's testing environment

Re: A poor man's testing environment

Re: A poor man's testing environment