Linux/Mac engine gauntlet, cluster testing

matthewlai · Post by **matthewlai** » Mon Aug 11, 2014 9:58 pm

Just thought some people may be interested in this -

I am currently compiling a gauntlet of open source Linux/OSX chess engines for testing.

It's still a work in progress and doesn't work yet, but it's available here: http://bitbucket.matthewlai.ca/chessgauntlet

All engines in the gauntlet have licenses that allow distribution of modified copies (I believe all of them are GPL so far, besides Crafty?). I have about 30 engines in there so far, of varying strengths, from Stockfish down to engines around 2000 Elo.

They are also all in C/C++, mostly because I don't know other languages well enough to fix other people's code (and some of them... are really quite atrocious).

I originally planned to include <2000 engines as well, but had to give up since code quality is actually pretty well correlated with engine strength in most cases (many exceptions of course), and most <2000 engines just take way too long to fix (and I imagine would be more buggy as well).

I fixed the engines so that they all compile under the latest LLVM (GCC will come later, though I suspect there won't be much work for that, since most engines were written for GCC).

I also have a build script that builds all the engines from source.

I am not really interested in supporting Windows at this time, because it's a lot of work, and my end goal is to do cluster testing with this gauntlet, and of course no cluster runs Windows.

I am also writing a text mode program that matches engines without depending on any external library (important for clusters... especially other people's clusters).

Has anyone done something similar to this before?

Also, does anyone have experience with using clusters for testing? I know Crafty is tested on Dr. Hyatt's cluster, but I'm surprised clusters aren't used for more engines, since they are relatively widely available now: many universities have them, and there are commercial HPC services as well, like Penguin Computing.

bob · Post by **bob** » Tue Aug 12, 2014 12:19 am

I test on linux clusters myself. Some notes, in no particular order.

1. Engine quality is a major issue. Yes there are lots of so-called "linux engines" but they have many bugs, particularly those that lose way too many games on time, which is not good for testing at all.

2. I wrote my own "matchmaker" that plays games from a known starting position, maintains the clocks, and keeps track of wins/losses/draws/stalemates/repetitions/50-move/etc. It writes PGN for each game so that BayesElo can give me results.

3. I use BayesElo to suck in all the PGN files and produce Elo listings that let me evaluate results. It works, and needs no changes to work either.

I'd certainly be interested in any reasonably strong engines you discover that are actually reliable. Particularly at faster time controls like 10sec + 0.1s increment, which cause many programs massive problems.
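For a quick sanity check on a single match, the win/loss/draw tally can be turned into an Elo difference with the standard logistic formula. This is only a rough sketch: it is not what BayesElo does internally (BayesElo fits a model over all the games), and the function name `elo_diff` is made up for illustration.

```python
import math


def elo_diff(wins: int, losses: int, draws: int) -> float:
    """Elo difference implied by a match score (logistic model).

    A positive result means the side whose wins are counted is stronger.
    """
    games = wins + losses + draws
    if games == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        # The logistic formula diverges at 0% and 100%.
        raise ValueError("Elo difference is unbounded for a 0% or 100% score")
    return -400.0 * math.log10(1.0 / score - 1.0)
```

A 75% score comes out at roughly +191 Elo. The estimate carries no error bars, which is one reason to feed the PGN to a proper rating tool instead.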

matthewlai · Post by **matthewlai** » Tue Aug 12, 2014 12:34 am

Thanks!

Engine quality is my biggest worry as well. Initially I thought most engines would do fine at very fast time controls since I imagined that's how most people would have tuned their eval params, since most people don't have a few hundred CPUs at their disposal. That didn't seem to be the case. No idea how they tuned their engines...

When I tuned my own engine (back in 2008 - I am just picking up computer chess again now, with much higher programming skills), I ran hundreds of thousands of 1s games, so I know my engine works well at 1s. It's definitely not "reasonably strong" compared to Crafty, though, so probably not very useful for you. Hopefully it will be one day!

I used xboard back then, but had to remove all the drawing code (there was no "-noGUI" back then), because the drawing code was using more CPU time than the engines!

The tool I am writing for matching engines is exactly like what you described. It was easier than I imagined to support proper "protover 2" engines. It can already match Crafty vs Crafty, and I have all the move generation and stuff done so it can translate between algebraic and SAN as well, with proper disambiguation. I imagine most of the remaining work would be workarounds for buggy engines. I guess UCI engines would be easier to work with since that would shift many bugs to be Polyglot's problem.
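To illustrate the disambiguation step mentioned above, here is a minimal Python sketch of the SAN rule for non-pawn piece moves: use the origin file if that is enough to tell the pieces apart, otherwise the rank, otherwise the full square. The function names and the `rivals` parameter are hypothetical. The caller is assumed to have already used its move generator to find the other pieces of the same type that can legally reach the target square. Pawn moves, castling, promotions and check suffixes are left out.

```python
from typing import Sequence


def san_disambiguation(from_sq: str, rivals: Sequence[str]) -> str:
    """Return the SAN disambiguation string for a piece moving from from_sq.

    rivals holds the origin squares (e.g. "h1") of other pieces of the same
    type and colour that can legally move to the same target square.
    """
    if not rivals:
        return ""
    file_, rank = from_sq[0], from_sq[1]
    if all(r[0] != file_ for r in rivals):
        return file_      # origin file alone is unique
    if all(r[1] != rank for r in rivals):
        return rank       # origin rank alone is unique
    return from_sq        # need both file and rank


def piece_move_to_san(piece: str, from_sq: str, to_sq: str,
                      capture: bool, rivals: Sequence[str] = ()) -> str:
    """Build SAN for a non-pawn move, without check or mate suffixes."""
    return (piece + san_disambiguation(from_sq, rivals)
            + ("x" if capture else "") + to_sq)
```

For example, with rooks on a1 and h1 the move a1d1 becomes `Rad1`, and with rooks on a1 and a5 the move a1a3 becomes `R1a3`.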

bob · Post by **bob** » Tue Aug 12, 2014 1:54 am

BTW, I have ALWAYS been a SAN type of guy, since it is not that hard to produce. But there are a number of good programs that will not touch it. Fortunately, since I wrote my own referee program borrowing heavily from Crafty, it was easy enough to read either SAN or algebraic, and send algebraic-only to the engines that require it.

Pain in the butt, however. And you can NOT trust draw/win claims from anybody. I made my referee automatically terminate games on 3-fold rep, stalemate, 50-move draws and checkmates. I found engines that would claim wins for the wrong side, etc.
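A referee that does not trust engine claims has to track repetitions and the 50-move counter itself. The following Python sketch shows one way to do that bookkeeping. It is not the actual referee code; the class name and the `position_key` argument are assumptions. The key is assumed to identify a position fully (for instance piece placement, side to move, castling rights and en passant square). Checkmate and stalemate detection need a move generator and are not shown.

```python
from collections import Counter
from typing import Optional


class DrawReferee:
    """Tracks 3-fold repetition and the 50-move rule independently of the engines."""

    def __init__(self) -> None:
        self.position_counts: Counter = Counter()
        self.halfmove_clock = 0  # plies since the last capture or pawn move

    def record(self, position_key: str, irreversible: bool) -> Optional[str]:
        """Register the position reached after a move.

        irreversible is True for captures and pawn moves. Returns the reason
        the game is drawn, or None if play continues.
        """
        if irreversible:
            # Earlier positions can never recur after a capture or pawn move.
            self.halfmove_clock = 0
            self.position_counts.clear()
        else:
            self.halfmove_clock += 1
        self.position_counts[position_key] += 1
        if self.position_counts[position_key] >= 3:
            return "3-fold repetition"
        if self.halfmove_clock >= 100:  # 50 moves by each side
            return "50-move rule"
        return None
```

The referee would call `record` after every move it relays, and check for mate first, since a checkmate delivered on the same move takes precedence over the 50-move draw.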

matthewlai · Post by **matthewlai** » Tue Aug 12, 2014 2:00 am

bob wrote: BTW, I have ALWAYS been a SAN type of guy, since it is not that hard to produce. But there are a number of good programs that will not touch it. Fortunately, since I wrote my own referee program borrowing heavily from Crafty, it was easy enough to read either SAN or algebraic, and send algebraic-only to the engines that require it.

Yeah I was hoping to not have to write move generation code, but it wasn't really all that bad. Move generation is easy when speed is not a problem.

Pain in the butt, however. And you can NOT trust draw/win claims from anybody. I made my referee automatically terminate games on 3-fold rep, stalemate, 50-move draws and checkmates. I found engines that would claim wins for the wrong side, etc.

That's a good idea. I will keep that in mind.

I am planning to terminate games even before the 50-move limit, maybe at 20 moves or so, since most games are just wasting CPU cycles by then. There will be some false draws, but I'm hoping that will be offset by being able to play more useful games.

For testing I will probably also terminate games when both sides agree one side is 200cp up or so. Again, some false wins, but hopefully few enough that the effect will be negligible, and save a lot of time.
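The score-agreement rule could look something like the Python sketch below. It assumes each engine reports its evaluation in centipawns from its own point of view, so "both sides agree" means one score is at or above the threshold and the other is at or below its negative. The function name and parameter names are made up for illustration.

```python
from typing import Optional


def adjudicate(white_score_cp: int, black_score_cp: int,
               threshold_cp: int = 200) -> Optional[str]:
    """Return a PGN result if both engines agree on a decisive advantage.

    Each score is assumed to be from that engine's own perspective:
    positive means the engine thinks it is ahead.
    """
    if white_score_cp >= threshold_cp and black_score_cp <= -threshold_cp:
        return "1-0"
    if black_score_cp >= threshold_cp and white_score_cp <= -threshold_cp:
        return "0-1"
    return None  # no agreement, keep playing
```

Requiring the agreement to hold for several consecutive moves would be a possible refinement to cut down on false wins, at the cost of slightly longer games.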

lucasart · Post by **lucasart** » Tue Aug 12, 2014 3:48 am

bob wrote: I'd certainly be interested in any reasonably strong engines you discover that are actually reliable. Particularly at faster time controls like 10sec + 0.1s increment, which cause many programs massive problems.

On Linux, using cutechess-cli, I typically run tests at 1+0.01 with DiscoCheck, under heavy parallelism constraints: concurrency=7 on a hyperthreaded quad-core i7-3770k, while browsing the internet and doing other light activities. Zero time losses, using a time buffer of 10ms.
You need to use the latest cutechess compiled from source; it has a submillisecond timer now. DiscoCheck also uses a submillisecond timer.
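To show why timer resolution matters at a 10ms increment, here is a rough Python sketch of per-side clock bookkeeping in nanoseconds. It is not cutechess or DiscoCheck code. The class name, the `margin_ms` tolerance and the `timed` helper are all assumptions, and the margin may not correspond exactly to the time buffer mentioned above.

```python
import time


class SideClock:
    """One side's clock, kept in integer nanoseconds to avoid rounding loss."""

    def __init__(self, base_ms: float, increment_ms: float,
                 margin_ms: float = 0.0) -> None:
        self.remaining_ns = int(base_ms * 1_000_000)
        self.increment_ns = int(increment_ms * 1_000_000)
        self.margin_ns = int(margin_ms * 1_000_000)  # hypothetical tolerance

    def charge(self, elapsed_ns: int) -> bool:
        """Deduct one move's thinking time; return False on a time loss."""
        self.remaining_ns -= elapsed_ns
        if self.remaining_ns + self.margin_ns < 0:
            return False
        self.remaining_ns += self.increment_ns
        return True


def timed(fn):
    """Run fn and return (result, elapsed nanoseconds) from a monotonic timer."""
    start = time.perf_counter_ns()
    result = fn()
    return result, time.perf_counter_ns() - start
```

With a millisecond-resolution timer, each measurement can be off by up to a millisecond, which is a tenth of the whole increment at 1+0.01. That error accumulates over a game, so a finer timer leaves less room for spurious time losses.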

I don't know if DiscoCheck qualifies as "reasonably strong". Depends on your reference point. It's certainly strong enough for Crafty.

Max · Post by **Max** » Tue Aug 12, 2014 9:58 am

Hi,

matthewlai wrote:I am currently compiling a gauntlet of open source Linux/OSX chess engines for testing.

perhaps you could add some of these engines to your list

amy
discocheck
firenzina
fruit
fruit-reloaded
glaurung
minko
octochess
pepito
protector
redqueen
rodent
toga
vajolet

They compile & run on the mac .. unsure if they match your needs.

-Max

matthewlai · Post by **matthewlai** » Tue Aug 12, 2014 10:24 am

Thanks!

I have looked at some of them -
Amy doesn't seem to compile on 64-bit (in LLVM), and it didn't seem trivial to fix.

I didn't want to add Fruit and Toga for now because I already have GNU Chess 6 (which is based on Fruit), and I am aiming for a wider range of playing styles. Though Fruit Reloaded may be interesting.

Same for Glaurung (since I have Stockfish already).

Pepito's code looks very old (GCC 2.x era), and didn't look like it would compile without many changes.

RedQueen didn't have any build instructions or Makefile, and just compiling all the source files didn't seem to work. I didn't spend too much time on this.

Will definitely look into the others when I have time to add more engines!

ZirconiumX · Post by **ZirconiumX** » Tue Aug 12, 2014 10:34 am

Michel van den Bergh made a GNUChess 5 fork, which if I remember correctly, is actually stronger than 6 (because the FSF does not seem to know how to improve a chess program's strength).

http://hardy.uhasselt.be/Toga/gnuchess-release/

Matthew:out

matthewlai · Post by **matthewlai** » Tue Aug 12, 2014 10:37 am

Yes. That's why I have both GNU Chess 5 and GNU Chess 6 in the gauntlet.

GNU Chess 6 was actually a rewrite by Fabien, based on Fruit 2.1.
