Linux/Mac engine gauntlet, cluster testing

matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Linux/Mac engine gauntlet, cluster testing

Post by matthewlai »

Just thought some people may be interested in this -

I am currently compiling a gauntlet of open source Linux/OSX chess engines for testing.

It's still a work in progress and doesn't work yet, but it's available here - http://bitbucket.matthewlai.ca/chessgauntlet

All engines in the gauntlet have licenses that allow distribution of modified copies (I believe all of them are GPL so far, besides Crafty?). I have about 30 engines in there so far, of varying strengths, from Stockfish down to engines rated around 2000 Elo.

They are also all in C/C++, mostly because I don't know other languages well enough to fix other people's code (and some of them... are really quite atrocious).

I originally planned to include <2000 engines as well, but had to give up since code quality is actually pretty well correlated with engine strength in most cases (many exceptions of course), and most <2000 engines just take way too long to fix (and I imagine would be more buggy as well).

I fixed the engines so that they all compile under latest LLVM (and GCC later - though I suspect there won't be much work for that, since most engines were written for GCC).

I also have a build script that builds all the engines from source.

I am not really interested in supporting Windows at this time, because it's a lot of work, and my end goal is to do cluster testing with this gauntlet, and of course no cluster runs Windows.

I am also writing a text mode program that matches engines without depending on any external library (important for clusters... especially other people's clusters).

Has anyone done something similar to this before?

Also, does anyone have experience with using clusters for testing? I know Crafty is tested on Dr. Hyatt's cluster, but I'm surprised clusters aren't used for more engines, since they are relatively widely available now - many universities have them, and there are commercial HPC services as well, like Penguin Computing.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Linux/Mac engine gauntlet, cluster testing

Post by bob »

I test on linux clusters myself. Some notes, in no particular order.

1. Engine quality is a major issue. Yes, there are lots of so-called "linux engines", but many of them have bugs, and the worst offenders lose way too many games on time, which is not good for testing at all.

2. I wrote my own "matchmaker" that plays games from a known starting position, maintains the clocks, and keeps track of wins/losses/draws/stalemates/repetitions/50-move/etc. It writes PGN for each game so that BayesElo can give me results.

3. I use BayesElo to suck in all the PGN files and produce Elo listings that let me evaluate results. It works, and needs no changes to work either.
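
For readers unfamiliar with how a match score turns into a rating gap, here is a minimal sketch using the plain logistic Elo formula. It is not BayesElo's own model, which also accounts for draws and the advantage of the white pieces; the function name `elo_diff` is made up for illustration.

```python
import math


def elo_diff(wins: int, losses: int, draws: int) -> float:
    """Elo difference implied by a match score under the logistic Elo model."""
    games = wins + losses + draws
    if games == 0:
        raise ValueError("no games played")
    # A draw counts as half a point.
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        raise ValueError("Elo difference is unbounded for a 0% or 100% score")
    # Invert expected_score = 1 / (1 + 10 ** (-diff / 400)).
    return -400.0 * math.log10(1.0 / score - 1.0)
```

A 50% score maps to a difference of zero, and the result is antisymmetric when wins and losses are swapped.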

I'd certainly be interested in any reasonably strong engines you discover that are actually reliable, particularly at faster time controls like 10 sec + 0.1 s increment, which cause many programs massive problems.
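
The PGN-writing step of a matchmaker like the one in point 2 can be sketched roughly as follows. This is not bob's referee code; `format_pgn` and its default tag values are hypothetical, and the sketch assumes the game starts from the standard position with White to move (games from other starting positions would also need `SetUp` and `FEN` tags).

```python
def format_pgn(white: str, black: str, result: str, san_moves: list[str],
               event: str = "Gauntlet", site: str = "?",
               date: str = "????.??.??", round_: str = "?") -> str:
    """Return one game as PGN text: Seven Tag Roster, blank line, movetext."""
    if result not in ("1-0", "0-1", "1/2-1/2", "*"):
        raise ValueError(f"invalid PGN result: {result!r}")
    # Tag values are assumed to contain no double quotes or backslashes.
    tags = [("Event", event), ("Site", site), ("Date", date),
            ("Round", round_), ("White", white), ("Black", black),
            ("Result", result)]
    header = "\n".join(f'[{name} "{value}"]' for name, value in tags)
    tokens = []
    for ply, move in enumerate(san_moves):
        if ply % 2 == 0:
            # A move number precedes every White move.
            tokens.append(f"{ply // 2 + 1}.")
        tokens.append(move)
    # The movetext ends with the same result token as the Result tag.
    tokens.append(result)
    return header + "\n\n" + " ".join(tokens) + "\n"
```

Concatenating the returned strings for many games gives a single file that a rating tool can read.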
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Linux/Mac engine gauntlet, cluster testing

Post by matthewlai »

Thanks!

Engine quality is my biggest worry as well. Initially I thought most engines would do fine at very fast time controls, because I assumed that's how most people tuned their eval params; most people don't have a few hundred CPUs at their disposal. That doesn't seem to be the case. No idea how they tuned their engines...

When I tuned my own engine (back in 2008 - I am just picking up computer chess again now, with much better programming skills), I ran hundreds of thousands of 1s games, so I know my engine works well at 1s. It's definitely not "reasonably strong" compared to Crafty, though, so probably not very useful for you. Hopefully it will be one day!

I used xboard back then, but had to remove all the drawing code (there was no "-noGUI" back then), because the drawing code was using more CPU time than the engines!

The tool I am writing for matching engines is exactly like what you described. It was easier than I imagined to support proper "protover 2" engines. It can already match Crafty vs Crafty, and I have all the move generation and stuff done so it can translate between algebraic and SAN as well, with proper disambiguation. I imagine most of the remaining work would be workarounds for buggy engines. I guess UCI engines would be easier to work with since that would shift many bugs to be Polyglot's problem.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Linux/Mac engine gauntlet, cluster testing

Post by bob »

BTW, I have ALWAYS been a SAN type of guy, since it is not that hard to produce. But there are a number of good programs that will not touch it. Fortunately, since I wrote my own referee program borrowing heavily from Crafty, it was easy enough to read either SAN or algebraic, and send algebraic-only to the engines that require it.

Pain in the butt, however. And you can NOT trust draw/win claims from anybody. I made my referee automatically terminate games on 3-fold rep, stalemate, 50-move draws and checkmates. I found engines that would claim wins for the wrong side, etc.
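
A minimal sketch of a referee that declares repetition and 50-move draws itself instead of trusting engine claims. It is an illustration, not the Crafty-derived referee described above: the class name `RuleReferee` is hypothetical, and it assumes the caller supplies a FEN string for the position after every move. Checkmate and stalemate need move generation and are left out; a real referee would test for checkmate before applying these draw rules.

```python
from collections import Counter
from typing import Optional


class RuleReferee:
    """Declares rule-based draws from the positions seen so far."""

    def __init__(self, move_limit: int = 50):
        self.move_limit = move_limit
        self.seen = Counter()

    def record(self, fen: str) -> Optional[str]:
        """Record the position after a move; return a draw reason or None."""
        fields = fen.split()
        # Repetition compares piece placement, side to move, castling rights
        # and the en passant square. The two counters at the end are ignored.
        # Caveat: a FEN that sets the en passant square when no capture is
        # possible can make an otherwise identical position look different.
        key = " ".join(fields[:4])
        self.seen[key] += 1
        if self.seen[key] >= 3:
            return "threefold repetition"
        # The halfmove clock counts plies since the last capture or pawn move.
        if int(fields[4]) >= 2 * self.move_limit:
            return f"{self.move_limit}-move rule"
        return None
```

The `move_limit` parameter is there so that the same check can be run with a shorter limit than 50 moves.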
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Linux/Mac engine gauntlet, cluster testing

Post by matthewlai »

bob wrote:BTW, I have ALWAYS been a SAN type of guy, since it is not that hard to produce. But there are a number of good programs that will not touch it. Fortunately since I wrote my own referee program borrowing heavily from Crafty, It was easy enough to read either SAN or algebraic, and send algebraic-only to the engines that require it.
Yeah I was hoping to not have to write move generation code, but it wasn't really all that bad. Move generation is easy when speed is not a problem.
bob wrote:Pain in the butt, however. And you can NOT trust draw/win claims from anybody. I made my referee automatically terminate games on 3-fold rep, stalemate, 50-move draws and checkmates. I found engines that would claim wins for the wrong side, etc.
That's a good idea. I will keep that in mind.

I am planning to terminate games even before the 50-move rule kicks in, maybe at 20 moves or so, since by then most games are just wasting CPU cycles. There will be some false draws, but I'm hoping that will be offset by being able to play more useful games.

For testing I will probably also terminate games when both sides agree one side is 200cp up or so. Again, there will be some false wins, but hopefully few enough that the effect will be negligible, and it should save a lot of time.
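
The score-based rule can be sketched as below. This is only an illustration of the idea; the function name `adjudicate_by_score` is hypothetical, and it assumes each engine reports its evaluation in centipawns from its own point of view (positive means that engine thinks it is ahead).

```python
from typing import Optional


def adjudicate_by_score(white_cp: int, black_cp: int,
                        threshold_cp: int = 200) -> Optional[str]:
    """Return "1-0" or "0-1" if both engines agree on who is winning."""
    # White is adjudicated the winner only if White claims at least the
    # threshold AND Black concedes at least the same deficit.
    if white_cp >= threshold_cp and black_cp <= -threshold_cp:
        return "1-0"
    if black_cp >= threshold_cp and white_cp <= -threshold_cp:
        return "0-1"
    # The engines disagree, or the advantage is too small: keep playing.
    return None
```

Requiring the agreement to hold for several consecutive moves before stopping the game would be one way to cut down on false wins, at the cost of slightly longer games.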
lucasart
Posts: 3241
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Linux/Mac engine gauntlet, cluster testing

Post by lucasart »

bob wrote: I'd certainly be interested in any reasonably strong engines you discover that are actually reliable. Particularly at faster time controls like 10sec + 0.1s increment, which causes many programs massive problems..
On Linux, using cutechess-cli, I typically run tests at 1+0.01 with DiscoCheck, under heavy parallelism constraints: concurrency=7 on a hyperthreaded quad-core i7-3770K, while browsing the internet and doing other light activities. Zero time losses, using a time buffer of 10ms.
You need to use the latest cutechess compiled from source; it has a submillisecond timer now. DiscoCheck also uses a submillisecond timer.

I don't know if DiscoCheck qualifies as "reasonably strong". Depends on your reference point. It's certainly strong enough for Crafty.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
Max
Posts: 247
Joined: Tue Apr 13, 2010 10:41 am

Re: Linux/Mac engine gauntlet, cluster testing

Post by Max »

Hi,
matthewlai wrote:I am currently compiling a gauntlet of open source Linux/OSX chess engines for testing.
perhaps you could add some of these engines to your list

amy
discocheck
firenzina
fruit
fruit-reloaded
glaurung
minko
octochess
pepito
protector
redqueen
rodent
toga
vajolet

They compile & run on the Mac, but I'm unsure if they match your needs.

-Max
Hope we're not just the biological boot loader for digital super intelligence. Unfortunately, that is increasingly probable - Elon Musk
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Linux/Mac engine gauntlet, cluster testing

Post by matthewlai »

Thanks!

I have looked at some of them -
Amy doesn't seem to compile on 64-bit (in LLVM), and it didn't seem trivial to fix.

I didn't want to add Fruit and Toga for now because I already have GNU Chess 6 (which is based on Fruit), and I am aiming for a wider range of playing styles. Though Fruit Reloaded may be interesting.

Same for Glaurung (since I have Stockfish already).

Pepito's code looks very old (GCC 2.x era), and doesn't look like it would compile without many changes.

RedQueen didn't have any build instructions or Makefile, and just compiling all the source files didn't seem to work. I didn't spend too much time on this.

Will definitely look into the others when I have time to add more engines!
ZirconiumX
Posts: 1350
Joined: Sun Jul 17, 2011 11:14 am
Full name: Hannah Ravensloft

Re: Linux/Mac engine gauntlet, cluster testing

Post by ZirconiumX »

Michel van den Bergh made a GNUChess 5 fork, which, if I remember correctly, is actually stronger than 6 (because the FSF does not seem to know how to improve a chess program's strength).

http://hardy.uhasselt.be/Toga/gnuchess-release/

Matthew:out
tu ne cede malis, sed contra audentior ito
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Linux/Mac engine gauntlet, cluster testing

Post by matthewlai »

Yes. That's why I have both GNU Chess 5 and GNU Chess 6 in the gauntlet.

GNU Chess 6 was actually a rewrite by Fabien, based on Fruit 2.1.