Linux/Mac engine gauntlet, cluster testing

matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Linux/Mac engine gauntlet, cluster testing

Post by matthewlai »

bob wrote: Note you can also cause Crafty to produce "long algebraic" (i.e. e7e5 or Ng1f3). Crafty will also accept such moves, so you can test this in both directions to be sure you are producing and parsing both correctly.

The command is "output long".

Yeah, that's why I tested Crafty against Phalanx, since Phalanx is algebraic-only. My code was doing the translation both ways, so I'm pretty confident it works now.

My code always parses each move, tests its legality, then re-generates it in the other engine's preferred format, just in case some engines add weird symbols that other engines don't like.
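
To illustrate the "strip the weird symbols" part, here is a minimal Python sketch. It is not code from the matching program: the function name and the regular expression are assumptions made for this example. It only handles coordinate-style moves such as e7e5 or Ng1f3, does not handle castling or short algebraic moves like Nf3, and does not test legality, which needs a full board representation.

```python
import re

# Hypothetical helper for illustration only; not taken from the matching program.
# Accepts coordinate-style moves such as "e7e5", "Ng1f3", "e2-e4", "e7e8=Q+".
_MOVE_RE = re.compile(
    r"^[KQRBN]?"          # optional piece letter (long algebraic)
    r"([a-h][1-8])"       # origin square
    r"[-x:]?"             # optional separator or capture mark
    r"([a-h][1-8])"       # destination square
    r"=?([QRBNqrbn])?"    # optional promotion piece
    r"[+#!?]*$"           # trailing check, mate, or annotation symbols
)


def to_coordinate(move: str) -> str:
    """Reduce a decorated coordinate-style move to plain 'e7e5' form."""
    match = _MOVE_RE.match(move.strip())
    if match is None:
        raise ValueError(f"not a coordinate-style move: {move!r}")
    origin, dest, promo = match.groups()
    return origin + dest + (promo.lower() if promo else "")


if __name__ == "__main__":
    for raw in ["e7e5", "Ng1f3", "g1xf3!?", "e7e8=Q+"]:
        print(raw, "->", to_coordinate(raw))
```

Anything the pattern does not recognise raises ValueError, so an unexpected format shows up as an error rather than being passed on to the other engine.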
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.

Post by matthewlai »

A few more engines have been tested. Some of them required minor modifications to the matching program, but more and more engines are working out of the box now.

I am starting with stronger engines because they seem to be less buggy as well. I am mostly just using them to iron out bugs in my matching program before I move on to buggier engines (though I have already discovered quite a few bugs - and reported them to the authors).

So far I have been testing with "0 0:01 0.05" games.
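
For readers unfamiliar with the notation, the sketch below assumes the string follows the xboard "level" convention (moves per session, base time as minutes or minutes:seconds, increment in seconds). Under that assumption "0 0:01 0.05" means one second for the whole game plus 0.05 seconds per move. The class and function names are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class TimeControl:
    moves_per_session: int    # 0 means the base time covers the whole game
    base_seconds: float
    increment_seconds: float


def parse_level(spec: str) -> TimeControl:
    """Parse an 'MPS BASE INC' string, assuming xboard 'level' semantics."""
    mps, base, inc = spec.split()
    if ":" in base:
        minutes, seconds = base.split(":")
        base_seconds = int(minutes) * 60 + int(seconds)
    else:
        base_seconds = int(base) * 60   # a bare number is read as minutes
    return TimeControl(int(mps), float(base_seconds), float(inc))


if __name__ == "__main__":
    print(parse_level("0 0:01 0.05"))
```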

Crafty, GNU Chess 6, and the new Phalanx work perfectly, except that Phalanx loses on time in about 1% of games (the very long ones). I think that's acceptable.

Arasan seems to work perfectly now, but used to have what seemed like a race condition somewhere with very short games.

Micromax seems to be quite buggy in fast games, especially with custom start positions (my matching program uses setboard if it's reported in features, otherwise edit).
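
The setboard-or-edit choice can be sketched roughly as follows. This is an illustration in Python, not the matching program's code. It assumes the engine sends xboard-protocol "feature" lines of the form name=value, with string values in double quotes; the function names are hypothetical.

```python
import shlex


def parse_features(line: str) -> dict:
    """Parse one xboard-protocol 'feature' line into a name -> value dict."""
    tokens = shlex.split(line)          # handles quoted values like myname="..."
    if not tokens or tokens[0] != "feature":
        return {}
    features = {}
    for token in tokens[1:]:
        name, sep, value = token.partition("=")
        if sep:
            features[name] = value
    return features


def position_command(features: dict) -> str:
    # Hypothetical policy mirroring the post: use setboard only when the
    # engine advertises it, otherwise fall back to the older edit command.
    return "setboard" if features.get("setboard") == "1" else "edit"


if __name__ == "__main__":
    feats = parse_features('feature setboard=1 myname="Crafty 24.0" done=1')
    print(feats, position_command(feats))
```

An engine that never mentions setboard ends up on the edit path, which matches the fallback described above.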

Kiwi resigns all the time, and I'm not sure what's going on. It looks like it doesn't really support reuse, but it doesn't say so in its feature command.

I'm hoping to go through all 30 or so of my engines in the next few days, take out the buggy ones, maybe add a few more (suggested in this thread), and officially release the gauntlet.

Post by matthewlai »

All the engines have now been tested. A few engines have been removed for instability, and a few more removed because I couldn't get them to compile after a quick attempt to fix them.

The gauntlet is available here -
http://bitbucket.matthewlai.ca/chessgauntlet/overview

In an ideal world, you would just need to download the repository (from http://bitbucket.matthewlai.ca/chessgauntlet/downloads, or by using hg) and run ./build_all.sh.

It will give you a bunch of engines under "engines/[engine name]".

I tested it under OS X (LLVM) and Ubuntu Server (GCC 4.8).

All the engines have been tested with at least a few hundred games, with no illegal moves, and no obvious problems (there was one engine that just kept resigning early and often - it was removed).

Special thanks to Daniel (EXchess) and Dusan (Phalanx) for being super responsive at fixing bugs! Both engines are reliable now as far as I can tell.

Here is the list of engines in the gauntlet right now -
Stockfish (commit 1b69910865)
RobboLito 0.085
Texel 1.4
Senpai 1.0
EXchess 7.31 beta
GNU Chess 6.1.2
Scorpio (commit bd5633bb35)
Cheng 0.36c
Sloppy (commit feaa8122169a91a8ae926ff62563244ab9d58f74)
GNU Chess 5.50
Greko 12.0
Arasan 17.2
Diablo 0.5.1
Crafty 24.0
Betsabe II 1.30
Phalanx XXIII (SVN r79)

I will be adding more engines (like those mentioned in this thread already) when I get time. I am looking for weaker but reliable engines in particular. They are pretty hard to come by.

cmatch, in the repository, is my custom matching program. It is optimized for very fast games and does not depend on any library. It's very nearly ready for release and works well with all the engines in the gauntlet, but I'll make a separate announcement for it and will probably split it out of this repository. Feel free to give it a try, though.

Example command, from /cmatch/src -
make
./cmatch ../../engines/crafty ../../engines/stockfish -n 100 -tc "0 0:1 0.02" -pgnout result.pgn -startpos start_positions.fen
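
If you want to run one engine against several opponents, a small wrapper can generate one such command per pairing. The Python sketch below is only an illustration: the flags (-n, -tc, -pgnout, -startpos) are copied from the example command above, while the function names, the per-pairing PGN file names, and the "texel" directory name are assumptions. It only builds the argument lists and does not launch anything.

```python
def cmatch_command(engine_a, engine_b, games=100, tc="0 0:1 0.02",
                   pgn_out="result.pgn", start_positions="start_positions.fen"):
    """Build an argument list shaped like the example command above."""
    return ["./cmatch", engine_a, engine_b,
            "-n", str(games), "-tc", tc,
            "-pgnout", pgn_out, "-startpos", start_positions]


def gauntlet_commands(candidate, opponents, engines_dir="../../engines"):
    """One command per opponent, each writing to its own (hypothetical) PGN file."""
    return [
        cmatch_command(f"{engines_dir}/{candidate}", f"{engines_dir}/{opp}",
                       pgn_out=f"{candidate}_vs_{opp}.pgn")
        for opp in opponents
    ]


if __name__ == "__main__":
    # "texel" is an assumed directory name, used here only as an example.
    for argv in gauntlet_commands("crafty", ["stockfish", "texel"]):
        print(" ".join(argv))
```

Each list can be passed to subprocess.run(argv, cwd=...) from the cmatch/src directory if you prefer to launch the matches from Python.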

Thanks