Looking for automatic Engine Testing Software

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

OliverBr
Posts: 334
Joined: Tue Dec 18, 2007 8:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Looking for automatic Engine Testing Software

Post by OliverBr » Mon Jul 27, 2020 4:16 pm

Hello everyone,

I want to add support for "test suites": collections of FEN positions annotated with a best move etc., against which the engine can be tested. I have some questions:

1) Not only the "if", but also the "when" is very important when a position is tested. How is this typically done? By measuring the time or nodes? There must be a limit after which the engine should give up?

2) I know of "bm" (Best Move) and "am" (Probably Anti Move). Can they be combined? Is there another option?

3) It looks as if the "full move" number and the "50-move-plies" counter are always omitted. But they may play an important role in the analysis of a position?!

4) Such Test suites are often called ".epd", while the notation "Extended Position Description" is something different. Am I missing something?

Thank you very much for information about this matter!
Chess Engine OliThink: http://brausch.org/home/chess

brianr
Posts: 422
Joined: Thu Mar 09, 2006 2:01 pm

Re: Looking for automatic Engine Testing Software

Post by brianr » Mon Jul 27, 2020 5:30 pm

Test suites are a mixed bag.
At best a coarse measure of engine strength; at worst, almost meaningless.

If you really want to add this capability, again, Crafty source has it all.
bm, am, report after x plies or time, held position for y plies or time.
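As a rough illustration of the "found and held" idea, here is a minimal Python sketch. It is not taken from Crafty; the function name `solution_time` and the shape of the iteration log (pairs of elapsed seconds and the move preferred at that point) are assumptions made for the example. It also assumes the moves have already been converted to the same notation as the expected move.

```python
from typing import Callable, Iterable, Optional, Tuple


def solution_time(iterations: Iterable[Tuple[float, str]],
                  is_correct: Callable[[str], bool]) -> Optional[float]:
    """Return the time at which the engine settled on a correct move.

    `iterations` is a hypothetical log of (elapsed_seconds, preferred_move),
    one entry per completed search iteration, in chronological order.
    Returns None if the final preferred move is not correct.
    """
    solved_at = None
    for elapsed, move in iterations:
        if is_correct(move):
            if solved_at is None:
                solved_at = elapsed  # start of the current correct streak
        else:
            solved_at = None  # the engine changed its mind; streak is broken
    return solved_at


# Made-up log: the move is found early, dropped, then found again and held.
log = [(0.1, "e2e4"), (0.4, "g5g7"), (0.9, "d1h5"), (2.0, "g5g7"), (5.0, "g5g7")]
print(solution_time(log, lambda m: m == "g5g7"))
```

Only the last unbroken streak counts here, so a move that is found and then abandoned does not register as solved at the earlier time.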

Some GUIs will do it for you if your engine outputs relatively "standard" info.
Some only work with UCI format output.

EPD I think is an extension of FEN with optionally defined fields that accommodate test suites, but not sure about any actual standards.
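To make the layout concrete, here is a minimal Python sketch that splits one EPD record into its four position fields and its operations. The function names `parse_epd` and `_store` are invented for the example, and the sample record is made up for illustration (it is not from any published suite). Operands are kept as plain strings rather than being validated.

```python
def _store(operations, text):
    """Store one 'opcode operand...' chunk in the operations dict."""
    text = text.strip()
    if not text:
        return
    opcode, _, operand = text.partition(" ")
    # Drop the surrounding quotes of string operands such as id "..."
    operations[opcode] = operand.strip().strip('"')


def parse_epd(line):
    """Split an EPD record into (position_fields, operations)."""
    fields = line.strip().split(None, 4)
    if len(fields) < 4:
        raise ValueError("an EPD record needs four position fields")
    position = fields[:4]  # placement, side to move, castling, en passant
    ops_text = fields[4] if len(fields) > 4 else ""

    operations = {}
    current, in_quotes = [], False
    for ch in ops_text:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:
            _store(operations, "".join(current))  # end of one operation
            current = []
        else:
            current.append(ch)
    _store(operations, "".join(current))  # tolerate a missing final ';'
    return position, operations


# Made-up record for illustration only.
record = 'rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - bm e4; id "demo.1";'
position, operations = parse_epd(record)
print(position)
print(operations)
```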

Of course, many years ago I implemented most of the test suite stuff in Tinker until I saw how little it showed.
I also have about a dozen hard-coded test positions that I can manually run and watch the output to make sure I have not broken something. This works for my engine because I know how it tends to do with this very limited set of positions. These days I only do actual games to gauge strength.

Dann Corbit
Posts: 11221
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: Looking for automatic Engine Testing Software

Post by Dann Corbit » Mon Jul 27, 2020 5:54 pm

Ed and Chris made an interesting test suite designed to make a very quick Elo calculation.
You can read about it here:
https://www.rebel13.nl/download/speedy-rating-list.html
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.

OliverBr
Posts: 334
Joined: Tue Dec 18, 2007 8:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Looking for automatic Engine Testing Software

Post by OliverBr » Mon Jul 27, 2020 9:32 pm

This is an interesting win (alongside hundreds of defeats against a very strong engine). It's a textbook tactical position; I guess such a thing can only happen in blitz games.
32. Rxg7+ already wins.
Chess Engine OliThink: http://brausch.org/home/chess

jdart
Posts: 3952
Joined: Fri Mar 10, 2006 4:23 am
Location: http://www.arasanchess.org

Re: Looking for automatic Engine Testing Software

Post by jdart » Mon Jul 27, 2020 10:12 pm

I have a script for running test suites - it is for UCI engines but would work with Winboard engines + Polyglot. It handles "bm" and "am" and generates summary results:

https://github.com/jdart1/arasan-chess/ ... analyze.py

Also there are some test suites in the "tests" directory.

--Jon

abulmo2
Posts: 261
Joined: Fri Dec 16, 2016 10:04 am

Re: Looking for automatic Engine Testing Software

Post by abulmo2 » Tue Jul 28, 2020 11:32 am

brianr wrote:
Mon Jul 27, 2020 5:30 pm
EPD I think is an extension of FEN with optionally defined fields that accommodate test suites, but not sure about any actual standards.
The EPD fields are described in the PGN standard: https://www.thechessdrum.net/PGN_Reference.txt
Richard Delorme

abulmo2
Posts: 261
Joined: Fri Dec 16, 2016 10:04 am

Re: Looking for automatic Engine Testing Software

Post by abulmo2 » Tue Jul 28, 2020 11:42 am

OliverBr wrote:
Mon Jul 27, 2020 4:16 pm
Hello everyone,

I want to add support for "test suites": collections of FEN positions annotated with a best move etc., against which the engine can be tested. I have some questions:
Bundled with amoeba, epdtest is a program that runs a UCI chess engine against a test suite (in an EPD file). It is a command-line tool, but it handles bm, am, dm (distance to mate) and MEA score.

Code:

$ epdtest -h
epdtest --engine|-e <chess engine> --file|-f <epd file> [options]
test an engine with a set of epd files
	--file|-f <epd file>  EPD file to test
	--engine|-e <engine>  Use an external engine executable (default: use embedded amoeba)
	--movetime|-t <time>  Maximum time per move in seconds (default: inf)
	--depth|-d <depth>    Maximum depth per move (default: 127)
	--nodes|-n <nodes>    Maximum nodes per move (default: 18446744073709551615)
	--hash|-H <size>      Hash size in Mb (default 256 MB)
	--analyse|-a          Set Engine into analyse mode
	--cpu|-c <threads>    Cpu number (default 1 cpu)
	--MEA|-M              MEA score (default false)
	--bonus|-b            Use time bonus in MEA Score (default false)
	--loop|-l <repeat>    Repeat the test several times (default ×1)
	--verbose|-v          More verbose output (default: false)
	--quiet|-q            Less verbose output (default: false)
	--debug|-g            Log on to a debug file (default: false)
	--help|-h             Display this help
	--version|-V          Show version number
1) Not only the "if", but also the "when" is very important when a position is tested. How is this typically done? By measuring the time or nodes? There must be a limit after which the engine should give up?
You can use depth, nodes, or time. The first two are deterministic with one thread, but using time is more realistic.
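For a UCI engine, each of these limits maps onto a parameter of the standard `go` command. The following Python sketch only builds the command string; the helper name `go_command` and its keyword arguments are invented for the example and are not part of epdtest.

```python
def go_command(depth=None, nodes=None, movetime_s=None):
    """Build a UCI 'go' command from optional per-position limits.

    depth and nodes are passed through as integers; movetime_s is given
    in seconds and converted to the milliseconds that UCI expects.
    """
    parts = ["go"]
    if depth is not None:
        parts += ["depth", str(depth)]
    if nodes is not None:
        parts += ["nodes", str(nodes)]
    if movetime_s is not None:
        parts += ["movetime", str(round(movetime_s * 1000))]
    if len(parts) == 1:
        # No limit given: the harness must later send 'stop' itself.
        parts.append("infinite")
    return " ".join(parts)


print(go_command(depth=12))
print(go_command(nodes=1000000))
print(go_command(movetime_s=2.5))
```

With a fixed depth or node count and a single thread, repeated runs should give the same result, which is what makes those limits handy for regression checks.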
2) I know of "bm" (Best Move) and "am" (Probably Anti Move). Can they be combined? Is there another option?
In epdtest I count them separately. I also count correct mate score.
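Counting the two separately could look roughly like the Python sketch below. This is not epdtest's code; the function name `tally`, the counter keys, and the sample data are assumptions for illustration. The played move is assumed to be in the same notation as the EPD operands (SAN), so a UCI move would need converting first.

```python
from collections import Counter


def tally(results):
    """Count bm hits and am avoidances separately.

    `results` is a list of (operations, played_move) pairs, where
    operations maps EPD opcodes to operand strings, e.g. {"bm": "Nf3 Nc3"}.
    A position carrying both bm and am contributes to both counts.
    """
    counts = Counter()
    for operations, played in results:
        if "bm" in operations:
            counts["bm_total"] += 1
            if played in operations["bm"].split():
                counts["bm_found"] += 1
        if "am" in operations:
            counts["am_total"] += 1
            if played not in operations["am"].split():
                counts["am_avoided"] += 1
    return counts


# Made-up results for illustration only.
results = [
    ({"bm": "Qg6"}, "Qg6"),
    ({"bm": "Nf3 Nc3"}, "e4"),
    ({"am": "Qxb2"}, "Qxb2"),
    ({"bm": "Rxg7+", "am": "Kh1"}, "Rxg7+"),
]
counts = tally(results)
print(f'bm: {counts["bm_found"]}/{counts["bm_total"]}')
print(f'am: {counts["am_avoided"]}/{counts["am_total"]}')
```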

3) It looks as if the "full move" number and the "50-move-plies" counter are always omitted. But they may play an important role in the analysis of a position?!
There are special fields (opcodes) in EPD to handle them. But I agree it is a mistake not to have them in EPD as in the FEN format.
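In the PGN standard's EPD section these opcodes are `hmvc` (halfmove clock) and `fmvn` (fullmove number). A small Python sketch of rebuilding a full FEN string from an EPD record follows; the function name `epd_to_fen` is invented, and falling back to 0 and 1 when the opcodes are absent is an assumption made for the example.

```python
def epd_to_fen(position_fields, operations):
    """Build a six-field FEN string from parsed EPD data.

    position_fields: the four EPD position fields (placement, side to
    move, castling, en passant). operations: dict of opcode -> operand.
    Missing counters fall back to "0" and "1" (assumed defaults).
    """
    halfmove = operations.get("hmvc", "0")
    fullmove = operations.get("fmvn", "1")
    return " ".join(list(position_fields) + [halfmove, fullmove])


# Made-up record for illustration only.
fields = ["8/8/8/4k3/8/8/4K3/7R", "w", "-", "-"]
print(epd_to_fen(fields, {"hmvc": "37", "fmvn": "80", "bm": "Rh5+"}))
print(epd_to_fen(fields, {"bm": "Rh5+"}))
```

A suite that omits both opcodes gives the engine a fresh fifty-move counter, which can matter for positions close to a draw by that rule.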
4) Such Test suites are often called ".epd", while the notation "Extended Position Description" is something different. Am I missing something?
From the pgn standard:

Code:

16.2: EPD

EPD is "Extended Position Description"; it is a standard for describing chess
positions along with an extended set of structured attribute values using the
ASCII character set.  It is intended for data and command interchange among
chessplaying programs.  It is also intended for the representation of portable
opening library repositories.
Richard Delorme

OliverBr
Posts: 334
Joined: Tue Dec 18, 2007 8:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Looking for automatic Engine Testing Software

Post by OliverBr » Tue Jul 28, 2020 8:33 pm

jdart wrote:
Mon Jul 27, 2020 10:12 pm
I have a script for running test suites - it is for UCI engines but would work with Winboard engines + Polyglot. It handles "bm" and "am" and generates summary results:

https://github.com/jdart1/arasan-chess/ ... analyze.py

Also there are some test suites in the "tests" directory.

--Jon
I have another question: I am playing some cutechess-cli tournaments with Arasan 11.7 and sometimes get the following message:

Code:

Warning: Arasan 11.7 forfeits by invalid result claim: 1-0 {Black resigns}
Finished game 30 (Arasan 11.7 vs OliThink 5.5.9): 0-1 {Black wins by adjudication: Invalid result claim}
Is this a known problem? How can I prevent it? Does it happen with other versions, too?

EDIT: I looked into one such game and White was losing, so the correct result claim is probably "White resigns".
Chess Engine OliThink: http://brausch.org/home/chess

OliverBr
Posts: 334
Joined: Tue Dec 18, 2007 8:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Looking for automatic Engine Testing Software

Post by OliverBr » Tue Jul 28, 2020 8:52 pm

abulmo2 wrote:
Tue Jul 28, 2020 11:42 am

Bundled with amoeba, epdtest is a program that runs a UCI chess engine against a test suite (in an EPD file). It is a command-line tool, but it handles bm, am, dm (distance to mate) and MEA score.
Wow, this is great, thank you!
However, OliThink is an xboard-protocol engine, so I will need an adapter like Polyglot. Do you have a tip on what to use and how?
Chess Engine OliThink: http://brausch.org/home/chess

hgm
Posts: 24651
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller

Re: Looking for automatic Engine Testing Software

Post by hgm » Tue Jul 28, 2020 9:04 pm

Note that XBoard 4.9 can also do EPD test suites. Just run a match starting from the position file with -epd as extra argument. See https://www.gnu.org/software/xboard/wha ... tml#tag-A5 .
