Looking for automatic Engine Testing Software


OliverBr
Posts: 725
Joined: Tue Dec 18, 2007 9:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Looking for automatic Engine Testing Software

Post by OliverBr »

Hello everyone,

I want to implement support for "test suites": collections of FEN positions with a best move etc. against which the engine can be tested. I have some questions:

1) Not only the "if", but also the "when" is very important when a position is tested. How is this typically done? By measuring the time or nodes? There must be a limit after which the engine should give up?

2) I know of "bm" (best move) and "am" (probably "avoid move"). Can they be combined? Is there another option?

3) It looks as if the full-move number and the 50-move (halfmove) counter are always omitted. But they may play an important role in the analysis of a position?!

4) Such Test suites are often called ".epd", while the notation "Extended Position Description" is something different. Am I missing something?

Thank you very much for information about this matter!
Chess Engine OliThink: http://brausch.org/home/chess
OliThink GitHub: https://github.com/olithink
brianr
Posts: 536
Joined: Thu Mar 09, 2006 3:01 pm

Re: Looking for automatic Engine Testing Software

Post by brianr »

Test suites are a mixed bag.
At best a coarse measure of engine strength; at worst, almost meaningless.

If you really want to add this capability, again, Crafty source has it all.
bm, am, report after x plies or time, held position for y plies or time.
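
The "found after x, held for y" idea can be sketched roughly as follows. This is not Crafty's code, only a minimal illustration in Python; the function name `solve_time` and the input layout (one `(elapsed_seconds, move)` pair per completed iteration) are assumptions made for the example.

```python
from typing import Iterable, Optional, Set, Tuple


def solve_time(iterations: Iterable[Tuple[float, str]],
               best_moves: Set[str]) -> Optional[float]:
    """Return the time at which the engine switched to a correct move
    and then kept it until the search ended, or None if the final
    move is not one of the expected best moves."""
    solved_at = None
    for elapsed, move in iterations:
        if move in best_moves:
            # Remember only the start of the current run of correct moves.
            if solved_at is None:
                solved_at = elapsed
        else:
            # The engine changed its mind, so the earlier find does not count.
            solved_at = None
    return solved_at
```

A position then counts as solved only if the returned time is below the chosen limit; an engine that finds the move early but drops it later gets no credit for the early find.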

Some GUIs will do it for you if your engine outputs relatively "standard" info.
Some only work with UCI format output.

EPD I think is an extension of FEN with optionally defined fields that accommodate test suites, but not sure about any actual standards.

Of course, many years ago I implemented most of the test suite stuff in Tinker until I saw how little it showed.
I also have about a dozen hard-coded test positions that I can manually run and watch the output to make sure I have not broken something. This works for my engine because I know how it tends to do with this very limited set of positions. These days I only do actual games to gauge strength.
Dann Corbit
Posts: 12541
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Looking for automatic Engine Testing Software

Post by Dann Corbit »

Ed and Chris made an interesting test suite designed to make a very quick Elo calculation.
You can read about it here:
https://www.rebel13.nl/download/speedy-rating-list.html
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
OliverBr
Posts: 725
Joined: Tue Dec 18, 2007 9:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Looking for automatic Engine Testing Software

Post by OliverBr »

This is an interesting win (besides hundreds of defeats against a very strong engine). It's a textbook tactical position; I guess such a thing can only happen in blitz games.
32. Rxg7+ already wins.
[pgn]
[Event "?"]
[Site "?"]
[Date "2020.07.27"]
[Round "120"]
[White "OliThink 5.5.9c"]
[Black "Arasan 22.1"]
[Result "1-0"]

1. c4 e5 2. Nc3 Nc6 3. a3 h6 4. d3 Nf6 5. e4 Bc5 6. b4 Bd4 7. Nge2 d6 8. Nxd4
Nxd4 9. Ne2 Ne6 10. f3 a5 11. Be3 c5 12. bxc5 dxc5 13. Rb1 O-O 14. g3 Qd6 15. h4
Rd8 16. Nc3 Ra6 17. Rh2 Nh5 18. Kf2 Nd4 19. Nd5 Nf6 20. Bh3 Nxd5 21. exd5 Bxh3
22. Rxh3 b5 23. Bxd4 cxd4 24. Rxb5 Qxa3 25. Rh1 a4 26. Qd2 Raa8 27. Rhb1 Kh7
28. Rb7 Kg8 29. R1b2 Qa1 30. Kg2 Qa3 31. Qe2 f6 32. f4 exf4 33. Rxg7+ 1-0
[/pgn]
jdart
Posts: 4367
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Looking for automatic Engine Testing Software

Post by jdart »

I have a script for running test suites - it is for UCI engines but would work with Winboard engines + Polyglot. It handles "bm" and "am" and generates summary results:

https://github.com/jdart1/arasan-chess/ ... analyze.py

Also there are some test suites in the "tests" directory.
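
For anyone writing their own runner, reading one EPD line could look roughly like the sketch below. It is not taken from the script linked above; `parse_epd` is a hypothetical helper, and it assumes that no operand string contains a semicolon.

```python
from typing import Dict, List, Tuple, Union


def parse_epd(line: str) -> Tuple[str, Dict[str, Union[str, List[str]]]]:
    """Split an EPD line into its four position fields and its operations."""
    fields = line.strip().split(None, 4)
    position = " ".join(fields[:4])      # board, side to move, castling, en passant
    rest = fields[4] if len(fields) > 4 else ""
    ops: Dict[str, Union[str, List[str]]] = {}
    for op in rest.split(";"):           # operations are terminated by ';'
        op = op.strip()
        if not op:
            continue
        opcode, _, operand = op.partition(" ")
        if opcode in ("bm", "am"):
            ops[opcode] = operand.split()            # one or more SAN moves
        else:
            ops[opcode] = operand.strip().strip('"')
    return position, ops
```

Note that EPD gives the moves in SAN, while a UCI engine reports coordinate moves such as e2e4, so a runner has to convert one form into the other before comparing.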

--Jon
abulmo2
Posts: 433
Joined: Fri Dec 16, 2016 11:04 am
Location: France
Full name: Richard Delorme

Re: Looking for automatic Engine Testing Software

Post by abulmo2 »

brianr wrote: Mon Jul 27, 2020 7:30 pm EPD I think is an extension of FEN with optionally defined fields that accommodate test suites, but not sure about any actual standards.
The description of the epd fields is available within the pgn standard https://www.thechessdrum.net/PGN_Reference.txt
Richard Delorme
abulmo2
Posts: 433
Joined: Fri Dec 16, 2016 11:04 am
Location: France
Full name: Richard Delorme

Re: Looking for automatic Engine Testing Software

Post by abulmo2 »

OliverBr wrote: Mon Jul 27, 2020 6:16 pm Hello everyone,

I want to implement support for "test suites": collections of FEN positions with a best move etc. against which the engine can be tested. I have some questions:
Embedded with amoeba, epdtest is a program that runs a UCI chess engine against a test suite (in an epd file). It is a command-line tool, but it handles bm, am, dm (distance to mate) and MEA score.

$ epdtest -h
epdtest --engine|-e <chess engine> --file|-f <epd file> [options]
test an engine with a set of epd files
	--file|-f <epd file>  EPD file to test
	--engine|-e <engine>  Use an external engine executable (default: use embedded amoeba)
	--movetime|-t <time>  Maximum time per move in seconds (default: inf)
	--depth|-d <depth>    Maximum depth per move (default: 127)
	--nodes|-n <nodes>    Maximum nodes per move (default: 18446744073709551615)
	--hash|-H <size>      Hash size in Mb (default 256 MB)
	--analyse|-a          Set Engine into analyse mode
	--cpu|-c <threads>    Cpu number (default 1 cpu)
	--MEA|-M              MEA score (default false)
	--bonus|-b            Use time bonus in MEA Score (default false)
	--loop|-l <repeat>    Repeat the test several times (default ×1)
	--verbose|-v          More verbose output (default: false)
	--quiet|-q            Less verbose output (default: false)
	--debug|-g            Log on to a debug file (default: false)
	--help|-h             Display this help
	--version|-V          Show version number
1) Not only the "if", but also the "when" is very important when a position is tested. How is this typically done? By measuring the time or nodes? There must be a limit after which the engine should give up?
You can use depth, nodes or time. The first two are deterministic with one thread, but using time is more realistic.
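
As a rough illustration, the three kinds of limit map onto the UCI "go" command like this. The helper name `go_command` and its parameters are made up for the example and are not epdtest's internals; the "go depth", "go nodes", "go movetime" (in milliseconds) and "go infinite" forms themselves come from the UCI protocol.

```python
from typing import Optional


def go_command(depth: Optional[int] = None,
               nodes: Optional[int] = None,
               movetime_s: Optional[float] = None) -> str:
    """Build a UCI 'go' command from optional search limits."""
    parts = ["go"]
    if depth is not None:
        parts += ["depth", str(depth)]
    if nodes is not None:
        parts += ["nodes", str(nodes)]
    if movetime_s is not None:
        # UCI expects movetime in milliseconds.
        parts += ["movetime", str(int(round(movetime_s * 1000)))]
    if len(parts) == 1:
        # No limit given: search until the caller sends "stop".
        parts.append("infinite")
    return " ".join(parts)
```

Limits can be combined, in which case the engine is expected to stop at whichever is reached first.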
2) I know of "bm" (best move) and "am" (probably "avoid move"). Can they be combined? Is there another option?
In epdtest I count them separately. I also count correct mate scores.
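
Counting the two separately could be sketched like this. It is not epdtest's actual code; `score_position` and `tally` are hypothetical names, and the moves are assumed to be already in the same notation as the EPD operands.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def score_position(chosen: str, ops: Dict[str, List[str]]) -> Dict[str, bool]:
    """Judge one engine move against the bm/am operations of a position."""
    result = {}
    if "bm" in ops:
        result["bm"] = chosen in ops["bm"]       # must play one of the best moves
    if "am" in ops:
        result["am"] = chosen not in ops["am"]   # must not play an avoid move
    return result


def tally(results: Iterable[Dict[str, bool]]) -> Dict[str, Tuple[int, int]]:
    """Return {opcode: (passed, total)} over all judged positions."""
    passed, total = Counter(), Counter()
    for result in results:
        for opcode, ok in result.items():
            total[opcode] += 1
            passed[opcode] += int(ok)
    return {opcode: (passed[opcode], total[opcode]) for opcode in total}
```

A position that carries both bm and am contributes to both counters, so the two totals need not be equal.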

3) It looks as if the full-move number and the 50-move (halfmove) counter are always omitted. But they may play an important role in the analysis of a position?!
There are special fields in epd to handle them (the hmvc and fmvn opcodes). But I agree it is a mistake not to have them in epd as in the fen format.
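
A small sketch of how a runner might rebuild a full FEN from an EPD record: the hmvc (halfmove clock) and fmvn (fullmove number) opcodes are defined in the EPD section of the PGN standard, while the function name `epd_to_fen` and the fallback values 0 and 1 are assumptions for the example.

```python
from typing import Dict


def epd_to_fen(position: str, ops: Dict[str, str]) -> str:
    """Append halfmove clock and fullmove number to the four EPD position fields.

    `position` is the board, side to move, castling and en passant part;
    `ops` maps EPD opcodes to their operand strings.
    """
    halfmove = ops.get("hmvc", "0")   # assumed default when the opcode is absent
    fullmove = ops.get("fmvn", "1")   # assumed default when the opcode is absent
    return f"{position} {halfmove} {fullmove}"
```

With the defaults the engine sees a fresh 50-move counter, which is usually harmless for tactical suites but can matter in endgame positions close to a draw claim.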
4) Such Test suites are often called ".epd", while the notation "Extended Position Description" is something different. Am I missing something?
From the pgn standard:

16.2: EPD

EPD is "Extended Position Description"; it is a standard for describing chess
positions along with an extended set of structured attribute values using the
ASCII character set.  It is intended for data and command interchange among
chessplaying programs.  It is also intended for the representation of portable
opening library repositories.
Richard Delorme
OliverBr
Posts: 725
Joined: Tue Dec 18, 2007 9:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Looking for automatic Engine Testing Software

Post by OliverBr »

jdart wrote: Tue Jul 28, 2020 12:12 am I have a script for running test suites - it is for UCI engines but would work with Winboard engines + Polyglot. It handles "bm" and "am" and generates summary results:

https://github.com/jdart1/arasan-chess/ ... analyze.py

Also there are some test suites in the "tests" directory.

--Jon
I have another question: I am running a cutechess-cli tournament with Arasan 11.7 and sometimes get the following message:

Warning: Arasan 11.7 forfeits by invalid result claim: 1-0 {Black resigns}
Finished game 30 (Arasan 11.7 vs OliThink 5.5.9): 0-1 {Black wins by adjudication: Invalid result claim}
Is this a known problem? How can I prevent it? Does it happen with other versions, too?

EDIT: I looked into one such game and White was losing, so the correct result claim is probably "White resigns".
OliverBr
Posts: 725
Joined: Tue Dec 18, 2007 9:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Looking for automatic Engine Testing Software

Post by OliverBr »

abulmo2 wrote: Tue Jul 28, 2020 1:42 pm
Embedded with amoeba, epdtest is a program that runs a UCI chess engine against a test suite (in an epd file). It is a command-line tool, but it handles bm, am, dm (distance to mate) and MEA score.
Wow, this is great, thank you!
However, OliThink is an xboard-protocol engine, so I will need an adapter like polyglot. Do you have a tip on what to use and how?
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Looking for automatic Engine Testing Software

Post by hgm »

Note that XBoard 4.9 can also run EPD test suites. Just run a match starting from the position file, with -epd as an extra argument. See https://www.gnu.org/software/xboard/wha ... tml#tag-A5 .