Looking for automatic Engine Testing Software

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

OliverBr
Posts: 334
Joined: Tue Dec 18, 2007 8:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Looking for automatic Engine Testing Software

Post by OliverBr » Mon Jul 20, 2020 7:36 pm

brianr wrote:
Mon Jul 20, 2020 4:24 pm
If you are serious about testing enough to order a 32 core box,
Very serious. I will be renting it, and it's not even very expensive: about 130 EUR/month, because I don't need any resources other than CPU cores, while Leela doesn't even need those...
...suggest using Ordo in addition to other tools.
Sample command:
./ordo -Q -N 0 -D -a 0 -A "nameofengineusedforzeroElo" -W -n4 -s500 -U "0,1,2,3,4,5,7,8,9,10,6" -p your.pgn

If you stop/restart cutechess-cli matches before normal completion, the PGN will contain games with '*' (unfinished) results, which Ordo does not like.
To fix that, just use SCID: import the PGN, search the headers to exclude games with a '*' result, then export the PGN.
Thanks, this is a great tip!
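If SCID is not at hand, the same cleanup can also be scripted. Below is a minimal sketch in Java (the language of the TourneyEval tool in this thread). The class name PgnFilter is hypothetical, and the sketch assumes that every game starts with an [Event "..."] tag, as the PGN export format requires (the Seven Tag Roster begins with Event).

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

class PgnFilter {

	// Copies every game whose [Result] tag is not "*" (unfinished) from in to out.
	// Assumes each game begins with an [Event "..."] tag. Returns the number of dropped games.
	static int filter(Reader in, PrintWriter out) throws IOException {
		BufferedReader br = new BufferedReader(in);
		List<String> game = new ArrayList<>();
		int dropped = 0;
		String line;
		while ((line = br.readLine()) != null) {
			// A new [Event tag closes the previously buffered game
			if (line.startsWith("[Event ") && !game.isEmpty()) {
				dropped += flush(game, out);
			}
			game.add(line);
		}
		if (!game.isEmpty()) {
			dropped += flush(game, out);
		}
		out.flush();
		return dropped;
	}

	// Writes the buffered game unless it is unfinished; returns 1 if it was dropped.
	private static int flush(List<String> game, PrintWriter out) {
		boolean unfinished = game.stream().anyMatch(l -> l.trim().equals("[Result \"*\"]"));
		if (!unfinished) {
			for (String l : game) {
				out.println(l);
			}
		}
		game.clear();
		return unfinished ? 1 : 0;
	}

	public static void main(String[] args) throws IOException {
		// Usage: java PgnFilter in.pgn out.pgn
		try (Reader in = new FileReader(args[0]);
		     PrintWriter out = new PrintWriter(new FileWriter(args[1]))) {
			int dropped = filter(in, out);
			System.err.println("Dropped " + dropped + " unfinished game(s)");
		}
	}
}
```

The cleaned file can then be passed to Ordo with -p as usual.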
I have already written a tiny PGN tool myself that at least gives some information about how things are going:

Code:

java -jar TourneyEval.jar FourTour.pgn
Fruit 2.1:	132.0
Glaurung 2.1:	121.0
OliThink 5.5.8:	79.0
OliThink 5.4.11:	48.0
Funny fact: Glaurung is "cheating" because it uses 8 threads while the others use 1. I hadn't noticed that before.

Here is the Java source for TourneyEval:

Code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TourneyEval {

	// Returns the value between the first pair of double quotes of a PGN tag line,
	// e.g. [White "Fruit 2.1"] -> Fruit 2.1
	private static String tagValue(String line) {
		int n1 = line.indexOf('"');
		int n2 = line.indexOf('"', n1 + 1);
		return line.substring(n1 + 1, n2);
	}

	public static void main(String[] args) throws IOException {
		// Scores are kept in half points (win = 2, draw = 1) so they stay integers
		Map<String, Integer> engines = new HashMap<>();
		String we = "";
		String be = "";
		try (BufferedReader bf = new BufferedReader(new FileReader(args[0]))) {
			String buf;
			while ((buf = bf.readLine()) != null) {
				// Match the full tag name so [WhiteElo ...] etc. are not taken as engine names
				if (buf.startsWith("[White \"")) {
					we = tagValue(buf);
					engines.putIfAbsent(we, 0);
				} else if (buf.startsWith("[Black \"")) {
					be = tagValue(buf);
					engines.putIfAbsent(be, 0);
				} else if (buf.startsWith("[Result \"")) {
					String r = tagValue(buf);
					int wp, bp;
					if ("1-0".equals(r)) {
						wp = 2; bp = 0;
					} else if ("0-1".equals(r)) {
						wp = 0; bp = 2;
					} else if ("1/2-1/2".equals(r)) {
						wp = 1; bp = 1;
					} else {
						continue; // "*" = unfinished game, not counted as a draw
					}
					engines.merge(we, wp, Integer::sum);
					engines.merge(be, bp, Integer::sum);
				}
			}
		}

		// Print the engines sorted by score, highest first
		List<Map.Entry<String, Integer>> sorted = new ArrayList<>(engines.entrySet());
		sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
		for (Map.Entry<String, Integer> e : sorted) {
			System.out.println(e.getKey() + ":\t" + e.getValue() / 2.0);
		}
	}
}
Chess Engine OliThink: http://brausch.org/home/chess

OliverBr
Posts: 334
Joined: Tue Dec 18, 2007 8:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Looking for automatic Engine Testing Software

Post by OliverBr » Mon Jul 20, 2020 8:01 pm

brianr wrote:
Mon Jul 20, 2020 4:24 pm
If you are serious about testing enough to order a 32 core box, suggest using Ordo in addition to other tools.
Sample command:
./ordo -Q -N 0 -D -a 0 -A "nameofengineusedforzeroElo" -W -n4 -s500 -U "0,1,2,3,4,5,7,8,9,10,6" -p your.pgn
It would be nice to have a Mac OS X release, too. Typically little to no work is needed for Linux sources. OliThink's sources only have a few platform-specific branches for Windows and Unix; the code for Linux and Mac OS X is identical.

EDIT: When compiling the sources on Mac OS X, there is only one error:

Code:

In file included from sysport/sysport.c:4:
sysport/sysport.h:192:11: error: unknown type name 'pthread_spinlock_t'; did you mean 'pthread_rwlock_t'?
                typedef pthread_spinlock_t              mythread_spinx_t; 
                        ^~~~~~~~~~~~~~~~~~
                        pthread_rwlock_t
/usr/include/sys/_pthread/_pthread_rwlock_t.h:31:35: note: 'pthread_rwlock_t' declared here
typedef __darwin_pthread_rwlock_t pthread_rwlock_t;

Anyway, it's a great tool. This is the result after 385 games (the PLAYED column sums to 770 because each game counts for both players):

Code:

sources/ordo -Q -N 0 -D -a 0 -A "OliThink 5.5.8" -W -n4 -s500 -U "0,1,2,3,4,5,7,8,9,10,6" -p FourTour.pgn
0   10   20   30   40   50   60   70   80   90   100 (%)
|----|----|----|----|----|----|----|----|----|----|
***************************************************

   # PLAYER             :  RATING  ERROR  POINTS  PLAYED   (%)    W    D    L  D(%)  CFS(%)
   1 Fruit 2.1          :     171     64   135.0     193  69.9  116   38   39  19.7      89
   2 Glaurung 2.1       :     135     60   123.0     192  64.1  108   30   54  15.6     100
   3 OliThink 5.5.8     :       0   ----    79.0     193  40.9   62   34   97  17.6     100
   4 OliThink 5.4.11    :     -99     62    48.0     192  25.0   31   34  127  17.7     ---

White advantage = 28.91 +/- 18.31
Draw rate (equal opponents) = 20.47 % +/- 2.24

Dann Corbit
Posts: 11221
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: Looking for automatic Engine Testing Software

Post by Dann Corbit » Thu Jul 23, 2020 5:26 am

There was a time when Fruit 2.1 was the strongest chess engine in the world.
And teeny source code Olithink is closing in fast.
:shock:
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.

OliverBr
Posts: 334
Joined: Tue Dec 18, 2007 8:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Looking for automatic Engine Testing Software

Post by OliverBr » Thu Jul 23, 2020 1:11 pm

Dann Corbit wrote:
Thu Jul 23, 2020 5:26 am
There was a time when Fruit 2.1 was the strongest chess engine in the world.
And teeny source code Olithink is closing in fast.
:shock:
Unfortunately it is not that fast. I am now fighting for 10 Elo points and need thousands of games to measure them.
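To see why a 10 Elo difference needs thousands of games, here is a minimal sketch using the logistic Elo model and a normal approximation of the match score. The class and method names are hypothetical, and the 20% draw rate is only an assumption (roughly the draw rate Ordo reported above).

```java
class EloMargin {

	// Logistic Elo model: an expected score p (0 < p < 1) corresponds to a
	// rating difference of -400 * log10(1/p - 1).
	static double eloFromScore(double p) {
		return -400.0 * Math.log10(1.0 / p - 1.0);
	}

	// Approximate 95% error margin in Elo after n games, given the win and draw
	// fractions (losses are the remainder). Uses the normal approximation for the
	// mean score and the slope of eloFromScore at that score.
	static double margin95(double win, double draw, int n) {
		double p = win + 0.5 * draw;                  // mean score per game
		double variance = win + 0.25 * draw - p * p; // per-game score variance
		double stdErrScore = Math.sqrt(variance / n);
		double dEloDp = 400.0 / (Math.log(10.0) * p * (1.0 - p));
		return 1.96 * stdErrScore * dEloDp;
	}

	public static void main(String[] args) {
		// Hypothetical even match: 40% wins, 20% draws, 40% losses
		for (int n : new int[] {500, 1000, 2000, 4000, 8000}) {
			System.out.printf("%5d games: +/- %.1f Elo%n", n, margin95(0.4, 0.2, n));
		}
	}
}
```

Under these assumptions the margin is roughly +/- 19 Elo after 1000 games and about +/- 9.6 Elo after 4000 games; it only shrinks with the square root of the number of games.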

In relative terms, OliThink 5.3.3 was much stronger in 2008. It may be weaker than 5.5.8, but back then it could beat any other open-source engine. Of course, that was before Stockfish and Leela.

OliverBr
Posts: 334
Joined: Tue Dec 18, 2007 8:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Looking for automatic Engine Testing Software

Post by OliverBr » Thu Jul 23, 2020 1:14 pm

I have some questions for other users of cutechess or similar software.

What is your preferred time control for playing thousands of games?

What are your settings for resign and draw adjudication? How much CPU time do they save?

odomobo
Posts: 78
Joined: Thu Jul 05, 2018 11:09 pm
Location: Chicago, IL
Full name: Josh Odom

Re: Looking for automatic Engine Testing Software

Post by odomobo » Thu Jul 23, 2020 6:03 pm

OliverBr wrote:
Thu Jul 23, 2020 1:14 pm
Which is your setting about resign and draw? How much CPU can be saved with it?
I think it can be somewhat dangerous to use automatic adjudication in engine testing tournaments, for two reasons. First, there are some positions that engines typically evaluate as won but that are actually drawn. Second, you may have introduced a bug into your engine that prevents it from converting won endgames. Either way, this can skew your results in ways you cannot control.

Also, if you play all games to completion, you can later analyze the games where both engines agreed that one side was winning but which eventually ended in a draw.
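To illustrate the concern above, here is a minimal sketch of a typical eval-based adjudication rule. It is hypothetical and is not cutechess-cli's implementation (its -resign and -draw options have their own parameters). The rule adjudicates once the reported evaluation has stayed beyond a threshold, in the same direction, for a given number of consecutive plies. Because plies alternate between the two engines, a streak of two or more plies means both engines agree. It also assumes the evaluations have already been converted to White's point of view.

```java
class AdjudicationRule {
	private final int plies;       // consecutive plies required
	private final int thresholdCp; // evaluation threshold in centipawns
	private int streak = 0;
	private int lastSign = 0;

	AdjudicationRule(int plies, int thresholdCp) {
		this.plies = plies;
		this.thresholdCp = thresholdCp;
	}

	// scoreCp: evaluation reported for the ply just played, from White's point of view.
	// Returns +1 (adjudicate a White win), -1 (Black win) or 0 (keep playing).
	int onPly(int scoreCp) {
		int sign = scoreCp >= thresholdCp ? 1 : (scoreCp <= -thresholdCp ? -1 : 0);
		if (sign != 0 && sign == lastSign) {
			streak++;
		} else {
			streak = (sign != 0) ? 1 : 0; // start a new streak or reset it
		}
		lastSign = sign;
		return streak >= plies ? sign : 0;
	}

	public static void main(String[] args) {
		// Hypothetical rule: 4 consecutive plies at +/- 500 centipawns or more
		AdjudicationRule rule = new AdjudicationRule(4, 500);
		int[] evals = {120, 540, 610, 580, 700}; // made-up evaluations, White's view
		for (int e : evals) {
			System.out.println(e + " -> " + rule.onPly(e));
		}
	}
}
```

Nothing in such a rule looks at the position itself: a drawn fortress that both engines evaluate at +5 would still be scored as a White win, which is exactly the kind of skew described above.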

brianr
Posts: 422
Joined: Thu Mar 09, 2006 2:01 pm

Re: Looking for automatic Engine Testing Software

Post by brianr » Thu Jul 23, 2020 7:41 pm

I only use Syzygy 6-man tablebase adjudication with cutechess-cli (and provide the tablebases to the engines too).

Time control depends. With A/B engines the increment is about 0.4-0.6 seconds depending on which CPU I'm using.

For NN engines, the GPUs seem to need some "spin up" time, so I use a 1 sec increment.

For 4 and 8 CPU matches I use 3 sec inc.

For NNs with same-size nets, or A/B engines with minor eval-only changes, I use fixed nodes (which is by far the fastest).

To reduce draw rate I use a somewhat unbalanced book.

Finally, from time to time I do a "sanity check": I play identical copies of an engine against each other, and if the result is not very close to 50/50, something is broken.
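"Very close to 50/50" can be made a bit more precise. A minimal sketch, assuming the usual normal approximation on wins minus losses (draws are ignored, as in the common likelihood-of-superiority formula); the class and method names are hypothetical.

```java
class SelfPlayCheck {

	// z-score of the wins-minus-losses count; draws carry no information here.
	static double zScore(int wins, int losses) {
		if (wins + losses == 0) {
			return 0.0;
		}
		return (wins - losses) / Math.sqrt(wins + losses);
	}

	// True if the result is compatible with two equal engines at about 95% confidence.
	static boolean looksBalanced(int wins, int losses) {
		return Math.abs(zScore(wins, losses)) <= 1.96;
	}

	public static void main(String[] args) {
		// Made-up results of identical copies: (wins, losses) for the first copy
		int[][] results = {{410, 390}, {450, 350}};
		for (int[] r : results) {
			System.out.printf("+%d -%d: z = %.2f, balanced = %b%n",
					r[0], r[1], zScore(r[0], r[1]), looksBalanced(r[0], r[1]));
		}
	}
}
```

With a 95% threshold, about one in twenty perfectly healthy runs will still be flagged, so a single failure is a reason to look closer rather than proof of a bug.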

OliverBr
Posts: 334
Joined: Tue Dec 18, 2007 8:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Looking for automatic Engine Testing Software

Post by OliverBr » Thu Jul 23, 2020 8:18 pm

brianr wrote:
Thu Jul 23, 2020 7:41 pm
For 4 and 8 CPU matches I use 3 sec inc.
This is quite a lot.
I like to play 40/90 (40 moves in 90 seconds), which is more than 2 seconds per move. But this is very slow, and I am not a very patient person.
Even with a 32-core machine, it takes several hours to play the 2000 games needed for reliable results.
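A back-of-the-envelope check of the "several hours" figure: at 40/90, an engine has 2.25 seconds per move on average. The sketch below assumes (hypothetically) that both engines use their full time budget, that an average game lasts 60 moves per side, and that all concurrency slots stay busy. Real games vary, and engines often finish moves early, so treat the result as a rough upper estimate.

```java
class MatchDuration {

	// Rough wall-clock estimate in hours, assuming both engines spend their full
	// time budget and `concurrency` games always run in parallel.
	static double hours(int games, int concurrency, double movesPerSide, double secondsPerMove) {
		double secondsPerGame = 2 * movesPerSide * secondsPerMove; // both sides' thinking time
		double rounds = (double) games / concurrency;
		return rounds * secondsPerGame / 3600.0;
	}

	public static void main(String[] args) {
		double secondsPerMove = 90.0 / 40.0; // 40/90: 40 moves in 90 seconds
		// Hypothetical average game length of 60 moves per side
		System.out.printf("%.1f hours%n", hours(2000, 32, 60, secondsPerMove));
	}
}
```

With these assumptions it prints 4.7 hours, consistent with "several hours".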

I am still undecided whether to focus on matches between versions of my own engine or on a tournament with other engines (e.g. Fruit and Glaurung).
The problem with other engines is that the results become much more unpredictable. Even after 1800 games, the Elo difference to Fruit varies between 180 and 210.

Another question: on an 8-core CPU, do you play 8 concurrent games or 16 (because there are 16 hardware threads)?

brianr
Posts: 422
Joined: Thu Mar 09, 2006 2:01 pm

Re: Looking for automatic Engine Testing Software

Post by brianr » Thu Jul 23, 2020 9:30 pm

I did not have more than a 4-CPU box until relatively recently (about 9 months ago), and I generally do not run 8-core games.
In fact, my first such match was this week, for SF-NNUE. Tinker still does not have parallel search, so I have not bothered (and it hasn't been worked on for several years, pretty much since Giraffe arrived on the scene).

I know the "more threads than cores" question has been changing over the years, but for the little I have done, I'm still not comfortable running more threads than there are cores.

I think testing against the prior best version of your engine is fine for several iterations.
Periodically, it is good to also test against a pool of other opponents, as the general feeling is that self-play Elo gets inflated.
When A > B and B > C, it is not always the case that A > C.

OliverBr
Posts: 334
Joined: Tue Dec 18, 2007 8:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Looking for automatic Engine Testing Software

Post by OliverBr » Thu Jul 23, 2020 10:11 pm

My engine does not have parallel search either, but for testing, every CPU helps to collect as many games as possible.

This week I rented a 32-core system. It was less expensive than one might think, and I am quite happy with it because cutechess-cli with concurrency 32 runs really smoothly.

Tinker could be a nice opponent. Could you post a link where I can get (or compile) a 64-bit Linux executable?

The "unbalanced book" you are talking about is very interesting. Is it available?
