Looking for automatic Engine Testing Software

Discussion of chess software programming and technical issues.

chrisw
Posts: 3352
Joined: Tue Apr 03, 2012 2:28 pm

Re: Looking for automatic Engine Testing Software

Post by chrisw » Fri Jul 24, 2020 8:28 am

OliverBr wrote:
Thu Jul 23, 2020 8:18 pm
brianr wrote:
Thu Jul 23, 2020 7:41 pm
For 4 and 8 CPU matches I use 3 sec inc.
This is quite a lot.
I like to play 40/90 which is more than 2 seconds per move. But this is very slow and I am not a very patient person.
Even with a 32-core machine, this needs several hours to play 2000 games in order to get reliable results.

I am still undecided whether to focus on my engine playing against its own versions or on a tournament with other engines (e.g. Fruit and Glaurung).
The problem with other engines is that the results become much more unpredictable. Even after 1800 games the Elo difference varies from 180 to 210 (to Fruit).

Another question: On an 8-core CPU, do you play 8 concurrent games or 16 (because there are 16 threads)?
With Windows Task Manager you can monitor CPU usage for each core, and with GPU-X the temperature of your CPU. Pushing either of those to the limit is probably not a good idea. You could also send nps to a logfile and monitor that. If your 20-core machine runs 40 threads at, let's say, 50% of the nps it gets when running 20 threads, then you have defeated the purpose of using 2x as many threads (you may as well have just halved the game time).
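
The 20-versus-40-thread arithmetic can be checked with a few lines of Python. This is only a sketch: the function names and the nps figures are hypothetical, and in practice the per-game nps would come from the logfile mentioned above.

```python
def total_nps(concurrent_games: int, nps_per_game: float) -> float:
    """Aggregate nodes per second across all games running at once."""
    return concurrent_games * nps_per_game


def oversubscription_gain(cores: int, baseline_nps: float,
                          threads: int, observed_nps: float) -> float:
    """Ratio of total throughput with `threads` games to `cores` games.

    A value of 1.0 means the extra games bought nothing: each game
    searches proportionally less, so the total work is unchanged.
    """
    return total_nps(threads, observed_nps) / total_nps(cores, baseline_nps)


# Hypothetical numbers: 1,000,000 nps per game with 20 games,
# dropping to 50% of that when 40 games share the same 20 cores.
gain = oversubscription_gain(20, 1_000_000, 40, 500_000)
print(gain)  # 1.0
```

Anything clearly above 1.0 would suggest the extra threads pay off on that machine; at or below 1.0 they do not.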

OliverBr
Posts: 334
Joined: Tue Dec 18, 2007 8:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Looking for automatic Engine Testing Software

Post by OliverBr » Fri Jul 24, 2020 1:16 pm

chrisw wrote:
Fri Jul 24, 2020 8:28 am
With Windows Task Manager you can monitor CPU usage for each core, and with GPU-X the temperature of your CPU. Pushing either of those to the limit is probably not a good idea. You could also send nps to a logfile and monitor that. If your 20-core machine runs 40 threads at, let's say, 50% of the nps it gets when running 20 threads, then you have defeated the purpose of using 2x as many threads (you may as well have just halved the game time).
I am using a remote Linux server with 32 cores and 64 threads. I have yet to find out how to monitor the CPU temperature. So far I run with concurrency = 32 and get a nice, stable load of 32.
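
For the temperature question on Linux, one option is to read the kernel's hwmon sysfs files directly. This is a minimal sketch, assuming the server exposes its sensors under /sys/class/hwmon (a rented or virtualized machine may not); the function name read_temperatures is made up for illustration.

```python
from pathlib import Path


def read_temperatures(base: str = "/sys/class/hwmon") -> dict:
    """Return {sensor file: degrees Celsius} for every readable sensor.

    hwmon temp*_input files hold the temperature in millidegrees Celsius.
    Sensors that cannot be read or parsed are skipped silently.
    """
    readings = {}
    for sensor in sorted(Path(base).glob("hwmon*/temp*_input")):
        try:
            millidegrees = int(sensor.read_text().strip())
        except (OSError, ValueError):
            continue
        readings[f"{sensor.parent.name}/{sensor.name}"] = millidegrees / 1000.0
    return readings


if __name__ == "__main__":
    for name, celsius in read_temperatures().items():
        print(f"{name}: {celsius:.1f} C")
```

If the loop prints nothing, the sensors are probably not exposed to that machine, and logging nps is the remaining option.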
Chess Engine OliThink: http://brausch.org/home/chess

jdart
Posts: 3952
Joined: Fri Mar 10, 2006 4:23 am
Location: http://www.arasanchess.org

Re: Looking for automatic Engine Testing Software

Post by jdart » Fri Jul 24, 2020 2:06 pm

OliverBr wrote:
Thu Jul 23, 2020 8:18 pm
I like to play 40/90 which is more than 2 seconds per move. But this is very slow and I am not a very patient person.
Even with a 32-core machine, this needs several hours to play 2000 games in order to get reliable results.

I am still undecided whether to focus on my engine playing against its own versions or on a tournament with other engines (e.g. Fruit and Glaurung).
The problem with other engines is that the results become much more unpredictable. Even after 1800 games the Elo difference varies from 180 to 210 (to Fruit).

Another question: On an 8-core CPU, do you play 8 concurrent games or 16 (because there are 16 threads)?
Standard time control used by Stockfish for testing is 1:0+0.6 (this is their "slow" time control). I use something similar. You should scale this by machine speed, though. I use 1:0+0.6 on my dual Xeon 2690x3, and I scale it up on the other slower machines I have, according to the NPS on the machine (this is also what OpenBench does).
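
Scaling by machine speed could look roughly like the following. This is a sketch under assumptions: the function name and the nps values are hypothetical, and the plain linear ratio is a guess at the idea, not the exact formula used by OpenBench or by any particular tester.

```python
def scale_time_control(base_seconds: float, increment_seconds: float,
                       reference_nps: float, machine_nps: float):
    """Scale a base+increment time control by relative machine speed.

    A machine running at half the reference nps gets twice the time,
    so each engine searches roughly the same number of nodes per move.
    """
    factor = reference_nps / machine_nps
    return base_seconds * factor, increment_seconds * factor


# 1:0+0.6 on the reference machine (assumed 2,000,000 nps),
# scaled for a machine benchmarked at half that speed.
base, inc = scale_time_control(60.0, 0.6, 2_000_000, 1_000_000)
print(f"{base:g}+{inc:g}")  # 120+1.2
```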

Standard practice for most of the strong engines, including Stockfish, is to test the new candidate version in matches against the previous commit. I do sometimes run gauntlet matches against other engines to gauge progress, but I no longer routinely use these for testing changes.

My practice is to not run more concurrent matches than there are physical cores on a machine.

--Jon

chrisw
Posts: 3352
Joined: Tue Apr 03, 2012 2:28 pm

Re: Looking for automatic Engine Testing Software

Post by chrisw » Fri Jul 24, 2020 3:54 pm

OliverBr wrote:
Fri Jul 24, 2020 1:16 pm
chrisw wrote:
Fri Jul 24, 2020 8:28 am
With Windows Task Manager you can monitor CPU usage for each core, and with GPU-X the temperature of your CPU. Pushing either of those to the limit is probably not a good idea. You could also send nps to a logfile and monitor that. If your 20-core machine runs 40 threads at, let's say, 50% of the nps it gets when running 20 threads, then you have defeated the purpose of using 2x as many threads (you may as well have just halved the game time).
I am using a remote Linux server with 32 cores and 64 threads. I have yet to find out how to monitor the CPU temperature. So far I run with concurrency = 32 and get a nice, stable load of 32.
Ah, well, if it’s remote and time-rented, let them worry about overheating. You only need to worry about nps, and logging is probably the answer. It all depends on how the system handles your core demands. What can happen if you push the limits is that one of your (32?) nps rates gets hammered while the others are okay.
I have two six-cores and one four-core. The two sixes are fine with six concurrent matches; the four isn’t, so I dropped it to three, and now it’s fine. Jealous of your 32. Been looking at possibilities, but it’s a maze out there.

OliverBr
Posts: 334
Joined: Tue Dec 18, 2007 8:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Looking for automatic Engine Testing Software

Post by OliverBr » Fri Jul 24, 2020 8:32 pm

chrisw wrote:
Fri Jul 24, 2020 3:54 pm
Jealous of your 32. Been looking at possibilities, but it’s a maze out there.
Actually, it's less expensive than I had thought, about 140 EUR/month. Considering that the CPU alone costs 2500 EUR, I find the price quite good.
The only drawback, though not for me: you have to use Linux, because with Windows it gets really expensive.

https://www.hetzner.com/dedicated-rootserver/ax161

PS: One reason I gave up developing OliThink around 2010 was that I had no way to run large tests. It was nearly impossible to say whether a new version was better than the previous one until it had played in a large, time-consuming tournament. I am not the most patient person, and even with 32 cores it is still going too slowly for me...
Chess Engine OliThink: http://brausch.org/home/chess

OliverBr
Posts: 334
Joined: Tue Dec 18, 2007 8:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Looking for automatic Engine Testing Software

Post by OliverBr » Sun Jul 26, 2020 2:35 pm

jdart wrote:
Fri Jul 24, 2020 2:06 pm
Standard practice for most of the strong engines, including Stockfish, is to test the new candidate version in matches against the previous commit. I do sometimes run gauntlet matches against other engines to gauge progress, but I no longer routinely use these for testing changes.
--Jon
Hi Jon,
I have just installed Arasan 22.1 on my test machine and I have to say, I am really impressed!
Arasan humiliates OliThink with something like 200-1-8. The last time I checked, around 2009, it wasn't that strong (version 11.7).
How did you gain 700 Elo points? What were the most significant steps?


PS: Here is the only game OliThink won against Arasan 22.1, despite the fact that it was trailing with RB against RR for most of the time.

Chess Engine OliThink: http://brausch.org/home/chess

jdart
Posts: 3952
Joined: Fri Mar 10, 2006 4:23 am
Location: http://www.arasanchess.org

Re: Looking for automatic Engine Testing Software

Post by jdart » Sun Jul 26, 2020 8:26 pm

How did you gain 700 Elo points? What were the most significant steps?
11.7 was a long time ago. There were a lot of steps. There is actually a lengthy changelog if you want to see: https://github.com/jdart1/arasan-chess/ ... oc/CHANGES

--Jon

OliverBr
Posts: 334
Joined: Tue Dec 18, 2007 8:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Looking for automatic Engine Testing Software

Post by OliverBr » Sun Jul 26, 2020 9:29 pm

jdart wrote:
Sun Jul 26, 2020 8:26 pm

11.7 was a long time ago. There were a lot of steps. There is actually a lengthy changelog if you want to see: https://github.com/jdart1/arasan-chess/ ... oc/CHANGES

--Jon
It would be interesting to know which steps gained the most strength.

PS: Right now I am letting Leela (with a GTX1080Ti) analyze the game I posted. She sees only one blunder, the losing move, in Arasan's play: 86...Rxb4??. OliThink did not see a win here; it just exchanged material and got rid of the nasty passed pawn.

Here is the result of a 1000-game match between OliThink 5.5.9d and Arasan 22.1:

   # PLAYER             :  RATING  ERROR  POINTS  PLAYED   (%)    W    D    L  D(%)  CFS(%)
   1 Arasan 22.1        :     788     94   989.0    1000  98.9  984   10    6   1.0     100
   2 OliThink 5.5.9d    :       0   ----    11.0    1000   1.1    6   10  984   1.0     ---

White advantage = -2.80 +/- 2.02
Draw rate (equal opponents) = 6.11 % +/- 4.91
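
As a rough cross-check of the rating column above, the logistic Elo formula can be applied to the W/D/L counts. This is only a sketch: the table comes from a rating tool that fits its own model, so the numbers will not match exactly, and the normal-approximation interval is an assumption of this example (it also breaks down for a score of exactly 0 or 1).

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))


def match_elo(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (low, estimate, high) Elo difference for one side of a match.

    The interval comes from a normal approximation of the mean score
    per game, converted to Elo at both ends.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    margin = z * math.sqrt(variance / games)
    return (elo_from_score(score - margin),
            elo_from_score(score),
            elo_from_score(score + margin))


# Arasan's row from the table: 984 wins, 10 draws, 6 losses.
low, elo, high = match_elo(984, 10, 6)
print(f"Elo difference about {elo:.0f} (roughly {low:.0f} to {high:.0f})")
```

With a score this lopsided the interval is wide and asymmetric, which is one reason matches between closely matched versions give more usable numbers.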
PS2:
This may be a mistake. Arasan 22.1 resigned while winning. It looks like it happened one other time; the other wins were correct.
Chess Engine OliThink: http://brausch.org/home/chess

jdart
Posts: 3952
Joined: Fri Mar 10, 2006 4:23 am
Location: http://www.arasanchess.org

Re: Looking for automatic Engine Testing Software

Post by jdart » Mon Jul 27, 2020 12:58 am

OliverBr wrote:
Sun Jul 26, 2020 9:29 pm

This may be a mistake. Arasan 22.1 resigned while winning. It looks like it happened one other time; the other wins were correct.
Could be a bug. I'll see if I can reproduce. I usually run matches with the "-t" flag to Arasan on the command line, which causes it to put out a lot of debug information. But without that log it is hard for me to tell what might be happening.

--Jon

OliverBr
Posts: 334
Joined: Tue Dec 18, 2007 8:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Looking for automatic Engine Testing Software

Post by OliverBr » Mon Jul 27, 2020 4:01 pm

jdart wrote:
Mon Jul 27, 2020 12:58 am
OliverBr wrote:
Sun Jul 26, 2020 9:29 pm

This may be a mistake. Arasan 22.1 resigned while winning. It looks like it happened one other time; the other wins were correct.
Could be a bug. I'll see if I can reproduce. I usually run matches with the "-t" flag to Arasan on the command line, which causes it to put out a lot of debug information. But without that log it is hard for me to tell what might be happening.

--Jon
If you want, the next time I run a 1000-game match against ArasanX I can run it with the "-t" flag (and the same flag for cutechess-cli). If there is another incident I can provide you with the log file, even though I expect it to be huge. It will get a little messy with 32 concurrent games, this I already know :)
Chess Engine OliThink: http://brausch.org/home/chess
