New engine releases & news 2021

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Gabor Szots
Posts: 1364
Joined: Sat Jul 21, 2018 7:43 am
Location: Szentendre, Hungary
Full name: Gabor Szots

Re: New engine releases & news 2021

Post by Gabor Szots »

jmcd wrote: Mon Nov 08, 2021 5:49 am Thanks! I've been binging it for the past couple months. I'm excited to see how much it's improved on the next CCRL iteration.
I've put it in my testing queue and I expect it to appear at next week's update. Currently it's going to be v1.8 but I have already changed the version number twice.
Gabor Szots
CCRL testing group
Rebel
Posts: 6995
Joined: Thu Aug 18, 2011 12:04 pm

Re: New engine releases & news 2021

Post by Rebel »

Guenther wrote: Mon Nov 08, 2021 9:29 am
BTW for the next year I will split up the engine releases thread explicitly
in two threads, one should be only for the announcement of them
and the other one for discussions/bug reports about those.
Of course every one is still free for creating new threads about
each and every program alone, especially the authors themselves.
Maybe I will also shorten the time span to half years?!

Keep in mind that this thread was started to collect all that information
for future researchers. It will make their work much easier.

Guenther
What is needed is someone who is able and willing to maintain those 2 threads since folks are not listening. Someone who is given access to the forum without becoming a moderator himself. I want to ask the current mods (HGM & Harvey) to consider this option. I propose Guenther for the job, that is if he is willing of course.
90% of coding is debugging, the other 10% is writing bugs.
algerbrex
Posts: 596
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: New engine releases & news 2021

Post by algerbrex »

Guenther wrote: Mon Nov 08, 2021 9:29 am I can see from that game alone that Hopper has a time management problem (at least in mps tcs)!
Of course I could calculate it exactly with my import macro, but that it can still use
135s in move 31 for trying to solve the fail low after Nf5 is already proof enough
for an mps game (40/300)!

(Blunder's time management, OTOH, is practically linear here, which could be optimized too.)

I guess it is not tested usually with mps time controls?
Yup, I noticed that as well. Hopper took so long to resolve the fail low that occurred that for a second I thought it stalled, so I definitely think that's something that Jonathan could do some more tinkering with.

While we're also at it, I tried doing some ultra-bullet testing with Hopper (tc=inf/10+0.1s) since that's the format I normally use to measure Elo gain and Hopper had some issues handling the time control format, so MPS format may not necessarily be the issue? Not really sure... I'm sure Jonathan could provide some insight here as well.

And as far as Blunder goes, you're also right. Currently, for an MPS format, Blunder allocates the allotted time per move simply by doing time_left / moves_to_go, which is definitely linear. I haven't worked on the time management aspect of Blunder in a while, and I think I'll spend some time playing around with that before v7.2.0 is released. I just remember when I first started, I wanted to at least make sure Blunder's time management system was robust and would never allow the engine to lose on time, if nothing else. So I took inspiration from http://mediocrechess.blogspot.com/2007/ ... ement.html, made some small tweaks to allow different time formats, and ran with it (I also realized I never properly credited the blog post, and if my time management scheme doesn't change before the next release, I'll definitely add that in).
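To make the linear scheme concrete, here is a minimal sketch of the time_left / moves_to_go split next to one possible non-linear variant. The function names, the front-loading factor, and the per-move reserve are illustrative assumptions, not Blunder's actual code.

```python
def linear_allocation(time_left_ms: int, moves_to_go: int) -> int:
    """Even split: every remaining move gets the same share of the clock."""
    return time_left_ms // max(moves_to_go, 1)


def front_loaded_allocation(time_left_ms: int, moves_to_go: int,
                            factor: float = 1.5, reserve_ms: int = 50) -> int:
    """Hypothetical variant: spend `factor` times the even share,
    capped so that every remaining move still keeps `reserve_ms`."""
    moves_to_go = max(moves_to_go, 1)
    even_share = time_left_ms / moves_to_go
    budget = even_share * factor
    # Never plan to use more than the clock minus a per-move reserve.
    cap = time_left_ms - reserve_ms * moves_to_go
    return max(1, int(min(budget, cap)))
```

With a 40/300 control, the even split gives 7500 ms per move at the start of the period. The front-loaded variant spends more early and relies on the cap to stay safe on the last moves before the time control.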
Guenther wrote: Mon Nov 08, 2021 9:29 am BTW for the next year I will split up the engine releases thread explicitly
in two threads, one should be only for the announcement of them
and the other one for discussions/bug reports about those.
Of course every one is still free for creating new threads about
each and every program alone, especially the authors themselves.
Maybe I will also shorten the time span to half years?!

Keep in mind that this thread was started to collect all that information
for future researchers. It will make their work much easier.

Guenther
Hmm, sounds like that might work to me. Might be easier to keep version announcement and feedback separate, although that may be slightly more cumbersome.

And with the rate that new engines are popping up, half-years might just be the option to go with :lol:
Joerg Oster
Posts: 937
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany

Re: New engine releases & news 2021

Post by Joerg Oster »

Matefish 1.0 https://github.com/joergoster/Stockfish/tree/matefish

A mate solving engine like Chest, but nowhere close to its performance.
Still at the very beginning of development.

See also this thread: https://talkchess.com/forum3/viewtopic.php?f=2&t=78595
Jörg Oster
Jouni
Posts: 3291
Joined: Wed Mar 08, 2006 8:15 pm

Re: New engine releases & news 2021

Post by Jouni »

[fen]8/8/1R6/2K5/8/k7/8/8 w - - 0 0[/fen]

info string No mate in 1 found
<< info string No mate in 1 found
<< info string No mate in 1 found
<< info string No mate in 2 found
<< info string No mate in 1 found
<< info string No mate in 1 found
<< info string No mate in 1 found
<< info string No mate in 2 found
<< info string No mate in 2 found
<< info string No mate in 2 found
<< info string No mate in 2 found
<< info string No mate in 3 found
<< info string No mate in 3 found
<< info string No mate in 3 found
<< info string No mate in 3 found
<< info string No mate in 3 found
<< info string No mate in 2 found
<< info string No mate in 3 found
<< info string No mate in 4 found
<< info string No mate in 4 found
<< info string No mate in 4 found
<< info string No mate in 4 found
<< info string No mate in 4 found
<< info string No mate in 4 found
<< info string No mate in 5 found
<< info string No mate in 5 found
<< info string No mate in 5 found
<< info string No mate in 5 found
<< info string No mate in 5 found
<< info string No mate in 5 found

I already gave it a try, but no mate found so far :) .
Jouni
Joerg Oster
Posts: 937
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany

Re: New engine releases & news 2021

Post by Joerg Oster »

Code:

position fen 8/8/1R6/2K5/8/k7/8/8 w - - 0 1
d

 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 8
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 7
 +---+---+---+---+---+---+---+---+
 |   | R |   |   |   |   |   |   | 6
 +---+---+---+---+---+---+---+---+
 |   |   | K |   |   |   |   |   | 5
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 4
 +---+---+---+---+---+---+---+---+
 | k |   |   |   |   |   |   |   | 3
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 2
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 1
 +---+---+---+---+---+---+---+---+
   a   b   c   d   e   f   g   h

Fen: 8/8/1R6/2K5/8/k7/8/8 w - - 0 1
Key: 6DD360BCC1B85DCB
Checkers:
go mate 4
info string No mate in 1 found
info string No mate in 2 found
info string No mate in 3 found
info string Success! Mate in 4 found!
info depth 5 seldepth 7 multipv 1 score mate 4 nodes 26746 nps 1910428 tbhits 0 time 14 pv c5c4 a3a2 c4c3 a2a1 c3c2 a1a2 b6a6
bestmove c5c4 ponder a3a2
At the moment I recommend using Matefish on the command line.
No tests done under any GUI so far.

Matefish cannot make use of more than 1 thread.
Or, to be more precise, all threads will do exactly the same.
This will change in the future.
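For anyone scripting the command-line output rather than reading it by eye, the final info line can be picked apart with a few lines of code. This is only a sketch: parse_info is a hypothetical helper, it handles just the fields that appear in the output above, and it is not part of Matefish.

```python
def parse_info(line: str) -> dict:
    """Extract a few fields from a UCI 'info' line into a dict."""
    tokens = line.split()
    out = {}
    i = 1  # skip the leading "info"
    while i < len(tokens):
        tok = tokens[i]
        if tok == "score" and i + 2 < len(tokens):
            # e.g. ("mate", 4) or ("cp", 35)
            out["score"] = (tokens[i + 1], int(tokens[i + 2]))
            i += 3
        elif tok == "pv":
            out["pv"] = tokens[i + 1:]  # the pv runs to the end of the line
            break
        elif tok in ("depth", "seldepth", "nodes", "nps", "time") and i + 1 < len(tokens):
            out[tok] = int(tokens[i + 1])
            i += 2
        else:
            i += 1  # ignore fields this sketch does not know about
    return out


line = ("info depth 5 seldepth 7 multipv 1 score mate 4 nodes 26746 "
        "nps 1910428 tbhits 0 time 14 pv c5c4 a3a2 c4c3 a2a1 c3c2 a1a2 b6a6")
info = parse_info(line)
```

Here info["score"] holds the mate distance in moves and info["pv"] holds the seven half-moves of the mating line.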
Jörg Oster
jmcd
Posts: 58
Joined: Wed Mar 18, 2020 10:00 pm
Full name: Jonathan McDermid

Re: New engine releases & news 2021

Post by jmcd »

algerbrex wrote: Mon Nov 08, 2021 2:36 pm
Yup, I noticed that as well. Hopper took so long to resolve the fail low that occurred that for a second I thought it stalled, so I definitely think that's something that Jonathan could do some more tinkering with.

While we're also at it, I tried doing some ultra-bullet testing with Hopper (tc=inf/10+0.1s) since that's the format I normally use to measure Elo gain and Hopper had some issues handling the time control format, so MPS format may not necessarily be the issue? Not really sure... I'm sure Jonathan could provide some insight here as well.

And as far as Blunder goes, you're also right. Currently, for an MPS format, Blunder allocates the allotted time per move simply by doing time_left / moves_to_go, which is definitely linear. I haven't worked on the time management aspect of Blunder in a while, and I think I'll spend some time playing around with that before v7.2.0 is released. I just remember when I first started, I wanted to at least make sure Blunder's time management system was robust and would never allow the engine to lose on time, if nothing else. So I took inspiration from http://mediocrechess.blogspot.com/2007/ ... ement.html, made some small tweaks to allow different time formats, and ran with it (I also realized I never properly credited the blog post, and if my time management scheme doesn't change before the next release, I'll definitely add that in).
Hopper definitely has time management issues, and I have never tested a game with it above 1 minute time controls. It just takes so long to get a decent sample size. There have been a million times where I'm testing the performance for a change and one version wins 80% of the first 10-20 games and then I check back an hour later and the other version was proven to be superior in the long run.

As for the exact time management issues, I think a lot of it stems from Hopper only checking the time every time an iterative deepening search has been completed. When the allocated time runs out and the best move has a significantly worse score than Hopper evaluated on the previous turn, it uses emergency time equal to 4x the initially allocated time as long as it hasn't already used up a third of the total time left, as doing another search would likely result in a timeout. I assume the standard way to get around the problem of constantly checking the clock is to use threads, but I haven't done much reading on that yet. These time settings seem to work decently on the fast time controls I test on (though admittedly I don't know what optimal time management looks like), but they might scale poorly to longer settings.
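Roughly, the rule described above could be sketched as follows. This is not Hopper's source: the function name, the 50-centipawn threshold for "significantly worse", and the reading of "a third of the total time left" as a third of the clock at the start of the move are all assumptions.

```python
def should_stop(elapsed_ms, allocated_ms, total_left_ms,
                score_cp, prev_score_cp, drop_cp=50):
    """Decide, after a completed iteration, whether to stop searching."""
    if elapsed_ms < allocated_ms:
        return False  # still inside the normal budget
    score_dropped = score_cp <= prev_score_cp - drop_cp
    if not score_dropped:
        return True   # budget used up and nothing alarming: stop
    # Emergency mode: allow up to 4x the original allocation...
    within_emergency = elapsed_ms < 4 * allocated_ms
    # ...but only while less than a third of the remaining clock is gone.
    within_third = elapsed_ms < total_left_ms / 3
    return not (within_emergency and within_third)
```

Because a check like this only runs between iterations, a single long iteration can run past both limits before the function is called again, which would be consistent with one move taking far longer than intended.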
Clovis GitHub
algerbrex
Posts: 596
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: New engine releases & news 2021

Post by algerbrex »

jmcd wrote: Mon Nov 08, 2021 8:38 pm It just takes so long to get a decent sample size...
Have you looked into testing using bullet games? As far as I'm aware, this is the accepted and most common (but not the only!) method used today to measure Elo gains or losses in an engine, including for stronger engines like Stockfish (actually, Stockfish may have pioneered this method, though I don't have enough knowledge to make an intelligent comment on this). It's also much quicker than running thousands of games at a time control of, say, 40/5 min.

The basic idea is that we have two versions of an engine play thousands of games at ultra-fast time controls and use an engine testing framework like cutechess to measure the Elo loss or gain based on the number of wins, losses, or draws for each side.

For instance, my standard format for testing Elo gaining/losing features in Blunder is to use cutechess to run 2000 games between the new version and the old version with a time control of 10+0.1s. One testing session usually takes anywhere between 4-6 hours (I don't have a great computer right now), and at the end cutechess would give an output looking something like this:

Code:

Score of Blunder 1.0.0 vs MinimalChess 0.2: 323 - 142 - 189  [0.638] 654
...      Blunder 1.0.0 playing White: 152 - 83 - 92  [0.606] 327
...      Blunder 1.0.0 playing Black: 171 - 59 - 97  [0.671] 327
...      White vs Black: 211 - 254 - 189  [0.467] 654
Elo difference: 98.7 +/- 23.0, LOS: 100.0 %, DrawRatio: 28.9 %
SPRT: llr 2.95 (100.2%), lbound -2.94, ubound 2.94 - H1 was accepted
Finished match
The above example isn't a perfect example since it was from an older time, but hopefully, you can get the gist of what I'm saying, and you can see how near the bottom cutechess has calculated what it expects the Elo difference to be between the two engines in the test. In the above case, it believes that Blunder 1.0.0 is 98.7 Elo stronger than MinimalChess 0.2, plus or minus 23 Elo.
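The Elo figure can be reproduced by hand from the win/loss/draw counts using the standard logistic Elo formula. The sketch below only covers the point estimate and the draw ratio. The +/- error margin is left out because the exact method cutechess uses for it is not shown in this thread.

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by an average score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


# Counts taken from the "Score of Blunder 1.0.0 vs MinimalChess 0.2" line.
wins, losses, draws = 323, 142, 189
games = wins + losses + draws
score = (wins + 0.5 * draws) / games
elo = elo_from_score(score)
draw_ratio = 100.0 * draws / games

print(f"games={games} score={score:.3f} elo={elo:+.1f} draws={draw_ratio:.1f}%")
```

The computed values should agree with the 0.638 score, the 98.7 Elo difference, and the 28.9 % draw ratio in the output above.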

I also occasionally use SPRT testing if I suspect one version of my engine I'm testing will be significantly stronger than the other. SPRT saves time since it'll end tests after a certain number of games when it's fairly confident it has the right idea about the strength difference between the programs. (Marcel Vanthoor has a nice writeup on this here https://rustic-chess.org/progress/sprt_testing.html).
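The lbound and ubound values in the sample output are the stopping thresholds of the sequential test. A small sketch of where they come from, assuming error rates of alpha = beta = 0.05 (an assumption on my part, chosen because it reproduces the -2.94 / 2.94 shown above). How the log-likelihood ratio itself is computed from game results is not covered here.

```python
import math


def sprt_bounds(alpha: float = 0.05, beta: float = 0.05):
    """Wald's stopping thresholds for the log-likelihood ratio (LLR)."""
    lower = math.log(beta / (1.0 - alpha))   # accept H0 at or below this
    upper = math.log((1.0 - beta) / alpha)   # accept H1 at or above this
    return lower, upper


def sprt_decision(llr: float, alpha: float = 0.05, beta: float = 0.05) -> str:
    """Map a running LLR value to 'continue' or an accepted hypothesis."""
    lower, upper = sprt_bounds(alpha, beta)
    if llr >= upper:
        return "H1 accepted"
    if llr <= lower:
        return "H0 accepted"
    return "continue"


lower, upper = sprt_bounds()
```

The test keeps running games while the LLR sits between the two bounds, which is why a clearly stronger (or clearly weaker) version finishes early.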

And see here to download cutechess and the command line tools it offers: https://github.com/cutechess/cutechess/releases

Apologies if you're already aware of all of this, but from your comment, it sounded as if you weren't, and I'd consider it to be one of the best tools you can use to quickly improve your engine, and know you're making improvements, instead of having to guess after only a few statistically insignificant games.
jmcd wrote: Mon Nov 08, 2021 8:38 pm As for the exact time management issues, I think a lot of it stems from Hopper only checking the time every time an iterative deepening search has been completed. When the allocated time runs out and the best move has a significantly worse score than Hopper evaluated on the previous turn, it uses emergency time equal to 4x the initially allocated time as long as it hasn't already used up a third of the total time left, as doing another search would likely result in a timeout. I assume the standard way to get around the problem of constantly checking the clock is to use threads, but I haven't done much reading on that yet. These time settings seem to work decently on the fast time controls I test on (though admittedly I dont know what optimal time management looks like), but they might scale poorly to longer settings.
The method I use and I've seen others use as well is to check for the time being up at the top of every call to the main alpha-beta and the qsearch routines. I've found this method is pretty robust and allows Blunder to stick to the time allotted for a given move.
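A toy illustration of that idea, not Blunder's code: a negamax over nested lists that checks a deadline at the top of every call and unwinds with an exception, while the iterative deepening loop keeps the last fully completed result. The tree format, the SearchTimeout name, and the omission of a separate qsearch are simplifications for the sketch.

```python
import time


class SearchTimeout(Exception):
    """Raised inside the search when the move's time budget is spent."""


def negamax(node, depth, alpha, beta, deadline):
    # Time check at the top of every call, before any other work.
    if time.monotonic() >= deadline:
        raise SearchTimeout
    if isinstance(node, int):
        return node  # leaf: score from the side to move's point of view
    if depth == 0:
        return 0     # placeholder static evaluation for inner nodes
    best = -10**9
    for child in node:
        score = -negamax(child, depth - 1, -beta, -alpha, deadline)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # beta cutoff
    return best


def search(root, max_depth, budget_s):
    """Iterative deepening that keeps the last fully completed result."""
    deadline = time.monotonic() + budget_s
    result = None
    for depth in range(1, max_depth + 1):
        try:
            result = negamax(root, depth, -10**9, 10**9, deadline)
        except SearchTimeout:
            break  # discard the partial iteration
    return result
```

Since an unfinished iteration is thrown away, the engine never plays a move based on a half-searched tree, and the overshoot is limited to the work done between two consecutive checks.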
jmcd
Posts: 58
Joined: Wed Mar 18, 2020 10:00 pm
Full name: Jonathan McDermid

Re: New engine releases & news 2021

Post by jmcd »

algerbrex wrote: Mon Nov 08, 2021 9:24 pm
Apologies if you're already aware of all of this, but from your comment, it sounded as if you weren't, and I'd consider it to be one of the best tools you can use to quickly improve your engine, and know you're making improvements, instead of having to guess after only a few statistically insignificant games.

The method I use and I've seen others use as well is to check for the time being up at the top of every call to the main alpha-beta and the qsearch routines. I've found this method is pretty robust and allows Blunder to stick to the time allotted for a given move.
I'm aware of cutechess. Currently I'm using Arena and it does the job. I prefer whatever Brunetti and some of the other testers here are doing, though, because they have info on illegal moves and timeouts, which aren't displayed in the final result in Arena (though I assume there is some way to do it). I've also seen people talking about benchmark or something like that, and it's on the long list of things that I need to look into.

I don't really guess off a few insignificant games, but my point was that I am not surprised that Hopper exhibits strange behaviour on long time controls because I've never tested it with them.

I've always thought that the time check at every single node seems like a waste. I'm not sure how expensive time check functions are (I assume pretty cheap) but still, if your usable resolution is 1 ms and you're going through 1 million nodes a second, why check every node? Seems like a waste to me, but perhaps never going over your allocated time pays off. Also, I think that threads might solve this problem entirely if I understand their capabilities correctly.
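One common middle ground between checking at every node and checking only between iterations is to count nodes and read the clock once every N of them. A minimal sketch, in which the Clock class and the interval of 2048 are illustrative assumptions:

```python
import time


class Clock:
    """Polls the system clock only once every `interval` nodes."""

    def __init__(self, budget_s: float, interval: int = 2048):
        self.deadline = time.monotonic() + budget_s
        self.interval = interval
        self.nodes = 0
        self.polls = 0
        self.stopped = False

    def out_of_time(self) -> bool:
        """Call once per node; cheap except on every `interval`-th call."""
        self.nodes += 1
        if self.nodes % self.interval == 0 and not self.stopped:
            self.polls += 1
            self.stopped = time.monotonic() >= self.deadline
        return self.stopped


# Usage: 10,000 simulated nodes with a generous budget.
clock = Clock(budget_s=60.0, interval=1000)
for _ in range(10_000):
    clock.out_of_time()
```

At 1 million nodes per second, an interval of 2048 means the clock is read about once every 2 ms, so the possible overshoot stays around that size while the per-node cost is just an increment and a comparison.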
Clovis GitHub
Guenther
Posts: 4607
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: New engine releases & news 2021

Post by Guenther »

jmcd wrote: Mon Nov 08, 2021 8:38 pm
algerbrex wrote: Mon Nov 08, 2021 2:36 pm
Yup, I noticed that as well. Hopper took so long to resolve the fail low that occurred that for a second I thought it stalled, so I definitely think that's something that Jonathan could do some more tinkering with.

While we're also at it, I tried doing some ultra-bullet testing with Hopper (tc=inf/10+0.1s) since that's the format I normally use to measure Elo gain and Hopper had some issues handling the time control format, so MPS format may not necessarily be the issue? Not really sure... I'm sure Jonathan could provide some insight here as well.

And as far as Blunder goes, you're also right. Currently, for an MPS format, Blunder allocates the allotted time per move simply by doing time_left / moves_to_go, which is definitely linear. I haven't worked on the time management aspect of Blunder in a while, and I think I'll spend some time playing around with that before v7.2.0 is released. I just remember when I first started, I wanted to at least make sure Blunder's time management system was robust and would never allow the engine to lose on time, if nothing else. So I took inspiration from http://mediocrechess.blogspot.com/2007/ ... ement.html, made some small tweaks to allow different time formats, and ran with it (I also realized I never properly credited the blog post, and if my time management scheme doesn't change before the next release, I'll definitely add that in).
Hopper definitely has time management issues, and I have never tested a game with it above 1 minute time controls. It just takes so long to get a decent sample size. There have been a million times where I'm testing the performance for a change and one version wins 80% of the first 10-20 games and then I check back an hour later and the other version was proven to be superior in the long run.

As for the exact time management issues, I think a lot of it stems from Hopper only checking the time every time an iterative deepening search has been completed. When the allocated time runs out and the best move has a significantly worse score than Hopper evaluated on the previous turn, it uses emergency time equal to 4x the initially allocated time as long as it hasn't already used up a third of the total time left, as doing another search would likely result in a timeout. I assume the standard way to get around the problem of constantly checking the clock is to use threads, but I haven't done much reading on that yet. These time settings seem to work decently on the fast time controls I test on (though admittedly I don't know what optimal time management looks like), but they might scale poorly to longer settings.
Hi Jonathan,

I think we have a misunderstanding here. I don't think the slower tc is the problem here, or how often a search depth is checked; it is a more general problem in your time management, at least for mps tcs (haven't checked your code yet).
It simply plays too fast. Now I have imported the pgn with a macro into a sheet to make it easier to see, see below (col headers should
be self-explanatory).

You can see that it had used only around 44.6% of the available time by move 30! That's very suboptimal if you know your full current time budget
is for 40 moves. Also, it is usually suggested to play the first moves after book end (in this case no book was used at all, BTW) even slower than
average, as programs still play weaker in the opening.
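The percentage columns are just running sums of the per-move times divided by the 300 s budget. A small sketch of that calculation, using Hopper's first five move times from the BT column (the function name is made up for the example):

```python
from itertools import accumulate


def cumulative_share(move_times_s, budget_s):
    """Percent of the period's budget used after each move."""
    return [round(100.0 * t / budget_s, 2) for t in accumulate(move_times_s)]


# Hopper's first five move times (BT column), 300 s for 40 moves.
black_times = [5.2, 2.2, 2.5, 5.3, 8.8]
shares = cumulative_share(black_times, 300.0)
print(shares)

# Even pace after 30 of 40 moves, for comparison with the 44.6% actually used.
even_pace = 100.0 * 30 / 40
print(even_pace)
```

An engine spending its budget evenly would be at 75% after move 30, so 44.6% leaves a large amount of time unused until late in the period.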

Edit:
@algerbrex BTW, if you are on WIN I had compiled a more advanced CuteChessGUI and cli version last year than the available one with
a lot of newer features and some little things I changed like always at least two digits time precision...
https://rwbc-chess.de/download.htm

Code:

[White "Blunder 7.1.0"]
[Black "Hopper 1.8"]
[TimeControl "40/300"]

Code:

M#	WMov	WEv	WD	WT	WDAvg	WTSum	WT%	BMov	BEv	BD	BT	BDAvg	BTSum	BT%
1	e4	0.13	14	7.500	14.00	7.500	2.50	e5	-0.18	13	5.200	13.00	5.200	1.73
2	Nf3	0.14	14	7.500	14.00	15.000	5.00	Nc6	-0.21	12	2.200	12.50	7.400	2.47
3	Bc4	0.16	13	7.500	13.50	22.500	7.50	Nf6	-0.08	13	2.500	12.67	9.900	3.30
4	d3	0.16	14	7.500	13.67	30.000	10.00	Bd6	0.03	13	5.300	12.75	15.200	5.07
5	O-O	0.42	14	7.500	13.75	37.500	12.50	Na5	-0.12	14	8.800	13.00	24.000	8.00
6	Bb3	0.51	14	7.500	13.80	45.000	15.00	Nxb3	-0.05	13	3.700	13.00	27.700	9.23
7	axb3	0.58	15	7.500	14.00	52.500	17.50	c5	-0.24	14	9.700	13.14	37.400	12.47
8	Nc3	0.72	14	7.500	14.00	60.000	20.00	O-O	-0.02	13	2.100	13.13	39.500	13.17
9	Bg5	0.86	14	7.500	14.00	67.500	22.50	h6	-0.04	15	2.100	13.33	41.600	13.87
10	Bxf6	0.79	15	7.500	14.11	75.000	25.00	Qxf6	0.01	16	2.600	13.60	44.200	14.73
11	Re1	0.64	14	7.500	14.10	82.500	27.50	b6	0.03	16	4.100	13.82	48.300	16.10
12	Nd5	0.72	15	7.500	14.18	90.000	30.00	Qd8	0.09	15	4.100	13.92	52.400	17.47
13	Nd2	0.72	13	7.500	14.08	97.500	32.50	b5	0.02	13	3.600	13.85	56.000	18.67
14	c4	0.84	13	7.500	14.00	105.000	35.00	b4	0.09	14	2.500	13.86	58.500	19.50
15	Nf1	0.63	13	7.500	13.93	112.500	37.50	Bb7	-0.11	13	4.000	13.80	62.500	20.83
16	Nfe3	0.73	14	7.500	13.93	120.000	40.00	g6	-0.23	13	4.400	13.75	66.900	22.30
17	Qg4	0.66	14	7.500	13.94	127.500	42.50	h5	-0.11	15	8.400	13.82	75.300	25.10
18	Qf3	0.61	15	7.500	14.00	135.000	45.00	Kg7	-0.12	15	3.400	13.89	78.700	26.23
19	Qg3	0.53	14	7.500	14.00	142.500	47.50	Kh7	-0.14	14	3.800	13.89	82.500	27.50
20	h3	0.53	14	7.500	14.00	150.000	50.00	a5	-0.10	14	3.500	13.90	86.000	28.67
21	Qf3	0.49	13	7.500	13.95	157.500	52.50	Kg7	-0.02	13	2.900	13.86	88.900	29.63
22	Qe2	0.39	14	7.500	13.95	165.000	55.00	Bc6	0.00	17	3.100	14.00	92.000	30.67
23	Qc2	0.23	13	7.500	13.91	172.500	57.50	Kg8	0.00	15	2.900	14.04	94.900	31.63
24	Qd1	0.39	13	7.500	13.87	180.000	60.00	h4	0.00	17	5.600	14.17	100.500	33.50
25	Qg4	0.58	15	7.500	13.92	187.500	62.50	Kh7	0.00	18	3.500	14.32	104.000	34.67
26	Kh2	0.43	14	7.500	13.92	195.000	65.00	Re8	0.00	17	14.000	14.42	118.000	39.33
27	Rf1	0.50	14	7.500	13.92	202.500	67.50	Kh8	0.00	14	3.400	14.41	121.400	40.47
28	g3	0.80	14	7.500	13.93	210.000	70.00	Be7	0.19	12	4.600	14.32	126.000	42.00
29	Nxe7	0.87	15	7.500	13.96	217.500	72.50	hxg3+	0.17	15	3.900	14.34	129.900	43.30
30	fxg3	2.36	15	7.500	14.00	225.000	75.00	Qxe7	-0.08	14	3.900	14.33	133.800	44.60
31	Nf5	2.69	15	7.500	14.03	232.500	77.50	Qf6	-3.89	22	135.000	14.58	268.800	89.60
32	Nh6	3.13	15	7.500	14.06	240.000	80.00	Qg7	-2.25	13	1.100	14.53	269.900	89.97
33	Nxf7+	4.25	16	7.500	14.13	247.500	82.50	Kg8	-2.37	14	1.200	14.52	271.100	90.37
34	Qg5	4.15	17	7.500	14.21	255.000	85.00	Rf8	-4.15	18	12.000	14.62	283.100	94.37
35	Rf6	4.12	17	7.500	14.29	262.500	87.50	Kh7	-4.40	17	4.800	14.69	287.900	95.97
36	Raf1	4.20	16	7.500	14.34	270.000	90.00	Rae8	-5.35	14	4.400	14.67	292.300	97.43
37	Qh4+	9.12	18	7.500	14.44	277.500	92.50	Kg8	-6.32	19	3.600	14.78	295.900	98.63
38	Nh6+	9.24	18	7.500	14.54	285.000	95.00	Kh8	-6.34	18	1.900	14.87	297.800	99.27
39	Rf7	9.43	18	7.500	14.63	292.500	97.50	Rxf7	-6.59	17	1.300	14.92	299.100	99.70
40	Nxf7+	9.43	16	7.000	14.67	299.500	99.83	Kg8		
https://rwbc-chess.de

trollwatch:
Talkchess nowadays is a joke - it is full of trolls/idiots/people stuck in the pleistocene > 80% of the posts fall into this category...