I've put it in my testing queue and expect it to appear in next week's update. Currently it's going to be v1.8, but I have already changed the version number twice.
New engine releases & news 2021
-
- Posts: 1408
- Joined: Sat Jul 21, 2018 7:43 am
- Location: Budapest, Hungary
- Full name: Gabor Szots
Re: New engine releases & news 2021
Gabor Szots
CCRL testing group
-
- Posts: 7274
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
Re: New engine releases & news 2021
Guenther wrote: ↑Mon Nov 08, 2021 9:29 am
BTW, for next year I will explicitly split the engine releases thread
into two threads: one only for the announcements
and the other for discussions/bug reports about them.
Of course everyone is still free to create new threads about
each and every program individually, especially the authors themselves.
Maybe I will also shorten the time span to half a year?!
Keep in mind that this thread was started to collect all that information
for future researchers. It will make their work much easier.
Guenther
What is needed is someone who is able and willing to maintain those two threads, since folks are not listening: someone who is given access to the forum without becoming a moderator himself. I want to ask the current mods (HGM & Harvey) to consider this option. I propose Guenther for the job, if he is willing of course.
90% of coding is debugging, the other 10% is writing bugs.
-
- Posts: 608
- Joined: Sun May 30, 2021 5:03 am
- Location: United States
- Full name: Christian Dean
Re: New engine releases & news 2021
Guenther wrote: ↑Mon Nov 08, 2021 9:29 am
I can see from that game alone that Hopper has a time management problem (at least in mps tcs)!
Of course I could calculate it exactly with my import macro, but the fact that it can still use
135s on move 31 trying to resolve the fail low after Nf5 is already proof enough
for an mps game (40/300)!
(Blunder's time management, OTOH, is practically linear here, which could be optimized too.)
I guess it is usually not tested with mps time controls?
Yup, I noticed that as well. Hopper took so long to resolve the fail low that for a second I thought it had stalled, so I definitely think that's something Jonathan could do some more tinkering with.
While we're at it, I also tried some ultra-bullet testing with Hopper (tc=inf/10+0.1s), since that's the format I normally use to measure Elo gain, and Hopper had some issues handling that time control too, so the MPS format may not be the real issue. I'm not really sure; I'm sure Jonathan could provide some insight here as well.
And as far as Blunder goes, you're right too. Currently, in an MPS format, Blunder allocates time per move simply as time_left / moves_to_go, which is definitely linear. I haven't worked on Blunder's time management in a while, and I think I'll spend some time playing around with it before v7.2.0 is released. When I first started, I mainly wanted to make sure Blunder's time management was robust and would never let the engine lose on time. So I took inspiration from http://mediocrechess.blogspot.com/2007/ ... ement.html, made some small tweaks to allow different time formats, and ran with it (I also realized I never properly credited that blog post; if my time management scheme doesn't change before the next release, I'll definitely add the credit).
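To make "linear" concrete, here is a minimal Python sketch of the `time_left / moves_to_go` rule for a moves-per-session control. The function names are hypothetical and this is not Blunder's actual code. If the engine spends exactly its allocation on every move, each move of a 40/300 session gets the same 7.5 s, which matches the 7.500 s Blunder used on almost every move in the game table Guenther posted in this thread.

```python
# Minimal sketch of linear time allocation for a moves-per-session (MPS)
# time control such as 40/300. Function names are hypothetical.


def allocate_linear(time_left: float, moves_to_go: int) -> float:
    """Seconds to spend on the next move: an equal share of the remaining time."""
    return time_left / max(moves_to_go, 1)


def simulate_session(budget: float, moves: int) -> list[float]:
    """Assume the engine uses exactly its allocation on each move of one session."""
    spent = []
    time_left = budget
    for played in range(moves):
        alloc = allocate_linear(time_left, moves - played)
        spent.append(alloc)
        time_left -= alloc
    return spent


if __name__ == "__main__":
    per_move = simulate_session(300.0, 40)
    print(f"first move: {per_move[0]:.3f} s, last move: {per_move[-1]:.3f} s")
```

Because every share is recomputed from the time actually left, the plan never overdraws the session, but it also gives the first move out of the opening no more time than a trivial recapture at move 39.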
Guenther wrote: ↑Mon Nov 08, 2021 9:29 am
BTW, for next year I will explicitly split the engine releases thread
into two threads: one only for the announcements
and the other for discussions/bug reports about them.
Of course everyone is still free to create new threads about
each and every program individually, especially the authors themselves.
Maybe I will also shorten the time span to half a year?!
Keep in mind that this thread was started to collect all that information
for future researchers. It will make their work much easier.
Guenther
Hmm, sounds like that might work. It might be easier to keep version announcements and feedback separate, although that may be slightly more cumbersome.
And with the rate at which new engines are popping up, half-years might just be the way to go.
-
- Posts: 953
- Joined: Fri Mar 10, 2006 4:29 pm
- Location: Germany
- Full name: Jörg Oster
Re: New engine releases & news 2021
Matefish 1.0 https://github.com/joergoster/Stockfish/tree/matefish
A mate-solving engine like Chest, but nowhere near its performance.
Still at the very beginning of development.
See also this thread: https://talkchess.com/forum3/viewtopic.php?f=2&t=78595
Jörg Oster
-
- Posts: 3519
- Joined: Wed Mar 08, 2006 8:15 pm
- Full name: Jouni Uski
Re: New engine releases & news 2021
[fen]8/8/1R6/2K5/8/k7/8/8 w - - 0 0[/fen]
info string No mate in 1 found
<< info string No mate in 1 found
<< info string No mate in 1 found
<< info string No mate in 2 found
<< info string No mate in 1 found
<< info string No mate in 1 found
<< info string No mate in 1 found
<< info string No mate in 2 found
<< info string No mate in 2 found
<< info string No mate in 2 found
<< info string No mate in 2 found
<< info string No mate in 3 found
<< info string No mate in 3 found
<< info string No mate in 3 found
<< info string No mate in 3 found
<< info string No mate in 3 found
<< info string No mate in 2 found
<< info string No mate in 3 found
<< info string No mate in 4 found
<< info string No mate in 4 found
<< info string No mate in 4 found
<< info string No mate in 4 found
<< info string No mate in 4 found
<< info string No mate in 4 found
<< info string No mate in 5 found
<< info string No mate in 5 found
<< info string No mate in 5 found
<< info string No mate in 5 found
<< info string No mate in 5 found
<< info string No mate in 5 found
I already gave it a try, but no mate has been found so far.
Jouni
-
- Posts: 953
- Joined: Fri Mar 10, 2006 4:29 pm
- Location: Germany
- Full name: Jörg Oster
Re: New engine releases & news 2021
Code:
position fen 8/8/1R6/2K5/8/k7/8/8 w - - 0 1
d
+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   | 8
+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   | 7
+---+---+---+---+---+---+---+---+
|   | R |   |   |   |   |   |   | 6
+---+---+---+---+---+---+---+---+
|   |   | K |   |   |   |   |   | 5
+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   | 4
+---+---+---+---+---+---+---+---+
| k |   |   |   |   |   |   |   | 3
+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   | 2
+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   | 1
+---+---+---+---+---+---+---+---+
  a   b   c   d   e   f   g   h
Fen: 8/8/1R6/2K5/8/k7/8/8 w - - 0 1
Key: 6DD360BCC1B85DCB
Checkers:
go mate 4
info string No mate in 1 found
info string No mate in 2 found
info string No mate in 3 found
info string Success! Mate in 4 found!
info depth 5 seldepth 7 multipv 1 score mate 4 nodes 26746 nps 1910428 tbhits 0 time 14 pv c5c4 a3a2 c4c3 a2a1 c3c2 a1a2 b6a6
bestmove c5c4 ponder a3a2
No tests have been done under any GUI so far.
Matefish cannot make use of more than one thread;
more precisely, all threads will do exactly the same work.
This will change in the future.
Jörg Oster
-
- Posts: 58
- Joined: Wed Mar 18, 2020 10:00 pm
- Full name: Jonathan McDermid
Re: New engine releases & news 2021
algerbrex wrote: ↑Mon Nov 08, 2021 2:36 pm
Yup, I noticed that as well. Hopper took so long to resolve the fail low that for a second I thought it had stalled, so I definitely think that's something Jonathan could do some more tinkering with.
While we're at it, I also tried some ultra-bullet testing with Hopper (tc=inf/10+0.1s), since that's the format I normally use to measure Elo gain, and Hopper had some issues handling that time control too, so the MPS format may not be the real issue. I'm not really sure; I'm sure Jonathan could provide some insight here as well.
And as far as Blunder goes, you're right too. Currently, in an MPS format, Blunder allocates time per move simply as time_left / moves_to_go, which is definitely linear. I haven't worked on Blunder's time management in a while, and I think I'll spend some time playing around with it before v7.2.0 is released. When I first started, I mainly wanted to make sure Blunder's time management was robust and would never let the engine lose on time. So I took inspiration from http://mediocrechess.blogspot.com/2007/ ... ement.html, made some small tweaks to allow different time formats, and ran with it (I also realized I never properly credited that blog post; if my time management scheme doesn't change before the next release, I'll definitely add the credit).
Hopper definitely has time management issues, and I have never tested a game with it at time controls above 1 minute. It just takes so long to get a decent sample size. There have been a million times where I'm testing the performance of a change, one version wins 80% of the first 10-20 games, and when I check back an hour later the other version has proven superior in the long run.
As for the exact time management issues, I think a lot of it stems from Hopper only checking the time after each iterative deepening iteration has completed. When the allocated time runs out and the best move has a significantly worse score than Hopper's evaluation on the previous turn, it uses emergency time equal to 4x the initially allocated time, as long as it hasn't already used up a third of the total time left, since doing another search beyond that would likely result in a timeout. I assume the standard way to get around the problem of constantly checking the clock is to use threads, but I haven't done much reading on that yet. These time settings seem to work decently at the fast time controls I test with (though admittedly I don't know what optimal time management looks like), but they might scale poorly to longer settings.
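A rough Python sketch of the scheme described above, under stated assumptions: the clock is read only after a full iterative-deepening iteration, and one emergency extension to 4x the allocation is granted when the score has dropped by more than a margin and less than a third of the remaining clock time has been used. The 50 cp `drop_margin`, the function names and the fake clock are hypothetical; this is one reading of the description, not Hopper's code.

```python
from typing import Callable, Tuple

# One full iterative-deepening iteration: depth -> (best move, score in centipawns).
SearchFn = Callable[[int], Tuple[str, int]]


def iterative_deepening(search: SearchFn, now: Callable[[], float], allocated: float,
                        time_left: float, prev_score: int, drop_margin: int = 50,
                        max_depth: int = 64) -> Tuple[str, int]:
    """Hypothetical reading of the scheme: the clock is only read between iterations."""
    start = now()
    budget = allocated
    extended = False
    best = ("", 0)
    for depth in range(1, max_depth + 1):
        best = search(depth)        # never interrupted, however long it takes
        elapsed = now() - start
        if elapsed < budget:
            continue                # time left: start the next, deeper iteration
        score_dropped = best[1] < prev_score - drop_margin
        if not extended and score_dropped and elapsed < time_left / 3:
            budget = 4 * allocated  # emergency time: 4x the original allocation
            extended = True
            continue
        break
    return best


if __name__ == "__main__":
    clock = [0.0]

    def fake_search(depth: int) -> Tuple[str, int]:
        clock[0] += 0.1 * 3 ** depth  # each iteration takes ~3x as long as the last
        return ("bestmove", -300)     # score collapsed compared with the previous move

    iterative_deepening(fake_search, lambda: clock[0], allocated=7.5,
                        time_left=160.0, prev_score=-10)
    print(f"used {clock[0]:.1f} s of a 7.5 s allocation")
```

With these made-up timings the loop stops after the fifth iteration at roughly 36 s, past even the 30 s emergency budget, because an iteration that has started always runs to completion. This is similar in spirit to the long think on move 31 of the game discussed here, and it is the overshoot that checking the clock inside the search is meant to bound.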
Clovis GitHub
-
- Posts: 608
- Joined: Sun May 30, 2021 5:03 am
- Location: United States
- Full name: Christian Dean
Re: New engine releases & news 2021
Have you looked into testing with bullet games? As far as I'm aware, this is the accepted and most common (but not the only!) method used today to measure Elo gains or losses in an engine, including for strong engines like Stockfish (Stockfish may actually have pioneered this method, though I don't know enough to comment on that intelligently). It's also much quicker than running thousands of games at a time control of, say, 40/5 min.
The basic idea is that two versions of an engine play thousands of games against each other at ultra-fast time controls, and an engine testing framework like cutechess measures the Elo gain or loss from the numbers of wins, losses, and draws.
For instance, my standard format for testing Elo-gaining/losing features in Blunder is to use cutechess to run 2000 games between the new version and the old version at a time control of 10+0.1s. One testing session usually takes 4-6 hours (I don't have a great computer right now), and at the end cutechess gives output that looks something like this:
Code:
Score of Blunder 1.0.0 vs MinimalChess 0.2: 323 - 142 - 189 [0.638] 654
... Blunder 1.0.0 playing White: 152 - 83 - 92 [0.606] 327
... Blunder 1.0.0 playing Black: 171 - 59 - 97 [0.671] 327
... White vs Black: 211 - 254 - 189 [0.467] 654
Elo difference: 98.7 +/- 23.0, LOS: 100.0 %, DrawRatio: 28.9 %
SPRT: llr 2.95 (100.2%), lbound -2.94, ubound 2.94 - H1 was accepted
Finished match
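The headline numbers in such a report can be recomputed from the win/loss/draw counts. Below is a minimal Python sketch using the standard logistic Elo model and the usual likelihood-of-superiority (LOS) formula; the function names are my own. For the 323 - 142 - 189 result above it gives a score of 0.638, +98.7 Elo, LOS 100.0 % and a draw ratio of 28.9 %, in line with the cutechess output. The ±23.0 error margin needs a variance estimate and is not reproduced here.

```python
import math


def score(wins: int, losses: int, draws: int) -> float:
    """Average points per game, counting a draw as half a point."""
    return (wins + 0.5 * draws) / (wins + losses + draws)


def elo_diff(wins: int, losses: int, draws: int) -> float:
    """Elo difference implied by the score under the logistic model."""
    s = score(wins, losses, draws)
    if not 0.0 < s < 1.0:
        raise ValueError("Elo difference is unbounded for a 0% or 100% score")
    return -400.0 * math.log10(1.0 / s - 1.0)


def los(wins: int, losses: int) -> float:
    """Likelihood of superiority; draws carry no information in this formula."""
    if wins + losses == 0:
        return 0.5
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))


def draw_ratio(wins: int, losses: int, draws: int) -> float:
    return draws / (wins + losses + draws)


if __name__ == "__main__":
    wins, losses, draws = 323, 142, 189
    print(f"score {score(wins, losses, draws):.3f}, "
          f"Elo {elo_diff(wins, losses, draws):+.1f}, "
          f"LOS {100 * los(wins, losses):.1f} %, "
          f"DrawRatio {100 * draw_ratio(wins, losses, draws):.1f} %")
```

The same functions applied to the per-colour lines (for example 152 - 83 - 92 gives a score of 0.606) reproduce the bracketed scores as well.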
I also occasionally use SPRT testing if I suspect the version I'm testing will be significantly stronger than the other. SPRT saves time because, instead of always playing a fixed number of games, it ends a test as soon as the results are conclusive enough to accept one of two hypotheses about the strength difference between the programs. (Marcel Vanthoor has a nice writeup on this here: https://rustic-chess.org/progress/sprt_testing.html)
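For the SPRT line, the stopping bounds follow directly from the error rates: with alpha = beta = 0.05 they are ln(0.05/0.95) and ln(0.95/0.05), about ±2.94, which are the lbound/ubound values in the output above. A test stops as soon as the log-likelihood ratio (LLR) leaves that interval. The Python sketch below pairs those bounds with a simple normal approximation of the LLR for H0: Elo = elo0 versus H1: Elo = elo1. That approximation and all names are assumptions of this sketch; cutechess computes its LLR with its own model, so the values will not match it exactly.

```python
import math


def sprt_bounds(alpha: float = 0.05, beta: float = 0.05) -> tuple[float, float]:
    """Wald's stopping bounds (lower, upper) for the log-likelihood ratio."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)


def expected_score(elo: float) -> float:
    """Expected score for a given Elo difference (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def llr_normal(wins: int, losses: int, draws: int, elo0: float, elo1: float) -> float:
    """Approximate LLR of H1 (elo1) vs H0 (elo0), treating per-game scores as normal."""
    n = wins + losses + draws
    if n == 0:
        return 0.0
    s = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    if var == 0.0:
        return 0.0  # all results identical: not enough information yet
    s0, s1 = expected_score(elo0), expected_score(elo1)
    return n * (s1 - s0) * (2.0 * s - s0 - s1) / (2.0 * var)


def sprt_decision(llr: float, alpha: float = 0.05, beta: float = 0.05) -> str:
    lower, upper = sprt_bounds(alpha, beta)
    if llr >= upper:
        return "H1 accepted"
    if llr <= lower:
        return "H0 accepted"
    return "continue"


if __name__ == "__main__":
    print(sprt_bounds())
    print(sprt_decision(2.95))  # the llr value reported by cutechess above
    print(sprt_decision(llr_normal(600, 200, 200, elo0=0, elo1=10)))
```

Calling `llr_normal` after every finished game and stopping on the first decision other than "continue" is what lets an SPRT run end well before a fixed game count when the difference is large.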
And see here to download cutechess and the command line tools it offers: https://github.com/cutechess/cutechess/releases
Apologies if you're already aware of all of this, but from your comment, it sounded as if you weren't, and I'd consider it to be one of the best tools you can use to quickly improve your engine, and know you're making improvements, instead of having to guess after only a few statistically insignificant games.
jmcd wrote: ↑Mon Nov 08, 2021 8:38 pm
As for the exact time management issues, I think a lot of it stems from Hopper only checking the time after each iterative deepening iteration has completed. When the allocated time runs out and the best move has a significantly worse score than Hopper's evaluation on the previous turn, it uses emergency time equal to 4x the initially allocated time, as long as it hasn't already used up a third of the total time left, since doing another search beyond that would likely result in a timeout. I assume the standard way to get around the problem of constantly checking the clock is to use threads, but I haven't done much reading on that yet. These time settings seem to work decently at the fast time controls I test with (though admittedly I don't know what optimal time management looks like), but they might scale poorly to longer settings.
The method I use, which I've seen others use as well, is to check whether the time is up at the top of every call to the main alpha-beta and qsearch routines. I've found this method pretty robust, and it lets Blunder stick to the time allotted for a given move.
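A toy Python sketch of that approach: a time check at the top of every search call, with a `check_every` knob so the clock is read only every N nodes (N = 1 reproduces checking at every node). The tree, the evaluation, the exception-based unwinding and all names are hypothetical stand-ins, and the clock is injected so the behaviour is deterministic. A real engine would place the same check at the top of qsearch too and discard the unfinished iteration's result when the search unwinds.

```python
import time
from typing import Callable


class SearchTimeout(Exception):
    """Raised to unwind the whole search once the deadline has passed."""


class Searcher:
    def __init__(self, now: Callable[[], float], deadline: float, check_every: int = 1):
        self.now = now                  # injected clock, e.g. time.monotonic
        self.deadline = deadline
        self.check_every = check_every  # 1 = read the clock at every node
        self.nodes = 0
        self.clock_reads = 0

    def check_time(self) -> None:
        """Called at the top of every alpha-beta (and, in a real engine, qsearch) call."""
        self.nodes += 1
        if self.nodes % self.check_every == 0:
            self.clock_reads += 1
            if self.now() >= self.deadline:
                raise SearchTimeout

    def negamax(self, node: int, depth: int, alpha: int, beta: int) -> int:
        """Fail-hard alpha-beta over a toy tree where node n has children 3n+1..3n+3."""
        self.check_time()
        if depth == 0:
            return (node * 7919) % 201 - 100  # toy static evaluation in [-100, 100]
        for child in range(3 * node + 1, 3 * node + 4):
            score = -self.negamax(child, depth - 1, -beta, -alpha)
            if score >= beta:
                return beta
            alpha = max(alpha, score)
        return alpha


if __name__ == "__main__":
    searcher = Searcher(now=time.monotonic, deadline=time.monotonic() + 0.05,
                        check_every=1024)
    try:
        print("score", searcher.negamax(0, 8, -1000, 1000))
    except SearchTimeout:
        print("stopped after", searcher.nodes, "nodes")
    print("clock reads:", searcher.clock_reads)
```

The search itself is identical for any `check_every`; only the number of clock reads changes, at the cost of overshooting the deadline by up to `check_every - 1` nodes.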
-
- Posts: 58
- Joined: Wed Mar 18, 2020 10:00 pm
- Full name: Jonathan McDermid
Re: New engine releases & news 2021
algerbrex wrote: ↑Mon Nov 08, 2021 9:24 pm
Apologies if you're already aware of all of this, but from your comment, it sounded as if you weren't, and I'd consider it to be one of the best tools you can use to quickly improve your engine, and know you're making improvements, instead of having to guess after only a few statistically insignificant games.
The method I use, which I've seen others use as well, is to check whether the time is up at the top of every call to the main alpha-beta and qsearch routines. I've found this method pretty robust, and it lets Blunder stick to the time allotted for a given move.
I'm aware of cutechess. Currently I'm using Arena and it does the job. I do prefer whatever Brunetti and some of the other testers here are using, though, because it shows info on illegal moves and timeouts, which Arena doesn't display in the final result (though I assume there is some way to do it). I've also seen people talking about a benchmark or something like that; it's on the long list of things I need to look into.
I don't really draw conclusions from a few insignificant games; my point was that I'm not surprised Hopper behaves strangely at long time controls, because I've never tested it with them.
I've always thought that checking the time at every single node seems wasteful. I'm not sure how expensive time-check functions are (I assume pretty cheap), but if your usable resolution is 1 ms and you're searching 1 million nodes a second, that's about 1,000 nodes per millisecond, so why check every node? Perhaps never going over your allocated time pays off. I also think threads might solve this problem entirely, if I understand their capabilities correctly.
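On threads: one common pattern, sketched here in Python with hypothetical names, is to let a timer thread set a shared stop flag when the allocation expires, so the search never reads the clock itself and only polls the flag. In C++ or Go that poll is an atomic load and is very cheap; in Python this toy only shows the structure, not the cost. The search still has to look at the flag regularly, so the thread replaces the clock read rather than removing the check altogether.

```python
import threading


def search_until_stopped(stop: threading.Event, max_nodes: int = 10**9) -> int:
    """Toy search loop: one unit of work per 'node', polling the stop flag each time."""
    nodes = 0
    while nodes < max_nodes and not stop.is_set():
        nodes += 1  # stand-in for expanding one node
    return nodes


def timed_search(seconds: float) -> int:
    stop = threading.Event()
    timer = threading.Timer(seconds, stop.set)  # calls stop.set() from a timer thread
    timer.start()
    try:
        return search_until_stopped(stop)
    finally:
        timer.cancel()  # no effect if the timer has already fired


if __name__ == "__main__":
    print(f"searched {timed_search(0.05)} toy nodes in about 50 ms")
```

The same flag can also be set by the thread that reads UCI input when a `stop` command arrives, so one check in the search covers both cases.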
Clovis GitHub
-
- Posts: 4718
- Joined: Wed Oct 01, 2008 6:33 am
- Location: Regensburg, Germany
- Full name: Guenther Simon
Re: New engine releases & news 2021
Hi Jonathan,
jmcd wrote: ↑Mon Nov 08, 2021 8:38 pm
Hopper definitely has time management issues, and I have never tested a game with it at time controls above 1 minute. It just takes so long to get a decent sample size. There have been a million times where I'm testing the performance of a change, one version wins 80% of the first 10-20 games, and when I check back an hour later the other version has proven superior in the long run.
As for the exact time management issues, I think a lot of it stems from Hopper only checking the time after each iterative deepening iteration has completed. When the allocated time runs out and the best move has a significantly worse score than Hopper's evaluation on the previous turn, it uses emergency time equal to 4x the initially allocated time, as long as it hasn't already used up a third of the total time left, since doing another search beyond that would likely result in a timeout. I assume the standard way to get around the problem of constantly checking the clock is to use threads, but I haven't done much reading on that yet. These time settings seem to work decently at the fast time controls I test with (though admittedly I don't know what optimal time management looks like), but they might scale poorly to longer settings.
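I think we have a misunderstanding here. I don't think the slower tc is the problem, or how often the time is checked during a search; it is a more general problem in your time management, at least for mps tcs (I haven't checked your code yet).
It simply plays too fast. I have now imported the PGN into a spreadsheet with a macro to make this more visible; see below. The column headers should be self-explanatory: M# = move number; W/B = White/Black; Mov = move, Ev = evaluation, D = search depth, T = time used for the move in seconds, DAvg = running average depth, TSum = cumulative time used, T% = cumulative share of the 300 s budget.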
You can see that by move 30 it had used only 44.6% of its available time! That is very suboptimal when you know your full current time budget has to cover 40 moves. It is also usually recommended to spend more than the average time on the first moves after leaving book (no book was used at all in this game, BTW), because programs still play relatively weaker in the opening.
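One way to follow that advice while still never planning to exceed the session budget, sketched in Python: give each move of the session a weight that decreases linearly and spend `time_left * w_k / (sum of the remaining weights)`. The weighting, the `front_load` value of 0.5 and the names are arbitrary assumptions for illustration, not what Blunder or Hopper do; with `front_load=0` it reduces to the plain `time_left / moves_to_go` split.

```python
def weights(moves_per_session: int, front_load: float) -> list[float]:
    """Weight for each move of the session; earlier moves weigh more."""
    return [1.0 + front_load * (1.0 - k / moves_per_session)
            for k in range(moves_per_session)]


def allocate_weighted(time_left: float, move_index: int, moves_per_session: int,
                      front_load: float = 0.5) -> float:
    """Seconds for move `move_index` (0-based) given the time left in the session."""
    w = weights(moves_per_session, front_load)
    return time_left * w[move_index] / sum(w[move_index:])


def simulate(budget: float = 300.0, moves: int = 40, front_load: float = 0.5) -> list[float]:
    """Assume the engine spends exactly its allocation on every move."""
    time_left = budget
    spent = []
    for k in range(moves):
        t = allocate_weighted(time_left, k, moves, front_load)
        spent.append(t)
        time_left -= t
    return spent


if __name__ == "__main__":
    plan = simulate()
    print(f"move 1: {plan[0]:.2f} s, move 40: {plan[-1]:.2f} s, "
          f"used by move 30: {100 * sum(plan[:30]) / 300:.1f} %")
```

Because each share is recomputed from the time actually left, the last move of the session always receives exactly what remains. With these weights about 79% of the 300 s is planned for the first 30 moves, against 75% for the linear split and the 44.6% Hopper actually used.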
Edit:
@algerbrex BTW, if you are on Windows: last year I compiled a more advanced CuteChess GUI and cli version than the officially available one, with a lot of newer features and some small changes of my own, such as always showing times with at least two decimal digits of precision:
https://rwbc-chess.de/download.htm
Code:
[White "Blunder 7.1.0"]
[Black "Hopper 1.8"]
[TimeControl "40/300"]
Code:
M# WMov WEv WD WT WDAvg WTSum WT% BMov BEv BD BT BDAvg BTSum BT%
1 e4 0.13 14 7.500 14.00 7.500 2.50 e5 -0.18 13 5.200 13.00 5.200 1.73
2 Nf3 0.14 14 7.500 14.00 15.000 5.00 Nc6 -0.21 12 2.200 12.50 7.400 2.47
3 Bc4 0.16 13 7.500 13.50 22.500 7.50 Nf6 -0.08 13 2.500 12.67 9.900 3.30
4 d3 0.16 14 7.500 13.67 30.000 10.00 Bd6 0.03 13 5.300 12.75 15.200 5.07
5 O-O 0.42 14 7.500 13.75 37.500 12.50 Na5 -0.12 14 8.800 13.00 24.000 8.00
6 Bb3 0.51 14 7.500 13.80 45.000 15.00 Nxb3 -0.05 13 3.700 13.00 27.700 9.23
7 axb3 0.58 15 7.500 14.00 52.500 17.50 c5 -0.24 14 9.700 13.14 37.400 12.47
8 Nc3 0.72 14 7.500 14.00 60.000 20.00 O-O -0.02 13 2.100 13.13 39.500 13.17
9 Bg5 0.86 14 7.500 14.00 67.500 22.50 h6 -0.04 15 2.100 13.33 41.600 13.87
10 Bxf6 0.79 15 7.500 14.11 75.000 25.00 Qxf6 0.01 16 2.600 13.60 44.200 14.73
11 Re1 0.64 14 7.500 14.10 82.500 27.50 b6 0.03 16 4.100 13.82 48.300 16.10
12 Nd5 0.72 15 7.500 14.18 90.000 30.00 Qd8 0.09 15 4.100 13.92 52.400 17.47
13 Nd2 0.72 13 7.500 14.08 97.500 32.50 b5 0.02 13 3.600 13.85 56.000 18.67
14 c4 0.84 13 7.500 14.00 105.000 35.00 b4 0.09 14 2.500 13.86 58.500 19.50
15 Nf1 0.63 13 7.500 13.93 112.500 37.50 Bb7 -0.11 13 4.000 13.80 62.500 20.83
16 Nfe3 0.73 14 7.500 13.93 120.000 40.00 g6 -0.23 13 4.400 13.75 66.900 22.30
17 Qg4 0.66 14 7.500 13.94 127.500 42.50 h5 -0.11 15 8.400 13.82 75.300 25.10
18 Qf3 0.61 15 7.500 14.00 135.000 45.00 Kg7 -0.12 15 3.400 13.89 78.700 26.23
19 Qg3 0.53 14 7.500 14.00 142.500 47.50 Kh7 -0.14 14 3.800 13.89 82.500 27.50
20 h3 0.53 14 7.500 14.00 150.000 50.00 a5 -0.10 14 3.500 13.90 86.000 28.67
21 Qf3 0.49 13 7.500 13.95 157.500 52.50 Kg7 -0.02 13 2.900 13.86 88.900 29.63
22 Qe2 0.39 14 7.500 13.95 165.000 55.00 Bc6 0.00 17 3.100 14.00 92.000 30.67
23 Qc2 0.23 13 7.500 13.91 172.500 57.50 Kg8 0.00 15 2.900 14.04 94.900 31.63
24 Qd1 0.39 13 7.500 13.87 180.000 60.00 h4 0.00 17 5.600 14.17 100.500 33.50
25 Qg4 0.58 15 7.500 13.92 187.500 62.50 Kh7 0.00 18 3.500 14.32 104.000 34.67
26 Kh2 0.43 14 7.500 13.92 195.000 65.00 Re8 0.00 17 14.000 14.42 118.000 39.33
27 Rf1 0.50 14 7.500 13.92 202.500 67.50 Kh8 0.00 14 3.400 14.41 121.400 40.47
28 g3 0.80 14 7.500 13.93 210.000 70.00 Be7 0.19 12 4.600 14.32 126.000 42.00
29 Nxe7 0.87 15 7.500 13.96 217.500 72.50 hxg3+ 0.17 15 3.900 14.34 129.900 43.30
30 fxg3 2.36 15 7.500 14.00 225.000 75.00 Qxe7 -0.08 14 3.900 14.33 133.800 44.60
31 Nf5 2.69 15 7.500 14.03 232.500 77.50 Qf6 -3.89 22 135.000 14.58 268.800 89.60
32 Nh6 3.13 15 7.500 14.06 240.000 80.00 Qg7 -2.25 13 1.100 14.53 269.900 89.97
33 Nxf7+ 4.25 16 7.500 14.13 247.500 82.50 Kg8 -2.37 14 1.200 14.52 271.100 90.37
34 Qg5 4.15 17 7.500 14.21 255.000 85.00 Rf8 -4.15 18 12.000 14.62 283.100 94.37
35 Rf6 4.12 17 7.500 14.29 262.500 87.50 Kh7 -4.40 17 4.800 14.69 287.900 95.97
36 Raf1 4.20 16 7.500 14.34 270.000 90.00 Rae8 -5.35 14 4.400 14.67 292.300 97.43
37 Qh4+ 9.12 18 7.500 14.44 277.500 92.50 Kg8 -6.32 19 3.600 14.78 295.900 98.63
38 Nh6+ 9.24 18 7.500 14.54 285.000 95.00 Kh8 -6.34 18 1.900 14.87 297.800 99.27
39 Rf7 9.43 18 7.500 14.63 292.500 97.50 Rxf7 -6.59 17 1.300 14.92 299.100 99.70
40 Nxf7+ 9.43 16 7.000 14.67 299.500 99.83 Kg8
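The cumulative columns of this table are easy to rebuild from the per-move times. A small Python sketch (not Guenther's macro; the names are mine) that recomputes Hopper's BTSum and BT% columns from the BT values of moves 1-30, confirming the 44.6 % figure, and shows the jump once the 135 s spent on move 31 is added.

```python
from itertools import accumulate


def time_profile(move_times: list[float], budget: float) -> list[tuple[float, float]]:
    """(cumulative seconds, percent of budget) after each move, like the TSum/T% columns."""
    return [(round(total, 3), round(100.0 * total / budget, 2))
            for total in accumulate(move_times)]


# Hopper's (Black) thinking times in seconds for moves 1-30, from the BT column.
HOPPER_BT = [5.2, 2.2, 2.5, 5.3, 8.8, 3.7, 9.7, 2.1, 2.1, 2.6,
             4.1, 4.1, 3.6, 2.5, 4.0, 4.4, 8.4, 3.4, 3.8, 3.5,
             2.9, 3.1, 2.9, 5.6, 3.5, 14.0, 3.4, 4.6, 3.9, 3.9]

if __name__ == "__main__":
    print(time_profile(HOPPER_BT, 300.0)[-1])            # after move 30
    print(time_profile(HOPPER_BT + [135.0], 300.0)[-1])  # after move 31
```

Feeding in Blunder's constant 7.5 s per move instead gives 75 % after move 30, which makes the contrast between the two engines' time usage easy to see.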