New engine releases & news 2021

Lana Samson · Post by **Lana Samson** » Fri Oct 15, 2021 1:24 am

Princhess 0.4.0 released

https://github.com/princesslana/prinche ... /tag/0.4.0

Expected elo ~1865

all_versions_1  | Score of Princhess 0.4.0 vs Princhess 0.3.1: 19 - 11 - 30  [0.567] 60
all_versions_1  | ...      Princhess 0.4.0 playing White: 11 - 4 - 15  [0.617] 30
all_versions_1  | ...      Princhess 0.4.0 playing Black: 8 - 7 - 15  [0.517] 30
all_versions_1  | ...      White vs Black: 18 - 12 - 30  [0.550] 60
all_versions_1  | Elo difference: 46.6 +/- 62.8, LOS: 92.8 %, DrawRatio: 50.0 %

mvanthoor · Post by **mvanthoor** » Fri Oct 15, 2021 12:10 pm

Lana Samson wrote: ↑Fri Oct 15, 2021 1:24 am Princhess 0.4.0 released
....

Another Rust engine... it's getting crowded in Rust-space lately

Maybe I should find some esoteric language no-one knows about and then restart development

Welcome. I'll certainly have a look at it in the weekend

If the engine's rating is indeed 1865, it should be exactly on par with my engine, Rustic, version Alpha 3.0.0. I'll run a test in the weekend.

PS: You need to run at least a thousand games between two versions of your engine to see a meaningful Elo gain or loss.
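To illustrate why a 60-game match says so little, here is a minimal sketch of how an Elo difference, its 95% error margin, and LOS can be derived from a win/loss/draw count. It assumes the usual logistic Elo model and a normal approximation of the score. The function names are made up for this example, and whether the tool that produced the output above uses exactly these formulas is an assumption.

```python
import math
from statistics import NormalDist


def elo_diff(score: float) -> float:
    """Elo difference implied by an expected score under the logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_stats(wins: int, losses: int, draws: int):
    """Return (elo, 95% error margin, LOS) for a match result.

    Assumes the score interval stays strictly inside (0, 1),
    which holds for the results discussed in this thread.
    """
    n = wins + losses + draws
    w, l, d = wins / n, losses / n, draws / n
    mu = w + d / 2  # mean score per game
    # Variance of a single game's score around the mean.
    var = w * (1 - mu) ** 2 + l * (0 - mu) ** 2 + d * (0.5 - mu) ** 2
    stdev = math.sqrt(var / n)  # standard error of the mean score
    z = NormalDist().inv_cdf(0.975)  # two-sided 95% interval
    low, high = mu - z * stdev, mu + z * stdev
    margin = (elo_diff(high) - elo_diff(low)) / 2
    # Likelihood of superiority: draws carry no information here.
    los = 0.5 * (1 + math.erf((wins - losses) / math.sqrt(2 * (wins + losses))))
    return elo_diff(mu), margin, los


for result in [(19, 11, 30), (680, 89, 299)]:
    elo, margin, los = match_stats(*result)
    print(f"{result}: {elo:.1f} +/- {margin:.1f}, LOS {los:.1%}")
```

With 60 games the margin is larger than the measured gain itself; with roughly a thousand games it shrinks to around 20 Elo.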

Lana Samson · Post by **Lana Samson** » Fri Oct 15, 2021 4:44 pm

mvanthoor wrote: ↑Fri Oct 15, 2021 12:10 pm PS: You need to run at least a thousand games between two versions of your engine to see a meaningful Elo gain or loss.

Oh, absolutely. This is more just a quick demo after release. I do run a lot more games when testing changes. The ~50 Elo figure matches what I was expecting from those tests.

mvanthoor · Post by **mvanthoor** » Fri Oct 15, 2021 7:13 pm

Lana Samson wrote: ↑Fri Oct 15, 2021 4:44 pm
mvanthoor wrote: ↑Fri Oct 15, 2021 12:10 pm PS: You need to run at least a thousand games between two versions of your engine to see a meaningful Elo gain or loss.
Oh, absolutely. This is more just a quick demo after release. I do run a lot more games when testing changes. The ~50 Elo figure matches what I was expecting from those tests.

I ran a test on my home workstation during the working day, with the following results.

Code:

Score of Rustic Alpha 3.0.0 vs Princhess 0.4.0: 680 - 89 - 299 [0.777]
...      Rustic Alpha 3.0.0 playing White: 334 - 45 - 155  [0.771] 534
...      Rustic Alpha 3.0.0 playing Black: 346 - 44 - 144  [0.783] 534
...      White vs Black: 378 - 391 - 299  [0.494] 1068
Elo difference: 216.5 +/- 19.3, LOS: 100.0 %, DrawRatio: 28.0 %
1068 of 2000 games finished.

Rustic Alpha 3.0.0 is actually 1865 Elo. This result suggests Princhess to be around 1650 Elo.

Code:

Score of Rustic Alpha 1.1 vs Princhess 0.4.0: 405 - 237 - 358 [0.584]
...      Rustic Alpha 1.1 playing White: 206 - 112 - 182  [0.594] 500
...      Rustic Alpha 1.1 playing Black: 199 - 125 - 176  [0.574] 500
...      White vs Black: 331 - 311 - 358  [0.510] 1000
Elo difference: 58.9 +/- 17.4, LOS: 100.0 %, DrawRatio: 35.8 %
1000 of 1000 games finished.

As Alpha 1.0.0 (1675 Elo) is still about 60 Elo stronger than Princhess, this suggests 1615 Elo.

Code:

Score of MinimalChess 0.3 vs Princhess 0.4.0: 334 - 309 - 357 [0.512]
...      MinimalChess 0.3 playing White: 156 - 158 - 186  [0.498] 500
...      MinimalChess 0.3 playing Black: 178 - 151 - 171  [0.527] 500
...      White vs Black: 307 - 336 - 357  [0.485] 1000
Elo difference: 8.7 +/- 17.3, LOS: 83.8 %, DrawRatio: 35.7 %
1000 of 1000 games finished.

Princhess 0.4.0 can hold its own against MinimalChess 0.3 (1570 Elo).

In the end, I'd guess that if it was to be tested for the CCRL Blitz list, Princhess 0.4.0 would end up around 1600 Elo, +/- 50. So I think 1865 is a bit of an over-estimation.
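The arithmetic behind these estimates can be sketched as follows: subtract the measured Elo difference from each opponent's quoted rating, then average. The ratings and differences are the ones given above; taking a plain unweighted mean is an assumption made for this illustration, not necessarily how the ~1600 guess was reached.

```python
# (opponent as named in the match output, rating quoted in the post,
#  measured Elo difference in the opponent's favour)
results = [
    ("Rustic Alpha 3.0.0", 1865, 216.5),
    ("Rustic Alpha 1.1", 1675, 58.9),  # the post applies the 1675 figure here
    ("MinimalChess 0.3", 1570, 8.7),
]

# Estimated Princhess rating from each match: opponent rating minus the gap.
estimates = {name: rating - diff for name, rating, diff in results}

# Unweighted mean of the three estimates (an assumption for illustration).
mean_estimate = sum(estimates.values()) / len(estimates)

for name, value in estimates.items():
    print(f"via {name}: {value:.1f}")
print(f"mean: {mean_estimate:.1f}")
```

All three estimates fall well below 1865, and their mean lands close to 1600.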

Note: I have seen several games where Princhess is unable to convert to a win, despite a huge material advantage; sometimes it draws, but I have also seen it lose games where it was up 10 points of material. I've also seen games where the engine was unable to deliver checkmate despite having a queen, a rook, and a bishop against a lone king. That will cost a _HUGE_ amount of rating points.

Guenther · Post by **Guenther** » Fri Oct 15, 2021 7:51 pm

mvanthoor wrote: ↑Fri Oct 15, 2021 7:13 pm
....

Well, Princhess is not new (anymore), and version 0.3.1 was already tested at CCRL:

Code:

519	Princhess 0.3.1 64-bit	1816	+21	−21	50.7%	−4.9	28.0%	757
 	Rustic Alpha 2 64-bit	1816	+20	−20	53.2%	−24.2	15.8%	949

You can see it achieved the same rating as Rustic Alpha 2, so either it has a big regression (unlikely), or it doesn't like your testing ;-)
Joking aside, maybe you played at a very fast TC to get so many games so quickly, and it is not really good at this?

Mergi · Post by **Mergi** » Fri Oct 15, 2021 8:06 pm

As to Marcel's last point about not being able to checkmate even with a huge material advantage: in testing, to save time, most CC games are adjudicated once a certain advantage is reached, so just getting into a winning position is all that matters. In my testing I adjudicate when an 8-point advantage is reached, and I remember seeing that LC0 testing ends the game at a 5-point advantage. I'd assume CCRL testers do something similar, as otherwise endgames might take a huge amount of time to conclude. So that's probably where the big Elo disparity comes from, if you let every game reach its ultimate conclusion (checkmate/draw) in your testing.
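A minimal sketch of that kind of adjudication rule, to show how it can hide a conversion problem. The function name and the per-ply list of material balances are hypothetical; real tournament managers typically work from engine scores and may require the advantage to persist for several moves, which is not modelled here.

```python
from typing import Optional, Sequence


def adjudication_ply(material_balance: Sequence[int], threshold: int = 8) -> Optional[int]:
    """Return the first ply (0-based) at which one side leads by at least
    `threshold` points of material, or None if that never happens."""
    for ply, balance in enumerate(material_balance):
        if abs(balance) >= threshold:
            return ply
    return None


# Hypothetical game: the engine goes up 9 points, then gives the material back.
game = [0, 1, 3, 5, 9, 9, 4, 0]

print(adjudication_ply(game))                # adjudicated as a win at the 8-point threshold
print(adjudication_ply(game, threshold=10))  # never adjudicated; the game is played out
```

With the 8-point rule the game is scored as a win as soon as the lead appears; without adjudication the same game is played out and the advantage is lost again.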

Guenther · Post by **Guenther** » Fri Oct 15, 2021 8:26 pm

Mergi wrote: ↑Fri Oct 15, 2021 8:06 pm As to Marcel's last point about not being able to checkmate even with a huge material advantage: in testing, to save time, most CC games are adjudicated once a certain advantage is reached, so just getting into a winning position is all that matters. In my testing I adjudicate when an 8-point advantage is reached, and I remember seeing that LC0 testing ends the game at a 5-point advantage. I'd assume CCRL testers do something similar, as otherwise endgames might take a huge amount of time to conclude. So that's probably where the big Elo disparity comes from, if you let every game reach its ultimate conclusion (checkmate/draw) in your testing.

That makes sense! I forgot about early adjudication (how early, BTW, for CCRL?).
I use very late adjudication to gather all kinds of quirks and bugs in programs.

mvanthoor · Post by **mvanthoor** » Fri Oct 15, 2021 9:11 pm

Guenther wrote: ↑Fri Oct 15, 2021 7:51 pm You can see it achieved the same rating as Rustic Alpha 2, so either it has a big regression (unlikely), or it doesn't like your testing
Joking aside, maybe you played at a very fast TC to get so many games so quickly, and it is not really good at this?

I did play at a very fast time control (10s+0.1s), which I do with all of my tests, and the estimated ratings I get are normally close to what CCRL measures. Princhess didn't have any time forfeits. It did, however, miss a lot of wins and draws by being unable to convert, checkmate, or hold the draw. As said, I've seen the engine fail to deliver mate with an advantage of a queen or more against a bare king. I've even seen it throw away games where it was up more than a rook in an endgame, losing enough material to get mated itself.

So yes, I do think it could be a regression; or it is indeed an engine that does not function well at super-fast time controls. (I've seen it uses MCTS; maybe this method of searching doesn't work well for fast time controls. I don't know; didn't try it yet. But I plan, at some point, to add MCTS to Rustic as an experimental alternative search feature.) I cannot see how fast it is, because it doesn't put out any statistics (yet) apart from a single PV-line (without depth or speed). And, the TT is mostly already at 100% within the very first few seconds. I run with a 16 MB TT to test replacement schemes at this super-fast TC; but I can imagine the thousands / millions (?) of MCTS playouts fill up the TT extremely fast.

Mergi wrote: ↑Fri Oct 15, 2021 8:06 pm As to Marcel's last point about not being able to checkmate even with a huge material advantage: in testing, to save time, most CC games are adjudicated once a certain advantage is reached, so just getting into a winning position is all that matters. In my testing I adjudicate when an 8-point advantage is reached, and I remember seeing that LC0 testing ends the game at a 5-point advantage. I'd assume CCRL testers do something similar, as otherwise endgames might take a huge amount of time to conclude. So that's probably where the big Elo disparity comes from, if you let every game reach its ultimate conclusion (checkmate/draw) in your testing.

Also plausible. I don't adjudicate games in my testing, so just getting into a winning position is not enough: the engine actually has to win it. An engine that cannot convert wins or deliver mate will lose points that adjudication would have handed to it.

So there are several possible conclusions:
- This engine is not good at fast time controls.
- It has a massive regression with regard to converting wins, delivering checkmate, and holding draws, losing many points.
- It has always had the second problem, but it goes unnoticed because of game adjudication.

Lana Samson · Post by **Lana Samson** » Fri Oct 15, 2021 9:37 pm

You have hit two of the issues that I know about:
* It does very poorly in end games without syzygy.
* It three-peats from advantageous positions way too much.

I believe the first happens because it searches too broadly rather than deeply. So it keeps playing positionally, going for positions it thinks are good, and fails to find any mating plan.

I believe the second happens when it runs out of time to adequately search a position, so it doesn't move away from what its initial model said was the best move. If I'm on the right track here, it would be especially obvious at faster time controls.

FWIW, I currently run my tests at 1+1 with a 5-man syzygy.

Gabor Szots · Post by **Gabor Szots** » Sat Oct 16, 2021 8:56 am

Guenther wrote: ↑Fri Oct 15, 2021 8:26 pm That makes sense! I forgot about early adjudication (how early, BTW, for CCRL?).

For known weak engines or new engines when I am not sure of their strength I switch off adjudication altogether. In the case of Princhess I used only one adjudication, that of tablebases, which adjudicates a draw when 4-piece tablebases say it is a draw and both engines' scores agree.
So in the case of Princhess the rating you find at CCRL is not an overestimation owing to adjudicating wins when it may not have been able to win. Either it checkmated the opponent or the opponent resigned of its own accord.
