Vice CCRL rating

Guenther · Post by **Guenther** » Thu Apr 29, 2021 5:32 pm

lithander wrote: ↑Thu Apr 29, 2021 4:15 pm Thanks, Gabor! Found it there!

And regarding Zahak: I have wanted to let MMC play against it before, but Windows gives me a virus alert and quarantines the file immediately. So for now I've skipped adding it to my test suite. :/

Note that this Vice 1.1 binary is just 32-bit! The original 1.0 is available as 64-bit too.
The links are all working in my chronology.
https://web.archive.org/web/20161029154 ... Vice10.zip

I get this on my old machine from cmd (start position, default settings).
It is actually faster for Vice and slower for Zahak compared to Amanj's numbers.
Vice 1.0 is also above 1.1 in the CCRL list (but that is probably only due to 64-bit vs. 32-bit).
Anyhow, Vice should be much faster on Amanj's hardware.

Code:

id name Vice 1.0
id author Bluefever
...
info score cp 22 depth 7 nodes 176808 time 187 pv e2e4 d7d5 e4d5 d8d5 d2d4 e7e5 c1e3
info score cp 5 depth 8 nodes 1308172 time 998 pv e2e4
info score cp 22 depth 9 nodes 2754271 time 1981 pv e2e4 e7e6 d2d4 d7d5 b1c3 d5e4 c3e4 f8b4 c2c3
info score cp 12 depth 10 nodes 18814339 time 12932 pv e2e4
info score cp 20 depth 11 nodes 38738052 time 26115 pv e2e4 e7e5 g1f3

Code:

id name Zahak 1.0.0
id author Amanj
...
info depth 7 seldepth 7 tbhits 2189 hashfull 22 nodes 146951 nps 316704, score cp 44 time 464 pv b1c3 d7d5 d2d4 b8c6 g1f3 g8f6 c1f4
info depth 8 seldepth 8 tbhits 9511 hashfull 88 nodes 584083 nps 429157, score cp 8 time 1361 pv e2e4 d7d5 e4e5 b8c6 d2d4 c8f5 f1b5 e7e6
info depth 9 seldepth 9 tbhits 17934 hashfull 153 nodes 1050361 nps 449640, score cp 42 time 2336 pv e2e4 b8c6 b1c3 g8f6 d2d4 d7d5 e4e5 c8g4 g1f3
info depth 10 seldepth 10 tbhits 40810 hashfull 309 nodes 2469415 nps 474522, score cp 30 time 5204 pv e2e4 b8c6 b1c3 e7e5 g1f3 f8b4 f1b5 g8e7 e1g1 e8g8
info depth 11 seldepth 13 tbhits 99496 hashfull 622 nodes 6418723 nps 457239, score cp 31 time 14038 pv e2e4 d7d5 e4d5 g8f6 f1b5 c8d7 b5d7 b8d7 d1e2 d7b6 c2c4 c7c6 d5c6
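Vice's `info` lines report nodes and time (in milliseconds) but no nps, so comparing the two engines' speed means dividing one by the other. A minimal Python sketch, with hypothetical helper names `parse_info` and `nps`, that pulls the integer fields out of a UCI `info` line and tolerates the trailing comma Zahak 1.0.0 prints after its nps value:

```python
def parse_info(line):
    """Extract integer fields from a UCI 'info' line into a dict.

    Only keys followed by a single integer in the output quoted above are
    handled; 'score' (followed by 'cp <n>') and the 'pv' moves are skipped.
    """
    # Zahak 1.0.0 prints "nps 316704," so strip trailing commas from every token.
    tokens = [t.rstrip(",") for t in line.split()]
    fields = {}
    for key in ("depth", "seldepth", "nodes", "time", "nps", "hashfull", "tbhits"):
        if key in tokens:
            fields[key] = int(tokens[tokens.index(key) + 1])
    return fields


def nps(line):
    """Nodes per second computed from 'nodes' and 'time' (milliseconds)."""
    f = parse_info(line)
    if f.get("time", 0) == 0:
        return None  # Vice reports "time 0" at shallow depths; no meaningful rate
    return f["nodes"] * 1000 / f["time"]


vice = "info score cp 12 depth 10 nodes 18814339 time 12932 pv e2e4"
zahak = ("info depth 10 seldepth 10 tbhits 40810 hashfull 309 nodes 2469415 "
         "nps 474522, score cp 30 time 5204 pv e2e4 b8c6 b1c3 e7e5 g1f3")
print(round(nps(vice)))   # about 1.45 million nodes per second
print(round(nps(zahak)))  # agrees with the nps value Zahak reports itself
```

On this machine Vice searches roughly three times as many nodes per second as Zahak, but nps is not directly comparable across engines, since each engine decides for itself what it counts as a node.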

Gabor Szots · Post by **Gabor Szots** » Thu Apr 29, 2021 5:34 pm

amanjpro wrote: ↑Thu Apr 29, 2021 5:08 pm Can it be that your Mac compile is much faster than the one used in CCRL testing?

I would be really surprised if it were down to my compiler (the Mac compiler); both GCC and the Go compiler have very good support for macOS...

I meant slower, not faster but you got it right.

lithander · Post by **lithander** » Thu Apr 29, 2021 5:51 pm

amanjpro wrote: ↑Thu Apr 29, 2021 5:08 pm This alone tells me that Vice's rating is probably much closer to Rustic's than to Zahak 1.0.0's

Zahak is probably just especially strong against Vice. When I was experimenting with different PSTs (piece-square tables), I would often get a set of values that, for whatever reason, outright destroyed one specific engine but didn't do great on average.

I've just run 400 games against Vice:

Code:

Score of Vice11 vs MinimalChess 0.4: 229 - 121 - 50  [0.635] 400
...      Vice11 playing White: 97 - 67 - 36  [0.575] 200
...      Vice11 playing Black: 132 - 54 - 14  [0.695] 200
...      White vs Black: 151 - 199 - 50  [0.440] 400
Elo difference: 96.2 +/- 33.0, LOS: 100.0 %, DrawRatio: 12.5 %
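The Elo difference in that summary follows from the logistic Elo model: the score fraction p (draws counting half) maps to D = -400 log10(1/p - 1). A minimal Python sketch of that conversion, plus a commonly used normal-approximation formula for the likelihood of superiority (LOS); whether cutechess-cli computes LOS exactly this way is an assumption here, and the ±33.0 error margin is not reproduced:

```python
import math


def elo_diff(wins, losses, draws):
    """Elo difference implied by a match score under the logistic model.

    Undefined for a 0 % or 100 % score.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # draws count as half a point
    return -400 * math.log10(1 / score - 1)


def los(wins, losses):
    """Likelihood of superiority; draws are ignored in this approximation."""
    return 0.5 * (1 + math.erf((wins - losses) / math.sqrt(2 * (wins + losses))))


# Vice11 vs MinimalChess 0.4 from the match above: 229 wins, 121 losses, 50 draws.
print(f"{elo_diff(229, 121, 50):.1f}")  # 96.2
print(f"{100 * los(229, 121):.1f} %")   # 100.0 %
print(f"{100 * 50 / 400:.1f} %")        # draw ratio: 12.5 %
```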

If it were true that Vice11 is only ~1850, as you claim, that would put my engine at ~1750. But my own tests have shown that MMC 0.4 is +139 Elo stronger than Bit-Genie 2 64-bit (1877 CCRL), +118 Elo stronger than Rustic 2 (1815 CCRL) and +68 Elo stronger than WukongJS (1885 CCRL).

This leads me to expect a CCRL rating between 1900 and 1950 for my engine, so the result of my match against Vice quoted above fits well with Vice's CCRL rating of 2043.

Take all this with a grain of salt, though: my tests use a 5s + 0.5s increment time control.

amanjpro wrote: ↑Thu Apr 29, 2021 4:24 pm That said, Zahak has a Makefile, and every version is tagged. You will get the same thing if you simply check out the tag and run `make release`

Sadly, makefiles don't really work on Windows unless you install something like Cygwin or MSYS2, and even then they often don't work out of the box. And when I finally get something compiled, I wouldn't assume that it is comparable in speed to what others compiled, so I prefer to use the same precompiled binaries that the CCRL testers used.

mvanthoor · Post by **mvanthoor** » Thu Apr 29, 2021 5:52 pm

amanjpro wrote: ↑Thu Apr 29, 2021 3:58 pm I would have considered this as a possibility, but I'm talking about results like 70-10-20 in 100 game matches

It doesn't matter. Sometimes an engine is what is called in German an "Angst Gegner" for another engine. An "Angst Gegner" is an opponent that you perform poorly against even though, on paper, you -should- be much stronger.

I have seen the same thing with Bit-Genie 2.

Version 1 is in the CCRL list at around 1770, so it is about 45 points behind Rustic Alpha 2; the result is 19-13 for RA2 in their head-to-head match.
Version 2 is at 1877 in the list, so it is now 60 points ahead of RA2... but it still loses 19-13 in their head-to-head match. I have confirmed this in my own test: even though Bit-Genie 2 is over 100 points stronger than Bit-Genie 1, version 2 doesn't perform any better than version 1 against RA2.

Therefore it's very possible that Vice performs at around 2050 in the CCRL list, but your engine performs much better than expected against it, which makes VICE seem weaker. It isn't; your engine is simply an "Angst Gegner" for VICE: it exploits some weak point of VICE, somewhere, over and over again. This is exactly why you should never test against only one or two engines, but instead run gauntlets against many different engines within roughly 100 Elo below and above your own.
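The Elo model predicts a match score from the rating difference alone: E = 1 / (1 + 10^(-D/400)). A small Python sketch comparing that prediction with the Bit-Genie 2 vs. RA2 result above; it assumes the "19-13" figure is a 32-game match score with Bit-Genie 2 on 13 points:

```python
def expected_score(rating_diff):
    """Expected score fraction for the side that is rating_diff Elo stronger."""
    return 1 / (1 + 10 ** (-rating_diff / 400))


games = 32
bg2, ra2 = 1877, 1815  # CCRL list ratings quoted above
predicted = games * expected_score(bg2 - ra2)
observed = 13          # assumed: Bit-Genie 2's points out of 32 games
print(f"predicted {predicted:.1f} / {games}, observed {observed} / {games}")
# predicted 18.8 / 32, observed 13 / 32
```

A shortfall of almost six points is what an "Angst Gegner" looks like in the numbers, although a 32-game match is still noisy (the standard deviation of the score is roughly 2.8 points if there are no draws), which is one more reason to rely on gauntlets rather than single pairings.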

Also:
RA2 (1815) against LodoCase 240208 (1749): 17-15
RA2 (1815) against Purple Haze (1877): 14-16

For all intents and purposes, RA2 is the same strength as both LodoCase and Purple Haze, which obviously can't be true: those two are 128 points apart in the list.

Head-to-head strength is not transitive. This means:

If A is stronger than B, and B is stronger than C, it is NOT automatically the case that A is stronger than C.

mvanthoor · Post by **mvanthoor** » Thu Apr 29, 2021 6:00 pm

I see Lithander makes the same point:

RA2 (1815) vs. MMC 0.4: +118 for MMC, which would put MMC 0.4 at 1933
BG2 (1877) vs. MMC 0.4: +139 for MMC, which would put MMC 0.4 at 2016

However, VICE (2040) is +96 against MMC 0.4, which would put MMC 0.4 at 1944.

The performance of one engine against another can yield very different rating results. MMC gains +118 against RA2, which was at 1815; but MMC gains 139 against BG2, which is already at 1877. So it gains MORE rating against an engine that is ALREADY stronger, which means:

MMC performs less than expected against RA2, which makes RA2 an Angst Gegner for MMC.... or...
BG2 performs less than expected against MMC, which makes MMC an Angst Gegner for Bit-Genie.
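Each estimate above is simply the opponent's rating plus the measured Elo difference. A short Python sketch reproducing mvanthoor's three numbers shows how far apart single-opponent estimates can land:

```python
# (opponent, opponent's CCRL rating, MMC 0.4's measured Elo difference vs. it)
matches = [
    ("Rustic Alpha 2", 1815, +118),
    ("Bit-Genie 2", 1877, +139),
    ("VICE", 2040, -96),
]

# One rating estimate for MMC 0.4 per opponent.
implied = {name: rating + diff for name, rating, diff in matches}
for name, estimate in implied.items():
    print(f"vs {name:<15} -> MMC 0.4 at {estimate}")

spread = max(implied.values()) - min(implied.values())
print(f"spread between single-opponent estimates: {spread} Elo")  # 83
```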

amanjpro · Post by **amanjpro** » Thu Apr 29, 2021 6:08 pm

Guenther wrote: ↑Thu Apr 29, 2021 5:32 pm ...

Huh! That is interesting. Probably the makefile in the Vice repo is not exactly what was used to build the Windows executable, which would explain why Vice is rather slow on my machine.

amanjpro · Post by **amanjpro** » Thu Apr 29, 2021 6:12 pm

mvanthoor wrote: ↑Thu Apr 29, 2021 6:00 pm Lithander makes the same point I see:

RA2 (1815) vs. MMC 0.4: +118 for MMC, which would put MMC 0.4 at 1933
BG2 (1877) vs. MMC 0.4: +139 for MMC, which would put MMC 0.4 at 2016

However, VICE (2040) is +96 against MMC 0.4, which would put MMC 0.4 at 1944.

The performance of one engine against another can yield very different rating results. MMC gains +118 against RA2, which was at 1815; but MMC gains 139 against BG2, which is already at 1877. So it gains MORE rating against an engine that is ALREADY stronger, which means:

MMC performs less than expected against RA2, which makes RA2 an Angst Gegner for MMC.... or...
BG2 performs less than expected against MMC, which makes MMC an Angst Gegner for Bit-Genie.

That is actually a very big issue. Let's say Engine A performs terribly against Engine B, and Engine B performs terribly against Engine C, and no engine has issues with Engine D.

If the CCRL testers chose Engine B to test Engine A, but Engine D to test Engine B, Engine A would appear rather weaker than Engine B, even though in reality they might be exactly the same strength.

It doesn't matter whether the gauntlet contains 10 engines or not; a 70-0-10 result in Engine B's favor is going to hurt Engine A's final rating anyway.

Guenther · Post by **Guenther** » Thu Apr 29, 2021 6:46 pm

amanjpro wrote: ↑Thu Apr 29, 2021 6:08 pm

Guenther wrote: ↑Thu Apr 29, 2021 5:32 pm ...

Huh! That is interesting. Probably the makefile in the Vice repo is not exactly what was used to build the Windows executable, which would explain why Vice is rather slow on my machine.

BTW your Vice 1.1 also reports this:

Code:

id name Vicemod 1.1

Did you change the id name? Otherwise you have used a non-original Vice source for 1.1 that someone had mod(ified).
(There is no official GitHub repo for it, but a lot of people have tinkered with it and made repos there.)

Edit:
All links to the official sources are now posted in this thread, BTW, and I now remember that Vice 1.1 just adds Polyglot book functionality;
otherwise it is the same as 1.0, IIRC.

lithander · Post by **lithander** » Thu Apr 29, 2021 8:11 pm

Used MINGW64 to compile Zahak 1.0:

Code:

uci
id name Zahak 85d27f1fc71bf52b0124b74f660d82c7ace3509d
id author Amanj
option name Ponder type check default false
option name Hash type spin default 10 min 1 max 8000
option name Book type check default false
uciok
position startpos
go
[...]
info depth 11 seldepth 12 tbhits 59791 hashfull 316 nodes 1392822 nps 973539 score cp 44 time 1430 pv d2d4 d7d5 e2e3 b8c6 f1b5 g8f6 b5c6 b7c6 b1c3 c8f5 g1f3 e7e6
info depth 12 seldepth 12 tbhits 116964 hashfull 614 nodes 3394586 nps 936381 score cp 15 time 3625 pv d2d4 d7d5 b1c3 g8f6 g1f3 b8c6 f3e5 f6e4 c3e4 c6e5 d4e5 d5e4
info depth 13 seldepth 13 tbhits 145866 hashfull 754 nodes 4893666 nps 924920 score cp 19 time 5290 pv d2d4 d7d5 e2e3 e7e6 g1f3 g8f6 f1d3 f8d6 b1c3 b8c6 e1g1 e8g8 c1d2

Code:

Book File Not Read
HashTable init complete with 2796200 entries
Welcome to Vice! Type 'vice' for console mode...
uci
id name Vice 1.1
id author Bluefever
option name Hash type spin default 64 min 4 max 1024
option name Book type check default true
uciok
position startpos
go
time:-1 start:1261676515 stop:4227372 depth:64 timeset:0
info score cp 30 depth 1 nodes 21 time 0 pv d2d4
info score cp 0 depth 2 nodes 89 time 0 pv d2d4 d7d5
info score cp 25 depth 3 nodes 694 time 0 pv d2d4 d7d5 c1e3
info score cp 0 depth 4 nodes 3667 time 16 pv d2d4 d7d5 c1e3 c8e6
info score cp 25 depth 5 nodes 10577 time 16 pv e2e4 e7e5 d2d4 d7d5 c1e3
info score cp 5 depth 6 nodes 61445 time 31 pv e2e4 e7e5 d2d4 b8c6 g1f3 f8d6
info score cp 22 depth 7 nodes 176808 time 78 pv e2e4 d7d5 e4d5 d8d5 d2d4 e7e5 c1e3
info score cp 5 depth 8 nodes 1308172 time 438 pv e2e4
info score cp 22 depth 9 nodes 2754271 time 891 pv e2e4 e7e6 d2d4 d7d5 b1c3 d5e4 c3e4 f8b4 c2c3
info score cp 12 depth 10 nodes 18814339 time 5875 pv e2e4

My CPU is a Ryzen 3600 @ 4.2 GHz.

So the Vice 1.1 copy I got through the Google Drive link in this thread runs almost 3x faster on my machine than the Vice binary you used.
My Zahak 1.0 copy that I compiled runs 10-20% faster than yours. So I suppose it might actually be true that your Vice build is only worth 1850 Elo, as you said, but that the version CCRL (and now I) used is 3x faster and thus deserves its CCRL rating of ~2050, or whatever it was.
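Vice reports identical node counts at every depth in Guenther's run and in lithander's run, so both searched the same tree, and the ratio of times at equal depth measures raw speed. A minimal Python sketch using the times quoted in this thread; note that it compares Vice 1.0 on Guenther's older machine with Vice 1.1 on the Ryzen 3600, so it mixes hardware and build differences and does not involve amanjpro's binary:

```python
# depth -> (nodes, time in ms), copied from the two Vice outputs in this thread
guenther = {7: (176808, 187), 8: (1308172, 998), 9: (2754271, 1981), 10: (18814339, 12932)}
lithander = {7: (176808, 78), 8: (1308172, 438), 9: (2754271, 891), 10: (18814339, 5875)}

for depth in sorted(guenther):
    nodes_g, ms_g = guenther[depth]
    nodes_l, ms_l = lithander[depth]
    # Same node count means the same search, so the time ratio is a pure speed ratio.
    assert nodes_g == nodes_l, f"different search at depth {depth}"
    print(f"depth {depth:>2}: {ms_g / ms_l:.2f}x faster")
```

The deepest iteration (about 2.2x) gives the most reliable ratio, since the shallow times are only a few timer ticks long.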

Gabor Szots · Post by **Gabor Szots** » Thu Apr 29, 2021 8:33 pm

amanjpro wrote: ↑Thu Apr 29, 2021 6:12 pm
That is actually a very big issue. Let's say Engine A performs terribly against Engine B, and Engine B performs terribly against Engine C, and no engine has issues with Engine D.

If the CCRL testers chose Engine B to test Engine A, but Engine D to test Engine B, Engine A would appear rather weaker than Engine B, even though in reality they might be exactly the same strength.

It doesn't matter whether the gauntlet contains 10 engines or not; a 70-0-10 result in Engine B's favor is going to hurt Engine A's final rating anyway.

We are aware of the problem. That is why we select about 30 opponents and play 32 games against each of them when testing a new engine. Anomalies even out.
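A crude way to see why a large gauntlet dilutes one bad pairing is to pool all games into a single score against opponents assumed to be of equal strength. The Python sketch below reads amanjpro's 70-0-10 example as Engine A scoring 5 points out of 80 (6.25 %) against its "Angst Gegner" and assumes an even 50 % score against every other opponent; the function names are hypothetical, and CCRL's actual rating calculation is more involved than this pooled estimate:

```python
import math


def elo_from_score(score):
    """Logistic Elo difference for a score fraction strictly between 0 and 1."""
    return -400 * math.log10(1 / score - 1)


def pooled_performance(opponents, games_each=32, bad_score=0.0625, normal_score=0.5):
    """Elo offset when one of `opponents` equal-strength engines is an Angst Gegner.

    Assumes one opponent yields `bad_score` and all the others `normal_score`.
    """
    points = (opponents - 1) * games_each * normal_score + games_each * bad_score
    return elo_from_score(points / (opponents * games_each))


for n in (1, 10, 30):
    print(f"{n:>2} opponent(s): {pooled_performance(n):+.0f} Elo")
```

Under these assumptions, a shortfall of about 470 Elo against a single opponent shrinks to about 30 Elo with 10 opponents and about 10 Elo with 30.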
