CEGT - rating lists March 28th 2021

Discussion of computer chess matches and engine tournaments.

Moderators: hgm, Rebel, chrisw

Modern Times
Posts: 3546
Joined: Thu Jun 07, 2012 11:02 pm

Re: CEGT - rating lists March 28th 2021

Post by Modern Times »

Werner wrote: Sat Apr 03, 2021 3:32 pm My result for the 40/20 list was:
KDragon 1.0 x64 1CPU (MCTS) (3327) - Stockfish 11.0 x64 1CPU (3442); performance 3358 = -84.
I used the opening set called TCEC low draw here. Next week I will start the same match with a very balanced opening set from Frank.
I just ran a match and got very similar results to this:
one thread
256MB hash
Time control 2' + 1"
HS440 suite

Score of Stockfish 11 vs Dragon By Komodo MCTS: 280 - 57 - 543 [0.627]
Elo difference: 90.0 +/- 13.8, LOS: 100.0 %, DrawRatio: 61.7 %

880 of 880 games finished.
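
For readers who want to check the arithmetic behind this summary, here is a minimal sketch of how an Elo difference and a 95% margin can be derived from a win/loss/draw record. It assumes the usual logistic Elo model and a normal approximation on the per-game score; the function names are my own, and the tool that produced the output above may round or compute slightly differently.

```python
import math

def elo_from_score(score):
    """Logistic Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_summary(wins, losses, draws, z=1.96):
    """Return (score, elo_difference, margin) for one engine's W/L/D record."""
    games = wins + losses + draws
    w, l, d = wins / games, losses / games, draws / games
    score = w + 0.5 * d
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = w * (1.0 - score) ** 2 + d * (0.5 - score) ** 2 + l * score ** 2
    stderr = math.sqrt(variance / games)
    low = elo_from_score(score - z * stderr)
    high = elo_from_score(score + z * stderr)
    return score, elo_from_score(score), (high - low) / 2.0

# W/L/D taken from the match result posted above (Stockfish 11's point of view).
score, elo, margin = elo_summary(wins=280, losses=57, draws=543)
print(f"score {score:.3f}, Elo difference {elo:.1f} +/- {margin:.1f}")
```

Under these assumptions the numbers should land close to the 0.627 score and the roughly 90 +/- 14 Elo quoted in the output.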
Werner
Posts: 2871
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle

Re: CEGT - rating lists March 28th 2021

Post by Werner »

Werner wrote: Sat Apr 03, 2021 3:32 pm My result for the 40/20 list was:
KDragon 1.0 x64 1CPU (MCTS) (3327) - Stockfish 11.0 x64 1CPU (3442); performance 3358 = -84.
I used the opening set called TCEC low draw here. Next week I will start the same match with a very balanced opening set from Frank.
result here:

1 Stockfish 11.0 x64 1CPU +63 +23/=72/-5 59.00% 59.0/100
2 KDragon 1.0 x64 1CPU (MCTS) -63 +5/=72/-23 41.00% 41.0/100
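
As a rough cross-check of how the +63/-63 column relates to the 59%/41% scores, and how "performance 3358 = -84" follows from the opponent's rating, here is a small sketch. It assumes the standard logistic Elo formula; the rating tool actually used may apply a different model, and the function names are hypothetical.

```python
def expected_score(elo_diff):
    """Expected score fraction for a player rated elo_diff above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def performance_rating(opponent_rating, elo_diff):
    """Performance rating against a single opponent: opponent rating plus Elo difference."""
    return opponent_rating + elo_diff

# +63 / -63 from the table above should map to roughly 59% / 41%.
print(f"{expected_score(63):.3f} {expected_score(-63):.3f}")

# Stockfish 11 is listed at 3442; a result 84 Elo below that is the quoted performance.
print(performance_rating(3442, -84))
```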
Werner
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: CEGT - rating lists March 28th 2021

Post by lkaufman »

Werner wrote: Sun Apr 04, 2021 8:33 am
Werner wrote: Sat Apr 03, 2021 3:32 pm My result for the 40/20 list was:
KDragon 1.0 x64 1CPU (MCTS) (3327) - Stockfish 11.0 x64 1CPU (3442); performance 3358 = -84.
I used the opening set called TCEC low draw here. Next week I will start the same match with a very balanced opening set from Frank.
result here:

1 Stockfish 11.0 x64 1CPU +63 +23/=72/-5 59.00% 59.0/100
2 KDragon 1.0 x64 1CPU (MCTS) -63 +5/=72/-23 41.00% 41.0/100
I am beginning to suspect that on some older machines Dragon MCTS may run slower and/or sometimes crash with AVX2. Perhaps it wasn't implemented well at first. How old is the hardware you used for this, were there any crashes, and if practical could you compare the NPS in the opening position in MCTS mode for the AVX2 version and the non-AVX2 version? It may be that we should only recommend AVX2 for machines newer than some date or with some other specifications.
Komodo rules!
Werner
Posts: 2871
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle

Re: CEGT - rating lists March 28th 2021

Post by Werner »

lkaufman wrote: Sun Apr 04, 2021 9:48 pm I am beginning to suspect that on some older machines Dragon MCTS may run slower and/or sometimes crash with AVX2. Perhaps it wasn't implemented well at first. How old is the hardware you used for this, were there any crashes, and if practical could you compare the NPS in the opening position in MCTS mode for the AVX2 version and the non-AVX2 version? It may be that we should only recommend AVX2 for machines newer than some date or with some other specifications.
On April 2nd I sent you an email with the details of the hardware I am using.
I have had no crashes with KomodoDragon MCTS here. I am beginning to suspect that we got a buggy version of your engine?

here are my results after 30 sec
KDragon 1.0 x64 1CPU (MCTS) AVX2 : 41 nps
KDragon 1.0 x64 1CPU (MCTS) no AVX2: 32 nps

any idea?
best wishes
Werner
Werner
Modern Times
Posts: 3546
Joined: Thu Jun 07, 2012 11:02 pm

Re: CEGT - rating lists March 28th 2021

Post by Modern Times »

lkaufman wrote: Sun Apr 04, 2021 9:48 pm
I am beginning to suspect that on some older machines Dragon MCTS may run slower and/or sometimes crash with AVX2.
The results I posted above, shown again below, were definitely not from old hardware. No crashes, no losses on time. Totally reliable.

Score of Stockfish 11 vs Dragon By Komodo MCTS: 280 - 57 - 543 [0.627]
Elo difference: 90.0 +/- 13.8, LOS: 100.0 %, DrawRatio: 61.7 %

880 of 880 games finished.
Modern Times
Posts: 3546
Joined: Thu Jun 07, 2012 11:02 pm

Re: CEGT - rating lists March 28th 2021

Post by Modern Times »

Werner wrote: Sun Apr 04, 2021 10:37 pm here are my results after 30 sec
KDragon 1.0 x64 1CPU (MCTS) AVX2 : 41 nps
KDragon 1.0 x64 1CPU (MCTS) no AVX2: 32 nps

Similar ratio here:

KDragon 1.0 x64 1CPU (MCTS) AVX2 : 54 nps
KDragon 1.0 x64 1CPU (MCTS) no AVX2: 41 nps
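
To make the "similar ratio" comparison concrete, here is a small sketch that pulls the `nps` value out of a UCI `info` line and computes the AVX2 speedup from the figures posted in this thread. The sample `info` line is made up for illustration, and the helper names are my own.

```python
def parse_nps(info_line):
    """Return the integer following the 'nps' token of a UCI 'info' line, or None."""
    tokens = info_line.split()
    if "nps" in tokens:
        idx = tokens.index("nps")
        if idx + 1 < len(tokens) and tokens[idx + 1].isdigit():
            return int(tokens[idx + 1])
    return None

def speedup(fast, slow):
    """Ratio of two speed readings taken under the same conditions."""
    return fast / slow

# Figures as posted in the thread (AVX2 vs. non-AVX2, same units on both sides).
werner = speedup(41, 32)
modern_times = speedup(54, 41)
print(f"Werner: {werner:.2f}x, Modern Times: {modern_times:.2f}x")

# Hypothetical info line, only to show the parsing.
print(parse_nps("info depth 12 nodes 123456 nps 41000 time 3011"))
```

Both machines show the AVX2 build roughly 1.3 times as fast as the non-AVX2 build.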
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: CEGT - rating lists March 28th 2021

Post by lkaufman »

Werner wrote: Sun Apr 04, 2021 10:37 pm
lkaufman wrote: Sun Apr 04, 2021 9:48 pm I am beginning to suspect that on some older machines Dragon MCTS may run slower and/or sometimes crash with AVX2. Perhaps it wasn't implemented well at first. How old is the hardware you used for this, were there any crashes, and if practical could you compare the NPS in the opening position in MCTS mode for the AVX2 version and the non-AVX2 version? It may be that we should only recommend AVX2 for machines newer than some date or with some other specifications.
On April 2nd I sent you an email with the details of the hardware I am using.
I have had no crashes with KomodoDragon MCTS here. I am beginning to suspect that we got a buggy version of your engine?

here are my results after 30 sec
KDragon 1.0 x64 1CPU (MCTS) AVX2 : 41 nps
KDragon 1.0 x64 1CPU (MCTS) no AVX2: 32 nps

any idea?

best wishes
Werner
Those speeds are in the proper ratio and seem at least in the right ballpark for your hardware (I get 78 nps for AVX2 on a roughly 5 GHz i9). I'll ask Mark if he can check out the version you got, but a bad version should (probably) also have caused poor results in standard mode. It's pretty important for us to figure out what's going on here; the disparity between your results and Stefan Pohl's (as well as our own) is too large to ignore. At the moment, I can only imagine that different GUIs are somehow the problem, though they shouldn't be.
Komodo rules!
Werner
Posts: 2871
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle

Re: CEGT - rating lists March 28th 2021

Post by Werner »

Thanks Larry,
of course I will help you find out what is going on here.
I have never seen problems with the GUIs I use here (Arena 3.5.1 and Shredder). The last result I posted was played under the Shredder GUI.
I know that different opening sets give different results, but not by this much.
How can I check whether the network is used or not?

C:\Users\Agando\Arena\Engines\Komodo-Dragon>dragon-64bit-avx2.exe
Dragon (C) 2020 Don Dailey, Larry Kaufman, and Mark Lefler
using hardware POPCNT
info string Licensed to Werner Schuele
info string embedded NN is loaded

and
C:\Users\Agando\Arena\Engines\Komodo-Dragon>dragon-64bit.exe
Dragon (C) 2020 Don Dailey, Larry Kaufman, and Mark Lefler
using hardware POPCNT
info string Licensed to Werner Schuele
info string embedded NN is loaded
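
Both banners above already print "embedded NN is loaded". One further way to look is to run the standard UCI handshake from a script and list what the engine reports. The sketch below assumes only the generic UCI protocol (`uci`, `option name ... type ...`, `info string ...`, `quit`); whether Dragon exposes an option or message that says more about network use is not something I know, and the `Hash` option line in the sample is a hypothetical stand-in.

```python
import subprocess

def parse_uci_handshake(lines):
    """Split engine output into UCI option names and 'info string' messages."""
    options, messages = [], []
    for line in lines:
        line = line.strip()
        if line.startswith("option name "):
            # Format: "option name <name> type <type> ..."; keep only <name>.
            options.append(line[len("option name "):].split(" type ", 1)[0])
        elif line.startswith("info string "):
            messages.append(line[len("info string "):])
    return options, messages

def query_engine(path, timeout=10):
    """Start the engine, send 'uci' then 'quit', and parse everything it printed."""
    result = subprocess.run([path], input="uci\nquit\n",
                            capture_output=True, text=True, timeout=timeout)
    return parse_uci_handshake(result.stdout.splitlines())

# Offline demonstration on lines like those in the transcript above.
sample = [
    "using hardware POPCNT",
    "info string Licensed to Werner Schuele",
    "info string embedded NN is loaded",
    "option name Hash type spin default 256 min 1 max 1024",  # hypothetical
    "uciok",
]
options, messages = parse_uci_handshake(sample)
print(options, messages)
```

On a real installation, `query_engine` would be called with the path to `dragon-64bit-avx2.exe` or `dragon-64bit.exe`.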
Werner
Wolfgang
Posts: 893
Joined: Sat May 13, 2006 1:08 am

Re: CEGT - rating lists March 28th 2021

Post by Wolfgang »

Hi Larry,
How old is the hardware you used for this,
I started another test run with slightly different conditions concerning the book. I used a fixed test suite (SilverSuite) this time instead of Frank's current FEOBOS book and added some more opponents, still "only" 1400 games…

For this run I used five computers:
AMD Ryzen 5 1500X @ ~3.7 GHz (standard is 3.5) => 300 games/perf. ~3394
AMD Ryzen 5 1600 @ 3.5 GHz (standard 3.2) => 300 games/perf. ~3360
AMD Ryzen 7 2700 @ 3.5 GHz (standard 3.2) => 400 games/perf. ~3380

Intel i7 4770 Haswell @ ~3.8 GHz (standard 3.4) => 200 games/perf. ~3340
Intel i5 6600K Skylake @ 4.0 GHz (standard 3.5) => 200 games/perf. ~3345

All with the AVX2 compile. I always test beforehand which compile is fastest on each particular machine.

The overall result (performance) was a bit better, by around 15-20 points. The test runs are hard to compare though, as I no longer know which computers the first run was played on. Definitely not on all of these machines; I'm sure about that at least.

My impression is, with a huge amount of uncertainty because of far too few games, that scoring on the older Intels is a bit worse than on the moderately old Ryzen 5/7. The problem is: these two Intels are considerably faster than the Ryzens with nearly every engine, even with Komodo Dragon "default"… :shock:
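
To give a feel for how large that uncertainty is at these game counts, here is a rough sketch of the 95% Elo margin as a function of the number of games. The 50% score and 60% draw ratio are illustrative assumptions, not figures from my runs, and the formula is the usual normal approximation on the logistic Elo scale.

```python
import math

def elo_margin(games, score=0.5, draw_ratio=0.6, z=1.96):
    """Approximate 95% Elo margin for a match of `games` games."""
    wins = score - draw_ratio / 2.0
    losses = 1.0 - wins - draw_ratio
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = (wins * (1.0 - score) ** 2
                + draw_ratio * (0.5 - score) ** 2
                + losses * score ** 2)
    stderr = math.sqrt(variance / games)

    def elo(s):
        return -400.0 * math.log10(1.0 / s - 1.0)

    return (elo(score + z * stderr) - elo(score - z * stderr)) / 2.0

# Game counts per machine and in total, as in the list above.
for n in (200, 300, 400, 1400):
    print(f"{n:5d} games: about +/- {elo_margin(n):.0f} Elo")
```

Under these assumptions a 200-game sample cannot resolve differences of 20 to 40 points between machines, which is why the per-machine performances should be read with caution.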

On Haswell and Skylake, BMI2 compiles are normally a bit faster, but only by a single-digit percentage, so without relevance. Random noise...
were there any crashes
No, not a single one, and no time forfeits either. Komodo is one of the most stable engines I know :D
At the moment, I can only imagine that different GUIs are somehow the problem, though they shouldn't be.
Never ever, Larry. In that case Komodo would have a severe bug, but not in the complete engine, only in one part of it (MCTS)?! Activating the MCTS checkbox causes problems with the GUI, but not for everyone?? And even if so, do you REALLY think this could make a difference of 50, 80 or 100 points? Sorry Larry, really no offense meant ;), but that's nonsense.

BTW: I use the same interfaces as Werner, i.e. Shredder Classic 13 and Arena 3.5.1, and have never had problems. I can't speak about LittleBlitzer or cutechess; I have never used them here and have no reason to do so :)
Best
Wolfgang
CEGT-Team
www.cegt.net
www.cegt.forumieren.com
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: CEGT - rating lists March 28th 2021

Post by lkaufman »

Wolfgang wrote: Mon Apr 05, 2021 4:13 pm Hi Larry,
How old is the hardware you used for this,
I started another test run with slightly different conditions concerning the book. I used a fixed test suite (SilverSuite) this time instead of Frank's current FEOBOS book and added some more opponents, still "only" 1400 games…

For this run I used five computers:
AMD Ryzen 5 1500X @ ~3.7 GHz (standard is 3.5) => 300 games/perf. ~3394
AMD Ryzen 5 1600 @ 3.5 GHz (standard 3.2) => 300 games/perf. ~3360
AMD Ryzen 7 2700 @ 3.5 GHz (standard 3.2) => 400 games/perf. ~3380

Intel i7 4770 Haswell @ ~3.8 GHz (standard 3.4) => 200 games/perf. ~3340
Intel i5 6600K Skylake @ 4.0 GHz (standard 3.5) => 200 games/perf. ~3345

All with the AVX2 compile. I always test beforehand which compile is fastest on each particular machine.

The overall result (performance) was a bit better, by around 15-20 points. The test runs are hard to compare though, as I no longer know which computers the first run was played on. Definitely not on all of these machines; I'm sure about that at least.

My impression is, with a huge amount of uncertainty because of far too few games, that scoring on the older Intels is a bit worse than on the moderately old Ryzen 5/7. The problem is: these two Intels are considerably faster than the Ryzens with nearly every engine, even with Komodo Dragon "default"… :shock:

On Haswell and Skylake, BMI2 compiles are normally a bit faster, but only by a single-digit percentage, so without relevance. Random noise...
were there any crashes
No, not a single one, and no time forfeits either. Komodo is one of the most stable engines I know :D
At the moment, I can only imagine that different GUIs are somehow the problem, though they shouldn't be.
Never ever, Larry. In that case Komodo would have a severe bug, but not in the complete engine, only in one part of it (MCTS)?! Activating the MCTS checkbox causes problems with the GUI, but not for everyone?? And even if so, do you REALLY think this could make a difference of 50, 80 or 100 points? Sorry Larry, really no offense meant ;), but that's nonsense.

BTW: I use the same interfaces as Werner, i.e. Shredder Classic 13 and Arena 3.5.1, and have never had problems. I can't speak about LittleBlitzer or cutechess; I have never used them here and have no reason to do so :)
I agree that the GUI is unlikely to be the explanation, but I didn't have a better theory. Now I suspect that it may have something to do with Linux vs. Windows. Our tester runs in Linux, but normally speeds and results in Linux and Windows differ by just 3-5% and 3-5 Elo or so. Also, I believe (not sure) that Stefan Pohl uses Windows, so I had ruled that out as an explanation. But I'm running the same test in Windows on my laptop (Little Blitzer GUI, like Pohl) and I'm getting bad results similar to those reported in this thread: minus 80 Elo vs SF11 after 430 single-thread games at 2' + 1". Yet the Linux tester is showing the two engines virtually tied after 2000 games! This is bizarre and makes no sense, but we'll have to figure it out.
Komodo rules!