Chess.com 2018 computer chess championship

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

noobpwnftw
Posts: 560
Joined: Sun Nov 08, 2015 11:10 pm

Re: Chess.com 2018 computer chess championship

Post by noobpwnftw »

CMCanavessi wrote: Sun Sep 16, 2018 9:46 pm Interesting. I wonder how Intel hyperthreads compare to AMD hyperthreads. As far as I know, AMD ones are way more efficient and work way better, while Intel ones kinda suck, at least for chess. Is there some kind of study about this?

I cannot test every platform/combination, but to my knowledge both Intel and AMD are getting better hyper-threading efficiency with their more recent generations of chips. Intel HT has jumped from about 30% (compared to a real core) in the Core 2 era to about 70% with the latest Scalable Xeons, and AMD is doing around 60% with Ryzen 7; it should be better still on their new chips, according to the available benchmarks.

Also note that, unlike in the old days, it no longer matters which core (real or fake) you run your threads on: things get rearranged internally and the combined performance is reliable. It is now a non-issue if you bind all your threads to fake cores and get only hyper-threaded performance. This should make NUMA implementations easier: expect big numbers in the Threads option and support Windows processor groups.

As for memory affinity, I think a modern OS knows how to schedule and migrate things well enough; unless you have a really big system, nothing will break and it works out of the box. Plus, CCCC runs on virtual machines, so the hypervisor is the overlord and whatever you do doesn't matter anyway.
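
The processor-group point matters on Windows once a machine has more than 64 logical CPUs, because by default a process only sees the single group it started in. Below is a minimal sketch of how an engine's thread launcher could spread its search threads across all groups, assuming the standard Win32 calls GetActiveProcessorGroupCount, GetActiveProcessorCount and SetThreadGroupAffinity; the 128-thread count and the round-robin group assignment are illustrative choices, not how any particular engine does it.

#include <windows.h>
#include <stdio.h>
#include <string.h>

#define NTHREADS 128   /* "big numbers in Threads" */

/* Pin one thread to a given processor group, allowing every CPU in it. */
static void bind_thread_to_group(HANDLE thread, WORD group)
{
    GROUP_AFFINITY aff;
    DWORD cpus = GetActiveProcessorCount(group);

    memset(&aff, 0, sizeof(aff));
    aff.Group = group;
    if (cpus >= sizeof(KAFFINITY) * 8)
        aff.Mask = ~(KAFFINITY)0;                 /* all CPUs in the group */
    else
        aff.Mask = ((KAFFINITY)1 << cpus) - 1;
    SetThreadGroupAffinity(thread, &aff, NULL);
}

static DWORD WINAPI search_thread(LPVOID arg)     /* placeholder worker */
{
    (void)arg;
    /* ... the engine's search loop would run here ... */
    return 0;
}

int main(void)
{
    WORD   groups = GetActiveProcessorGroupCount();
    HANDLE h[NTHREADS];

    for (int i = 0; i < NTHREADS; i++) {
        /* Create suspended so the affinity is set before the thread runs. */
        h[i] = CreateThread(NULL, 0, search_thread, NULL,
                            CREATE_SUSPENDED, NULL);
        bind_thread_to_group(h[i], (WORD)(i % groups));   /* round-robin */
        ResumeThread(h[i]);
    }
    for (int i = 0; i < NTHREADS; i++) {
        WaitForSingleObject(h[i], INFINITE);
        CloseHandle(h[i]);
    }
    printf("Ran %d threads across %d processor groups\n", NTHREADS, (int)groups);
    return 0;
}

Creating the threads suspended is just one way to make sure the group affinity is applied before any search work starts; error handling is left out for brevity.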
mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Re: Chess.com 2018 computer chess championship

Post by mjlef »

noobpwnftw wrote: Sun Sep 16, 2018 11:11 pm
CMCanavessi wrote: Sun Sep 16, 2018 9:46 pm Interesting. I wonder how Intel hyperthreads compare to AMD hyperthreads. As far as I know, AMD ones are way more efficient and work way better, while Intel ones kinda suck, at least for chess. Is there some kind of study about this?

I cannot test every platform/combination, but to my knowledge both Intel and AMD are getting better hyper-threading efficiency with their more recent generations of chips. Intel HT has jumped from about 30% (compared to a real core) in the Core 2 era to about 70% with the latest Scalable Xeons, and AMD is doing around 60% with Ryzen 7; it should be better still on their new chips, according to the available benchmarks.

Also note that, unlike in the old days, it no longer matters which core (real or fake) you run your threads on: things get rearranged internally and the combined performance is reliable. It is now a non-issue if you bind all your threads to fake cores and get only hyper-threaded performance. This should make NUMA implementations easier: expect big numbers in the Threads option and support Windows processor groups.

As for memory affinity, I think a modern OS knows how to schedule and migrate things well enough; unless you have a really big system, nothing will break and it works out of the box. Plus, CCCC runs on virtual machines, so the hypervisor is the overlord and whatever you do doesn't matter anyway.
I thought chess.com bought a machine specifically for this event. So I do not think it is running on virtual machines.

Mark
mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Re: Chess.com 2018 computer chess championship

Post by mjlef »

jstanback wrote: Sun Sep 16, 2018 11:05 pm
mjlef wrote: Sun Sep 16, 2018 4:03 am
AndrewGrant wrote: Sun Sep 16, 2018 2:55 am
That jump assumes that all engines have NUMA support. It is my understanding that only Stockfish, Houdini, Texel, and Ethereal have such support. I'm tempted to add Komodo into that too, based on some chats with Mark, but I don't know that first hand.
The key to decent NUMA support is to have each thread allocate as much of the memory that it uses itself, to ensure the memory is on the NUMA node the thread is placed on. This can be done without resorting to special NUMA calls. But in Windows, running more than 64 threads does require special code, since Windows only deals with 64-CPU "processor groups", which requires these calls. I have always wondered how Windows would deal with more than 64 CPUs on one processor chip... would it split them into two processor groups?

Although I have tried special NUMA calls to do things like locking a thread to a NUMA node, they have never paid off. The OS seems to have the best knowledge of where a thread should go, and I have not found a way to get access to things like which CPU is busy. I suppose the OS thinks, for security reasons, that this is none of my business!
Hi Mark,

Do you think it's worthwhile to duplicate global variables that are used a lot, such as pre-calculated bitboards of moves on an empty board, so that each thread has its own copy? Do you know if global constants are handled differently by the compiler/OS such that they might have faster memory access?

John
I do not know the answer to that question. I think Bob Hyatt did some experiments a while ago, but I cannot find the post or results.

Each chip has three caches these days. The sizes vary, but in general the L3 cache is pretty big, so commonly used things like the magic multiplication numbers should quickly end up in these caches, giving fast access. You can look up your CPU online to see how big they are.

In a NUMA machine, the memory controller passes changes back and forth between the two or more nodes. As long as values do not change, it does not have to send anything. But things like the main hash have to be sent when a copy is not already in one of the caches, and that is slow (something like 100 clock cycles instead of just a few for the on-chip caches). Basically, try not to have a lot of big tables shared among all the processors. Keep local copies of history tables, killer moves, and anything only that thread needs. It would not be hard to try copying some of the things like the constants used in move generation, but as long as they are not too big they get cached anyway, so it might not be worth the bother. When I last checked, Stockfish had just a few megs of pawn hash and material hash tables for each thread. I think that makes it pretty cache friendly, and certainly they are small enough to sit in the local NUMA node's memory if they do not get cached enough.
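
To make the "allocate it in the thread itself" advice concrete, here is a minimal sketch of per-thread tables that are allocated and first-touched by their own search thread, so that with the usual first-touch policy the pages land on that thread's NUMA node without any libnuma or Windows NUMA calls. The structure layout and sizes are illustrative, not taken from any particular engine.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative per-thread data: the small tables each thread keeps privately. */
typedef struct {
    uint64_t *pawn_hash;            /* a couple of MB, private to this thread */
    int16_t   history[2][64][64];   /* history heuristic, private             */
    int       id;
} ThreadData;

#define PAWN_HASH_ENTRIES (1u << 18)    /* 256K entries of 8 bytes = 2 MB */

static void *search_thread(void *arg)
{
    ThreadData *td = (ThreadData *)arg;

    /* Allocate and touch the memory from inside the thread itself; the
       zeroing pass is the "first touch" that places the pages locally. */
    td->pawn_hash = malloc(PAWN_HASH_ENTRIES * sizeof(uint64_t));
    if (td->pawn_hash == NULL)
        return NULL;
    memset(td->pawn_hash, 0, PAWN_HASH_ENTRIES * sizeof(uint64_t));
    memset(td->history, 0, sizeof(td->history));

    /* ... search loop using td->pawn_hash and td->history ... */

    free(td->pawn_hash);
    return NULL;
}

int main(void)
{
    enum { NTHREADS = 8 };
    static ThreadData td[NTHREADS];
    pthread_t tid[NTHREADS];

    for (int i = 0; i < NTHREADS; i++) {
        td[i].id = i;
        pthread_create(&tid[i], NULL, search_thread, &td[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    printf("done\n");
    return 0;
}

The shared main hash, by contrast, is usually one big allocation touched by many threads, which is exactly the case where the cross-node traffic described above shows up.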
jstanback
Posts: 130
Joined: Fri Jun 17, 2016 4:14 pm
Location: Colorado, USA
Full name: John Stanback

Re: Chess.com 2018 computer chess championship

Post by jstanback »

mjlef wrote: Mon Sep 17, 2018 2:34 am
jstanback wrote: Sun Sep 16, 2018 11:05 pm
mjlef wrote: Sun Sep 16, 2018 4:03 am
AndrewGrant wrote: Sun Sep 16, 2018 2:55 am
That jump assumes that all engines have NUMA support. It is my understanding that only Stockfish, Houdini, Texel, and Ethereal have such support. I'm tempted to add Komodo into that too, based on some chats with Mark, but I don't know that first hand.
The key to decent NUMA support is to have each thread allocate as much of the memory that it uses itself, to ensure the memory is on the NUMA node the thread is placed on. This can be done without resorting to special NUMA calls. But in Windows, running more than 64 threads does require special code, since Windows only deals with 64-CPU "processor groups", which requires these calls. I have always wondered how Windows would deal with more than 64 CPUs on one processor chip... would it split them into two processor groups?

Although I have tried special NUMA calls to do things like locking a thread to a NUMA node, they have never paid off. The OS seems to have the best knowledge of where a thread should go, and I have not found a way to get access to things like which CPU is busy. I suppose the OS thinks, for security reasons, that this is none of my business!
Hi Mark,

Do you think it's worthwhile to duplicate global variables that are used a lot, such as pre-calculated bitboards of moves on an empty board, so that each thread has its own copy? Do you know if global constants are handled differently by the compiler/OS such that they might have faster memory access?

John
I do not know the answer to that question. I think Bob Hyatt did some experiments a while ago, but I cannot find the post or results.

Each chip has three caches these days. The sizes vary, but in general the L3 cache is pretty big, so commonly used things like the magic multiplication numbers should quickly end up in these caches, giving fast access. You can look up your CPU online to see how big they are.

In a NUMA machine, the memory controller passes changes back and forth between the two or more nodes. As long as values do not change, it does not have to send anything. But things like the main hash have to be sent when a copy is not already in one of the caches, and that is slow (something like 100 clock cycles instead of just a few for the on-chip caches). Basically, try not to have a lot of big tables shared among all the processors. Keep local copies of history tables, killer moves, and anything only that thread needs. It would not be hard to try copying some of the things like the constants used in move generation, but as long as they are not too big they get cached anyway, so it might not be worth the bother. When I last checked, Stockfish had just a few megs of pawn hash and material hash tables for each thread. I think that makes it pretty cache friendly, and certainly they are small enough to sit in the local NUMA node's memory if they do not get cached enough.
Ahh, you're probably right that as long as the commonly used global data is reasonably small it will probably reside in cache. In Wasp each thread has its own pawn and eval hash tables, history tables, search stack, etc. The main hash table is shared, of course, and in Wasp the whole thing is allocated by the main thread, so I would guess that threads on another NUMA node get slower access to it. But one thing that puzzles me is my experience with a structure of only maybe 4 KB that I use during evaluation. If I allocate it from the main thread as a global array of structures, i.e. struct tEVDATA EvalData[maxthreads], I get much lower nps than if I include it in the ThreadData structure, where it is allocated and written to when the thread is initialized. I guess the global array gets kinda big at 64*4 = 256 KB and maybe doesn't always stay in cache.

John
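
Two effects besides raw cache capacity could plausibly contribute to what John describes: the global array is first-touched by the main thread, so every thread's copy ends up on the main thread's NUMA node, and adjacent 4 KB entries can share cache lines where they abut (false sharing). Below is a minimal sketch of the two layouts being compared; tEVDATA is reduced to a made-up placeholder body, and the cache-line padding on the global-array variant is an illustrative mitigation rather than anything Wasp actually does.

#include <stdint.h>
#include <stdio.h>

#define MAXTHREADS 64

/* Made-up placeholder standing in for Wasp's real tEVDATA (~4 KB). */
struct tEVDATA {
    uint64_t attacks[2][6];
    int32_t  scratch[1000];
};

/* Layout 1: one global array, allocated and first-touched by the main
   thread.  All pages land on the main thread's NUMA node, and entries
   for different threads can share cache lines at their boundaries.
   Padding each entry to a cache-line multiple removes the sharing,
   but not the remote-node placement. */
struct PaddedEvData {
    struct tEVDATA ev;
    char pad[64 - (sizeof(struct tEVDATA) % 64)];
};
struct PaddedEvData EvalData[MAXTHREADS];

/* Layout 2: the structure lives inside the per-thread data and is
   first written by its own thread, so it is local to that thread's
   node and never shares a line with another thread's copy. */
struct ThreadData {
    int            id;
    struct tEVDATA ev;    /* private copy, first-touched by the thread */
};

int main(void)
{
    printf("tEVDATA: %zu bytes, padded entry: %zu bytes\n",
           sizeof(struct tEVDATA), sizeof(struct PaddedEvData));
    return 0;
}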
Gary Internet
Posts: 60
Joined: Thu Jan 04, 2018 7:09 pm

Re: Chess.com 2018 computer chess championship

Post by Gary Internet »

They need to implement tablebase adjudication for this tournament to cut down on the number of times we end up watching engines messing about for 100+ moves of pointless crap. It works perfectly fine for TCEC, so I don't see why CCCC couldn't do it as well. At least 6-man if not 7-man adjudication. It would probably have saved a good few hours, but more importantly it would help viewership. There were at least 3 times I logged off because the viewing was boring and pointless when the next game was what I really wanted to watch.
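
For what that adjudication might look like inside the tournament manager, here is a minimal sketch. The piece_count and probe_syzygy_wdl helpers are hypothetical wrappers around whatever probing library the GUI uses (Fathom, for instance), and the 6-piece cutoff is an illustrative choice rather than TCEC's or CCCC's actual rule.

#include <stdbool.h>

struct Position;                                    /* opaque position type */

/* Result of a WDL probe from the side to move's point of view. */
typedef enum { TB_LOSS = -1, TB_DRAW = 0, TB_WIN = 1, TB_UNKNOWN = 2 } TbResult;

/* Hypothetical helpers assumed to be provided by the tournament manager. */
extern int      piece_count(const struct Position *pos);
extern TbResult probe_syzygy_wdl(const struct Position *pos);

/* Called after every move: returns true and sets *result
   (+1 = side to move wins, 0 = draw, -1 = loses) when the
   tablebases already know the outcome and the game can be stopped. */
bool try_tb_adjudication(const struct Position *pos, int *result)
{
    if (piece_count(pos) > 6)          /* only 6-man tables assumed on disk */
        return false;

    TbResult wdl = probe_syzygy_wdl(pos);
    if (wdl == TB_UNKNOWN)             /* position not covered; play on */
        return false;

    *result = (int)wdl;                /* adjudicate immediately */
    return true;
}

Whether to trust the WDL result blindly or to also require both engines' scores to agree is a policy choice for the organisers; the sketch stops the game as soon as the tables answer.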
Jouni
Posts: 3279
Joined: Wed Mar 08, 2006 8:15 pm

Re: Chess.com 2018 computer chess championship

Post by Jouni »

100 moves is no problem, but 200-300 is too much :x :evil: .
Jouni
Joerg Oster
Posts: 937
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany

Re: Chess.com 2018 computer chess championship

Post by Joerg Oster »

Gary Internet wrote: Mon Sep 17, 2018 8:19 am They need to implement tablebase adjudication for this tournament to cut down on the number of times we end up watching engines messing about for 100+ moves of pointless crap. It works perfectly fine for TCEC, so I don't see why CCCC couldn't do it as well. At least 6-man if not 7-man adjudication. It would probably have saved a good few hours, but more importantly it would help viewership. There were at least 3 times I logged off because the viewing was boring and pointless when the next game was what I really wanted to watch.
OTOH, it's revealing some weaknesses/shortcomings/bugs in one or another engine quite mercilessly. :D
Jörg Oster
marsell
Posts: 106
Joined: Tue Feb 07, 2012 11:14 am

Re: Chess.com 2018 computer chess championship

Post by marsell »

I've said it before! This is a waste of resources and time, nothing more.
AndrewGrant
Posts: 1750
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: Chess.com 2018 computer chess championship

Post by AndrewGrant »

Joerg Oster wrote: Mon Sep 17, 2018 11:12 am
Gary Internet wrote: Mon Sep 17, 2018 8:19 am They need to implement tablebase adjudication for this tournament to cut down on the number of times we end up watching engines messing about for 100+ moves of pointless crap. It works perfectly fine for TCEC, so I don't see why CCCC couldn't do it as well. At least 6-man if not 7-man adjudication. It would probably have saved a good few hours, but more importantly it would help viewership. There were at least 3 times I logged off because the viewing was boring and pointless when the next game was what I really wanted to watch.
OTOH, it's revealing some weaknesses/shortcomings/bugs in one or another engine quite mercilessly. :D
As far as I know the only TB bug that was caught was for Fizbo 1.9, which had already been patched in 2.0.
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Re: Chess.com 2018 computer chess championship

Post by George Tsavdaris »

AndrewGrant wrote: Mon Sep 17, 2018 12:02 pm
Joerg Oster wrote: Mon Sep 17, 2018 11:12 am
Gary Internet wrote: Mon Sep 17, 2018 8:19 am They need to implement tablebase adjudication for this tournament to cut down on the number of times we end up watching engines messing about for 100+ moves of pointless crap. It works perfectly fine for TCEC, so I don't see why CCCC couldn't do it as well. At least 6-man if not 7-man adjudication. It would probably have saved a good few hours, but more importantly it would help viewership. There were at least 3 times I logged off because the viewing was boring and pointless when the next game was what I really wanted to watch.
OTOH, it's revealing some weaknesses/shortcomings/bugs in one or another engine quite mercilessly. :D
As far as I know the only TB bug that was caught was for Fizbo 1.9, which had already been patched in 2.0.
Not really.

209...Rd2+ draws; 209...Re1 loses (this was the move Fizbo 1.9 played in the CCCC).
[d]7k/5Q2/8/3K4/8/8/4r3/8 b - - 70 209


And Fizbo 2.0 with 3-, 4-, 5- and 6-man Syzygy TBs says:
FEN: 7k/5Q2/8/3K4/8/8/4r3/8 b - - 70 209

Fizbo2x64_bmi2:
4/1 00:00 43 2,047 -0,01 Re2-e1
5/1 00:00 57 2,714 -0,01 Re2-e1
6/1 00:00 71 3,380 -0,01 Re2-e1
7/1 00:00 85 4,047 -0,01 Re2-e1
8/1 00:00 99 4,714 -0,01 Re2-e1
9/1 00:00 113 5,380 -0,01 Re2-e1
10/1 00:00 127 6,047 -0,01 Re2-e1
11/1 00:00 141 6,714 -0,01 Re2-e1
12/1 00:00 155 7,380 -0,01 Re2-e1
13/1 00:00 169 8,047 -0,01 Re2-e1
14/1 00:00 183 8,714 -0,01 Re2-e1
15/1 00:00 197 8,954 -0,01 Re2-e1
16/1 00:00 211 9,590 -0,01 Re2-e1
17/1 00:00 225 10,227 -0,01 Re2-e1
18/1 00:00 239 10,863 -0,01 Re2-e1
19/1 00:00 253 11,500 -0,01 Re2-e1
20/1 00:00 267 12,136 -0,01 Re2-e1
21/1 00:00 281 12,772 -0,01 Re2-e1
22/1 00:00 295 13,409 -0,01 Re2-e1
23/1 00:00 309 14,045 -0,01 Re2-e1
24/1 00:00 323 14,681 -0,01 Re2-e1
25/1 00:00 337 15,318 -0,01 Re2-e1
26/1 00:00 351 15,954 -0,01 Re2-e1
27/1 00:00 365 16,590 -0,01 Re2-e1
28/1 00:00 379 17,227 -0,01 Re2-e1
29/1 00:00 393 17,863 -0,01 Re2-e1
30/1 00:00 407 18,500 -0,01 Re2-e1
31/1 00:00 421 19,136 -0,01 Re2-e1
32/1 00:00 435 19,772 -0,01 Re2-e1
33/1 00:00 449 20,409 -0,01 Re2-e1
34/1 00:00 463 21,045 -0,01 Re2-e1
35/1 00:00 477 21,681 -0,01 Re2-e1
36/1 00:00 491 22,318 -0,01 Re2-e1
37/1 00:00 505 22,954 -0,01 Re2-e1
38/1 00:00 519 23,590 -0,01 Re2-e1
39/1 00:00 533 24,227 -0,01 Re2-e1
40/1 00:00 547 24,863 -0,01 Re2-e1
41/1 00:00 561 25,500 -0,01 Re2-e1
42/1 00:00 575 26,136 -0,01 Re2-e1
43/1 00:00 589 26,772 -0,01 Re2-e1
44/1 00:00 603 27,409 -0,01 Re2-e1
45/1 00:00 617 28,045 -0,01 Re2-e1
46/1 00:00 631 28,681 -0,01 Re2-e1
46/1 00:00 644 28,000 -0,01 Re2-e1
After his son's birth they've asked him:
"Is it a boy or girl?"
YES! He replied.....