Chess.com 2018 computer chess championship

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

AndrewGrant
Posts: 1750
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: Chess.com 2018 computer chess championship

Post by AndrewGrant »

CMCanavessi wrote: Sat Sep 15, 2018 11:52 pm
chessdev wrote: Sat Sep 15, 2018 10:49 pm Hello all. Update time. After extensive discussions, we have decided to turn Ponder OFF at this time (for Stage 2). We recognize that there are some irregularities with how threads are being allocated in some engines and situations, and until we figure that out, we are going to play it safe.

That said, we believe the games have been incredibly entertaining, and the outcome is more or less expected, which leads us to believe that this was not a meaningful issue in the scheme of things. But we do want to find solutions going forward. We are looking at options for things we can do on our end to ensure that each engine gets the processing power it deserves. That may mean that Ponder will be turned back ON for Stage 3. We will see.

Additionally, we will also be providing the Arena logs for Stage 2 and beyond. Links will be provided when Stage 2 begins. As for Stage 1 logs, some have been already provided, and we may reconsider releasing all of Stage 1 in the future. Right now we need to be fully focused on the rest of the event.

Thank you all again for your insights and patience!
Just to clarify, ponder will be off and thread count for each engine will double, right? Otherwise it makes little sense.
I don't think so. The jump from 46 threads to using 46 real cores is massive. The jump from 46 real cores to 92 threads is < 30 elo.

That jump assumes that all engines have NUMA support. It is my understanding that only Stockfish, Houdini, Texel, and Ethereal have such support. I'm tempted to add Komodo into that too, based on some chats with Mark, but I don't know that first hand.
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
brianr
Posts: 536
Joined: Thu Mar 09, 2006 3:01 pm

Post by brianr »

AndrewGrant wrote: Sun Sep 16, 2018 2:55 am That jump assumes that all engines have NUMA support. It is my understanding that only Stockfish, Houdini, Texel, and Ethereal have such support. I'm tempted to add Komodo into that too, based on some chats with Mark, but I don't know that first hand.
Crafty has NUMA support
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Post by Milos »

AndrewGrant wrote: Sun Sep 16, 2018 2:55 am That jump assumes that all engines have NUMA support. It is my understanding that only Stockfish, Houdini, Texel, and Ethereal have such support. I'm tempted to add Komodo into that too, based on some chats with Mark, but I don't know that first hand.
Adding NUMA support is pretty trivial. I can't believe that only SF, you and RH copied it so far from Peter? :roll:
AndrewGrant
Posts: 1750
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Post by AndrewGrant »

Milos wrote: Sun Sep 16, 2018 3:19 am
AndrewGrant wrote: Sun Sep 16, 2018 2:55 am That jump assumes that all engines have NUMA support. It is my understanding that only Stockfish, Houdini, Texel, and Ethereal have such support. I'm tempted to add Komodo into that too, based on some chats with Mark, but I don't know that first hand.
Adding NUMA support is pretty trivial. I can't believe that only SF, you and RH copied it so far from Peter? :roll:
It's a fairly low priority for most. And seeing as a large chunk of engines in TCEC can't even run well with 43 cores, most authors have bigger fish to fry. Why support 64+ threads when your engine is not able to make use of 32?
mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Post by mjlef »

AndrewGrant wrote: Sun Sep 16, 2018 2:55 am
That jump assumes that all engines have NUMA support. It is my understanding that only Stockfish, Houdini, Texel, and Ethereal have such support. I'm tempted to add Komodo into that too, based on some chats with Mark, but I don't know that first hand.
The key to decent NUMA support is to have each thread allocate as much of the memory it uses as possible itself, to ensure that memory is on the NUMA node the thread is placed on. This can be done without resorting to special NUMA calls. But on Windows, running more than 64 threads does require special code, since Windows only deals with 64-core "processor groups", which requires these calls. I have always wondered how Windows would deal with more than 64 CPUs on one processor chip...would it split them into two processor groups?

Although I have tried special NUMA calls to do things like locking a thread to a NUMA node, they have never paid off. The OS seems to have the best knowledge of where a thread should go, and I have not found a way to get access to things like which CPU is busy. I suppose the OS thinks that, for security, this is none of my business!
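The first point above, having each thread allocate and initialize its own memory, relies on first-touch page placement: on an OS that uses it (Linux does by default), a page is placed on the NUMA node of the thread that first writes to it. A minimal POSIX-threads sketch under that assumption; `SearchThread`, `init_search_threads`, and the table size are hypothetical names, not code from any engine in this thread. Note that the allocator may sometimes hand back pages that were already touched elsewhere, so this is a best-effort technique.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define HISTORY_ENTRIES (2 * 64 * 64)   /* hypothetical table size */

typedef struct {
    int id;
    short *history;    /* hypothetical per-thread history table */
} SearchThread;

/* Runs on the worker thread itself: allocate, then write every byte,
   so the first touch of each page comes from this thread. */
static void *init_thread_memory(void *arg) {
    SearchThread *st = arg;
    st->history = malloc(HISTORY_ENTRIES * sizeof *st->history);
    if (st->history != NULL)
        memset(st->history, 0, HISTORY_ENTRIES * sizeof *st->history);
    /* In a real engine this same long-lived thread would now start
       searching, so its data stays on the node where it runs. */
    return NULL;
}

/* Starts n threads, waits for their setup, returns how many succeeded. */
int init_search_threads(SearchThread *threads, int n) {
    pthread_t *tids = malloc((size_t)n * sizeof *tids);
    if (tids == NULL)
        return 0;
    for (int i = 0; i < n; i++) {
        threads[i].id = i;
        threads[i].history = NULL;
    }
    int started = 0;
    while (started < n &&
           pthread_create(&tids[started], NULL, init_thread_memory,
                          &threads[started]) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    free(tids);

    int ok = 0;
    for (int i = 0; i < started; i++)
        if (threads[i].history != NULL)
            ok++;
    return ok;
}

/* Releases every per-thread table. */
void free_search_threads(SearchThread *threads, int n) {
    for (int i = 0; i < n; i++) {
        free(threads[i].history);
        threads[i].history = NULL;
    }
}
```

No special NUMA call appears anywhere: placement follows purely from which thread writes the memory first, which matches the "without resorting to special NUMA calls" remark.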
AndrewGrant
Posts: 1750
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Post by AndrewGrant »

mjlef wrote: Sun Sep 16, 2018 4:03 am
AndrewGrant wrote: Sun Sep 16, 2018 2:55 am
That jump assumes that all engines have NUMA support. It is my understanding that only Stockfish, Houdini, Texel, and Ethereal have such support. I'm tempted to add Komodo into that too, based on some chats with Mark, but I don't know that first hand.
The key to decent NUMA support is to have each thread allocate as much of the memory it uses as possible itself, to ensure that memory is on the NUMA node the thread is placed on. This can be done without resorting to special NUMA calls. But on Windows, running more than 64 threads does require special code, since Windows only deals with 64-core "processor groups", which requires these calls. I have always wondered how Windows would deal with more than 64 CPUs on one processor chip...would it split them into two processor groups?

Although I have tried special NUMA calls to do things like locking a thread to a NUMA node, they have never paid off. The OS seems to have the best knowledge of where a thread should go, and I have not found a way to get access to things like which CPU is busy. I suppose the OS thinks that, for security, this is none of my business!
Yes. In this whole thread, when I talk about NUMA support, I'm really referring to the Windows processor group problem on NUMA machines.
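On the Windows versions of that era, a process's threads stay in a single processor group unless the program moves them, so an engine that never does so can use at most 64 logical processors. Handling this comes down to deciding, for each search thread, which group (and which processor inside it) it should run on. In real code the group sizes would come from `GetActiveProcessorGroupCount()` and `GetActiveProcessorCount(group)`, and the move itself would be done with `SetThreadGroupAffinity()`; those Win32 calls are left out so the sketch below compiles anywhere. The fill-groups-in-order policy and the `slot_for_thread` name are assumptions for illustration, not any particular engine's scheme.

```c
/* Where a search thread should run: a processor group and an index
   inside that group. */
typedef struct {
    int group;   /* processor group number, or -1 on bad input */
    int cpu;     /* processor index inside that group */
} GroupSlot;

/* Hypothetical helper: given the number of logical processors in each
   group (assumed non-negative), fill groups in order; once every
   processor has a thread, the assignment wraps around. */
GroupSlot slot_for_thread(const int *group_sizes, int ngroups, int thread_idx) {
    GroupSlot slot = { -1, -1 };
    int total = 0;
    for (int g = 0; g < ngroups; g++)
        total += group_sizes[g];
    if (total <= 0 || thread_idx < 0)
        return slot;
    int k = thread_idx % total;
    for (int g = 0; g < ngroups; g++) {
        if (k < group_sizes[g]) {
            slot.group = g;
            slot.cpu = k;
            return slot;
        }
        k -= group_sizes[g];
    }
    return slot;
}
```

For example, with two groups of 44 logical processors (88 in total), threads 0-43 map to group 0 and threads 44-87 to group 1; thread 50 lands on processor 6 of group 1.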
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Post by zullil »

mjlef wrote: Sun Sep 16, 2018 4:03 am
AndrewGrant wrote: Sun Sep 16, 2018 2:55 am
That jump assumes that all engines have NUMA support. It is my understanding that only Stockfish, Houdini, Texel, and Ethereal have such support. I'm tempted to add Komodo into that too, based on some chats with Mark, but I don't know that first hand.
The key to decent NUMA support is to have each thread allocate as much of the memory it uses as possible itself, to ensure that memory is on the NUMA node the thread is placed on. This can be done without resorting to special NUMA calls. But on Windows, running more than 64 threads does require special code, since Windows only deals with 64-core "processor groups", which requires these calls. I have always wondered how Windows would deal with more than 64 CPUs on one processor chip...would it split them into two processor groups?

Although I have tried special NUMA calls to do things like locking a thread to a NUMA node, they have never paid off. The OS seems to have the best knowledge of where a thread should go, and I have not found a way to get access to things like which CPU is busy. I suppose the OS thinks that, for security, this is none of my business!
Cfish binds threads to nodes, so they are always "close" to their thread-specific data. Assuming a non-defective OS, I wonder how often a non-bound thread would be run on a node other than its "home" node? I suppose there are NUMA-related profiling tools that would answer this? Running on a "non-home" node makes fetching thread-specific data slower, but that might still be better than waiting for a chance to run on the "home" node.

I wonder how much testing Ronald was able to do (or get help with) for this aspect of Cfish?
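For comparison, binding a thread the way described above can be sketched on Linux with the glibc extension `pthread_setaffinity_np`, restricting the calling thread to a set of CPUs (for example, the CPUs of one NUMA node, which real code would look up through libnuma or `/sys`). This is not Cfish's code; `bind_current_thread` and `first_allowed_cpu` are hypothetical helpers.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: restrict the calling thread to the listed CPUs.
   Returns 0 on success, -1 if no valid CPU was given, or the error
   code returned by pthread_setaffinity_np. */
int bind_current_thread(const int *cpus, int ncpus) {
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int i = 0; i < ncpus; i++)
        if (cpus[i] >= 0 && cpus[i] < CPU_SETSIZE)
            CPU_SET(cpus[i], &set);
    if (CPU_COUNT(&set) == 0)
        return -1;
    return pthread_setaffinity_np(pthread_self(), sizeof set, &set);
}

/* Hypothetical helper: lowest-numbered CPU the calling thread may run
   on, or -1 if the affinity mask cannot be read. */
int first_allowed_cpu(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof set, &set) != 0)
        return -1;
    for (int c = 0; c < CPU_SETSIZE; c++)
        if (CPU_ISSET(c, &set))
            return c;
    return -1;
}
```

In an engine, each search thread would bind itself once, before allocating its own data, so that the binding and first-touch placement agree. Binding to a whole node's CPU list rather than a single CPU leaves the OS free to move the thread within that node, which is one middle ground between the two positions above.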
mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Post by mjlef »

AndrewGrant wrote: Sun Sep 16, 2018 2:55 am
I don't think so. The jump from 46 threads to using 46 real cores is massive. The jump from 46 real cores to 92 threads is < 30 elo.

That jump assumes that all engines have NUMA support. It is my understanding that only Stockfish, Houdini, Texel, and Ethereal have such support. I'm tempted to add Komodo into that too, based on some chats with Mark, but I don't know that first hand.
I completely agree (with one difference). On the fastgm.de site, Andreas did experiments doubling threads and doubling time. The threads results are here:

http://www.fastgm.de/schach/SMP-scaling-SF8-K10.4.pdf

As shown, Komodo only gained 12 elo going from 16 to 32 threads, and Stockfish gained 6 elo. Barring some clever scheme to better use threads, further doublings would be worth even less (you can see the trend in the data).

But doubling time, which is about the same as the roughly doubled CPU speed per thread you get from using only half the cores (even with hyperthreading off, the OS will pair threads onto two hyperthreads when available), is worth a lot more. For example:

http://fastgm.de/time-control4.html

Even the 5120 vs 2560 (plus 1%) time doubling shows a 41 elo gain. That is way above any likely elo gain from going from 44 real cores to 88 hyperthreads (or whatever they decide on, keeping some threads free for Arena and the OS, etc.). From limited tests I have done, turning hyperthreading off helps a little, probably by forcing the OS to always assign the threads properly. But it is a small thing.
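To put these Elo figures side by side, the usual logistic Elo model (an assumption here; the linked pages may compute Elo slightly differently) maps a rating difference D to the expected score E of the stronger side:

```latex
E(D) = \frac{1}{1 + 10^{-D/400}}

E(41) = \frac{1}{1 + 10^{-0.1025}} \approx \frac{1}{1.790} \approx 0.559
E(30) = \frac{1}{1 + 10^{-0.075}}  \approx \frac{1}{1.841} \approx 0.543
E(12) \approx 0.517, \qquad E(6) \approx 0.509
```

So the 41 elo from doubling the time corresponds to scoring about 56%, the < 30 elo ceiling quoted for going from real cores to twice as many hyperthreads is at most about 54%, and the 12 and 6 elo thread-doubling gains from the PDF are both under 52%.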

So, in summary, I think the best setup is one thread per real core, no pondering, ideally with hyperthreading off. Do not double the threads.

Mark
CMCanavessi
Posts: 1142
Joined: Thu Dec 28, 2017 4:06 pm
Location: Argentina

Post by CMCanavessi »

mjlef wrote: Sun Sep 16, 2018 6:58 pm
AndrewGrant wrote: Sun Sep 16, 2018 2:55 am
I don't think so. The jump from 46 threads to using 46 real cores is massive. The jump from 46 real cores to 92 threads is < 30 elo.

That jump assumes that all engines have NUMA support. It is my understanding that only Stockfish, Houdini, Texel, and Ethereal have such support. I'm tempted to add Komodo into that too, based on some chats with Mark, but I don't know that first hand.
I completely agree (with one difference). On the fastgm.de site, Andreas did experiments doubling threads and doubling time. The threads results are here:

http://www.fastgm.de/schach/SMP-scaling-SF8-K10.4.pdf

As shown, Komodo only gained 12 elo going from 16 to 32 threads, and Stockfish gained 6 elo. Barring some clever scheme to better use threads, further doublings would be worth even less (you can see the trend in the data).

But doubling time, which is about the same as the roughly doubled CPU speed per thread you get from using only half the cores (even with hyperthreading off, the OS will pair threads onto two hyperthreads when available), is worth a lot more. For example:

http://fastgm.de/time-control4.html

Even the 5120 vs 2560 (plus 1%) time doubling shows a 41 elo gain. That is way above any likely elo gain from going from 44 real cores to 88 hyperthreads (or whatever they decide on, keeping some threads free for Arena and the OS, etc.). From limited tests I have done, turning hyperthreading off helps a little, probably by forcing the OS to always assign the threads properly. But it is a small thing.

So, in summary, I think the best setup is one thread per real core, no pondering, ideally with hyperthreading off. Do not double the threads.

Mark
Interesting. I wonder how Intel hyperthreads compare to AMD hyperthreads. As far as I know, AMD ones are way more efficient and work way better, while Intel ones kinda suck, at least for chess. Is there some kind of study about this?
Follow my tournament and some Leela gauntlets live at http://twitch.tv/ccls
jstanback
Posts: 130
Joined: Fri Jun 17, 2016 4:14 pm
Location: Colorado, USA
Full name: John Stanback

Post by jstanback »

mjlef wrote: Sun Sep 16, 2018 4:03 am
AndrewGrant wrote: Sun Sep 16, 2018 2:55 am
That jump assumes that all engines have NUMA support. It is my understanding that only Stockfish, Houdini, Texel, and Ethereal have such support. I'm tempted to add Komodo into that too, based on some chats with Mark, but I don't know that first hand.
The key to decent NUMA support is to have each thread allocate as much of the memory it uses as possible itself, to ensure that memory is on the NUMA node the thread is placed on. This can be done without resorting to special NUMA calls. But on Windows, running more than 64 threads does require special code, since Windows only deals with 64-core "processor groups", which requires these calls. I have always wondered how Windows would deal with more than 64 CPUs on one processor chip...would it split them into two processor groups?

Although I have tried special NUMA calls to do things like locking a thread to a NUMA node, they have never paid off. The OS seems to have the best knowledge of where a thread should go, and I have not found a way to get access to things like which CPU is busy. I suppose the OS thinks that, for security, this is none of my business!
Hi Mark,

Do you think it's worthwhile to duplicate global variables that are used a lot, such as pre-calculated bitboards of moves on an empty board, so that each thread has its own copy? Do you know if global constants are handled differently by the compiler/OS such that they might have faster memory access?

John
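The duplication John asks about could be tried with something like the sketch below: one shared table built once at startup (knight attacks on an empty board, standing in for the pre-calculated bitboards he mentions), plus a per-thread copy that each search thread would fill itself so that, under first-touch placement, the copy lands on that thread's node. The names `init_knight_attacks`, `ThreadTables`, and `copy_tables_for_thread` are hypothetical. Whether it pays off is an empirical question: a small table that is read constantly is likely to stay in each core's caches anyway, so it is worth measuring before committing to it.

```c
#include <stdint.h>
#include <string.h>

typedef uint64_t Bitboard;

/* Shared table: written once at startup, read-only afterwards.
   Square numbering: a1 = 0, b1 = 1, ..., h8 = 63. */
static Bitboard knight_attacks[64];

void init_knight_attacks(void) {
    static const int df[8] = { 1, 2, 2, 1, -1, -2, -2, -1 };
    static const int dr[8] = { 2, 1, -1, -2, -2, -1, 1, 2 };
    for (int sq = 0; sq < 64; sq++) {
        int f = sq % 8, r = sq / 8;
        Bitboard bb = 0;
        for (int k = 0; k < 8; k++) {
            int nf = f + df[k], nr = r + dr[k];
            if (nf >= 0 && nf < 8 && nr >= 0 && nr < 8)
                bb |= (Bitboard)1 << (nr * 8 + nf);
        }
        knight_attacks[sq] = bb;
    }
}

/* Hypothetical per-thread copy. In an engine, each search thread would
   allocate this itself (e.g. with malloc) and then fill it, so that the
   first write to its pages happens on that thread's own node. */
typedef struct {
    Bitboard knight_attacks[64];
} ThreadTables;

void copy_tables_for_thread(ThreadTables *tt) {
    memcpy(tt->knight_attacks, knight_attacks, sizeof knight_attacks);
}
```

Move generation would then read `tt->knight_attacks[sq]` instead of the global table, which makes it easy to benchmark both variants with the same search code.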