237 Mn/s for Stockfish on an 2xEPYC 7742

Vinvin
Posts: 5228
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by Vinvin »

Ovyron wrote: Sat Jan 18, 2020 12:50 pm I think chess engines have become so much stronger than humans that ELO breaks down when trying to measure by how much.
The trick is to reduce the speed (or the time) used by the engine.
40 real games (90 min + 30 sec) played by a slow Stockfish 11 (around 10 kilonodes/second, or 1/100 of the time on an average smartphone) against several GMs rated 2500-2600 Elo should enlighten us a lot!
Anybody willing to sponsor that ($150 for a win, $50 for a draw, nothing for a loss)?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by bob »

Laskos wrote: Fri Jan 17, 2020 10:37 pm
bob wrote: Thu Jan 16, 2020 7:21 pm How far it is above Crafty really doesn't affect my comments. 2K nodes per second is not going to beat a GM. Might not beat a master, since most of the speculative pruning stuff really needs exceptional search depth to bypass the holes it causes.

1 second per move vs 180 seconds per move? Maybe. But certainly not 360K nodes total vs 180 seconds for a GM.

As far as the other numbers you quoted, I don't think any of 'em are even close to reality. Stockfish at 1K nodes per second is going to get roasted by DB. Completely roasted. That is only a factor of 100,000 in terms of nodes per second. Would be pretty easy to test. Take the online Cray Blitz source, and let it run for 180 * 200M nodes total and limit stockfish to 360K total. Cray Blitz was fairly similar to DB and others of that time frame, primarily using recursive null-move but nothing else regarding forward pruning, except maybe in the last few plies before q-search. Would be an interesting thing to try, but it would certainly take forever for a single game. I'm pretty sure nobody would want to scale that down to 2M nodes per second vs 10 nodes per second to keep the time ratio the same. I can certainly predict how badly that will end.

I would be willing to play stockfish at 100 nodes per second myself, and I have not actively played chess in 30 years (for tournaments anyway).

18000 nodes total is NOT going to play like a GM, in any shape or form.

With SF_dev at 5 kn/s I have no doubts SF_dev is no less than 2700 FIDE Elo level at tournament time control. Bob, software has advanced much, by maybe 600-700 computer Elo-ish points since the 1990s, meaning some 400 FIDE Elo-ish points against humans.

There are many ways to reach this conclusion. One of the many is that Rebel 9 in 1997 was already some 2600 FIDE Elo level TPR on P200 in a good number of games against humans, having some 60 kn/s IIRC.

Do the math.
I did. But we are not talking apples to apples. You are talking 5K nodes per second, which I agreed MIGHT produce a 2700 Elo. I was talking about the 100 nps claim, giving a total search space of 18K for the entire search tree, and giving the GM 180 seconds. I do NOT believe the program searching 18K nodes total per move will have any significant chance to win even a single game.
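
The node budgets being argued about here follow from multiplying search speed by thinking time. A minimal sketch of that arithmetic, assuming a flat 180 seconds per move as in the post above (the function name is purely illustrative):

```python
def nodes_per_move(nodes_per_second, seconds_per_move=180):
    """Total nodes searched for one move at a fixed speed and thinking time."""
    return nodes_per_second * seconds_per_move


# The three speeds mentioned in the thread: 100 nps, 2K nps and 5K nps.
for nps in (100, 2_000, 5_000):
    print(f"{nps:>5} nps x 180 s = {nodes_per_move(nps):>9,} nodes per move")
```

At 100 nps this gives the 18K nodes per move quoted above, and at 2K nps the 360K figure.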
Uri Blass
Posts: 10267
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by Uri Blass »

bob wrote: Thu Jan 23, 2020 4:31 am
Laskos wrote: Fri Jan 17, 2020 10:37 pm
bob wrote: Thu Jan 16, 2020 7:21 pm How far it is above Crafty really doesn't affect my comments. 2K nodes per second is not going to beat a GM. Might not beat a master, since most of the speculative pruning stuff really needs exceptional search depth to bypass the holes it causes.

1 second per move vs 180 seconds per move? Maybe. But certainly not 360K nodes total vs 180 seconds for a GM.

As far as the other numbers you quoted, I don't think any of 'em are even close to reality. Stockfish at 1K nodes per second is going to get roasted by DB. Completely roasted. That is only a factor of 100,000 in terms of nodes per second. Would be pretty easy to test. Take the online Cray Blitz source, and let it run for 180 * 200M nodes total and limit stockfish to 360K total. Cray Blitz was fairly similar to DB and others of that time frame, primarily using recursive null-move but nothing else regarding forward pruning, except maybe in the last few plies before q-search. Would be an interesting thing to try, but it would certainly take forever for a single game. I'm pretty sure nobody would want to scale that down to 2M nodes per second vs 10 nodes per second to keep the time ratio the same. I can certainly predict how badly that will end.

I would be willing to play stockfish at 100 nodes per second myself, and I have not actively played chess in 30 years (for tournaments anyway).

18000 nodes total is NOT going to play like a GM, in any shape or form.

With SF_dev at 5 kn/s I have no doubts SF_dev is no less than 2700 FIDE Elo level at tournament time control. Bob, software has advanced much, by maybe 600-700 computer Elo-ish points since the 1990s, meaning some 400 FIDE Elo-ish points against humans.

There are many ways to reach this conclusion. One of the many is that Rebel 9 in 1997 was already some 2600 FIDE Elo level TPR on P200 in a good number of games against humans, having some 60 kn/s IIRC.

Do the math.
I did. But we are not talking apples to apples. You are talking 5K nodes per second, which I agreed MIGHT produce a 2700 Elo. I was talking about the 100 nps claim, giving a total search space of 18K for the entire search tree, and giving the GM 180 seconds. I do NOT believe the program searching 18K nodes total per move will have any significant chance to win even a single game.
You were also talking about the option of 2K nodes per second.

Your words:
"How far it is above Crafty really doesn't affect my comments. 2K nodes per second is not going to beat a GM. Might not beat a master, since most of the speculative pruning stuff really needs exceptional search depth to bypass the holes it causes."

If 5 knodes per second produces 2700 Elo, then I guess 2 knodes per second may give 2500 Elo.
yorkman
Posts: 105
Joined: Thu Jul 27, 2017 10:59 pm

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by yorkman »

So I just finished building my new 2x Epyc 7742 system on a Supermicro H11DSi-NT, but I can't figure out why I only get about 80,000 kN/s in Windows, whether SMT/HT is enabled or disabled. Only Linux is better, as I get double that, 165,000 kN/s. Disabling SMT/HT makes no difference.

Anyone have this kind of setup? I've spent hours trying different OSes...Win'10 Ent, Win'2012R2, Win'2019, CentOS 8.1.

I don't know what setup the benchmarks on the previous pages here were run on. Is that SF with HT enabled, and under Linux or Windows?

I haven't bought 3200MHz RAM for it yet, so I'm currently testing with the only RAM I have, 8x8GB 2166MHz. But I doubt I'll suddenly get 235,000+ kN/s simply by replacing it with 3200MHz RAM.

Can anyone help?
Alayan
Posts: 550
Joined: Tue Nov 19, 2019 8:48 pm
Full name: Alayan Feh

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by Alayan »

Are you using the same version of Stockfish as in the 237 Mnps benchmark?
yorkman
Posts: 105
Joined: Thu Jul 27, 2017 10:59 pm

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by yorkman »

Thanks, but what version is that exactly? Just stockfish 9? So no dev version? Can one still actually download SF9?

Here's what I just got in CentOS 8.1 with SMT/HT enabled and 8 dimms populated with 2166MHz ram (8 GB each):

Total time (ms) : 299493
Nodes searched : 19609843392
Nodes/second : 65476800

That's pathetic! Seems like I was getting better performance the fewer DIMMs I put in.
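
The reported Nodes/second figure can be cross-checked from the other two lines of the summary. A small sketch, assuming the three-line summary format shown above and integer division (the helper names are made up for illustration):

```python
SUMMARY = """\
Total time (ms) : 299493
Nodes searched : 19609843392
Nodes/second : 65476800
"""


def parse_bench_summary(text):
    """Collect the 'name : integer' lines of a bench summary into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.rpartition(":")
        if sep and value.strip().isdigit():
            fields[key.strip()] = int(value)
    return fields


def recomputed_nps(fields):
    """Nodes per second derived from the node count and elapsed milliseconds."""
    return fields["Nodes searched"] * 1000 // fields["Total time (ms)"]


fields = parse_bench_summary(SUMMARY)
print("reported  :", fields["Nodes/second"])
print("recomputed:", recomputed_nps(fields))
# Compare against the 237 Mn/s result this thread is named after.
print(f"share of 237 Mn/s: {recomputed_nps(fields) / 237_000_000:.0%}")
```

The recomputed value matches the reported one, so the summary is internally consistent and the shortfall is in the run itself.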
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by Zenmastur »

yorkman wrote: Thu Apr 09, 2020 3:32 am Thanks, but what version is that exactly? Just stockfish 9? So no dev version? Can one still actually download SF9?

Here's what I just got in CentOS 8.1 with SMT/HT enabled and 8 dimms populated with 2166MHz ram (8 GB each):

Total time (ms) : 299493
Nodes searched : 19609843392
Nodes/second : 65476800

That's pathetic! Seems like I was getting better performance the fewer DIMMs I put in.
It would help if we had a little more information. Like:

How many cores are actually being used by stockfish? What clock speed are the cores running while running the benchmark?
What is the CPU temperature before, during, and after the benchmark?

Why are you using DDR4-2166 memory? This will slow the system by a huge amount, especially with so many cores. 2166 is VERY high-latency, low-bandwidth memory. Not good for such a fast system.

What about NUMA settings of the OS? Are you using a NUMA aware version of SF? ETC. ETC.
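
On Linux, the core count and NUMA layout asked about above can be read without extra tools. A rough sketch, assuming a Linux kernel that exposes NUMA nodes under /sys/devices/system/node (the function names are illustrative, and on other systems the node list simply comes back empty):

```python
import glob
import os


def parse_cpulist(cpulist):
    """Expand a kernel CPU list such as '0-63,128-191' into CPU ids."""
    cpus = []
    for part in cpulist.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.extend(range(int(first), int(last or first) + 1))
    return cpus


def numa_nodes():
    """Map each NUMA node name to the logical CPUs the kernel assigns to it."""
    nodes = {}
    for path in sorted(glob.glob("/sys/devices/system/node/node[0-9]*/cpulist")):
        node = os.path.basename(os.path.dirname(path))
        with open(path) as fh:
            nodes[node] = parse_cpulist(fh.read())
    return nodes


print("logical CPUs seen by the OS:", os.cpu_count())
for node, cpus in numa_nodes().items():
    print(f"{node}: {len(cpus)} logical CPUs")
```

If the number of logical CPUs or nodes printed here is lower than expected for two 64-core processors, the engine cannot use the missing cores regardless of its Threads setting.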
yorkman
Posts: 105
Joined: Thu Jul 27, 2017 10:59 pm

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by yorkman »

Here are my results with SF9, the same as in the benchmark from this thread:

./stockfish-9-popcnt bench 1024 256 26

Total time (ms) : 519221
Nodes searched : 45414888487
Nodes/second : 87467356

Look at the total time! This benchmark took forever to finish on the latest CentOS 8.1. In Windows, temps were fine and the clock speed was also good at 3.20 GHz. Win'10 Ent reported 4 logical processors with HT enabled. I have two physical processors @ 64 cores each. I'm now using Linux but will probably go back to Win'10 Ent, since I now know the problem persists in Linux too. Strangely, I did get 160,000 kN/s to 170,000 kN/s a few times in Linux, but for some reason not anymore. It may be because I went from 4 to 8 DIMMs (8x8=64GB).

@zenmastur: That's the only ram I have at the moment. I wanted to buy 16 * 3200MHz but it's very hard to get that right now with the covid19 virus. And the prices are ridiculous too, especially since the cad$ dropped a lot so for the time being I have to wait and test with what I have.

I have tried the very latest dev version of SF, which I believe is NUMA aware, at least with processor groups, but I don't know if it can support 4 NUMA nodes.

So you're saying it's most likely the 2166MHz ram that's killing all the speed on this system? I'm starting to think the same as there's not much else for me left to try.
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by MikeB »

yorkman wrote: Thu Apr 09, 2020 4:37 am Here are my results with SF9, the same as in the benchmark from this thread:

./stockfish-9-popcnt bench 1024 256 26

Total time (ms) : 519221
Nodes searched : 45414888487
Nodes/second : 87467356

Look at the total time! This benchmark took forever to finish on the latest CentOS 8.1. In Windows, temps were fine and the clock speed was also good at 3.20 GHz. Win'10 Ent reported 4 logical processors with HT enabled. I have two physical processors @ 64 cores each. I'm now using Linux but will probably go back to Win'10 Ent, since I now know the problem persists in Linux too. Strangely, I did get 160,000 kN/s to 170,000 kN/s a few times in Linux, but for some reason not anymore. It may be because I went from 4 to 8 DIMMs (8x8=64GB).

@zenmastur: That's the only ram I have at the moment. I wanted to buy 16 * 3200MHz but it's very hard to get that right now with the covid19 virus. And the prices are ridiculous too, especially since the cad$ dropped a lot so for the time being I have to wait and test with what I have.

I have tried the very latest dev version of SF, which I believe is NUMA aware, at least with processor groups, but I don't know if it can support 4 NUMA nodes.

So you're saying it's most likely the 2166MHz ram that's killing all the speed on this system? I'm starting to think the same as there's not much else for me left to try.
This is the authoritative page for chess benchmarks on high end systems

http://ipmanchess.yolasite.com/amd---in ... -bench.php

You must use asmfish from 05-22-2017 for a real comparison with the benchmarks posted there.
Download link from Ipman's site:
https://mega.nz/#!aQB3nSAY!Js7t0ws5U_K1 ... w83ObNeGew

If you prefer a Dropbox link, I just created this one with the same file here:

https://www.dropbox.com/s/x2jgwajvpxuxv ... 22.7z?dl=0
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by MikeB »

using 05-22-2017 asmfish
all runs are self-throttled to 79C to keep the noise down to a hum (setup is in a loft area overlooking the family room)

no large pages

asmFishW_2017-05-22_popcnt
bench 1024 64 26
*** bench hash 1024 threads 64 depth 26 realtime 0 ***
info string hash set to 1024 MB no large pages
info string node 0 has threads 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
1: nodes: 1003225620 87886 knps
2: nodes: 1096399452 97069 knps
3: nodes: 35870504 121594 knps
4: nodes: 791892441 104650 knps
5: nodes: 307372089 96324 knps
6: nodes: 212604583 94743 knps
7: nodes: 189352370 99449 knps
8: nodes: 1954239961 100926 knps
9: nodes: 2569479008 94947 knps
10: nodes: 114132586 105288 knps
11: nodes: 1022255567 94417 knps
12: nodes: 389192971 92687 knps
13: nodes: 96630339 105261 knps
14: nodes: 998514339 95096 knps
15: nodes: 269623603 104101 knps
16: nodes: 109903450 120113 knps
17: nodes: 168598022 134341 knps
18: nodes: 74156460 114971 knps
19: nodes: 94693139 131884 knps
20: nodes: 397690020 139540 knps
21: nodes: 31600232 114909 knps
22: nodes: 90970272 124616 knps
23: nodes: 200133929 122706 knps
24: nodes: 63196571 117030 knps
25: nodes: 174984684 145577 knps
26: nodes: 11217292 119332 knps
27: nodes: 29488451 115640 knps
28: nodes: 482959643 116656 knps
29: nodes: 133640074 106741 knps
30: nodes: 16238802 119402 knps
31: nodes: 10748560 109679 knps
32: nodes: 7246441 95347 knps
33: nodes: 4534629 85559 knps
34: nodes: 143937603 122604 knps
35: nodes: 15132829 108869 knps
36: nodes: 3564563 99015 knps
37: nodes: 5623887 119657 knps
===========================
Total time (ms) : 132715
Nodes searched : 13321044986
Nodes/second : 100373318

with Large pages

setoption name LargePages value true
bench 1024 64 26
*** bench hash 1024 threads 64 depth 26 realtime 0 ***
info string hash set to 1024 MB page size 2048 KB // when large pages are set to true. it auto changes hash to 2048
1: nodes: 1091256707 98143 knps
2: nodes: 1336298761 110960 knps
3: nodes: 36393354 143847 knps
4: nodes: 311433662 119644 knps
5: nodes: 410798768 108418 knps
6: nodes: 260990534 106788 knps
7: nodes: 370311885 110905 knps
8: nodes: 1797622527 114221 knps
9: nodes: 826412569 107858 knps
10: nodes: 254806732 120135 knps
11: nodes: 1378518364 107319 knps
12: nodes: 1970185044 101676 knps
13: nodes: 127409287 117211 knps
14: nodes: 1478631165 106644 knps
15: nodes: 318547015 118374 knps
16: nodes: 135510626 140425 knps
17: nodes: 271118209 152742 knps
18: nodes: 91135932 133630 knps
19: nodes: 85629190 142241 knps
20: nodes: 159601037 156012 knps
21: nodes: 31548333 124206 knps
22: nodes: 60819206 128854 knps
23: nodes: 511962919 132051 knps
24: nodes: 209818923 137857 knps
25: nodes: 29740039 144369 knps
26: nodes: 8526767 133230 knps
27: nodes: 27121861 130393 knps
28: nodes: 519147154 136366 knps
29: nodes: 82783549 124113 knps
30: nodes: 31685358 134259 knps
31: nodes: 21511043 122221 knps
32: nodes: 9874695 108513 knps
33: nodes: 3589499 87548 knps
34: nodes: 44833000 128830 knps
35: nodes: 16226559 121093 knps
36: nodes: 4343029 108575 knps
37: nodes: 4813085 130083 knps
===========================
Total time (ms) : 128203
Nodes searched : 14330956387
Nodes/second : 111783315
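
To put a number on the difference between the two runs above, here is a quick sketch comparing their Nodes/second totals. The values are copied from the two summaries, and the function name is only illustrative. The runs searched different node counts, so this compares rates rather than identical workloads:

```python
NO_LARGE_PAGES_NPS = 100_373_318  # Nodes/second from the run without large pages
LARGE_PAGES_NPS = 111_783_315     # Nodes/second from the run with large pages


def speedup_percent(before, after):
    """Relative gain of `after` over `before`, in percent."""
    return (after - before) / before * 100


gain = speedup_percent(NO_LARGE_PAGES_NPS, LARGE_PAGES_NPS)
print(f"large pages gain: {gain:.1f}%")
```

For these two runs the gain comes out at a little over 11%.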

My fastest run was at 121M nps, but it was cooking at 95C (CPU/MB limit) with all fans going full blast. Not very comfortable from either a heat or a noise perspective, plus the additional heat will shorten the lives of the CPUs.

From literally running hundreds of benches for stability testing and tuning, 95% of the benchmarks for this version of asmfish will fall between the NPS for position #2 and position #8.
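
That rule of thumb can be checked against the two runs posted above. A short sketch, with the position #2 and #8 speeds and the final totals copied from those outputs (the dictionary layout and function name are just for illustration):

```python
RUNS = {
    "no large pages": {"pos2_knps": 97_069, "pos8_knps": 100_926, "total_nps": 100_373_318},
    "large pages": {"pos2_knps": 110_960, "pos8_knps": 114_221, "total_nps": 111_783_315},
}


def within_rule_of_thumb(run):
    """True if the overall speed lies between the position #2 and #8 speeds."""
    low, high = sorted((run["pos2_knps"], run["pos8_knps"]))
    return low <= run["total_nps"] / 1000 <= high


for name, run in RUNS.items():
    print(f"{name}: total within #2..#8 range -> {within_rule_of_thumb(run)}")
```

Both runs land inside the range, which makes positions #2 and #8 a quick sanity check before a full bench finishes.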

Easiest way to tune an AMD Threadripper CPU on Windows is to use the Ryzen Master app and manually set the DRAM timings to spec or a little faster.

Enable large page Support - Windows 10 Pro:

From the Start menu, open Local Security Policy (under Administrative Tools).
Under Local Policies\User Rights Assignment, double click the Lock Pages in Memory setting.
Click Add User or Group and type your Windows user name.
Restart your computer
