237 Mn/s for Stockfish on an 2xEPYC 7742

Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by Zenmastur »

yorkman wrote: Thu Apr 09, 2020 4:37 am Here are my results with SF9, the same as in the benchmark from this thread:

./stockfish-9-popcnt bench 1024 256 26

Total time (ms) : 519221
Nodes searched : 45414888487
Nodes/second : 87467356

Look at the Total time! Took forever to finish this benchmark in latest CentOS 8.1. Temps were fine in Windows and speed, also in Windows, was also good at 3.20 GHz. Win'10 Ent reported 4 logical processors with HT enabled. I have two physical processors @ 64 cores each. I'm now using Linux but will probably go back to Win'10 Ent. since I now know the problem persists in Linux too. Strangely, I did get 160,000 kN/s to 170,000 kN/s a few times in Linux but for some reason not anymore. It may be because I went from 4 to 8 dimms (8x8=64GB).

@zenmastur: That's the only ram I have at the moment. I wanted to buy 16 * 3200MHz but it's very hard to get that right now with the covid19 virus. And the prices are ridiculous too, especially since the cad$ dropped a lot so for the time being I have to wait and test with what I have.
You need to use a utility that lets you see the individual temperature, clock speed, and CPU load of each physical core in real time while the benchmark is running. Otherwise you are only guessing at what's going on. I just looked at 512GB (8 x 64GB) of ECC DDR4-3200 RDIMMs for < $3200. It wasn't hard to find, and considering that one Epyc 7742 costs over $7500, I wouldn't call it expensive. Even if you put 2TB of RAM in your system it's still less than the cost of the CPUs.

I would try some simple tests before I got my panties in a bunch. For example, what is the speed in Mnps of a single core running SF on your system? Then you can judge whether this is what you should expect, given the CPU clock speed, compared to other Ryzen systems. If it's NOT a "reasonable" number then you have a starting point. If it is a "reasonable" number then double the core count and repeat.
It should scale until 64 cores are reached. You do need to make sure all cores are on the same NUMA node for the scaling to be "good". Once you go over the 64-core limit (or any time the cores are split between two NUMA nodes) you should see the scaling change. If it's a NUMA issue you shouldn't see much scaling loss UNTIL you get cores on more than one NUMA node. If it's a memory latency issue you should see LOW NPS on a single core. If it's a bandwidth issue you should see it scale poorly as soon as the memory is near saturation. If you test the system with an app like Aida64 you can see both the latency and max memory bandwidth, so you should be able to guess, based on NPS per core, when the memory bus should saturate.
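
The doubling test described above can be tabulated with a short script. This is only a minimal sketch, not something from the original post: `scaling_efficiency` is a hypothetical helper, and the figures in the usage example are placeholders, not measurements from any EPYC system. It takes the measured NPS for each thread count and reports how close each result is to perfect linear scaling from the single-thread baseline.

```python
def scaling_efficiency(nps_by_threads):
    """Return {threads: efficiency} relative to perfect linear scaling.

    nps_by_threads maps a thread count to the measured nodes/second.
    An efficiency of 1.0 means NPS grew exactly in proportion to threads.
    """
    if 1 not in nps_by_threads:
        raise ValueError("need a single-thread measurement as the baseline")
    base = nps_by_threads[1]
    return {
        threads: nps / (base * threads)
        for threads, nps in sorted(nps_by_threads.items())
    }


if __name__ == "__main__":
    # Placeholder figures for illustration only (NOT measured values).
    measured = {1: 1_000_000, 2: 1_900_000, 4: 3_600_000}
    for threads, eff in scaling_efficiency(measured).items():
        print(f"{threads:4d} threads: {eff:.0%} of linear")
```

A sharp drop in efficiency at the point where threads first spill onto a second NUMA node would point toward a NUMA problem, while a low single-thread figure would point toward latency.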

Regards,

Zenmastur
yorkman
Posts: 105
Joined: Thu Jul 27, 2017 10:59 pm

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by yorkman »

Yes, of course I used a utility to see the individual temps/clock speeds, and each node is at 2800 MHz with the turbo speed boost.

As for $3200 for that ram...I spent all my savings on the CPUs and the rest was supposed to be for ram, 16x16GB (256 GB) of ECC DDR4-3200, but that's on hold now since it's hard to find compatible ram in stock for this motherboard (as per Supermicro). I was supposed to buy it for under $2000 but it's a lot more than that now, so I'm just waiting for the dollar to go back up and prices to relax. Just because I spent a lot of money on the CPUs doesn't mean I can spend so much on ram. And I don't need more than 256 GB, as even that is overkill right now. I just want to fill all the slots so that I get eight-channel memory bandwidth.

Anyway, I went back to Win'10 Enterprise. I added the other 4 dimms of ram I had to give me 8 total and 4 channel memory at 2166MHz. Now here's the strange thing. In Aquarium I'll analyze various positions and usually get about 170-180,000 kN/s with my latest compiled SF with LP added. But when I just run the exe directly and then bench 1024 256 26 I get the kind of results similar to before or even worse since I now get about 58,000 kN/s. But again, in Aquarium it's reasonable with 180,000 kN/s.

And yes I've already tried with asmFishW-2017-05-22_popcnt before but still never got anywhere near 230,000+ kN/s (I think it was about 150,000 kN/s that time).

I've just run asmF again and got these results:

asmFishW_2017-05-22_popcnt
bench 1024 256 26
*** bench hash 1024 threads 256 depth 26 realtime 0 ***
info string hash set to 1024 MB no large pages
info string node 0 has threads 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 128 132 136 140 144 148 152 156 160 164 168 172 176 180 184 188 192 196 200 204 208 212 216 220 224 228 232 236 240 244 248 252
info string node 1 has threads 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 129 133 137 141 145 149 153 157 161 165 169 173 177 181 185 189 193 197 201 205 209 213 217 221 225 229 233 237 241 245 249 253
info string node 2 has threads 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 130 134 138 142 146 150 154 158 162 166 170 174 178 182 186 190 194 198 202 206 210 214 218 222 226 230 234 238 242 246 250 254
info string node 3 has threads 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 131 135 139 143 147 151 155 159 163 167 171 175 179 183 187 191 195 199 203 207 211 215 219 223 227 231 235 239 243 247 251 255
1: nodes: 1972886290 111437 knps
2: nodes: 2709824062 119396 knps
3: nodes: 63539254 126320 knps
4: nodes: 612344821 125686 knps
5: nodes: 1042227709 122083 knps
6: nodes: 1477667819 123118 knps
7: nodes: 535811661 126729 knps
8: nodes: 3655829721 126167 knps
9: nodes: 3361383814 128169 knps
10: nodes: 556747777 123092 knps
11: nodes: 2087273792 124006 knps
12: nodes: 1282020288 128381 knps
13: nodes: 498715834 127190 knps
14: nodes: 1709616294 126404 knps
15: nodes: 500842403 118290 knps
16: nodes: 557466743 129522 knps
17: nodes: 14129884 129631 knps
18: nodes: 430846523 121228 knps
19: nodes: 92630307 125515 knps
20: nodes: 1034693897 143091 knps
21: nodes: 70462283 139253 knps
22: nodes: 118615556 150718 knps
23: nodes: 1569323296 158118 knps
24: nodes: 580988423 116244 knps
25: nodes: 830718952 136698 knps
26: nodes: 15622081 127008 knps
27: nodes: 71965765 118755 knps
28: nodes: 663974872 122595 knps
29: nodes: 181775692 115413 knps
30: nodes: 66616818 141737 knps
31: nodes: 41147374 123195 knps
32: nodes: 20817340 124654 knps
33: nodes: 7899888 112855 knps
34: nodes: 18671876 123654 knps
35: nodes: 52207733 115248 knps
36: nodes: 20863141 132045 knps
37: nodes: 11367921 114827 knps
===========================
Total time (ms) : 226616
Nodes searched : 28539537904
Nodes/second : 125937876
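
As a sanity check on the summary block above, the Nodes/second line is just the node count divided by the elapsed time. A minimal sketch, assuming the engine truncates to a whole number (`bench_nps` is a hypothetical helper name, not part of asmFish or Stockfish):

```python
def bench_nps(nodes_searched, total_time_ms):
    """Nodes per second from a bench summary, truncated to an integer."""
    return nodes_searched * 1000 // total_time_ms


# Figures copied from the summary above.
print(bench_nps(28539537904, 226616))  # 125937876
```

The result matches the reported figure, so the summary is an average over the whole run and can sit well below or above the per-position knps values.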

One thing I noticed compared to SF is that with asmF I get 3.20 GHz per core whereas with SF it's 2.80 GHz per core. Temps during this time are 68C per cpu.

Also thanks for the ipman links but I'm well aware of the data on his site and I saw what others got with the same cpu's.

Thanks for your results. I don't know what system that is but looks like a single 64 core and those results are really good compared to what I'm getting with dual 7742's.

Now I restart the engine and enable LP:

asmFishW_2017-05-22_popcnt
setoption name Largepages value True
bench 1024 256 26
*** bench hash 1024 threads 256 depth 26 realtime 0 ***
info string hash set to 1024 MB no large pages
info string node 0 has threads 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 128 132 136 140 144 148 152 156 160 164 168 172 176 180 184 188 192 196 200 204 208 212 216 220 224 228 232 236 240 244 248 252
info string node 1 has threads 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 129 133 137 141 145 149 153 157 161 165 169 173 177 181 185 189 193 197 201 205 209 213 217 221 225 229 233 237 241 245 249 253
info string node 2 has threads 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 130 134 138 142 146 150 154 158 162 166 170 174 178 182 186 190 194 198 202 206 210 214 218 222 226 230 234 238 242 246 250 254
info string node 3 has threads 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 131 135 139 143 147 151 155 159 163 167 171 175 179 183 187 191 195 199 203 207 211 215 219 223 227 231 235 239 243 247 251 255
1: nodes: 2465010980 105604 knps
2: nodes: 6341195709 118230 knps
3: nodes: 91890318 122520 knps
4: nodes: 1902370048 119017 knps
5: nodes: 2719193860 117307 knps
6: nodes: 1905450402 117431 knps
7: nodes: 2200974885 121587 knps
8: nodes: 6854689910 120367 knps
9: nodes: 3087383898 121838 knps
10: nodes: 233557861 115451 knps
11: nodes: 2315765997 120833 knps
12: nodes: 6522700169 122540 knps
13: nodes: 521934165 119792 knps
14: nodes: 1402632944 120016 knps
15: nodes: 1242721036 115763 knps
16: nodes: 919146369 121259 knps
17: nodes: 1028366419 130702 knps
18: nodes: 880360423 117963 knps
19: nodes: 563507584 120743 knps
20: nodes: 2658602498 137494 knps
21: nodes: 313926652 134271 knps
22: nodes: 341391477 123737 knps
23: nodes: 1681469768 130124 knps
24: nodes: 1405606963 108474 knps
25: nodes: 3449024210 133305 knps
26: nodes: 61954075 121004 knps
27: nodes: 73401662 114154 knps
28: nodes: 878346403 116507 knps
29: nodes: 533204593 109062 knps
30: nodes: 1472867579 134434 knps
31: nodes: 209262578 119510 knps
32: nodes: 21889026 112830 knps
33: nodes: 15507158 106945 knps
34: nodes: 127833109 128994 knps
35: nodes: 603310502 135423 knps
36: nodes: 92182208 126798 knps
37: nodes: 22156729 116614 knps
===========================
Total time (ms) : 471458
Nodes searched : 57160790167
Nodes/second : 121242592

Almost no difference with LP. I know I have LP working on my system because when I start my compiled SF it reports 16 Mb Large Pages (and I can also tell by looking at the memory info for the SF process in the Resource Manager):

Stockfish 090420 64 POPCNT by T. Romstad, M. Costalba, J. Kiiski, G. Linscott
info string Hash LargePages 16 Mb
info string Found 0 tablebases

Given these results, I just don't believe that performance would be about 50% slower with the 8 * 2166MHz ram than if I was to get 16 * 3200MHz ram. But I guess I hope I'm wrong because there's not much else that I can try that I haven't already.
yorkman
Posts: 105
Joined: Thu Jul 27, 2017 10:59 pm

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by yorkman »

Actually hang on. I just realized the 2nd results with LP enabled says large pages are NOT enabled. But when I run SF with LP it says it's enabled. I think this is because when I added myself to Lock pages in Memory using gpedit.msc I did it when I renamed my computer name but didn't reboot so it was showing as User1 instead of myhostname\User1. Still strange that SF showed large pages enabled.

Anyway I've removed and readded my account to Lock pages in memory and after rebooting I see LP enabled in asmFish but results were even worse:

asmFishW_2017-05-22_popcnt
setoption name LargePages value true
bench 1024 256 26
*** bench hash 1024 threads 256 depth 26 realtime 0 ***
info string hash set to 1024 MB page size 2048 KB
info string node 0 has threads 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 128 132 136 140 144 148 152 156 160 164 168 172 176 180 184 188 192 196 200 204 208 212 216 220 224 228 232 236 240 244 248 252
info string node 1 has threads 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 129 133 137 141 145 149 153 157 161 165 169 173 177 181 185 189 193 197 201 205 209 213 217 221 225 229 233 237 241 245 249 253
info string node 2 has threads 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 130 134 138 142 146 150 154 158 162 166 170 174 178 182 186 190 194 198 202 206 210 214 218 222 226 230 234 238 242 246 250 254
info string node 3 has threads 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 131 135 139 143 147 151 155 159 163 167 171 175 179 183 187 191 195 199 203 207 211 215 219 223 227 231 235 239 243 247 251 255
1: nodes: 1020764871 81180 knps
2: nodes: 3031337162 87104 knps
3: nodes: 60857250 92347 knps
4: nodes: 347918331 90368 knps
5: nodes: 1325619487 90989 knps
6: nodes: 574189719 89381 knps
7: nodes: 670123401 92418 knps
8: nodes: 4563858259 89261 knps
9: nodes: 2887889333 91530 knps
10: nodes: 366233716 90405 knps
11: nodes: 1822848879 90634 knps
12: nodes: 2050101383 92739 knps
13: nodes: 309141535 90843 knps
14: nodes: 1705888040 91360 knps
15: nodes: 568272345 89717 knps
16: nodes: 186854872 93802 knps
17: nodes: 376218173 96318 knps
18: nodes: 396043512 91932 knps
19: nodes: 109164355 92277 knps
20: nodes: 355344225 101036 knps
21: nodes: 78345338 84061 knps
22: nodes: 189270474 89957 knps
23: nodes: 308527304 89688 knps
24: nodes: 333756596 87508 knps
25: nodes: 393443452 97146 knps
26: nodes: 12199023 89698 knps
27: nodes: 51028695 77433 knps
28: nodes: 341752148 91622 knps
29: nodes: 295609073 84243 knps
30: nodes: 46715564 98348 knps
31: nodes: 23113988 78887 knps
32: nodes: 10000890 80652 knps
33: nodes: 7163625 80490 knps
34: nodes: 22455704 79913 knps
35: nodes: 23231526 77697 knps
36: nodes: 12891544 78606 knps
37: nodes: 16764626 85533 knps
===========================
Total time (ms) : 276687
Nodes searched : 24894938418
Nodes/second : 89975092

and yet I continue to get 180,000 kN/s in many positions using Aquarium although that's with 32GB hash. From the start position Aquarium shows me 170,000 kN/s using SF with LP enabled.
yorkman
Posts: 105
Joined: Thu Jul 27, 2017 10:59 pm

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by yorkman »

Clock speed starts at 1500 but it is 2800MHz when SF bench is running, and 3200MHz when asmFish bench is running.
And with the 8 * 8 GB 2166MHz ram it seems I do get octal memory channel already. I thought I needed all dimm slots populated for that.

Image

and Aida64 Queen test:

Image

BIOS settings:

Image
yorkman
Posts: 105
Joined: Thu Jul 27, 2017 10:59 pm

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by yorkman »

Sorry for the missing images from previous post. These should be working fine:

Image

Image

Image
yorkman
Posts: 105
Joined: Thu Jul 27, 2017 10:59 pm

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by yorkman »

In Aquarium I just tried the asmFish 2017-05-22-popcnt engine and I got 248,000 kN/s with LP and HT enabled in many positions. This is more like it. And I just realized that the bench is only giving me poor results like 94,000 kN/s when I bench with only 1024 MB of hash. When I set hash to 32GB I get the same or better speeds than in Aquarium with the same engine:

asmFishW_2017-05-22_popcnt
setoption name largepages value true
bench 32768 256 26
*** bench hash 32768 threads 256 depth 26 realtime 0 ***
info string hash set to 32768 MB page size 2048 KB
1:      nodes:  1099654648      256210 knps
2:      nodes:  3541528719      281722 knps
3:      nodes:  52490671        293243 knps
4:      nodes:  821501556       296892 knps
5:      nodes:  1108391488      274354 knps
6:      nodes:  931360048       270037 knps
7:      nodes:  639670632       280311 knps
8:      nodes:  3933248779      289379 knps
9:      nodes:  1844261779      274117 knps
10:     nodes:  443375274       300186 knps
11:     nodes:  1943369726      276086 knps
12:     nodes:  3372899847      266569 knps
13:     nodes:  515207197       295416 knps
14:     nodes:  4767216586      275243 knps
15:     nodes:  585526534       294086 knps
16:     nodes:  654540305       339668 knps
17:     nodes:  84681061        320761 knps
18:     nodes:  226750881       284505 knps
19:     nodes:  108675223       275127 knps
20:     nodes:  1349868025      329638 knps
21:     nodes:  68275741        254760 knps
22:     nodes:  133301813       224413 knps
23:     nodes:  513561672       281866 knps
24:     nodes:  736776192       297446 knps
25:     nodes:  3757645         163375 knps
26:     nodes:  23393845        311917 knps
27:     nodes:  48054093        233272 knps
28:     nodes:  583293177       331228 knps
29:     nodes:  369188513       285528 knps
30:     nodes:  83286534        302860 knps
31:     nodes:  20867790        198740 knps
32:     nodes:  15393663        181101 knps
33:     nodes:  8694542         189011 knps
34:     nodes:  24335423        236266 knps
35:     nodes:  14902997        186287 knps
36:     nodes:  13943550        217867 knps
37:     nodes:  14931262        261951 knps
===========================
Total time (ms) : 108936
Nodes searched  : 30700177431
Nodes/second    : 281818475
That's more like it!

I wish I could say the same about latest SF with LP added:

===========================
Total time (ms) : 54464
Nodes searched : 11764651729
Nodes/second : 216007853

Quite a bit worse. NUMA/processor groups bug with this particular system?
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by MikeB »

yorkman wrote: Fri Apr 10, 2020 3:14 am In Aquarium I just tried the asmFish 2017-05-22-popcnt engine and I got 248,000 kN/s with LP and HT enabled in many positions. This is more like it. And I just realized that the bench is only giving me poor results like 94,000 kN/s when I bench with only 1024 MB of hash. When I set hash to 32GB I get the same or better speeds than in Aquarium with the same engine:

{snip}

I wish I could say the same about latest SF with LP added:

===========================
Total time (ms) : 54464
Nodes searched : 11764651729
Nodes/second : 216007853

Quite a bit worse. NUMA/processor groups bug with this particular system?
Run current SF without LP. It should be about 10% slower based on the above; if the two are the same, the LP implementation may not be correct.

Brainfish has the correct LP implementation - so if that runs at 237M+ nps, you're good.
yorkman
Posts: 105
Joined: Thu Jul 27, 2017 10:59 pm

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by yorkman »

Ahh I forgot to test with BrainFish. Thanks.

Yes, on my other dual Xeon E5-2696v3 I'd get about 10-15% more kN/s with LP enabled. On this system using Brainfish, with HT and LP enabled I get 37% more kN/s.

That means I go from 140,000 kN/s to 192,000 kN/s. Still missing about 38,000 kN/s, or 20% though :-(
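
The percentages above come from simple ratios. A minimal sketch of the arithmetic, using only the kN/s figures quoted in this post (`percent_gain` is a hypothetical helper name):

```python
def percent_gain(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


no_lp, with_lp = 140_000, 192_000   # Brainfish kN/s without and with LP
shortfall = 38_000                  # kN/s still missing

print(round(percent_gain(no_lp, with_lp)))   # 37
print(round(shortfall / with_lp * 100))      # 20
```

Note that the 20% here is measured against the current 192,000 kN/s, not against the target speed.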

This also means that asmFishW on my system is a whopping 40% faster than Brainfish with the same configuration. It's too bad asmFish isn't updated anymore or I'd just go with that and not bother the SF team.

In any case, that goes back to NUMA/Processor Groups as the most likely culprit on this system. Can someone please look at this part of the code in SF?
yurikvelo
Posts: 710
Joined: Sat Dec 06, 2014 1:53 pm

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by yurikvelo »

bob wrote:That 5K nodes per second is a REAL restriction. Many were doing 5K nodes per second in the 70's and 80's. And far beyond. Without a GM being produced.

I am certain that the 2700 Elo at 2K nodes per second is a wild exaggeration of reality. Maybe 500K nodes per second, possible. Certainly not 2K.
2K is more than enough; 1K is a better estimate.
Solving positions is not a metric for estimating 2700-Elo playing performance.

The 5K nodes of the 1980s and the 200 MN/sec of DeepBlue were targeted to seek bad moves as hard as possible.
If the evaluation function were tuned to treat a Queen the same as a pawn (I exaggerate just to introduce the general idea), DeepBlue would search 120 billion nodes (in 10 minutes) to find all positions where it can sacrifice a Queen for a pawn.
DeepBlue put all its brute force into maximizing its flawed evaluation function.
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by Dann Corbit »

yorkman wrote: Fri Apr 10, 2020 3:14 am In Aquarium I just tried the asmFish 2017-05-22-popcnt engine and I got 248,000 kN/s with LP and HT enabled in many positions.
{snip}
I wish I could say the same about latest SF with LP added:

===========================
Total time (ms) : 54464
Nodes searched : 11764651729
Nodes/second : 216007853

Quite a bit worse. NUMA/processor groups bug with this particular system?
That assumes that NPS is really important. asmFish is way behind SF in development.
With that lower NPS count, the current SF will still wipe the floor with asmFish.