Nodes/sec. with last new CPU's!

jpqy · Post by **jpqy** » Mon Aug 28, 2017 6:58 pm

Milos is joking..right?!

JP.

Dann Corbit · Post by **Dann Corbit** » Mon Aug 28, 2017 9:17 pm

Vinvin wrote:
Waschbaer wrote:Interesting

I tried to do the test on my system, a Ryzen 7 1800X at 3.95 GHz with 16 threads and 64 GByte RAM, so the RAM speed was lower than on a system with only 2 RAM slots used.

ASMfish for download: I only found the version 2015-5-15 popcnt.

Result:

Total time (ms) : 186547
Nodes searched : 4857078580
Nodes/second : 26036755
You can find more recent versions of ASMfish here : https://github.com/lantonov/asmFish/tre ... es/Windows
asmFishW_2017-08-25_base.exe 	
asmFishW_2017-08-25_bmi2.exe 	
asmFishW_2017-08-25_popcnt.exe

The benchmark is supposed to use that particular version (2015-5-15).

zullil · Post by **zullil** » Mon Aug 28, 2017 10:00 pm

jpqy wrote:Milos is joking..right?!

JP.

224 Threads ---- how are these distributed when asmFish searches using its Lazy SMP? I'm too busy/lazy to study the code. What depths are these various threads looking at? What size hash would be needed to get real benefit from so many threads?

Houdini · Post by **Houdini** » Tue Aug 29, 2017 12:59 am

jpqy wrote:Milos is joking..right?!

JP.

Assuming a factor of 11 in node speed (275 MN/sec vs 25 MN/sec), and a factor of 14 in number of cores (112 vs 8 cores or 224 vs 16 threads), the Ryzen 1700 will need a time advantage of about 6 to 1 to draw level.

Milos' claim is not that far off.

Milos · Post by **Milos** » Tue Aug 29, 2017 1:21 am

Houdini wrote:
jpqy wrote:Milos is joking..right?!

JP.
Assuming a factor of 11 in node speed (275 MN/sec vs 25 MN/sec), and a factor of 14 in number of cores (112 vs 8 cores or 224 vs 16 threads), the Ryzen 1700 will need a time advantage of about 6 to 1 to draw level.

Milos' claim is not that far off.

SF's LazySMP has a parallelization efficiency of 95.5% when using up to 16 cores. With more cores it drops, and when using NUMA it drops even further.
112 cores with 275MN/sec is 2.5MN/sec per core assuming 99% linear nps scaling.
8 cores with 25MN/sec is 3.2MN/sec per core. That is 28% extra speed per core for Ryzen 7 compared to Intel.
Ryzen 7 equivalent speed up of 8 cores (assuming 95.5% parallelization efficiency) according to Amdahl's Law is 6.1 times.
Intel equivalent speed up of 112 cores (assuming 95.5% parallelization efficiency which is science fiction for so many cores and especially NUMA) is 18.7.
18.7/(6.1*1.28) = 2.4.
So Ryzen 7 on 8 cores would only need time odds of 2.4 to 1 to most certainly beat Intel on 112 cores using SF with the current LazySMP implementation.
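
For anyone who wants to re-run the arithmetic above, here is a minimal Python sketch. It assumes, as the post appears to, that the 95.5% figure is used as the parallel fraction in Amdahl's Law and that the rounded per-core speeds (3.2 and 2.5 MN/sec) are taken as given. The function and variable names are only illustrative.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's Law: speedup on `cores` cores when `parallel_fraction`
    of the work can run in parallel and the rest stays serial."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


P = 0.955  # parallel fraction assumed in the post

ryzen_speedup = amdahl_speedup(P, 8)    # Ryzen 7, 8 cores
xeon_speedup = amdahl_speedup(P, 112)   # 4x Xeon, 112 cores

# Per-core speed advantage of the Ryzen, using the post's rounded figures.
per_core_ratio = 3.2 / 2.5

# Time odds the Ryzen would need to match the Xeon box under these assumptions.
time_odds = xeon_speedup / (ryzen_speedup * per_core_ratio)

print(f"Ryzen speedup: {ryzen_speedup:.1f}")
print(f"Xeon speedup:  {xeon_speedup:.1f}")
print(f"Time odds:     {time_odds:.1f}")
```

The result is only as good as the 95.5% assumption. A slightly different parallel fraction changes the 112-core speedup a lot, since the serial part dominates at high core counts.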

Dann Corbit · Post by **Dann Corbit** » Tue Aug 29, 2017 2:15 am

Measured: 277481942 NPS for 4x Intel Xeon Platinum 8180 (4x 28c/56t) using 224 threads

Measured: 25358721 NPS for AMD Ryzen R7 1700 @ 3.8 GHz using 16 threads

277481942 / 25358721 = 10.94 times as many nodes per second (measured, not theoretical).

Now, what that means as to Elo is anyone's guess.
I guess it's a lot (at least 100 Elo and maybe more).

If you want the Ryzen to see as many nodes you will have to let it run eleven times as long as that top end Xeon. That is simple math. If there is SMP loss for these chess machines, it is already factored in, because this is the measured speed when actually performing a variety of searches.

Now, I am not going to spend $40,000 for the CPUs alone just to analyze chess positions. But if you have $100,000 burning a hole in your pocket and you want to analyze chess positions faster than anyone else, the 4x Xeon Platinum 8180 seems pretty fast. But I guess for the same money you can get about 20 complete Ryzen 7 machines, and ten if you get really fancy ones.

I don't think that there are any claims of superior nodes per dollar for the high end machines.

Dann Corbit · Post by **Dann Corbit** » Tue Aug 29, 2017 2:42 am

zullil wrote:
jpqy wrote:Milos is joking..right?!

JP.
224 Threads ---- how are these distributed when asmFish searches using its Lazy SMP? I'm too busy/lazy to study the code. What depths are these various threads looking at? What size hash would be needed to get real benefit from so many threads?

His printout shows the thread distribution. There are 4 NUMA nodes with threads as described:

asmFishL_2017-05-22_bmi2
*** bench hash 1024 threads 224 depth 26 realtime 0 ***
info string hash set to 1024 MB no large pages
info string node 0 has threads 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 112 116 120 124 128 132 136 140 144 148 152 156 160 164 168 172 176 180 184 188 192 196 200 204 208 212 216 220
info string node 1 has threads 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 113 117 121 125 129 133 137 141 145 149 153 157 161 165 169 173 177 181 185 189 193 197 201 205 209 213 217 221
info string node 2 has threads 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 114 118 122 126 130 134 138 142 146 150 154 158 162 166 170 174 178 182 186 190 194 198 202 206 210 214 218 222
info string node 3 has threads 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 115 119 123 127 131 135 139 143 147 151 155 159 163 167 171 175 179 183 187 191 195 199 203 207 211 215 219 223
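
The thread lists in that printout follow a regular pattern, which the short Python sketch below reproduces. The formula is only read off the four lines above and is not taken from the asmFish source; the function name is made up for illustration.

```python
NODES = 4            # NUMA nodes in the printout
BLOCK = 28           # consecutive low-numbered threads per node
TOTAL_THREADS = 224


def threads_on_node(node: int) -> list[int]:
    """Thread IDs listed for one NUMA node, following the pattern seen in
    the printout: a block of 28 consecutive IDs, then every 4th ID from
    112 + node upward."""
    low = list(range(node * BLOCK, (node + 1) * BLOCK))
    high = list(range(NODES * BLOCK + node, TOTAL_THREADS, NODES))
    return low + high


layout = {node: threads_on_node(node) for node in range(NODES)}

for node, threads in layout.items():
    print(f"node {node}: {len(threads)} threads, "
          f"first {threads[0]}, last {threads[-1]}")

# Every thread ID from 0 to 223 should appear exactly once.
all_threads = sorted(t for threads in layout.values() for t in threads)
print(all_threads == list(range(TOTAL_THREADS)))
```

So each node carries 56 of the 224 threads. This says nothing about the depths the threads search, only where they are placed.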

Dann Corbit · Post by **Dann Corbit** » Tue Aug 29, 2017 2:46 am

Dann Corbit wrote:
zullil wrote:
jpqy wrote:Milos is joking..right?!

JP.
224 Threads ---- how are these distributed when asmFish searches using its Lazy SMP? I'm too busy/lazy to study the code. What depths are these various threads looking at? What size hash would be needed to get real benefit from so many threads?
His printout shows the thread distribution. There are 4 NUMA nodes with threads as described:

asmFishL_2017-05-22_bmi2
*** bench hash 1024 threads 224 depth 26 realtime 0 ***
info string hash set to 1024 MB no large pages
info string node 0 has threads 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 112 116 120 124 128 132 136 140 144 148 152 156 160 164 168 172 176 180 184 188 192 196 200 204 208 212 216 220
info string node 1 has threads 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 113 117 121 125 129 133 137 141 145 149 153 157 161 165 169 173 177 181 185 189 193 197 201 205 209 213 217 221
info string node 2 has threads 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 114 118 122 126 130 134 138 142 146 150 154 158 162 166 170 174 178 182 186 190 194 198 202 206 210 214 218 222
info string node 3 has threads 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 115 119 123 127 131 135 139 143 147 151 155 159 163 167 171 175 179 183 187 191 195 199 203 207 211 215 219 223

As for RAM, the test is run as follows:
asmFishW_2017-05-22
bench 1024 <threadcount> 26

So 1 GB RAM is used, no matter what the machine has.
The analysis depth is set to 26 plies.

Since it is a shallow search, big RAM size is not crucial.

But I have seen that with just 64 threads at 2.3 GHz, my hash table fills really fast.

I guess with the giant piles of threads from those Xeons, a large pool of really fast RAM would be a big benefit, especially for tournament time control.

Houdini · Post by **Houdini** » Tue Aug 29, 2017 3:03 am

Dann Corbit wrote:If there is SMP loss for these chess machines, it is already factored in, because this is the measured speed when actually performing a variety of searches.

No, that is not the case. The SMP (in)efficiency is not factored in the node speed.
An engine with 1 thread producing 10 MN/s is about as strong as with 8 threads producing 15 MN/s combined.

Generally speaking, experience with current engines suggests an effective speed-up of about 1.75 for each doubling of the number of threads (at constant node speed per thread).
Using this formula, the effective strength of 275 MN/s with 112 cores will be close to 160 MN/s with 8 cores.
Inasmuch as the Ryzen 1700 produces about 25 MN/s, it requires 6x more time to achieve this strength. Which still is amazing, considering it's a €300 CPU.
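
The rule of thumb above is easy to check numerically. The Python sketch below assumes the 1.75 factor applies to fractional doublings as well (112 vs 8 cores is about 3.8 doublings); the function name is hypothetical and the inputs are the rounded figures from the post.

```python
import math


def effective_speedup(thread_ratio: float, per_doubling: float = 1.75) -> float:
    """Effective speed-up from multiplying the thread count by `thread_ratio`,
    assuming a gain of `per_doubling` for each doubling of threads."""
    return per_doubling ** math.log2(thread_ratio)


# 8 threads at 15 MN/s combined, expressed as an equivalent 1-thread speed.
single_thread_equiv = 15 * effective_speedup(8) / 8

# 112 cores at 275 MN/s, expressed as an equivalent 8-core speed.
core_ratio = 112 / 8
eff = effective_speedup(core_ratio)
equiv_8core_mnps = 275 * eff / core_ratio

# Time factor the Ryzen 1700 (about 25 MN/s on 8 cores) needs to match that.
time_factor = equiv_8core_mnps / 25

print(f"8 threads at 15 MN/s ~ 1 thread at {single_thread_equiv:.1f} MN/s")
print(f"effective speed-up for 14x the cores: {eff:.1f}")
print(f"8-core equivalent of the Xeon box: {equiv_8core_mnps:.0f} MN/s")
print(f"time factor for the Ryzen: {time_factor:.1f}")
```

With these inputs the 8-core equivalent comes out in the 160 to 170 MN/s range and the time factor between 6 and 7, in line with the estimate above.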

Dann Corbit · Post by **Dann Corbit** » Tue Aug 29, 2017 3:25 am

Houdini wrote:
Dann Corbit wrote:If there is SMP loss for these chess machines, it is already factored in, because this is the measured speed when actually performing a variety of searches.
No, that is not the case. The SMP (in)efficiency is not factored in the node speed.
An engine with 1 thread producing 10 MN/s is about as strong as with 8 threads producing 15 MN/s combined.

Generally speaking, experience with current engines suggests an effective speed-up of about 1.75 for each doubling of the number of threads (at constant node speed per thread).
Using this formula, the effective strength of 275 MN/s with 112 cores will be close to 160 MN/s with 8 cores.
Inasmuch as the Ryzen 1700 produces about 25 MN/s, it requires 6x more time to achieve this strength. Which still is amazing, considering it's a €300 CPU.

I suppose that is reasonable.

I guess that engines with this sort of thread count have never been measured for Elo. So it might even be worse than that.

Of course, it is also possible that with a super high thread count there are beneficial effects also (e.g. a thread may stumble upon a good move early) but I suspect that would just even out in the long run.

I suppose to me the most interesting thing is that the NPS of Deeper Blue has been equaled for the first time. 480 hardware chess processors have met their NPS match, since the average throughput of Deeper Blue was said to be about 200M NPS.
