Nodes/sec. with last new CPU's!

jpqy
Posts: 550
Joined: Thu Apr 24, 2008 9:31 am
Location: Belgium

Re: Nodes/sec. with last new CPU's!

Post by jpqy »

Thanks for this information!

So, if I want to run games and give the AMD Ryzen 1700X 6 min. and this 4x Xeon 8180 1 min., over 100 games, would the Ryzen win?

Or could we get a cluster version of asmFish, buy 6 1700X systems, and have the same chess power as a 4x Xeon 8180 system? Or would I need 11 systems?

JP.
Waschbaer
Posts: 68
Joined: Mon Dec 12, 2011 11:27 pm

Re: Nodes/sec. with last new CPU's!

Post by Waschbaer »

Dann Corbit wrote:
Vinvin wrote:
Waschbaer wrote:Interesting

I tried to run the test on my system: Ryzen 7 1800X at 3.95 GHz, 16 threads, 64 GByte RAM, so the RAM speed was lower than on a system with only 2 RAM slots used.

ASMfish for download: I only found the version 2015-5-15 popcnt.

Result:

Total time (ms) : 186547
Nodes searched : 4857078580
Nodes/second : 26036755
You can find more recent versions of asmFish here: https://github.com/lantonov/asmFish/tre ... es/Windows

asmFishW_2017-08-25_base.exe 	
asmFishW_2017-08-25_bmi2.exe 	
asmFishW_2017-08-25_popcnt.exe
The benchmark is supposed to use that particular version (2015-5-15).
The info linked in post #1 of this thread says:

For this Chess Bench i use asmFishW_2017-05-22
Thomas Lagershausen
Posts: 328
Joined: Mon Jun 11, 2007 6:59 pm

Re: Nodes/sec. with last new CPU's!

Post by Thomas Lagershausen »

Waschbaer wrote:Interesting

I tried to run the test on my system: Ryzen 7 1800X at 3.95 GHz, 16 threads, 64 GByte RAM, so the RAM speed was lower than on a system with only 2 RAM slots used.

ASMfish for download: I only found the version 2015-5-15 popcnt.

Result:

Total time (ms) : 186547
Nodes searched : 4857078580
Nodes/second : 26036755
Can you please tell us which mainboard you use, and whether it was the right choice?
TL
Houdini
Posts: 1471
Joined: Tue Mar 16, 2010 12:00 am

Re: Nodes/sec. with last new CPU's!

Post by Houdini »

jpqy wrote:So, if I want to run games and give the AMD Ryzen 1700X 6 min. and this 4x Xeon 8180 1 min., over 100 games, would the Ryzen win?
The number is nothing more than a reasonable best guess, but it would probably be a close match. Maybe you can give it a try? :)
jpqy wrote:Or could we get a cluster version of asmFish, buy 6 1700X systems, and have the same chess power as a 4x Xeon 8180 system? Or would I need 11 systems?
You would need 10 Ryzen systems to achieve the same effective strength. You would then have 80 cores (160 threads) producing 250 MN/s, which is about as good as 112 cores (224 threads) producing 275 MN/s.
Again, all this is nothing more than reasonable guesswork.
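
The comparison above can be reproduced with a minimal sketch, assuming the rule of thumb Houdini gives elsewhere in this thread (an effective speed-up of about 1.75 per doubling of the thread count at constant node speed per thread). The function name `effective_mnps` is hypothetical, and counting physical cores rather than hyper-threads is an assumption made here to match the figures in the post.

```python
import math

# Rule of thumb from this thread (an estimate, not a measured law):
# each doubling of the thread count is worth about 1.75x, not 2x.
SPEEDUP_PER_DOUBLING = 1.75


def effective_mnps(total_mnps: float, cores: int) -> float:
    """Single-thread-equivalent speed (MN/s) of `cores` producing `total_mnps` combined."""
    doublings = math.log2(cores)
    # Raw nodes double with each doubling, but only 1.75/2 = 0.875 of that is effective.
    return total_mnps * (SPEEDUP_PER_DOUBLING / 2.0) ** doublings


if __name__ == "__main__":
    ryzen_cluster = effective_mnps(250.0, 80)   # 10 Ryzen systems, 80 cores
    xeon_box = effective_mnps(275.0, 112)       # 4x Xeon 8180, 112 cores
    print(f"10x Ryzen: {ryzen_cluster:.1f} MN/s  4x Xeon: {xeon_box:.1f} MN/s")
```

Under these assumptions both systems land at roughly the same single-thread-equivalent speed, which is the sense of "about as good as" in the post. It also treats a cluster like one SMP machine, which is part of the guesswork.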
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Nodes/sec. with last new CPU's!

Post by Dann Corbit »

Waschbaer wrote:
Dann Corbit wrote:
Vinvin wrote:
Waschbaer wrote:Interesting

I tried to run the test on my system: Ryzen 7 1800X at 3.95 GHz, 16 threads, 64 GByte RAM, so the RAM speed was lower than on a system with only 2 RAM slots used.

ASMfish for download: I only found the version 2015-5-15 popcnt.

Result:

Total time (ms) : 186547
Nodes searched : 4857078580
Nodes/second : 26036755
You can find more recent versions of asmFish here: https://github.com/lantonov/asmFish/tre ... es/Windows

asmFishW_2017-08-25_base.exe 	
asmFishW_2017-08-25_bmi2.exe 	
asmFishW_2017-08-25_popcnt.exe
The benchmark is supposed to use that particular version (2015-5-15).
The info linked in post #1 of this thread says:

For this Chess Bench i use asmFishW_2017-05-22
You are right. I stand corrected.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Nodes/sec. with last new CPU's!

Post by Laskos »

Houdini wrote:
Dann Corbit wrote:If there is SMP loss for these chess machines, it is already factored in, because this is the measured speed when actually performing a variety of searches..
No, that is not the case. The SMP (in)efficiency is not factored in the node speed.
An engine with 1 thread producing 10 MN/s is about as strong as with 8 threads producing 15 MN/s combined.

Generally speaking, experience with current engines suggests an effective speed-up of about 1.75 for each doubling of the number of threads (at constant node speed per thread).
Using this formula, the effective strength of 275 MN/s with 112 cores will be close to 160 MN/s with 8 cores.
Inasmuch as the Ryzen 1700 produces about 25 MN/s, it requires 6x more time to achieve this strength. Which still is amazing, considering it's a €300 CPU.
Your formula is naive. Milos is right with his calculation, if efficiency is indeed 0.955. You missed this thread:
http://talkchess.com/forum/viewtopic.php?t=62146
where speed-up was derived according to Amdahl's law as:

speed-up = 1 / (1 - 0.955 + 0.955/n_cores)

It predicted the 8-thread result with high accuracy and small statistical noise. These are probably the most precise experiments on this to date.
An efficiency of 0.955 could go up with a better SMP implementation, but I guess in this current bench, with NUMA, it might be even lower. So, Milos' 2.4 effective time factor needed to compensate for the difference in strength is probably close to reality. These 64+ core monsters are probably a waste strength-wise with engines like Stockfish, Komodo or Houdini (but very useful in testing). Special search algorithms have to be designed, maybe like in Jonny, to gain strength significantly from hundreds of cores.
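
As a minimal sketch, the Amdahl's law formula quoted above can be evaluated for a few core counts. The function name `amdahl_speedup` is hypothetical, and the 0.955 efficiency is the value from this post, not a constant of any engine.

```python
def amdahl_speedup(n_cores: int, efficiency: float = 0.955) -> float:
    """speed-up = 1 / (1 - efficiency + efficiency / n_cores), as quoted in the post."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return 1.0 / ((1.0 - efficiency) + efficiency / n_cores)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 64, 112):
        print(f"{n:4d} cores -> speed-up {amdahl_speedup(n):6.2f}")
    # With the serial share fixed at 1 - 0.955, the speed-up can never exceed this bound.
    print(f"upper bound  -> speed-up {1.0 / (1.0 - 0.955):6.2f}")
```

The bound of about 22 for an efficiency of 0.955 is why, under this formula, very high core counts add little effective speed.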
Vinvin
Posts: 5228
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: Nodes/sec. with last new CPU's!

Post by Vinvin »

Dann Corbit wrote:
zullil wrote:
jpqy wrote:Milos is joking..right?! :D

JP.
224 threads: how are these distributed when asmFish searches using its Lazy SMP? I'm too busy/lazy to study the code. What depths are these various threads looking at? What size hash would be needed to get real benefit from so many threads?
His printout shows the thread distribution. There are 4 NUMA nodes with threads as described:

asmFishL_2017-05-22_bmi2
*** bench hash 1024 threads 224 depth 26 realtime 0 ***
info string hash set to 1024 MB no large pages
info string node 0 has threads 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 112 116 120 124 128 132 136 140 144 148 152 156 160 164 168 172 176 180 184 188 192 196 200 204 208 212 216 220
info string node 1 has threads 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 113 117 121 125 129 133 137 141 145 149 153 157 161 165 169 173 177 181 185 189 193 197 201 205 209 213 217 221
info string node 2 has threads 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 114 118 122 126 130 134 138 142 146 150 154 158 162 166 170 174 178 182 186 190 194 198 202 206 210 214 218 222
info string node 3 has threads 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 115 119 123 127 131 135 139 143 147 151 155 159 163 167 171 175 179 183 187 191 195 199 203 207 211 215 219 223
Which depth does it reach after 3 minutes in the starting position? :-)
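
The thread distribution in the quoted printout can be tallied with a short sketch. Both function names are hypothetical; `printout_line` only rebuilds the four lines from the pattern visible in the output (28 consecutive ids per node, then every fourth id from 112 upward), so the long lines need not be pasted. What each id means in hardware terms is not stated in the thread.

```python
from typing import List, Tuple


def parse_node_line(line: str) -> Tuple[int, List[int]]:
    """Parse 'info string node K has threads a b c ...' into (K, [a, b, c, ...])."""
    words = line.split()
    # Expected layout: info string node <K> has threads <ids...>
    if words[:3] != ["info", "string", "node"] or words[4:6] != ["has", "threads"]:
        raise ValueError(f"unexpected line: {line!r}")
    return int(words[3]), [int(w) for w in words[6:]]


def printout_line(node: int) -> str:
    """Rebuild the printout line for one node (pattern read off the quoted output)."""
    ids = list(range(28 * node, 28 * node + 28)) + list(range(112 + node, 224, 4))
    return f"info string node {node} has threads " + " ".join(map(str, ids))


if __name__ == "__main__":
    seen = set()
    for node in range(4):
        parsed_node, ids = parse_node_line(printout_line(node))
        seen.update(ids)
        print(f"node {parsed_node}: {len(ids)} threads")
    print(f"distinct thread ids: {len(seen)}")
```

Each of the 4 NUMA nodes gets 56 of the 224 threads, with no id assigned twice.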
Houdini
Posts: 1471
Joined: Tue Mar 16, 2010 12:00 am

Re: Nodes/sec. with last new CPU's!

Post by Houdini »

Until someone actually performs real tests with 64 or more threads it's a bit premature to label one formula naive or outdated.
Your adoption of Amdahl's law is as much an extrapolation as the other formula - the data you used only goes to 8 or 16 threads.

I can easily give you some experience with 24 threads that counters your Amdahl formula.

If one plays a match on a 12-core Xeon with 12 threads running against 24 threads, the 24 (hyper-)threads are running at 0.65x the node speed.
Let's put the formula [speed-up = 1 / (1 - 0.955 + 0.955/n_cores)] to the test:
- for 12 threads the formula predicts a speed-up of 8.0.
- for 24 threads the formula predicts a speed-up of 11.8. Running at 0.65x the node speed, the real speed-up is 11.8 x 0.65 = 7.7.

This means that your formula predicts the 24-thread engine to lose. In fact, using Houdini 5, it wins the match by about 10 Elo as mentioned here: http://www.cruxis.com/chess/manual/core ... gement.htm .

If even for 24 threads your formula doesn't give any useful prediction, how can you make any claims about 224 threads?
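
The arithmetic of this test can be checked with a minimal sketch. The name `amdahl_speedup` is hypothetical; 0.955 and 0.65 are the values given in the post. The break-even factor at the end is an extra illustration, not a figure from the thread.

```python
def amdahl_speedup(n_cores: int, efficiency: float = 0.955) -> float:
    """speed-up = 1 / (1 - efficiency + efficiency / n_cores)"""
    return 1.0 / ((1.0 - efficiency) + efficiency / n_cores)


# Node speed of the 24 hyper-threads relative to 12 threads, as stated in the post.
NODE_SPEED_FACTOR = 0.65

if __name__ == "__main__":
    s12 = amdahl_speedup(12)
    s24_raw = amdahl_speedup(24)
    s24_real = s24_raw * NODE_SPEED_FACTOR
    print(f"12 threads: {s12:.1f}")
    print(f"24 threads: {s24_raw:.1f} x {NODE_SPEED_FACTOR} = {s24_real:.1f}")
    # Node-speed factor at which the formula would rate both setups equal.
    print(f"break-even node-speed factor: {s12 / s24_raw:.3f}")
```

Under the formula, 24 hyper-threads would need to run at about 0.68x the node speed to break even, slightly above the 0.65x stated in the post, which is why the formula predicts a small loss.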
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Nodes/sec. with last new CPU's!

Post by zullil »

Vinvin wrote:
Dann Corbit wrote:
zullil wrote:
jpqy wrote:Milos is joking..right?! :D

JP.
224 threads: how are these distributed when asmFish searches using its Lazy SMP? I'm too busy/lazy to study the code. What depths are these various threads looking at? What size hash would be needed to get real benefit from so many threads?
His printout shows the thread distribution. There are 4 NUMA nodes with threads as described:

asmFishL_2017-05-22_bmi2
*** bench hash 1024 threads 224 depth 26 realtime 0 ***
info string hash set to 1024 MB no large pages
info string node 0 has threads 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 112 116 120 124 128 132 136 140 144 148 152 156 160 164 168 172 176 180 184 188 192 196 200 204 208 212 216 220
info string node 1 has threads 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 113 117 121 125 129 133 137 141 145 149 153 157 161 165 169 173 177 181 185 189 193 197 201 205 209 213 217 221
info string node 2 has threads 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 114 118 122 126 130 134 138 142 146 150 154 158 162 166 170 174 178 182 186 190 194 198 202 206 210 214 218 222
info string node 3 has threads 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 115 119 123 127 131 135 139 143 147 151 155 159 163 167 171 175 179 183 187 191 195 199 203 207 211 215 219 223
Which depth does it reach after 3 minutes in the starting position? :-)
Thanks, but that's not answering the question I was trying to ask. My question was unclear and probably not that interesting. I was trying to ask how those threads are used by asmFish's Lazy SMP implementation. Some number of threads are told to search the root to depth n, others to depth n+1, etc. Here n=26, I guess. And all those threads are sharing the same 1 GB hash table. So yes, a lot of nodes will be visited, but many of those nodes will be duplicates visited repeatedly. So it's not at all clear to me that 224 threads with Lazy SMP will gain much Elo. It seems like other SMP algorithms, properly implemented, should be better. But I'm certainly no expert.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Nodes/sec. with last new CPU's!

Post by Laskos »

Houdini wrote:Until someone actually performs real tests with 64 or more threads it's a bit premature to label one formula naive or outdated.
Your adoption of Amdahl's law is as much an extrapolation as the other formula - the data you used only goes to 8 or 16 threads.

I can easily give you some experience with 24 threads that counters your Amdahl formula.

If one plays a match on a 12-core Xeon with 12 threads running against 24 threads, the 24 (hyper-)threads are running at 0.65x the node speed.
Let's put the formula [speed-up = 1 / (1 - 0.955 + 0.955/n_cores)] to the test:
- for 12 threads the formula predicts a speed-up of 8.0.
- for 24 threads the formula predicts a speed-up of 11.8. Running at 0.65x the node speed, the real speed-up is 11.8 x 0.65 = 7.7.

This means that your formula predicts the 24-thread engine to lose. In fact, using Houdini 5, it wins the match by about 10 Elo as mentioned here: http://www.cruxis.com/chess/manual/core ... gement.htm .

If even for 24 threads your formula doesn't give any useful prediction, how can you make any claims about 224 threads?
Your linked result seems to be within the error margins of exactly this Amdahl's law formula with exactly this Stockfish efficiency of 0.955. Maybe just a bit higher. You got a gain of 7 +/- 10 Elo points.
7.7/8.0 is 0.9625, or a 3-4 Elo point loss predicted by Amdahl's law (depending on time control, but I guess on 12 cores the Elo gain from doubling the time cannot be too high even at fast TC). It is also possible that the efficiency of Houdini is a bit higher than that of Stockfish, say 0.960 instead of 0.955. There are many reasons to adopt Amdahl's law, and your claim that going from, say, 32 threads to 64 threads gives the same speed-up of 1.75 as going from 1 thread to 2 seems a bit weird to me. I tested 1 to 2 and 4 threads even on my meager i7.
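
A rough sketch of how a speed ratio like 0.9625 maps to a few Elo points, assuming Elo scales linearly with the base-2 logarithm of effective speed. The Elo-per-doubling values of 60 and 70 are assumptions chosen for illustration, not figures from this thread, and the function name `elo_delta` is hypothetical.

```python
import math


def elo_delta(speed_ratio: float, elo_per_doubling: float) -> float:
    """Elo change for a given effective-speed ratio (assumed linear in log2 of speed)."""
    return elo_per_doubling * math.log2(speed_ratio)


if __name__ == "__main__":
    ratio = 7.7 / 8.0  # 24 hyper-threads vs 12 threads, from the posts above
    for per_doubling in (60.0, 70.0):  # assumed values, depend on time control
        print(f"{per_doubling:.0f} Elo/doubling -> {elo_delta(ratio, per_doubling):+.1f} Elo")
```

With these assumed values the predicted loss falls between 3 and 4 Elo, well inside an error margin of +/- 10.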