Thanks for this information.
So, if I want to run games where I give the AMD Ryzen 1700X 6 min. and this 4x Xeon 8180 1 min., over 100 games, would the Ryzen win?!
Or could we get a cluster version of asmFish, buy 6 1700X systems, and have the same chess power as a 4x Xeon 8180 system?! Or would I need 11 systems?
JP.
Nodes/sec. with last new CPU's!
Moderators: hgm, Rebel, chrisw
-
- Posts: 550
- Joined: Thu Apr 24, 2008 9:31 am
- Location: Belgium
-
- Posts: 68
- Joined: Mon Dec 12, 2011 11:27 pm
Re: Nodes/sec. with last new CPU's!
Waschbaer wrote:
Interesting
I tried to do the test on my system, Ryzen 7 1800 X at 3.95 GHz 16 threads, 64MByte RAM, so the RAM speed was lower than on a system with only 2 RAM slots used.
ASMfish for download: I only found the version 2015-5-15 popcnt.
Result:
Total time (ms) : 186547
Nodes searched : 4857078580
Nodes/second : 26036755
Vinvin wrote:
You can find more recent versions of ASMfish here : https://github.com/lantonov/asmFish/tre ... es/Windows
Dann Corbit wrote:
The benchmark is supposed to use that particular version (2015-5-15).

In post #1 of this thread the linked info says:
asmFishW_2017-08-25_base.exe asmFishW_2017-08-25_bmi2.exe asmFishW_2017-08-25_popcnt.exe
For this Chess Bench i use asmFishW_2017-05-22
-
- Posts: 328
- Joined: Mon Jun 11, 2007 6:59 pm
Re: Nodes/sec. with last new CPU's!
Waschbaer wrote:
Interesting
I tried to do the test on my system, Ryzen 7 1800 X at 3.95 GHz 16 threads, 64MByte RAM, so the RAM speed was lower than on a system with only 2 RAM slots used.
ASMfish for download: I only found the version 2015-5-15 popcnt.
Result:
Total time (ms) : 186547
Nodes searched : 4857078580
Nodes/second : 26036755

Please can you tell us which mainboard you use, and was it the right choice?
TL
-
- Posts: 1471
- Joined: Tue Mar 16, 2010 12:00 am
Re: Nodes/sec. with last new CPU's!
jpqy wrote:
So, if I want to run games where I give the AMD Ryzen 1700X 6 min. and this 4x Xeon 8180 1 min., over 100 games, would the Ryzen win?!

The number is nothing more than a reasonable best guess, but it would probably be a close match. Maybe you can give it a try?

jpqy wrote:
Or could we get a cluster version of asmFish, buy 6 1700X systems, and have the same chess power as a 4x Xeon 8180 system?! Or would I need 11 systems?

You would need 10 Ryzen systems to achieve the same effective strength. You would then have 80 cores (160 threads) producing 250 MN/s, which is about as good as 112 cores (224 threads) producing 275 MN/s.
Again, all this is nothing more than reasonable guesswork.
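For anyone who wants to redo this estimate, the sketch below applies the rule of thumb quoted elsewhere in this thread (an effective speed-up of about 1.75 for each doubling of threads, at constant node speed per thread) to both configurations. The function name `effective_speed` is made up for illustration. Counting cores rather than hyper-threads is an assumption, and so is treating a 10-machine cluster as if it were one large SMP system.

```python
import math


def effective_speed(total_mnps: float, threads: int,
                    gain_per_doubling: float = 1.75) -> float:
    """Single-thread-equivalent node speed (MN/s) under the rule of thumb.

    Each doubling of threads doubles the raw node count but is assumed to
    multiply playing strength by only `gain_per_doubling`, so every doubling
    discounts the raw speed by gain_per_doubling / 2.
    """
    doublings = math.log2(threads)
    return total_mnps * (gain_per_doubling / 2.0) ** doublings


# Figures from the post: 10 Ryzen systems vs. the 4x Xeon 8180 machine.
ryzen_cluster = effective_speed(250.0, 80)   # 80 cores, 250 MN/s
xeon_box = effective_speed(275.0, 112)       # 112 cores, 275 MN/s

print(f"10x Ryzen : {ryzen_cluster:.1f} MN/s single-thread equivalent")
print(f"4x Xeon   : {xeon_box:.1f} MN/s single-thread equivalent")
```

Under these assumptions the two results land within a few percent of each other, which is consistent with the "about as good" estimate. It remains guesswork in exactly the sense stated above.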
-
- Posts: 12538
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Nodes/sec. with last new CPU's!
Waschbaer wrote:
Interesting
I tried to do the test on my system, Ryzen 7 1800 X at 3.95 GHz 16 threads, 64MByte RAM, so the RAM speed was lower than on a system with only 2 RAM slots used.
ASMfish for download: I only found the version 2015-5-15 popcnt.
Result:
Total time (ms) : 186547
Nodes searched : 4857078580
Nodes/second : 26036755
Vinvin wrote:
You can find more recent versions of ASMfish here : https://github.com/lantonov/asmFish/tre ... es/Windows
Dann Corbit wrote:
The benchmark is supposed to use that particular version (2015-5-15).
Waschbaer wrote:
In post #1 of this thread the linked info says:
asmFishW_2017-08-25_base.exe asmFishW_2017-08-25_bmi2.exe asmFishW_2017-08-25_popcnt.exe
For this Chess Bench i use asmFishW_2017-05-22

You are right. I stand corrected.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Nodes/sec. with last new CPU's!
Dann Corbit wrote:
If there is SMP loss for these chess machines, it is already factored in, because this is the measured speed when actually performing a variety of searches.
Houdini wrote:
No, that is not the case. The SMP (in)efficiency is not factored in the node speed.
An engine with 1 thread producing 10 MN/s is about as strong as with 8 threads producing 15 MN/s combined.
Generally speaking, experience with current engines suggests an effective speed-up of about 1.75 for each doubling of the number of threads (at constant node speed per thread).
Using this formula, the effective strength of 275 MN/s with 112 cores will be close to 160 MN/s with 8 cores.
Inasmuch as the Ryzen 1700 produces about 25 MN/s, it requires 6x more time to achieve this strength. Which still is amazing, considering it's a €300 CPU.

Your formula is naive. Milos is right with his calculation, if efficiency is indeed 0.955. You missed this thread:
http://talkchess.com/forum/viewtopic.php?t=62146
where speed-up was derived according to Amdahl's law as:
speed-up = 1 / (1 - 0.955 + 0.955/n_cores)
It predicted the 8-thread result with high accuracy, with small statistical noise. These are probably the most precise experiments on this to date.
Efficiency of 0.955 could go up with a better SMP implementation, but I guess in this current bench, with NUMA, it might be even lower. So Milos' effective time factor of 2.4, needed to compensate for the difference in strength, is probably close to reality. These 64+ core monsters are probably a waste strength-wise with engines like Stockfish, Komodo or Houdini (but very useful in testing). Special search algorithms have to be designed, maybe like in Jonny, to gain strength significantly from hundreds of cores.
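A minimal sketch of the Amdahl formula above, for plugging in other core counts. The 0.955 efficiency is the value from the post; the function name `amdahl_speedup` and the list of core counts are illustrative choices only.

```python
def amdahl_speedup(n_cores: int, efficiency: float = 0.955) -> float:
    """speed-up = 1 / (1 - efficiency + efficiency / n_cores)."""
    return 1.0 / ((1.0 - efficiency) + efficiency / n_cores)


# Upper bound on the speed-up as n_cores grows without limit.
ceiling = 1.0 / (1.0 - 0.955)

for n in (1, 2, 4, 8, 16, 64, 112):
    print(f"{n:4d} cores -> speed-up {amdahl_speedup(n):6.2f}")

print(f"limit for very many cores -> {ceiling:.1f}")
```

With an efficiency of 0.955 the predicted speed-up can never exceed 1 / (1 - 0.955), roughly 22, however many cores are added. That is in line with the remark that 64+ cores buy little extra strength under this model.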
-
- Posts: 5228
- Joined: Thu Mar 09, 2006 9:40 am
- Full name: Vincent Lejeune
Re: Nodes/sec. with last new CPU's!
jpqy wrote:
Milos is joking..right?!
JP.
zullil wrote:
224 Threads ---- how are these distributed when asmFish searches using its Lazy SMP? I'm too busy/lazy to study the code. What depths are these various threads looking at? What size hash would be needed to get real benefit from so many threads?
Dann Corbit wrote:
His printout shows the thread distribution. There are 4 NUMA nodes with threads as described:
asmFishL_2017-05-22_bmi2
*** bench hash 1024 threads 224 depth 26 realtime 0 ***
info string hash set to 1024 MB no large pages
info string node 0 has threads 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 112 116 120 124 128 132 136 140 144 148 152 156 160 164 168 172 176 180 184 188 192 196 200 204 208 212 216 220
info string node 1 has threads 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 113 117 121 125 129 133 137 141 145 149 153 157 161 165 169 173 177 181 185 189 193 197 201 205 209 213 217 221
info string node 2 has threads 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 114 118 122 126 130 134 138 142 146 150 154 158 162 166 170 174 178 182 186 190 194 198 202 206 210 214 218 222
info string node 3 has threads 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 115 119 123 127 131 135 139 143 147 151 155 159 163 167 171 175 179 183 187 191 195 199 203 207 211 215 219 223

Which depth does it reach after 3 minutes in the starting position?
-
- Posts: 1471
- Joined: Tue Mar 16, 2010 12:00 am
Re: Nodes/sec. with last new CPU's!
Until someone actually performs real tests with 64 or more threads it's a bit premature to label one formula naive or outdated.
Your adoption of Amdahl's law is as much an extrapolation as the other formula - the data you used only goes to 8 or 16 threads.
I can easily give you some experience with 24 threads that counters your Amdahl formula.
If one plays a match on a 12-core Xeon with 12 threads running against 24 threads, the 24 (hyper-)threads are running at 0.65x the node speed.
Let's put the formula [speed-up = 1 / (1 - 0.955 + 0.955/n_cores)] to the test:
- for 12 threads the formula predicts a speed-up of 8.0.
- for 24 threads the formula predicts a speed-up of 11.8. Running at 0.65x the node speed, the real speed-up is 11.8 x 0.65 = 7.7.
This means that your formula predicts the 24-thread engine to lose. In fact, using Houdini 5, it wins the match by about 10 Elo as mentioned here: http://www.cruxis.com/chess/manual/core ... gement.htm .
If even for 24 threads your formula doesn't give any useful prediction, how can you make any claims about 224 threads?
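The arithmetic above can be reproduced in a few lines. This is only a sketch: `amdahl_speedup` restates the formula under test, 0.65 is the hyper-threading node-speed factor given above, and the last calculation applies the 1.75-per-doubling rule of thumb from earlier in the thread to the same case, which is one possible reading of how that rule would be used here.

```python
def amdahl_speedup(n_threads: int, efficiency: float = 0.955) -> float:
    """Speed-up predicted by the Amdahl-style formula under test."""
    return 1.0 / ((1.0 - efficiency) + efficiency / n_threads)


# Per-thread node speed with 24 hyper-threads, relative to 12 threads.
HT_NODE_SPEED = 0.65

s12 = amdahl_speedup(12)
s24_raw = amdahl_speedup(24)
s24_real = s24_raw * HT_NODE_SPEED  # corrected for the slower threads

# Rule of thumb (assumed application): one doubling of threads is worth
# 1.75x, scaled by the slower per-thread node speed.
rule_ratio = 1.75 * HT_NODE_SPEED

print(f"Amdahl, 12 threads: {s12:.1f}")
print(f"Amdahl, 24 threads: {s24_raw:.1f} -> {s24_real:.1f} at 0.65x node speed")
print(f"Rule of thumb, 24 vs 12 threads: {rule_ratio:.2f}x")
```

A 24-thread figure below the 12-thread one predicts a loss for the 24-thread side; a rule-of-thumb ratio above 1 predicts a win.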
-
- Posts: 6442
- Joined: Tue Jan 09, 2007 12:31 am
- Location: PA USA
- Full name: Louis Zulli
Re: Nodes/sec. with last new CPU's!
jpqy wrote:
Milos is joking..right?!
JP.
zullil wrote:
224 Threads ---- how are these distributed when asmFish searches using its Lazy SMP? I'm too busy/lazy to study the code. What depths are these various threads looking at? What size hash would be needed to get real benefit from so many threads?
Dann Corbit wrote:
His printout shows the thread distribution. There are 4 NUMA nodes with threads as described:
asmFishL_2017-05-22_bmi2
*** bench hash 1024 threads 224 depth 26 realtime 0 ***
info string hash set to 1024 MB no large pages
info string node 0 has threads 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 112 116 120 124 128 132 136 140 144 148 152 156 160 164 168 172 176 180 184 188 192 196 200 204 208 212 216 220
info string node 1 has threads 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 113 117 121 125 129 133 137 141 145 149 153 157 161 165 169 173 177 181 185 189 193 197 201 205 209 213 217 221
info string node 2 has threads 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 114 118 122 126 130 134 138 142 146 150 154 158 162 166 170 174 178 182 186 190 194 198 202 206 210 214 218 222
info string node 3 has threads 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 115 119 123 127 131 135 139 143 147 151 155 159 163 167 171 175 179 183 187 191 195 199 203 207 211 215 219 223
Vinvin wrote:
Which depth does it reach after 3 minutes in the starting position?

Thanks, but that's not answering the question I was trying to ask. My question was unclear and probably not that interesting. I was trying to ask about how those threads are used by asmFish's Lazy SMP implementation. Some number of threads are told to search the root to depth n, others to depth n+1, etc. Here n=26, I guess. And all those threads are sharing the same 1 GB hash table. So yes, a lot of nodes will be visited, but many of those nodes will be duplicates visited repeatedly. So it's not at all clear to me that 224 threads with Lazy SMP will gain much Elo. Seems like other SMP algorithms, properly implemented, should be better. But I'm certainly no expert.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Nodes/sec. with last new CPU's!
Houdini wrote:
Until someone actually performs real tests with 64 or more threads it's a bit premature to label one formula naive or outdated.
Your adoption of Amdahl's law is as much an extrapolation as the other formula - the data you used only goes to 8 or 16 threads.
I can easily give you some experience with 24 threads that counters your Amdahl formula.
If one plays a match on a 12-core Xeon with 12 threads running against 24 threads, the 24 (hyper-)threads are running at 0.65x the node speed.
Let's put the formula [speed-up = 1 / (1 - 0.955 + 0.955/n_cores)] to the test:
- for 12 threads the formula predicts a speed-up of 8.0.
- for 24 threads the formula predicts a speed-up of 11.8. Running at 0.65x the node speed, the real speed-up is 11.8 x 0.65 = 7.7.
This means that your formula predicts the 24-thread engine to lose. In fact, using Houdini 5, it wins the match by about 10 Elo as mentioned here: http://www.cruxis.com/chess/manual/core ... gement.htm .
If even for 24 threads your formula doesn't give any useful prediction, how can you make any claims about 224 threads?

Your linked result seems to be inside the error margins of exactly this Amdahl's law formula with exactly this Stockfish efficiency of 0.955. Maybe just a bit higher. You got a gain of 7 +/- 10 Elo points.
7.7/8.0 is 0.9625, or a 3-4 Elo point loss predicted by Amdahl's law (depending on time control, but I guess on 12 cores a doubling in time cannot be worth too much Elo even at fast TC). It is also possible that the efficiency of Houdini is a bit higher than that of Stockfish, say 0.960 instead of 0.955. The reasons to adopt Amdahl's law are many, and your claim that going from, say, 32 threads to 64 threads gives the same speed-up of 1.75 as going from 1 thread to 2 seems a bit weird to me. 1 to 2 and 4 I tested even on my meager i7.