M ANSARI wrote:
Rybka's node counts with more than 8 cores are undervalued. I remember that on a 48-core AMD machine the node counts were unusually low, but the strength scaling of the 48 cores (in ELO terms) seemed to follow traditional scaling gains when strength-tested. I don't think that Rybka has a proper method of giving an accurate estimate of the knps it should show for more than 8 cores, but it scales quite well at higher core counts, and I would disregard the knps shown as a pointer to performance.
One more thing: although Rybka seems to be the best-scaling engine on more than 8 cores today, Zappa Mexico II also has excellent scaling, even though its search and evaluation might be outdated. Rybka only managed to reach the scaling of ZM II with R3, and it lagged quite a bit in scaling before that.

How are you measuring "scaling"? We have run lots of tests comparing Rybka and Crafty on 8 cores. I have not seen Rybka scale better, unless you cherry-pick one position out of 10 or so. I do not believe that Rybka represents a "silver bullet" with regard to parallel search.
Best engine for greater than 8-core SMP system
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Best engine for greater than 8-core SMP system
-
- Posts: 3707
- Joined: Thu Mar 16, 2006 7:10 pm
Re: Best engine for greater than 8-core SMP system
Actually, I have never tested against Crafty. When testing Rybka against Zappa, the "scaling" I mean was that, for example, Rybka 2.3.2a scored about 80 ELO ahead of ZMII on a single core, about 60 ELO ahead on 2 cores, and so on. This was quite linear until, at 8 cores in 5_0 matches, ZMII was pulling very close, and it pulled ahead when the cores were pushed to 5 GHz. That led me to think that ZMII was scaling better than R 2.3.2a. This was not the case with R3, whose scores against ZMII stayed pretty much the same as the number of cores increased.
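As a rough illustration of this kind of comparison, the sketch below converts a match score into an Elo gap with the usual logistic formula and prints the gap at each core count. The score values and the names `elo_diff` and `scores_by_cores` are made up for the example and are not results from any real match.

```python
import math

def elo_diff(score: float) -> float:
    """Elo difference implied by a match score fraction (0 < score < 1)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

# Hypothetical score fractions of engine A against engine B per core count.
scores_by_cores = {1: 0.61, 2: 0.585, 4: 0.56, 8: 0.52}

for cores, score in scores_by_cores.items():
    print(f"{cores} core(s): A leads by {elo_diff(score):+.0f} Elo")
```

A gap that shrinks as cores are added would suggest that engine B gains more from the extra cores, although crashes or parallel search bugs in either engine would show up in the same numbers.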
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Best engine for greater than 8-core SMP system
M ANSARI wrote:
Actually, I have never tested against Crafty. When testing Rybka against Zappa, the "scaling" I mean was that, for example, Rybka 2.3.2a scored about 80 ELO ahead of ZMII on a single core, about 60 ELO ahead on 2 cores, and so on. This was quite linear until, at 8 cores in 5_0 matches, ZMII was pulling very close, and it pulled ahead when the cores were pushed to 5 GHz. That led me to think that ZMII was scaling better than R 2.3.2a. This was not the case with R3, whose scores against ZMII stayed pretty much the same as the number of cores increased.

That's a very imprecise way of measuring "scaling" and could in fact be a way of measuring "parallel search bugs".
-
- Posts: 3707
- Joined: Thu Mar 16, 2006 7:10 pm
Re: Best engine for greater than 8-core SMP system
bob wrote:
That's a very imprecise way of measuring "scaling" and could in fact be a way of measuring "parallel search bugs".

Probably, but it is the only way I can think of to measure scaling cross platform, as I have yet to see two original engines create similar knps profiles. I also think that "parallel search bugs" are part of the equation when seeing how an engine performs with multiple cores. I guess the more cores you have, the more squeaky clean your code has to be, because a rare bug becomes more likely to hit. You can see that in the latest R3 derivatives ... they seem to be very stable on a single core and even on dual cores, but on 8 cores they are most definitely not stable, and it is hard to play a 100-game tourney without an exception fault.
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Best engine for greater than 8-core SMP system
FlavusSnow wrote:
I've done a fair share of research trying to find engines that can use more than 8 cores. Crafty and a couple of Crafty's offspring are the only world-class engines that I can find for such hardware.

And what do you want to do with more than 8 cores?
You should also note that StockFish is probably stronger on 1 CPU than Crafty would be on 8 CPUs (sorry Bob, don't take it personally). Plus it's free software, and its authors are intelligent and open-minded people, unlike some authors of commercial software that I will not name...
If you're only interested in ELO strength, the strongest you can get is probably Houdini on 8 cores. It's not open source, but it's free. It's only available on Windows, however.
-
- Posts: 1627
- Joined: Thu Mar 09, 2006 12:35 pm
Re: Best engine for greater than 8-core SMP system
M ANSARI wrote:
Probably, but it is the only way I can think of to measure scaling cross platform, as I have yet to see two original engines create similar knps profiles.

The only general and valid way I see for measuring scaling is a simple procedure: take, say, 10 different positions and measure the time an engine takes to reach a certain depth in each position (the greater the depth, the better) on each hardware configuration. Then divide the times from the different hardware to get the speedup for each position, and take the average or something similar to get a good value for the actual speedup.
For example, for Rybka and Crafty on a Quad and an Octal, with positions named P1, P2, ..., P10, depths to reach in the corresponding positions D1 (e.g. 25 plies), D2, ..., D10, and times to reach each depth t1, t2, ..., t10:
You first let Rybka run each of the positions on the Quad and you keep the time:
Quad_P1: time to reach D1 -> tq1
Quad_P2: time to reach D2 -> tq2
...............................
Quad_P10: time to reach D10 -> tq10
Then you let Rybka run each of the positions on the Octal. And you keep the time again:
Octal_P1: time to reach D1 -> to1
Octal_P2: time to reach D2 -> to2
...............................
Octal_P10: time to reach D10 -> to10
And the speedup, i.e. the scaling from Quad to Octal for Rybka, is:
Position-1: tq1/to1
Position-2: tq2/to2
...........................
Position-10: tq10/to10
So the average speedup is the average of the above.
The same for Crafty.
This seems a legitimate method of measuring the actual speedup, i.e. the scaling.
Is there a better one, or does this have any flaw?
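The procedure above can be sketched in a few lines of Python. This is only an illustration: the timings are invented, the names `speedups`, `quad_times` and `octal_times` are hypothetical, and the speedup is taken as Quad time divided by Octal time so that values above 1 mean the Octal is faster.

```python
from statistics import mean

def speedups(base_times, faster_times):
    """Per-position speedup: time on the base machine / time on the faster one."""
    if len(base_times) != len(faster_times):
        raise ValueError("both runs must cover the same positions")
    return [tb / tf for tb, tf in zip(base_times, faster_times)]

# Hypothetical seconds needed to reach the fixed depth in each position.
quad_times = [120.0, 95.0, 210.0, 60.0]
octal_times = [70.0, 50.0, 150.0, 31.0]

per_position = speedups(quad_times, octal_times)
average_speedup = mean(per_position)

for i, s in enumerate(per_position, start=1):
    print(f"Position-{i}: {s:.2f}x")
print(f"Average speedup: {average_speedup:.2f}x")
```

The same two lists would be collected separately for each engine, and the averages compared.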
After his son's birth they've asked him:
"Is it a boy or girl?"
YES! He replied.....
-
- Posts: 89
- Joined: Thu Apr 01, 2010 5:28 am
- Location: Omaha, NE
Re: Best engine for greater than 8-core SMP system
I most often run Stockfish on 4 cores at 3.4 GHz, but I've had that machine for over a year now and my budget is annual (looking to upgrade). Unfortunately, for less than $2,000, it seems like there isn't much of a hardware upgrade that would do noticeably better than what I already have.
I don't have a particular reason to do the 10k game tests like each of the authors do, so I don't see any other benefit of getting another system. I have volunteered CPU time to a handful of chess projects, but I've gotten no responses. So for now I think the quad core machine will just stay what it is, playing on FICS most days.
-
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Best engine for greater than 8-core SMP system
George Tsavdaris wrote:
The only general and valid way I see for measuring scaling is a simple procedure: take, say, 10 different positions and measure the time an engine takes to reach a certain depth in each position (the greater the depth, the better) on each hardware configuration. Then divide the times from the different hardware to get the speedup for each position, and take the average or something similar to get a good value for the actual speedup.

10 positions is way too few, because the speedup is very dependent on the position. I would choose enough positions that the SD becomes small enough.
The alternative is to measure time to solution for positions that are relatively quiet, like the STS.
I think that the only method that takes into account all the potential issues and parameters is to measure Delta ELO vs N of CPUs, but of course, it is time consuming.
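To give a feel for why this is time consuming, here is a small sketch that turns a win/draw/loss record into an Elo difference with an approximate 95% interval (normal approximation on the per-game score). The function names and the game counts are assumptions made up for the example, not measured data.

```python
import math

def elo_from_score(score: float) -> float:
    """Logistic Elo difference for a score fraction, clamped away from 0 and 1."""
    eps = 1e-9
    s = min(max(score, eps), 1.0 - eps)
    return 400.0 * math.log10(s / (1.0 - s))

def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high) using a normal approximation of the mean score."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of outcomes scored 1, 0.5 and 0.
    variance = max((wins + 0.25 * draws) / games - score ** 2, 0.0)
    std_err = math.sqrt(variance / games)
    return (elo_from_score(score),
            elo_from_score(score - z * std_err),
            elo_from_score(score + z * std_err))

# Hypothetical results with the same proportions but different match lengths.
for record in [(40, 35, 25), (4000, 3500, 2500)]:
    elo, low, high = elo_with_margin(*record)
    print(f"{sum(record):>5} games: {elo:+.1f} Elo (about {low:+.1f} to {high:+.1f})")
```

With only 100 games the interval is wide enough to include zero, which is why a Delta ELO measurement per core count needs thousands of games before small scaling differences become visible.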
Miguel
-
- Posts: 1627
- Joined: Thu Mar 09, 2006 12:35 pm
Re: Best engine for greater than 8-core SMP system
michiguel wrote:
10 positions is way too few, because the speedup is very dependent on the position. I would choose enough positions that the SD becomes small enough.

If the speedup is heavily dependent on the positions, then this method is not so good after all.
But in fact I can't understand why the speedups depend on the positions.
I mean, a very small difference is normal, but a big one, as you suggest, is a puzzle to me. I would have expected only tiny differences, so that even 10 positions would be many and actually a bit of a waste of time.
Does such a high deviation in speedup between positions really occur?
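One way to check this on one's own timings is to look at the spread of the per-position speedups rather than only their average. The sketch below computes the mean, the sample standard deviation (SD) and the standard error of the mean; the sample values and the name `summarize` are invented for illustration only.

```python
import math
from statistics import mean, stdev

def summarize(speedups):
    """Return (mean, sample SD, standard error of the mean) of the speedups."""
    n = len(speedups)
    if n < 2:
        raise ValueError("need at least two positions to estimate the SD")
    m = mean(speedups)
    sd = stdev(speedups)
    sem = sd / math.sqrt(n)  # shrinks as more positions are added
    return m, sd, sem

# Hypothetical Quad-to-Octal speedups for 10 positions.
sample = [1.9, 1.2, 1.7, 1.0, 1.6, 1.85, 1.3, 1.75, 1.1, 1.6]

m, sd, sem = summarize(sample)
print(f"mean {m:.2f}, SD {sd:.2f}, standard error {sem:.2f}")
```

If the SD turns out to be large relative to the difference between two engines' mean speedups, 10 positions will not separate them, and the standard error shows roughly how many more positions would be needed.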
michiguel wrote:
I think that the only method that takes into account all the potential issues and parameters is to measure Delta ELO vs N of CPUs, but of course, it is time consuming.

What is Delta ELO?
After his son's birth they've asked him:
"Is it a boy or girl?"
YES! He replied.....
-
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: Best engine for greater than 8-core SMP system
George Tsavdaris wrote:
But in fact I can't understand why the speedups depend on the positions.
I mean, a very small difference is normal, but a big one, as you suggest, is a puzzle to me. I would have expected only tiny differences, so that even 10 positions would be many and actually a bit of a waste of time.
Does such a high deviation in speedup between positions really occur?

Before posting ridiculous stuff, try to at least read Bob's paper on parallel search...