bob wrote:
I don't use positions for anything other than very quick evaluation of a search change, or for measuring changes to the SMP code. Everything else I do is game-based. The problem with Elo and SMP is this.
Program A gains +50 Elo from using 4 cpus.
Program B gains +70 Elo from using 4 cpus.
Which one has the better SMP algorithm?
You can't answer that. A may search much deeper than B, and we know for certain that adding a ply has a diminishing return in Elo. If B is much slower than A, then B might have a _worse_ SMP algorithm, but the speed gain helps B more than A.
That's the problem with using Elo: until you have a direct Elo-to-speed conversion for each program, the numbers aren't comparable. And once you have that, you don't need the Elo stuff... not to compare SMP effectiveness.
And I don't use it to compare unrelated programs, but variants of the same program.
I should have added that my objective is to avoid using game playing for tuning purposes. Instead I reserve game playing for validation of the method. I don't have access to a gazillion CPUs exploding my electricity bill and emitting greenhouse gases along the way. A game provides just about 1 bit of information, which seems quite wasteful to me.
bob wrote:The Elo per CPU is not very accurate, doesn't compare program to program, and a program might actually gain more Elo in one part of the game than in another. Too many variables. Too many averages of averages. Everyone has a good idea of what happens when a specific program runs 2x faster. Some get +50, some +70, some even more, some even less. But running 2x faster is a known quantity, which for any given program has a resulting implied Elo gain. Given good speedup data, I can predict the Elo gain more accurately than I could use the Elo gain to predict the speedup, with no speedup data to use for verification.
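The "implied Elo gain" conversion described above can be sketched in a few lines. This is a hypothetical illustration, not bob's actual formula: the 70-Elo-per-doubling constant is an assumed figure (the post itself notes some programs get +50, some +70, some more or less), and the logarithmic shape is the usual assumption that each doubling of effective speed buys a roughly constant Elo increment.

```python
import math

def elo_from_speedup(speedup, elo_per_doubling=70.0):
    """Predict the Elo gain implied by a raw speed multiplier.

    Assumes (hypothetically) that each doubling of effective speed
    yields a fixed Elo gain, so the mapping is logarithmic in speedup.
    """
    return elo_per_doubling * math.log2(speedup)

# A program searching 1.7x faster, under the assumed 70 Elo/doubling:
gain = elo_from_speedup(1.7)   # a bit under 70 Elo
```

Under this sketch a full 2.0x speedup maps to exactly one "doubling" of Elo gain, and a 1.0x speedup (no parallel benefit) maps to 0.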
Yes, I like to use games for all testing and decision-making. But for parallel search, speedup is by far the best measurement to base programming decisions on. There is a direct coupling. The Elo measurement is an indirect coupling.
Bunch of crap.
You reverse the argument: what is relevant for Elo you claim for "speedup". Elo is a perfectly valid and precise metric. Speedup on selected positions is nothing but your own "consolation metric".
If my program gains 50 Elo by doubling the number of cores, and yours gains only 30 Elo, my SMP implementation is better. End of story. No ifs and buts. You can rationalize until the end of the world, but you are still wrong.
bob wrote:
Program A gains +50 Elo from using 4 cpus.
Program B gains +70 Elo from using 4 cpus.
Which one has the better SMP algorithm?
You can't answer that. A may search much deeper than B, and we know for certain that adding a ply has a diminishing return in Elo. If B is much slower than A, then B might have a _worse_ SMP algorithm, but the speed gain helps B more than A.
That's the problem with using Elo: until you have a direct Elo-to-speed conversion for each program, the numbers aren't comparable. And once you have that, you don't need the Elo stuff... not to compare SMP effectiveness.
Bunch of rationalizations. The SMP of program B is of course better.
Your ego is just bigger than the Moon. You can't admit you are wrong even though everyone knows it. Sorry, there are better SMP implementations than yours, no matter how hard that is for you to admit. Houdart's implementation is one of them.
Laskos wrote:And on topic, Houdini 1.5 parallelization _practically_ works better than Crafty's
Data?
I don't have any right now, but here is what I found: NPS and time to depth are useless.
In other words, you have no data. Time to depth is the _only_ way to measure SMP performance. And I do mean the _ONLY_ way.
A little silly of you. You must know that the Elo gain and time to solution are MUCH more reliable. Depth on 8 cores is not the same as depth on 1 core (maybe in Crafty it's the same; that would be pretty unusual).
Kai
You do realize that for _most_ positions, time to depth and time to solution are _identical_? Only those odd positions where a program changes its mind due to pure non-determinism will be different. Choose enough positions and those won't be a problem. If there is a clear best move, a program should find that move at the same depth, same PV, same score, most of the time. Hence a lot of runs to wash out the sampling error.
What I started doing recently, and I'm not claiming it is sound yet, but it looks promising, is measuring score differences against an oracle: an insanely long search of, say, 60 minutes.
I take this oracle reference and then measure the difference with, say, a 3-minute search on 1 core. I square these errors, average them, and take the square root at the end. That is the 'performance indicator' for the 3-minute single-core search. Closer to 0 is better.
I then compare that with what 30-second searches on a 6-core machine give. This would be a larger number, because a single-threaded search is more effective. But at some longer interval, say 60 seconds, the indicators match. Then I say that I have a 'speedup equivalent' of 3.0 on 6 cores.
The idea is that I get more than 1 bit of information out of every position searched, and thus need fewer positions to get a good indicator. The downside is the need for long oracle searches. I had a few computers compute those while I was traveling for a few weeks.
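The square-average-root procedure above is a root-mean-square deviation from the oracle scores. A minimal sketch, with made-up pawn-unit scores standing in for real engine output:

```python
import math

def performance_indicator(oracle_scores, test_scores):
    """Root-mean-square deviation of a fast search's scores from
    long 'oracle' search scores over the same positions.
    Closer to 0 means the fast search agrees better with the oracle."""
    assert len(oracle_scores) == len(test_scores)
    sq_errors = [(o - t) ** 2 for o, t in zip(oracle_scores, test_scores)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))

# Hypothetical scores (pawn units) for four test positions:
oracle = [0.30, -0.15, 1.20, 0.05]   # 60-minute oracle searches
fast   = [0.25, -0.40, 1.10, 0.20]   # 3-minute single-core searches
indicator = performance_indicator(oracle, fast)
```

The 'speedup equivalent' then comes from finding the parallel time control whose indicator matches a given single-core time control, e.g. 6-core 60-second searches matching 1-core 180-second searches gives 3.0.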
What about the positions where there really is a "right" score and everything else is wrong? Mate-in-N, as one example. But also any position where you can win material, or win a positional concession, or whatever. I am not so interested in the variance of the score, because one can produce plenty of that by just playing the same game N times, ponder=on, taking a different amount of time before entering each move.
The only goal of SMP is to search faster. You can use time to complete a specific depth, or time to find the best move, but you do need a "time" measurement since the SMP performance metric is "speedup". Defined as
speedup = 1-cpu-time / N-cpu-time.
Normally N-cpu-time is smaller so the speedup is a number > 1.0, hopefully.
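The speedup definition above, averaged over a set of test positions, can be sketched like this. The timings are invented illustrative numbers, not measurements from any real engine:

```python
def smp_speedup(one_cpu_times, n_cpu_times):
    """Classic parallel-search speedup: time to reach a fixed depth
    on 1 CPU divided by time to reach the same depth on N CPUs,
    averaged over a set of test positions."""
    ratios = [t1 / tn for t1, tn in zip(one_cpu_times, n_cpu_times)]
    return sum(ratios) / len(ratios)

# Hypothetical seconds to reach the same depth on 1 CPU vs 4 CPUs:
t1 = [120.0, 95.0, 210.0]
t4 = [ 38.0, 30.0,  70.0]
speedup = smp_speedup(t1, t4)   # > 1.0 whenever the parallel search helps
```

Averaging the per-position ratios is one possible convention; averaging total times first is another, and the two can differ when per-position times vary widely.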
I have selected the test positions so that no score has abs(score) > 2.5; I don't care too much what the engine does outside that range.
It is not just for SMP testing, that is a special case. I judge any search change with it at the moment.
To do a full validation of the method takes me some more time, but I'm setting it up, hopefully some data by this summer. A similar method works for me for judging evaluation changes, but that is a simpler problem.
I don't use positions for anything other than very quick evaluation of a search change, or for measuring changes to the SMP code. Everything else I do is game-based. The problem with Elo and SMP is this.
Program A gains +50 Elo from using 4 cpus.
Program B gains +70 Elo from using 4 cpus.
Which one has the better SMP algorithm?
You can't answer that. A may search much deeper than B, and we know for certain that adding a ply has a diminishing return in Elo. If B is much slower than A, then B might have a _worse_ SMP algorithm, but the speed gain helps B more than A.
That's the problem with using Elo: until you have a direct Elo-to-speed conversion for each program, the numbers aren't comparable. And once you have that, you don't need the Elo stuff... not to compare SMP effectiveness.
Elo is a measure for the entire program, not only for the SMP part. SMP means nothing if, for example, the effective branching factor is 6: you can add 3 more processors without any benefit.
bob wrote:
I don't use positions for anything other than very quick evaluation of a search change, or for measuring changes to the SMP code. Everything else I do is game-based. The problem with Elo and SMP is this.
Program A gains +50 Elo from using 4 cpus.
Program B gains +70 Elo from using 4 cpus.
Which one has the better SMP algorithm?
You can't answer that. A may search much deeper than B, and we know for certain that adding a ply has a diminishing return in Elo. If B is much slower than A, then B might have a _worse_ SMP algorithm, but the speed gain helps B more than A.
That's the problem with using Elo: until you have a direct Elo-to-speed conversion for each program, the numbers aren't comparable. And once you have that, you don't need the Elo stuff... not to compare SMP effectiveness.
And I don't use it to compare unrelated programs, but variants of the same program.
I should have added that my objective is to avoid using game playing for tuning purposes. Instead I reserve game playing for validation of the method. I don't have access to a gazillion CPUs exploding my electricity bill and emitting greenhouse gases along the way. A game provides just about 1 bit of information, which seems quite wasteful to me.
The _only_ reason to measure SMP performance is to compare it to another program to see if you are better or worse. If worse, you know you have some work to do. If better, what you have might be as good as can be expected.
To compare numbers, we have to have a common frame of reference. I personally believe that _anyone_ can grasp the idea that if a program is 1.7x faster, it is going to experience an Elo gain that can be predicted if you are interested. But _everybody_ knows that going 2x faster is better in all cases. And everyone knows that you are not consistently going to run more than 2x faster with just two CPUs. 2x is an upper bound, 1x ought to be the lower bound. How are you doing? And how does it compare to my results? Speedup is the way to express that in parallel-algorithm circles...
bob wrote:
Program A gains +50 Elo from using 4 cpus.
Program B gains +70 Elo from using 4 cpus.
Which one has the better SMP algorithm?
You can't answer that. A may search much deeper than B, and we know for certain that adding a ply has a diminishing return in Elo. If B is much slower than A, then B might have a _worse_ SMP algorithm, but the speed gain helps B more than A.
That's the problem with using Elo: until you have a direct Elo-to-speed conversion for each program, the numbers aren't comparable. And once you have that, you don't need the Elo stuff... not to compare SMP effectiveness.
Bunch of rationalizations. The SMP of program B is of course better.
Your ego is just bigger than the Moon. You can't admit you are wrong even though everyone knows it. Sorry, there are better SMP implementations than yours, no matter how hard that is for you to admit. Houdart's implementation is one of them.
I figured you didn't understand. Assume both have an EBF of 2.0. A factor of 2.0 gives you another ply. B goes from 24 to 25 plies and gains only 50 Elo. A goes from 10 to 11 plies and gains 70 Elo. A is no more efficient than B in terms of parallel search; both are giving _exactly_ a 2.0x speedup.
You need to understand the topic before jumping in to the discussion....
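The depth-for-speed argument above reduces to a simple relation: with an effective branching factor of `ebf`, each ply costs a factor of `ebf` more nodes, so a speedup of `s` buys `log(s)/log(ebf)` extra plies regardless of the starting depth. A minimal sketch of that arithmetic:

```python
import math

def extra_plies(speedup, ebf):
    """Extra search depth bought by a given speedup when each
    additional ply costs a factor of `ebf` more nodes,
    i.e. the solution of ebf**extra == speedup."""
    return math.log(speedup) / math.log(ebf)

# Both A (10 -> 11 plies) and B (24 -> 25 plies) get exactly one
# extra ply from a 2.0x speedup at EBF 2.0; their parallel searches
# are equally efficient even though the Elo payoff of that ply differs:
plies_a = extra_plies(2.0, 2.0)
plies_b = extra_plies(2.0, 2.0)
```

This is why identical speedups can produce different Elo gains: the speedup fixes the extra depth, while the Elo value of a ply depends on where in the depth range the program already sits.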
bob wrote:The only goal of SMP is to search faster. You can use time to complete a specific depth, or time to find the best move, but you do need a "time" measurement since the SMP performance metric is "speedup". Defined as
speedup = 1-cpu-time / N-cpu-time.
Normally N-cpu-time is smaller so the speedup is a number > 1.0, hopefully.
Wrong. Again you use your favorite logical fallacy: ignoratio elenchi. Searching faster is irrelevant. The only relevant goal of SMP (and of any other chess programming technique) is to increase the strength of a chess program.
As you claim yourself, the ratio between "speed of search" and Elo is not linear. Ergo "speed of search" is not a relevant metric. Ergo your insisting on it is nothing but ignoratio elenchi. Q.E.D.
Or, your argument is simply "ignorant" (simple form)...
Those that understand the topic get it. Those that don't try to convince others they are wrong. Good luck with that.
bob wrote:Those that understand the topic get it. Those that don't try to convince others they are wrong. Good luck with that.
Appealing to authority or simply patronizing the other party in an argument doesn't make you any less wrong.
You are the person who never admits that he's wrong. There is not a single post in the whole CCC where Bob Hyatt admitted he was wrong about something. Just that speaks volumes about ego...
bob wrote:I figured you didn't understand. Assume both have an EBF of 2.0. A factor of 2.0 gives you another ply. B goes from 24 to 25 plies and gains only 50 Elo. A goes from 10 to 11 plies and gains 70 Elo. A is no more efficient than B in terms of parallel search; both are giving _exactly_ a 2.0x speedup.
The diminishing-returns theory of Heinz is just wrong today. Most of your assumptions about computer chess are from the previous millennium, when LMR was thought of as stupid and useless. Try for once testing against a program that is 400 Elo stronger than yours; you might really be surprised about your convictions.
I am not sure you will understand this but I'll give it a try.
10 years ago (and probably in recent Crafty versions) the relative variance of the EBF between plies was less than 20%. In modern programs with aggressive pruning, the relative variance of the EBF is more than 50%. You cannot draw any conclusions about diminishing returns before going up to depths of around 50 in middle-game positions. All conclusions before that are statistically irrelevant.
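The quantity being argued about here, the relative spread of the per-ply effective branching factor, can be computed from an engine's cumulative node counts per iteration. A sketch with invented node counts (the two series below are illustrative, not measurements from Crafty or any modern engine):

```python
import math

def ebf_relative_spread(node_counts):
    """Per-ply effective branching factors from cumulative node counts
    (ratio of consecutive iterations), plus their relative standard
    deviation (std / mean), the 'relative variance' discussed above."""
    ebfs = [b / a for a, b in zip(node_counts, node_counts[1:])]
    mean = sum(ebfs) / len(ebfs)
    var = sum((e - mean) ** 2 for e in ebfs) / len(ebfs)
    return ebfs, math.sqrt(var) / mean

# A steady EBF near 2 (old-style search) vs a jumpy one (heavy pruning):
steady = [1000, 2000, 4100, 8000, 16500]
jumpy  = [1000, 1400, 4200, 5000, 20000]
_, steady_spread = ebf_relative_spread(steady)
_, jumpy_spread = ebf_relative_spread(jumpy)   # much larger spread
```

A large spread means a single "factor of 2.0 buys one ply" conversion is unreliable, which is the core of the objection to depth-based diminishing-returns claims.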
bob wrote:
I don't use positions for anything other than very quick evaluation of a search change, or for measuring changes to the SMP code. Everything else I do is game-based. The problem with Elo and SMP is this.
Program A gains +50 Elo from using 4 cpus.
Program B gains +70 Elo from using 4 cpus.
Which one has the better SMP algorithm?
You can't answer that. A may search much deeper than B, and we know for certain that adding a ply has a diminishing return in Elo. If B is much slower than A, then B might have a _worse_ SMP algorithm, but the speed gain helps B more than A.
That's the problem with using Elo: until you have a direct Elo-to-speed conversion for each program, the numbers aren't comparable. And once you have that, you don't need the Elo stuff... not to compare SMP effectiveness.
And I don't use it to compare unrelated programs, but variants of the same program.
I should have added that my objective is to avoid using game playing for tuning purposes. Instead I reserve game playing for validation of the method. I don't have access to a gazillion CPUs exploding my electricity bill and emitting greenhouse gases along the way. A game provides just about 1 bit of information, which seems quite wasteful to me.
The _only_ reason to measure SMP performance is to compare it to another program to see if you are better or worse. If worse, you know you have some work to do. If better, what you have might be as good as can be expected.
To compare numbers, we have to have a common frame of reference. I personally believe that _anyone_ can grasp the idea that if a program is 1.7x faster, it is going to experience an Elo gain that can be predicted if you are interested. But _everybody_ knows that going 2x faster is better in all cases. And everyone knows that you are not consistently going to run more than 2x faster with just two CPUs. 2x is an upper bound, 1x ought to be the lower bound. How are you doing? And how does it compare to my results? Speedup is the way to express that in parallel-algorithm circles...
Yes, and elo is the way to express performance in chess circles.
You can apply the same reasoning, replacing "SMP" with "NPS". Still, focusing on going from "fast NPS" to "extreme NPS" is one of the last things to do to improve Elo performance. There is only a finite amount of water to squeeze out of that stone. The same holds for SMP. Who cares if your speedup is 2.9 with a dumb algorithm instead of 3.5 with some extreme measures on 4 cores, if on the other side of the room there is a branching factor to be reduced, or knowledge to be added that makes better use of the system as a whole?
With diminishing returns on extra plies, if real, speedup becomes even less relevant. You can use the extra CPUs to apply different algorithms: algorithms that make less sense on a single CPU but scale better than PVS does, and eventually outperform PVS beyond some number of cores. Speedup is completely blind to that, while measuring Elo catches it nicely and guides the right way.
Coming back to where the thread left off, and the statement that speedup is the only metric in town: I'm experimenting with an alternative which is a closer derivative of Elo than measuring time-to-depth, and much cheaper than playing games.