Diminishing returns is still a constant in computer chess specifically, and in tree-searching in general. To believe otherwise, after all the data that has been published since the 1980s, simply defies rationality...Milos wrote:Diminishing returns theory of Heinz is just wrong today. Most of your assumptions about computer chess are from the previous millennium, when LMR was thought of as stupid and useless. Try once, for a change, testing a program that is 400 Elo stronger than yours; you might really be surprised about your convictions.bob wrote:I figured you didn't understand. Assume both have an EBF of 2.0. A factor of 2.0 gives you another ply. B goes from 24 to 25 plies and gains only 50 Elo. A goes from 10 to 11 plies and gains 70 Elo. A is no more efficient than B in terms of parallel search; both are giving _exactly_ 2.0x speedup.
I am not sure you will understand this, but I'll give it a try.
10 years ago (and probably in recent Crafty versions) the relative variance of the EBF between plies was less than 20%. In modern programs with aggressive pruning, the relative variance of the EBF is more than 50%. You cannot draw any conclusions about diminishing returns before going up to depths of around 50 in middle-game positions. All conclusions before that are statistically irrelevant.
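Milos's "relative variance of EBF between plies" can be made concrete; a minimal sketch (the node counts and the `ebf_relative_variation` helper are illustrative, not taken from any actual engine):

```python
import math

def ebf_relative_variation(nodes_per_iteration):
    """Coefficient of variation (std/mean) of the per-ply effective branching
    factors, computed from node counts of successive iterations."""
    ebfs = [b / a for a, b in zip(nodes_per_iteration, nodes_per_iteration[1:])]
    mean = sum(ebfs) / len(ebfs)
    var = sum((e - mean) ** 2 for e in ebfs) / len(ebfs)
    return math.sqrt(var) / mean

# A steady classic searcher: EBF ~2.0 at every ply -> low relative variation
steady = [1000, 2000, 4100, 8000, 16200]
# An aggressive pruner: EBF swinging between ~1.3 and ~3.0 -> high variation
swingy = [1000, 1300, 3900, 5100, 15300]

assert ebf_relative_variation(steady) < 0.05   # well under 20%
assert ebf_relative_variation(swingy) > 0.3    # over 30%
```

The larger the swing between plies, the more iterations you need before an average EBF, or any trend built on it, stabilizes, which is the statistical point being made above.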
Questions on SMP search
Moderator: Ras
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Questions on SMP search
-
Milos
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: Questions on SMP search
You can fit any kind of straight line through 3 dots. That's what Heinz did (with a laughable 3,000 test games), and that's not science.bob wrote:Diminishing returns is still a constant in computer chess specifically, and in tree-searching in general. To believe otherwise, after all the data that has been published since the 1980s, simply defies rationality...
Computer chess was never a serious research topic (I already wrote about that before). What passed the review process in the ICGA Journal would not have passed the review process in other, more serious journals. In other words, if the ICGA had any scientific relevance, its impact factor wouldn't be a miserable 0.5. Sorry if I offended your feelings (still, you can be sure I won't stoop to the personal level as you did in the other thread), but science is one thing and computer chess is another.
Clinging to "scientific evidence" from 10 or 20 years ago as if there were no tomorrow is nothing but elderly stubbornness...
-
Milos
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: Questions on SMP search
Since this is a technical thread, here is an illustration, for people who didn't read all that "huge" theory published since the 1980s, of how meaningless your beliefs are.bob wrote:Diminishing returns is still a constant in computer chess specifically, and in tree-searching in general. To believe otherwise, after all the data that has been published since the 1980s, simply defies rationality...
All the data used were collected using only 3 different programs (if we exclude the more-than-prehistoric Belle) - Crafty, Fritz and DarkThought. The "newest" was Crafty 19.6. None of these programs had any kind of LMR, and the amount of pruning was ridiculously small compared to today's standards. In the time those programs needed to reach depth 12, Stockfish today reaches depth 30. So try to imagine how valid Bob's theory about the "same shape of the search tree" is...
Moreover, the strongest of these programs was more than 700 (seven hundred) Elo weaker than today's strongest.
The main "method" to validate the hypotheses was the ratio between previous-best and fresh-best moves between iterations - a highly dubious approach at best.
All the programs were tested with an irrelevant and completely uncertain "test positions methodology", obviously under the strong influence of Bob (with the same set of positions Bob used, originating from times before integrated circuits were even invented). So much for "independent testing".
And finally, the newest of the papers (which is still full of holes, especially in the statistics part), even using the old data, couldn't verify most of Heinz's hypotheses.
This is all that huge "science" our Bob so firmly bases his beliefs on...
In addition, 90% of that science is covered by one single PhD thesis, E. Heinz's. So much for the depth of the research field...
Last edited by Milos on Thu Apr 28, 2011 1:29 am, edited 1 time in total.
-
Albert Silver
- Posts: 3026
- Joined: Wed Mar 08, 2006 9:57 pm
- Location: Rio de Janeiro, Brazil
Re: Questions on SMP search
bob wrote:Diminishing returns is still a constant in computer chess specifically, and in tree-searching in general. To believe otherwise, after all the data that has been published since the 1980s, simply defies rationality...Milos wrote:Diminishing returns theory of Heinz is just wrong today. Most of your assumptions about computer chess are from the previous millennium, when LMR was thought of as stupid and useless. Try once, for a change, testing a program that is 400 Elo stronger than yours; you might really be surprised about your convictions.bob wrote:I figured you didn't understand. Assume both have an EBF of 2.0. A factor of 2.0 gives you another ply. B goes from 24 to 25 plies and gains only 50 Elo. A goes from 10 to 11 plies and gains 70 Elo. A is no more efficient than B in terms of parallel search; both are giving _exactly_ 2.0x speedup.
I am not sure you will understand this, but I'll give it a try.
10 years ago (and probably in recent Crafty versions) the relative variance of the EBF between plies was less than 20%. In modern programs with aggressive pruning, the relative variance of the EBF is more than 50%. You cannot draw any conclusions about diminishing returns before going up to depths of around 50 in middle-game positions. All conclusions before that are statistically irrelevant.
Yes, the logic behind it is really crystal clear, even without exact data:
You search four plies deep, and this depth determines your move. One more ply is more than just twice as many nodes (presuming a 2.0 branching factor): one more ply is seeing 25% deeper, and that extra 25% affects your root decision. If it doesn't affect your decision, then nothing was gained by the extra ply after all.
Now compare 18 plies of depth to 19. Will that extra 19th ply over the 18th really affect the root decision as often and as decisively as the 5th ply over only four?
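The arithmetic behind this is easy to check; a small sketch of the horizon-versus-cost asymmetry (assuming a constant EBF of 2.0, as in bob's earlier example):

```python
def relative_deepening(depth):
    """Fraction of additional lookahead gained by searching one ply beyond `depth`."""
    return 1.0 / depth

EBF = 2.0  # each extra ply costs ~EBF times the nodes, regardless of depth

# The 5th ply over 4 extends the horizon by 25%...
assert abs(relative_deepening(4) - 0.25) < 1e-9
# ...while the 19th ply over 18 extends it by only ~5.6%,
# even though both cost the same factor of ~2x in nodes.
assert relative_deepening(18) < 0.06

print(f"5th ply:  +{relative_deepening(4):.1%} horizon for ~{EBF}x nodes")
print(f"19th ply: +{relative_deepening(18):.1%} horizon for ~{EBF}x nodes")
```

The cost per ply stays constant while the relative gain shrinks, which is the diminishing-returns argument in one line.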
In a sense, it is much like the actual expectations in plain Elo. Suppose you are as strong as David, but I am 100 Elo stronger than you. Having zero Elo edge, you are expected to score 50% against him, whereas my 100 Elo edge is expected to garner me ~64%. Now suppose you are 400 Elo stronger than him, and I am still 100 Elo stronger than you. You are expected to score ~91% while I score ~95%, an increase of only ~4%. Diminishing returns.
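These percentages come from the standard Elo expectancy formula; a quick numeric check (values rounded):

```python
def expected_score(elo_diff):
    """Standard Elo expectancy against an opponent rated `elo_diff` below you."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

assert abs(expected_score(0) - 0.500) < 1e-9     # equal strength: 50%
assert abs(expected_score(100) - 0.640) < 0.001  # +100 Elo: ~64%
assert abs(expected_score(400) - 0.909) < 0.001  # +400 Elo: ~91%
assert abs(expected_score(500) - 0.947) < 0.001  # +500 Elo: ~95%

# The same +100 Elo edge buys ~14% more score near equality, but under ~4%
# more once you are already 400 Elo ahead: diminishing returns.
assert expected_score(500) - expected_score(400) < expected_score(100) - expected_score(0)
```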
That said, reinvestigating it to get more precise numbers would certainly be worthwhile and far more feasible with today's hardware and software.
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
-
Milos
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: Questions on SMP search
There is absolutely no valid theory that gives the function between how often you change your mind when searching and Elo. That's why the whole "fresh best vs. previous best" metric is dubious at best, since finding the best move at different depths has a different impact on strength.Albert Silver wrote:You search four plies deep, and this depth determines your move. One more ply is more than just twice as many nodes (presuming a 2.0 branching factor): one more ply is seeing 25% deeper, and that extra 25% affects your root decision. If it doesn't affect your decision, then nothing was gained by the extra ply after all.
Now compare 18 plies of depth to 19. Will that extra 19th ply over the 18th really affect the root decision as often and as decisively as the 5th ply over only four?
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Questions on SMP search
Sorry, but that is simply wrong. A program with a branching factor of 6 gets nothing from a CPU twice as fast? Absolutely false. Backed up by years and years of data...Karlo Bala wrote:Elo is a measure for the entire program, not only for SMP. SMP means nothing if the branching factor is, for example, 6. You can add 3 more processors without any benefit.bob wrote:I don't use positions for anything other than very quick evaluation of a search change, or for measuring changes to the SMP code. Everything else I do is game-based. The problem with Elo and SMP is this.marcelk wrote:I have selected the test positions to have no scores with abs(score) > 2.5. I don't care too much what the engine does outside that range.bob wrote:What about the positions where there is really a "right" score and everything else is wrong? Mate-in-N as one example. But any position where you can win material, or win a positional concession, or whatever. I am not so interested in the variance of the score, because one can produce enough of that by just playing the same game N times, ponder=on, and just taking a different amount of time before entering each move.marcelk wrote:What I started doing recently, and I'm not claiming it is sound yet but it looks promising, is measuring the score difference with, as oracle, an insanely long search of, say, 60 minutes.bob wrote:You do realize that for _most_ positions, time to depth and time to solution are _identical_? Only those odd positions where a program changes its mind due to pure non-determinism will be different. Choose enough positions and those won't be a problem. If there is a clear best move, a program should find that move at the same depth, same PV, same score, most of the time. Hence a lot of runs to wash out the sampling error.Laskos wrote:A little silly from you. You must know that the Elo gain and time to solution are MUCH more reliable. 
Depth "8 core" is not the same as depth "1 core" (maybe in Crafty it's the same; that would be pretty unusual).bob wrote:Laskos wrote:Don't have it right now, but I did the following:Albert Silver wrote:Data?Laskos wrote:And on topic, Houdini 1.5's parallelization _practically_ works better than Crafty's
NPS and time to depth are useless.
In other words, you have no data. Time to depth is the _only_ way to measure SMP performance. And I do mean the _ONLY_ way.
Kai
I take this oracle reference and then measure the difference with, say, a 3-minute search on 1 core. I square these errors, average them, and take the square root at the end. That is the 'performance indicator' for the 3-minute single-core search. Closer to 0 is better.
I then compare that with what 30-second searches on 6 cores give. This would be a larger number, because a single-threaded search is more effective. But at some longer interval, say 60 seconds, the indicators become the same. Then I say that I have a 'speedup equivalent' of 3.0 on 6 cores.
The idea is that I get more than 1 bit of information out of every position searched, and thus need fewer positions to get a good indicator. The downside is the need for long oracle searches. I had a few computers calculate those while I was traveling for a few weeks.
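A sketch of the oracle-RMS indicator described above (the function name and the example scores are made up for illustration, not taken from any actual engine):

```python
import math

def rms_error(oracle_scores, test_scores):
    """Root-mean-square difference between oracle scores (from very deep
    searches) and scores from a shorter test search. Closer to 0 is better."""
    diffs = [(o - t) ** 2 for o, t in zip(oracle_scores, test_scores)]
    return math.sqrt(sum(diffs) / len(diffs))

# Hypothetical scores (in pawns) from 60-minute oracle searches...
oracle = [0.31, -0.85, 1.20, 0.05]
# ...versus a 3-minute single-core search and a 30-second 6-core search:
single_3min = [0.28, -0.90, 1.05, 0.10]
six_core_30s = [0.20, -1.10, 0.90, 0.25]

# The longer single-core search should sit closer to the oracle:
assert rms_error(oracle, single_3min) < rms_error(oracle, six_core_30s)
```

The 'speedup equivalent' is then found by lengthening the multi-core time control until the two RMS indicators match.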
The only goal of SMP is to search faster. You can use time to complete a specific depth, or time to find the best move, but you do need a "time" measurement since the SMP performance metric is "speedup". Defined as
speedup = 1-cpu-time / N-cpu-time.
Normally N-cpu-time is smaller so the speedup is a number > 1.0, hopefully.
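In code form (the timings are made-up numbers, and the plies-gained conversion assumes bob's constant-EBF example):

```python
import math

def smp_speedup(one_cpu_time, n_cpu_time):
    """Speedup as defined above: 1-cpu time divided by N-cpu time."""
    return one_cpu_time / n_cpu_time

# e.g. a position searched to the same depth in 120s on 1 core, 40s on 4 cores:
speedup = smp_speedup(120.0, 40.0)
assert abs(speedup - 3.0) < 1e-9

# With a constant EBF of 2.0, that speedup buys log2(3.0) ~ 1.58 extra plies:
EBF = 2.0
extra_plies = math.log(speedup) / math.log(EBF)
assert abs(extra_plies - 1.585) < 0.001
```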
It is not just for SMP testing; that is a special case. I judge any search change with it at the moment.
To do a full validation of the method takes me some more time, but I'm setting it up; hopefully some data by this summer. A similar method works for me for judging evaluation changes, but that is a simpler problem.
Program A gains +50 Elo from using 4 cpus.
Program B gains +70 Elo from using 4 cpus.
Which one has the better SMP algorithm?
You can't answer that. A may search much deeper than B, and we know for certain that adding a ply has a diminishing return in Elo gain. If B is much slower than A, then B might have a _worse_ SMP algorithm, but the speed gain helps B more than A.
That's the problem with using Elo, until you have a direct Elo-to-speed mapping for each program. And once you have that, you don't need the Elo stuff... not to compare SMP effectiveness.
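bob's objection can be made numeric; a sketch (the Elo-per-doubling figures are assumptions for illustration, echoing the 50/70 example earlier in the thread):

```python
def implied_speedup(elo_gain, elo_per_doubling):
    """Convert a measured Elo gain back into an effective speedup, given how
    much Elo this particular program gains per doubling of search speed."""
    return 2.0 ** (elo_gain / elo_per_doubling)

# Deep-searching A gains only 50 Elo per doubling; shallower B gains 70.
# A measures +50 Elo from SMP, B measures +70 Elo:
speedup_A = implied_speedup(50, 50)
speedup_B = implied_speedup(70, 70)

# Different Elo gains, yet identical 2.0x effective speedups -- which is why
# raw Elo gain alone cannot rank SMP implementations across programs.
assert abs(speedup_A - 2.0) < 1e-9
assert abs(speedup_B - 2.0) < 1e-9
```

The catch, as noted above, is that the Elo-per-doubling figure itself has to be measured per program, and once you have it, you effectively have the speedup already.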
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Questions on SMP search
Actually there are several; you just don't read the technical stuff enough to see them. That's your lack of attention, not my lack of humility...Milos wrote:Appealing to authority or simply patronizing the other party in an argument doesn't make you any less wrong.bob wrote:Those that understand the topic get it. Those that don't try to convince others they are wrong. Good luck with that.
You are a person who never admits that he's wrong. There is not a single post in the whole CCC where Bob Hyatt admitted he was wrong about something. Just that speaks volumes about ego...
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Questions on SMP search
Your lack of understanding apparently knows no bounds. Do you know why the impact factor of the ICGA is so low? How big a field do you think computer games is, compared to all the other research that is ongoing?Milos wrote:You can fit any kind of straight line through 3 dots. That's what Heinz did (with a laughable 3,000 test games), and that's not science.bob wrote:Diminishing returns is still a constant in computer chess specifically, and in tree-searching in general. To believe otherwise, after all the data that has been published since the 1980s, simply defies rationality...
Computer chess was never a serious research topic (I already wrote about that before). What passed the review process in the ICGA Journal would not have passed the review process in other, more serious journals. In other words, if the ICGA had any scientific relevance, its impact factor wouldn't be a miserable 0.5. Sorry if I offended your feelings (still, you can be sure I won't stoop to the personal level as you did in the other thread), but science is one thing and computer chess is another.
Clinging to "scientific evidence" from 10 or 20 years ago as if there were no tomorrow is nothing but elderly stubbornness...
jeez...
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Questions on SMP search
Please cite _any_ source that claims that somehow, magically, LMR invalidates the concept of diminishing returns. Of course you can't, because there isn't one. All we are doing is going deeper along selective lines, ignoring more garbage than we used to, but programs still have horizon effects and quite often can't see deep tactics. Take Shirov's Bh3 move from a few years ago as a prime example. So get off the "I know everything about computer chess and tree searching" and get back to reality.Milos wrote:Since this is a technical thread, here is an illustration, for people who didn't read all that "huge" theory published since the 1980s, of how meaningless your beliefs are.bob wrote:Diminishing returns is still a constant in computer chess specifically, and in tree-searching in general. To believe otherwise, after all the data that has been published since the 1980s, simply defies rationality...
All the data used were collected using only 3 different programs (if we exclude the more-than-prehistoric Belle) - Crafty, Fritz and DarkThought. The "newest" was Crafty 19.6. None of these programs had any kind of LMR, and the amount of pruning was ridiculously small compared to today's standards. In the time those programs needed to reach depth 12, Stockfish today reaches depth 30. So try to imagine how valid Bob's theory about the "same shape of the search tree" is...
Moreover, the strongest of these programs was more than 700 (seven hundred) Elo weaker than today's strongest.
The main "method" to validate the hypotheses was the ratio between previous-best and fresh-best moves between iterations - a highly dubious approach at best.
All the programs were tested with an irrelevant and completely uncertain "test positions methodology", obviously under the strong influence of Bob (with the same set of positions Bob used, originating from times before integrated circuits were even invented). So much for "independent testing".
And finally, the newest of the papers (which is still full of holes, especially in the statistics part), even using the old data, couldn't verify most of Heinz's hypotheses.
This is all that huge "science" our Bob so firmly bases his beliefs on...
In addition, 90% of that science is covered by one single PhD thesis, E. Heinz's. So much for the depth of the research field...
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Questions on SMP search
Only to someone who doesn't know what he is talking about. Find _any_ parallel-processing journal article that discusses _any_ metric other than simple speedup. And I do mean _any_.Milos wrote:Bunch of crap.bob wrote:The Elo-per-CPU figure is not very accurate, doesn't compare program to program, and a program might actually gain more Elo in one part of the game than in another. Too many variables. Too many averages of averages. Everyone has a good idea of what happens when a specific program runs 2x faster. Some get +50, some +70, some even more, some even less. But running 2x faster is a known quantity, which for any given program has a resulting implied Elo gain. Given good speedup data, I can predict the Elo gain more accurately than using the Elo gain to predict the speedup, with no speedup data to use for verification.
Yes, I like to use games for all testing and decision-making. But for parallel search, speedup is by far the best measurement to base programming decisions on. There is a direct coupling. The Elo measurement is an indirect coupling.
You reverse the argument, and what is relevant for Elo you claim for "speedup". Elo is a very valid and precise metric. Speedup on selected positions is nothing but your own "condolence metric".
If my program gains 50 Elo by doubling the number of cores, and yours gains only 30 Elo, my SMP implementation is better. End of story. No ifs and buts. You can rationalize until the end of the world, but you are still wrong.
Then come back and continue. That ought to give us a few years' break from your "I know everything" nonsense.