Dragulic wrote:
With these views I, a permanent beginner, concur. But parallelisation as we see it today: people think of 8 cores, 16 threads, that level. OK, maybe a 200-node cluster.
IBM and Intel have indicated the future lies in massive parallelisation. On-chip, limited functionality, but 1,000 or 10,000 cores. And many chips. A machine with 1,000,000 threads could be common within our working lives.
How will these algorithms cope and scale?
Dragulic, it's not about the scaling of the algorithms. You must reverse the question here.
I gave a talk in Switzerland for the HPC Advisory Council; my slides are probably online there. If I compare Diep at 2 million nps with Deep Blue, and I turn OFF all algorithmic improvements since then, so I just search full width,
then I'm searching 2 ply deeper than Deep Blue, whereas Diep is NOT forward pruning the last few plies and Deep Blue was.
The information that reached me is that Deep Blue used the APHID algorithm from Jonathan Schaeffer. In those days a logical choice if your focus was getting more nps for your boss, regardless of whether those nps were any use.
Deep Blue doesn't count nodes. I just start, of course, with the first move out of book; comparing in endgames, the difference is of course much bigger, as their hashtable simply wasn't there, or worked badly, because of the hardware processors. Algorithmically the Deep Blue team also showed they were not so clever, claiming they had 'hardware hashtables' nearly ready to go; hashtables in hardware would not have mattered much when you do 4-ply searches in hardware anyway, as the other 479 processors don't communicate with that one. Chrilly Donninger, having had a similar experience with hardware processors, doesn't understand how they could search 4 ply in hardware: "doing 4 ply in hardware is extremely inefficient; I basically do mostly 2-ply searches with Hydra and try to push everything towards a software search".
Deep Blue was already searching in the opponent's time, and finishes 10 ply
after 1.Nf3 d5 2.g3 Bg4 3.b3.
Diep in fact hesitates there and in the end switches to Bxf3, so it switches from its PV, and as we can see Deep Blue also switches from its PV, yet to the wrong move.
Deep Blue needed roughly 167 seconds on 480 hardware processors, and we'll forgive it that it also searched in the opponent's time, which Diep didn't do here. The marketing department of IBM claimed it got 200 million nps.
The Deep Blue team claims in an official publication later, in 2001, that it was 133 million nps.
APHID is an algorithm that in itself should scale well.
The last improvements in Diep's move ordering were done in 1998; it's not that I have 14 years of advantage there. Also, Diep was never tuned to be really efficient at full-width search.
If I turn on nullmove, so full-width search + nullmove, without a single other form of forward pruning, then Diep reaches 17 plies.
Just R=3; I already used that back then, see my postings on RGCC.
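For those who never implemented it, here is roughly what nullmove with R=3 amounts to inside a plain negamax alpha-beta search. This is a toy sketch in Python over an invented game tree, not Diep's code; evaluate() and children() are placeholders, and a real engine of course makes an actual null move on the board, skips it when in check, worries about zugzwang, and so on.

Code:

# A toy, runnable sketch of nullmove pruning with R=3 inside plain negamax
# alpha-beta. The "game" is an invented random tree, NOT chess; evaluate()
# and children() are placeholders just to make the control flow executable.
import random

R = 3  # nullmove depth reduction

def evaluate(node):
    # Placeholder static evaluation: deterministic pseudo-random score per node.
    return random.Random(node).randint(-100, 100)

def children(node, branching=4):
    # Placeholder move generator: every node has a fixed number of children.
    return [node * branching + i + 1 for i in range(branching)]

def search(node, depth, alpha, beta, allow_null=True):
    if depth <= 0:
        return evaluate(node)

    # Nullmove: give the move to the opponent for free and search at reduced
    # depth with a zero window around beta. If we still fail high, assume the
    # whole node fails high and cut. A real engine would first make an actual
    # null move on the board and would skip this when in check.
    if allow_null and depth > R:
        score = -search(node, depth - 1 - R, -beta, -beta + 1, allow_null=False)
        if score >= beta:
            return beta

    best = -10**9
    for child in children(node):
        score = -search(child, depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, best)
        if alpha >= beta:
            break  # ordinary alpha-beta cutoff
    return best

if __name__ == "__main__":
    print(search(node=0, depth=8, alpha=-10**9, beta=10**9))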
Diep was just 4 years old in 1998.
The 1998 version will actually use fewer nodes to reach that 10 ply than today's Diep, let me assure you of that.
So Diep needs about 14 million nodes to finish 10 ply, and with nullmove at 2 million nps it would have reached 17 ply back then. An algorithm well known back then and used by everyone, after Frans Morsch was honest enough to publicly say he won the 1995 world title with it. You have to give Frans credit for that.
Deep Blue in 1997 gets 10 ply there with 133 million nps. That's what they OFFICIALLY claim in an official publication. Not some vague writing on the internet, but an official publication on advances in artificial intelligence.
133 million nps * 167 seconds = 22.2 billion nodes
Diep: 14 million nodes
Factor 1586 difference.
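If you want to check the arithmetic yourself, the numbers are straight from the above:

Code:

# Numbers as quoted above: 133 million nps for 167 seconds vs Diep's 14 million nodes.
deep_blue_nodes = 133_000_000 * 167
diep_nodes = 14_000_000
print(deep_blue_nodes)               # 22211000000, about 22.2 billion
print(deep_blue_nodes / diep_nodes)  # about 1586.5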
Searching full width, Diep gets 2.6 million nps.
Searching with nullmove R=3, Diep gets 17 ply and 2.0 million nps.
This is on a 16-processor Barcelona box, with a non-optimized executable.
In reality the real question is: why didn't they use nullmove?
The answer is of course obvious: they would've gotten fewer nps,
but even then that's not a good answer.
I don't want to redo that discussion, of course. They weren't bad guys algorithmically, as they invented a few algorithms of their own, yet overall they were pretty clumsy from a lossless search-depth viewpoint.
Where they did do well is in extending their 1988 plan: with expensive hardware and their own CPUs they got a huge number of nps that only 15 years later we can also get. It's just that by 1997 that was totally outdated because of their clumsiness in searching efficiently.
The hardware did 4-ply searches, which is way too much; they didn't use killer moves in hardware; and we'll forgive them their parallel search, as doing a good parallel search simply is very tough, and by 1997 no one had yet achieved an efficient parallel search for so many processors.
That took until Diep, and I easily profited from the knowledge of others; Bob Hyatt is important there, and Rainer Feldmann for clearly explaining YBW to me back in 1998 in Paderborn.
Yet Diep on their 30-node RS/6000 machine would have hands-down outsearched Deep Blue; realize that Diep is a really SLOW searcher in nps.
If one of the slowest engines is outsearching you, that isn't very good news for you. Know what I mean?
All those algorithms get so totally hammered by the efficiency of well-implemented YBW programs that I would want to rephrase your question on scaling.
The question is not: how well does algorithm X or Y scale?
The real question to you is: "how well can you get YBW to work, and
TAKE CARE that it scales well on many cores?"
And if you don't ask the right question, you'll be a factor 1000+ slower anyway than the guy with a well-functioning YBW, as you don't preserve your branching factor with other forms of parallel search!
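To make concrete what I mean by YBW (Young Brothers Wait): at a node you first search the eldest brother, the first move, on one processor; only after it has established a bound do the younger brothers get handed out to other processors, all of them searched with that bound. Below is a stripped-down sketch of that control flow, a toy in Python over an invented tree, nothing like Diep's real implementation (no split-point selection, no helpers updating alpha, no shared hashtable); it only shows why the young brothers have to wait for the eldest.

Code:

# A stripped-down illustration of the Young Brothers Wait idea on a toy tree:
# the first ("eldest") move at a node is searched serially to get a bound,
# and only then are the remaining ("younger") moves farmed out in parallel.
# Toy tree + GIL-bound Python threads: this shows the control flow, not speed.
from concurrent.futures import ThreadPoolExecutor

BRANCHING = 4
pool = ThreadPoolExecutor(max_workers=8)

def evaluate(node):
    # Invented placeholder evaluation: deterministic pseudo-random score per node.
    return (node * 2654435761) % 201 - 100

def children(node):
    # Invented move generator: every node gets a fixed number of child nodes.
    return [node * BRANCHING + i + 1 for i in range(BRANCHING)]

def search(node, depth, alpha, beta, may_split=True):
    if depth <= 0:
        return evaluate(node)

    moves = children(node)

    # Eldest brother: searched serially, it establishes the bound (alpha).
    best = -search(moves[0], depth - 1, -beta, -alpha, may_split)
    alpha = max(alpha, best)
    if alpha >= beta:
        return best   # cutoff: the younger brothers never get searched at all

    young = moves[1:]
    if may_split and depth >= 4:
        # The young brothers waited for the eldest; now they can go in
        # parallel, every one of them searched with the bound the eldest
        # produced. (Helper tasks never split further here, to keep it tiny.)
        futures = [pool.submit(search, m, depth - 1, -beta, -alpha, False)
                   for m in young]
        return max([best] + [-f.result() for f in futures])

    # Plain serial alpha-beta over the remaining moves.
    for m in young:
        best = max(best, -search(m, depth - 1, -beta, -alpha, may_split))
        alpha = max(alpha, best)
        if alpha >= beta:
            break
    return best

if __name__ == "__main__":
    print(search(node=0, depth=6, alpha=-10**9, beta=10**9))

The point is the cutoff right after the eldest brother: if the first move already fails high, no parallel work is ever spawned at that node, and that is exactly how a good YBW search preserves its branching factor while other forms of parallel search throw it away.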
Vincent