Rebel wrote:
If we look at it objectively, Deep Blue had the most horrible branching factor ever.
Agreed. Due to massive extensions, or at least that's what we have been told. It explains it. It was a choice, and quite a successful one at the time.
Let's not bother your bad memory then about what actually was improved in Rebel 8 versus Rebel 7: basically its nps was 2x higher, and it hardly searched deeper beyond what that factor 2 in nps bought, which was magnificent progress.
It's history. Deep Blue having 'bleeding edge' hardware is interesting to analyze.
They used 480 hardware processors. Each software node of the RS/6000 controlled 16 hardware processors.
Having run on 512 processors myself (500 usable), I know a bit about the problems there. It's not easy.
They had more time to get it working well than I had. I had just a few days to fix it during the world championship in 2003. I managed, sleeping 2 hours a night.
Of course I lost points because of it. They had a tad more time.
Yet their focus was on not hurting the nps. In that sense it was a brain-dead project, and I'm not sure why they carried it out the way they did. They sure had some innovative novelties, but those had nothing to do with the search depth of the thing.
Gaining 2 ply while getting a factor 200 faster is not very good, to put it politely.
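To put a rough number on that last point (my own back-of-the-envelope arithmetic, not anything from their publications): if a factor 200 in speed buys only 2 extra plies, the implied effective branching factor is 200^(1/2), roughly 14, where a decent alpha-beta searcher of that era was well below that.
[code]
#include <math.h>
#include <stdio.h>

/* Back-of-the-envelope: if a speedup S buys only D extra plies,
   the implied effective branching factor is S^(1/D).
   S = 200 and D = 2 are the figures quoted above; the rest is just math. */
int main(void) {
    double speedup = 200.0;
    double plies_gained = 2.0;
    printf("implied effective branching factor: %.1f\n",
           pow(speedup, 1.0 / plies_gained));   /* prints ~14.1 */
    return 0;
}
[/code]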
I tend to believe their SMP algorithm, which seems to have been APHID (but I could be wrong about that), was very inefficient.
Furthermore they claim they got 133M nps. That's quite a lot if you look at the grand total, but APHID should really scale better than that, I would guess blindfolded.
They mixed different processors, another weird idea, except if your only job is to optimize the nps. Note that's not what they wrote themselves; there are different stories there.
Yet I had some private email back then. The email I got from one of the programmers himself indicated a totally different nps than what they later claimed.
If you have 480 processors, each one capable of reaching 2.5 million nps, that's a grand total of far over a billion nps. Actually it's 1.2 billion nps.
None of the logfiles contain any form of proof/evidence of how many searches per second were carried out. Pretty much a beginner's mistake not to show it, except if you have something to hide.
Then later on correcting that to 133M nps means they lose a factor 9 somewhere.
With all respect, that's not very good. It means that out of 30 software processors you effectively use (133M / 1200M) * 30 = 3.3 processors, each @ 16 hardware CPUs.
Or, equivalently, it's effectively (133M / 1200M) * 480 = 53 hardware processors.
And this is after a 3 minute search, don't forget that.
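Spelled out, with the 2.5M nps per chip figure as the assumption and everything else following from it:
[code]
#include <stdio.h>

/* My own arithmetic on the publicly claimed figures: 480 chess chips at
   an assumed ~2.5M nps each, versus the 133M nps total they reported. */
int main(void) {
    double chips = 480.0;
    double nps_per_chip = 2.5e6;                    /* assumption      */
    double theoretical  = chips * nps_per_chip;     /* 1.2 billion nps */
    double claimed      = 133e6;                    /* reported total  */

    printf("loss factor:              %.1f\n", theoretical / claimed);  /* ~9   */
    printf("effective software nodes: %.1f of 30\n",
           claimed / theoretical * 30.0);                               /* ~3.3 */
    printf("effective hardware chips: %.1f of 480\n",
           claimed / theoretical * chips);                              /* ~53  */
    return 0;
}
[/code]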
So their initial LOSS in parallelism is far bigger than what any HPC organisation would consider good scaling.
They didn't scale well. They started by losing a factor 9 to just plain scaling, using only 1 out of every 9 processors.
They had a few layers of losses close to a factor 10 like this, at several links in the chain.
That's such amateurism that I have few words for it. Calling it kindergarten science would be a compliment.
Chrilly has said a few words about it which I won't repeat here, as they would get censored, yet they are the truth.
They had huge speed, yet a very clumsy Elo-versus-speed ratio.
Finishing at 10 ply with hardware capable of 1.2 billion nps on paper, that's very inefficient.
Simply parallelizing gnuchess in software on 30 RS/6000 CPUs would have given them a bigger search depth back then, and a better quality search in theory. However, that's theory.
My experience is that Diep was the first engine on the planet running on a big supercomputer with hundreds of CPUs that didn't first lose a factor 40 to 50 somewhere just in order to scale at all.
I can't avoid the impression that Deep Blue had the same problem the other supercomputer programs from the 90s had as well.
Effectively I dare to claim that after this factor 9 loss they had another huge factor of loss in the SMP search. Clumsily implemented YBW?
Maybe we'll never know. What we do know is that it got a crap search depth, even with the 133M nps it still had after losing the factor 9.
They were also very inefficient in the hardware. One of the things Chrilly didn't understand is why they were doing 4 ply searches in the hardware without even killer moves.
Hydra/Brutus, mostly doing 2 ply searches in the hardware, was still searching very inefficiently there according to Chrilly. He did do a number of 3 ply searches in the hardware, and to him even that was already quite inefficient, and that is while Hydra did use, for example, killer moves in the hardware; Deep Blue didn't, and it searched 4 ply there.
So Chrilly really didn't understand how they could have done that with any form of efficiency at all; in hardware there is no move ordering at all without something like killer moves.
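For readers who don't know the heuristic: killer moves are a dirt-cheap move-ordering trick. A minimal software sketch of the idea follows; the types and names are my own illustration and have nothing to do with either team's actual hardware.
[code]
/* Minimal sketch of the killer-move heuristic, software style.
   Two "killer" slots per ply remember quiet moves that recently caused a
   beta cutoff at that ply; they get tried early on the next visit.
   Move and MAX_PLY are illustrative placeholders. */
typedef unsigned short Move;      /* e.g. packed from/to squares */
#define MAX_PLY 64

static Move killers[MAX_PLY][2];

/* Call when a quiet move produces a beta cutoff at this ply. */
void store_killer(int ply, Move m) {
    if (killers[ply][0] != m) {
        killers[ply][1] = killers[ply][0];
        killers[ply][0] = m;
    }
}

/* Used during move ordering: killers are tried right after the captures. */
int is_killer(int ply, Move m) {
    return m == killers[ply][0] || m == killers[ply][1];
}
[/code]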
My explanation is simple. If you don't know how to search in parallel, then it's easier to get a huge nps by doing big searches in the hardware, as that reduces the number of software searches.
Chrilly with Hydra did the opposite of what Deep Blue did. Chrilly tried to push the search as much as possible into the software by doing the smallest possible searches in the hardware, thereby maximizing the number of hardware searches.
That of course requires a far better parallel implementation on a cluster than what Deep Blue obviously used.
I tend to believe they just didn't know much about how to do a parallel search. Simply not strong enough mathematically.
Chrilly's later approach of pushing things to the software and using nullmove there is a far superior concept. With only a slightly higher nps than Deep Blue got, the final Hydra reached 18-20 ply; at 3 minutes a move I believe it was 20+ plies per move minimum. So it beat Deep Blue by a factor 2 in depth, using basically the same search techniques that were available to both. Maybe there was one technique he had that Deep Blue didn't have, but I believe he didn't win more than 2 ply with it. So it's arguably an 8+ ply win. A clear win either way.
As I said before, the way Chrilly did it is far superior, yet he didn't profit much from the hardware: he used a super-tiny evaluation, just like Deep Blue, and he also used an SMP algorithm I don't think much of. Yet using nullmove and pushing as much as possible from the hardware back into the software search won him plies and plies compared to Deep Blue.
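For those unfamiliar with it: nullmove pruning lets the side to move pass, and if a reduced-depth search still fails high, the node is cut off without a full-depth search. A minimal sketch of the idea inside plain alpha-beta follows; the Position/Move types and the helper functions are hypothetical placeholders, and this is an illustration of the technique, not a reconstruction of Hydra's code.
[code]
/* Minimal sketch of nullmove pruning inside a plain alpha-beta search.
   Everything below the pruning block is ordinary negamax alpha-beta.
   The helpers are hypothetical placeholders a real engine would define. */
typedef struct Position Position;
typedef int Move;

int  evaluate(Position *pos);
int  in_check(Position *pos);
void make_null(Position *pos);
void unmake_null(Position *pos);
int  generate_moves(Position *pos, Move *out);
void make_move(Position *pos, Move m);
void unmake_move(Position *pos, Move m);

#define R 2   /* classic nullmove depth reduction */

int search(Position *pos, int depth, int alpha, int beta) {
    if (depth <= 0)
        return evaluate(pos);

    /* Nullmove: hand the opponent a free move. If a reduced-depth search
       still fails high, assume the full search would too and cut off.
       Skipped when in check, where passing would be unsound. */
    if (depth > R && !in_check(pos)) {
        make_null(pos);
        int score = -search(pos, depth - 1 - R, -beta, -beta + 1);
        unmake_null(pos);
        if (score >= beta)
            return beta;
    }

    Move moves[256];
    int n = generate_moves(pos, moves);
    for (int i = 0; i < n; i++) {
        make_move(pos, moves[i]);
        int score = -search(pos, depth - 1, -beta, -alpha);
        unmake_move(pos, moves[i]);
        if (score >= beta)
            return beta;       /* fail-hard beta cutoff */
        if (score > alpha)
            alpha = score;
    }
    return alpha;
}
[/code]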
It wasn't an ultimate effort that Chrilly made, yet the fascinating thing is that the Deep Blue team did so little to reach deeper ply depths.
We can imagine how tough a hardware project can be, yet they forgot the most crucial aspect: search deep with what you've got. If you hardly have an evaluation function in the hardware, at least try to search deep. And Deep Blue failed miserably there.
Now maybe someone can look up at what MHz Deep Blue was clocked. Something like 25 MHz if I remember well.
480 * 25 MHz = 12,000 MHz = 12 GHz of aggregate hardware speed.
Hydra at its peak was around 64 * 31 MHz, just under 2 GHz of aggregate hardware speed.
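If those clock figures are roughly right (the 25 MHz one is from memory, as said), the comparison works out like this:
[code]
#include <stdio.h>

/* Aggregate clock comparison using the figures quoted above;
   treat both clock rates as approximate, from-memory numbers. */
int main(void) {
    double deep_blue_mhz = 480 * 25.0;   /* 12000 MHz = 12 GHz aggregate */
    double hydra_mhz     = 64 * 31.0;    /* 1984 MHz, just under 2 GHz   */
    printf("Deep Blue: %.0f MHz aggregate\n", deep_blue_mhz);
    printf("Hydra:     %.0f MHz aggregate\n", hydra_mhz);
    printf("ratio:     %.1f\n", deep_blue_mhz / hydra_mhz);   /* ~6 */
    return 0;
}
[/code]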
Yet in search depth Chrilly really kicked the Deep Blue team major league: by 10 ply effectively. Even reducing that for some techniques that might not have existed in the mid 90s, Chrilly still won at least 8 ply over them with a factor 6 less processing power.
And Chrilly did that to a big extent by himself and himself alone. No expensive paid project team; I can confirm Chrilly is an Austrian in lederhosen who sees a bath once a year. This lederhosen Austrian beat the hell out of Deep Blue with stuff like nullmove, something the Deep Blue team had years of time to implement, and they didn't do it.