Elo versus speed

Discussion of chess software programming and technical issues.

Rebel
Posts: 7382
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: Elo versus speed

Post by Rebel »

diep wrote:
Rebel wrote:
diep wrote: Ed, didn't you beat Deep Thought as well back then? You outsearched them by 3 ply, didn't you?
Nah. That was against a crippled Deep Blue Internet version. Great fun but meaningless. And it did not show its depth as far as I can remember.

Happy Easter to all.
You refer to the one playing without book in 1999. Well obviously it was a bit outdated by then.

I'm speaking of Deep THOUGHT. The world championship at the start of the 90s. Didn't you beat them in Madrid?
No. DT wasn't playing in Madrid 1992 IIRC.
You simply outsearched them, if I have my math correct, as in 1988 they got 8 ply search depth at 500k nps.

By 1991 they can't have searched much deeper than that, still with the same immature qsearch and simple evaluation. They must have improved that around the end of 1996 or so, upgrading it to something similar to the Gnuchess of that era; of all engines, the 1997 Deep Blue plays basically the same moves that ZarkovX does at 10 ply search depth.

Note that ZarkovX hardly used nullmove as we know it today, yet it plays nearly all the moves Deep Blue played against Kasparov, including the big blunders Deep Blue made in every opening and game, which of course would have had Deep Blue lose every game against FMs as well, since it played at 1600 or so in the opening.

By 1995 world championship standards Deep Blue was still OK, but at the 1997 micro world championship, even if it had been allowed to join, Deep Blue would have been totally outdated because of its opening play. Engines still made similar mistakes in 1997, but by 1999 all of that was gone from most top engines; besides, some got 14-17 ply back then, 4 ply deeper than Deep Blue in the worst case. Fritz was systematically at 17 ply, 7 ply deeper than Deep Blue in the opening.

So I'm interested in what search depth they got back in the early 90s :)

As for Madrid: judging by my observations of the ChessMachine Schröder that some older chess players still have at home and analyse with, Rebel must have been at least on par with Deep Thought, if it didn't outsearch them. Of course in a very dubious manner; unlike Fritz, which by 1995 outsearched them not in a dubious manner but simply with recursive nullmove.

Weren't you the first to outsearch them, Ed?
I can only remember a draw between the forerunner of DT (forgot the name) and Rebel running at 5 MHz or so. I don't remember any other confrontation, but maybe my selective memory fails me here :wink:

As for outsearching, the moment I discovered the power of reductions it put me on top of the SSDF list with a 65 Elo margin. That was Rebel 8. Back then it was done by static considerations only. It's what nowadays is baptized "static null-move", as I just recently learned. Then when recursive null-move made its entrée it was all over :lol:
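
For readers who never ran into the term: "static null-move" (nowadays also called reverse futility pruning) boils down to a few lines. A minimal sketch with made-up margins and hypothetical helpers (Board, evaluate, qsearch), not Rebel 8's actual code:

Code: Select all

// Minimal sketch of "static null-move" (reverse futility pruning).
// Margins, depth limit and helper names are illustrative only.
int search(Board &b, int alpha, int beta, int depth) {
    if (depth <= 0)
        return qsearch(b, alpha, beta);     // hypothetical quiescence search

    int eval = evaluate(b);                 // static evaluation, no search
    const int MARGIN = 120;                 // centipawns per remaining ply

    // If the static eval beats beta by a comfortable margin at shallow
    // depth, assume the node fails high anyway and cut immediately.
    if (depth <= 3 && !b.inCheck() && eval - MARGIN * depth >= beta)
        return eval;

    // ... normal move loop follows ...
    return alpha;                           // placeholder for the move loop
}
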
If we look at it objectively, Deep Blue had the most horrible branching factor ever.
Agreed. Due to massive extensions. At least that's what we have been told. It explains it. And it was a choice. And quite successful at the time.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Elo versus speed

Post by diep »

Rebel wrote:
If we look at it objectively, Deep Blue had the most horrible branching factor ever.
Agreed. Due to massive extensions. At least that's what we have been told. It explains it. And it was a choice. And quite successful at the time.
Let's not bother your bad memory then about what actually improved in Rebel 8 versus 7: basically its nps was 2x higher, and it hardly searched deeper beyond that factor 2 in nps, which was magnificent progress :)

It's history. Deep Blue having 'bleeding edge' hardware is interesting to analyze.

They used 480 hardware processors. Each software node of the RS/6000 controlled 16 hardware processors.

Having run on 512 processors myself (500 usable), I know a bit about the problems there. It's not easy.

They had more time to get it working well than I had. I had just a few days to fix it during the 2003 world championship. I managed. Slept 2 hours a night.

Of course I lost points because of it. They had a tad more time.

Yet their focus was on not hurting the nps. In that sense it was a brain-dead project, and I'm not sure why they carried it out the way they did. They sure had some innovative novelties, but those had nothing to do with the search depth of the thing.

Winning 2 ply while getting a factor 200 faster is not very good, to put it politely.

I tend to believe their SMP algorithm, which seems to have been APHID (but I could be wrong), was very inefficient.

Furthermore they claim they got 133M nps. That's quite a lot if you look at the grand total, but APHID really should scale better than that, I would guess blindfolded.

They mixed different processor types, another weird idea, except if your only job is to optimize the nps. Note that's not what they wrote themselves; there are different stories there.

Yet I had some private email back then. The email I got from one of the programmers himself indicated a totally different nps from what they later claimed.

If you have 480 processors, each one capable of reaching 2.5 million nps, that's a grand total of far over a billion nps. Actually it's 1.2 billion nps.

None of the log files contain any form of proof/evidence of how many searches per second were carried out. Pretty much a beginner's mistake not to show it, except if you have something to hide.

Then later correcting that to 133M nps means they lost a factor 9 somewhere.

With all respect, that's not very good. It means that of the 30 software processors you effectively use (133M / 1200M) * 30 = 3.3 processors, each @ 16 hardware CPUs.

Or equivalently, it's effectively (133M / 1200M) * 480 = 53 hardware processors.
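
Spelled out as a sanity check — this is nothing more than the arithmetic of the claims quoted above:

Code: Select all

// Back-of-the-envelope check of the scaling numbers quoted above.
#include <cstdio>

int main() {
    double chips      = 480;               // hardware chess processors
    double npsPerChip = 2.5e6;             // claimed peak nps per chip
    double peakNps    = chips * npsPerChip;    // 1.2e9 nps on paper
    double realNps    = 133e6;             // the 2001 corrected average

    double efficiency = realNps / peakNps;     // ~0.11, i.e. factor ~9 lost
    printf("peak: %.2g nps, efficiency: %.0f%%\n", peakNps, efficiency * 100);
    printf("effective software nodes: %.1f of 30\n", efficiency * 30);
    printf("effective chess chips:    %.0f of 480\n", efficiency * 480);
    return 0;
}
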

And this is after a 3-minute search, don't forget that.

So their initial loss in parallelism is far bigger than anything an HPC organisation would consider good scaling.

They didn't scale well. They started by losing a factor 9 to simple scaling, using only 1 out of every 9 processors.

They had a few layers of losses close to a factor 10 like this, at several links in the chain.

That's just such amateurism that I have few words for it. Calling it kindergarten science would be a compliment.

Chrilly has said a few words about it, which I won't repeat here as they would be censored, yet it's the truth.

They had huge speed, yet a very clumsy Elo versus speed.

Finishing 10 ply with hardware capable of 1.2 billion nps on paper, that's very inefficient.

Simply parallelizing Gnuchess in software on the 30 RS/6000 CPUs would have given them a bigger search depth back then, and in theory a better quality search. However, that's theory.

My experience is that Diep was the first engine on the planet to run on a big supercomputer with hundreds of CPUs without first losing a factor 40 to 50 somewhere just to scale at all.

I can't avoid the impression that Deep Blue had the same problem that the other supercomputer programs of the 90s had as well.

Effectively I dare to claim that after this factor 9 loss they had another huge factor loss in the SMP search. A clumsily implemented YBW?

Maybe we'll never know. What we do know is that it got a crap search depth even with the 133M nps it still had after losing that factor 9.

They were also very inefficient in the hardware. One of the things Chrilly didn't understand is why they were doing 4 ply searches in the hardware without even killer moves.

Hydra/Brutus, doing mostly 2 ply searches in the hardware, was still searching very inefficiently there according to Chrilly. He did do a number of 3 ply searches in the hardware, and to him that was already very inefficient; yet Hydra at least used killer moves in the hardware, while Deep Blue didn't, and searched 4 ply.

So Chrilly really didn't understand how they could have done that with any form of efficiency at all; in hardware there is no move ordering at all without things like killer moves.
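
For the non-programmers: killer moves are just a tiny table remembering which quiet move caused a cutoff at the same ply, so siblings can try it early. A minimal software sketch with illustrative names (Move, Board, isCapture, mvvLvaScore, historyScore are assumptions); hardware versions keep the same idea in registers:

Code: Select all

// Minimal sketch of killer-move ordering in a software search.
const int MAX_PLY = 128;
Move killers[MAX_PLY][2];             // two killer slots per ply

void storeKiller(int ply, Move m) {   // called when a quiet move fails high
    if (killers[ply][0] != m) {
        killers[ply][1] = killers[ply][0];
        killers[ply][0] = m;          // keep the two most recent killers
    }
}

int moveScore(const Board &b, Move m, int ply) {
    if (isCapture(b, m))
        return 1000000 + mvvLvaScore(b, m);  // captures first, best victim first
    if (m == killers[ply][0]) return 900000; // refuted a sibling: try early
    if (m == killers[ply][1]) return 800000;
    return historyScore(m);                  // remaining quiet moves
}
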

My explanation is simple. If you don't know how to search in parallel, then it's easier to get a huge nps by doing big searches in the hardware, as that reduces the number of software searches.

Chrilly with Hydra did the opposite of what Deep Blue did. He tried to push the search as much as possible into the software by doing the smallest possible searches in the hardware, thus maximizing the number of hardware searches.

That of course requires a far better parallel implementation on a cluster than what Deep Blue obviously used.

I tend to believe they just didn't know much about how to do a parallel search. Simply not strong enough mathematically.

Chrilly's later approach there, pushing things into the software and using nullmove, is a far superior concept. With a slightly higher nps than Deep Blue got, the final Hydra reached 18-20 ply. At 3 minutes a move I believe it was 20+ plies per move minimum. So beating Deep Blue by a factor 2 there, using basically the same search techniques that were available. Maybe one that he had and Deep Blue didn't, but I believe he didn't win more than 2 ply with it. So it's arguably an 8+ ply win. A clear win in any case.

As I said before, the way Chrilly did it is far superior, yet he didn't profit much from the hardware. He used a super-tiny evaluation, just like Deep Blue, and he also used an SMP algorithm I don't think much of; yet using nullmove and pushing as much of the hardware search as possible back into the software search won him plies and plies compared to Deep Blue.

Chrilly's wasn't an ultimate effort either, yet the fascinating thing is how little the Deep Blue team did to reach deeper ply depths.

We can imagine how tough a hardware project can be, yet they forgot the most crucial aspect: search deep with what you've got. If you hardly have an evaluation function in hardware, at least try to search deep. And Deep Blue failed miserably there.

Now maybe someone can look up at what clock speed Deep Blue ran. Something like 25 MHz if I remember well.

480 * 25 MHz = 12 GHz of aggregate hardware speed

Hydra at its peak was around 64 * 31 MHz = just under 2 GHz of aggregate hardware speed

Yet in search depth Chrilly really kicked the Deep Blue team major league: by 10 ply effectively. Even after discounting some techniques that might not have existed in the mid 90s, Chrilly won 8 ply minimum over them with a factor 6 less processing power.

And Chrilly did that to a large extent by himself and himself alone. No expensive paid project team; I can confirm Chrilly is an Austrian in lederhosen who sees a bath once a year. This lederhosen Austrian beat the hell out of Deep Blue with things like nullmove, something the Deep Blue team had years of time to implement, and they didn't do it.
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: Elo versus speed

Post by rbarreira »

diep wrote: Winning 2 ply while getting a factor 200 faster is not very good, to put it politely.
How do you know this is an apples-to-apples comparison? Maybe their later version, which got 2 more plies, also did a ton more extensions, which would mean it's really getting much more than 2 real plies over the old version...

Either you have some additional information here, or you are assuming too much just from reading log files.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Elo versus speed

Post by diep »

rbarreira wrote:
diep wrote: Winning 2 ply while getting a factor 200 faster is not very good, to put it politely.
How do you know this is an apples-to-apples comparison? Maybe their later version, which got 2 more plies, also did a ton more extensions, which would mean it's really getting much more than 2 real plies over the old version...

Either you have some additional information here, or you are assuming too much just from reading log files.
Actually their 1988 box already did a ton of extensions. They solved the Win at Chess mate in 18 at ply 8 already. See the publications.

It doesn't matter how much you extend. 10 ply is 10 ply. In Diep I'm also doing a ton of extensions, probably more than you do. In the 90s I did way more extensions than today, in fact. The 1994 version, which got 6 ply or so, in fact did singular extensions without reductions. I had invented singular extensions myself; it's a trivial thing to invent. I had just started computer chess programming...

It's not trivial to implement efficiently, though.
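
For reference, the now-common "exclusion search" formulation of a singular-extension test looks roughly like this. A sketch only: the margins and conditions are made up, and this is certainly not how Diep 1994 or Deep Thought implemented it. The fragment sits inside the search function; ttMove/ttScore/ttDepth are assumed to come from a transposition-table probe:

Code: Select all

// Minimal sketch of a singular-extension test (modern exclusion-search
// form). Margins, depth conditions and helper names are illustrative.
int extension = 0;
if (depth >= 8 && ttMove != MOVE_NONE && ttDepth >= depth - 3) {
    int sBeta = ttScore - 2 * depth;          // margin grows with depth
    // Reduced-depth search of all moves EXCEPT the hash move.
    int s = search(b, sBeta - 1, sBeta, depth / 2, /*excluded=*/ttMove);
    if (s < sBeta)
        extension = 1;  // nothing else comes close: the move is "singular"
}
// ... then search ttMove at depth - 1 + extension ...
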

One of the main problems with extensions is that they don't help much for the mainline.

Usually the mainline is the best move for White and for Black. In those crucial first moves of the game there aren't many forced moves in what Black and White play. A few piece exchanges at most, maybe.

It's best to put your king on a safe spot, both for Black and for White. Therefore no checks in the mainline. If I can give a ton of checks to your king, you probably played into a bad line, didn't you?

That's 95% of games, sir.

World champions like Karpov take that even further. In his mind, if there are a lot of tactics, then one or both sides have played some really bad moves...

All those extensions just extended tactics. Checks. Captures. Passed pawns.

8 ply is 8 ply.
10 ply is 10 ply.
This is why the current deep searches, even though they are done in an extremely dubious manner, are adding Elo to the programs: they do have a chance to see things deeper. With an iteration depth of 10 ply you stand no chance there.

Chrilly Donninger: "I don't see how they could do 4 ply in hardware; I'm doing 2 ply in hardware and that is already very inefficient, and in fact I'm using killer moves, which Deep Blue didn't even use".

They must have been massively inefficient in the hardware. Just searching in hardware without any ordering mechanism other than doing captures first. So the captures themselves are also unsorted. Nothing is sorted.

That is one of the explanations if you ask me.

Another is that they were hopelessly inefficient in using the hardware processors. Simply losing a factor 9 to scaling. Really too much.

They didn't use recursive nullmove, which was quite well known by then.
If you use nullmove your nps drops of course, and by a lot with 480 hardware processors.

If you already scale badly because you don't know how to get a parallel search going, then adding nullmove hurts even more.
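
For completeness, this is how small recursive nullmove is. A sketch with the usual R = 3 and hypothetical helpers (Board, qsearch, makeNullMove/unmakeNullMove):

Code: Select all

// Minimal sketch of recursive nullmove pruning (R = 3 is typical).
int search(Board &b, int alpha, int beta, int depth) {
    if (depth <= 0)
        return qsearch(b, alpha, beta);

    const int R = 3;
    if (!b.inCheck() && !b.lastMoveWasNull() && depth > R) {
        b.makeNullMove();                     // give the opponent a free move
        int score = -search(b, -beta, -beta + 1, depth - 1 - R);
        b.unmakeNullMove();
        if (score >= beta)
            return score;                     // even a free move fails high
    }
    // ... normal move loop follows ...
    return alpha;                             // placeholder for the move loop
}
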

They did everything to get more nps, and nearly nothing, not even killer moves, to search more efficiently.

They got 10 ply all right. Sure, a finished 10 ply. But it's peanuts. They won 2 ply in 10 years' time. By all standards that's total beginner's work.
Uri Blass
Posts: 10893
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Elo versus speed

Post by Uri Blass »

I disagree: 10 plies with many extensions are not the same as 10 plies without them.

It may well be that 10 plies with all their additional extensions are equivalent to 14 plies without them.
I do not claim that the additional extensions of Deep Blue were a good idea, and I guess it was possible to get better playing strength without them, but I totally agree with Ricardo Barreira that they probably got much more than 2 real plies.

Note that, as I understand it, they could not test their extensions in games because the Deep Blue hardware was not ready before the match.
IanO
Posts: 499
Joined: Wed Mar 08, 2006 9:45 pm
Location: Portland, OR

Re: Elo versus speed

Post by IanO »

petero2 wrote: Thanks for all responses so far.

From latest CCRL 40/4: http://computerchess.org.uk/ccrl/404.live/

Code: Select all

Rank Name                        Rating                Score    Average Opponent    Draws    Games
1    Houdini 2.0c 64-bit 6CPU    3407    +11    -11    68.8%    -135.3              37.8%    2826
34   Texel 1.01 64-bit           2907    +31    -30    57.3%    -50.7               27.6%    370
80   CuckooChess 1.12 64-bit     2677    +15    -15    52.6%    -20.0               25.9%    1584
Remarkably, this is almost exactly the rating improvement obtained by Critter, when it changed from 32-bit Delphi in Critter 0.42 to 64-bit C++ in Critter 0.52.

Code: Select all

     Critter 0.52 64-bit         2897    +18    -18    59.0%    -63.8     32.3%    1115
     Critter 0.40                2674    +18    -18    55.6%    -63.2     27.6%    1131
(I noticed this because I've been researching the state-of-the-art engines in languages other than C/C++. Congratulations on having the strongest Java engine!)
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Elo versus speed

Post by diep »

Uri Blass wrote: I disagree: 10 plies with many extensions are not the same as 10 plies without them.

It may well be that 10 plies with all their additional extensions are equivalent to 14 plies without them.
I do not claim that the additional extensions of Deep Blue were a good idea, and I guess it was possible to get better playing strength without them, but I totally agree with Ricardo Barreira that they probably got much more than 2 real plies.

Note that, as I understand it, they could not test their extensions in games because the Deep Blue hardware was not ready before the match.
You still don't understand why they try to search that deep nowadays, do you?

As for Deep Blue, they ran the year before with about half the number of hardware CPUs, so they did have plenty of experience running that box, and the smaller-sized thing still carried hundreds of hardware CPUs. So there is no question they had enough testing time with hundreds of CPUs.

Knowing they lost a factor 9 somewhere during the match, while claiming it got them more nps than the smaller thing, even today it doesn't make sense to me to lose a factor 9 right away.

They had years to fix this scaling problem.
They had years to add nullmove and experiment with it.

Algorithmically they weren't beginners, as they invented some things, but for example Hsu wrote in his PhD thesis the opposite of what his chess computer was doing.

In his thesis the focus is on search depth, while in reality Deep Blue neglected about everything that could have made it search deeper, and in 10 years' time they managed to get 2 ply deeper.

You don't hear me criticize their evaluation function. I'm under the impression they really knew what they were doing there, taking over the Gnuchess evaluation and tuning it well. The passive play of Deep Blue, giving opponents every chance to destroy it, was very common in the mid 90s.

Fritz before 1997, for example, had the habit of covering the b2 pawn not with Ra1-b1 but with Ra1-a2, putting its rook on a more passive square.

Also, I do know that one always has limited time to fix things in a program, even when that time is years. You can't improve everything at the same time.

For example, nullmove is so easy to experiment with. It's kindergarten-level to add. They didn't do it.

Now you post nonsense similar to what we heard in the 90s. Nullmove was supposedly dubious, a BS story; I designed double nullmove to prove the opposite. Now you tell me that extensions can make up for a bunch of plies. With that you also contradict several world chess champions, who share the observation of a titled player like me: in the majority of mainlines where you need to make choices, tactics plays a minor role, not a major role; so it can never make up many plies.
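
For reference, double nullmove differs from the standard recursive form sketched earlier by exactly one condition: a second consecutive null move is allowed, a third is not. A minimal sketch with hypothetical helpers, not Diep's actual code:

Code: Select all

// Double nullmove (sketch): permit two null moves in a row, never three.
// After two consecutive nulls the same side is to move in the same
// position at depth reduced by roughly 2*(R+1), so the search silently
// verifies the cutoff full-width and zugzwang failures are detected.
const int R = 3;
if (!b.inCheck() && b.consecutiveNullMoves() < 2) {   // the only change
    b.makeNullMove();
    int score = -search(b, -beta, -beta + 1, depth - 1 - R);
    b.unmakeNullMove();
    if (score >= beta)
        return score;
}
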

The simplest way to refute you, however, is that if you prune a lot, be it with nullmove or with reductions, then you can afford to extend more; whereas for Deep Blue any extension it did was a big problem.

We do know, however, that their hardware was already so inefficient that they could not afford many extensions in hardware, so that also disproves you. Only at the first ply of the hardware search did they have an extension, again something that only helps for tactics, not for positional play at all.

Now I'm not saying nothing happened there: it seems, just from counting material, that it had the Gnuchess evaluation in the search, and programming that into a chip using just logic blocks is not easy. It's not as easy as, for example, Verilog.

The real problem with Deep Blue is that ZarkovX on an RS/6000 would have played at exactly the same strength and searched the same depth as Deep Blue.

Where most supercomputer programs always won big search depth in the software search and just lost it on book+eval, Deep Blue didn't even get the depth one would expect such a formidable box to get.

They had a lot of reasons to scrap it.
Uri Blass
Posts: 10893
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Elo versus speed

Post by Uri Blass »

I do not claim that their search was optimal, and of course I believe that not using null move was a mistake on their part, but it seems they simply did not believe in it.

My point is only that their plies are not the same as your plies, and comparing plies is comparing apples with oranges.

If they add extensions, then the question is how much the extensions help at fixed depth.

If Deep Blue at fixed depth 10 (with the additional extensions they added) can win against the earlier Deep Blue at fixed depth 11 (without the additional extensions), then their additional extensions are worth more than one ply.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Elo versus speed

Post by diep »

Uri Blass wrote: I do not claim that their search was optimal, and of course I believe that not using null move was a mistake on their part, but it seems they simply did not believe in it.

My point is only that their plies are not the same as your plies, and comparing plies is comparing apples with oranges.

If they add extensions, then the question is how much the extensions help at fixed depth.

If Deep Blue at fixed depth 10 (with the additional extensions they added) can win against the earlier Deep Blue at fixed depth 11 (without the additional extensions), then their additional extensions are worth more than one ply.
You are dead wrong. 10 ply is 10 ply. Tactical extensions work for tactics but do not improve positional playing strength.

The proof of that has been delivered empirically often enough by today's engines, which see basically nothing tactically in the last few plies.

Their 1996 Deep Blue used a different evaluation function than the 1997 one, so that alone is already enough to beat the old thing; self-tests are not a valid comparison anyway.

It's obvious they used something similar to the Gnuchess evaluation in their hardware for the 1997 thing. That was cheap to do and that's what they did. I'm not sure whether it was 100% legal, just like using Fruit's eval in a commercial entity is very dubious. But that's a historical discussion for another thread.

In my PowerPoint presentation I show how Diep behaves in the same position when I turn off all forward pruning / reductions / nullmove, and how it behaves when just adding nullmove.

We then see how, both full-width and with only nullmove, it totally outsearches Deep Blue. Those search depths are very well comparable.

Back in the 90s I did the same test and had the same outcome, namely that Deep Blue was horribly inefficient. I posted it on RGCC back then and said that with its claimed nps (back then it was 200+ million nps; in a later publication in 2001 they corrected it to 133M nps on average), Deep Blue equipped with nullmove could easily have searched 18-20 ply.

By equipping Diep with just nullmove and no other form of pruning (hashtables of course, but that's transposition pruning), Diep reaches 20+ ply if we extrapolate its nps to 200 million.
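
The extrapolation itself is simple: with a constant effective branching factor b, a factor s more speed buys log(s)/log(b) extra iterations. The numbers below are illustrative only, not Diep's actual figures:

Code: Select all

// Extra iterations bought by a speedup, assuming a constant effective
// branching factor (EBF). All numbers illustrative.
#include <cmath>
#include <cstdio>

int main() {
    double ebf     = 3.0;    // roughly what plain recursive nullmove gives
    double speedup = 200.0;  // e.g. extrapolating 1M nps to 200M nps
    printf("extra plies: %.1f\n", std::log(speedup) / std::log(ebf));
    // prints ~4.8; with an EBF of 2 it would be ~7.6 extra plies
    return 0;
}
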

Therefore I conclude that I have thereby proven my claim from the 90s.
Uri Blass
Posts: 10893
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Elo versus speed

Post by Uri Blass »

diep wrote:
Uri Blass wrote: I do not claim that their search was optimal, and of course I believe that not using null move was a mistake on their part, but it seems they simply did not believe in it.

My point is only that their plies are not the same as your plies, and comparing plies is comparing apples with oranges.

If they add extensions, then the question is how much the extensions help at fixed depth.

If Deep Blue at fixed depth 10 (with the additional extensions they added) can win against the earlier Deep Blue at fixed depth 11 (without the additional extensions), then their additional extensions are worth more than one ply.
You are dead wrong. 10 ply is 10 ply. Tactical extensions work for tactics but do not improve positional playing strength.

The proof of that has been delivered empirically often enough by today's engines, which see basically nothing tactically in the last few plies.
It is not a proof.

The algorithms of today's engines are better because the real test is at fixed time, not at fixed depth.

It is still possible that the additional extensions practically give 2 plies advantage at fixed depth but it is faster to search 3 plies deeper without the additional extensions so the additional extensions are not a good idea for playing strength.