Elo versus speed

diep · Post by **diep** » Tue Apr 03, 2012 6:04 pm

Rebel wrote:
Daniel Shawul wrote: It would also be interesting to compare ratings of parallel speedups for 1, 2 and 4 processors. Incidentally this seems to roughly match your tests if you assume speedups of 1, 1.8 and 3. That is the generally expected efficiency for a YBW implementation. The increase in elo is much lower, as expected. For example, for Stockfish 2.2.1 I see 2952, 2997 and 3011. So with this the +70 elo per doubling estimate looks a rather good one, but this of course depends on the efficiency of the parallel implementation, as I already mentioned.
Perhaps the hash table size plays a role in this? The branching factor goes up when the hash table becomes full. What happens if you triple the hash table size for a quad?

At the 2003 world champs I experimented with a 400MB hashtable for 500 processors versus a 200GB hashtable.

After 10 hours of search overnight, the total searchdepth difference between those 2 searches was exactly 1 ply, which is very little.

It's a matter of a good replacement strategy.

So I 'blew' 2 nights of supercomputer search on something that in the end hardly made a difference.

Do not forget that old engines like Cilkchess, if I remember well, did just 1 probe. Obviously THAT makes a big difference compared with a more mature replacement strategy.

That said - we agree that no good science has been conducted here; the experiment here of elo vs speed has flaws everywhere.

jacobbl · Post by **jacobbl** » Fri Apr 06, 2012 9:05 pm

I've just done some testing on my engine Sjakk, to see how much it would improve with a 4x speedup. I normally test against 10 engines, 400 games each, for a total of 4000 games. Time control is 40 moves in 1 minute. I use Arena for testing, and set up a new tournament with time control 4 minutes, and put the strength of all the other engines at 25%. This I believe should be equivalent to a 4x speedup on my engine. The results gave an elo increase of 280 points (ranging from 195 to 511, stdev 98). It should be noted that the opponents on average are quite a lot weaker than Sjakk. With equal time Sjakk scores about 72% and with the 4x advantage it scores about 92%.

So my results also confirm that the benefit is much greater than 50-70 elo for a doubling of speed. Of course my engine doesn't search anywhere close to as deep as the best engines, so I might still have some distance to go before the gains get smaller.

Regards
Jacob
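
As a rough cross-check of the figures in the post above, the two scores can be converted to rating differences with the standard logistic Elo formula. This is only a sketch: the helper name `elo_diff` is made up here, and it uses the pooled 72% and 92% scores rather than the per-opponent results behind the reported 280, so the numbers will differ somewhat.

```python
import math

def elo_diff(score):
    """Elo difference implied by an expected score (0 < score < 1), logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))

equal_time = elo_diff(0.72)          # Sjakk vs. opponents at equal time
four_x = elo_diff(0.92)              # Sjakk with the 4x time advantage
gain = four_x - equal_time
per_doubling = gain / math.log2(4)   # 4x is two doublings

print(f"equal time: {equal_time:.0f} elo, 4x time: {four_x:.0f} elo")
print(f"gain: {gain:.0f} elo, about {per_doubling:.0f} elo per doubling")
```

With the pooled scores this comes out at roughly 260 elo for the 4x advantage, or about 130 per doubling, in the same ballpark as the 280 reported.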

diep · Post by **diep** » Sat Apr 07, 2012 12:09 am

jacobbl wrote: I've just done some testing on my engine Sjakk, to see how much it would improve with a 4x speedup. I normally test against 10 engines, 400 games each, for a total of 4000 games. Time control is 40 moves in 1 minute. I use Arena for testing, and set up a new tournament with time control 4 minutes, and put the strength of all the other engines at 25%. This I believe should be equivalent to a 4x speedup on my engine. The results gave an elo increase of 280 points (ranging from 195 to 511, stdev 98). It should be noted that the opponents on average are quite a lot weaker than Sjakk. With equal time Sjakk scores about 72% and with the 4x advantage it scores about 92%.

So my results also confirm that the benefit is much greater than 50-70 elo for a doubling of speed. Of course my engine doesn't search anywhere close to as deep as the best engines, so I might still have some distance to go before the gains get smaller.

Regards
Jacob

You tested it at too fast time controls.
Unclear to me also is what 'strength at 25%' means.
Did you give them 1 core instead of 4, or did it add 25% randomness to the search, or did it linearly scale down the Elo rating by a factor of 4?

Assuming you don't have a Rybka type clone built over there, as I see Sjakk on the rating lists at 24xx, you typically will initially profit a lot from searching more accurately, as it hammers away all kinds of worst cases.

What you can do is this: write down the elo increase and start testing with more and more time for your engine.

And *then* plot the rating gains in a graph. You soon will end up at 3 minutes a move or so. When you plot your engine's rating gain at such time controls in a graph you'll soon see the problem.

All the tests and science done over here in this CCC are just 1 minute type stories. It's nonsense science to a large extent.

You're testing probably at 1 core as well at some oldie type hardware i guess. Maybe even a P4 who knows. If it's a core2 at 2.4ghz, you give it like 2 seconds a move.

Expressed in GHz-minutes a move, that's 2.4GHz * 2 seconds / 60 = 4.8 / 60 = 0.08, about equal to 0 GHz-minutes a move (I always work in integers).

World champs 1999 we were playing at 6Ghz minute a move. Back then i played at Bob's quad socket Xeon Box using a remote connection (which is another story in itself back in 1999 but that's for another time).

And you're amazed to see big elowin now when moving up from 0.xGhz minutes a move to 4 * 0.x Ghz == 0.y Ghz minute a move?

You're still testing somewhere near the early 90s in terms of GHz-minutes a move.
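
The GHz-minutes arithmetic above can be written out as a small sketch. The function name `ghz_minutes_per_move` is hypothetical, and the 2.4GHz / 2 seconds figures are the guesses from the post, not measured values.

```python
def ghz_minutes_per_move(clock_ghz, seconds_per_move):
    """Compute budget per move in 'GHz-minutes': clock speed times minutes spent."""
    return clock_ghz * seconds_per_move / 60.0

base = ghz_minutes_per_move(2.4, 2)   # the Core2 guess from the post
boosted = 4 * base                    # the same machine with 4x the time
wc_1999 = 6.0                         # budget quoted for the 1999 world champs

print(f"base: {base:.2f} GHz-min/move, 4x: {boosted:.2f} GHz-min/move")
print(f"1999 budget / 4x test budget: {wc_1999 / boosted:.1f}")
```

Under these assumptions the test runs at about 0.08 GHz-minutes a move, and even the 4x setting (about 0.32) stays well over an order of magnitude below the 6 GHz-minutes quoted for 1999.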

Ed Schroeder was world champion back then by means of massive forward pruning and a superior evaluation function. The pruning was way more than any of today's engines is doing.

Ed, didn't you beat deep thought as well back then? You outsearched them by 3 ply isn't it?

Vincent

p.s. Muppet show about to restart gotta hurry back to my seat

jacobbl · Post by **jacobbl** » Sat Apr 07, 2012 2:15 am

I'm sorry if I was unclear. Strength 25% means having 25% of the time. So in a 4 min match, Sjakk will have 4 min for 40 moves, and the opponents will have 1 min.

All engines are running on 1 core, mainly because engines at the 2400 level don't support more than 1 core.

My computer is getting old (1.5 years), but it is still an i7-980 so it should do for now. Of course I should test at longer time limits, but it's a trade-off against how many games one gets. And 4 min per 40 moves is not extremely fast. So I still believe (and hope) that Sjakk will earn more than 70 elo on a doubling of speed.

Regards Jacob

diep · Post by **diep** » Sat Apr 07, 2012 11:03 am

jacobbl wrote: I'm sorry if I was unclear. Strength 25% means having 25% of the time. So in a 4 min match, Sjakk will have 4 min for 40 moves, and the opponents will have 1 min.

All engines are running on 1 core, mainly because engines at the 2400 level don't support more than 1 core.

My computer is getting old (1.5 years), but it is still an i7-980 so it should do for now. Of course I should test at longer time limits, but it's a trade-off against how many games one gets. And 4 min per 40 moves is not extremely fast. So I still believe (and hope) that Sjakk will earn more than 70 elo on a doubling of speed.

Regards Jacob

Wishful thinking is a popular strategy...

petero2 · Post by **petero2** » Sun Apr 08, 2012 11:37 am

Thanks for all responses so far.

From latest CCRL 40/4: http://computerchess.org.uk/ccrl/404.live/

Rank  Name                        Rating  +    -    Score   Average Opponent  Draws   Games
1     Houdini 2.0c 64-bit 6CPU    3407    +11  -11  68.8%   -135.3            37.8%   2826
34    Texel 1.01 64-bit           2907    +31  -30  57.3%   -50.7             27.6%   370
80    CuckooChess 1.12 64-bit     2677    +15  -15  52.6%   -20.0             25.9%   1584

Given the changes I made, http://talkchess.com/forum/viewtopic.ph ... 36&t=42999, I still don't think the measured rating difference between Texel and CuckooChess has been adequately explained.

Some possible partial explanations:

* The elo gain per speed doubling could be much larger for engines 500-700 elo below the top engines.

* Texel gets twice as many hash entries in the same amount of memory because of Java memory overhead that does not exist in C++. Not sure how much that is supposed to affect the strength.

* Some of the search/eval changes I made could be much better at longer time controls than at the hyper bullet self tests I ran. My self tests indicated perhaps 20-30 elo improvement in total, which in the past has translated to about half of that on the CCRL list.
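
To put the gap in perspective, here is a small sketch of how large a pure speed advantage would have to be to account for the whole rating difference in the table. The helper `implied_speedup` and the candidate elo-per-doubling values are assumptions for illustration; the actual speed ratio between the two engines is not given in this post.

```python
texel, cuckoo = 2907, 2677   # CCRL 40/4 ratings from the table above
diff = texel - cuckoo

def implied_speedup(elo_gap, elo_per_doubling):
    """Speed factor needed to explain elo_gap if each doubling is worth elo_per_doubling."""
    return 2 ** (elo_gap / elo_per_doubling)

# Candidate elo-per-doubling values; 70 is the estimate discussed in the thread,
# the others are only there for comparison.
for per_doubling in (50, 70, 100):
    factor = implied_speedup(diff, per_doubling)
    print(f"{per_doubling} elo/doubling -> {factor:.1f}x speed needed for {diff} elo")
```

At 70 elo per doubling the 230 elo gap would need nearly a 10x speed difference, which is why a larger gain per doubling at this rating level is listed among the possible explanations.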

Rebel · Post by **Rebel** » Sun Apr 08, 2012 1:16 pm

diep wrote: Ed, didn't you beat deep thought as well back then? You outsearched them by 3 ply isn't it?

Nah. That was against a crippled Deep Blue Internet version. Great fun but meaningless. And it did not show its depth as far as I can remember.

Happy Easter to all.

diep · Post by **diep** » Sun Apr 08, 2012 4:43 pm

Rebel wrote:
diep wrote: Ed, didn't you beat deep thought as well back then? You outsearched them by 3 ply isn't it?
Nah. That was against a crippled Deep Blue Internet version. Great fun but meaningless. And it did not show its depth as far as I can remember.

Happy Easter to all.

You refer to the one playing without book in 1999. Well obviously it was a bit outdated by then.

I speak of deep THOUGHT. World champs start 90s. Didn't you beat them then in Madrid?

You just outsearched them if i have my math correct, as in 1988 they got 8 ply search depth with 500k nps.

By 1991 they can't have searched much deeper than that, still with the same immature qsearch and simple evaluation, which they must have improved around the end of 1996 or so, upgrading it to something similar to Gnuchess of that time; of all engines, ZarkovX at 10 ply search depth plays basically the same moves as the 1997 Deep Blue.

Note that ZarkovX hardly used nullmove as we know it today, yet it plays nearly all the moves Deep Blue played against Kasparov, including all the big blunders Deep Blue played in every opening and game, which of course would have had Deep Blue lose every game against FMs as well, as it played at 1600 or so in the opening.

By 1995 world champs standards Deep Blue was still OK, but for the 1997 micro world champs, even if it had been allowed to join, Deep Blue would have been totally outdated because of its opening play. Engines still made similar mistakes in 1997, but by 1999 all that was out of most top engines; besides, some got 14-17 ply back then, 4 ply deeper than Deep Blue's worst case. Fritz was systematically at 17 ply, 7 ply deeper than Deep Blue in the opening.

So I'm interested in what search depth they got back in the early 90s.

In Madrid, according to my observation of the ChessMachine Schroeder that some older chess players still have at home and analyse with, Rebel must have been at least on par with Deep Thought, if not outsearching it. Be it of course in a very dubious manner; unlike Fritz by 1995, which didn't outsearch them in a dubious manner, but simply with recursive nullmove.

Weren't you the first to outsearch them Ed?

If we look at it objectively, Deep Blue had the most horrible branching factor ever.

Back in 1988 with 500k nps Deep Thought gets 8 ply, and by 1997 with a later corrected GUESSED 133M nps they get 10 ply worst case in opening at 3 minutes a move.

Over a factor 200 increase in speed winning just 2 ply. Total focus upon nps, whereas in his thesis he still cares about search depths and branching factor.
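
The speed ratio behind the "factor 200" remark can be worked out directly from the two nps figures quoted above. The per-ply growth figure below is only a back-of-the-envelope number under an assumption the post does not state, namely equal time per move in both years; it is not a figure from the thesis or the logfiles.

```python
nps_1988 = 500_000        # Deep Thought, as reported in the post
nps_1997 = 133_000_000    # Deep Blue, the figure quoted in the post
extra_plies = 10 - 8      # 8 ply in 1988 versus 10 ply worst case in 1997

speed_ratio = nps_1997 / nps_1988

# Node growth per extra ply if the speedup alone paid for the extra depth
# (assumes the same time per move in both years, which is an assumption).
growth_per_ply = speed_ratio ** (1 / extra_plies)

print(f"speed ratio: {speed_ratio:.0f}x for {extra_plies} extra plies")
print(f"implied node growth per extra ply: {growth_per_ply:.1f}x")
```

A ratio of 266x for 2 plies means each extra ply cost a double-digit multiple in nodes, which is the point being made about the branching factor.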

Uri Blass · Post by **Uri Blass** » Sun Apr 08, 2012 5:58 pm

It is a fact that Deep Thought could beat GMs (not Kasparov), so it is clear that even Deep Thought was better than FIDE masters.

Claiming that Deep Blue could lose every game against FMs is simply nonsense, and I also do not believe that Zarkov at depth 10 played the same as Deep Blue.

Maybe it played the same move in most cases but even in this case the minority is important and one different move is enough to change the result of the game.

diep · Post by **diep** » Sun Apr 08, 2012 9:58 pm

Uri Blass wrote: It is a fact that Deep Thought could beat GMs (not Kasparov), so it is clear that even Deep Thought was better than FIDE masters.

Claiming that Deep Blue could lose every game against FMs is simply nonsense, and I also do not believe that Zarkov at depth 10 played the same as Deep Blue.

Maybe it played the same move in most cases but even in this case the minority is important and one different move is enough to change the result of the game.

Deep thought 1988 is a reported 500k nps, versus Deep Blue 1997 their own claim in advances in AI is 133 million.

Over factor 200 in speed, just winning 2 plies as we see from the logfiles. We ignore that they also forward pruned in hardware then by the way.

It played ultra passive. So it was easy for players to play against and beat.

Just no one got paid to beat it...

But all payments to players to lose aside, let's face it.

They get faster in 10 years time factor 200+. Now you can say this is great or bad, it doesn't matter. Factor 200 is a lot.

Yet they just win 2 ply worst case.

What do you say about THAT?
What is your qualification of someone who writes his PhD about searching as deep as possible and needs a branching factor of 45+ to get just 2 ply deeper 10 years later?

Around 1989 most COMMERCIAL engines get around 4-7 ply.
In world champs 1999 they get 14-17 ply.

In endgame i got at Bob's machine actually 20+ ply, but that's not interesting for the discussion now.

Another 10+ years later now in 2012, most get 30 ply.

Say roughly a ply a year won.

Deep Thought to Deep Blue needs 5 years to win 1 ply.
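
The "ply a year" comparison can be checked with the depths quoted in this thread. This is a sketch only: taking the midpoint of each quoted range is an assumption made here, and the 1988 and 1997 dates for Deep Thought and Deep Blue come from the figures given earlier in the discussion.

```python
def midpoint(low, high):
    """Middle of a quoted depth range, used as a stand-in for 'typical' depth."""
    return (low + high) / 2

# Commercial engines: year -> typical search depth in plies
depth_1989 = midpoint(4, 7)
depth_1999 = midpoint(14, 17)
depth_2012 = 30

plies_per_year_90s = (depth_1999 - depth_1989) / (1999 - 1989)
plies_per_year_00s = (depth_2012 - depth_1999) / (2012 - 1999)

# Deep Thought (8 ply, 1988) to Deep Blue (10 ply, 1997)
years_per_ply_db = (1997 - 1988) / (10 - 8)

print(f"commercial engines: {plies_per_year_90s:.1f} and {plies_per_year_00s:.1f} plies/year")
print(f"Deep Thought -> Deep Blue: {years_per_ply_db:.1f} years per ply")
```

With these inputs the commercial engines gain about one ply a year in both periods, while Deep Thought to Deep Blue works out to 4.5 years per ply, which matches the rough "5 years to win 1 ply".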

What do you think of THAT achievement?

Vincent

p.s. ZarkovX needed an hour a move to get 10 ply, so it is not a relevant discussion; these engines all died from stupid extensions that hardly brought any elo, while not using nullmove. ZarkovX and Deep Blue however play the same move at the same depth in the majority of cases. Like a 95% match or so.

ZarkovX is, as you know, the further development of Gnuchess - not much had changed in its evaluation from Gnuchess to ZarkovX - John Stanback was the author of the original Gnuchess, that's why.
