swami wrote:
Tord Romstad wrote:
I just started a 100-game match against Phalanx. The first game was a draw, and Phalanx has an advantage in the endgame in the second. I played another match against Phalanx a few months back, and at that time, Phalanx was crushingly superior (I don't remember the exact score). It will probably be closer this time, but I still expect Phalanx to win comfortably.
Results tomorrow.
I'm looking forward to the results.
OK, here they are:
Stockfish 090804: 25.5 (+18,=15,-67)
phalanx: 74.5 (+67,=15,-18)
More or less as I expected. Stockfish simply gets terribly outsearched. It often holds its own into the late endgame, but then falls apart tactically in time trouble.
It is likely that Stockfish's results could have been slightly better with some minor timing adjustments. I noticed that 11 of the games were lost on time, at least one of them in an easily won endgame. Some of the 11 time losses could probably have been avoided by checking for timeout a little more often, but Phalanx would still have won by a huge amount.
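To illustrate why polling the clock more often can prevent time losses, here is a minimal sketch in Python. It is not Stockfish's actual code: the `NodeCounter` class, the `check_interval` parameter, and the simulated cost of 1 ms per node are all assumptions made for the example. The search only reads the clock every `check_interval` nodes, so a large interval can overshoot the time budget by a wide margin on slow hardware.

```python
import time


class TimeUp(Exception):
    """Raised when the time budget for the current move is exhausted."""


class NodeCounter:
    """Hypothetical helper: counts nodes and polls the clock every N nodes."""

    def __init__(self, budget, check_interval=1024, clock=time.monotonic):
        # `budget` must be in the same units that `clock` returns.
        self.clock = clock
        self.deadline = clock() + budget
        self.check_interval = check_interval
        self.nodes = 0

    def visit(self):
        self.nodes += 1
        # Reading the clock has a cost, so only poll every check_interval nodes.
        if self.nodes % self.check_interval == 0 and self.clock() >= self.deadline:
            raise TimeUp


def nodes_before_abort(check_interval, budget_ms=1500):
    """Simulate a search where each node costs exactly 1 ms (an assumption)."""
    counter = None

    def fake_clock():
        # Elapsed milliseconds equal the number of nodes visited so far.
        return 0 if counter is None else counter.nodes

    counter = NodeCounter(budget_ms, check_interval, clock=fake_clock)
    try:
        while True:
            counter.visit()
    except TimeUp:
        return counter.nodes


for interval in (1024, 64):
    used = nodes_before_abort(interval)
    print(f"interval {interval}: stopped after {used} ms, overshoot {used - 1500} ms")
```

In this simulation the coarse interval stops 548 ms past a 1500 ms budget, while the finer one stops 36 ms past it. The trade-off is that polling more often spends a little more time reading the clock.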
This result indicates that Stockfish on the 1st generation iPod Touch (with a 416 MHz ARM CPU) has a strength of about 2300 on the CCRL scale.
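As a rough check on how a match score translates into a rating gap, the following Python sketch applies the standard logistic Elo formula to the result above. The function name `elo_diff_from_score` is made up for this example, and the calculation ignores the error margins that a 100-game sample carries.

```python
import math


def elo_diff_from_score(points: float, games: int) -> float:
    """Rating difference implied by a match score under the logistic Elo model."""
    p = points / games
    if not 0.0 < p < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    # Inverse of the expected-score formula: p = 1 / (1 + 10 ** (-diff / 400))
    return -400.0 * math.log10(1.0 / p - 1.0)


# Stockfish's result from the match above: +18 =15 -67
wins, draws, losses = 18, 15, 67
games = wins + draws + losses
points = wins + 0.5 * draws
diff = elo_diff_from_score(points, games)
print(f"{points}/{games} -> {diff:+.0f} Elo relative to the opponent")
```

A score of 25.5/100 corresponds to a deficit of roughly 186 Elo against the opponent; adding that difference to the opponent's list rating gives the estimate.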
I think maybe human IMs and GM-norm players are overrated. Maybe 2200-rated engines on today's hardware play at their level. I remember Danasah scoring positive results against IMs/GMs in the 2500 rating range in some tournaments that Pedro reported. I don't remember the actual scores, but I guess it was certainly high for Danasah, which is a 2500-rated engine, btw. So maybe humans are actually at around the level of a 2300 engine on today's PC.
We don't have enough data to be sure, but I suspect that the opposite is true. The CEGT and CCRL lists are around 200 points too high compared to FIDE ratings.