expert understanding of computer chess

duncan · Post by **duncan** » Sun Dec 02, 2012 2:33 pm

Suppose you had Houdini or any top engine play itself at 10 plies, 11 plies, 12 plies, and so on up to 24 plies, with 10 games played at each depth, making a total of 150 games.

A grandmaster looks at all the games (totally mixed up) without computer help and has to work out what ply each game was played at. He gets a point for getting within 2 of the right ply and loses a point if he is more than 2 out. So, for example, if a game was played at 16 ply and he estimates anywhere from 14 to 18, he gets a point; at 13 or 19 he loses a point.

The maximum number of points possible is 150. How would a grandmaster do in such a test? 150? 0? -50? -100?
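The scoring rule above can be written down in a few lines. This is only a sketch of the rule as described in this post; the function names `score_guess` and `total_score` are made up for illustration.

```python
def score_guess(true_ply: int, guessed_ply: int, tolerance: int = 2) -> int:
    """Return +1 if the guess is within `tolerance` plies of the true depth, else -1."""
    return 1 if abs(true_ply - guessed_ply) <= tolerance else -1


def total_score(true_plies, guessed_plies, tolerance: int = 2) -> int:
    """Sum the per-game scores over all games."""
    return sum(
        score_guess(t, g, tolerance) for t, g in zip(true_plies, guessed_plies)
    )


# 15 depths (10 to 24 inclusive) with 10 games each gives the 150 games.
true_plies = [ply for ply in range(10, 25) for _ in range(10)]

# A perfect set of guesses reaches the maximum score.
perfect = total_score(true_plies, true_plies)
```

With a game played at 16 ply, `score_guess(16, 14)` and `score_guess(16, 18)` both give +1, while `score_guess(16, 13)` and `score_guess(16, 19)` give -1.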

duncan

Ajedrecista · Post by **Ajedrecista** » Sun Dec 02, 2012 2:58 pm

Hello:

duncan wrote: Suppose you had Houdini or any top engine play itself at 10 plies, 11 plies, 12 plies, and so on up to 24 plies, with 10 games played at each depth, making a total of 150 games.

A grandmaster looks at all the games (totally mixed up) without computer help and has to work out what ply each game was played at. He gets a point for getting within 2 of the right ply and loses a point if he is more than 2 out. So, for example, if a game was played at 16 ply and he estimates anywhere from 14 to 18, he gets a point; at 13 or 19 he loses a point.

The maximum number of points possible is 150. How would a grandmaster do in such a test? 150? 0? -50? -100?

duncan

Your proposal is very tricky IMHO because each engine reports depth differently... I mean, search depths are not equivalent between engines. The most extreme case among top engines that comes to my mind right now is Rybka and Stockfish. If you play a fixed-depth match between Rybka and Stockfish, Rybka will destroy SF by a large margin, especially at lower depths; but SF and Rybka have similar ratings (on the same hardware) in several rating lists, if I am not wrong.

If the GM is familiar with top engines and their behaviour and he/she knows which engine is used, then he/she could earn a few points; otherwise I see it as a true lottery.

Another issue I see is what happens at very high depths (for example, depth 40 in SF thanks to great hardware and very long time controls): would anyone say something between 38 and 42? An engine probably cannot play very different moves at depths 30 and 40 (diminishing returns and other things). Very low depths can bring similar problems...

SUMMARIZING: I am totally clueless, but at least I tried to show some drawbacks to your proposal.

Regards from Spain.

Ajedrecista.

duncan · Post by **duncan** » Sun Dec 02, 2012 3:26 pm

Ajedrecista wrote:
Your proposal is very tricky IMHO because each engine reports depth differently... I mean, search depths are not equivalent between engines. The most extreme case among top engines that comes to my mind right now is Rybka and Stockfish. If you play a fixed-depth match between Rybka and Stockfish, Rybka will destroy SF by a large margin, especially at lower depths; but SF and Rybka have similar ratings (on the same hardware) in several rating lists, if I am not wrong.

Because of this issue, I suggested self-play.

Ajedrecista wrote: If the GM is familiar with top engines and their behaviour and he/she knows which engine is used, then he/she could earn a few points; otherwise I see it as a true lottery.

Interesting. So you are saying it is not a lack of grandmaster skill that is the problem here, even at depths 10 to 24.

Ajedrecista · Post by **Ajedrecista** » Sun Dec 02, 2012 4:46 pm

Hello again:

duncan wrote:
Ajedrecista wrote:
Your proposal is very tricky IMHO because each engine reports depth differently... I mean, search depths are not equivalent between engines. The most extreme case among top engines that comes to my mind right now is Rybka and Stockfish. If you play a fixed-depth match between Rybka and Stockfish, Rybka will destroy SF by a large margin, especially at lower depths; but SF and Rybka have similar ratings (on the same hardware) in several rating lists, if I am not wrong.

Because of this issue, I suggested self-play.

Sorry if I did not explain myself clearly enough: I understood that you propose self-play. My point is this: suppose a GM sees a game by a top engine and can judge the strength of the moves played in it. It will still be a lottery to say 'depth X' blindly without knowing which engine was used, because Rybka may play those strong moves at depth 5 while SF does so at depth 10. I think that is the reason why you accept X ± 2.

This is just an example with made-up numbers, so please do not take it as if I had said 'Rybka (depth 5) and SF (depth 10) play moves of similar quality'.

An interesting experiment could be the following: first run your proposal as is (without the GM knowing the engine) and then repeat it with the GM knowing which engine was used (with the same criterion of accepting X ± 2). It would be interesting to see whether the GM benefits from the extra information, that is, the name of the engine. The GM should not know his/her score until both trials are finished.

I hope you understand my point now; otherwise I am unable to explain it better. Try to convince a GM to do your experiment!

Good luck.

Regards from Spain.

Ajedrecista.

duncan · Post by **duncan** » Sun Dec 02, 2012 11:22 pm

Ajedrecista wrote: Sorry if I did not explain myself clearly enough: I understood that you propose self-play. My point is this: suppose a GM sees a game by a top engine and can judge the strength of the moves played in it. It will still be a lottery to say 'depth X' blindly without knowing which engine was used, because Rybka may play those strong moves at depth 5 while SF does so at depth 10. I think that is the reason why you accept X ± 2.

I see your point.

But (1) he does not have to get the ply number directly; he could sort the games into 15 categories of strength and then infer the ply number from the category number.

Or (2) what about games played at 1 sec/move, 2 sec/move, 4 sec, 8 sec, ... 512 sec?

The GM gets a point if he gets within 25% of the time, above or below. How do you think he would do?
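A small sketch of the time-based variant, assuming "within 25%" is measured relative to the true time per move (the post does not say which value the percentage refers to). The names `within_fraction` and `near_misses` are hypothetical.

```python
def within_fraction(true_time: float, guessed_time: float, fraction: float = 0.25) -> bool:
    """True if the guess lies within `fraction` of the true time, above or below."""
    return abs(guessed_time - true_time) <= fraction * true_time


# The ten time controls from the post: 1, 2, 4, ..., 512 seconds per move.
levels = [2 ** k for k in range(10)]

# Pairs of *different* time controls that would still count as a hit.
near_misses = [
    (t, g) for t in levels for g in levels if t != g and within_fraction(t, g)
]
```

Because neighbouring time controls differ by a factor of two, `near_misses` comes out empty: if the GM may only answer with one of the ten time controls used, only an exact hit scores. A free-form guess such as 5 sec for a 4 sec/move game would still count.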

duncan

Don · Post by **Don** » Mon Dec 03, 2012 4:17 am

duncan wrote: Suppose you had Houdini or any top engine play itself at 10 plies, 11 plies, 12 plies, and so on up to 24 plies, with 10 games played at each depth, making a total of 150 games.

A grandmaster looks at all the games (totally mixed up) without computer help and has to work out what ply each game was played at. He gets a point for getting within 2 of the right ply and loses a point if he is more than 2 out. So, for example, if a game was played at 16 ply and he estimates anywhere from 14 to 18, he gets a point; at 13 or 19 he loses a point.

The maximum number of points possible is 150. How would a grandmaster do in such a test? 150? 0? -50? -100?

duncan

That is a good question. A score of zero would imply he is right as often as he is wrong, and I think a GM would score negative. That is not necessarily bad - a random guess would be very negative, so I'm saying he would do better than random but not that much better.

There are probably better scoring functions that would be a little friendlier to the GM, such as decreasing the score the further off he is. I think a scoring function that returned zero (or 50%) for a random guess would be good, as it makes it clearer whether the answer is better than a random guess.
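How negative a random guess would be can be worked out exactly. The sketch below assumes the original rule (+1 within 2 plies, -1 otherwise) and a guesser who picks each of the 15 depths with equal probability; `expected_random_score` is a made-up name.

```python
from fractions import Fraction


def expected_random_score(plies, games_per_ply: int = 10, tolerance: int = 2) -> Fraction:
    """Expected total score when every guess is drawn uniformly from `plies`."""
    plies = list(plies)
    # Count (true, guess) combinations that land within the tolerance.
    hits = sum(1 for t in plies for g in plies if abs(t - g) <= tolerance)
    p_hit = Fraction(hits, len(plies) ** 2)
    # Each game contributes +1 with probability p_hit and -1 otherwise.
    per_game = 2 * p_hit - 1
    return per_game * len(plies) * games_per_ply


baseline = expected_random_score(range(10, 25))
```

For depths 10 to 24, 69 of the 225 (true, guess) combinations are hits, so `baseline` is -58 out of a possible 150. Always answering 17 would score -50 under the same rule, so a scoring function centred on zero for random play would indeed be easier to read.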

IGarcia · Post by **IGarcia** » Mon Dec 03, 2012 12:36 pm

Don wrote:
duncan wrote: Suppose you had Houdini or any top engine play itself at 10 plies, 11 plies, 12 plies, and so on up to 24 plies, with 10 games played at each depth, making a total of 150 games.

A grandmaster looks at all the games (totally mixed up) without computer help and has to work out what ply each game was played at. He gets a point for getting within 2 of the right ply and loses a point if he is more than 2 out. So, for example, if a game was played at 16 ply and he estimates anywhere from 14 to 18, he gets a point; at 13 or 19 he loses a point.

The maximum number of points possible is 150. How would a grandmaster do in such a test? 150? 0? -50? -100?

duncan

That is a good question. A score of zero would imply he is right as often as he is wrong, and I think a GM would score negative. That is not necessarily bad - a random guess would be very negative, so I'm saying he would do better than random but not that much better.

There are probably better scoring functions that would be a little friendlier to the GM, such as decreasing the score the further off he is. I think a scoring function that returned zero (or 50%) for a random guess would be good, as it makes it clearer whether the answer is better than a random guess.

What are you looking for?
What would the conclusions of this test be?

Don · Post by **Don** » Mon Dec 03, 2012 1:57 pm

IGarcia wrote:
Don wrote:
duncan wrote: Suppose you had Houdini or any top engine play itself at 10 plies, 11 plies, 12 plies, and so on up to 24 plies, with 10 games played at each depth, making a total of 150 games.

A grandmaster looks at all the games (totally mixed up) without computer help and has to work out what ply each game was played at. He gets a point for getting within 2 of the right ply and loses a point if he is more than 2 out. So, for example, if a game was played at 16 ply and he estimates anywhere from 14 to 18, he gets a point; at 13 or 19 he loses a point.

The maximum number of points possible is 150. How would a grandmaster do in such a test? 150? 0? -50? -100?

duncan

That is a good question. A score of zero would imply he is right as often as he is wrong, and I think a GM would score negative. That is not necessarily bad - a random guess would be very negative, so I'm saying he would do better than random but not that much better.

There are probably better scoring functions that would be a little friendlier to the GM, such as decreasing the score the further off he is. I think a scoring function that returned zero (or 50%) for a random guess would be good, as it makes it clearer whether the answer is better than a random guess.

What are you looking for?
What would the conclusions of this test be?

This isn't my question, but I still find it interesting because in the past I have heard people say they didn't need to test; they only needed to see a couple of games and they could tell by looking at how the computer played whether it was better.

I thought of a scoring function. The GM assigns each depth number to 10 of the games. Using that data you can sample any 2 games (I suggest 2 not of the same level) and determine which game of the 2 the GM would consider better played. So your scoring function is the percentage he gets correct, and 50% would be no better than random. I think there are 57,500 combinations of two games you can compare, not including games of the same level, assuming I calculated that correctly. It is 250 * 10 * 23.
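The pairwise idea can be sketched as follows. This is one possible reading of it, not a definitive version: `pairwise_accuracy` is a hypothetical name, and counting a tied pair (two games the GM put at the same depth) as half a point is an added assumption that keeps a constant guess at exactly 50%.

```python
from itertools import combinations


def pairwise_accuracy(true_plies, guessed_plies):
    """Fraction of cross-level game pairs that the guesses order correctly.

    Returns (accuracy, number_of_pairs_compared).
    """
    correct = 0.0
    total = 0
    games = list(zip(true_plies, guessed_plies))
    for (t1, g1), (t2, g2) in combinations(games, 2):
        if t1 == t2:
            continue  # skip pairs played at the same level
        total += 1
        if g1 == g2:
            correct += 0.5  # assumption: a tie counts as half
        elif (g1 < g2) == (t1 < t2):
            correct += 1.0
    return correct / total, total


# The 150-game setup from the first post: depths 10 to 24, 10 games each.
true_plies = [ply for ply in range(10, 25) for _ in range(10)]
accuracy, pairs = pairwise_accuracy(true_plies, true_plies)
```

The number of comparable pairs depends on how many games and levels are assumed. For 150 games in 15 levels of 10, there are 11,175 pairs in total, of which 675 are same-level pairs, leaving 10,500 comparisons.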

But ranking 250 games is a lot. I think if you simply used 2 different levels (perhaps 2 ply apart) and played 10 games each, you would get 100 comparisons; the GM simply puts them into 2 buckets, the weak bucket and the strong bucket. Then you can score from 0 to 100, but of course the results would have a lot of statistical error and it would depend on how far apart the levels are.

duncan · Post by **duncan** » Mon Dec 03, 2012 2:24 pm

Don wrote: That is a good question. A score of zero would imply he is right as often as he is wrong, and I think a GM would score negative. That is not necessarily bad - a random guess would be very negative, so I'm saying he would do better than random but not that much better.

There are probably better scoring functions that would be a little friendlier to the GM, such as decreasing the score the further off he is. I think a scoring function that returned zero (or 50%) for a random guess would be good, as it makes it clearer whether the answer is better than a random guess.

What I was trying to find out is whether there is a point in computer chess search beyond which a grandmaster is completely out of his depth and has zero understanding. Could this test be used as evidence one way or the other? If not, what test would you set?

duncan
