expert understanding of computer chess

duncan
Posts: 12038
Joined: Mon Jul 07, 2008 10:50 pm

expert understanding of computer chess

Post by duncan »

If you had Houdini or any top engine play itself at 10 plies, 11 plies, 12 plies, etc., up to 24 plies, with 10 games played at each depth, that would make a total of 150 games.

A grandmaster looks at all the games (totally mixed up) without computer help and has to work out what ply each game was played at. He gets a point for getting within 2 of the right ply and loses a point if he is more than 2 out. So, for example, if a game was played at 16 ply and he estimates anywhere from 14 to 18, he gets a point; at 13 or 19 he loses a point.

The maximum number of points possible is 150. How would a grandmaster do in such a test? 150? 0? -50? -100?

duncan
Ajedrecista
Posts: 1971
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Expert understanding of computer chess.

Post by Ajedrecista »

Hello:
Your proposal is very tricky IMHO because each engine reports depth differently... I mean, search depths are not equivalent between engines. The most extreme case among top engines that comes to my mind right now is Rybka versus Stockfish. If you play a fixed-depth match between Rybka and Stockfish, Rybka will beat SF by a large margin, especially at lower depths; yet SF and Rybka have similar ratings (on the same hardware) in several rating lists, if I am not wrong.

If the GM is familiar with top engines and their behaviour, and he/she knows which engine is used, then he/she could earn a few points; otherwise I see it as a pure lottery.

Another issue I see is what happens at very high depths (for example, depth 40 in SF thanks to great hardware and very long time controls): would anyone say something between 38 and 42? An engine probably cannot play very different moves at depths 30 and 40 (diminishing returns and other things). Very low depths can bring similar problems...

SUMMARIZING: I am totally clueless, but at least I tried to show some drawbacks of your proposal.

Regards from Spain.

Ajedrecista.
duncan
Posts: 12038
Joined: Mon Jul 07, 2008 10:50 pm

Re: Expert understanding of computer chess.

Post by duncan »

Ajedrecista wrote:
Your proposal is very tricky IMHO because each engine reports depth differently... I mean, search depths are not equivalent between engines. The most extreme case among top engines that comes to my mind right now is Rybka versus Stockfish. If you play a fixed-depth match between Rybka and Stockfish, Rybka will beat SF by a large margin, especially at lower depths; yet SF and Rybka have similar ratings (on the same hardware) in several rating lists, if I am not wrong.
Because of this issue, I suggested self-play.
Ajedrecista wrote: If the GM is familiar with top engines and their behaviour, and he/she knows which engine is used, then he/she could earn a few points; otherwise I see it as a pure lottery.
Interesting. So you are saying that it is not a lack of grandmaster skill that is the problem here, even at depths 10 to 24.

Ajedrecista
Posts: 1971
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Expert understanding of computer chess.

Post by Ajedrecista »

Hello again:
duncan wrote:
Because of this issue, I suggested self-play.
Sorry if I did not explain myself clearly enough: I understood that you propose self-play. My point is that even if a GM sees a game by a top engine and can judge the typical strength of the moves played in it, saying 'depth X' blindly, without knowing which engine was used, is a lottery, because Rybka may play those strong moves at depth 5 while SF does so at depth 10. I think that is the reason why you accept X ± 2.

This is just an example with numbers, so please do not take it as if I had said 'Rybka (depth 5) and SF (depth 10) play moves of similar quality'.

An interesting experiment could be the following: first run your proposal as is (without the GM knowing the engine), and then repeat it with the GM knowing which engine was used (with the same criterion of accepting X ± 2). It would be interesting to see whether the GM can benefit from the information given, that is, the name of the engine. The GM should not know his/her score until both tests are finished.

I hope you understand my point now; otherwise I am unable to explain it better. Try to convince a GM to do your experiment! ;) Good luck.

Regards from Spain.

Ajedrecista.
duncan
Posts: 12038
Joined: Mon Jul 07, 2008 10:50 pm

Re: Expert understanding of computer chess.

Post by duncan »

Ajedrecista wrote: Sorry if I did not explain myself clearly enough: I understood that you propose self-play. My point is that even if a GM sees a game by a top engine and can judge the typical strength of the moves played in it, saying 'depth X' blindly, without knowing which engine was used, is a lottery, because Rybka may play those strong moves at depth 5 while SF does so at depth 10. I think that is the reason why you accept X ± 2.

I see your point.

But (1) he does not have to get the ply number directly; he could order the games into 15 categories of strength and then infer the ply number from the category number.

Or (2) what about games played at 1 sec/move, 2 sec/move, 4 sec, 8 sec ... 512 sec?

The GM gets a point if he gets within 25% of the time, above or below. How do you think he would do?
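
To make variant (2) concrete, here is a minimal Python sketch of that scoring rule. It assumes that "within 25%" is measured relative to the true time per move and that a miss costs a point, as in the original ply test; the names `TIMES`, `within_quarter` and `score` are made up for illustration.

```python
# Time controls from the post: 1, 2, 4, ..., 512 seconds per move (10 levels).
TIMES = [2 ** k for k in range(10)]

def within_quarter(true_time, guess):
    """True if the guess is within 25% of the true time, above or below.

    Assumption: the 25% is taken relative to the true time.
    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    return 4 * abs(guess - true_time) <= true_time

def score(true_time, guess):
    """+1 for a guess inside the tolerance, -1 otherwise (assumed penalty)."""
    return 1 if within_quarter(true_time, guess) else -1

# A guess of 10 sec for an 8 sec/move game is inside the tolerance...
print(score(8, 10))
# ...but guessing the neighbouring level (16 sec or 4 sec) is not.
print(score(8, 16), score(8, 4))
```

Because each level doubles the previous one, a GM who answers with one of the ten listed time controls only scores when he picks exactly the right level, so under these assumptions the 25% tolerance is stricter than the ±2 ply window in the original test.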

duncan
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: expert understanding of computer chess

Post by Don »

That is a good question. A score of zero would imply he is right the same number of times he is wrong, and I think a GM would score negative. That is not necessarily bad: a random guess would be very negative, so I'm saying he would do better than random, but not that much better.

There are probably better scoring functions that would be a little friendlier to the GM, such as scores that decrease the further off he is. I think a scoring function that returned zero (or 50%) for a random guess would be good, as it makes it clearer whether the answer is better than a random guess.
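
As a rough check on how negative a random guess would be, the sketch below computes the expected score under the original rules (depths 10 to 24, 10 games per depth, +1 within 2 plies, -1 otherwise). It assumes the guesser picks each depth from 10 to 24 uniformly at random for every game; the function names are illustrative only.

```python
from fractions import Fraction

DEPTHS = range(10, 25)     # 10, 11, ..., 24 plies (15 levels)
GAMES_PER_DEPTH = 10       # 150 games in total
TOLERANCE = 2              # a guess within 2 plies earns a point

def score_guess(true_depth, guess, tolerance=TOLERANCE):
    """+1 if the guess is within the tolerance, -1 otherwise."""
    return 1 if abs(true_depth - guess) <= tolerance else -1

def expected_random_score():
    """Expected total when every guess is uniform over DEPTHS (assumption)."""
    total = Fraction(0)
    for true_depth in DEPTHS:
        per_game = Fraction(
            sum(score_guess(true_depth, g) for g in DEPTHS), len(DEPTHS)
        )
        total += GAMES_PER_DEPTH * per_game
    return total

print(expected_random_score())  # -58
```

Under that assumption the random baseline is -58 out of a possible 150, so a GM scoring around zero would already be well ahead of guessing.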
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
IGarcia
Posts: 543
Joined: Mon Jul 05, 2010 10:27 pm

Re: expert understanding of computer chess

Post by IGarcia »

What are you looking for?
What would the conclusions of this test be?
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: expert understanding of computer chess

Post by Don »

IGarcia wrote:
What are you looking for?
What would the conclusions of this test be?
This isn't my question, but I still find it interesting, because in the past I have heard people say they didn't need to test; they only needed to see a couple of games and could tell by looking at how the computer played whether it was better.

I thought of a scoring function. The GM assigns a depth number to every game, 10 games to each depth. Using that data you can sample any 2 games (I suggest 2 not of the same level) and determine which game of the 2 the GM would consider better played. So your scoring function is the percentage he gets correct, and 50% would be no better than random. I think there are 57,500 combinations of two games you can compare, not including games of the same level, assuming I calculated that correctly. It is 250 * 10 * 23.

But ranking 250 games is a lot. I think if you simply used 2 different levels (perhaps 2 ply apart) and played 10 games at each, you would get 100 comparisons: the GM simply puts the games into 2 buckets, the weak bucket and the strong bucket. Then you can score from 0 to 100, but of course the results would have a lot of statistical error, and it would depend on how far apart the levels are.
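
The pairwise scoring idea can be sketched in a few lines of Python. This is only one possible reading of it: every pair of games from different levels is compared, the GM is "correct" when his depth estimates order the pair the same way as the true depths, and a tie in his estimates is given half credit (an assumption, chosen so that uninformative answers land on 50%). The name `pairwise_accuracy` is hypothetical.

```python
from itertools import combinations

def pairwise_accuracy(true_depths, estimated_depths):
    """Return (fraction of cross-level pairs ordered correctly, pairs compared)."""
    correct = 0.0
    compared = 0
    for i, j in combinations(range(len(true_depths)), 2):
        if true_depths[i] == true_depths[j]:
            continue  # games of the same level are not compared
        compared += 1
        true_diff = true_depths[i] - true_depths[j]
        est_diff = estimated_depths[i] - estimated_depths[j]
        if est_diff == 0:
            correct += 0.5  # assumption: a tie counts as half right
        elif (true_diff > 0) == (est_diff > 0):
            correct += 1.0
    if compared == 0:
        raise ValueError("need games from at least two different levels")
    return correct / compared, compared

# The original setup: depths 10 to 24, 10 games at each depth.
true_depths = [d for d in range(10, 25) for _ in range(10)]
accuracy, pairs = pairwise_accuracy(true_depths, true_depths)
print(accuracy, pairs)
```

For the 150-game setup from the first post this compares 10,500 unordered cross-level pairs; the figure quoted in the post is based on 250 games, so the two counts are not directly comparable. The two-bucket version is the same function applied to 20 games from 2 levels, which gives the 100 comparisons mentioned.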
duncan
Posts: 12038
Joined: Mon Jul 07, 2008 10:50 pm

Re: expert understanding of computer chess

Post by duncan »

Don wrote: That is a good question. A score of zero would imply he is right the same number of times he is wrong, and I think a GM would score negative. That is not necessarily bad: a random guess would be very negative, so I'm saying he would do better than random, but not that much better.

There are probably better scoring functions that would be a little friendlier to the GM, such as scores that decrease the further off he is. I think a scoring function that returned zero (or 50%) for a random guess would be good, as it makes it clearer whether the answer is better than a random guess.
What I was trying to find out is whether there is a point in computer chess search beyond which a grandmaster is completely out of his depth and will have zero understanding of the play. Could this test be used as evidence one way or the other? If not, what test would you set?

duncan