Borislav Ivanov: a Lilov's add-on

Dr. Axel Schumacher
Posts: 1191
Joined: Thu Mar 09, 2006 1:32 pm
Location: Cologne-Uppsala-St. Petersburg-Cambridge-Toronto-Munich-Basel

Re: Borislav Ivanov: a Lilov's add-on

Post by Dr. Axel Schumacher »

Actually, there is a guy who did a very nice study on move correlations between GMs and computer programs; his name is Haroldo Ribeiro (although he looked more at the error rate than at best-move correlation). I'm working with him on one study where we look at many games from a huge database. His study is not published yet, but you may want to contact him, and I'm sure he will send you the manuscript of the paper (see contact details in link below).

According to his first analyses, it is interesting to note that some players have a very low error rate, much lower than that of former world chess champions, yet never dominated the chess world. One such example of a very low error rate (high computer move correlation) is Kasimdzhanov (despite his FIDE WC title, he was never one of the "best").

Here is another publication from Haroldo:
http://www.plosone.org/article/info%3Ad ... 09F7E7B5EA

In case you want to do a really large-scale analysis, please send me the results; I may want to include some of the results in my book "The Science of Chess".

Best,
Axel
"A child of five would understand this. Send someone to fetch a child of five".
Groucho Marx
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Borislav Ivanov: a Lilov's add-on

Post by Don »

mwyoung wrote:
Don wrote:
mwyoung wrote:
Don wrote:I'm also processing some games for Carlsen - extracted from twic data - games of the last year or two.

With 4 games processed, Carlsen is matching 47.37 percent of the moves. I think I have over 100 games.

Don
This is not going to be a simple process, it looks like, if there is no good correlation between rating and move matching. I would like to see if a more tactical player like Nakamura or Morphy would have a higher hit rate. This is going to take some time to work out the patterns. Many players will need to be looked at, comparing tournaments they won against tournaments where they did badly, to see how the hit pattern changes or stays relatively the same. Many variables.
I've started a run against several thousand twic games - but I'm restricting it to GM vs GM. I hope to get many hundreds of games in with lots of samples per player. In order to have a usable test you must have data like this to compare against.

I have 364 games evaluated now with 129 players total and growing.

At the moment the highest match rate for a single game where there are 20 or more samples is 75%. If there are 25 samples it's 69%. My feeling now is that (using my methodology) seeing a match rate of 75% or higher for a single game (with at least 20 samples) is perhaps not a common occurrence but not uncommon enough to be suspicious.

It's probably more about how many moves you can sample than it is about games. We can figure this out later.

I'm hoping that we can show that no player is predisposed to wandering very far from the median which is about 45. Carlsen's median was about 47 - a little above the median. I think the type of game will probably affect the statistics a lot and perhaps even the openings played. My easy move filter seems to help a lot though.
Yes, it will be interesting to see the results of the GM vs GM games. I think it is important to use them as the baseline. After all, we are trying to separate human play from computer play.

It will be interesting to see if you are correct about the median score of 45, and what type of deviation we see across a wide range of GM players.

Could you include the Capablanca games in the GM base? I have always heard that Capablanca has the highest move correlation with computers. It will be interesting to see if this is true and, if so, what deviation from the median we see in the data.
I'll dig up some sampling of Capablanca games at some point and add them in. I might as well pick a sampling of other great players too, as the twic database only contains recent players and I am just using the most recent year's worth of twic data - already probably too many to run.
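As a rough illustration of the bookkeeping discussed above (a per-game match rate that only counts when there are enough samples, and a median to compare players against), here is a minimal Python sketch. The function names, the 20-sample default, and the input shape are assumptions made for the example; this is not the actual tool.

```python
from statistics import median
from typing import List, Optional


def game_match_rate(matches: List[bool], min_samples: int = 20) -> Optional[float]:
    """Percentage of sampled moves that matched, or None if too few samples."""
    if len(matches) < min_samples:
        return None
    return 100.0 * sum(matches) / len(matches)


def median_match_rate(rates: List[Optional[float]]) -> float:
    """Median over the games that had enough samples to count."""
    return median(r for r in rates if r is not None)
```

Games with fewer than `min_samples` sampled moves return None and drop out of the median, so a short game with a lucky run of matches does not skew the baseline.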
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
casaschi
Posts: 164
Joined: Wed Dec 23, 2009 1:57 pm

Re: Borislav Ivanov: a Lilov's add-on

Post by casaschi »

Don wrote:So basically if Houdini changes his mind any moves considered on 7th ply and beyond can be a match.
Could you share how many moves per position, on average, you are considering a match?
I imagine this would have a huge impact on the reliability of such a system for cheat detection based on game analysis.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Borislav Ivanov: a Lilov's add-on

Post by Don »

casaschi wrote:
Don wrote:So basically if Houdini changes his mind any moves considered on 7th ply and beyond can be a match.
Could you share how many moves per position, on average, you are considering a match?
I imagine this would have a huge impact on the reliability of such a system for cheat detection based on game analysis.
I already outlined the method I am using - and I will probably try to improve it further as I learn more.

Here is the observation: Not all positions are equally useful for matching.

The reason is that some moves are so natural anyone would play them. Such a move would generate a 100% hit rate against not only Houdini but any other program.

Another observation: Some games have a preponderance of moves that "play themselves", for example in the Capablanca game his opponent made a blunder and his moves were easy to find (Stockfish found the majority of Capablanca's moves on a 1 ply search.)

A third observation: We cannot expect to get a 100% match - even Houdini would not match itself 100% of the time because the search is non-deterministic. It depends on the level played, the size of the hash tables, what is in the hash tables before the search begins, etc.

This third observation was probably why Lilov considered the top 3 moves of Houdini a match. The idea is to identify if a move the human plays is a move Houdini "might" have played, but in my opinion this is a really poor way to do that.

A fourth observation is that every human and every computer has its own individual style of play, and it's not strongly correlated with how strong the player is. Komodo will play the same moves on a 5 ply search that it would play on a 20 ply search most of the time, and it might be a different move than Houdini would play. This is the basis of the similarity tester. It's a myth that you should expect a GM to match Houdini much more than a FIDE master. Maybe he will match slightly more on average, but much more important is the playing style of the player, not his strength.

So here is the exact move matching procedure in my tool at the moment:

1. I start matching at the 15th move - an attempt to avoid book.

2. If Stockfish plays the same move on depth 1-12 (without varying) and so does Houdini and the player - I ignore this move. It was probably an obvious recapture or other obvious move.

3. Otherwise, I consider the move a match if Houdini would have played the move AT ANY POINT starting at depth 7.
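
The three steps above can be sketched roughly as follows. This is only an illustration, not the actual tool: the function name `classify_move`, its parameters, and the idea of passing in each engine's preferred move per depth as a plain dict are assumptions. Step 2 is read here as requiring both engines to show the played move at every depth from 1 to 12.

```python
from typing import Dict, Optional


def classify_move(
    move_number: int,
    played: str,
    stockfish_by_depth: Dict[int, str],
    houdini_by_depth: Dict[int, str],
    first_move: int = 15,
    obvious_max_depth: int = 12,
    match_min_depth: int = 7,
) -> Optional[bool]:
    """Return None if the position is not sampled, else True/False for a match.

    Hypothetical helper: each dict maps search depth -> first move of the PV
    at that depth, collected from the engine beforehand.
    """
    # Step 1: skip the early moves of the game to avoid opening book.
    if move_number < first_move:
        return None

    # Step 2: skip "obvious" moves, i.e. both engines show the played move
    # at every depth from 1 to obvious_max_depth without varying.
    depths = range(1, obvious_max_depth + 1)
    stockfish_moves = {stockfish_by_depth.get(d) for d in depths}
    houdini_moves = {houdini_by_depth.get(d) for d in depths}
    if stockfish_moves == {played} and houdini_moves == {played}:
        return None

    # Step 3: a match if Houdini preferred the move at any depth >= 7.
    candidates = {m for d, m in houdini_by_depth.items() if d >= match_min_depth}
    return played in candidates
```

A result of None means the position contributes nothing to the sample, so only True/False results should be counted when computing a match rate.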

The reasoning for step 3 is as follows:

1. If I start matching at a low depth, even the computer can make a relatively shallow tactical blunder. So the human may not be using a computer, and yet blunder and still get a match if I accept shallow depth matches.

2. Matching whatever is shown over a few iterations is better than matching the top N moves because it is typical for a program to be "torn" between two or three moves. If it appears as the first move in any PV it means Houdini might have played it given the right depth - and of course we cannot know what level the human used.

You asked how many moves on average I am considering a match? I have not studied that, but you can try a sampling of positions with Houdini, let it think for about 2 or 3 seconds, and count the number of moves it might have played beyond the 6th iteration. In the position after 1. e2e4 e7e5 I counted 2 moves: 2. d4 on depth 10 and 2. Nf3 on depth 11-16.

The average is obviously more than 1, but I would estimate that it's less than 2.

Maybe later I will take a sampling to estimate this.
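
The counting described above could be sketched like this. The helper names are made up for the example, and the depth-to-move dicts are assumed to have been recorded from Houdini's output beforehand. The first position uses only the depths quoted above (2. d4 at depth 10, 2. Nf3 at depths 11-16); the second position is purely hypothetical.

```python
from typing import Dict, List, Set


def candidate_moves(best_by_depth: Dict[int, str], min_depth: int = 7) -> Set[str]:
    """Distinct moves the engine preferred at any depth >= min_depth."""
    return {move for depth, move in best_by_depth.items() if depth >= min_depth}


def average_candidates(positions: List[Dict[int, str]], min_depth: int = 7) -> float:
    """Mean number of accepted moves per position over a sample."""
    counts = [len(candidate_moves(p, min_depth)) for p in positions]
    return sum(counts) / len(counts)


# Position after 1. e4 e5, using only the depths reported in the post.
after_e4_e5 = {10: "d4", **{d: "Nf3" for d in range(11, 17)}}
# Hypothetical position where the engine never changes its mind.
steady = {d: "Rxe5" for d in range(7, 17)}

print(len(candidate_moves(after_e4_e5)))           # 2
print(average_candidates([after_e4_e5, steady]))   # 1.5
```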
casaschi
Posts: 164
Joined: Wed Dec 23, 2009 1:57 pm

Re: Borislav Ivanov: a Lilov's add-on

Post by casaschi »

Don wrote:Here is the observation: Not all positions are equally useful for matching.
I agree; my point is that positions where you'd consider 3 or more different moves as matching Houdini are maybe not very useful for determining matching. If Houdini shifts between four different moves, the chances are pretty high that a decent player would pick one of those.
Don wrote:The average is obviously more than 1, but I would estimate that it's less than 2.

Maybe later I will take a sampling to estimate this.
It would be good to know; anyway, an average smaller than 2 would certainly be acceptable.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Borislav Ivanov: a Lilov's add-on

Post by Don »

casaschi wrote:
Don wrote:Here is the observation: Not all positions are equally useful for matching.
I agree; my point is that positions where you'd consider 3 or more different moves as matching Houdini are maybe not very useful for determining matching. If Houdini shifts between four different moves, the chances are pretty high that a decent player would pick one of those.
Don wrote:The average is obviously more than 1, but I would estimate that it's less than 2.

Maybe later I will take a sampling to estimate this.
It would be good to know; anyway, an average smaller than 2 would certainly be acceptable.
I opened a new thread in the technical forum where I am placing some of my results. It seems more appropriate for it to be there since we are beginning to focus on the more technical issues.
Terry McCracken
Posts: 16465
Joined: Wed Aug 01, 2007 4:16 am
Location: Canada

Re: Borislav Ivanov: a Lilov's add-on

Post by Terry McCracken »

Don wrote:
Terry McCracken wrote:Don, I love your avatar....Let's Play Rex Chess! Better yet, rename Komodo to T-Rex Chess! or T-Rex ver. ;)
Yes, we should go back to Rex!!

Don
I really do like Komodo as a name, but I agree: Rex! Not the Bulldog; go with the T-Rex when you think Komodo is all grown up. :)

I'm glad we're not at odds but in agreement...it's a nice change. I mean that sincerely. I know I can be difficult but I also don't want to push people away. I'm working on it.

Best,
Terry McCracken