"easy/hard" one more time

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

"easy/hard" one more time

Post by bob »

So far, nothing significant to report. Which is surprising. But I do have a question about some simple correlation analysis that is NOT my area of expertise.

Suppose I have some "observations" that look like this:

1. 2x
2. 3x
3. 1x
4. 1x
5. 2x

Where those "2x" things mean the growth rate for each ply-one move over the previous iteration (i.e., 2x means the last iteration searched 2x as many nodes for that move as the previous iteration).

I then have something I "hope" is related, indicated as the following letters:

A. no change in best move
B. Best move was replaced but new move was slightly worse.
C. Best move was replaced, and new move was significantly worse.
D. Best move was replaced, but new move was slightly better.
E. Best move was replaced, and new move was significantly better.

My interest is in taking thousands of such "observations/outcomes" to see if there is any correlation between any of them. So far, it looks to be completely random. Sometimes the best move changes significantly but the growth rate for ALL moves has been constant over the last N iterations. Sometimes the node count for the soon-to-be-best move grows significantly, as expected, but not always (or even most of the time).

Question is, given enough such observations (and just growth rate is not the only way this could be done, there are other measures, such as percent of tree in second node as opposed to first, etc) is there a straightforward statistical tool that will eat a large volume of such data to see if there is any meaningful correlation between ANYTHING related to tree size and potential move changes???
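One simple starting point, sketched here under assumptions: treat each observation as a (growth-rate bucket, outcome letter) pair, build the contingency table, and compute a Pearson chi-square statistic for independence. The function name `chi_square` and the sample data are made up for illustration; only the "1x/2x/3x" buckets and the A..E letters come from the post. A statistic near zero is what "completely random" would look like; turning it into a p-value needs a chi-square distribution with the returned degrees of freedom, which is not done here.

```python
from collections import Counter

def chi_square(pairs):
    """Pearson chi-square statistic for a list of (bucket, outcome) pairs."""
    pairs = list(pairs)
    n = len(pairs)
    cell = Counter(pairs)                    # observed count per (bucket, outcome)
    row = Counter(b for b, _ in pairs)       # totals per growth-rate bucket
    col = Counter(o for _, o in pairs)       # totals per outcome letter
    stat = 0.0
    for b in row:
        for o in col:
            expected = row[b] * col[o] / n   # count expected if independent
            stat += (cell[(b, o)] - expected) ** 2 / expected
    dof = (len(row) - 1) * (len(col) - 1)
    return stat, dof

# Made-up observations: (growth-rate bucket, outcome letter A..E).
observations = [("1x", "A"), ("1x", "A"), ("2x", "A"), ("2x", "D"),
                ("3x", "E"), ("3x", "E"), ("1x", "A"), ("3x", "D")]
stat, dof = chi_square(observations)
print(f"chi-square = {stat:.2f} with {dof} degrees of freedom")
```

With thousands of real observations, sparse cells (expected counts below about 5) make the statistic unreliable, so coarse buckets are probably the safer choice.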

I'd like to see something that might catch some condition and point out what that suggests (e.g., second move growing faster than first for 2 or 3 iterations is strongly correlated with second move eventually becoming the best move).
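A condition like that can be checked directly before reaching for any tool. The sketch below assumes each record holds per-iteration node counts for the first and second ply-one moves plus a flag saying whether the second move eventually became best; the record layout, the field names `first`, `second`, `switched`, and the numbers are all hypothetical. It compares how often the switch happens when the streak is present against the overall base rate.

```python
def second_outgrew_first(first, second, k):
    """True if the second move's node count grew faster than the first
    move's in each of the last k iterations."""
    if len(first) != len(second) or len(first) < k + 1:
        return False
    for i in range(len(first) - k, len(first)):
        if first[i - 1] <= 0 or second[i - 1] <= 0:
            return False      # growth rate undefined (e.g. zero-node searches)
        # second[i]/second[i-1] > first[i]/first[i-1], written without division
        if second[i] * first[i - 1] <= first[i] * second[i - 1]:
            return False
    return True

def switch_rates(records, k=2):
    """Return (switch rate when the streak is seen, overall switch rate)."""
    flagged = [r for r in records
               if second_outgrew_first(r["first"], r["second"], k)]
    base = sum(r["switched"] for r in records) / len(records)
    cond = (sum(r["switched"] for r in flagged) / len(flagged)
            if flagged else float("nan"))
    return cond, base

# Made-up records: node counts per iteration for the top two ply-one moves.
records = [
    {"first": [100, 200, 400], "second": [10, 40, 200], "switched": True},
    {"first": [100, 200, 400], "second": [50, 100, 200], "switched": False},
    {"first": [100, 300, 900], "second": [50, 100, 150], "switched": False},
    {"first": [100, 200, 400], "second": [20, 60, 240], "switched": False},
]
cond, base = switch_rates(records, k=2)
print(f"switch rate with streak: {cond:.2f}, base rate: {base:.2f}")
```

If the conditional rate is not clearly above the base rate over a large sample, the streak carries little information. Zero-node iterations are treated as "no streak" here, which is a design choice rather than the only reasonable one.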

At one time I think this was straightforward, but not today. With deep null-move reductions, LMR, and forward pruning, the node counts are pretty bizarre, to say the least. If you start printing them out as I have been for weeks, you begin to notice some AMAZING things (some moves take zero (ZERO!) nodes to search thanks to our good friend LMR and hash hits...)
gavanewijk
Posts: 9
Joined: Tue Mar 12, 2013 6:00 pm
Location: Netherlands

Re: "easy/hard" one more time

Post by gavanewijk »

bob wrote: Question is, given enough such observations (and just growth rate is not the only way this could be done, there are other measures, such as percent of tree in second node as opposed to first, etc) is there a straightforward statistical tool that will eat a large volume of such data to see if there is any meaningful correlation between ANYTHING related to tree size and potential move changes???
I can recommend RapidMiner. It's a free data mining tool with lots of machine learning techniques. The interface takes a bit of getting used to, but there are examples to follow. You have a classification problem at hand; a Decision Tree or Decision Stump algorithm would be my first thing to try.
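To show what a decision stump does with this kind of data, here is a hand-rolled sketch in plain Python. It is not RapidMiner's implementation. It assumes each row is a list of numeric features (hypothetically, the growth rates of the first and second ply-one moves) and each label says whether the best move changed. The stump tries every feature and threshold and keeps the single split with the fewest misclassifications.

```python
def fit_stump(rows, labels):
    """Find the single (feature, threshold) split with the fewest errors.

    rows   : list of equal-length numeric feature lists
    labels : list of booleans (True = best move changed)
    Returns (errors, feature_index, threshold, above), where the stump
    predicts `above` when the feature value exceeds the threshold and
    `not above` otherwise.
    """
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            for above in (True, False):
                errors = sum((above if r[f] > t else not above) != y
                             for r, y in zip(rows, labels))
                if best is None or errors < best[0]:
                    best = (errors, f, t, above)
    return best

# Made-up data: [growth of first move, growth of second move] per observation.
rows = [[2.0, 1.0], [3.0, 1.1], [1.0, 2.5], [1.0, 3.0], [2.0, 1.2]]
labels = [False, False, True, True, False]
errors, feature, threshold, above = fit_stump(rows, labels)
print(f"feature {feature} > {threshold} -> predict {above} ({errors} errors)")
```

A stump that cannot beat always guessing the majority outcome on held-out data would support the "completely random" impression; a full decision tree simply repeats this split search recursively on each side.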
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: "easy/hard" one more time

Post by bob »

gavanewijk wrote:
bob wrote: Question is, given enough such observations (and just growth rate is not the only way this could be done, there are other measures, such as percent of tree in second node as opposed to first, etc) is there a straightforward statistical tool that will eat a large volume of such data to see if there is any meaningful correlation between ANYTHING related to tree size and potential move changes???
I can recommend RapidMiner. It's a free data mining tool with lots of machine learning techniques. The interface takes a bit of getting used to, but there are examples to follow. You have a classification problem at hand; a Decision Tree or Decision Stump algorithm would be my first thing to try.
I had thought of data mining, and even have a group of those folks here. I suppose my best bet might be to talk to them about optimal formatting, and then just give 'em a mountain of data and let 'em sort it out to see if anything correlates with anything useful. I'm particularly interested in cases where something odd happens now and then, a real move or two later in the game, something breaks.