Chess analysis project


Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Chess analysis project

Post by Ferdy »

Dann Corbit wrote:
Ferdy wrote: Depth 20 may not be enough.


Objectives

We hope to gather various interesting insights on the skills, ratings, or styles of (famous) chess players. In fact, numerous applications can be and have been considered, such as cheat detection, computation of an intrinsic, "universal" rating, or determination of the key moments when chess players blunder. For instance, we would like to answer a question like "Who are the best chess players in history?"

[...]

On average, Stockfish calculates 7 million combinations at depth 20. So we can estimate that Igrida has, in total, calculated more than 2e15 nodes (~2,000,000,000,000,000).
Actually, depth 20 is pretty useless unless you minimax the whole mess when you are done.

On high-end hardware, SF reaches depth 20 in a sneeze. So what did we gain?
ChessBase's Let's Check feature has already done this.
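The minimaxing step mentioned above can be illustrated with a minimal sketch. It assumes the stored analyses have already been linked into a move tree (the `children` and `leaf_score` dictionaries are hypothetical), that scores are in centipawns from the side to move, and that the graph has no repetition cycles. Each interior position takes the best negated score of its successors (the negamax form of minimax), so shallow leaf scores get corrected by the tree above them.

```python
from typing import Dict, List, Optional


def backup(pos: str,
           children: Dict[str, List[str]],
           leaf_score: Dict[str, int],
           memo: Optional[Dict[str, int]] = None) -> int:
    """Negamax back-up of stored scores; result is from the side to move at `pos`."""
    if memo is None:
        memo = {}
    if pos in memo:  # transpositions: reuse an already backed-up value
        return memo[pos]
    kids = children.get(pos, [])
    if not kids:
        value = leaf_score[pos]  # a leaf: use the stored engine score as-is
    else:
        # the opponent's best score after each move, negated for the mover
        value = max(-backup(k, children, leaf_score, memo) for k in kids)
    memo[pos] = value
    return value


# Hypothetical tree: "root" has moves to "a" and "b"; "a" has two replies.
children = {"root": ["a", "b"], "a": ["a1", "a2"]}
leaf_score = {"b": 30, "a1": -50, "a2": 20}
print(backup("root", children, leaf_score))  # -30
```

Here the reply "a1" shows that moving to "a" leaves the opponent +50, so the back-up prefers "b" (-30 for the side at the root).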
Dann Corbit
Posts: 12482
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Chess analysis project

Post by Dann Corbit »

Ferdy wrote:
[...]
ChessBase's Let's Check feature has already done this.
I guess they did not analyze every move that has ever been played.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Ferdy

Re: Chess analysis project

Post by Ferdy »

Dann Corbit wrote:
[...]
I guess they did not analyze every move that has ever been played.
Probably not, but they are well capable of it if they want to, and the collection is growing every day.

I took a game from one of my book collections and was surprised to see that every position in it has been analyzed in Let's Check; there is analysis even beyond the last position. I also guess they have data on moves that have not yet been played in an actual game, and that all positions from games played by the top players of the past have been analyzed.

[pgn]
[Event "?"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "New game"]
[Black "?"]
[Result "*"]
[PlyCount "101"]

1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Ba4 Nf6 5. O-O Be7 6. Re1 b5 7. Bb3 d6 8. c3
O-O 9. h3 Nb8 10. d4 Nbd7 11. c4 c6 12. a3 bxc4 13. Bxc4 d5 14. exd5 cxd5 15.
Ba2 e4 16. Ne5 Bb7 17. Nc3 Nb6 18. f3 Rc8 19. Bb3 Ba8 20. Bg5 Rc7 21. Rc1 Nfd7
22. Bf4 Bg5 23. Bxg5 Qxg5 24. fxe4 dxe4 25. Qg4 Qxg4 26. Nxg4 g6 27. Nf2 Re8
28. d5 Kg7 29. Nfxe4 Nxd5 30. Nd6 Rxe1+ 31. Rxe1 N5f6 32. Re7 Rc6 33. Rxf7+ Kh6
34. Nc4 Re6 35. Kf2 Kg5 36. Bc2 Bc6 37. Nd2 h5 38. Nb3 Ne5 39. Ra7 Neg4+ 40.
hxg4 Nxg4+ 41. Kf1 Ne3+ 42. Kg1 Nxc2 43. Rxa6 Bd7 44. Rxe6 Bxe6 45. Nc5 Bc4 46.
a4 Kf4 47. a5 Nb4 48. b3 Bf7 49. Nd3+ Nxd3 50. a6 Be8 51. Nd5+ *
[/pgn]

Also, Let's Check displays only 3 best moves, from the 3 recognized engines with the greatest analysis depth (one best move per engine). The analysis gets deeper every minute or hour, replacing the shallower results. Their data could be vast if they keep the replaced analysis lines.
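The replacement scheme described above can be sketched as a store that keeps, for each position and engine, only the deepest analysis seen so far, optionally archiving the lines it replaces. This is a hypothetical illustration, not how ChessBase actually implements Let's Check; the class, field and engine names are invented.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Analysis:
    engine: str
    depth: int
    best_move: str
    score_cp: int


class PositionStore:
    """Keeps the deepest analysis per (position, engine); archives replaced lines."""

    def __init__(self, keep_history: bool = True) -> None:
        self.latest: Dict[Tuple[str, str], Analysis] = {}
        self.history: List[Tuple[str, Analysis]] = []
        self.keep_history = keep_history

    def submit(self, fen: str, a: Analysis) -> bool:
        """Store `a` if it is deeper than what we have; return True if stored."""
        key = (fen, a.engine)
        old = self.latest.get(key)
        if old is not None and a.depth <= old.depth:
            return False  # not deeper: ignore it
        if old is not None and self.keep_history:
            self.history.append((fen, old))  # archive the shallower line
        self.latest[key] = a
        return True

    def top(self, fen: str, n: int = 3) -> List[Analysis]:
        """The n deepest analyses for a position, one per engine."""
        entries = [a for (f, _), a in self.latest.items() if f == fen]
        entries.sort(key=lambda a: a.depth, reverse=True)
        return entries[:n]


store = PositionStore()
start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
for a in [Analysis("A", 20, "e2e4", 30), Analysis("A", 25, "e2e4", 25),
          Analysis("B", 22, "d2d4", 20), Analysis("C", 18, "g1f3", 15),
          Analysis("D", 30, "e2e4", 28)]:
    store.submit(start, a)
print([(a.engine, a.depth) for a in store.top(start)])  # [('D', 30), ('A', 25), ('B', 22)]
```

Whether to keep `history` is exactly the trade-off mentioned in the post: the display only needs `latest`, but the archive is where the data would grow.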
Dann Corbit

Re: Chess analysis project

Post by Dann Corbit »

I have 7 million analyzed positions in my database.

I believe this new project involves hundreds of millions.
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: Chess analysis project

Post by Zenmastur »

jdart wrote: I found this on GitHub:

https://github.com/ChessAnalysis/chess-analysis.

They are using a giant cluster to analyze all the FENs from 5 million or so historical games.

--Jon
I don't see much point in this project. It's very unclear what they expect to gain.

For example, why analyze positions with 7 pieces or fewer? These are covered by 7-man tablebases, so there is little to be gained.
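Screening out tablebase positions is cheap, because the piece-placement field of a FEN already tells you how many men are on the board. A minimal sketch, assuming positions are stored as FEN strings and that a 7-man tablebase (kings included in the count) is available; the function names are hypothetical.

```python
def piece_count(fen: str) -> int:
    """Number of men (kings included) in the FEN's piece-placement field."""
    placement = fen.split()[0]
    # digits are runs of empty squares and '/' separates ranks; letters are pieces
    return sum(1 for ch in placement if ch.isalpha())


def needs_engine(fen: str, tablebase_men: int = 7) -> bool:
    """True if the position has too many men to be resolved by the tablebases."""
    return piece_count(fen) > tablebase_men


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
krk = "8/8/8/4k3/8/8/8/R3K3 w - - 0 1"
print(piece_count(start), needs_engine(start))  # 32 True
print(piece_count(krk), needs_engine(krk))      # 3 False
```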

Second, Ferdinand is right about the openings. Let's Check has already done this in a haphazard fashion, and it has proved next to worthless. In a correspondence game, my opponent, unknown to me, followed a line suggested by Let's Check, and I promptly crushed him, because the move that three engines rated best, Bf5, was not only not the best move; it was a mistake.

I have a 7-million-game database that I trimmed by removing the lowest-rated and unrated games. This left just under 5 million games, containing 284,697,792 unique positions. However, most of these positions aren't worth analyzing. Only 19,683,822 positions were seen more than once, and only 14,038,921 of those occurred in the first 60 plies of the game. There are probably a few duplicate games in any database (even the ones you pay money for), so it would probably be wise to exclude positions that occur only a few times. Counting only positions that occur 5 or more times in the first 60 plies, there are fewer than 2 million unique positions. So if I were going to analyze positions, these are the ones I would spend time on, at least if my goal were to make an opening book.
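The filtering described above, keeping only positions that occur at least 5 times within the first 60 plies, amounts to a frequency count. A minimal sketch, assuming each game has already been converted to a list of position keys (for example FENs with the move counters stripped), one per ply. Counting a position at most once per game is a design choice made here so that repetitions inside a single game don't inflate the count; the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def frequent_positions(games: Iterable[List[str]],
                       max_ply: int = 60,
                       min_count: int = 5) -> Set[str]:
    """Positions seen in at least `min_count` games within their first `max_ply` plies."""
    counts: Counter = Counter()
    for positions in games:
        # set(): count each position once per game
        counts.update(set(positions[:max_ply]))
    return {pos for pos, n in counts.items() if n >= min_count}


games = [["p0", "p1", "p2"]] * 5 + [["p0", "p3"]]
print(sorted(frequent_positions(games)))             # ['p0', 'p1', 'p2']
print(sorted(frequent_positions(games, max_ply=2)))  # ['p0', 'p1']
```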

So I would say that the project supervisor needs to step in and give this project a little direction, or else modify the introduction to reflect their goals more accurately.

I did notice that the math on the GitHub main page seems to be in error. 270,000,000 positions analyzed in 444,000 hours is about 6 seconds per position. If they are searching only 7,000,000 nodes per FEN on average, then I have a sub-$100 tablet that can search faster than the hardware they are using! So something in the milk isn't white, so to speak.
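The sanity check above can be reproduced in a few lines. This sketch assumes, as the comparison with a single tablet implies, that the 444,000 hours are single-core hours and that each position gets about 7,000,000 nodes.

```python
positions = 270_000_000
hours = 444_000
nodes_per_position = 7_000_000

seconds_per_position = hours * 3600 / positions
implied_nps = nodes_per_position / seconds_per_position
print(f"{seconds_per_position:.2f} s per position")  # 5.92 s per position
print(f"{implied_nps:,.0f} nodes per second")        # 1,182,432 nodes per second
```

That implied speed of roughly 1.2 million nodes per second is the figure being compared with a cheap tablet.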

Regards,

Forrest
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
Ferdy

Re: Chess analysis project

Post by Ferdy »

Zenmastur wrote:I don't see much point to this project. It's very unclear what they expect to gain.
One idea is to compare the human's moves with the best move found by the computer. For each move you get the engine's score for the move the human played and the engine's score for its own best move, so they can calculate the error of every move the human made in that game. It is then possible to compare players separated by time, say Morphy and Capablanca, by seeing who has the smaller average score difference.
Another use of the data is finding tricky human players: players who try to complicate the position regardless of whether that is objectively best.
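The player comparison described above is essentially an average score loss (often called average centipawn loss) per player. A minimal sketch, assuming each analyzed move is available as a hypothetical (player, best_score, played_score) tuple in centipawns from the mover's point of view. The loss is clamped at zero because a fixed-depth search can occasionally rate the played move slightly above the engine's own choice.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def average_loss(moves: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """Mean score loss per player, in centipawns.

    Each move is (player, best_score, played_score), scores from the mover's view.
    """
    losses: Dict[str, List[int]] = defaultdict(list)
    for player, best, played in moves:
        # clamp: a fixed-depth search may rate the played move slightly higher
        losses[player].append(max(0, best - played))
    return {player: mean(values) for player, values in losses.items()}


# Invented numbers, for illustration only.
moves = [("Morphy", 30, 10), ("Morphy", 50, 50),
         ("Capablanca", 20, 15), ("Capablanca", 0, 5)]
print(average_loss(moves))
```

The lower the average, the closer a player stayed to the engine's choices; the result is only as good as the engine's best move in each position.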

I have seen a new feature on lichess called Chess Insights that answers questions like "If I exchange queens, what will my performance be?" The system looks at your games, picks those without queens, and calculates your performance. That is only a simple insight; it gets complicated when you add more criteria.
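A query like "How do I score once the queens are off?" boils down to filtering games by a condition and computing a score percentage. A minimal sketch, assuming each game is a hypothetical dict holding its FEN sequence and the player's result (1, 0.5 or 0); it is not lichess's actual implementation.

```python
from typing import Callable, Dict, List, Optional


def queens_off(fen: str) -> bool:
    """True if neither side has a queen in the FEN's piece placement."""
    placement = fen.split()[0]
    return "Q" not in placement and "q" not in placement


def performance(games: List[Dict], condition: Callable[[Dict], bool]) -> Optional[float]:
    """Score percentage over the games that satisfy `condition`."""
    scores = [g["score"] for g in games if condition(g)]
    return 100 * sum(scores) / len(scores) if scores else None


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
no_queens = "rnb1kbnr/pppppppp/8/8/8/8/PPPPPPPP/RNB1KBNR w KQkq - 0 1"
games = [
    {"fens": [start, no_queens], "score": 1.0},
    {"fens": [start, no_queens], "score": 0.5},
    {"fens": [start], "score": 0.0},
]
print(performance(games, lambda g: any(queens_off(f) for f in g["fens"])))  # 75.0
```

Adding criteria means composing more conditions, which is where it gets complicated.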

I have planned something like this myself: I just type "MagnusC profile RP ending" and get stats on that criterion, i.e. how accurate the player is in this ending. Compare it with other players and you will have an idea of who is good at which endings. This is only possible if you have analysis of all positions played by these players. I guess ChessBase is capable of doing this, generating a player's weaknesses in certain openings or stats on tendencies, such as a tendency to play worse without queens.
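A profile query such as "MagnusC profile RP ending" needs two pieces: a classifier that recognizes the ending from the material on the board, and an aggregate over the player's analyzed moves in such positions. A minimal sketch, assuming "RP ending" means a rook-and-pawn ending (only kings, rooks and pawns, a rook for each side, at least one pawn) and that moves are stored as hypothetical (player, fen, loss_cp) tuples.

```python
from typing import List, Optional, Tuple


def material(fen: str) -> set:
    """Set of piece letters present in the FEN's piece placement."""
    return {ch for ch in fen.split()[0] if ch.isalpha()}


def is_rook_pawn_ending(fen: str) -> bool:
    """Assumed definition: only kings, rooks and pawns; each side has a rook; some pawn remains."""
    pieces = material(fen)
    return (pieces <= set("KRPkrp")
            and "R" in pieces and "r" in pieces
            and ("P" in pieces or "p" in pieces))


def ending_accuracy(moves: List[Tuple[str, str, int]], player: str,
                    classifier=is_rook_pawn_ending) -> Optional[float]:
    """Average score loss (cp) of `player` over positions the classifier accepts."""
    losses = [loss for name, fen, loss in moves if name == player and classifier(fen)]
    return sum(losses) / len(losses) if losses else None


rook_ending = "8/5pk1/8/8/8/8/r4PPP/3R2K1 w - - 0 1"
queen_ending = "8/5pk1/8/8/8/8/q4PPP/3Q2K1 w - - 0 1"
moves = [("MagnusC", rook_ending, 10), ("MagnusC", rook_ending, 30),
         ("MagnusC", queen_ending, 100), ("Other", rook_ending, 50)]
print(ending_accuracy(moves, "MagnusC"))  # 20.0
```

Swapping in another classifier gives the other profiles (queenless middlegames, minor-piece endings, and so on).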

There are many applications for the results of this position analysis, but it is important that the best move in each position be as accurate as possible.
Zenmastur

Re: Chess analysis project

Post by Zenmastur »

Ferdy wrote:
[...]
There are many applications for the results of this position analysis, but it is important that the best move in each position be as accurate as possible.
I guess I can see why you would do this for highly rated players like Magnus and the other top 100 players, or those of historical interest, but most of those games are from amateurs, so why bother with them?

IIRC, about 9% of all games end in a 7-man (or smaller) tablebase position. I don't recall the percentage for 6-man TBs, but I would think that a huge number of positions could be checked with 6-man tablebases at a rate of several thousand per second if all they are interested in is human errors. This would significantly reduce the total work required.
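Checking human moves against tablebases needs no search at all: a move is an error exactly when it worsens the game-theoretic result. A minimal sketch, assuming a `probe_wdl(fen)` callback that returns Syzygy-style WDL values from the side to move's point of view (2 win, 1 cursed win, 0 draw, -1 blessed loss, -2 loss). With python-chess such a callback could wrap `Tablebase.probe_wdl(board)` from `chess.syzygy.open_tablebase(...)`; below, a hypothetical dictionary keyed by position labels stands in for it.

```python
from typing import Callable, List, Tuple


def tablebase_errors(moves: List[Tuple[str, str]],
                     probe_wdl: Callable[[str], int]) -> List[int]:
    """Indices of moves that lower the mover's WDL outcome.

    moves: (fen_before, fen_after) pairs, both within tablebase range.
    """
    errors = []
    for i, (before, after) in enumerate(moves):
        expected = probe_wdl(before)  # best result available to the mover
        achieved = -probe_wdl(after)  # opponent to move after; negate for the mover
        if achieved < expected:
            errors.append(i)
    return errors


# Hypothetical WDL values keyed by position label instead of a real FEN.
fake_tb = {"A": 2, "B": -2, "C": 2, "D": 0}
print(tablebase_errors([("A", "B"), ("C", "D")], fake_tb.__getitem__))  # [1]
```

The second move throws away a win for a draw, so it is flagged; the first keeps the win and is not.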

Since they are going to analyze all positions, it would behoove them to first isolate the opening positions that are duplicated in the database, analyze those, and store the results, so that when they actually analyze the games they can simply refer to that opening database. Many opening positions will be seen tens of thousands of times in a 5-million-game database. E.g., in my database the average game length was 82.53 plies and the average number of unique positions per game was 58.85, which means 23-24 plies per game are duplicated elsewhere in the database. So they could save about 125,000 hours of CPU time (~28% of the 440,000 hours they claimed on the GitHub page) by thinking a little more before they go jumping off the cliff.
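Analyzing each distinct position only once is a straightforward cache keyed by position. A minimal sketch, assuming games are lists of position keys and `analyze` is a hypothetical stand-in for the expensive engine call; the last lines reproduce the estimate of the fraction saved, under the simplifying assumption that every position costs the same engine time.

```python
from typing import Callable, Dict, List, Tuple


def analyze_all(games: List[List[str]],
                analyze: Callable[[str], str]) -> Tuple[List[List[str]], int]:
    """Analyze every position of every game, calling the engine once per distinct position."""
    cache: Dict[str, str] = {}
    results = []
    for positions in games:
        row = []
        for pos in positions:
            if pos not in cache:
                cache[pos] = analyze(pos)  # expensive call, only on first sight
            row.append(cache[pos])
        results.append(row)
    return results, len(cache)


games = [["start", "a"], ["start", "b"], ["start", "a"]]
results, engine_calls = analyze_all(games, str.upper)
print(engine_calls)  # 3 instead of 6

# The estimate from the database statistics quoted in the post.
avg_plies, avg_unique = 82.53, 58.85
saved_fraction = (avg_plies - avg_unique) / avg_plies
print(f"{saved_fraction:.1%} of 440,000 h = {saved_fraction * 440_000:,.0f} h")
```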

Regards,

Forrest
Ferdy

Re: Chess analysis project

Post by Ferdy »

Zenmastur wrote:
[...]
I guess I can see why you would do this for highly rated players like Magnus and the other top 100 players, or those of historical interest, but most of those games are from amateurs, so why bother with them?
[...]
It would be better to start with the top 100 players, for example, then the next 200, and so on, bypassing duplicates as they progress.
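Processing players in rating tiers while skipping positions already analyzed can be sketched as a generator over tiers that shares one `seen` set. The tier sizes, the player names and the `positions_by_player` lookup are hypothetical.

```python
from typing import Callable, Iterator, List, Set, Tuple


def tiers(ranked: List[str], sizes: List[int]) -> Iterator[List[str]]:
    """Slice a rating-ordered player list into tiers, e.g. top 100, next 200, ..."""
    start = 0
    for size in sizes:
        batch = ranked[start:start + size]
        if not batch:
            return
        yield batch
        start += size


def new_work_per_tier(ranked: List[str], sizes: List[int],
                      positions_of: Callable[[str], List[str]]
                      ) -> Iterator[Tuple[List[str], List[str]]]:
    """For each tier, yield the positions not already queued by a stronger tier."""
    seen: Set[str] = set()
    for batch in tiers(ranked, sizes):
        fresh = []
        for player in batch:
            for pos in positions_of(player):
                if pos not in seen:
                    seen.add(pos)
                    fresh.append(pos)
        yield batch, fresh


positions_by_player = {"p1": ["a", "b"], "p2": ["b", "c"], "p3": ["a", "d"]}
for batch, fresh in new_work_per_tier(["p1", "p2", "p3"], [1, 2], positions_by_player.get):
    print(batch, fresh)
# ['p1'] ['a', 'b']
# ['p2', 'p3'] ['c', 'd']
```

Because the strongest tier comes first, the positions it shares with weaker players are analyzed once and the later tiers only add what is new.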