White/Draw/Black Masters+ Dataset


jshriver
Posts: 1349
Joined: Wed Mar 08, 2006 9:41 pm
Location: Morgantown, WV, USA

White/Draw/Black Masters+ Dataset

Post by jshriver »

What would you all recommend or like to see in regard to depth for such a database? I know it's been done several times on various sites, but I want to do one myself. I see lichess is using 2200+ from 1950 to Dec 2022, with roughly 2.5 million games.

My current curated game list is 2200+ Elo, from 1950 to last week, and just over 8 million games. My 2500+ list is 1.5 million. My main concern now is depth.

Any commentary is helpful. I'm planning to open-source the code and possibly the dataset if it will fit on GitHub.
-Josh
chesskobra
Posts: 194
Joined: Thu Jul 21, 2022 12:30 am
Full name: Chesskobra

Re: White/Draw/Black Masters+ Dataset

Post by chesskobra »

What does depth mean? Do you mean finding all distinct positions up to n plies (in a certain database of games)? And then their frequency and WDL statistics? Such data in raw form would be quite useful.
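A rough sketch of what such raw data could look like, in Python. This is only an illustration under stated assumptions: games are assumed to be already parsed into lists of SAN moves plus a result string, and the function name `wdl_by_prefix` is made up for this example. It keys on the move-sequence prefix rather than on the actual position, so transpositions are not merged; merging them would need a chess library to generate a position key (a FEN, for example).

```python
from collections import defaultdict

def wdl_by_prefix(games, max_plies):
    """Count [white wins, draws, black wins] for every move-sequence prefix.

    games: iterable of (moves, result), where moves is a list of SAN strings
    and result is '1-0', '1/2-1/2' or '0-1' (assumed input format).
    """
    index = {"1-0": 0, "1/2-1/2": 1, "0-1": 2}
    stats = defaultdict(lambda: [0, 0, 0])
    for moves, result in games:
        if result not in index:
            continue  # skip unfinished games ('*')
        # Every prefix of the game, up to max_plies, gets credited with the result.
        for ply in range(1, min(len(moves), max_plies) + 1):
            stats[tuple(moves[:ply])][index[result]] += 1
    return dict(stats)

games = [
    (["e4", "e5", "Nf3"], "1-0"),
    (["e4", "c5"], "0-1"),
    (["e4", "e5", "Nf3", "Nc6"], "1/2-1/2"),
]
stats = wdl_by_prefix(games, max_plies=3)
print(stats[("e4",)])              # [1, 1, 1]
print(stats[("e4", "e5", "Nf3")])  # [1, 1, 0]
```

The frequency of a prefix is just the sum of its three counters, so "depth" here is simply the `max_plies` cutoff.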

Some time ago I was interested in looking at the number of distinct move sequences, going to, say, 16 plies, that have appeared in the games of modern players like Carlsen and Nakamura compared to older players like Anand, Kasparov, Ivanchuk, etc. Of course, these numbers would tend to be higher for players who have played more games and players who play more blitz, and the move sequences would also depend on how their opponents play against them. But still it is interesting to see that the games of Carlsen and Nakamura have much more variety. I was doing it without serious scripting (using just pgn-extract and Unix commands like sort, cut, uniq, etc.). Also, I was not doing WDL. So definitely data like this and the associated scripts would be interesting to play with.
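For anyone who wants to try the same count in a script, here is a minimal Python sketch. It is not the pgn-extract pipeline described above, just a stand-in under assumptions: the input is one movetext string per game, nested variations in parentheses are not handled, and games shorter than the cutoff are skipped. The names `san_moves` and `distinct_openings` are hypothetical.

```python
import re

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}

def san_moves(movetext):
    """Extract SAN moves from PGN movetext (no nested variations handled)."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)  # drop {comments}
    text = re.sub(r"\$\d+", " ", text)          # drop NAGs like $1
    text = re.sub(r"\d+\.+", " ", text)         # drop move numbers "1." / "1..."
    return [tok for tok in text.split() if tok not in RESULTS]

def distinct_openings(games_movetext, plies=16):
    """Return the set of distinct move sequences truncated to `plies` plies."""
    seen = set()
    for movetext in games_movetext:
        moves = san_moves(movetext)
        if len(moves) >= plies:  # assumption: ignore games shorter than the cutoff
            seen.add(tuple(moves[:plies]))
    return seen

games = [
    "1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 1-0",
    "1. e4 e5 2. Nf3 Nc6 3. Bc4 {Italian} Bc5 1/2-1/2",
    "1. d4 d5 2. c4 e6 0-1",
    "1. e4 c5 *",
]
print(len(distinct_openings(games, plies=4)))  # 2
```

Running it once per player's game collection and comparing the set sizes gives the variety comparison; dividing by the number of games would partly correct for players who simply have more games.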
jefk
Posts: 675
Joined: Sun Jul 25, 2010 10:07 pm
Location: the Netherlands
Full name: Jef Kaan

Re: White/Draw/Black Masters+ Dataset

Post by jefk »

Well, when reading the PGN into 'Bookbuilder' I usually limited it to
around 60 ply (years ago, about 50 ply) and then exported the endnodes.epd,
which was subsequently analyzed and minimaxed.
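To illustrate the "analyze the end nodes, then minimax" step, here is a small Python sketch of a negamax back-up over an opening tree. It is not Bookbuilder's actual implementation, only the general idea under assumptions: the tree is a nested dict of moves, every branch ends in a number, and that number is a hypothetical engine score in centipawns from the point of view of the side to move at that end node.

```python
def backup(node):
    """Negamax: value of `node` from the perspective of the side to move there."""
    if not isinstance(node, dict):
        return node  # leaf: engine score for the side to move at this end node
    # A move is worth the negation of what the opponent gets after it.
    return max(-backup(child) for child in node.values())

# Hypothetical two-ply tree; leaf scores are invented for the example.
tree = {
    "e4": {"e5": 20, "c5": 35},
    "d4": {"d5": 15},
}
best = max(tree, key=lambda move: -backup(tree[move]))
print(backup(tree), best)  # 20 e4
```

Here Black picks e5 against e4 (holding White to +20) and d5 against d4 (+15), so the backed-up root value is 20 and e4 is the book move.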

Yes, nowadays there's more opening variety because in chess, being a draw,
there is no such thing as a 'best' opening for White; there are many options.

PS: that's one of the ways I found confirmation of the old (Steinitz) knowledge
that chess is a draw with best play; action is minus reaction (Newton) :)
For every winning plan there are zillions of defensive plans keeping the draw.
It's a matter of options. In a river, you can push the water, but if you go
with a shovel pushing against the ocean you will achieve nothing;
White can only play one move at a time; if it were three moves
(and then Black only one move every time), it would be a different story.
In extensive discussions about this subject on chess.com,
there's still a woman (rating around 1700) who believes that with best play White
can win, because the computer engines aren't yet looking deep enough; utter
BS of course, as also said by an American correspondence player on that site.