For data lovers only

jorose
Posts: 269
Joined: Thu Jan 22, 2015 2:21 pm
Location: Zurich, Switzerland
Full name: Jonathan Rosenthal

Re: For data lovers only

Post by jorose » Wed Jul 10, 2019 7:19 pm

Ovyron wrote:
Wed Jul 10, 2019 6:29 am
Rebel wrote:
Wed Jul 10, 2019 5:57 am
So far I have 3.7 million human-human games between at least 2200 elo rated players.
Why was 2200 elo chosen as the cutting point? I believe it's equivalent to 1900 elo of chess.com.
That might have been true at some point in the past, but nowadays the distributions match fairly closely in that rating range. 2200 Lichess blitz is roughly 2200 chess.com blitz.
-Jonathan

Rebel
Posts: 4788
Joined: Thu Aug 18, 2011 10:04 am

Re: For data lovers only

Post by Rebel » Wed Jul 10, 2019 9:32 pm

All done...

In total 25.6 million annotated Stockfish games.

Say average game length is 60 moves, we get:

25.000.000 x 120 = 3.000.000.000 positions

for NN training or other purposes.
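As a quick check of the arithmetic above: 60 moves per game means 120 plies, so roughly 120 positions per game. A minimal sketch follows; the function name and the assumption of exactly one position per ply are for illustration only, and the 60-move average is the guess from the post, not a measured value.

```python
def estimate_positions(games: int, avg_moves: int) -> int:
    """Rough position count: each full move is two plies, one position per ply."""
    plies_per_game = avg_moves * 2
    return games * plies_per_game


# Rounded game count, as used in the post
print(estimate_positions(25_000_000, 60))
# Full 25.6 million games instead of the rounded figure
print(estimate_positions(25_600_000, 60))
```

With the unrounded game count the estimate lands slightly above 3 billion positions.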
90% of coding is debugging, the other 10% is writing bugs.

fersbery
Posts: 8
Joined: Mon Aug 13, 2018 4:08 am
Full name: Daniel Uranga

Re: For data lovers only

Post by fersbery » Wed Jul 10, 2019 10:01 pm

Thanks for this, it will be very useful for supervised learning.

OT: Longtime fan of Rebel here; I remember playing a lot of games vs one of the old DOS versions.

jorose
Posts: 269
Joined: Thu Jan 22, 2015 2:21 pm
Location: Zurich, Switzerland
Full name: Jonathan Rosenthal

Re: For data lovers only

Post by jorose » Wed Jul 10, 2019 10:14 pm

Rebel wrote:
Wed Jul 10, 2019 9:32 pm
All done...

In total 25.6 million annotated Stockfish games.

Say average game length is 60 moves, we get:

25.000.000 x 120 = 3.000.000.000 positions

for NN training or other purposes.
Are they SF games (... against SF?) annotated by SF or are they just arbitrary games annotated by SF?
-Jonathan

Dann Corbit
Posts: 10204
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: For data lovers only

Post by Dann Corbit » Wed Jul 10, 2019 10:15 pm

Rebel wrote:
Wed Jul 10, 2019 9:32 pm
All done...

In total 25.6 million annotated Stockfish games.

Say average game length is 60 moves, we get:

25.000.000 x 120 = 3.000.000.000 positions

for NN training or other purposes.
If you post a link, I will be grateful to greedily snatch a copy.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.

brianr
Posts: 358
Joined: Thu Mar 09, 2006 2:01 pm

Re: For data lovers only

Post by brianr » Wed Jul 10, 2019 11:21 pm

Are the links not the ones in the first post?

Rebel
Posts: 4788
Joined: Thu Aug 18, 2011 10:04 am

Re: For data lovers only

Post by Rebel » Wed Jul 10, 2019 11:22 pm

Dann Corbit wrote:
Wed Jul 10, 2019 10:15 pm
Rebel wrote:
Wed Jul 10, 2019 9:32 pm
All done...

In total 25.6 million annotated Stockfish games.

Say average game length is 60 moves, we get:

25.000.000 x 120 = 3.000.000.000 positions

for NN training or other purposes.
If you post a link, I will be grateful to greedily snatch a copy.
It's on the link in the OP - http://rebel13.nl/download/data.html
90% of coding is debugging, the other 10% is writing bugs.

Ovyron
Posts: 2828
Joined: Tue Jul 03, 2007 2:30 am

Re: For data lovers only

Post by Ovyron » Thu Jul 11, 2019 1:07 am

jorose wrote:
Wed Jul 10, 2019 7:19 pm
That might have been true at some point in the past, but nowadays the distributions match fairly closely in that rating range. 2200 Lichess blitz is roughly 2200 chess.com blitz.
When analyzing the quality of the moves people play, it seems chess.com's 1300 elo players play at the same level as 1600 elo players of Lichess. I wonder if there's a tool to check for this automatically. But, yeah, my mistake was assuming this was a constant offset in rating; I didn't expect the difference to shrink at higher ratings, so that Lichess "catches up" at some point.
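To make the "shrinking difference" idea concrete, here is a rough sketch that draws a straight line through the two data points mentioned in this thread (chess.com 1300 ≈ Lichess 1600, and 2200 ≈ 2200). The linear shape, the function name, and the use of only these two anchors are assumptions for illustration; the real relationship is not known from this thread and is probably not linear.

```python
# Anchor points taken from the thread (both are rough impressions, not measurements)
CHESSCOM_LOW, LICHESS_LOW = 1300, 1600
CHESSCOM_HIGH, LICHESS_HIGH = 2200, 2200


def lichess_equivalent(chesscom_rating: float) -> float:
    """Hypothetical linear map from a chess.com rating to a Lichess rating."""
    slope = (LICHESS_HIGH - LICHESS_LOW) / (CHESSCOM_HIGH - CHESSCOM_LOW)
    return LICHESS_LOW + (chesscom_rating - CHESSCOM_LOW) * slope


# Print the gap between the two scales at a few chess.com ratings
for rating in (1300, 1750, 2200):
    gap = lichess_equivalent(rating) - rating
    print(rating, round(lichess_equivalent(rating)), round(gap))
```

Under this assumed model the gap is not a fixed offset: it is 300 points at the low anchor and falls to zero at 2200.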
Great spirits have always encountered violent opposition from mediocre minds.

Dann Corbit
Posts: 10204
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: For data lovers only

Post by Dann Corbit » Thu Jul 11, 2019 1:26 am

Rebel wrote:
Wed Jul 10, 2019 11:22 pm
Dann Corbit wrote:
Wed Jul 10, 2019 10:15 pm
Rebel wrote:
Wed Jul 10, 2019 9:32 pm
All done...

In total 25.6 million annotated Stockfish games.

Say average game length is 60 moves, we get:

25.000.000 x 120 = 3.000.000.000 positions

for NN training or other purposes.
If you post a link, I will be grateful to greedily snatch a copy.
It's on the link in the OP - http://rebel13.nl/download/data.html
Too bad there are no search depth indicators in the data.
It is hard to know what the search time means, since we don't know the hardware, so the evaluations can only be taken as a very crude measurement.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.

jorose
Posts: 269
Joined: Thu Jan 22, 2015 2:21 pm
Location: Zurich, Switzerland
Full name: Jonathan Rosenthal

Re: For data lovers only

Post by jorose » Thu Jul 11, 2019 1:27 am

Ovyron wrote:
Thu Jul 11, 2019 1:07 am
jorose wrote:
Wed Jul 10, 2019 7:19 pm
That might have been true at some point in the past, but nowadays the distributions match fairly closely in that rating range. 2200 Lichess blitz is roughly 2200 chess.com blitz.
When analyzing the quality of the moves people play, it seems chess.com's 1300 elo players play at the same level as 1600 elo players of Lichess. I wonder if there's a tool to check for this automatically. But, yeah, my mistake was assuming this was a constant offset in rating; I didn't expect the difference to shrink at higher ratings, so that Lichess "catches up" at some point.
It's hard to say exactly. This link gives a rough impression and is based on player surveys. I imagine the data has some pretty big error bars though. My understanding is the gap used to be larger between the two sites, but chess.com ratings have inflated quite a bit in recent years.
-Jonathan
