For data lovers only

jorose
Posts: 358
Joined: Thu Jan 22, 2015 3:21 pm
Location: Zurich, Switzerland
Full name: Jonathan Rosenthal

Re: For data lovers only

Post by jorose »

Ovyron wrote: Wed Jul 10, 2019 8:29 am
Rebel wrote: Wed Jul 10, 2019 7:57 am So far I have 3.7 million human-human games between players rated at least 2200 Elo.
Why was 2200 Elo chosen as the cutoff point? I believe it's equivalent to 1900 Elo on chess.com.
That might have been true at some point in the past, but nowadays the distributions match fairly closely in that rating range. 2200 Lichess blitz is roughly 2200 chess.com blitz.
-Jonathan
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: For data lovers only

Post by Rebel »

All done...

In total 25.6 million annotated Stockfish games.

Say average game length is 60 moves, we get:

25.000.000 x 120 = 3.000.000.000 positions

for NN training or other purposes.
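
As a quick check of the arithmetic above, here is a minimal Python sketch. It assumes an average of 60 full moves (120 plies) per game and counts one position per ply; the function name estimate_positions is made up for illustration and is not part of any tool mentioned in this thread.

```python
def estimate_positions(games: int, avg_moves: int = 60) -> int:
    """Rough position count: each full move contributes two plies."""
    plies_per_game = 2 * avg_moves  # one ply for White, one for Black
    return games * plies_per_game


# Rounded figure used in the post (25 million games).
print(f"{estimate_positions(25_000_000):,}")

# Same estimate using the full 25.6 million games.
print(f"{estimate_positions(25_600_000):,}")
```

With the rounded 25 million games this gives 3.0 billion positions, and with the full 25.6 million it gives about 3.07 billion, so the round figure is a slight underestimate if the 60-move average holds.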
90% of coding is debugging, the other 10% is writing bugs.
fersbery
Posts: 8
Joined: Mon Aug 13, 2018 6:08 am
Full name: Daniel Uranga

Re: For data lovers only

Post by fersbery »

Thanks for this, it will be very useful for supervised learning.

OT: Long time fan of Rebel here, remember playing a lot of games vs one of the old DOS versions.
jorose
Posts: 358
Joined: Thu Jan 22, 2015 3:21 pm
Location: Zurich, Switzerland
Full name: Jonathan Rosenthal

Re: For data lovers only

Post by jorose »

Rebel wrote: Wed Jul 10, 2019 11:32 pm All done...

In total 25.6 million annotated Stockfish games.

Say average game length is 60 moves, we get:

25.000.000 x 120 = 3.000.000.000 positions

for NN training or other purposes.
Are they SF games (... against SF?) annotated by SF or are they just arbitrary games annotated by SF?
-Jonathan
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: For data lovers only

Post by Dann Corbit »

Rebel wrote: Wed Jul 10, 2019 11:32 pm All done...

In total 25.6 million annotated Stockfish games.

Say average game length is 60 moves, we get:

25.000.000 x 120 = 3.000.000.000 positions

for NN training or other purposes.
If you post a link, I will be grateful to greedily snatch a copy.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
brianr
Posts: 536
Joined: Thu Mar 09, 2006 3:01 pm

Re: For data lovers only

Post by brianr »

Are the links not the ones in the first post?
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: For data lovers only

Post by Rebel »

Dann Corbit wrote: Thu Jul 11, 2019 12:15 am
Rebel wrote: Wed Jul 10, 2019 11:32 pm All done...

In total 25.6 million annotated Stockfish games.

Say average game length is 60 moves, we get:

25.000.000 x 120 = 3.000.000.000 positions

for NN training or other purposes.
If you post a link, I will be grateful to greedily snatch a copy.
It's on the link in the OP - http://rebel13.nl/download/data.html
90% of coding is debugging, the other 10% is writing bugs.
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: For data lovers only

Post by Ovyron »

jorose wrote: Wed Jul 10, 2019 9:19 pm That might have been true at some point in the past, but nowadays the distributions match fairly closely in that rating range. 2200 Lichess blitz is roughly 2200 chess.com blitz.
When analyzing the quality of the moves people play, it seems chess.com's 1300 Elo players play at the same level as 1600 Elo players on Lichess. I wonder if there's a tool to check for this automatically. But, yeah, my mistake was assuming this was a constant offset in rating. I didn't expect the difference to shrink as the rating goes up, so that Lichess "catches up" at some point.
Your beliefs create your reality, so be careful what you wish for.
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: For data lovers only

Post by Dann Corbit »

Rebel wrote: Thu Jul 11, 2019 1:22 am
Dann Corbit wrote: Thu Jul 11, 2019 12:15 am
Rebel wrote: Wed Jul 10, 2019 11:32 pm All done...

In total 25.6 million annotated Stockfish games.

Say average game length is 60 moves, we get:

25.000.000 x 120 = 3.000.000.000 positions

for NN training or other purposes.
If you post a link, I will be grateful to greedily snatch a copy.
It's on the link in the OP - http://rebel13.nl/download/data.html
Too bad there are no search depth indicators in the data.
It is hard to know what a given search time means, since we don't know the hardware, so the evaluations can only be taken as a very crude measurement.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
jorose
Posts: 358
Joined: Thu Jan 22, 2015 3:21 pm
Location: Zurich, Switzerland
Full name: Jonathan Rosenthal

Re: For data lovers only

Post by jorose »

Ovyron wrote: Thu Jul 11, 2019 3:07 am
jorose wrote: Wed Jul 10, 2019 9:19 pm That might have been true at some point in the past, but nowadays the distributions match fairly closely in that rating range. 2200 Lichess blitz is roughly 2200 chess.com blitz.
When analyzing the quality of the moves people play, it seems chess.com's 1300 Elo players play at the same level as 1600 Elo players on Lichess. I wonder if there's a tool to check for this automatically. But, yeah, my mistake was assuming this was a constant offset in rating. I didn't expect the difference to shrink as the rating goes up, so that Lichess "catches up" at some point.
It's hard to say exactly. This link gives a rough impression and is based on player surveys. I imagine the data has some pretty big error bars though. My understanding is the gap used to be larger between the two sites, but chess.com ratings have inflated quite a bit in recent years.
-Jonathan