For data lovers only

Rebel
Posts: 4472
Joined: Thu Aug 18, 2011 10:04 am

For data lovers only

Post by Rebel » Wed Jul 10, 2019 5:57 am

Made a start on trying to separate the wheat from the chaff in the giant monthly Lichess downloads.

So far I have 3.7 million human-human games between at least 2200 elo rated players.

About 8 million annotated (score only) Stockfish games. And I am not even halfway.

http://rebel13.nl/download/data.html
Everybody is unique, except me.

Ovyron
Posts: 2143
Joined: Tue Jul 03, 2007 2:30 am

Re: For data lovers only

Post by Ovyron » Wed Jul 10, 2019 6:29 am

Rebel wrote:
Wed Jul 10, 2019 5:57 am
So far I have 3.7 million human-human games between at least 2200 elo rated players.
Why was 2200 elo chosen as the cutting point? I believe it's equivalent to 1900 elo of chess.com.
Make someone happy today.

Dann Corbit
Posts: 9782
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: For data lovers only

Post by Dann Corbit » Wed Jul 10, 2019 7:13 am

Rebel wrote:
Wed Jul 10, 2019 5:57 am
Made a start on trying to separate the wheat from the chaff in the giant monthly Lichess downloads.

So far I have 3.7 million human-human games between at least 2200 elo rated players.

About 8 million annotated (score only) Stockfish games. And I am not even halfway.

http://rebel13.nl/download/data.html
I think all of it has value because of the tremendous volume.
What mistakes do players below 1000 tend to make?
Below 1500?
Below 2000?

If we only care about finding the best moves, we need to filter. But there are answers to other interesting questions hidden in that data.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.

jp
Posts: 603
Joined: Mon Apr 23, 2018 5:54 am

Re: For data lovers only

Post by jp » Wed Jul 10, 2019 9:28 am

Dann Corbit wrote:
Wed Jul 10, 2019 7:13 am
I think all of it has value because of the tremendous volume.
What mistakes do players below 1000 tend to make?
But then it'd really help to know what Lichess ratings would correspond to in FIDE elo.

Otherwise, we just know they are very bad, but not exactly how bad.

Rebel
Posts: 4472
Joined: Thu Aug 18, 2011 10:04 am

Re: For data lovers only

Post by Rebel » Wed Jul 10, 2019 10:16 am

Ovyron wrote:
Wed Jul 10, 2019 6:29 am
Rebel wrote:
Wed Jul 10, 2019 5:57 am
So far I have 3.7 million human-human games between at least 2200 elo rated players.
Why was 2200 elo chosen as the cutting point?
Keep the volume reasonable. Maybe I will do 2100 and 2000 later.
Everybody is unique, except me.

jp
Posts: 603
Joined: Mon Apr 23, 2018 5:54 am

Re: For data lovers only

Post by jp » Wed Jul 10, 2019 11:18 am

Rebel wrote:
Wed Jul 10, 2019 10:16 am
Keep the volume reasonable. Maybe I will do 2100 and 2000 later.
Or 2400 and 2500 for smaller download sizes.

EroSennin
Posts: 104
Joined: Fri Apr 09, 2010 1:26 am

Re: For data lovers only

Post by EroSennin » Wed Jul 10, 2019 11:44 am

Ovyron wrote:
Wed Jul 10, 2019 6:29 am
Rebel wrote:
Wed Jul 10, 2019 5:57 am
So far I have 3.7 million human-human games between at least 2200 elo rated players.
Why was 2200 elo chosen as the cutting point? I believe it's equivalent to 1900 elo of chess.com.
The higher you go, the closer the ratings get to chess.com's. At least, Lichess 2500 is already 2500 on chess.com.

Daniel Shawul
Posts: 3664
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia

Re: For data lovers only

Post by Daniel Shawul » Wed Jul 10, 2019 1:18 pm

Rebel wrote:
Wed Jul 10, 2019 5:57 am
Made a start on trying to separate the wheat from the chaff in the giant monthly Lichess downloads.

So far I have 3.7 million human-human games between at least 2200 elo rated players.

About 8 million annotated (score only) Stockfish games. And I am not even halfway.

http://rebel13.nl/download/data.html
This will be useful for supervised neural network training.
It would be good if it were sorted by Elo, from the lowest-rated to the highest-rated pairs, so that it can emulate
the progression of a neural network in self-training.
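
The ordering suggested above could be sketched roughly as follows. This is only an illustration, not anything from the download page: the helper names `mean_elo` and `sort_by_elo` are made up for the example, and using the average of `WhiteElo` and `BlackElo` as the "pair" rating is an assumption (the minimum or maximum would work just as well).

```python
import re
from typing import List

# Matches the standard PGN rating tags, e.g. [WhiteElo "2250"].
# Non-numeric values such as "?" are simply not matched.
ELO_RE = re.compile(r'^\[(White|Black)Elo\s+"(\d+)"\]', re.MULTILINE)


def mean_elo(game: str) -> float:
    """Average of the numeric ratings found in one game's PGN text.

    Returns 0.0 when no numeric rating tag is present (assumption:
    such games should sort first rather than raise an error).
    """
    ratings = [int(value) for _, value in ELO_RE.findall(game)]
    return sum(ratings) / len(ratings) if ratings else 0.0


def sort_by_elo(games: List[str]) -> List[str]:
    """Return the games ordered from lowest-rated to highest-rated pair."""
    return sorted(games, key=mean_elo)


# Three tiny hypothetical games, deliberately out of order.
GAMES = [
    '[WhiteElo "2400"]\n[BlackElo "2500"]\n\n1. e4 1-0',
    '[WhiteElo "2200"]\n[BlackElo "2250"]\n\n1. d4 0-1',
    '[WhiteElo "2300"]\n[BlackElo "2350"]\n\n1. c4 1/2-1/2',
]

if __name__ == "__main__":
    for game in sort_by_elo(GAMES):
        print(mean_elo(game))
```

For files the size of the monthly Lichess dumps an in-memory sort like this would not be practical; bucketing games into rating bands on disk is probably the more realistic variant.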

Rebel
Posts: 4472
Joined: Thu Aug 18, 2011 10:04 am

Re: For data lovers only

Post by Rebel » Wed Jul 10, 2019 3:12 pm

Daniel Shawul wrote:
Wed Jul 10, 2019 1:18 pm
Rebel wrote:
Wed Jul 10, 2019 5:57 am
Made a start on trying to separate the wheat from the chaff in the giant monthly Lichess downloads.

So far I have 3.7 million human-human games between at least 2200 elo rated players.

About 8 million annotated (score only) Stockfish games. And I am not even halfway.

http://rebel13.nl/download/data.html
This will be useful for supervised neural network training.
It would be good if it were sorted by Elo, from the lowest-rated to the highest-rated pairs, so that it can emulate
the progression of a neural network in self-training.
I have heard that before. Just for the record, only the 3.7M human games are filtered at 2200 Elo; the SF games are not.
Everybody is unique, except me.

Rebel
Posts: 4472
Joined: Thu Aug 18, 2011 10:04 am

Re: For data lovers only

Post by Rebel » Wed Jul 10, 2019 3:14 pm

jp wrote:
Wed Jul 10, 2019 11:18 am
Rebel wrote:
Wed Jul 10, 2019 10:16 am
Keep the volume reasonable. Maybe I will do 2100 and 2000 later.
Or 2400 and 2500 for smaller download sizes.
Once you have the human database of 3,784,887 games installed, you can extract higher-rated databases from it (2300, 2400, 2500, etc.) with SOMU 1.5; see the page.
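
For anyone who wants to see what such an extraction amounts to, here is a minimal Python sketch. It is not SOMU and says nothing about how SOMU works internally. It assumes a plain PGN file in which every game starts with an `[Event ...]` tag and carries numeric `WhiteElo` and `BlackElo` tags; the function names and the sample games are hypothetical.

```python
import re
from typing import Dict, Iterator, List

# One PGN tag pair per line, e.g. [WhiteElo "2250"].
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')


def split_games(pgn_text: str) -> Iterator[str]:
    """Yield one game at a time; a new game starts at an [Event tag."""
    current: List[str] = []
    for line in pgn_text.splitlines():
        if line.startswith("[Event ") and current:
            game = "\n".join(current).strip()
            if game:
                yield game
            current = []
        current.append(line)
    game = "\n".join(current).strip()
    if game:
        yield game


def read_tags(game: str) -> Dict[str, str]:
    """Collect the tag pairs of one game into a dict."""
    tags: Dict[str, str] = {}
    for line in game.splitlines():
        match = TAG_RE.match(line)
        if match:
            tags[match.group(1)] = match.group(2)
    return tags


def min_elo(game: str) -> int:
    """Lower of the two ratings; 0 if a rating is missing or not numeric."""
    tags = read_tags(game)
    ratings = []
    for key in ("WhiteElo", "BlackElo"):
        value = tags.get(key, "")
        ratings.append(int(value) if value.isdigit() else 0)
    return min(ratings)


def filter_by_elo(pgn_text: str, threshold: int) -> List[str]:
    """Keep only games in which BOTH players are rated >= threshold."""
    return [g for g in split_games(pgn_text) if min_elo(g) >= threshold]


# Hypothetical three-game sample in the usual PGN layout.
SAMPLE = """[Event "Rated Blitz game"]
[White "a"]
[Black "b"]
[WhiteElo "2250"]
[BlackElo "2310"]

1. e4 e5 1-0

[Event "Rated Blitz game"]
[White "c"]
[Black "d"]
[WhiteElo "2450"]
[BlackElo "2180"]

1. d4 d5 0-1

[Event "Rated Blitz game"]
[White "e"]
[Black "f"]
[WhiteElo "2510"]
[BlackElo "2405"]

1. c4 e5 1/2-1/2
"""

if __name__ == "__main__":
    for threshold in (2200, 2300, 2400, 2500):
        print(threshold, len(filter_by_elo(SAMPLE, threshold)))
```

Requiring both players to meet the threshold matches the "between at least 2200 elo rated players" wording; filtering on the average rating instead would be a one-line change in `min_elo`.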
Everybody is unique, except me.
