For data lovers only

Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

For data lovers only

Post by Rebel »

I have made a start on trying (emphasis added) to separate the chaff from the corn in the giant monthly Lichess downloads.

So far I have 3.7 million human-human games between at least 2200 elo rated players.

About 8 million annotated (score only) Stockfish games. And I am not even halfway.

http://rebel13.nl/download/data.html
90% of coding is debugging, the other 10% is writing bugs.
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: For data lovers only

Post by Ovyron »

Rebel wrote: Wed Jul 10, 2019 7:57 am So far I have 3.7 million human-human games between at least 2200 elo rated players.
Why was 2200 elo chosen as the cutoff? I believe it's equivalent to 1900 elo on chess.com.
Your beliefs create your reality, so be careful what you wish for.
Dann Corbit
Posts: 12537
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: For data lovers only

Post by Dann Corbit »

Rebel wrote: Wed Jul 10, 2019 7:57 am I have made a start on trying (emphasis added) to separate the chaff from the corn in the giant monthly Lichess downloads.

So far I have 3.7 million human-human games between at least 2200 elo rated players.

About 8 million annotated (score only) Stockfish games. And I am not even halfway.

http://rebel13.nl/download/data.html
I think all of it has value because of the tremendous volume.
What mistakes do players below 1000 tend to make?
Below 1500?
Below 2000?

If we only care about finding the best moves, we need to filter. But there are answers to other interesting questions hidden in that data.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
jp
Posts: 1470
Joined: Mon Apr 23, 2018 7:54 am

Re: For data lovers only

Post by jp »

Dann Corbit wrote: Wed Jul 10, 2019 9:13 am I think all of it has value because of the tremendous volume.
What mistakes do players below **** tend to make?
But then it'd really help to know what Lichess ratings would correspond to in FIDE elo.

Otherwise, we just know they are very bad, but not exactly how bad.
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: For data lovers only

Post by Rebel »

Ovyron wrote: Wed Jul 10, 2019 8:29 am
Rebel wrote: Wed Jul 10, 2019 7:57 am So far I have 3.7 million human-human games between at least 2200 elo rated players.
Why was 2200 elo chosen as the cutoff?
Keep the volume reasonable. Maybe I will do 2100 and 2000 later.
90% of coding is debugging, the other 10% is writing bugs.
jp
Posts: 1470
Joined: Mon Apr 23, 2018 7:54 am

Re: For data lovers only

Post by jp »

Rebel wrote: Wed Jul 10, 2019 12:16 pm Keep the volume reasonable. Maybe I will do 2100 and 2000 later.
Or 2400 and 2500 for smaller download sizes.
EroSennin
Posts: 133
Joined: Fri Apr 09, 2010 3:26 am

Re: For data lovers only

Post by EroSennin »

Ovyron wrote: Wed Jul 10, 2019 8:29 am
Rebel wrote: Wed Jul 10, 2019 7:57 am So far I have 3.7 million human-human games between at least 2200 elo rated players.
Why was 2200 elo chosen as the cutoff? I believe it's equivalent to 1900 elo on chess.com.
The higher you go, the closer the ratings get to chess.com's. At least, Lichess 2500 is already 2500 on chess.com.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: For data lovers only

Post by Daniel Shawul »

Rebel wrote: Wed Jul 10, 2019 7:57 am I have made a start on trying (emphasis added) to separate the chaff from the corn in the giant monthly Lichess downloads.

So far I have 3.7 million human-human games between at least 2200 elo rated players.

About 8 million annotated (score only) Stockfish games. And I am not even halfway.

http://rebel13.nl/download/data.html
This will be useful for supervised neural network training.
It would be good if it were sorted by elo, from the lowest-rated to the highest-rated pairs, so that it can emulate
the progression of the neural network in self-training.
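A rough sketch of that ordering, assuming the games have already been parsed into records. The `Game` record and the `curriculum_order` name are made up for illustration, and using the average of the two ratings as the sort key is an assumption; the post does not say how a pair should be ranked.

```python
from typing import NamedTuple


class Game(NamedTuple):
    """Hypothetical record for one parsed game."""
    white_elo: int
    black_elo: int
    moves: str


def curriculum_order(games):
    """Return the games sorted from the lowest-rated pair to the highest.

    Assumption: a pair is ranked by the average of its two ratings.
    sorted() is stable, so games with equal averages keep their input order.
    """
    return sorted(games, key=lambda g: (g.white_elo + g.black_elo) / 2)


if __name__ == "__main__":
    sample = [
        Game(2500, 2400, "1. c4 e5"),
        Game(2200, 2210, "1. e4 e5"),
        Game(2300, 2350, "1. d4 d5"),
    ]
    for game in curriculum_order(sample):
        print(game.white_elo, game.black_elo, game.moves)
```

Feeding the training loop in this order would present the weakest pairs first and the strongest last.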
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: For data lovers only

Post by Rebel »

Daniel Shawul wrote: Wed Jul 10, 2019 3:18 pm
Rebel wrote: Wed Jul 10, 2019 7:57 am I have made a start on trying (emphasis added) to separate the chaff from the corn in the giant monthly Lichess downloads.

So far I have 3.7 million human-human games between at least 2200 elo rated players.

About 8 million annotated (score only) Stockfish games. And I am not even halfway.

http://rebel13.nl/download/data.html
This will be useful for supervised neural network training.
It would be good if it were sorted by elo, from the lowest-rated to the highest-rated pairs, so that it can emulate
the progression of the neural network in self-training.
I have heard that before. Just for the record, only the 3.7M human games are filtered at 2200 elo; the SF games are not.
90% of coding is debugging, the other 10% is writing bugs.
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: For data lovers only

Post by Rebel »

jp wrote: Wed Jul 10, 2019 1:18 pm
Rebel wrote: Wed Jul 10, 2019 12:16 pm Keep the volume reasonable. Maybe I will do 2100 and 2000 later.
Or 2400 and 2500 for smaller download sizes.
Once you have the human database of 3,784,887 games installed, you can extract higher-rated databases from it (2300, 2400, 2500, etc.) with SOMU 1.5; see the page.
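For anyone who wants to do the same kind of cut without SOMU, here is a minimal sketch. It is not SOMU and makes no claim about how SOMU works. It assumes the PGN carries `WhiteElo` and `BlackElo` header tags, as the Lichess exports do, and keeps a game only when both players are at or above the threshold. The function names are made up for illustration.

```python
import io
import re

# Matches a PGN tag pair such as: [WhiteElo "2310"]
HEADER_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')


def split_games(stream):
    """Yield (headers, raw_text) for each game in a PGN text stream."""
    headers, lines, in_moves = {}, [], False
    for line in stream:
        match = HEADER_RE.match(line)
        if match and in_moves:
            # A tag line after movetext marks the start of the next game.
            yield headers, "".join(lines)
            headers, lines, in_moves = {}, [], False
        if match:
            headers[match.group(1)] = match.group(2)
        elif line.strip():
            in_moves = True
        lines.append(line)
    if headers:
        yield headers, "".join(lines)


def lower_elo(headers):
    """Return the lower of the two ratings, or None if either is missing."""
    try:
        return min(int(headers["WhiteElo"]), int(headers["BlackElo"]))
    except (KeyError, ValueError):
        return None  # e.g. a "?" rating or a game without Elo tags


def filter_by_elo(stream, threshold):
    """Yield the raw text of games where both players are >= threshold."""
    for headers, text in split_games(stream):
        elo = lower_elo(headers)
        if elo is not None and elo >= threshold:
            yield text


SAMPLE = '''[Event "Rated Blitz game"]
[WhiteElo "2310"]
[BlackElo "2254"]

1. e4 e5 1-0

[Event "Rated Blitz game"]
[WhiteElo "2450"]
[BlackElo "1980"]

1. d4 d5 0-1

[Event "Rated Blitz game"]
[WhiteElo "2512"]
[BlackElo "2405"]

1. c4 e5 1/2-1/2
'''

if __name__ == "__main__":
    for cutoff in (2200, 2400):
        kept = list(filter_by_elo(io.StringIO(SAMPLE), cutoff))
        print(cutoff, len(kept))
```

Because it reads line by line, the same functions can be pointed at an open file instead of the `io.StringIO` sample, so a large database does not have to fit in memory.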
90% of coding is debugging, the other 10% is writing bugs.