For data lovers only

Discussion of anything and everything relating to chess playing software and machines.

Moderators: bob, hgm, Harvey Williamson

Ovyron
Posts: 2829
Joined: Tue Jul 03, 2007 2:30 am

Re: For data lovers only

Post by Ovyron » Thu Jul 11, 2019 2:01 am

jorose wrote:
Thu Jul 11, 2019 1:27 am
It's hard to say exactly. This link gives a rough impression and is based on player surveys. I imagine the data has some pretty big error bars though. My understanding is the gap used to be larger between the two sites, but chess.com ratings have inflated quite a bit in recent years.
Interesting. I had 1800 at lichess and can't break 1400 at chess.com (survey says I should have no problem going over 1600), now I wonder why I'm such a big outlier.
Great spirits have always encountered violent opposition from mediocre minds.

Rebel
Posts: 4788
Joined: Thu Aug 18, 2011 10:04 am

Re: For data lovers only

Post by Rebel » Thu Jul 11, 2019 5:56 am

Dann Corbit wrote:
Thu Jul 11, 2019 1:26 am
Rebel wrote:
Wed Jul 10, 2019 11:22 pm
Dann Corbit wrote:
Wed Jul 10, 2019 10:15 pm
Rebel wrote:
Wed Jul 10, 2019 9:32 pm
All done...

In total 25.6 million annotated Stockfish games.

Say the average game length is 60 moves (120 plies); that gives:

25.000.000 x 120 = 3.000.000.000 positions

for NN training or other purposes.
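The estimate counts one position per ply, and a 60-move game has 120 plies. A trivial sketch of the same arithmetic, with the 60-move average taken as the assumption stated in the post:

```python
def estimate_positions(games: int, avg_moves: int) -> int:
    """Rough position count: one position per ply, two plies per full move."""
    plies_per_game = avg_moves * 2
    return games * plies_per_game

print(estimate_positions(25_000_000, 60))  # 3000000000 (the rounded figure above)
print(estimate_positions(25_600_000, 60))  # 3072000000 (using the full 25.6 million)
```

With the full 25.6 million games, the same assumption gives roughly 3.07 billion positions.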
If you post a link, I will be grateful to greedily snatch a copy.
It's on the link in the OP - http://rebel13.nl/download/data.html
Too bad there are no search depth indicators in the data.
It is hard to know what the search time means since we don't know the hardware, so the evaluations can only be taken as a very crude measurement.
Indeed.

But I am still optimistic since I realize that:

1. SF10 at 0.1 second already plays at 3000 Elo.
2. SF10 at 1.0 second already plays at 3250 Elo.
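To put the two quoted figures in perspective, the standard logistic Elo formula E = 1 / (1 + 10^(-D/400)) turns a rating gap D into an expected score. A small sketch, taking the post's 3000 and 3250 figures at face value and assuming both are on the same rating scale:

```python
def expected_score(elo_diff: float) -> float:
    """Expected score of the higher-rated side under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** (-elo_diff / 400))

# Figures quoted above: SF10 at 1.0 s (3250) vs SF10 at 0.1 s (3000)
print(round(expected_score(3250 - 3000), 3))  # 0.808
```

In other words, if the ratings hold, the 1.0 s version would be expected to score about 81% against the 0.1 s version.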

A chess tree would be nice.

Maybe even a Polyglot book.
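A Polyglot book is a flat binary file of 16-byte big-endian entries (64-bit position key, 16-bit move, 16-bit weight, 32-bit learn field), sorted by key so a position can be found by binary search. The sketch below writes and reads such entries with Python's `struct` module. It assumes the Zobrist keys are already computed (that needs Polyglot's fixed table of 781 random numbers, omitted here); the keys in the demo are made-up placeholders, and how weights would be derived from this game data is left open:

```python
import os
import struct
import tempfile

# One Polyglot entry: key, move, weight, learn -> 16 bytes, big-endian
ENTRY = struct.Struct(">QHHI")
PROMO = "_nbrq"  # promotion field: 0 = none, 1 = knight ... 4 = queen

def encode_move(uci: str) -> int:
    """Pack a UCI move ('e2e4', 'a7a8q') into Polyglot's 16-bit move field.
    Castling must be given as king-takes-rook (e1h1), as Polyglot stores it."""
    ff, fr = ord(uci[0]) - 97, int(uci[1]) - 1
    tf, tr = ord(uci[2]) - 97, int(uci[3]) - 1
    promo = PROMO.index(uci[4]) if len(uci) > 4 else 0
    return tf | (tr << 3) | (ff << 6) | (fr << 9) | (promo << 12)

def decode_move(move: int) -> str:
    """Inverse of encode_move."""
    tf, tr = move & 7, (move >> 3) & 7
    ff, fr = (move >> 6) & 7, (move >> 9) & 7
    promo = (move >> 12) & 7
    return f"{chr(97 + ff)}{fr + 1}{chr(97 + tf)}{tr + 1}{PROMO[promo] if promo else ''}"

def write_book(path: str, entries) -> None:
    """entries: (key, uci_move, weight) tuples; written sorted by key as the format requires."""
    rows = sorted((key, encode_move(m), weight, 0) for key, m, weight in entries)
    with open(path, "wb") as f:
        for row in rows:
            f.write(ENTRY.pack(*row))

def lookup(path: str, key: int) -> list:
    """Binary-search the key-sorted file and return [(uci_move, weight), ...]."""
    n = os.path.getsize(path) // ENTRY.size
    with open(path, "rb") as f:
        def key_at(i: int) -> int:
            f.seek(i * ENTRY.size)
            return ENTRY.unpack(f.read(ENTRY.size))[0]
        lo, hi = 0, n
        while lo < hi:  # find the first entry whose key >= key
            mid = (lo + hi) // 2
            if key_at(mid) < key:
                lo = mid + 1
            else:
                hi = mid
        f.seek(lo * ENTRY.size)
        hits = []
        for _ in range(lo, n):
            k, move, weight, _learn = ENTRY.unpack(f.read(ENTRY.size))
            if k != key:
                break
            hits.append((decode_move(move), weight))
        return hits

if __name__ == "__main__":
    book = os.path.join(tempfile.mkdtemp(), "demo.bin")
    # Placeholder keys, not real Polyglot hashes
    write_book(book, [(5, "e2e4", 10), (3, "d2d4", 7), (5, "d2d4", 4)])
    print(lookup(book, 5))  # [('d2d4', 4), ('e2e4', 10)]
```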
90% of coding is debugging, the other 10% is writing bugs.

Dann Corbit
Posts: 10204
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: For data lovers only

Post by Dann Corbit » Thu Jul 11, 2019 6:06 am

All the downloads failed at the very end for me (using Firefox derivative PaleMoon).
But I did something underhanded to find the absolute address and downloaded them using wget (I was nice, one at a time).
I know you want to enforce your own interface, but I figured you would not get mad since I tried to do it that way first.

I would be curious to know if anyone else got the failures at the very end like I did.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.

jp
Posts: 840
Joined: Mon Apr 23, 2018 5:54 am

Re: For data lovers only

Post by jp » Thu Jul 11, 2019 8:12 am

jorose wrote:
Thu Jul 11, 2019 1:27 am
Ovyron wrote:
Thu Jul 11, 2019 1:07 am
jorose wrote:
Wed Jul 10, 2019 7:19 pm
That might have been true at some point in the past, but nowadays the distributions match fairly closely in that rating range. 2200 Lichess blitz is roughly 2200 chess.com blitz.
When analyzing the quality of the moves people play, it seems chess.com's 1300 Elo players play at the same level as Lichess's 1600 Elo players. I wonder if there's a tool to check this automatically. But, yeah, my mistake was assuming this was a constant offset in rating; I didn't expect the difference to shrink as ratings rise, so that Lichess "catches up" at some point.
It's hard to say exactly. This link gives a rough impression and is based on player surveys. I imagine the data has some pretty big error bars though. My understanding is the gap used to be larger between the two sites, but chess.com ratings have inflated quite a bit in recent years.
What would be useful would be comparison with FIDE ratings.

Rebel
Posts: 4788
Joined: Thu Aug 18, 2011 10:04 am

Re: For data lovers only

Post by Rebel » Thu Jul 11, 2019 8:50 am

Dann Corbit wrote:
Thu Jul 11, 2019 6:06 am
All the downloads failed at the very end for me (using Firefox derivative PaleMoon).
But I did something underhanded to find the absolute address and downloaded them using wget (I was nice, one at a time).
I know you want to enforce your own interface, but I figured you would not get mad since I tried to do it that way first.

I would be curious to know if anyone else got the failures at the very end like I did.
Tried comp-2017-12.7z with IE and Firefox, both ok. So the thing to get mad at is PaleMoon :D

chrisw
Posts: 2190
Joined: Tue Apr 03, 2012 2:28 pm

Re: For data lovers only

Post by chrisw » Thu Jul 11, 2019 10:24 am

Dann Corbit wrote:
Thu Jul 11, 2019 6:06 am
All the downloads failed at the very end for me (using Firefox derivative PaleMoon).
But I did something underhanded to find the absolute address and downloaded them using wget (I was nice, one at a time).
I know you want to enforce your own interface, but I figured you would not get mad since I tried to do it that way first.

I would be curious to know if anyone else got the failures at the very end like I did.
I downloaded 2017 to current without any problem apart from the huge amount of data. It takes several days here in rural France with slow and unreliable internet. The files, btw, get rejected as too large by memory sticks, so bear in mind how you're going to transfer them between PCs. It's possible some part of your system is rejecting the final save on grounds of size (as happens with all my memory sticks).
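Many memory sticks ship formatted as FAT32, which cannot store a single file of 4 GiB or more; that is a plausible (though unconfirmed) cause of the rejections. Reformatting the stick as exFAT or NTFS avoids the limit. Another option is to split the file and rejoin it on the other PC. A minimal sketch, where the 2 GiB default part size and the `.partNNN` suffix are arbitrary choices:

```python
import os

CHUNK = 1 << 20  # copy 1 MiB at a time so memory use stays flat

def split_file(path: str, part_size: int = 2 * 1024**3) -> list[str]:
    """Split path into path.part000, path.part001, ... each at most part_size bytes."""
    parts = []
    with open(path, "rb") as src:
        while True:
            name = f"{path}.part{len(parts):03d}"
            written = 0
            with open(name, "wb") as dst:
                while written < part_size:
                    buf = src.read(min(CHUNK, part_size - written))
                    if not buf:
                        break
                    dst.write(buf)
                    written += len(buf)
            if written == 0:
                os.remove(name)  # source exhausted: drop the empty trailing part
                break
            parts.append(name)
            if written < part_size:
                break
    return parts

def join_files(parts: list[str], out_path: str) -> None:
    """Concatenate the parts, in order, to rebuild the original file."""
    with open(out_path, "wb") as dst:
        for name in parts:
            with open(name, "rb") as src:
                while buf := src.read(CHUNK):
                    dst.write(buf)
```

Copy the `.partNNN` files across, then call `join_files` with the same list of names on the target PC.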

jp
Posts: 840
Joined: Mon Apr 23, 2018 5:54 am

Re: For data lovers only

Post by jp » Thu Jul 11, 2019 4:53 pm

chrisw wrote:
Thu Jul 11, 2019 10:24 am
I downloaded 2017 to current without any problem apart from the huge amount of data. It takes several days here in rural France with slow and unreliable internet. The files, btw, get rejected as too large by memory sticks, so bear in mind how you're going to transfer them between PCs. It's possible some part of your system is rejecting the final save on grounds of size (as happens with all my memory sticks).
Yes, I think in some places, and not just rural France, it'll be almost impossible to download huge files. (Theoretically, it could be done very, very slowly, but the connection will break before such a long download finishes.) But for PC-to-PC transfers, surely an external hard drive should be fine.
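A dropped connection does not have to mean starting over: if the server honours HTTP `Range` requests (answering `206 Partial Content`), a client can ask for only the bytes it is still missing. `wget -c` and `curl -C -` do exactly this. A minimal standard-library sketch of the same idea; whether the download server supports range requests is an assumption, not something stated in the thread:

```python
import os
import urllib.error
import urllib.request

def range_headers(dest: str) -> dict:
    """Request only the bytes after what is already on disk (no header if nothing yet)."""
    have = os.path.getsize(dest) if os.path.exists(dest) else 0
    return {"Range": f"bytes={have}-"} if have else {}

def resume_download(url: str, dest: str, chunk: int = 1 << 20) -> None:
    """Continue (or start) downloading url into dest; call again after a dropped connection."""
    req = urllib.request.Request(url, headers=range_headers(dest))
    try:
        resp = urllib.request.urlopen(req, timeout=60)
    except urllib.error.HTTPError as err:
        if err.code == 416:  # range starts at end of file: typically already complete
            return
        raise
    with resp:
        # 206 Partial Content: server honoured the range, so append.
        # 200 OK: server ignored it and sent the whole file, so start over.
        mode = "ab" if resp.status == 206 else "wb"
        with open(dest, mode) as out:
            while buf := resp.read(chunk):
                out.write(buf)
```

Usage: `resume_download(url, "comp-2017-12.7z")` with `url` set to the file's direct address; rerun it after each interruption until it returns without error.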

chrisw
Posts: 2190
Joined: Tue Apr 03, 2012 2:28 pm

Re: For data lovers only

Post by chrisw » Thu Jul 11, 2019 6:58 pm

jp wrote:
Thu Jul 11, 2019 4:53 pm
chrisw wrote:
Thu Jul 11, 2019 10:24 am
I downloaded 2017 to current without any problem apart from the huge amount of data. It takes several days here in rural France with slow and unreliable internet. The files, btw, get rejected as too large by memory sticks, so bear in mind how you're going to transfer them between PCs. It's possible some part of your system is rejecting the final save on grounds of size (as happens with all my memory sticks).
Yes, I think in some places, and not just rural France, it'll be almost impossible to download huge files. (Theoretically, it could be done very, very slowly, but the connection will break before such a long download finishes.) But for PC-to-PC transfers, surely an external hard drive should be fine.
Chrome downloads appear to have a “resume” feature: if communications are cut, you can restart from where you left off. 8GB data files transfer fine between networked PCs, or they do on mine, just not on sticks. Then there’s the decompression, so better have some spare HD space. And then the clean and cull: they break SKID. Python chess routines will take a couple of days to process them (I tried), but Ed’s utility will cull them down to something manageable quite fast. This is the point where you realise “big data” management is a thing in itself.
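Part of what makes full parsing slow is building every move of every game. When the cull only depends on header tags, a stream-oriented pass that reads one game at a time and looks only at the tags keeps memory flat and avoids parsing movetext. The sketch below is not Ed's utility (the thread does not describe how it works); it splits standard PGN on `[Event ` tags and keeps games matching a hypothetical filter, here decisive results only:

```python
import re
from typing import Callable, Iterator, TextIO

TAG = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')

def iter_games(stream: TextIO) -> Iterator[str]:
    """Yield one game's text at a time, so memory use stays flat for any file size.
    Assumes standard PGN where every game begins with an [Event tag."""
    buf: list[str] = []
    for line in stream:
        if line.startswith("[Event ") and buf:
            yield "".join(buf)
            buf = []
        if buf or line.strip():  # ignore blank lines before the first game
            buf.append(line)
    if buf:
        yield "".join(buf)

def headers(game: str) -> dict:
    """Parse the tag pairs only; the movetext is never parsed."""
    return {m.group(1): m.group(2) for line in game.splitlines() if (m := TAG.match(line))}

def cull(src: TextIO, dst: TextIO, keep: Callable[[dict], bool]) -> int:
    """Copy games whose headers satisfy keep() from src to dst; return how many were kept."""
    kept = 0
    for game in iter_games(src):
        if keep(headers(game)):
            dst.write(game)
            kept += 1
    return kept

def decisive(h: dict) -> bool:
    """Hypothetical example filter: keep only decisive games."""
    return h.get("Result") in ("1-0", "0-1")
```

Usage, with placeholder file names: `with open("games.pgn", encoding="utf-8", errors="replace") as src, open("culled.pgn", "w", encoding="utf-8") as dst: cull(src, dst, decisive)`. If python-chess is already in use, its `chess.pgn.read_headers` offers a similar header-only scan.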

It would be good actually if somebody saved a culled data set for “community” access, but that’s true for SF saved test games and LCZERO test games too. At the moment one has to resort to web-scraping operations: all the formats are different, some are eval-ed and some not, and so on. It would be nice for independent NN developers.
