Survey: tools for database analysis

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Fulvio
Posts: 395
Joined: Fri Aug 12, 2016 8:43 pm

Survey: tools for database analysis

Post by Fulvio »

I would like to know about the available tools to extract custom statistics from a chess database.

Let me give an example using the latest lichess database (12 million games, 26GB unzipped):
https://database.lichess.org/
and let's say that I want to make a D3.js graph:
https://github.com/d3/d3/wiki/Gallery
which needs a csv file:


elo_range, n_players, avg_n_games
1300, 34, 5
1350, 58, 30
1400, 23, 9
....
The steps are easy:
-calculate the average Elo for each player and assign the player to an elo_range
-count the number of players for each elo_range
-calculate the average number of games played for each elo_range (total number of games of the players in the group divided by the number of players).
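The three steps above could be sketched in Python roughly like this. It is a minimal sketch under my own assumptions: each game has already been reduced to a (white, black, white_elo, black_elo) tuple (e.g. from the White, Black, WhiteElo and BlackElo PGN tags), and players are bucketed by flooring their average Elo to a multiple of 50, matching the 1300/1350/1400 ranges in the example CSV. The function names `elo_table` and `write_csv` are hypothetical.

```python
import csv
import sys
from collections import defaultdict


def elo_table(games, bucket=50):
    """Return {elo_range: (n_players, avg_n_games)}, sorted by elo_range.

    `games` is an iterable of (white, black, white_elo, black_elo) tuples,
    a hypothetical input format assumed for this sketch.
    """
    elo_sum = defaultdict(int)  # player -> sum of the ratings seen in their games
    n_games = defaultdict(int)  # player -> number of games played
    for white, black, white_elo, black_elo in games:
        for player, elo in ((white, white_elo), (black, black_elo)):
            elo_sum[player] += elo
            n_games[player] += 1

    n_players = defaultdict(int)    # elo_range -> number of players
    total_games = defaultdict(int)  # elo_range -> games played by those players
    for player, total in elo_sum.items():
        # Step 1: average Elo per player, floored to a multiple of `bucket`.
        elo_range = int(total / n_games[player] // bucket) * bucket
        # Step 2: count the players in each range.
        n_players[elo_range] += 1
        total_games[elo_range] += n_games[player]

    # Step 3: total games of the group divided by the number of players.
    return {r: (n_players[r], total_games[r] / n_players[r]) for r in sorted(n_players)}


def write_csv(table, out):
    """Write the table with an elo_range, n_players, avg_n_games header."""
    writer = csv.writer(out)
    writer.writerow(["elo_range", "n_players", "avg_n_games"])
    for elo_range, (players, avg_games) in table.items():
        writer.writerow([elo_range, players, round(avg_games, 1)])


if __name__ == "__main__":
    # Made-up sample games, only to show the call sequence.
    sample = [
        ("alice", "bob", 1320, 1410),
        ("alice", "carol", 1340, 1360),
        ("bob", "carol", 1400, 1380),
        ("dave", "alice", 1305, 1330),
    ]
    write_csv(elo_table(sample), sys.stdout)
```

A game between two players in the same range counts once for each of them, which is how I read "total number of games of the players in the group". Memory grows with the number of distinct players, not with the number of games.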

Which tools are able to extract the data?
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Survey: tools for database analysis

Post by jdart »

Not trivial because of the input format (PGN).

If you have programming skills, you might look at python-chess (https://pypi.python.org/pypi/python-chess), which is very handy for writing little scripts that process PGN, among other things.
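Statistics like the one in the first post only need the tag pairs, not the moves, which helps with a file this size. python-chess offers `chess.pgn.read_headers()` for reading just the headers of each game; as a dependency-free alternative, here is a rough stdlib-only sketch that streams a PGN file line by line and keeps only the White, Black, WhiteElo and BlackElo tags. It assumes well-formed PGN in which every game has movetext (at least the result token) after its tags, and it does not unescape `\"` inside tag values. The names `TAG_RE`, `_record` and `iter_games` are my own, not part of any library.

```python
import re

# One PGN tag pair per line, e.g. [WhiteElo "1500"]. Escaped quotes inside
# the value are left as-is (a simplification).
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')


def _record(tags):
    """Return (white, black, white_elo, black_elo), or None if a tag is
    missing or a rating is not a number (such as the PGN placeholder "?")."""
    try:
        return (tags["White"], tags["Black"],
                int(tags["WhiteElo"]), int(tags["BlackElo"]))
    except (KeyError, ValueError):
        return None


def iter_games(lines):
    """Yield one (white, black, white_elo, black_elo) tuple per usable game.

    Movetext lines are skipped without being parsed, and only the current
    game's tags are kept, so memory use stays roughly constant.
    """
    tags = {}
    for line in lines:
        match = TAG_RE.match(line)
        if match:
            tags[match.group(1)] = match.group(2)
        elif tags and line.strip():
            # First movetext line: this game's tag section is complete.
            record = _record(tags)
            if record is not None:
                yield record
            tags = {}
    if tags:  # trailing tags with no movetext after them
        record = _record(tags)
        if record is not None:
            yield record
```

Passing an open file object, e.g. `iter_games(open("games.pgn", encoding="utf-8"))` (placeholder filename and encoding), reads the file one line at a time, so the 26GB file never has to fit in memory, and the resulting tuples can be fed to whatever aggregation you need.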

--Jon
Dann Corbit
Posts: 12537
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Survey: tools for database analysis

Post by Dann Corbit »

There is a language called CQL (Chess Query Language).

http://www.gadycosteff.com/cql/

They were even so nice as to send me the source code.

In general, tools for this sort of thing are like hen's teeth and pickle smoke.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Fulvio
Posts: 395
Joined: Fri Aug 12, 2016 8:43 pm

Re: Survey: tools for database analysis

Post by Fulvio »

jdart wrote:(https://pypi.python.org/pypi/python-chess), which is very handy for writing little scripts that process PGN, among other things.
Thanks, I'm impressed by the quality of their documentation.
Fulvio
Posts: 395
Joined: Fri Aug 12, 2016 8:43 pm

Re: Survey: tools for database analysis

Post by Fulvio »

Dann Corbit wrote:There is a language called CQL (Chess Query Language).
Thanks, but I think that CQL is not able to produce statistics.
Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: Survey: tools for database analysis

Post by Henk »

Just saw a video where Martin says the database is an implementation detail: a plug-in, just as the web is a plug-in. That would make the job of a database administrator ridiculous.

So maybe it's better to call it data analysis.
Fulvio
Posts: 395
Joined: Fri Aug 12, 2016 8:43 pm

Re: Survey: tools for database analysis

Post by Fulvio »

Henk wrote:Just saw a video where Martin
If it's interesting please post the link :)
Henk wrote: says database is an implementation detail.
That's actually the meaning of my question.
The data analysis example I gave (counting the players based on their average Elo) was intentionally trivial.
However, many people cannot do that, not because it's difficult, but because of the implementation detail (a PGN file of 26GB).
That's why I wonder whether a tool exists that solves this problem (i.e. makes access to the data easy).
And if it does not exist, I wonder whether that is because very few people are interested in it.
Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: Survey: tools for database analysis

Post by Henk »

This one. It's somewhere near the end of the video, perhaps after about 40 minutes.

https://www.youtube.com/watch?v=lZq8Jlq18Ec&t=81s
Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: Survey: tools for database analysis

Post by Henk »

I don't understand/trust all these videos. Some also claim that object-oriented programming is bad, or that functional programming is bad. So what can you use instead?

I can understand that one should reduce dependencies between modules.