Open Chess Game Database Standard

Discussion of chess software programming and technical issues.

Moderator: Ras

mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Open Chess Game Database Standard

Post by mvanthoor »

dangi12012 wrote: Mon Nov 15, 2021 12:35 am A Thread pool for IO? Asynchronous IO runs on a single thread.
If you look at Sopel's history - he is a forum troll so better not engage with him.
Maybe, before labeling someone a troll, you should also take into account that he has contributed a great deal to Stockfish's NNUE development, especially with regard to optimization.
Author of Rustic, an engine written in Rust.
Sopel
Posts: 391
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: Open Chess Game Database Standard

Post by Sopel »

dangi12012 wrote: Mon Nov 15, 2021 12:35 am
Sopel wrote: Sun Nov 14, 2021 10:23 pm You want to read the PGNs asynchronously. So either use a dedicated IO thread pool or mmap.
A Thread pool for IO? Asynchronous IO runs on a single thread.
If you look at Sopel's history - he is a forum troll so better not engage with him.
Real async IO is a pain in the ass to do in a portable way. A dedicated thread pool for IO works well enough and is simple to set up.
dangi12012 wrote:No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.

Maybe you copied your stockfish commits from someone else too?
I will look into that.
Guenther
Posts: 4718
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Open Chess Game Database Standard

Post by Guenther »

dangi12012 wrote: Mon Nov 15, 2021 12:35 am
Sopel wrote: Sun Nov 14, 2021 10:23 pm You want to read the PGNs asynchronously. So either use a dedicated IO thread pool or mmap.
...
If you look at Sopel's history - he is a forum troll so better not engage with him.
Quite the opposite, which was clear since your first posts here...
https://rwbc-chess.de

[Trolls don't exist...]
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

Sopel wrote: Sun Nov 14, 2021 10:23 pm You want to read the PGNs asynchronously. So either use a dedicated IO thread pool or mmap.
Could you tell me more? Do you have ideas for using them efficiently? It would be best if you could provide some code or pull requests (on GitHub).

I have already tested memory-mapped files (mmap). The result is surprisingly good, but it is only second-best, behind the fastest method by a significant margin, while its code is more complicated.

I don’t have problems with coming up with ideas and/or implementing them in general. However, I sometimes struggle with how to apply them efficiently. In this case, it is multi-threading. Both reading the input (the PGN file) and writing to the database (inserting records) are almost entirely sequential. Thus, logically, multi-threading won't help much, especially when the other steps (such as parsing PGN tags) are very fast too (not much work to share between threads).
https://banksiagui.com
The most feature-rich chess GUI, based on open-source Banksia - the chess tournament manager
Sopel
Posts: 391
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: Open Chess Game Database Standard

Post by Sopel »

phhnguyen wrote: Mon Nov 15, 2021 12:31 pm
Sopel wrote: Sun Nov 14, 2021 10:23 pm You want to read the PGNs asynchronously. So either use a dedicated IO thread pool or mmap.
Could you tell me more? Do you have ideas for using them efficiently? It would be best if you could provide some code or pull requests (on GitHub).

I have already tested memory-mapped files (mmap). The result is surprisingly good, but it is only second-best, behind the fastest method by a significant margin, while its code is more complicated.

I don’t have problems with coming up with ideas and/or implementing them in general. However, I sometimes struggle with how to apply them efficiently. In this case, it is multi-threading. Both reading the input (the PGN file) and writing to the database (inserting records) are almost entirely sequential. Thus, logically, multi-threading won't help much, especially when the other steps (such as parsing PGN tags) are very fast too (not much work to share between threads).
The thing is, your benchmark does no work other than reading, so synchronous solutions will not look worse than asynchronous ones. You should add some processing between the reads to see the issue. For simple asynchronous IO, std::async (https://en.cppreference.com/w/cpp/thread/async) is an easy solution that's good enough.
In this case, it is multi-threading. Both reading the input (the PGN file) and writing to the database (inserting records) are almost entirely sequential.
I highly doubt this is the case. You have a pipeline with 4 stages. 1. Reading the file. 2. Parsing the file. 3. Creating the import statements. 4. Importing into DB
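To make the std::async suggestion concrete, here is a minimal sketch of read-ahead: while one chunk is being processed, the next one is already being read on another thread. The names read_chunk and for_each_chunk_async and the chunk-based interface are assumptions made for illustration, not code from this thread. A real importer would also have to carry a game that straddles two chunks over to the next one.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <functional>
#include <future>
#include <string>

// Reads up to `chunk_size` bytes from `in`; returns an empty string at EOF.
std::string read_chunk(std::ifstream& in, std::size_t chunk_size) {
    std::string buf(chunk_size, '\0');
    in.read(&buf[0], static_cast<std::streamsize>(buf.size()));
    buf.resize(static_cast<std::size_t>(in.gcount()));
    return buf;
}

// Calls `process(chunk)` for every chunk of the file while the next chunk is
// already being read on another thread. Returns the number of bytes read.
template <typename Process>
std::size_t for_each_chunk_async(const std::string& path,
                                 std::size_t chunk_size,
                                 Process process) {
    assert(chunk_size > 0);
    std::ifstream in(path, std::ios::binary);
    if (!in) return 0;

    std::size_t total = 0;
    // std::launch::async forces the read onto a separate thread.
    auto next = std::async(std::launch::async, read_chunk,
                           std::ref(in), chunk_size);
    for (;;) {
        std::string chunk = next.get();  // wait for the pending read
        if (chunk.empty()) break;        // end of file
        // Start reading the following chunk before processing this one.
        next = std::async(std::launch::async, read_chunk,
                          std::ref(in), chunk_size);
        total += chunk.size();
        process(chunk);                  // overlaps with the read above
    }
    return total;
}
```

Only one read is in flight at a time, so the stream is never touched by two threads at once. With a processing step that does real work (parsing, building the import statements), the read of chunk N+1 overlaps with the processing of chunk N, which is the effect a read-only benchmark cannot show.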
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Open Chess Game Database Standard

Post by dangi12012 »

phhnguyen wrote: Mon Nov 15, 2021 12:31 pm I don’t have problems with coming up with ideas and/or implementing them in general. However, I sometimes struggle with how to apply them efficiently. In this case, it is multi-threading. Both reading the input (the PGN file) and writing to the database (inserting records) are almost entirely sequential. Thus, logically, multi-threading won't help much, especially when the other steps (such as parsing PGN tags) are very fast too (not much work to share between threads).
As usual, very bad trolling advice from Sopel:
Sopel wrote: Mon Nov 15, 2021 2:47 pm I highly doubt this is the case. You have a pipeline with 4 stages. 1. Reading the file. 2. Parsing the file. 3. Creating the import statements. 4. Importing into DB
Here's the answer:
Of course you can parse a file multithreaded and very fast - and here is how:
You have to index the offsets and lengths of all games in a PGN without parsing. Just the raw offsets, with a JSON or text seeker. This can be insanely fast because maybe you only search for {} tokens or double newlines.
For the Lichess DB I did this already, and there I just search for doubled newlines with C++ memchr() on a memory-mapped file.

Once you have this vector of offsets and lengths, you can easily spawn 32 threads and parse each offset and length separately via a mapped file.

The trick is that seeking for simple tokens in phase 0 is faster than fully parsing the PGN - so you just remember the pointer offset where each game starts, and on a second pass you parse in parallel. If your game is a class, you can even generate functions to prepare an SQL statement for the insert. This can also be done in parallel.

DB inserts cannot be done in parallel with SQLite (with proper server SQL DBs it's faster) - but SQLite is thread-safe anyway, so no worries: just commit transactions from multiple threads. It should also be a little bit faster because inserts don't stall.
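The two passes described above might look roughly like this. It is only a sketch under simplifying assumptions: games are taken to start at a '[' that follows a blank line, line endings are plain \n, and GameSpan, index_games and parse_parallel are hypothetical names. The buffer can be a memory-mapped file or any other contiguous block of bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

struct GameSpan {
    std::size_t offset;  // byte offset of the first '[' of the game
    std::size_t length;  // bytes up to the next game (or the end of the data)
};

// Phase 0: find game starts without parsing. A game is assumed to start at a
// '[' that is either the very first byte or directly follows a blank line.
std::vector<GameSpan> index_games(const char* data, std::size_t size) {
    std::vector<std::size_t> starts;
    if (size > 0 && data[0] == '[') starts.push_back(0);

    const char* p = data;
    const char* end = data + size;
    while (p < end) {
        const void* hit = std::memchr(p, '\n', static_cast<std::size_t>(end - p));
        if (hit == nullptr) break;
        const char* nl = static_cast<const char*>(hit);
        // "\n\n[" means: blank line followed by the next tag section.
        if (end - nl >= 3 && nl[1] == '\n' && nl[2] == '[') {
            starts.push_back(static_cast<std::size_t>(nl + 2 - data));
        }
        p = nl + 1;
    }

    std::vector<GameSpan> games;
    for (std::size_t i = 0; i < starts.size(); ++i) {
        const std::size_t next = (i + 1 < starts.size()) ? starts[i + 1] : size;
        games.push_back({starts[i], next - starts[i]});
    }
    return games;
}

// Phase 1: call `parse(ptr, length, index)` for every game, with the games
// distributed round-robin over `num_threads` threads. `parse` must be safe
// to call from several threads at once.
template <typename Parse>
void parse_parallel(const char* data, const std::vector<GameSpan>& games,
                    unsigned num_threads, Parse parse) {
    assert(num_threads > 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            for (std::size_t i = t; i < games.size(); i += num_threads) {
                parse(data + games[i].offset, games[i].length, i);
            }
        });
    }
    for (auto& w : workers) w.join();
}
```

A '[' after a blank line inside a multi-line comment would be misread as a game start by this heuristic, so a production indexer would need a stricter rule.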
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Sopel
Posts: 391
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: Open Chess Game Database Standard

Post by Sopel »

dangi12012 wrote: Mon Nov 15, 2021 3:12 pm
phhnguyen wrote: Mon Nov 15, 2021 12:31 pm I don’t have problems with coming up with ideas and/or implementing them in general. However, I sometimes struggle with how to apply them efficiently. In this case, it is multi-threading. Both reading the input (the PGN file) and writing to the database (inserting records) are almost entirely sequential. Thus, logically, multi-threading won't help much, especially when the other steps (such as parsing PGN tags) are very fast too (not much work to share between threads).
As usual, very bad trolling advice from Sopel:
Sopel wrote: Mon Nov 15, 2021 2:47 pm I highly doubt this is the case. You have a pipeline with 4 stages. 1. Reading the file. 2. Parsing the file. 3. Creating the import statements. 4. Importing into DB
Here's the answer:
Of course you can parse a file multithreaded and very fast - and here is how:
You have to index the offsets and lengths of all games in a PGN without parsing. Just the raw offsets, with a JSON or text seeker. This can be insanely fast because maybe you only search for {} tokens or double newlines.
For the Lichess DB I did this already, and there I just search for doubled newlines with C++ memchr() on a memory-mapped file.

Once you have this vector of offsets and lengths, you can easily spawn 32 threads and parse each offset and length separately via a mapped file.

The trick is that seeking for simple tokens in phase 0 is faster than fully parsing the PGN - so you just remember the pointer offset where each game starts, and on a second pass you parse in parallel. If your game is a class, you can even generate functions to prepare an SQL statement for the insert. This can also be done in parallel.

DB inserts cannot be done in parallel with SQLite (with proper server SQL DBs it's faster) - but SQLite is thread-safe anyway, so no worries: just commit transactions from multiple threads. It should also be a little bit faster because inserts don't stall.
You really like to assume that memory is infinite and the IO is free, don't you? You DO REQUIRE asynchronicity aside from parallelism to make it work for files of all sizes in an efficient way.
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Open Chess Game Database Standard

Post by dangi12012 »

Sopel wrote: Mon Nov 15, 2021 3:49 pm You really like to assume that memory is infinite and the IO is free, don't you? You DO REQUIRE asynchronicity aside from parallelism to make it work for files of all sizes in an efficient way.
This is the last time I will reply, since it's obvious you are a forum troll. Please look up how memory-mapped files work and why "memory is infinite" is wrong. You obviously don't know that memory-mapped files don't get loaded into RAM up front. Each thread can read via a pointer, and all IO is handled via page faults. You don't even need to copy from/to any buffers like with streaming IO - the OS maps the "buffer" = 1 page of memory directly into your virtual address space.
It's the most efficient way to read files on Windows/Linux - and you can even give hints to the OS about whether you will access the file randomly or sequentially.
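For reference, a minimal POSIX sketch of mapping a file and passing a sequential-access hint could look like this. MappedFile, map_file and unmap_file are hypothetical names chosen for illustration, and Windows would need CreateFileMapping/MapViewOfFile instead of mmap.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only view of a whole file (POSIX only).
struct MappedFile {
    const char* data = nullptr;
    std::size_t size = 0;
};

// Maps `path` read-only and hints that it will be read sequentially.
// Returns false if the file cannot be opened or mapped, or if it is empty.
bool map_file(const char* path, MappedFile& out) {
    assert(path != nullptr);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return false;
    }
    const std::size_t size = static_cast<std::size_t>(st.st_size);

    void* p = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return false;

    // Advisory only: pages are still faulted in on first access, but the
    // kernel may read ahead more aggressively.
    madvise(p, size, MADV_SEQUENTIAL);

    out.data = static_cast<const char*>(p);
    out.size = size;
    return true;
}

// Releases the mapping and resets `f` to the empty state.
void unmap_file(MappedFile& f) {
    if (f.data != nullptr) munmap(const_cast<char*>(f.data), f.size);
    f = MappedFile{};
}
```

MADV_RANDOM is the corresponding hint for random access. Whether the mapping beats buffered reads for a given workload is something to measure rather than assume.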

Read both carefully - but I won't reply to your troll attempts anymore.
https://en.wikipedia.org/wiki/Memory_management_unit
https://docs.microsoft.com/en-us/dotnet ... pped-files
Sopel
Posts: 391
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: Open Chess Game Database Standard

Post by Sopel »

dangi12012 wrote: Mon Nov 15, 2021 4:24 pm
Sopel wrote: Mon Nov 15, 2021 3:49 pm You really like to assume that memory is infinite and the IO is free, don't you? You DO REQUIRE asynchronicity aside from parallelism to make it work for files of all sizes in an efficient way.
This is the last time I will reply, since it's obvious you are a forum troll. Please look up how memory-mapped files work and why "memory is infinite" is wrong. You obviously don't know that memory-mapped files don't get loaded into RAM up front. Each thread can read via a pointer, and all IO is handled via page faults. You don't even need to copy from/to any buffers like with streaming IO - the OS maps the "buffer" = 1 page of memory directly into your virtual address space.
It's the most efficient way to read files on Windows/Linux - and you can even give hints to the OS about whether you will access the file randomly or sequentially.

Read both carefully - but I won't reply to your troll attempts anymore.
https://en.wikipedia.org/wiki/Memory_management_unit
https://docs.microsoft.com/en-us/dotnet ... pped-files
I'm done with your straw man arguments. You're unable to hold a discussion.
Fulvio
Posts: 396
Joined: Fri Aug 12, 2016 8:43 pm

Re: Open Chess Game Database Standard

Post by Fulvio »

Sopel wrote: Mon Nov 15, 2021 2:47 pm I highly doubt this is the case. You have a pipeline with 4 stages. 1. Reading the file. 2. Parsing the file. 3. Creating the import statements. 4. Importing into DB
In my experience (SCID reads PGN files in 128 KB chunks, automatically doubling the buffer up to 128 MB if it encounters larger games), operating systems are pretty good at optimizing stage 1. Moving it to a separate thread may increase complexity without improving performance.
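The chunked approach can be sketched as follows. This is not SCID's actual code, only an illustration of the idea under the assumption that a new game starts at a '[' following a blank line. read_in_chunks, last_game_start and the consume callback are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sstream>  // std::istream; std::istringstream is handy for trying this out
#include <string>
#include <vector>

// Offset of the '[' of the last "\n\n[" boundary in buf[0, len), or 0 if none.
std::size_t last_game_start(const char* buf, std::size_t len) {
    for (std::size_t i = len; i >= 3; --i) {
        if (buf[i - 3] == '\n' && buf[i - 2] == '\n' && buf[i - 1] == '[') {
            return i - 1;
        }
    }
    return 0;
}

// Reads `in` with a buffer of `initial` bytes and hands blocks of whole games
// to `consume(ptr, length)`. When a single game does not fit, the buffer is
// doubled, up to `max_size`. Returns false if a game exceeds `max_size`.
template <typename Consume>
bool read_in_chunks(std::istream& in, std::size_t initial,
                    std::size_t max_size, Consume consume) {
    assert(initial > 0 && initial <= max_size);
    std::vector<char> buf(initial);
    std::size_t used = 0;  // unconsumed bytes at the front of buf
    for (;;) {
        in.read(buf.data() + used,
                static_cast<std::streamsize>(buf.size() - used));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        used += got;
        if (got == 0) {  // end of input: flush what is left
            if (used > 0) consume(buf.data(), used);
            return true;
        }
        const std::size_t cut = last_game_start(buf.data(), used);
        if (cut > 0) {
            consume(buf.data(), cut);  // everything before the last game start
            std::memmove(buf.data(), buf.data() + cut, used - cut);
            used -= cut;
        } else if (used == buf.size()) {
            // A single game fills the whole buffer: grow it or give up.
            if (buf.size() >= max_size) return false;
            buf.resize(std::min(buf.size() * 2, max_size));
        }
    }
}
```

Called with an initial size of 128 * 1024 and a limit of 128 * 1024 * 1024, this matches the sizes mentioned above. The reads stay synchronous, so any overlap between reading and parsing is left to the operating system's own read-ahead.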