dangi12012 wrote: ↑Mon Nov 15, 2021 12:35 am
A Thread pool for IO? Asynchronous IO runs on a single thread.
If you look at Sopels history - he is a forum troll so better not engage with him.
Maybe, before labeling someone a troll, you should also take into account that he has contributed a great deal to Stockfish's NNUE development, especially with regard to optimization.
Open Chess Game Database Standard
Moderator: Ras
-
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: Open Chess Game Database Standard
-
- Posts: 391
- Joined: Tue Oct 08, 2019 11:39 pm
- Full name: Tomasz Sobczyk
Re: Open Chess Game Database Standard
dangi12012 wrote: ↑Mon Nov 15, 2021 12:35 am
A Thread pool for IO? Asynchronous IO runs on a single thread.
If you look at Sopels history - he is a forum troll so better not engage with him.
Real async IO is a pain in the ass to do in a portable way. A dedicated thread pool for IO works well enough and is simple to set up.
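A minimal sketch of the "dedicated IO thread" idea using only standard C++: one reader thread does ordinary blocking reads and pushes fixed-size chunks into a bounded queue, while the calling thread processes them. This is not an OS async IO API. The queue capacity (4 chunks), the chunk size, and the names `ChunkQueue` and `read_in_background` are illustrative assumptions. Because the queue is bounded, memory use stays the same regardless of file size.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <istream>
#include <mutex>
#include <optional>
#include <sstream>  // std::istringstream, convenient for feeding in-memory data
#include <string>
#include <thread>
#include <utility>

// Bounded queue between the IO thread and the processing thread. push() blocks
// while `capacity` chunks are pending, so memory use does not grow with file size.
class ChunkQueue {
public:
    explicit ChunkQueue(std::size_t capacity) : capacity_(capacity) { assert(capacity > 0); }

    void push(std::string chunk) {
        std::unique_lock<std::mutex> lock(m_);
        not_full_.wait(lock, [&] { return q_.size() < capacity_; });
        q_.push_back(std::move(chunk));
        not_empty_.notify_one();
    }

    // Called by the producer after the last chunk.
    void close() {
        std::lock_guard<std::mutex> lock(m_);
        closed_ = true;
        not_empty_.notify_all();
    }

    // Blocks until a chunk is available; std::nullopt once closed and drained.
    std::optional<std::string> pop() {
        std::unique_lock<std::mutex> lock(m_);
        not_empty_.wait(lock, [&] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;
        std::string chunk = std::move(q_.front());
        q_.pop_front();
        not_full_.notify_one();
        return chunk;
    }

private:
    std::mutex m_;
    std::condition_variable not_empty_, not_full_;
    std::deque<std::string> q_;
    std::size_t capacity_;
    bool closed_ = false;
};

// Plain blocking reads on one dedicated thread; `process` runs on the caller's
// thread, so reading chunk k+1 overlaps with processing chunk k.
template <typename Process>
void read_in_background(std::istream& in, std::size_t chunk_size, Process process) {
    ChunkQueue queue(4);  // at most 4 chunks buffered (arbitrary choice)
    std::thread io([&] {
        std::string buf(chunk_size, '\0');
        while (in.read(&buf[0], static_cast<std::streamsize>(chunk_size)) || in.gcount() > 0)
            queue.push(buf.substr(0, static_cast<std::size_t>(in.gcount())));
        queue.close();
    });
    while (auto chunk = queue.pop()) process(*chunk);
    io.join();
}
```

If `process` can throw, a production version must unblock and join the reader thread before letting the exception escape; that is omitted here.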
dangi12012 wrote:No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.
Maybe you copied your stockfish commits from someone else too?
I will look into that.
-
- Posts: 4718
- Joined: Wed Oct 01, 2008 6:33 am
- Location: Regensburg, Germany
- Full name: Guenther Simon
Re: Open Chess Game Database Standard
dangi12012 wrote: ↑Mon Nov 15, 2021 12:35 am
...
If you look at Sopels history - he is a forum troll so better not engage with him.
Quite the opposite, which was clear since your first posts here...
-
- Posts: 1524
- Joined: Wed Apr 21, 2010 4:58 am
- Location: Australia
- Full name: Nguyen Hong Pham
Re: Open Chess Game Database Standard
Could you tell me more? Do you have ideas on how to use them efficiently? It would be best if you could provide some code or requests (on GitHub).
I have already tested memory-mapped files (mmap). They performed surprisingly well, but they are only the second-best option, behind the fastest one by a significant margin, and their code is more complicated.
I don't have problems coming up with ideas or implementing them in general. However, I sometimes struggle with how to apply them efficiently, and in this case that means multi-threading. Both reading the input (the PGN file) and writing to the database (inserting records) are almost entirely sequential, so logically multi-threading won't help much, especially since the other steps (such as parsing PGN tags) are also very fast, leaving little work to share between threads.
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager
-
- Posts: 391
- Joined: Tue Oct 08, 2019 11:39 pm
- Full name: Tomasz Sobczyk
Re: Open Chess Game Database Standard
phhnguyen wrote: ↑Mon Nov 15, 2021 12:31 pm
Could you tell me more? Do you have ideas to use them efficiently? It’s best if you could provide some codes or requests (for GitHub).
I have tested already memory map files (mmap). It is good, surprising me but it is just the second-best, behind the fastest one with a significant margin when its code is more complicated.
I don’t have problems with having some ideas and/or implementing them in general. However, sometimes I struggled with how to apply them efficiently. In this case, it is multi-threading. Both reading input (the PGN file) and writing to the database (inserting records) are almost sequencings. Thus logically multi-threading won't help much, especially when other processes (such as parsing PGN tags) are very fast too (not much work to share between threads).
The thing is, your benchmark does no work other than reading, so synchronous solutions will not look worse than asynchronous ones. You should add some processing between the reads to see the issue. For easy asynchronous IO, std::async (https://en.cppreference.com/w/cpp/thread/async) is a simple solution that's good enough.
phhnguyen wrote:
In this case, it is multi-threading. Both reading input (the PGN file) and writing to the database (inserting records) are almost sequencings.
I highly doubt this is the case. You have a pipeline with 4 stages:
1. Reading the file.
2. Parsing the file.
3. Creating the import statements.
4. Importing into the DB.
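A sketch of the std::async approach applied to stage 1 of that pipeline: while one chunk is being processed, the next chunk is already being read on another thread (double buffering). The chunk size and the helper names `read_chunk` and `pipelined_read` are hypothetical. Each `std::async` call with `std::launch::async` may start a new thread, so chunks should be large enough to amortize that cost.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>  // std::ref
#include <future>
#include <istream>
#include <sstream>     // std::istringstream, convenient for feeding in-memory data
#include <string>

// Stage 1 of the pipeline: read up to `n` bytes. An empty result means end of input.
std::string read_chunk(std::istream& in, std::size_t n) {
    std::string buf(n, '\0');
    in.read(&buf[0], static_cast<std::streamsize>(n));
    buf.resize(static_cast<std::size_t>(in.gcount()));
    return buf;
}

// Double buffering with std::async: the read of chunk k+1 is started before
// chunk k is processed, so IO overlaps with parsing/importing. Only one read
// is in flight at a time, so the stream is never accessed concurrently.
template <typename Process>
void pipelined_read(std::istream& in, std::size_t chunk_size, Process process) {
    assert(chunk_size > 0);
    std::future<std::string> next =
        std::async(std::launch::async, read_chunk, std::ref(in), chunk_size);
    for (;;) {
        std::string current = next.get();  // waits for the in-flight read
        if (current.empty()) break;        // end of input
        next = std::async(std::launch::async, read_chunk, std::ref(in), chunk_size);
        process(current);                  // runs while the next read proceeds
    }
}
```

Stages 2 to 4 would live inside `process`; in a benchmark that only reads, `process` does nothing and the overlap buys nothing, which is the point made above.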
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Open Chess Game Database Standard
As usual very bad trolling advice from sopel:
phhnguyen wrote: ↑Mon Nov 15, 2021 12:31 pm
I don’t have problems with having some ideas and/or implementing them in general. However, sometimes I struggled with how to apply them efficiently. In this case, it is multi-threading. Both reading input (the PGN file) and writing to the database (inserting records) are almost sequencings. Thus logically multi-threading won't help much, especially when other processes (such as parsing PGN tags) are very fast too (not much work to share between threads).
Here's the answer:
Of course you can parse a file multithreaded and very fast - and here is how:
You have to index the offsets and lengths of all games in a PGN without parsing them. Just the raw offsets, using a JSON or text seeker. This can be insanely fast because you may only need to search for {} tokens or double newlines.
For the Lichess DB I did this already, and there I just search for double newlines with C++ memchr() on a memory-mapped file.
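A minimal sketch of that indexing pass (phase 0), assuming '\n' line endings and that every game begins with a tag-pair line. Since PGN also puts a blank line between a game's tag section and its movetext, this version looks for "\n\n[" rather than a bare double newline, so a game is not split in two. `GameSpan` and `index_games` are hypothetical names; the pointer could come from a memory-mapped file or any other buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct GameSpan {
    std::size_t offset;  // byte offset of the game's first '[' tag
    std::size_t length;  // bytes up to the next game (or the end of the buffer)
};

// Finds game starts without parsing: only memchr() for '\n' plus a two-byte check.
std::vector<GameSpan> index_games(const char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::vector<std::size_t> starts;
    if (size > 0 && data[0] == '[') starts.push_back(0);
    const char* p = data;
    const char* end = data + size;
    while (p < end) {
        const void* hit = std::memchr(p, '\n', static_cast<std::size_t>(end - p));
        if (!hit) break;
        const char* nl = static_cast<const char*>(hit);
        // A blank line followed by '[' starts a new game; a blank line followed
        // by movetext is the tag/movetext separator and is skipped.
        if (end - nl >= 3 && nl[1] == '\n' && nl[2] == '[')
            starts.push_back(static_cast<std::size_t>(nl + 2 - data));
        p = nl + 1;
    }
    std::vector<GameSpan> spans;
    for (std::size_t i = 0; i < starts.size(); ++i) {
        std::size_t next = (i + 1 < starts.size()) ? starts[i + 1] : size;
        spans.push_back({starts[i], next - starts[i]});
    }
    return spans;
}
```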
Once you have this vector of offsets and lengths, you can easily spawn 32 threads and parse each offset and length separately via a mapped file.
The trick is that seeking for simple tokens in phase 0 is faster than fully parsing the PGN, so you just remember the pointer offset where each game starts, and in a second pass you parse in parallel. If your game is a class, you can even generate functions to prepare an SQL statement for the insert. This can also be done in parallel.
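A sketch of the second, parallel pass, assuming the spans from an indexing pass like the one described above and a hypothetical `GameRecord` that keeps only three tags. Each thread takes a strided share of the games and writes to its own slots of a pre-sized vector, so no locking is needed. The tag lookup is a naive substring search, not a full PGN parser, and building the SQL statements is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <thread>
#include <vector>

struct GameSpan {
    std::size_t offset;  // where the game starts in the file/buffer
    std::size_t length;  // how many bytes it occupies
};

// Hypothetical row for a games table; a real importer would keep more tags.
struct GameRecord {
    std::string white, black, result;
};

// Naive lookup of a tag value such as [White "..."] inside one game's text.
std::string tag_value(std::string_view game, std::string_view name) {
    std::string key = "[" + std::string(name) + " \"";
    std::size_t pos = game.find(key);
    if (pos == std::string_view::npos) return {};
    pos += key.size();
    std::size_t close = game.find('"', pos);
    if (close == std::string_view::npos) return {};
    return std::string(game.substr(pos, close - pos));
}

// Thread t handles games t, t + n, t + 2n, ... and writes only to its own
// slots of the pre-sized output vector, so no locking is required.
std::vector<GameRecord> parse_parallel(std::string_view data,
                                       const std::vector<GameSpan>& spans,
                                       unsigned num_threads) {
    num_threads = std::max(1u, num_threads);
    std::vector<GameRecord> out(spans.size());
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            for (std::size_t i = t; i < spans.size(); i += num_threads) {
                assert(spans[i].offset + spans[i].length <= data.size());
                std::string_view game = data.substr(spans[i].offset, spans[i].length);
                out[i] = GameRecord{tag_value(game, "White"), tag_value(game, "Black"),
                                    tag_value(game, "Result")};
            }
        });
    }
    for (std::thread& w : workers) w.join();
    return out;
}
```

Passing `std::thread::hardware_concurrency()` instead of a fixed 32 is one reasonable choice; it may return 0, which the `std::max` guard turns into a single thread.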
DB inserts cannot be done in parallel with SQLite (with proper server SQL DBs it's faster), but SQLite is thread-safe anyway, so no worries: just commit transactions from multiple threads. It should also be a little faster because the inserts don't stall.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
-
- Posts: 391
- Joined: Tue Oct 08, 2019 11:39 pm
- Full name: Tomasz Sobczyk
Re: Open Chess Game Database Standard
dangi12012 wrote: ↑Mon Nov 15, 2021 3:12 pm
As usual very bad trolling advice from sopel:
phhnguyen wrote: ↑Mon Nov 15, 2021 12:31 pm
I don’t have problems with having some ideas and/or implementing them in general. However, sometimes I struggled with how to apply them efficiently. In this case, it is multi-threading. Both reading input (the PGN file) and writing to the database (inserting records) are almost sequencings. Thus logically multi-threading won't help much, especially when other processes (such as parsing PGN tags) are very fast too (not much work to share between threads).
Heres the answer:
Of course you can parse a file multithreaded and very fast - and here is how:
You have to index the offsets and lengths of all games in a pgn without parsing. Just the raw offset with json or a text seeker. This can be insanely fast because maybe you only search for {} tokens or double newlines.
For Lichess DB I did this already and there i just search for doubled newlines with C++ memchr() on a memory mapped file.
Once you have this vector of offsets and lengths - you can easily spawn 32 Threads and parse each offset and length seperately in via a mapped file.
The trick is that seeking for simple tokens in phase 0 is faster than parsing pgn fully - so you just remember the pointeroffset where each game starts and on a second pass you parse in parallel. If your game is a class you can even generate fuctions to prepare a sql statement for insert. This can also be done in parallel.
DB inserts cannot be done in parallel with sqlite (with proper server sql dbs its faster) - but sqlite is threadsafe anyways so no worries and just commit transactions from multiple threads. Should also be a little bit faster because inserts dont stall.
You really like to assume that memory is infinite and the IO is free, don't you? You DO REQUIRE asynchronicity aside from parallelism to make it work for files of all sizes in an efficient way.
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Open Chess Game Database Standard
Last time I will reply, since it's obvious you are a forum troll. Please look up how memory-mapped files work and why "memory is infinite" is wrong. You obviously don't know that memory-mapped files don't get loaded into RAM up front. Each thread can read via a pointer, and all IO is handled via page faults. You don't even need to copy from/to any buffers like with streaming IO - the OS maps the "buffer" (one page of memory at a time) directly into your virtual address space.
It's the most efficient way to access files on Windows/Linux - and you can even give the OS hints about whether you will access the file randomly or sequentially.
Read both carefully - but I won't reply to your troll attempts anymore.
https://en.wikipedia.org/wiki/Memory_management_unit
https://docs.microsoft.com/en-us/dotnet ... pped-files
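A POSIX-only sketch of reading a file through a read-only memory mapping with an access-pattern hint, as described above. `posix_madvise` with `POSIX_MADV_SEQUENTIAL` is the standard POSIX form of the hint (Linux also offers `madvise` with `MADV_SEQUENTIAL`); the Windows counterparts (`CreateFileMapping`/`MapViewOfFile`) are not shown. `count_lines_mmap` is a hypothetical example function. Pages are still faulted into physical memory on first touch, but they are clean file-backed pages that the kernel can evict under memory pressure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>      // std::fopen/std::remove, for creating a test file
#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap, posix_madvise
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// Counts '\n' bytes in a file by scanning a read-only memory mapping.
// There is no read() into a user buffer: the kernel faults pages in on first
// access. Returns -1 on error.
long count_lines_mmap(const char* path) {
    assert(path != nullptr);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    std::size_t size = static_cast<std::size_t>(st.st_size);
    if (size == 0) { close(fd); return 0; }  // mmap with length 0 is an error
    void* addr = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    if (addr == MAP_FAILED) return -1;
    // Hint: the whole range will be read once, front to back.
    posix_madvise(addr, size, POSIX_MADV_SEQUENTIAL);
    const char* p = static_cast<const char*>(addr);
    long lines = 0;
    for (std::size_t i = 0; i < size; ++i) lines += (p[i] == '\n');
    munmap(addr, size);
    return lines;
}
```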
-
- Posts: 391
- Joined: Tue Oct 08, 2019 11:39 pm
- Full name: Tomasz Sobczyk
Re: Open Chess Game Database Standard
dangi12012 wrote: ↑Mon Nov 15, 2021 4:24 pm
Last time I will reply since its obvious you are a forum troll. Please lookup how memory mapped files work and how "memory is infinite" is wrong. You obviously dont know that memory mapped files dont get loaded into RAM. Each thread can read via a pointer and all IO is handled via pagefaults. You dont even need to copy from/to any buffers like with streaming IO - the OS maps the "buffer" = 1 page of memory directly into your virtual adress space.
Its the most efficient way for files on windows/linux - and you can even give hints to the OS if you will access randomly or sequentially.
Read both carefully - but I wont reply to your troll attempts anymore.
https://en.wikipedia.org/wiki/Memory_management_unit
https://docs.microsoft.com/en-us/dotnet ... pped-files
I'm done with your straw man arguments. You're unable to hold a discussion.
-
- Posts: 396
- Joined: Fri Aug 12, 2016 8:43 pm
Re: Open Chess Game Database Standard
In my experience (SCID reads PGN files in 128 KB chunks, automatically doubling the buffer up to 128 MB if it encounters larger games), operating systems are pretty good at optimizing point 1 (reading the file). Moving it to a separate thread may increase complexity without improving performance.
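SCID's own code is not shown here; this is a hypothetical sketch of the strategy as described: read in fixed chunks starting at 128 KB, hand off every complete game, carry the unfinished tail into the next read, and double the buffer (up to 128 MB) only when a single game does not fit. The "\n\n[" game-boundary rule and the name `read_games_chunked` are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>    // std::istringstream, convenient for feeding in-memory data
#include <stdexcept>
#include <string>

// Reads games in fixed-size chunks, calls on_game for every complete game,
// and carries the unfinished tail over to the next read. The buffer is doubled
// only when one game does not fit, up to max_size.
template <typename OnGame>
void read_games_chunked(std::istream& in, OnGame on_game,
                        std::size_t initial = 128 * 1024,
                        std::size_t max_size = 128 * 1024 * 1024) {
    assert(initial > 0 && initial <= max_size);
    std::string buf;  // bytes read but not yet handed to on_game
    std::size_t capacity = initial;
    bool eof = false;
    for (;;) {
        if (!eof && buf.size() < capacity) {  // top the buffer up to `capacity`
            std::size_t old = buf.size();
            buf.resize(capacity);
            in.read(&buf[old], static_cast<std::streamsize>(capacity - old));
            buf.resize(old + static_cast<std::size_t>(in.gcount()));
            if (!in) eof = true;
        }
        // A game ends where a blank line is followed by the next game's '['.
        std::size_t start = 0, pos;
        while ((pos = buf.find("\n\n[", start)) != std::string::npos) {
            on_game(buf.substr(start, pos + 2 - start));
            start = pos + 2;
        }
        buf.erase(0, start);
        if (eof) {
            if (!buf.empty()) on_game(buf);  // the last game has no successor
            return;
        }
        if (start == 0 && buf.size() >= capacity) {  // a single game fills the buffer
            if (capacity >= max_size) throw std::runtime_error("PGN game larger than max_size");
            capacity *= 2;
        }
    }
}
```

The OS page cache and read-ahead still do the heavy lifting here; the program only controls how much it holds in its own buffer at once.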