PGN for dummies

Discussion of chess software programming and technical issues.

Moderator: Ras

hgm
Posts: 28396
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: PGN for dummies

Post by hgm »

dangi12012 wrote: Fri Nov 12, 2021 1:24 pm
hgm wrote: Fri Nov 12, 2021 8:02 am Parsing 80GB of PGN is not slower than reading 80GB of anything from disk, right? It is slow because 80GB is a lot, not because it is PGN. Binary formats would be more compact, and could thus be faster.
Parsing at 500 kB/s or at 500 MB/s makes a difference.
One is a first implementation; the other would be an optimized parser.

If it's binary, we wouldn't need to parse anything and could map the file directly into the address space as a memory-mapped file.
That is what I said: binary formats are superior.
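
To make the memory-mapping idea concrete, here is a minimal sketch in Python. The record layout (one 16-bit word per move: 6 bits from-square, 6 bits to-square, 4 bits promotion) and the helper names `write_moves` and `read_move` are assumptions made up for illustration; they are not a format anyone in this thread proposed.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record: one little-endian 16-bit word per move.
RECORD = struct.Struct("<H")

def write_moves(path, moves):
    """Write (from_sq, to_sq, promo) triples as fixed-width records."""
    with open(path, "wb") as f:
        for frm, to, promo in moves:
            f.write(RECORD.pack(frm | (to << 6) | (promo << 12)))

def read_move(buf, index):
    """Decode move number `index` straight from the mapped bytes."""
    (word,) = RECORD.unpack_from(buf, index * RECORD.size)
    return word & 63, (word >> 6) & 63, word >> 12

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "game.bin")
    # e2e4, e7e5 with squares numbered a1 = 0 ... h8 = 63
    write_moves(path, [(12, 28, 0), (52, 36, 0)])
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        second = read_move(mm, 1)  # random access, no text parsing
```

Because every record has the same width, move `i` sits at byte offset `2 * i`, so nothing before it has to be read or tokenized.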

XBoard has a move parser that is quite a bit more clever than just parsing SAN. It also understands PSN (like P-7g), Shogi kifu and Xiangqi notation. And it would all be very fast, if it were not for the stupid back-end: the move generator it calls for disambiguating the moves is called too often, and is quite slow. It used to be worse (always generating all fully legal moves). At least I reduced that to only generating pseudo-legals of the pieces of the indicated type, and only test legality when this is ambiguous.

But for orthodox Chess the disambiguation can be lightning fast. Just keep a piece list, holding the locations of the pieces of each type as a stack. For each piece of the indicated type, just test whether the listed piece satisfies any provided disambiguators, and has a pseudo-legal move to the given to-square on an empty board (from a table indexed by the square difference). Only if multiple pieces match do you have to do more.
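
A minimal sketch of that piece-list lookup, in Python. It is not XBoard's code: the 0x88 square numbering, the names `build_reach_table` and `candidates`, and the omission of pawns are all assumptions made for illustration.

```python
# 0x88 numbering: sq = 16 * rank + file. With this layout a square
# difference identifies the geometric relation between two squares
# unambiguously, so it can index the empty-board reach table.
KNIGHT_STEPS = (33, 31, 18, 14, -14, -18, -31, -33)
ROOK_DIRS = (1, -1, 16, -16)
BISHOP_DIRS = (17, 15, -15, -17)

def build_reach_table():
    """Map square difference -> piece letters that can make that move
    on an empty board (pawns are left out of this sketch)."""
    table = {}
    for step in KNIGHT_STEPS:
        table.setdefault(step, set()).add("N")
    for dirs, letters in ((ROOK_DIRS, "RQ"), (BISHOP_DIRS, "BQ")):
        for d in dirs:
            for n in range(1, 8):
                table.setdefault(d * n, set()).update(letters)
            table.setdefault(d, set()).add("K")
    return table

REACH = build_reach_table()

def sq(name):
    """'e4' -> 0x88 square number."""
    return 16 * (int(name[1]) - 1) + (ord(name[0]) - ord("a"))

def candidates(piece_list, piece, to_sq, file_hint=None, rank_hint=None):
    """From-squares of `piece` that satisfy the SAN disambiguators and
    reach `to_sq` on an empty board. More than one result means blockers
    and legality still have to be examined."""
    out = []
    for frm in piece_list.get(piece, []):
        if file_hint is not None and (frm & 15) != file_hint:
            continue
        if rank_hint is not None and (frm >> 4) != rank_hint:
            continue
        if piece in REACH.get(to_sq - frm, ()):
            out.append(frm)
    return out

start_knights = {"N": [sq("b1"), sq("g1")]}
nf3 = candidates(start_knights, "N", sq("f3"))                 # only g1 fits
both = candidates({"N": [sq("b1"), sq("f3")]}, "N", sq("d2"))  # still ambiguous
```

In the common case (`nf3` above) a single candidate survives and no move generation is needed at all; only a result like `both` requires the slower path.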
Fulvio
Posts: 396
Joined: Fri Aug 12, 2016 8:43 pm

Re: PGN for dummies

Post by Fulvio »

hgm wrote: Fri Nov 12, 2021 3:17 pm At least I reduced that to only generating pseudo-legals of the pieces of the indicated type, and only test legality when this is ambiguous.
Usually it is still necessary to check that the move is legal (the king is not left in check) even if it is not ambiguous.
For example to catch this:

Code: Select all

1.e4 e5 2.d4 Bb4+ 3.c4 *
Checking the legality of the moves is pretty slow.
In SCID it is about 10% of the total time it takes to import a PGN, and considering that they are usually all legal, it is a waste of time.
It must be said, however, that the code is not optimized. For example, it does not use the last move to keep a list of the pieces that give check (if two pieces give check, only a king move can be valid; if a piece that gives check is not captured or blocked, only a king move can be valid; etc.).
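
The checker-list idea can be sketched as a cheap pre-filter, shown here in Python. This is not SCID's code; `squares_between` and `can_answer_check` are hypothetical helpers, and the filter is only a necessary condition: pins, king moves into attacked squares, and an en passant capture of a checking pawn still need the full legality test.

```python
def sq(name):
    """'e4' -> (file, rank), both 0-based."""
    return (ord(name[0]) - ord("a"), int(name[1]) - 1)

def squares_between(a, b):
    """Squares strictly between a and b when they share a rank, file
    or diagonal; an empty set otherwise."""
    df, dr = b[0] - a[0], b[1] - a[1]
    if not (df == 0 or dr == 0 or abs(df) == abs(dr)):
        return set()
    n = max(abs(df), abs(dr))
    sf, sr = (df > 0) - (df < 0), (dr > 0) - (dr < 0)
    return {(a[0] + sf * i, a[1] + sr * i) for i in range(1, n)}

def can_answer_check(king, checkers, frm, to):
    """Necessary (not sufficient) condition for a move to be legal.
    `checkers` is a list of (square, is_slider) for the pieces that
    currently give check to the side to move."""
    if not checkers or frm == king:
        return True          # not in check, or a king move: full test needed
    if len(checkers) > 1:
        return False         # double check: only the king may move
    checker_sq, is_slider = checkers[0]
    if to == checker_sq:
        return True          # captures the checking piece
    return is_slider and to in squares_between(king, checker_sq)

# Position after 1.e4 e5 2.d4 Bb4+: White king e1, checking bishop on b4.
king, checkers = sq("e1"), [(sq("b4"), True)]
c4_ok = can_answer_check(king, checkers, sq("c2"), sq("c4"))  # 3.c4 does not block
c3_ok = can_answer_check(king, checkers, sq("c2"), sq("c3"))  # 3.c3 blocks
```

With the checker list maintained incrementally from the last move, the illegal 3.c4 above is rejected without generating or making any moves.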
hgm
Posts: 28396
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: PGN for dummies

Post by hgm »

Fulvio wrote: Fri Nov 12, 2021 8:12 pm Usually it is still necessary to check that the move is legal (the king is not left in check) even if it is not ambiguous.
Why would you want to do that? That is not parsing, but performing some task on it. If a PGN wants to move Q from d1 to a2, and there is only one Queen of the player on move... Well, then it is apparently a game with an illegal move. So what?
phhnguyen
Posts: 1525
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: PGN for dummies

Post by phhnguyen »

For me, one of the main problems with PGN is that we don’t have any (official) standard to store computing information (such as depth, score, nodes, time…). That information is typically stored in comments (not the right way), following several de facto standards. All of this makes it hard and slow to extract.
Fulvio
Posts: 396
Joined: Fri Aug 12, 2016 8:43 pm

Re: PGN for dummies

Post by Fulvio »

hgm wrote: Fri Nov 12, 2021 8:45 pm Why would you want to do that?
In my case:
- to emit a warning message
- many things require a legal position as a precondition. For example engines: checking the legality of the moves every time you send a position would be much slower. Checking their validity only when they are inserted or imported is a very common solution.
Fulvio
Posts: 396
Joined: Fri Aug 12, 2016 8:43 pm

Re: PGN for dummies

Post by Fulvio »

phhnguyen wrote: Sat Nov 13, 2021 1:45 am For me, one of the main problems with PGN is that we don’t have any (official) standard to store computing information (such as depth, score, nodes, time…). That information is typically stored in comments (not the right way), following several de facto standards. All of this makes it hard and slow to extract.
I totally agree.
The problem in my opinion is that there are different use cases and there is no single format that works for everyone.
For example, a typical scenario is the player analyzing his game and annotating the engine score:

Code: Select all

Bxh7 {+1.50 missed the sacrifice!}
Perfectly readable and portable. For a GUI to extract the evaluation (displaying it elsewhere) and display only the comment would be a mistake, because maybe the user wants to print the PGN, or send a screenshot, or is working on a book, etc.

On the other hand, let's say when annotating a match between engines, you want much more information and are less interested in readability:

Code: Select all

{[%eval 0.65 depth 29 seldepth 34 wdl 230 726 44] maybe more text}
In this case the information should be extracted and shown more clearly.
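
As a rough sketch of that extraction in Python: the pattern below targets only the example layout shown above (a decimal score followed by `key value...` groups) and is an assumption, not an established standard; `split_comment` is a hypothetical helper name.

```python
import re

# Matches "[%eval <score> <anything up to the closing bracket>]".
EVAL_TAG = re.compile(
    r"\[%eval\s+(?P<score>[+-]?\d+(?:\.\d+)?)(?P<rest>[^\]]*)\]"
)

def split_comment(comment):
    """Return (engine_info or None, remaining human-readable text)."""
    m = EVAL_TAG.search(comment)
    if m is None:
        return None, comment.strip()
    info = {"score": float(m.group("score"))}
    key = None
    for tok in m.group("rest").split():
        if key is not None and tok.lstrip("+-").isdigit():
            info.setdefault(key, []).append(int(tok))  # value for current key
        else:
            key = tok                                  # start of a new field
    text = (comment[:m.start()] + comment[m.end():]).strip()
    return info, text

info, text = split_comment(
    "[%eval 0.65 depth 29 seldepth 34 wdl 230 726 44] maybe more text"
)
plain_info, plain_text = split_comment("+1.50 missed the sacrifice!")
```

A comment without the tag, like the hand-written `+1.50 missed the sacrifice!`, comes back untouched, so the GUI keeps displaying it exactly as the user wrote it.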

Another thing to consider is that the user must have the ability to modify it.
I believe that in all GUIs there is the possibility to edit a comment.
Let's say you keep them separated:

Code: Select all

<eval 0.65 depth 29 seldepth 34 wdl 230 726 44> {maybe more text}
Now you need to provide a new way to change that information.
hgm
Posts: 28396
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: PGN for dummies

Post by hgm »

Fulvio wrote: Sat Nov 13, 2021 8:37 am
hgm wrote: Fri Nov 12, 2021 8:45 pm Why would you want to do that?
In my case:
- to emit a warning message
- many things require a legal position as a precondition. For example engines: checking the legality of the moves every time you send a position would be much slower. Checking their validity only when they are inserted or imported is a very common solution.
But that has nothing to do with parsing the input format. Long algebraic or binary formats can also specify illegal moves, or moves that are not even pseudo-legal. If you want to do this kind of processing, then chess games are not trivial to process. Changing the input format cannot cure that.

Whether it is better to check legality of all moves or positions on insertion in the database or on retrieval depends on how many times on average positions or games will be retrieved. I could very well imagine that this number on average is much smaller than 1. In which case it would be more efficient to check legality on retrieval.
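
The break-even argument can be written out as a few lines of Python. The numbers are invented for illustration, and the sketch assumes that every retrieval repeats the check (no caching of the verdict) and that one check costs the same in both schemes.

```python
def total_check_cost(games, retrievals_per_game, cost_per_check=1.0):
    """Return (cost when checking once on insertion,
               cost when checking on every retrieval)."""
    on_insertion = games * cost_per_check
    on_retrieval = games * retrievals_per_game * cost_per_check
    return on_insertion, on_retrieval

# Hypothetical database: a million games, each retrieved 0.01 times on average.
on_insertion, on_retrieval = total_check_cost(
    games=1_000_000, retrievals_per_game=0.01
)
```

The two costs are equal at exactly one retrieval per game; below that, checking on retrieval does less work in total.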
hgm
Posts: 28396
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: PGN for dummies

Post by hgm »

Fulvio wrote: Sat Nov 13, 2021 9:33 am
phhnguyen wrote: Sat Nov 13, 2021 1:45 am For me, one of the main problems with PGN is that we don’t have any (official) standard to store computing information (such as depth, score, nodes, time…). That information is typically stored in comments (not the right way), following several de facto standards. All of this makes it hard and slow to extract.
I totally agree.
The problem in my opinion is that there are different use cases and there is no single format that works for everyone.
For example, a typical scenario is the player analyzing his game and annotating the engine score:

Code: Select all

Bxh7 {+1.50 missed the sacrifice!}
Perfectly readable and portable. For a GUI to extract the evaluation (displaying it elsewhere) and display only the comment would be a mistake, because maybe the user wants to print the PGN, or send a screenshot, or is working on a book, etc.

On the other hand, let's say when annotating a match between engines, you want much more information and are less interested in readability:

Code: Select all

{[%eval 0.65 depth 29 seldepth 34 wdl 230 726 44] maybe more text}
In this case the information should be extracted and shown more clearly.

Another thing to consider is that the user must have the ability to modify it.
I believe that in all GUIs there is the possibility to edit a comment.
Let's say you keep them separated:

Code: Select all

<eval 0.65 depth 29 seldepth 34 wdl 230 726 44> {maybe more text}
Now you need to provide a new way to change that information.
There is also the issue of backward compatibility. It would be inconvenient if games that store annotations from computer analysis were not readable by software that was not specifically made to understand these annotations. Defining the extension of the format in such a way that it is merely a refinement of the definition of a comment solves that in a convenient way.
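
A small Python sketch of why that matters. `legacy_moves` stands in for any older movetext reader that only knows that brace comments are to be skipped; it is a hypothetical toy, not the code of an actual GUI.

```python
import re

def legacy_moves(movetext):
    """Toy reader that predates any engine-annotation extension:
    drop {...} comments, move numbers and the '*' terminator,
    and return whatever tokens remain as moves."""
    without_comments = re.sub(r"\{[^}]*\}", " ", movetext)
    return [
        tok for tok in without_comments.split()
        if not re.fullmatch(r"\d+\.+", tok) and tok != "*"
    ]

# Annotation defined as a refinement of a comment: invisible to old software.
inside = legacy_moves("1. e4 {[%eval 0.65 depth 29] maybe more text} e5 *")

# Annotation in a separate construct: old software sees stray tokens.
outside = legacy_moves("1. e4 <eval 0.65 depth 29> {maybe more text} e5 *")
```

In the first case the old reader still recovers the game unchanged; in the second it has to cope with tokens it was never written to understand.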
Sopel
Posts: 391
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: PGN for dummies

Post by Sopel »

hgm wrote: Fri Nov 12, 2021 8:45 pm
Fulvio wrote: Fri Nov 12, 2021 8:12 pm Usually it is still necessary to check that the move is legal (the king is not left in check) even if it is not ambiguous.
Why would you want to do that? That is not parsing, but performing some task on it. If a PGN wants to move Q from d1 to a2, and there is only one Queen of the player on move... Well, then it is apparently a game with an illegal move. So what?
Then such a PGN is invalid (it's explicitly mentioned in the standard that the moves must be legal) and should be rejected.

A standard does not only specify what is allowed, but also what is disallowed. And a conforming implementation must accept valid input and REJECT incorrect input. Accepting valid PGN input is easy, rejecting invalid PGN input is harder.
hgm
Posts: 28396
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: PGN for dummies

Post by hgm »

Again, that has nothing to do with parsing. It is a specific processing task on the games. And again, it doesn't make the slightest difference whether the moves of the game are written as SAN or in long algebraic notation when you want to perform that task.

In fact I think this is 'out of bounds' for a format specification. I will decide myself whatever I want to encode in this format, thank you very much! As a user I should be able to decide whether I want to reject games with illegal moves (or on the contrary want to select those...). As a developer I would make sure the user has this choice. Who says I am going to feed it games with invalid moves? The games might already have been checked elsewhere. Like during creation, if they were engine-engine games. Then it would be a pure waste of time to check them again.

Even when there could be games with illegal moves in the input, I would prefer the positions in those games to go into my database, rather than be rejected. Most, if not all, of these positions would be perfectly OK (just not reachable in the specified way). The chances that a position from those games would ever match my position search are very slim (as they are for any individual game), and if I have reason to reject positions from games with illegal moves, I could judge that whenever I get a match, and trigger an action I deem appropriate at the time (depending on the purpose for which I retrieve it). You really think I would make my software less useful because a specification dictates to me how I should handle violations???