A lazy way to handle PGN files

stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Post by stegemma »

You know, I'm lazy :) so, since I have to create a book for Satana, I don't want to write a full PGN interpreter. The problem is that there are so many ways to write a single move:

d4
exd
NxR
N2xRe4
...
Writing a full program that handles all of this and still checks move validity is very annoying (or interesting, depending on your point of view), so I've used this simple (and lazy!) method:

1) I've added functions to my move generator that extract all the moves generated at ply 1 in full format, for example:

Ne4xPf6
2) I've written a function that converts any single move to full format, using '?' as a wildcard ("jolly") character, for example:

d4 -> P??-d4
dxc -> Pd?x?c?
NxR -> N??xR??
NexQ -> Ne?xQ??
O-O -> Ke?-g?
...and so on...
3) I simply loop through all the legal moves in full format and compare them against the pattern: a '?' matches any character in the same position. This way it is easy to find that the move P??-d4 can be Pd2-d4 or Pd3-d4.
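
Step 3 boils down to a character-by-character comparison. Here is a minimal sketch of it with std::string; the names `matchesPattern` and `findMatches` are made up for illustration and are not part of the code in this post. Like the comparison loop in the posted code, it only looks at as many characters as the pattern has, so a legal move carrying a trailing suffix can still match.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns true when every non-'?' character of `pattern` equals the
// character at the same position in `fullMove` (a move in full format).
bool matchesPattern(const std::string& pattern, const std::string& fullMove)
{
    if (fullMove.size() < pattern.size()) return false;
    for (std::size_t i = 0; i < pattern.size(); ++i)
    {
        if (pattern[i] != '?' && pattern[i] != fullMove[i]) return false;
    }
    return true;
}

// Collects every legal move (in full format) that the pattern can stand for.
std::vector<std::string> findMatches(const std::string& pattern,
                                     const std::vector<std::string>& legalMoves)
{
    std::vector<std::string> matches;
    for (const std::string& move : legalMoves)
    {
        if (matchesPattern(pattern, move)) matches.push_back(move);
    }
    return matches;
}
```

With Pd2-d4 and Pe2-e4 in the list of legal moves, the pattern P??-d4 matches only the first one, because the destination file differs in the second.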

This system has the advantage that I don't have to duplicate the move generator just to find which piece can move (dxc), where it is (Re4), or where it goes (QxR). The same system could probably be used with any legal move generator for chess variants.

For the details, here's my code:

bool clsGame::SetGame(const clsString &PGN, clsEngine *pEngine)
{
 bool bRet = true;
 nMoves = 0;
 #define ILLEGAL_MOVE { bRet = false; break; }
 result = pgnUnknown;
 firstmove = 0;
 clsCStrings ss(PGN, '\n');
 clsString sOriginal;
 clsString s;
 clsString sParameter;
 clsString sValue;

 enum enPGNStates { pgnHeader, pgnMoves } state = pgnHeader;
 for (int i = 0; i < ss.Count(); i++)
 {
  s = ss.cGet(i);

  switch (state)
  {
   case pgnHeader:
    if (s[0] == '[')
    {
     int iFound = 1; // skip '['
     sParameter = s.Token(' ', iFound);
     sValue = s.TokenDelimited('\"', '\"', iFound);
     ssHeader.AddString(sParameter, sValue);
     if (sParameter == "RESULT")
     {
      SetResult(sValue);
     }
     break;
    }
    state = pgnMoves; // first non-header line: fall through to pgnMoves
   case pgnMoves:
    sOriginal += s;
    sOriginal += " ";
    break;
  }
 }
 sOriginal.ReplaceThis("\n", " ")
          .ReplaceThis("\r", " ")
          .ReplaceThis("  ", " ")
          .ReplaceThis("  ", " ")
          .ReplaceThis("+", "")
          .ReplaceThis("#", "");

 pEngine->Init(STANDARD_CHESS_FEN);
 for (int iFound = nMoves = 0; iFound >= 0 && bRet;)
 {
  clsString s = sOriginal.Token(' ', iFound).Trim();
  #if DEBUG_PGN
   sfout.Push(s + " -> ", false);
  #endif
  if (lgsIsBetween(s[0], '0', '9')) // 1. 2. ... N.
  {
   if (!firstmove) firstmove = s.GetInt(1);
   #if DEBUG_PGN
    sfout.Push(" move.");
   #endif
  }
  else
  {
   if (s[0] == '{')
   {
    clsString sComment = s.xMid(1, s.Length() - 2);
    //ssComments.AddString(clsString(nMoves), sComment);
   }
   else
   {
    if (s[0] == '1' || s[0] == '0')
    {
     SetResult(s);
    }
    else
    {
     if (pEngine->Perft(1) == 0) ILLEGAL_MOVE;
     clsString sLegalMoves = pEngine->GetMoves(true);
     clsCStrings ssLegals(sLegalMoves, ' ');

     int idx = 0; // 0) check the moving piece
     if (s[idx] == 'O')
     {
      if (s == "O-O") s = "Ke?-g?";
      else if (s == "O-O-O") s = "Ke?-c?";
      else ILLEGAL_MOVE;
     }
     else
     {
      if (s.Length() == 2) s.InsertChar(0, 'P'); // d4
      else if (!lgsIsBetween(s[idx], 'A', 'Z')) s.InsertChar(0, 'P'); // no piece letter means a pawn move (dxc)

      if (s.Length() == 3 && s[1]!='x') // Pd4 (not dxc, NxR...) -> P??d4
      {
       s.Insert(1, "??");
       idx += 2;
      }
      else
      {
       ++idx; // 1) check source file
       if (s[idx] == 'x') // Nx...
       {
        s.Insert(idx, "??");
        ++idx;
       }
       else
       {
        if (!lgsIsBetween(s[idx], 'a', 'h')) s.InsertChar(idx, '?');

        ++idx; // 2) check source rank
        if (s[idx] == 'x') // Ndx...
        {
         s.Insert(idx, "?");
        }
        else
        {
         if (!lgsIsBetween(s[idx], '1', '8')) s.InsertChar(idx, '?'); // Rad8
        }
       }
      }

      ++idx; // 3) check capture marker and captured piece
      if (s[idx] == 'x')
      {
       ++idx; // capture
       if (!lgsIsBetween(s[idx], 'A', 'Z')) // captured piece
       {
        s.InsertChar(idx, '?');
       }
      }
      else
      {
       s.InsertChar(idx, '-');
      }

      ++idx; // 4) check destination file
      if (!lgsIsBetween(s[idx], 'a', 'h')) // ...xR
      {
       s.InsertChar(idx, '?');
      }

      ++idx; // 5) check destination rank
      if (!lgsIsBetween(s[idx], '1', '8'))
      {
       s.InsertChar(idx, '?');
      }
     }
     // look for the move among the legal ones
     bool bFound = false;
     for (int i = 0; i < ssLegals.Count(); i++)
     {
      clsString sLegal = ssLegals.cGet(i);
      bFound = true;
      for (int c = 0; c < s.Length() && bFound; c++)
      {
       if (s[c] == '?') continue;
       bFound = bFound && s[c] == sLegal[c];
      }
      if (bFound)
      {
       ++nMoves;
       clsString sCleanMove(sLegal);
       sCleanMove.Validate("abcdefgh12345678");
       sCleanMove += sLegal.TokenRight('=');
       sMoves += sCleanMove + " ";
       int val = 0;
       switch(result)
       {
        case pgnWhiteWin: ++val; break;
        case pgnBlackWin: --val; break;
        case pgnDraw:
        case pgnUnknown:
         break;
       }
       clsString sFEN = pEngine->GetFEN();
       if (int *pVal = book.GetItemByKey(sFEN)) *pVal += val;
       else book.Add(sFEN, new int(val));
       #if DEBUG_PGN
        sfout.Push(s + " -> " + sLegal + " -> " + sCleanMove);
       #endif
       pEngine->UserMove(sCleanMove);
       break;
      }
     }
     if (!bFound)
     {
      #if DEBUG_PGN
       sfout.Push(" : LEGAL MOVES: " + sLegalMoves);
      #endif
      ILLEGAL_MOVE;
     }
     #if DEBUG_PGN
      pEngine->DebugBoard();
     #endif
    }
   }
  }
 }
 return bRet;
}
Of course, I use my own string library, so the code would have to be converted to std::string, but I hope it can help somebody.
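
For anyone without clsString, here is a rough std::string sketch of step 2 only (short move to wildcard pattern). It is not a line-by-line port of the code above: the function name `expandToPattern` is hypothetical, the full-format layout is inferred from the examples in this post (Ne4xPf6, Pd2-d4), and keeping a promotion suffix such as "=Q" unchanged at the end is an assumption about how the engine prints promotions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: expands a short move such as "d4", "dxc" or "N2xRe4"
// into <piece><file><rank>('-' | 'x'<captured>)<file><rank>, writing '?' for
// every field the short form leaves out. Returns "" on malformed input.
std::string expandToPattern(std::string san)
{
    // Drop trailing check/mate marks and annotation glyphs ("+", "#", "!", "?").
    while (!san.empty() && std::string("+#!?").find(san.back()) != std::string::npos)
        san.pop_back();

    if (san == "O-O")   return "Ke?-g?";
    if (san == "O-O-O") return "Ke?-c?";

    // Set a promotion suffix such as "=Q" aside (assumed to come last).
    std::string suffix;
    const std::size_t eq = san.find('=');
    if (eq != std::string::npos) { suffix = san.substr(eq); san.erase(eq); }
    if (san.empty()) return "";

    // An upper-case first letter names the moving piece; otherwise it is a pawn.
    char piece = 'P';
    if (san[0] >= 'A' && san[0] <= 'Z') { piece = san[0]; san.erase(0, 1); }

    // Split the rest into a source hint and a destination part.
    std::string source, target;
    const std::size_t x = san.find('x');
    const bool capture = (x != std::string::npos);
    if (capture) { source = san.substr(0, x); target = san.substr(x + 1); }
    else
    {
        if (san.size() < 2) return "";
        source = san.substr(0, san.size() - 2); // optional disambiguation
        target = san.substr(san.size() - 2);    // destination square
    }

    char srcFile = '?', srcRank = '?', captured = '?', dstFile = '?', dstRank = '?';
    for (char c : source)
    {
        if (c >= 'a' && c <= 'h') srcFile = c;
        else if (c >= '1' && c <= '8') srcRank = c;
        else return "";
    }
    for (char c : target)
    {
        if (capture && c >= 'A' && c <= 'Z') captured = c;
        else if (c >= 'a' && c <= 'h') dstFile = c;
        else if (c >= '1' && c <= '8') dstRank = c;
        else return "";
    }

    std::string pattern;
    pattern += piece; pattern += srcFile; pattern += srcRank;
    if (capture) { pattern += 'x'; pattern += captured; }
    else pattern += '-';
    pattern += dstFile; pattern += dstRank;
    return pattern + suffix;
}
```

Splitting on 'x' first and then filling fixed slots avoids the index bookkeeping of the in-place insertion approach, at the cost of a few more temporary strings.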
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: A lazy way to handle PGN files

Post by mvk »

stegemma wrote: You know, I'm lazy :) so, since I have to create a book for Satana, I don't want to write a full PGN interpreter. The problem is that there are so many ways to write a single move:

d4
exd
NxR
N2xRe4
...
Writing a full program that handles all of this and still checks move validity is very annoying (or interesting, depending on your point of view), so I've used this simple (and lazy!) method:

1) I've added functions to my move generator that extract all the moves generated at ply 1 in full format, for example:

Ne4xPf6
2) I've written a function that converts any single move to full format, using '?' as a wildcard ("jolly") character, for example:

d4 -> P??-d4
dxc -> Pd?x?c?
NxR -> N??xR??
NexQ -> Ne?xQ??
O-O -> Ke?-g?
...and so on...
That is the same method as in mscp, and I have copied it into the chessmoves Python extension, as that is based on a cleaned-up mscp. I had to do something extra to accept "bxc" when both "b2xc3" and "Nb1xc3" are legal moves.
[Account deleted]
hgm
Posts: 28461
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: A lazy way to handle PGN files

Post by hgm »

It is basically also how the WinBoard SAN parser works: everything that was not explicitly specified becomes a wildcard, and then you generate legal moves to see how many match. If it is not exactly one it is an illegal move or an ambiguous move.
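
To make the "exactly one match" rule concrete, here is a small sketch. It is not WinBoard's actual code; `SanMatch`, `SanResolution` and `resolveMove` are names invented for this example, and the legal moves are assumed to be available as strings in the full format used earlier in the thread.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class SanMatch { Illegal, Unique, Ambiguous };

struct SanResolution
{
    SanMatch status;
    std::string move; // filled only when status == SanMatch::Unique
};

// Counts how many legal moves fit the wildcard pattern: none means the move
// is illegal, one means it is resolved, more than one means it is ambiguous.
SanResolution resolveMove(const std::string& pattern,
                          const std::vector<std::string>& legalMoves)
{
    auto matches = [&pattern](const std::string& move) -> bool
    {
        if (move.size() < pattern.size()) return false;
        for (std::size_t i = 0; i < pattern.size(); ++i)
        {
            if (pattern[i] != '?' && pattern[i] != move[i]) return false;
        }
        return true;
    };

    SanResolution result{SanMatch::Illegal, ""};
    for (const std::string& move : legalMoves)
    {
        if (!matches(move)) continue;
        if (result.status == SanMatch::Unique)
        {
            return {SanMatch::Ambiguous, ""}; // second hit: stop early
        }
        result = {SanMatch::Unique, move};
    }
    return result;
}
```

Compared with taking the first matching move, this also catches input that is too vague to identify a single move.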
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: A lazy way to handle PGN files

Post by stegemma »

hgm wrote:It is basically also how the WinBoard SAN parser works: everything that was not explicitly specified becomes a wildcard, and then you generate legal moves to see how many match. If it is not exactly one it is an illegal move or an ambiguous move.
Thanks, I haven't looked at the WinBoard (or mscp) sources because even when I'm lazy I like to try things myself... and reinvent the wheel every time :)

I just have to extract some statistics about moves from large PGN files, so I assume they are correct enough at the source. So far I have extracted about 70 MB of FEN positions with scores, and I hope I can build something useful for my engine.
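
The statistics part can be sketched with standard containers too. This is only an illustration of the scoring used in the code I posted (+1 for a white win, -1 for a black win, 0 otherwise); the names `GameResult`, `Book` and `addGameToBook` are made up here, and the FEN strings are assumed to come from the engine after each move.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

enum class GameResult { WhiteWin, BlackWin, Draw, Unknown };

// Maps a FEN string to the running score of all games that reached it.
using Book = std::unordered_map<std::string, int>;

// Adds one game's result to every position that occurred in that game.
void addGameToBook(Book& book, const std::vector<std::string>& fens, GameResult result)
{
    int delta = 0;
    if (result == GameResult::WhiteWin) delta = 1;
    else if (result == GameResult::BlackWin) delta = -1;

    for (const std::string& fen : fens)
    {
        book[fen] += delta; // operator[] creates the entry with 0 if missing
    }
}
```

Positions from drawn or unfinished games still get an entry with a zero contribution, which keeps them visible in the book.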

SAN is not a move format for software but for humans... that's why parsing the format itself is so complicated.
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com