Why my perft nps lower than usual nps ?

Discussion of chess software programming and technical issues.

Moderator: Ras

Chan Rasjid
Posts: 588
Joined: Thu Mar 09, 2006 4:47 pm
Location: Singapore

Why my perft nps lower than usual nps ?

Post by Chan Rasjid »

Hello,

I just cannot trace a simple bug and I don't like it. Bugs sometimes indicate something else more serious amiss elsewhere.

The following is my perft run from the start position:

perft 1 20, nps(0)
perft 2 400, nps(440000)
perft 3 8902, nps(256894)
perft 4 197281, nps(387057)
perft 5 4865609, nps(147223)
^C
rasjid@debian:~/cowrie/optimized/src$

The perft numbers are correct, but the nps is far below that of the release mode nps! The release mode nps is about 1.2 - 1.7 million with a material+pst only evaluation on my athlon64 3500+

I have tried replacing the make/umake, the timers, the generators and turning of the interrupts but all to no avail.

Can someone suggest what the likely problem is.

Code: Select all

static u32 perftnodes;

static void search_perft(int depth, int side){
   move_t movestack[256], *m;
   check_t check_s[1];
   is_incheck(check_s, side);
   m = gen_all(movestack, check_s, side);
   if (m != NULL);
   else {
      return;
   }
   ++nodes;
   if (depth == 0) {
      ++perftnodes;
      return;
   }
   for (m = movestack; *m; ++m){
      makeperft(*m, side);
      search_perft(depth - 1, side ^ 1);
      unmakeperft(*m, side);
   }

   return;
};


static void perft(const char *fen, int n){
   move_t movestack[256], *m;
   check_t check_s[1];
   int ms, side, depth, nps;
   struct timeval start;   
   printf("perft on\n\n");
   stage = 0;
   side = setfenboard(fen);
   is_incheck(check_s, side);
   m = gen_all(movestack, check_s, side);
   if (m == NULL) {
      return;
   }
   nodes = 0;
   gettimeofday(&start, NULL);
   for (depth = 1; depth <= n; ++depth){
      perftnodes = 0;
      for (m = movestack; *m; ++m){
         makeperft(*m, side);
         search_perft(depth - 1, side ^ 1);
         unmakeperft(*m, side);
      }
      ms = msecfrom(&start);
      nps = 0;
      if (ms > 0) {
         nps = (nodes * 1000 / ms);
      }
      printf("perft %d %lu, nps(%d)\n", depth, perftnodes, nps);
   }
}

int main(int argc, char *argv[])
{
   char line[256], command[256];
   setbuf(stdin, NULL);
   setbuf(stdout, NULL);
   
   init_rotate();
   init_pst();
   init_pawn_shelter_tables();
   init_material_table();
   allocatehash(40);
      
#if PERFT   
   perft(FEN_STARTPOS, 20);
   goto RET;
#endif
...   
}
Rasjid
kongsian
Posts: 46
Joined: Thu Jun 15, 2006 11:21 am

Re: Why my perft nps lower than usual nps ?

Post by kongsian »

Did you perhaps change your Makefile and removed -O3?
Chan Rasjid
Posts: 588
Joined: Thu Mar 09, 2006 4:47 pm
Location: Singapore

Re: Why my perft nps lower than usual nps ?

Post by Chan Rasjid »

kongsian wrote:Did you perhaps change your Makefile and removed -O3?
No!

It is this simple and it has me caught.

BTW there is a bug. Nodes in perft can be very high. I changed all nodes to :

Code: Select all

static u64 perftnodes;
static u64 pnodes;

      if (ms > 0) {
         nps = (pnodes * 1000 / ms);
      }

Rasjid
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Why my perft nps lower than usual nps ?

Post by Sven »

Please fix also the printf() statement after having changed from 32 to 64 bit for node counters. %lu is no longer appropriate for 'perftnodes', take %llu or some "tricky portable macro".

However, this does not explain your first numbers. For perft(5) at least, the printed nps number was simply wrong due to overflow with the 32 bit version (in "nodes * 1000 / ms", the "nodes * 1000" part overflows), should be better now after the 64 bit fix (i.e., not differing significantly from nps for perft(4)).

Now the perft(2) .. perft(4) numbers remain. What does your makeperft() and unmakeperft() do, compared to make() and unmake() for your regular search()?

How is your detection of illegal moves realized? Is it part of gen_all()?

Sven
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Why my perft nps lower than usual nps ?

Post by Sven »

You are generating moves before checking for depth == 0, making it useless most of the time. Since move generation is probably about the only expensive operation per node in perft, this increases the overall effort per node by about one branching factor. I would better do it this way in search_perft():

1. Return on illegal position
2. Increment nodes
3. On depth == 0, increment perftnodes and return
4. Test for being in check
5. Generate moves
6. Loop over moves

Sven
Chan Rasjid
Posts: 588
Joined: Thu Mar 09, 2006 4:47 pm
Location: Singapore

Re: Why my perft nps lower than usual nps ?

Post by Chan Rasjid »

Sven Schüle wrote:Please fix also the printf() statement after having changed from 32 to 64 bit for node counters. %lu is no longer appropriate for 'perftnodes', take %llu or some "tricky portable macro".

However, this does not explain your first numbers. For perft(5) at least, the printed nps number was simply wrong due to overflow with the 32 bit version (in "nodes * 1000 / ms", the "nodes * 1000" part overflows), should be better now after the 64 bit fix (i.e., not differing significantly from nps for perft(4)).

Now the perft(2) .. perft(4) numbers remain. What does your makeperft() and unmakeperft() do, compared to make() and unmake() for your regular search()?

How is your detection of illegal moves realized? Is it part of gen_all()?

Sven
%lu is long unsigned int (64 bit) with 64 bit gcc - has no warning. %llu has warning.

I tried with my regular make/unmake which does more, eg zobrist, mat[], pst, etc. But about the same nps.

Illegal move is ok. When a king capture is detected search returns ( m == NULL).

It is the same after the 64 bit integer nodes fix.

I adapted my autoplay() to release mode which runs from the same linux terminal. nps is more than a million:
rasjid@debian:~/cowrie/optimized/src$ ./cowrie

Cowrie Chess Version 1.0, 3rd Feb 2010

Auto Play Start
game 1 move c2c4
-+ nps(727748)- nps(1410098)+ nps(1365333)- nps(1350329)+ nps(1335652)- nps(1404342)+ nps(1395340)- nps(1358673)+ nps(1328808) pc(30) - nps(1372960)+ nps(1438595)- nps(1509200)+ nps(1378461)- nps(1260307)+ nps(1244862)- nps(1305318)+ nps(1291349)- nps(1354583)+ nps(1285019) pc(28) - nps(1304458)+ nps(1146880)- nps(1312530)+ nps(1241212)- nps(1282210)+ nps(1199838)- nps(1304458)+ nps(1317843)- nps(1442909)+ nps(1432681) pc(28) - nps(1365333)+ nps(1365333)- nps(1434757)+ nps(1267809)- nps(1459960)+ nps(1618778)- nps(1413847)+ nps(1765940)- nps(1638400)+ nps(1618778) pc(28) - nps(1603724)+ nps(1612255)- nps(1641968)+ nps(1531214)- nps(1600000)+ nps(1552985)- nps(1519046)+ nps(1536000)- nps(1506574)+ nps(1503845) pc(26) - nps(1383064)+ nps(1533477)- nps(1454716)+ nps(1564301)- nps(1542023)+ nps(1530828)- nps(1597104)+ nps(1544041)- nps(1412413)+ nps(1473693) pc(22) - nps(1451154)+ nps(1489454)- nps(1373225)+ nps(1407214)- nps(1389286)+ nps(1312820)- nps(1346630)+ nps(1383064)- nps(1373508)+ nps(1455157) pc(22) - nps(1414095)+ nps(1445647)- nps(1381783)+ nps(1583786)- nps(1454201)+ nps(1571068)- nps(1557157)+ nps(1471378)- nps(1392640)+ nps(1587200) pc(22) - nps(1333581)+ nps(1503594)- nps(1609142)+ nps(1629821)- nps(1612598)+ nps(1456355)- nps(1225574)+ nps(1421552)- nps(1483034)+ nps(1382400) pc(19) - nps(1386338)+ nps(1309377)- nps(1388474)+ nps(1346630)- nps(1069611)+ nps(1084235)- nps(1024000)+ nps(1024000)- nps(1210181)+ nps(1071850) pc(17) - nps(1120290)+ nps(1257826)- nps(1257826)+ nps(1228800)- nps(1228800)+ nps(1228800)- nps(1228800)+ nps(1228800)
search_full failed / mate
score -7994

game 2 move d2d3
-+ nps(1335652)- nps(1405322)+ nps(1382400)- nps(1365333)+ nps(1296202)- nps(1445647)+ nps(1288050)- nps(1483034)+ nps(1424695)^C
rasjid@debian:~/cowrie/optimized/src$
Just weird!

Rasjid
Chan Rasjid
Posts: 588
Joined: Thu Mar 09, 2006 4:47 pm
Location: Singapore

Re: Why my perft nps lower than usual nps ?

Post by Chan Rasjid »

Sven Schüle wrote:You are generating moves before checking for depth == 0, making it useless most of the time. Since move generation is probably about the only expensive operation per node in perft, this increases the overall effort per node by about one branching factor. I would better do it this way in search_perft():

1. Return on illegal position
2. Increment nodes
3. On depth == 0, increment perftnodes and return
4. Test for being in check
5. Generate moves
6. Loop over moves

Sven
I have to genmove() before testing for depth == 0 as I use the method of capture-king in the next node to detect that a node is illegal.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Why my perft nps lower than usual nps ?

Post by Sven »

Chan Rasjid wrote:
Sven Schüle wrote:You are generating moves before checking for depth == 0, making it useless most of the time. Since move generation is probably about the only expensive operation per node in perft, this increases the overall effort per node by about one branching factor. I would better do it this way in search_perft():

1. Return on illegal position
2. Increment nodes
3. On depth == 0, increment perftnodes and return
4. Test for being in check
5. Generate moves
6. Loop over moves

Sven
I have to genmove() before testing for depth == 0 as I use the method of capture-king in the next node to detect that a node is illegal.
Then you have the explanation you were asking for. Your method implies that your perft is slower in terms of nps than your regular search, because you always do full move generation at the leaves of perft. Your regular search does not suffer from that because at depth == 0 you make qsearch() - I assume - where you generate only captures, but these are sufficient to detect illegal moves via king capture.

So a little improvement for your perft nps would be to call gen_captures() instead of gen_all() if depth == 0.

Sven
Chan Rasjid
Posts: 588
Joined: Thu Mar 09, 2006 4:47 pm
Location: Singapore

Re: Why my perft nps lower than usual nps ?

Post by Chan Rasjid »

Sven,

You are spot on. I found it independently, but too slow.

Code: Select all

static void search_perft(int depth, int side){
   move_t movestack[256], *m;
   check_t check_s[1];
   
   if (depth == 0) {
   /*
      This gen_all(movestack, check_s, side) is fairly expensive as it generates also the neutral moves which  has costly move ordering - maybe with decreasing returns !  
      
      m = gen_all(movestack, check_s, side);
      if (m != NULL);
      else {
         return;
      }

   */
   
      // call this to validate the legality of this node.
      if ((m = genqs(movestack, side, NULL)) != NULL){
         ++pnodes;
         ++perftnodes;
         return;
      }else {/* invalid, capt King */
         return;
      }
   }

   is_incheck(check_s, side);
   m = gen_all(movestack, check_s, side);
   if (m != NULL);
   else {/* invalid, capt King */
      return;
   }
   ++pnodes;
   if (depth == 0) {
      ++perftnodes;
      return;
   }
   for (m = movestack; *m; ++m){
      makeperft(*m, side);
      search_perft(depth - 1, side ^ 1);
      unmakeperft(*m, side);
   }

   return;
}
nps now for athlon64 3500+:
perft 1 20, nps(0)
perft 2 400, nps(0)
perft 3 8902, nps(2440500)
perft 4 197281, nps(1914734)
perft 5 4865609, nps(3153593)
perft 6 119060324, nps(4132615)
perft 7 3195901860, nps(4294715)
So it could mean my ordering of neutral moves is too elaborate.

Thanks
Rasjid
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Why my perft nps lower than usual nps ?

Post by Sven »

Chan Rasjid wrote:Sven,

You are spot on. I found it independently, but too slow.

Code: Select all

...
So it could mean my ordering of neutral moves is too elaborate.
I can think of a possible reason why your original version failed. It is a common mistake, or let's call it "bad programming habit", to write big functions that serve for more than one purpose at once. In your case, your gen_all() function includes both move generation *and* move ordering, and it is also used for illegal move detection. By separating these issues you gain much more flexibility. It may turn out that the performance of perft() is much less crucial than the performance of your regular search, but nevertheless it can often be considered a design flaw to put too much functionality into one function, simply because that function becomes too specialized and is not easy to reuse at other places. Every time you want to use gen_all() you have to buy everything it includes, even if you don't like it. In your case, gen_all() was in fact not reusable for perft without accepting major drawbacks due to useless generation and ordering of irrelevant neutral moves.

As to your new implementation, I would still consider it as kind of an "intermediate" solution since I expect that your genqs() function includes move ordering, too, which has zero meaning for perft due to absence of alpha-beta. I would separate the ordering and the generation parts for both gen_all() and genqs(). Furthermore, I still think that legality check can be done much simpler and faster than by generating all capture moves. But maybe you are currently in a situation where you want to postpone "optimizations". (Having made perft faster by a factor of >10 may indeed be more than an "optimization", in this case I think you simply use a superior algorithm.) So separating ordering from generation can be useful, using a faster legality check is kind of "optional".

For your regular search, can you explain briefly how gen_all() works? Do you generate moves piece by piece and calculate ordering data immediately when a move is generated, stopping in the middle when generating a king capture?

Sven