A perft() benchmark


Macintosh
Posts: 13
Joined: Wed Jun 26, 2013 8:23 pm
Location: Jena, Germany

Re: A perft() benchmark

Post by Macintosh »

Hi all,

This is my very first post, although I have been following this forum for quite some time. Since I am currently working on the perft part of my engine, I felt the urge to share my results. Comments are welcome.

Here is what I got on an i7-3770S @ 3.1GHz:

1 Thread, no Hash, bulk-counting:
(ignore the CPU-usage output - still work in progress)

Code: Select all

setoption name Use Hash value false
setoption name Max Threads value 1

performance tree 7 split
Testing performance of tree ...
  a2a3: 106,743,106N.
  a2a4: 137,077,337N.
  b2b3: 133,233,975N.
  b2b4: 134,087,476N.
  c2c3: 144,074,944N.
  c2c4: 157,756,443N.
  d2d3: 227,598,692N.
  d2d4: 269,605,599N.
  e2e3: 306,138,410N.
  e2e4: 309,478,263N.
  f2f3: 102,021,008N.
  f2f4: 119,614,841N.
  g2g3: 135,987,651N.
  g2g4: 130,293,018N.
  h2h3: 106,678,423N.
  h2h4: 138,495,290N.
  b1a3: 120,142,144N.
  b1c3: 148,527,161N.
  g1f3: 147,678,554N.
  g1h3: 120,669,525N.
Total number of nodes: 3,195,901,860. Time: 3.479s.
0ns/N (raw: 1ns/N, CPU load: 0.0%).
Performance test completed.

performance tree 8 split
Testing performance of tree ...
  a2a3: 2,863,411,653N.
  a2a4: 3,676,309,619N.
  b2b3: 3,579,299,617N.
  b2b4: 3,569,067,629N.
  c2c3: 3,806,229,124N.
  c2c4: 4,199,667,616N.
  d2d3: 6,093,248,619N.
  d2d4: 7,184,581,950N.
  e2e3: 8,039,390,919N.
  e2e4: 8,102,108,221N.
  f2f3: 2,728,615,868N.
  f2f4: 3,199,039,406N.
  g2g3: 3,641,432,923N.
  g2g4: 3,466,204,702N.
  h2h3: 2,860,408,680N.
  h2h4: 3,711,123,115N.
  b1a3: 3,193,522,577N.
  b1c3: 3,926,684,340N.
  g1f3: 3,937,354,096N.
  g1h3: 3,221,278,282N.
Total number of nodes: 84,998,978,956. Time: 1.499m.
0ns/N (raw: 1ns/N, CPU load: 0.0%).
Performance test completed.

Using 4 threads (fixed split depth of 7) and 8 GB hash (16 bytes per entry):

Code: Select all

setoption name Hash value 8192
performance tree 7 split
Testing performance of tree ...
  a2a3: 106,743,106N.
  a2a4: 137,077,337N.
  b2b3: 133,233,975N.
  b2b4: 134,087,476N.
  c2c3: 144,074,944N.
  c2c4: 157,756,443N.
  d2d3: 227,598,692N.
  d2d4: 269,605,599N.
  e2e3: 306,138,410N.
  e2e4: 309,478,263N.
  f2f3: 102,021,008N.
  f2f4: 119,614,841N.
  g2g3: 135,987,651N.
  g2g4: 130,293,018N.
  h2h3: 106,678,423N.
  h2h4: 138,495,290N.
  b1a3: 120,142,144N.
  b1c3: 148,527,161N.
  g1f3: 147,678,554N.
  g1h3: 120,669,525N.
Total number of nodes: 3,195,901,860. Time: 1.045s.
0ns/N (raw: 0ns/N, CPU load: 1.4%).
Performance test completed.

setoption name Clear Hash
info string Hash cleared
performance tree 8 split
Testing performance of tree ...
  a2a3: 2,863,411,653N.
  a2a4: 3,676,309,619N.
  b2b3: 3,579,299,617N.
  b2b4: 3,569,067,629N.
  c2c3: 3,806,229,124N.
  c2c4: 4,199,667,616N.
  d2d3: 6,093,248,619N.
  d2d4: 7,184,581,950N.
  e2e3: 8,039,390,919N.
  e2e4: 8,102,108,221N.
  f2f3: 2,728,615,868N.
  f2f4: 3,199,039,406N.
  g2g3: 3,641,432,923N.
  g2g4: 3,466,204,702N.
  h2h3: 2,860,408,680N.
  h2h4: 3,711,123,115N.
  b1a3: 3,193,522,577N.
  b1c3: 3,926,684,340N.
  g1f3: 3,937,354,096N.
  g1h3: 3,221,278,282N.
Total number of nodes: 84,998,978,956. Time: 13.29s.
0ns/N (raw: 0ns/N, CPU load: 0.0%).
Performance test completed.

setoption name Clear Hash
info string Hash cleared
performance tree 9 split
Testing performance of tree ...
  a2a3: 74,950,758,099N.
  a2a4: 101,265,301,849N.
  b2b3: 96,577,095,997N.
  b2b4: 97,442,160,946N.
  c2c3: 108,697,368,719N.
  c2c4: 120,549,219,832N.
  d2d3: 176,976,245,463N.
  d2d4: 227,220,482,342N.
  e2e3: 259,522,947,791N.
  e2e4: 263,561,543,780N.
  f2f3: 68,094,899,093N.
  f2f4: 84,792,070,664N.
  g2g3: 99,646,370,024N.
  g2g4: 92,281,289,941N.
  h2h3: 74,778,417,365N.
  h2h4: 102,853,440,161N.
  b1a3: 85,849,641,909N.
  b1c3: 109,418,317,145N.
  g1f3: 108,393,009,416N.
  g1h3: 86,659,653,631N.
Total number of nodes: 2,439,530,234,167. Time: 2.501m.
0ns/N (raw: 0ns/N, CPU load: 0.2%).
Performance test completed.

setoption name Clear Hash
info string Hash cleared
performance tree 10 split
Testing performance of tree ...
  a2a3: 2,149,477,156,227N.
  a2a4: 2,905,552,970,419N.
  b2b3: 2,774,842,822,463N.
  b2b4: 2,772,533,545,113N.
  c2c3: 3,072,577,495,123N.
  c2c4: 3,437,747,391,692N.
  d2d3: 5,071,006,040,569N.
  d2d4: 6,459,463,242,656N.
  e2e3: 7,299,373,354,878N.
  e2e4: 7,380,003,266,234N.
  f2f3: 1,945,020,011,164N.
  f2f4: 2,418,056,589,775N.
  g2g3: 2,853,630,724,145N.
  g2g4: 2,624,128,147,144N.
  h2h3: 2,142,832,044,687N.
  h2h4: 2,948,003,834,105N.
  b1a3: 2,440,848,135,252N.
  b1c3: 3,096,505,857,746N.
  g1f3: 3,090,773,583,680N.
  g1h3: 2,470,483,499,345N.
Total number of nodes: 69,352,859,712,417. Time: 36.22m.
0ns/N (raw: 0ns/N, CPU load: 0.1%).
Performance test completed.

setoption name Clear Hash
info string Hash cleared
performance tree 11 split
Testing performance of tree ...
  a2a3: 60,403,292,887,824N. (35.89m)
  a2a4: 85,054,341,127,064N. (45.62m)
  b2b3: 79,510,326,025,357N. (41.86m)
  b2b4: 80,419,308,561,211N. (41.41m)
  c2c3: 92,235,553,734,553N. (44.48m)
  c2c4: 103,605,670,223,681N. (48.53m)
  d2d3: 151,857,971,385,067N. (1.261h)
  d2d4: 211,583,204,457,112N. (1.810h)
  e2e3: 241,074,613,621,302N. (1.921h)
  e2e4: 245,841,494,675,197N. (2.065h)
  f2f3: 51,614,296,095,395N. (25.36m)
  f2f4: 68,372,448,303,691N. (32.54m)
  g2g3: 82,762,826,570,051N. (41.12m)
  g2g4: 73,966,186,324,024N. (37.24m)
  h2h3: 60,097,879,424,719N. (31.15m)
  h2h4: 86,739,921,618,220N. (44.58m)
  b1a3: 70,080,800,068,168N. (35.72m)
  b1c3: 91,451,554,526,572N. (44.99m)
  g1f3: 89,933,046,388,964N. (43.88m)
  g1h3: 71,046,267,678,634N. (34.88m)
Total number of nodes: 2,097,651,003,696,806. Time: 17.54h.
0ns/N (raw: 0ns/N, CPU load: 0.1%).
Performance test completed.

I was quite happy to get perft(11) below 18 hours. :)
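A cheap consistency check on split output like the logs above is that the per-move subtotals add up to the reported total. The following is a minimal Python sketch, assuming the log format shown in this post; `check_split` and the two regular expressions are hypothetical helpers, not part of the engine.

```python
import re

# Depth-7 split output copied from the 4-thread run above.
SPLIT_LOG = """\
  a2a3: 106,743,106N.
  a2a4: 137,077,337N.
  b2b3: 133,233,975N.
  b2b4: 134,087,476N.
  c2c3: 144,074,944N.
  c2c4: 157,756,443N.
  d2d3: 227,598,692N.
  d2d4: 269,605,599N.
  e2e3: 306,138,410N.
  e2e4: 309,478,263N.
  f2f3: 102,021,008N.
  f2f4: 119,614,841N.
  g2g3: 135,987,651N.
  g2g4: 130,293,018N.
  h2h3: 106,678,423N.
  h2h4: 138,495,290N.
  b1a3: 120,142,144N.
  b1c3: 148,527,161N.
  g1f3: 147,678,554N.
  g1h3: 120,669,525N.
Total number of nodes: 3,195,901,860. Time: 1.045s.
"""

# One root move per line, e.g. "  a2a3: 106,743,106N."
MOVE_RE = re.compile(r"^\s*([a-h][1-8][a-h][1-8]):\s*([\d,]+)N\.", re.M)
# Summary line, e.g. "Total number of nodes: 3,195,901,860."
TOTAL_RE = re.compile(r"Total number of nodes:\s*([\d,]+)\.")


def check_split(log):
    """Return (sum of per-move counts, reported total) parsed from a split log."""
    subtotals = [int(n.replace(",", "")) for _, n in MOVE_RE.findall(log)]
    total = int(TOTAL_RE.search(log).group(1).replace(",", ""))
    return sum(subtotals), total


summed, reported = check_split(SPLIT_LOG)
print(summed == reported, summed)
```

The same check works on the deeper runs, since the optional per-move timing suffix in the perft(11) output comes after the part the pattern matches.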

Hashing still needs improvement. At the moment I only use the transposition tables during the parallel search.
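For a sense of scale, the table size quoted above can be turned into an entry count. This is a rough sketch only: it assumes the `Hash` option is given in MiB (the usual UCI convention) and that the table size is a power of two. The entry layout (64-bit key plus 64-bit node count) and the `slot` helper are hypothetical, since the post only states the 16-byte entry size.

```python
import struct

HASH_MB = 8192      # value passed to "setoption name Hash value 8192"
ENTRY_BYTES = 16    # entry size quoted in the post

# Assumption: Hash is given in MiB (2**20 bytes).
entries = HASH_MB * 2**20 // ENTRY_BYTES

# Hypothetical layout: 64-bit position key + 64-bit node count.
# A real perft table also has to distinguish depths, for example by
# mixing the remaining depth into the key; that is not shown here.
ENTRY = struct.Struct("<QQ")


def slot(key, n_entries=entries):
    """Table index for a 64-bit key, assuming a power-of-two table size."""
    return key & (n_entries - 1)


print(entries, entries == 2**29, ENTRY.size)
```

A 64-bit count is comfortably large enough here: the perft(11) total above is about 2.1e15, far below 2**64.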

The core move generator is shared with the chess engine and generates only legal moves. For perft, I added a specialized node-counting function in order to avoid move serialization.
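The idea of bulk counting with a legal-only generator can be sketched on a toy game. This is not the engine's code: the "position" is just a pile of `n` stones, a move removes 1 to 3 of them, and `moves`, `count_moves`, `perft_plain` and `perft_bulk` are names made up for the illustration. The point is only that at depth 1 the number of legal moves is the answer, so nothing needs to be generated or made.

```python
def moves(n):
    """Toy stand-in for a legal move generator: take 1, 2 or 3 stones."""
    return [k for k in (1, 2, 3) if k <= n]


def count_moves(n):
    """Count the legal moves without building a move list."""
    return min(3, n)


def perft_plain(n, depth):
    """Reference perft: walk down to depth 0 and count every leaf."""
    if depth == 0:
        return 1
    return sum(perft_plain(n - k, depth - 1) for k in moves(n))


def perft_bulk(n, depth):
    """Bulk-counting perft: stop one ply early and just count the moves."""
    if depth == 0:
        return 1
    if depth == 1:
        return count_moves(n)
    return sum(perft_bulk(n - k, depth - 1) for k in moves(n))


print(perft_plain(10, 5) == perft_bulk(10, 5))
```

In a bitboard engine the equivalent of `count_moves` would presumably be a sum of population counts over target sets, but that detail is an assumption and depends on the generator.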

Greetings from Jena (Germany)


Marcus
Macintosh

Re: A perft() benchmark

Post by Macintosh »

Oops,

I had a bug in the no-hashing part, which is why my 1-thread/no-hash time was so low in the last post. Here are the (hopefully) correct timings:

Code: Select all

setoption name Use Hash value false
setoption name Max Threads value 1
performance tree 7 split
Testing performance of tree ...
  a2a3: 106,743,106N.
  a2a4: 137,077,337N.
  b2b3: 133,233,975N.
  b2b4: 134,087,476N.
  c2c3: 144,074,944N.
  c2c4: 157,756,443N.
  d2d3: 227,598,692N.
  d2d4: 269,605,599N.
  e2e3: 306,138,410N.
  e2e4: 309,478,263N.
  f2f3: 102,021,008N.
  f2f4: 119,614,841N.
  g2g3: 135,987,651N.
  g2g4: 130,293,018N.
  h2h3: 106,678,423N.
  h2h4: 138,495,290N.
  b1a3: 120,142,144N.
  b1c3: 148,527,161N.
  g1f3: 147,678,554N.
  g1h3: 120,669,525N.
Total number of nodes: 3,195,901,860. Time: 11.41s.
0ns/N (raw: 3ns/N, CPU load: 0.0%).
Performance test completed.

performance tree 8 split
Testing performance of tree ...
  a2a3: 2,863,411,653N.
  a2a4: 3,676,309,619N.
  b2b3: 3,579,299,617N.
  b2b4: 3,569,067,629N.
  c2c3: 3,806,229,124N.
  c2c4: 4,199,667,616N.
  d2d3: 6,093,248,619N.
  d2d4: 7,184,581,950N.
  e2e3: 8,039,390,919N.
  e2e4: 8,102,108,221N.
  f2f3: 2,728,615,868N.
  f2f4: 3,199,039,406N.
  g2g3: 3,641,432,923N.
  g2g4: 3,466,204,702N.
  h2h3: 2,860,408,680N.
  h2h4: 3,711,123,115N.
  b1a3: 3,193,522,577N.
  b1c3: 3,926,684,340N.
  g1f3: 3,937,354,096N.
  g1h3: 3,221,278,282N.
Total number of nodes: 84,998,978,956. Time: 5.068m.
0ns/N (raw: 3ns/N, CPU load: 0.0%).
Performance test completed.

hashstats perft
Perft-Hash statistics: (Total: 131,072)
 Pl.  Ply  Min     Max     Ave     StdDev  Usage     Used
 ### Hash is empty ###

Greetings from Jena


Marcus
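The integer ns/N figures printed in the logs hide most of the detail, so the corrected single-thread timings are easier to compare when recomputed from the totals. A small Python sketch using only the node counts and times reported above (`ns_per_node` is a hypothetical helper):

```python
def ns_per_node(nodes, seconds):
    """Average wall-clock time per counted node, in nanoseconds."""
    return seconds * 1e9 / nodes


# (nodes, seconds) taken from the corrected 1-thread, no-hash runs above.
runs = {
    "perft(7)": (3_195_901_860, 11.41),
    "perft(8)": (84_998_978_956, 5.068 * 60),  # 5.068 minutes
}

for name, (nodes, secs) in runs.items():
    rate = nodes / secs / 1e6  # million nodes per second
    print(f"{name}: {ns_per_node(nodes, secs):.2f} ns/N, {rate:.0f} MN/s")
```

Both runs come out at roughly 3.6 ns per node, which is consistent with the truncated "raw: 3ns/N" in the output.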
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Now with multithreading

Post by sje »

I've added perft() multithreading in Symbolic's re-write.

On a 3.4 GHz Core i7-2600:

Code: Select all

CountMt(PCModeFull, 0): 1   Time: 0.000   Frequency: inf   Period: 0
CountMt(PCModeFull, 1): 20   Time: 0.383   Frequency: 52.1523   Period: 0.0191746
CountMt(PCModeFull, 2): 400   Time: 0.383   Frequency: 1043.07   Period: 0.000958713
CountMt(PCModeFull, 3): 8902   Time: 0.373   Frequency: 23827   Period: 4.19691e-05
CountMt(PCModeFull, 4): 197281   Time: 0.362   Frequency: 543748   Period: 1.83909e-06
CountMt(PCModeFull, 5): 4865609   Time: 0.336   Frequency: 1.4445e+07   Period: 6.92283e-08
CountMt(PCModeFull, 6): 119060324   Time: 5.456   Frequency: 2.18195e+07   Period: 4.58306e-08
CountMt(PCModeFull, 7): 3195901860   Time: 2:17.219   Frequency: 2.32904e+07   Period: 4.29361e-08
CountMt(PCModeFull, 8): 84998978956   Time: 1:01:02.329   Frequency: 2.3209e+07   Period: 4.30867e-08

sje

Re: Now with multithreading

Post by sje »

Same machine, now with assistance from a shared transposition table:

Code: Select all

CountMt(PCModeTran, 0): 1   Time: 0.000   Frequency: inf   Period: 0
CountMt(PCModeTran, 1): 20   Time: 0.383   Frequency: 52.1601   Period: 0.0191718
CountMt(PCModeTran, 2): 400   Time: 0.383   Frequency: 1043.73   Period: 0.000958103
CountMt(PCModeTran, 3): 8902   Time: 0.373   Frequency: 23846.5   Period: 4.19348e-05
CountMt(PCModeTran, 4): 197281   Time: 0.364   Frequency: 541941   Period: 1.84522e-06
CountMt(PCModeTran, 5): 4865609   Time: 0.343   Frequency: 1.4178e+07   Period: 7.05316e-08
CountMt(PCModeTran, 6): 119060324   Time: 0.339   Frequency: 3.50732e+08   Period: 2.85118e-09
CountMt(PCModeTran, 7): 3195901860   Time: 2.025   Frequency: 1.57773e+09   Period: 6.33821e-10
CountMt(PCModeTran, 8): 84998978956   Time: 24.467   Frequency: 3.47393e+09   Period: 2.87858e-10
CountMt(PCModeTran, 9): 2439530234167   Time: 5:30.433   Frequency: 7.38282e+09   Period: 1.3545e-10

sje

Re: Now with multithreading

Post by sje »

Bulk counting, no transposition table:

Code: Select all

CountMt(PCModeBulk, 0): 1   Time: 0.000   Frequency: inf   Period: 0
CountMt(PCModeBulk, 1): 20   Time: 0.383   Frequency: 52.1504   Period: 0.0191753
CountMt(PCModeBulk, 2): 400   Time: 0.383   Frequency: 1043.24   Period: 0.000958555
CountMt(PCModeBulk, 3): 8902   Time: 0.383   Frequency: 23211.1   Period: 4.30829e-05
CountMt(PCModeBulk, 4): 197281   Time: 0.374   Frequency: 527044   Period: 1.89737e-06
CountMt(PCModeBulk, 5): 4865609   Time: 0.345   Frequency: 1.40671e+07   Period: 7.10879e-08
CountMt(PCModeBulk, 6): 119060324   Time: 0.532   Frequency: 2.23772e+08   Period: 4.46884e-09
CountMt(PCModeBulk, 7): 3195901860   Time: 9.747   Frequency: 3.27867e+08   Period: 3.05001e-09
CountMt(PCModeBulk, 8): 84998978956   Time: 4:48.663   Frequency: 2.94457e+08   Period: 3.39608e-09
CountMt(PCModeBulk, 9): 2439530234167   Time: 2:03:51.621   Frequency: 3.28264e+08   Period: 3.04633e-09
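The Frequency and Period columns in these listings appear to be simply nodes per second and its reciprocal. The sketch below recomputes them for the depth-9 lines, assuming the Time column reads as `S.mmm`, `M:SS.mmm` or `H:MM:SS.mmm`; `parse_time` and `frequency` are hypothetical helpers, not part of Symbolic.

```python
def parse_time(text):
    """Convert 'S.mmm', 'M:SS.mmm' or 'H:MM:SS.mmm' to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def frequency(nodes, time_text):
    """Nodes per second for one CountMt line."""
    return nodes / parse_time(time_text)


PERFT_9 = 2_439_530_234_167

bulk9 = frequency(PERFT_9, "2:03:51.621")  # PCModeBulk, depth 9
tran9 = frequency(PERFT_9, "5:30.433")     # PCModeTran, depth 9

# Frequency, period (seconds per node), and the Tran/Bulk speed ratio.
print(f"{bulk9:.5e} {1 / bulk9:.5e} {tran9 / bulk9:.1f}x")
```

Under those assumptions the recomputed values agree with the printed Frequency figures to the displayed precision, and the ratio between the two gives the speedup from the shared transposition table at depth 9.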