An alternative perft() initial FEN


CRoberson
Posts: 2094
Joined: Mon Mar 13, 2006 2:31 am
Location: North Carolina, USA

Re: An alternative perft() initial FEN

Post by CRoberson »

I've run Telepath on the position at each of the first 7 plies and obtained the same node counts as you did.
xmas79
Posts: 286
Joined: Mon Jun 03, 2013 7:05 pm
Location: Italy

Re: An alternative perft() initial FEN.

Post by xmas79 »

Hi all,
I'm interested in NPS scaling with multithreading enabled and the hash table disabled (since I don't want to use hash tables in perft).

r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R3K2R w KQkq -

I wrote several perft functions. One in particular uses fast, "stripped down" Make/Unmake functions with the hash signature updates etc. removed, plus a special MakeUnmakeFast function at the last ply that only checks whether the move is legal (I have only a pseudo-legal move generator). With the fully bloated Make/Unmake and no other tricks at the last ply I get "only" about half the speed.
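
In rough C++ terms the last-ply trick looks like this (just a sketch: GenerateMoves, MakeMove, UnmakeMove, IsLegal and LeavesOwnKingInCheck are illustrative placeholders, not my real functions):

Code:

#include <cstdint>
#include <vector>

// Illustrative placeholders -- not the engine's real API.
struct Move {};
struct Position {
    std::vector<Move> GenerateMoves() const;   // pseudo-legal moves
    bool IsLegal(const Move& m) const;          // cheap test: would m leave our own king in check?
    void MakeMove(const Move& m);               // "stripped down": no hash signature updates etc.
    void UnmakeMove(const Move& m);
    bool LeavesOwnKingInCheck() const;          // after MakeMove: was the pseudo-legal move illegal?
};

uint64_t Perft(Position& pos, int depth)
{
    const std::vector<Move> moves = pos.GenerateMoves();

    if (depth == 1) {
        // Last ply: never make the moves, only count the legal ones
        // (the "MakeUnmakeFast"-style shortcut).
        uint64_t count = 0;
        for (const Move& m : moves)
            if (pos.IsLegal(m))
                ++count;
        return count;
    }

    uint64_t nodes = 0;
    for (const Move& m : moves) {
        pos.MakeMove(m);
        if (!pos.LeavesOwnKingInCheck())        // filter out illegal pseudo-legal moves
            nodes += Perft(pos, depth - 1);
        pos.UnmakeMove(m);
    }
    return nodes;
}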

This is all on my quad-core Core i7-3630QM@2.40GHz.

Code:

perftfastmt 6
Time elapsed: 28.65400 seconds
Total leaf nodes: 7891984336
275.4M LNPS

perftmt 6
Time elapsed: 60.90700 seconds
Total leaf nodes: 7891984336
129.6M LNPS
And here are the single thread versions:

Code:

perftfast 6
Time elapsed: 134.12200 seconds
Total leaf nodes: 7891984336
58.8M LNPS

perft 6
Time elapsed: 268.17700 seconds
Total leaf nodes: 7891984336
29.4M LNPS
As you can see, I get only about 4x scaling even though I'm using 8 threads (the CPU has 4 cores with HT). I thought it must be the HT stuff, but using 4 threads seems to halve the scaling factor.
What NPS scaling do you get with multithreading?




And finally, here are the divide (per-move) results of perft 6 and perft 7:

Code:

perftfast 6 

 1 Nf3*e5 256118651: r3k2r/1pp1qppp/p1np1n2/2b1N1B1/2B1P1b1/P1NP4/1PP1QPPP/R3K2R b KQkq -
 2 Bc4*a6 170724575: r3k2r/1pp1qppp/B1np1n2/2b1p1B1/4P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
 3 Bc4*f7 22358900: r3k2r/1pp1qBpp/p1np1n2/2b1p1B1/4P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
 4 Bg5*f6 145467022: r3k2r/1pp1qppp/p1np1B2/2b1p3/2B1P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
 5 Qe2-d1 169787452: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP2PPP/R2QK2R b KQkq -
 6 Qe2-f1 157901116: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP2PPP/R3KQ1R b KQkq -
 7 Qe2-d2 178876552: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PPQ1PPP/R3K2R b KQkq -
 8 Qe2-e3 187562592: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NPQN2/1PP2PPP/R3K2R b KQkq -
 9 Ra1-b1 163110039: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/1R2K2R b KQkq -
10 Ra1-c1 162063286: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/2R1K2R b KQkq -
11 Ra1-d1 155111784: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/3RK2R b KQkq -
12 Ra1-a2 142402945: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/RPP1QPPP/4K2R b KQkq -
13 Rh1-f1 160853862: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R3KR2 b KQkq -
14 Rh1-g1 168403724: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R3K1R1 b KQkq -
15 Bc4-a2 165001356: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/4P1b1/P1NP1N2/BPP1QPPP/R3K2R b KQkq -
16 Bc4-b3 162876250: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/4P1b1/PBNP1N2/1PP1QPPP/R3K2R b KQkq -
17 Bc4-b5 118534407: r3k2r/1pp1qppp/p1np1n2/1Bb1p1B1/4P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
18 Bc4-d5 156746652: r3k2r/1pp1qppp/p1np1n2/2bBp1B1/4P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
19 Bc4-e6 162159260: r3k2r/1pp1qppp/p1npBn2/2b1p1B1/4P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
20 Bg5-c1 163956956: r3k2r/1pp1qppp/p1np1n2/2b1p3/2B1P1b1/P1NP1N2/1PP1QPPP/R1B1K2R b KQkq -
21 Bg5-d2 167436099: r3k2r/1pp1qppp/p1np1n2/2b1p3/2B1P1b1/P1NP1N2/1PPBQPPP/R3K2R b KQkq -
22 Bg5-e3 188923139: r3k2r/1pp1qppp/p1np1n2/2b1p3/2B1P1b1/P1NPBN2/1PP1QPPP/R3K2R b KQkq -
23 Bg5-f4 212478824: r3k2r/1pp1qppp/p1np1n2/2b1p3/2B1PBb1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
24 Bg5-h4 157267257: r3k2r/1pp1qppp/p1np1n2/2b1p3/2B1P1bB/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
25 Bg5-h6 179105449: r3k2r/1pp1qppp/p1np1n1B/2b1p3/2B1P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
26 Nc3-b1 133597337: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P2P1N2/1PP1QPPP/RN2K2R b KQkq -
27 Nc3-d1 132979018: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P2P1N2/1PP1QPPP/R2NK2R b KQkq -
28 Nc3-a2 153356008: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P2P1N2/NPP1QPPP/R3K2R b KQkq -
29 Nc3-a4 158446983: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/N1B1P1b1/P2P1N2/1PP1QPPP/R3K2R b KQkq -
30 Nc3-b5 158989454: r3k2r/1pp1qppp/p1np1n2/1Nb1p1B1/2B1P1b1/P2P1N2/1PP1QPPP/R3K2R b KQkq -
31 Nc3-d5 161873336: r3k2r/1pp1qppp/p1np1n2/2bNp1B1/2B1P1b1/P2P1N2/1PP1QPPP/R3K2R b KQkq -
32 Nf3-g1 173731523: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP4/1PP1QPPP/R3K1NR b KQkq -
33 Nf3-d2 184486630: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP4/1PPNQPPP/R3K2R b KQkq -
34 Nf3-d4 213244060: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2BNP1b1/P1NP4/1PP1QPPP/R3K2R b KQkq -
35 Nf3-h4 189677279: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1bN/P1NP4/1PP1QPPP/R3K2R b KQkq -
36  b2-b3 161223442: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/PPNP1N2/2P1QPPP/R3K2R b KQkq -
37  g2-g3 175267258: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1NP1/1PP1QP1P/R3K2R b KQkq -
38  h2-h3 196761843: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N1P/1PP1QPP1/R3K2R b KQkq -
39  a3-a4 182863372: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/P1B1P1b1/2NP1N2/1PP1QPPP/R3K2R b KQkq -
40  d3-d4 203014324: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2BPP1b1/P1N2N2/1PP1QPPP/R3K2R b KQkq -
41  b2-b4 172122868: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/1PB1P1b1/P1NP1N2/2P1QPPP/R3K2R b KQkq -
42  h2-h4 182697403: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1bP/P1NP1N2/1PP1QPP1/R3K2R b KQkq -
43    O-O 176251801: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 b KQkq -
44  O-O-O 162758263: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/2KR3R b KQkq -
45 Ke1-d1 173202558: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R2K3R b KQkq -
46 Ke1-f1 176260981: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4K1R b KQkq -
47 Ke1-d2 193950446: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PPKQPPP/R6R b KQkq -

Time elapsed: 134.12200 seconds 
Total leaf nodes: 7891984336 
58.8M LNPS

And here is the perft 7 result:

Code:

perftfastmt 7

 1  h2-h3 8700041322: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N1P/1PP1QPP1/R3K2R b KQkq -
 2  a3-a4 7844893814: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/P1B1P1b1/2NP1N2/1PP1QPPP/R3K2R b KQkq -
 3  d3-d4 9408057290: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2BPP1b1/P1N2N2/1PP1QPPP/R3K2R b KQkq -
 4  b2-b4 7383734723: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/1PB1P1b1/P1NP1N2/2P1QPPP/R3K2R b KQkq b3
 5  h2-h4 7857707773: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1bP/P1NP1N2/1PP1QPP1/R3K2R b KQkq h3
 6    O-O 7441315266: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 b kq -
 7 Ke1-d1 7176704182: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R2K3R b kq -
 8 Ke1-f1 7434607500: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4K1R b kq -
 9 Ke1-d2 8429004623: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PPKQPPP/R6R b kq -
10 Rh1-f1 6548916568: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R3KR2 b Qkq -
11 Rh1-g1 6997505595: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R3K1R1 b Qkq -
12 Bc4-a2 6828224394: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/4P1b1/P1NP1N2/BPP1QPPP/R3K2R b KQkq -
13 Bc4-b3 6724378117: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/4P1b1/PBNP1N2/1PP1QPPP/R3K2R b KQkq -
14 Bc4-b5 4815395108: r3k2r/1pp1qppp/p1np1n2/1Bb1p1B1/4P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
15 Bc4-d5 6637464383: r3k2r/1pp1qppp/p1np1n2/2bBp1B1/4P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
16 Bc4-e6 7051856718: r3k2r/1pp1qppp/p1npBn2/2b1p1B1/4P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
17 Bg5-d2 6877428331: r3k2r/1pp1qppp/p1np1n2/2b1p3/2B1P1b1/P1NP1N2/1PPBQPPP/R3K2R b KQkq -
18 Bg5-e3 8097829773: r3k2r/1pp1qppp/p1np1n2/2b1p3/2B1P1b1/P1NPBN2/1PP1QPPP/R3K2R b KQkq -
19 Bg5-f4 9110198725: r3k2r/1pp1qppp/p1np1n2/2b1p3/2B1PBb1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
20 Bg5-h4 6243857458: r3k2r/1pp1qppp/p1np1n2/2b1p3/2B1P1bB/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
21 Bg5-h6 7609854994: r3k2r/1pp1qppp/p1np1n1B/2b1p3/2B1P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
22 Nc3-b1 5106093309: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P2P1N2/1PP1QPPP/RN2K2R b KQkq -
23 Nc3-d1 5092743663: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P2P1N2/1PP1QPPP/R2NK2R b KQkq -
24 Nc3-a2 6217317396: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P2P1N2/NPP1QPPP/R3K2R b KQkq -
25 Nc3-a4 6581923754: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/N1B1P1b1/P2P1N2/1PP1QPPP/R3K2R b KQkq -
26 Nc3-b5 6743690873: r3k2r/1pp1qppp/p1np1n2/1Nb1p1B1/2B1P1b1/P2P1N2/1PP1QPPP/R3K2R b KQkq -
27 Nc3-d5 7031516349: r3k2r/1pp1qppp/p1np1n2/2bNp1B1/2B1P1b1/P2P1N2/1PP1QPPP/R3K2R b KQkq -
28 Nf3-g1 7058061238: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP4/1PP1QPPP/R3K1NR b KQkq -
29 Nf3-d2 7793889011: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP4/1PPNQPPP/R3K2R b KQkq -
30 Nf3-d4 9751959347: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2BNP1b1/P1NP4/1PP1QPPP/R3K2R b KQkq -
31 Nf3-h4 8146091138: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1bN/P1NP4/1PP1QPPP/R3K2R b KQkq -
32  b2-b3 6563013573: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/PPNP1N2/2P1QPPP/R3K2R b KQkq -
33  g2-g3 7362922786: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1NP1/1PP1QP1P/R3K2R b KQkq -
34  O-O-O 6677652450: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/2KR3R b kq -
35 Nf3*e5 11833091585: r3k2r/1pp1qppp/p1np1n2/2b1N1B1/2B1P1b1/P1NP4/1PP1QPPP/R3K2R b KQkq -
36 Bc4*a6 6911473293: r3k2r/1pp1qppp/B1np1n2/2b1p1B1/4P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
37 Bc4*f7 931459632: r3k2r/1pp1qBpp/p1np1n2/2b1p1B1/4P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
38 Bg5*f6 6135150412: r3k2r/1pp1qppp/p1np1B2/2b1p3/2B1P1b1/P1NP1N2/1PP1QPPP/R3K2R b KQkq -
39 Qe2-d1 7066718673: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP2PPP/R2QK2R b KQkq -
40 Qe2-f1 6205034378: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP2PPP/R3KQ1R b KQkq -
41 Qe2-d2 7713599363: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PPQ1PPP/R3K2R b KQkq -
42 Qe2-e3 8417344023: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NPQN2/1PP2PPP/R3K2R b KQkq -
43 Ra1-b1 6753584464: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/1R2K2R b Kkq -
44 Ra1-c1 6656410108: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/2R1K2R b Kkq -
45 Ra1-d1 6275244828: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/3RK2R b Kkq -
46 Ra1-a2 5588602038: r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/RPP1QPPP/4K2R b Kkq -
47 Bg5-c1 6569302985: r3k2r/1pp1qppp/p1np1n2/2b1p3/2B1P1b1/P1NP1N2/1PP1QPPP/R1B1K2R b KQkq -

Time elapsed: 1225.87400 seconds
Total leaf nodes: 332402867326
271.2M LNPS
The perft 7 results are not ordered because of the non-determinism of the multithreaded search.

Best regards,
Natale.
ibid
Posts: 89
Joined: Mon Jun 13, 2011 12:09 pm

Re: An alternative perft() initial FEN.

Post by ibid »

xmas79 wrote:Hi all,
I'm interested in NPS scaling with multithreading enabled and the hash table disabled (since I don't want to use hash tables in perft).

[d]r3k2r/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R3K2R w KQkq -
I wrote several perft functions. One in particular uses fast, "stripped down" Make/Unmake functions with the hash signature updates etc. removed, plus a special MakeUnmakeFast function at the last ply that only checks whether the move is legal (I have only a pseudo-legal move generator). With the fully bloated Make/Unmake and no other tricks at the last ply I get "only" about half the speed.

This is all on my quad-core Core i7-3630QM@2.40GHz.

Code:

perftfastmt 6
Time elapsed: 28.65400 seconds
Total leaf nodes: 7891984336
275.4M LNPS

perftmt 6
Time elapsed: 60.90700 seconds
Total leaf nodes: 7891984336
129.6M LNPS
And here are the single thread versions:

Code:

perftfast 6
Time elapsed: 134.12200 seconds
Total leaf nodes: 7891984336
58.8M LNPS

perft 6
Time elapsed: 268.17700 seconds
Total leaf nodes: 7891984336
29.4M LNPS
As you can see, I get only about 4x scaling even though I'm using 8 threads (the CPU has 4 cores with HT). I thought it must be the HT stuff, but using 4 threads seems to halve the scaling factor.
What NPS scaling do you get with multithreading?
Something like this should get near-perfect scaling, at least as long as you're using physical cores. A multi-threaded no-hash-table perft(6) for me:

Code:

1 core   17.813
2 cores   8.981 [50.4%]
3 cores   5.992 [33.6%]
4 cores   4.490 [25.2%]
I'm not using an Intel CPU, so I cannot test hyperthreading.

Random thoughts:
- Your cpu can go as high as 3.4 GHz with turbo, which could be throwing off the single thread numbers if you didn't turn it off.
- When I get odd multi-threaded numbers like that, I usually find that multiple threads are doing frequent writes to variables in the same cache line in memory.
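
As an illustration only (nothing to do with your actual code): a per-thread node counter packed like the first struct below is a classic offender, while padding each slot out to its own cache line (64 bytes on most current x86 CPUs) avoids the problem:

Code:

#include <cstdint>

// Bad: counters for different threads share a 64-byte cache line, so every
// increment by one thread invalidates the line in the other cores' caches.
struct PackedCounters {
    uint64_t nodes[8];
};

// Better: give each thread's counter its own cache line.
struct alignas(64) PaddedCounter {
    uint64_t nodes = 0;
};

PaddedCounter perThreadNodes[8];   // index by thread id, sum the slots at the end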

-paul
xmas79
Posts: 286
Joined: Mon Jun 03, 2013 7:05 pm
Location: Italy

Re: An alternative perft() initial FEN.

Post by xmas79 »

Something like this should get near-perfect scaling, at least as long as you're using physical cores
Hmm... funny stuff here. That's what I thought too, but here are my results for a perft(5) (it's late...):

Code:

perftfastmt 5
1 core :  58.0M LNPS [ 100% ]
2 cores: 110.2M LNPS [ 190% ]
3 cores: 160.4M LNPS [ 277% ]
4 cores: 186.9M LNPS [ 322% ]
8 cores: 207.2M LNPS [ 357% ]
Up to 4 cores the performance could be acceptable (even if 3.3x is not 4x), but seriously, 357% with 8 threads is really ridiculous. As already said, I suspect it's a hyperthreading problem; I'm waiting to see whether other people have noticed something similar...

The less-than-perfect NPS scaling must be due to something I'm doing in a fancy way... I simply put every position up to ply=3 into a queue and then start processing them in parallel; when a thread finishes its work, it dequeues another position. I see no "waiting" threads, so does that mean I'm doing something in a very inefficient way? How should I split correctly?
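
In sketch form (made-up names, not my real code), the splitting is essentially this:

Code:

#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Placeholders standing in for the real engine routines.
struct Position { /* board state */ };
uint64_t Perft(Position& pos, int depth);                           // single-threaded perft
std::vector<Position> PositionsAtPly(const Position& root, int ply);

uint64_t PerftMT(const Position& root, int depth, int numThreads)
{
    const int splitPly = 3;                                         // enqueue everything up to ply 3
    std::vector<Position> work = PositionsAtPly(root, splitPly);

    std::mutex qlock;                                               // protects the dequeue index
    size_t next = 0;
    std::vector<uint64_t> totals(numThreads, 0);

    auto worker = [&](int id) {
        uint64_t local = 0;                                         // hot counter stays thread-local
        for (;;) {
            size_t i;
            {
                std::lock_guard<std::mutex> g(qlock);               // dequeue the next position
                if (next >= work.size())
                    break;
                i = next++;
            }
            local += Perft(work[i], depth - splitPly);
        }
        totals[id] = local;                                         // one write per thread at the end
    };

    std::vector<std::thread> pool;
    for (int t = 0; t < numThreads; ++t)
        pool.emplace_back(worker, t);
    for (auto& th : pool)
        th.join();

    uint64_t sum = 0;
    for (uint64_t n : totals)
        sum += n;
    return sum;
}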
Random thoughts:
- Your cpu can go as high as 3.4 GHz with turbo, which could be throwing off the single thread numbers if you didn't turn it off.
Disabling Turbo Boost seems to have little effect on my scaling, except for the 8-thread case:

Code:

perftfastmt 5
1 core :  43.6M LNPS [ 100% ]
2 cores:  83.5M LNPS [ 192% ]
3 cores: 116.9M LNPS [ 268% ]
4 cores: 133.3M LNPS [ 306% ]
8 cores: 183.0M LNPS [ 419% ] <-- boost
- When I get odd multi-threaded numbers like that, I usually find that multiple threads are doing frequent writes to variables in the same cache line in memory.
Ahhh, processor things... I don't think I have any false sharing issue in this code, except for the queue itself, of course. But the queue can hold up to, say, 10,000 items, and they get dequeued at a fairly slow rate, so I wouldn't expect that kind of performance drop from it. Splitting up to ply=2 instead of ply=3 reduces the number of queue elements (and hence any possible false sharing problem), but performance doesn't go up...


Natl.
syzygy
Posts: 5786
Joined: Tue Feb 28, 2012 11:56 pm

Re: An alternative perft() initial FEN.

Post by syzygy »

xmas79 wrote:Up to 4 cores the performance could be acceptable (even if 3.3x is not 4x), but seriously, 357% with 8 threads is really ridiculous. As already said, I suspect it's a hyperthreading problem; I'm waiting to see whether other people have noticed something similar...
Since you mention hyperthreading, you probably have an Intel CPU. In that case it is very unlikely you have 8 cores. You most likely have 4 physical cores with HT, which means 8 logical hardware threads. It is expected that 8 hyperthreads only go a bit faster than 4 threads on a 4-core CPU.
Ahhh, processor things... I don't think I have any false sharing issue in this code, except for the queue itself, of course.
Assuming you don't write into the queue after creating it, there can be no false sharing there. The only thing you share is the atomic counter for picking the next element in the queue (or a mutex or spinlock for accessing and incrementing the counter, if you don't use an atomic increment). As you said, this shouldn't be an issue, since the number of elements in the queue is very low compared to the total number of visited nodes.
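
Concretely, the shared state can be as small as this (a sketch, not your code):

Code:

#include <atomic>
#include <cstddef>
#include <vector>

// The work list is filled once and then only read; the only shared state
// that is ever written is the counter, so the list itself cannot false-share.
template <typename Item>
class WorkList {
public:
    explicit WorkList(std::vector<Item> items) : items_(std::move(items)) {}

    // Returns nullptr once the work is exhausted.
    const Item* Next() {
        const std::size_t i = next_.fetch_add(1, std::memory_order_relaxed);
        return i < items_.size() ? &items_[i] : nullptr;
    }

private:
    std::vector<Item> items_;            // read-only after construction
    std::atomic<std::size_t> next_{0};   // the single shared counter
};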

Of course not sharing anything else does not mean there is no false sharing.
xmas79
Posts: 286
Joined: Mon Jun 03, 2013 7:05 pm
Location: Italy

Re: An alternative perft() initial FEN.

Post by xmas79 »

syzygy wrote:Since you mention hyperthreading, you probably have an Intel CPU. In that case it is very unlikely you have 8 cores. You most likely have 4 physical cores with HT, which means 8 logical hardware threads.
xmas79 wrote:...This is all on my quad-core Core i7-3630QM@2.40GHz...
xmas79 wrote:...(the CPU has 4 cores with HT)...
syzygy wrote:It is expected that 8 hyperthreads only go a bit faster than 4 threads on a 4-core CPU.
What I already suspected (and then confirmed).
syzygy wrote:Of course not sharing anything else does not mean there is no false sharing.
This is far less obvious, in my opinion. If I write a multithreaded program that has zero global/shared variables among threads (and cache-aligned variables), I think false sharing is absent too. Am I wrong?