Stockfish-1.6.2 Benchmarks for 1 to 8 Threads

Tord Romstad · Post by **Tord Romstad** » Sat Jan 02, 2010 1:18 pm

zullil wrote: I did indeed leave MNTpSP=5. I also monitored CPU during some initial testing, and indeed each program was at 800% (=8 cores in use) almost always.

If Tord is still around---should I do some benchmarking/testing with larger values of MNTpSP? Crude early testing seemed to indicate that changing this parameter had little effect on nps, compared to changes in MSD. I suppose it can't hurt to try.

I'm still here, and I'll try to give a crude, not too technical explanation of what the two parameters do:

In a YBW search, the search at every node in the search tree always starts with a single CPU searching the first move alone. If this move refutes the move directly before it, the search at the current node stops immediately. If not, other CPUs are invited to join the work in searching the remaining moves, if they aren't already busy searching some other part of the tree. A node where the work is shared between two or more CPUs is called a "split point".

Splitting is a very expensive operation. The chess board and numerous other big chunks of data must be copied between all the CPUs cooperating at the split point. It gets even more expensive when the number of CPUs increases, because even more data must be copied.

Because splitting is so expensive, we don't want to do it unless the sub-tree below the current node is so big that we save a significant amount of work by having several CPUs working on the sub-tree. Therefore, we want to avoid splitting too close to the leaves of the tree. The "Minimum Split Depth" parameter controls how close to the leaves splitting is allowed. With the default setting of 4, the program will not try to split when the remaining depth is less than 4 plies. Setting this parameter too high or too low will hurt the performance. If it is too low, the program will spend too much time on copying and synchronization at split points. If it is too high, some of the CPUs will spend too much time being idle and waiting for work to do. It seems reasonable that a higher value is better with a higher number of CPUs, but the optimal values can only be determined by experimentation.

The meaning of the other parameter, "Maximum Number of Threads per Split Point", should now be obvious. I have no idea and no intuition about what the optimal value should be.

I must admit that I am a little less optimistic than Marco about how much can be achieved by simply fine-tuning these two parameters. In order to make the search really efficient on more than 4 CPUs, I think we need to make changes to the actual code.
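The split decision described above can be sketched in a few lines of Python. This is only an illustration of how the two parameters interact, not Stockfish's actual code: the function name, its arguments, and the choice to count the master thread towards the per-split-point limit are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class SplitParams:
    # Defaults taken from the post; the field names are hypothetical.
    min_split_depth: int = 4        # "Minimum Split Depth"
    max_threads_per_split: int = 5  # "Maximum Number of Threads per Split Point"


def helpers_to_invite(remaining_depth, first_move_searched, first_move_cutoff,
                      idle_threads, params):
    """Return how many idle threads to invite at this node (0 means no split)."""
    # YBW: the first move is always searched by a single CPU.
    if not first_move_searched:
        return 0
    # If the first move already refuted the previous move, the node is done.
    if first_move_cutoff:
        return 0
    # Too close to the leaves: copying and synchronization would cost
    # more than the parallel search saves.
    if remaining_depth < params.min_split_depth:
        return 0
    # Assumption: the master counts as one of the threads at the split point.
    return min(idle_threads, params.max_threads_per_split - 1)
```

With the default parameters and 7 idle threads, a node with 3 plies remaining is not split at all, while a node with 4 plies remaining gets 4 helpers.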

zullil · Post by **zullil** » Sat Jan 02, 2010 1:32 pm

Thanks, Tord, for that clear explanation. Seems like a bit of experimentation with MNTpSP might be fun.

Joerg Oster · Post by **Joerg Oster** » Sat Jan 02, 2010 2:31 pm

Hi Tord,

yes, thanks for this explanation.

I was asking because in my testing with 4 cores, MNTpSP=5 (the default) gives good results, and so does 4.
But maybe some testing with MNTpSP=3 would also be worth a try?
6 seems to be really bad, which is not a big surprise, I guess: nps drops (sometimes significantly).

Going by my impressions on my quad-core, setting MNTpSP to 8 or even 9 on an octa-core seemed more logical to me.
But maybe MNTpSP=6 or 7 on an 8-Core-Machine is worth a try.

Kind regards,
Joerg.

mcostalba · Post by **mcostalba** » Sat Jan 02, 2010 2:54 pm

zullil wrote:
mcostalba wrote: I see you mastered cutechess quickly

I am really curious about the final result....

I know it is not good luck to say this so early, but I suspect that we will end up with a much stronger setup for 8 cores than the current one.
Hi Marco,

Based on the three 100-game match results posted above, what match would you like me to run next? MSD4--MSD7 for 1000 games? MSD6--MSD7 for 1000 games?

Let me know.

Louis

Both

A match against MSD4 is mandatory because MSD4 is the current setup. After that, if MSD7 wins, you may want to run another 1000-game match, MSD6--MSD7, to fine-tune the result.

I know what you think, but unfortunately in (serious) chess engine testing there are no easy shortcuts.

zullil · Post by **zullil** » Sat Jan 02, 2010 3:12 pm

mcostalba wrote:
Both

A match against MSD4 is mandatory because MSD4 is the current setup. After that, if MSD7 wins, you may want to run another 1000-game match, MSD6--MSD7, to fine-tune the result.

I know what you think, but unfortunately in (serious) chess engine testing there are no easy shortcuts.

Not a problem. I'll start the MSD4--MSD7 match later today. Unless instructed otherwise, I'll use the same cutechess-cli options/parameters as in the 100-game match.

zullil · Post by **zullil** » Sat Jan 02, 2010 4:37 pm

Joerg Oster wrote: But maybe MNTpSP=6 or 7 on an 8-Core-Machine is worth a try.

Kind regards,
Joerg.

Hi Joerg,

Here's mSD=7;MNTpSP=5 against mSD=7;MNTpSP=7:

Code:

./cutechess-cli.sh -fcp cmd=/Users/louis/Desktop/stockfish-162MSD/src/stockfish-1.6.2-MSD7 name=Stockfish-1.6.2-MSD75 -scp cmd=/Users/louis/Desktop/stockfish-162MSD/src/stockfish-1.6.2-MSD7MNTpSP7 name=Stockfish-1.6.2-MSD77 -both dir=/Users/louis/Desktop/stockfish-162MSD/src proto=uci tc=40 book=performance.bin bookdepth=10 option.Hash=512 option.Ponder=false option.OwnBook=false -event MSD7577_TEST_MATCH -games 100 -repeat -site EastonPA -pgnout ~/Desktop/MSD7577_TEST_MATCH.pgn

Score of Stockfish-1.6.2-MSD75 vs Stockfish-1.6.2-MSD77: 24 - 18 - 58
Finished match

Based on this (statistically insignificant) outcome and my desire to keep this simple, I've decided to leave MNTpSP=5 (default) for now. As I mentioned before, changing MNTpSP seems to have less effect than changing mSD.

Louis
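To put a number on "statistically insignificant", a match score can be converted into an Elo estimate with a rough 95% interval. The sketch below uses the standard logistic Elo formula and a normal approximation for the error of the mean score; the function name is made up for this example and it is not part of cutechess-cli.

```python
import math


def elo_with_interval(wins, losses, draws, z=1.96):
    """Elo difference and approximate 95% interval from a match result."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1 for a win, 0.5 for a draw, 0 for a loss).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score

    def to_elo(p):
        # Logistic Elo model; only valid for 0 < p < 1.
        return -400.0 * math.log10(1.0 / p - 1.0)

    return to_elo(score), to_elo(score - z * se), to_elo(score + z * se)


# The 100-game match above: MNTpSP=5 scored 24 wins, 18 losses, 58 draws.
elo, low, high = elo_with_interval(24, 18, 58)
print(f"{elo:+.0f} Elo, interval {low:+.0f} to {high:+.0f}")
```

For 24 - 18 - 58 the estimate comes out at about +21 Elo for MNTpSP=5, but the interval extends well below zero, so 100 games cannot separate the two settings.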

Joerg Oster · Post by **Joerg Oster** » Sat Jan 02, 2010 6:03 pm

Hi Louis,

yes, my results are similar. With higher mSD=6 or 7 I get higher nps on my Quad as well. The default setting seems not to be the best even for a Quad.

I will make a post later with my first results.

Joerg

Joerg Oster · Post by **Joerg Oster** » Sat Jan 02, 2010 10:24 pm

Hi Louis and SF team,

here are my first results so far.

Pos 1
FEN: 8/pp4pp/4k3/3rPp2/1Pr4P/2B1KPP1/1P6/4R3 b - - 0 1

With Stockfish 1.6.2 default settings I get nps ~ 5,140 kN/s:

Code:

Max. Num = 5	MSD = 4
2...g7-g6 3.Re1-a1 a7-a6 4.Ra1-f1 Rd5-d8 5.Rf1-h1 Rd8-d7 6.Rh1-a1 h7-h6 7.Ra1-h1 Rc4-c8 8.Rh1-a1 Rc8-d8 9.Ke3-e2 Rd7-d5 10.Ra1-g1 Rd8-c8 11.Ke2-e3 Rc8-c4 12.Rg1-h1 Rd5-d8 13.f3-f4 Rc4-c7 14.h4-h5 g6-g5 15.f4xg5 h6xg5 16.Rh1-a1 Rc7-d7
  -+  (-1.81)   Depth: 27   00:03:04  946mN

With MSD=7: nps ~ 6,190 kN/s (+20%)

Code:

Max. Num = 5	MSD = 7
2...Rc4xc3+ 3.b2xc3 Rd5xe5+ 4.Ke3-d2 Re5xe1 5.Kd2xe1 Ke6-d5 6.Ke1-d2 Kd5-c4 7.Kd2-c2 h7-h5 8.Kc2-d2 a7-a6 9.f3-f4 Kc4-b3 10.Kd2-d3 g7-g6 11.Kd3-d2 b7-b6 12.Kd2-d3 a6-a5 13.b4xa5 b6xa5 14.Kd3-d2 a5-a4 15.Kd2-d3 a4-a3
  -+  (-2.78 !)   Depth: 27   00:01:28  545mN

Pos 2
FEN: 2q2r1k/4pp2/3p1np1/3P1P2/pP2P3/2p1N3/5PQP/2R3K1 b - - 0 1

SF default: nps ~ 4,879 kN/s

Code:

Max. Num = 5	MSD = 4
1...Rf8-g8 2.Qg2-h3+ Nf6-h7 3.f2-f3 g6xf5+ 4.Kg1-f2 f5-f4 5.Ne3-f5 Qc8-c4
  -+  (-1.93 --)   Depth: 19   00:00:33  161mN

With MSD=7: nps ~ 4,907 kN/s (+0.6%)

Code:

Max. Num = 5	MSD = 7
1...Rf8-g8 2.Kg1-h1 a4-a3 3.f2-f4 Qc8-b7 4.Qg2-h3+ Kh8-g7 5.f5xg6 f7xg6 6.Qh3-e6 Qb7-a7 7.Ne3-c2 Nf6-h5 8.Qe6-h3 Qa7-f2 9.Qh3xc3+ Kg7-f7 10.Qc3-h3 Qf2-d2 11.Qh3-e3 Qd2xe3 12.Nc2xe3 Nh5xf4 13.b4-b5 a3-a2
  -+  (-1.81)   Depth: 19   00:00:54  265mN

Pos 3
FEN: 4k3/7p/1pr1np2/p1p1pNpP/P1P1K1P1/1P2P3/3R1P2/8 w - - 0 1

SF default: nps ~ 5,467 kN/s

Code:

Max. Num = 5	MSD = 4
1.Nf5-d6+ Ke8-e7 2.Nd6-f5+ Ke7-e8 3.Rd2-d6 Rc6xd6 4.Nf5xd6+ Ke8-d7 5.Nd6-b5 Ne6-g7 6.h5-h6 Ng7-e8 7.Ke4-d5 f6-f5 8.Kd5xe5 f5xg4 9.Ke5-f5 Kd7-e7 10.Nb5-c3 Ne8-d6+ 11.Kf5xg5 Nd6-f7+ 12.Kg5xg4 Nf7xh6+ 13.Kg4-f4 Nh6-f7 14.Nc3-d5+ Ke7-e6 15.Nd5xb6 Nf7-e5 16.Kf4-e4 Ne5-g4 17.f2-f4 Ng4-f2+ 18.Ke4-f3 Nf2-d3 19.e3-e4 h7-h5 20.f4-f5+ Ke6-e5 21.Nb6-d7+ Ke5-d6 22.Nd7-f6 h5-h4 23.Nf6-g4 Kd6-e7 24.e4-e5 Nd3-c1 25.Ng4-f6 Nc1xb3 26.Nf6-d5+ Ke7-d7
  +-  (3.15)   Depth: 29   00:00:45  246mN

With MSD=7: nps ~ 6,280 kN/s (+15%)

Code:

Max. Num = 5	MSD = 7
1.Rd2-d6 Ne6-d8 2.Ke4-d5 Rc6xd6+ 3.Kd5xd6 e5-e4 4.Kd6-c7 Nd8-f7 5.Kc7xb6 Ke8-d7 6.Kb6xa5 Nf7-e5 7.Ka5-b5 Ne5xg4 8.Kb5xc5 Ng4xf2 9.Kc5-d5 g5-g4 10.a4-a5 Nf2-d3 11.Nf5-g3
  +-  (3.55 !)   Depth: 29   00:01:06  415mN

I tested some more positions with the same result in general: setting MSD (Min. Split Depth) to 6 or 7 will give better nps, especially in the endgame. In midgame positions MSD=4 (default) often performed pretty well, too.

I will now run a match between SF 1.6.2 default and SF 1.6.2 MNTpSP=5; MSD=7 and post the result again.

Joerg.
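The nps figures quoted above can be cross-checked against the elapsed time and node count in the last line of each search output. A small sketch, assuming "mN" means millions of nodes and the time is hh:mm:ss (the helper names are made up for this example):

```python
def seconds(hms):
    """Convert an hh:mm:ss string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


def knps(elapsed, mega_nodes):
    """Average speed in thousands of nodes per second."""
    return mega_nodes * 1000.0 / seconds(elapsed)


# Pos 1: default (MSD=4) versus MSD=7, values copied from the outputs above.
base = knps("00:03:04", 946)
tuned = knps("00:01:28", 545)
gain = 100.0 * (tuned / base - 1.0)
print(f"{base:.0f} kN/s -> {tuned:.0f} kN/s ({gain:+.0f}%)")
```

For Pos 1 this reproduces the roughly 5,140 and 6,190 kN/s figures and the +20% gain. The same two helpers can be pointed at the other positions.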

zullil · Post by **zullil** » Sun Jan 03, 2010 2:53 pm

Stockfish-1.6.2-mSD7 (8 threads) defeated Stockfish-1.6.2-mSD4 (8 threads) by a score of 291 to 120, with 589 draws.

Can provide the .pgn file if needed. Will start a similar match between Stockfish-1.6.2-mSD6 and Stockfish-1.6.2-mSD7 later today.

Code:

./cutechess-cli.sh -fcp cmd=/Users/louis/Desktop/StockfishTests/stockfish-162-ja/src/stockfish-1.6.2-mSD4 -scp cmd=/Users/louis/Desktop/StockfishTests/stockfish-162-ja/src/stockfish-1.6.2-mSD7 -both dir=/Users/louis/Desktop/StockfishTests/stockfish-162-ja/src/ proto=uci tc=40 book=performance.bin bookdepth=10 option.Ponder=false option.OwnBook=false -event mSD4-7_TEST_MATCH -games 1000 -repeat -site MacPro4,1-2.26GHz-8_Threads-512M_Hash -pgnout ./mSD4-7_TEST_MATCH.pgn 

Score of Stockfish 1.6.2-mSD4 64bit vs Stockfish 1.6.2-mSD7 64bit: 120 - 291 - 589
Finished match
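One common way to judge a result like this is the "likelihood of superiority" (LOS), which uses only the decisive games. The sketch below relies on the usual normal approximation; it is an illustration with a made-up function name, not something cutechess-cli printed here.

```python
import math


def likelihood_of_superiority(wins, losses):
    """Approximate probability that the side with `wins` is the stronger one.

    Draws carry no information about which side is better, so only
    decisive games enter the formula.
    """
    decisive = wins + losses
    if decisive == 0:
        return 0.5
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))


# mSD7 against mSD4 over 1000 games: 291 wins, 120 losses for mSD7.
print(likelihood_of_superiority(291, 120))
# For comparison, the earlier 100-game MNTpSP match: 24 wins, 18 losses.
print(likelihood_of_superiority(24, 18))
```

The 291 to 120 result gives an LOS indistinguishable from 1, while the 24 to 18 result of the 100-game MNTpSP match only reaches roughly 0.82.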

Code:

Stockfish 1.6.2-mSD4 64bit. By Tord Romstad, Marco Costalba, Joona Kiiski.
Good! CPU has hardware POPCNT. We will use it.
uci
id name Stockfish 1.6.2-mSD4 64bit
id author Tord Romstad, Marco Costalba, Joona Kiiski

option name Use Search Log type check default false
option name Search Log Filename type string default SearchLog.txt
option name Book File type string default book.bin
option name Mobility (Middle Game) type spin default 100 min 0 max 200
option name Mobility (Endgame) type spin default 100 min 0 max 200
option name Pawn Structure (Middle Game) type spin default 100 min 0 max 200
option name Pawn Structure (Endgame) type spin default 100 min 0 max 200
option name Passed Pawns (Middle Game) type spin default 100 min 0 max 200
option name Passed Pawns (Endgame) type spin default 100 min 0 max 200
option name Space type spin default 100 min 0 max 200
option name Aggressiveness type spin default 100 min 0 max 200
option name Cowardice type spin default 100 min 0 max 200
option name King Safety Curve type combo default Quadratic var Quadratic var Linear
option name King Safety Coefficient type spin default 40 min 1 max 100
option name King Safety X Intercept type spin default 0 min 0 max 20
option name King Safety Max Slope type spin default 30 min 10 max 100
option name King Safety Max Value type spin default 500 min 100 max 1000
option name Queen Contact Check Bonus type spin default 3 min 0 max 8
option name Queen Check Bonus type spin default 2 min 0 max 4
option name Rook Check Bonus type spin default 1 min 0 max 4
option name Bishop Check Bonus type spin default 1 min 0 max 4
option name Knight Check Bonus type spin default 1 min 0 max 4
option name Discovered Check Bonus type spin default 3 min 0 max 8
option name Mate Threat Bonus type spin default 3 min 0 max 8
option name Check Extension (PV nodes) type spin default 2 min 0 max 2
option name Check Extension (non-PV nodes) type spin default 1 min 0 max 2
option name Single Reply Extension (PV nodes) type spin default 2 min 0 max 2
option name Single Reply Extension (non-PV nodes) type spin default 2 min 0 max 2
option name Mate Threat Extension (PV nodes) type spin default 0 min 0 max 2
option name Mate Threat Extension (non-PV nodes) type spin default 0 min 0 max 2
option name Pawn Push to 7th Extension (PV nodes) type spin default 1 min 0 max 2
option name Pawn Push to 7th Extension (non-PV nodes) type spin default 1 min 0 max 2
option name Passed Pawn Extension (PV nodes) type spin default 1 min 0 max 2
option name Passed Pawn Extension (non-PV nodes) type spin default 0 min 0 max 2
option name Pawn Endgame Extension (PV nodes) type spin default 2 min 0 max 2
option name Pawn Endgame Extension (non-PV nodes) type spin default 2 min 0 max 2
option name Full Depth Moves (PV nodes) type spin default 10 min 1 max 100
option name Full Depth Moves (non-PV nodes) type spin default 3 min 1 max 100
option name Threat Depth type spin default 5 min 0 max 100
option name Randomness type spin default 0 min 0 max 10
option name Minimum Split Depth type spin default 4 min 4 max 7
option name Maximum Number of Threads per Split Point type spin default 5 min 4 max 8
option name Threads type spin default 8 min 1 max 8
option name Hash type spin default 512 min 4 max 2048
option name Clear Hash type button
option name New Game type button
option name Ponder type check default true
option name OwnBook type check default true
option name MultiPV type spin default 1 min 1 max 500
option name UCI_ShowCurrLine type check default false
option name UCI_Chess960 type check default false
option name UCI_AnalyseMode type check default false

Stockfish 1.6.2-mSD7 64bit. By Tord Romstad, Marco Costalba, Joona Kiiski.
Good! CPU has hardware POPCNT. We will use it.
uci
id name Stockfish 1.6.2-mSD7 64bit
id author Tord Romstad, Marco Costalba, Joona Kiiski

option name Use Search Log type check default false
option name Search Log Filename type string default SearchLog.txt
option name Book File type string default book.bin
option name Mobility (Middle Game) type spin default 100 min 0 max 200
option name Mobility (Endgame) type spin default 100 min 0 max 200
option name Pawn Structure (Middle Game) type spin default 100 min 0 max 200
option name Pawn Structure (Endgame) type spin default 100 min 0 max 200
option name Passed Pawns (Middle Game) type spin default 100 min 0 max 200
option name Passed Pawns (Endgame) type spin default 100 min 0 max 200
option name Space type spin default 100 min 0 max 200
option name Aggressiveness type spin default 100 min 0 max 200
option name Cowardice type spin default 100 min 0 max 200
option name King Safety Curve type combo default Quadratic var Quadratic var Linear
option name King Safety Coefficient type spin default 40 min 1 max 100
option name King Safety X Intercept type spin default 0 min 0 max 20
option name King Safety Max Slope type spin default 30 min 10 max 100
option name King Safety Max Value type spin default 500 min 100 max 1000
option name Queen Contact Check Bonus type spin default 3 min 0 max 8
option name Queen Check Bonus type spin default 2 min 0 max 4
option name Rook Check Bonus type spin default 1 min 0 max 4
option name Bishop Check Bonus type spin default 1 min 0 max 4
option name Knight Check Bonus type spin default 1 min 0 max 4
option name Discovered Check Bonus type spin default 3 min 0 max 8
option name Mate Threat Bonus type spin default 3 min 0 max 8
option name Check Extension (PV nodes) type spin default 2 min 0 max 2
option name Check Extension (non-PV nodes) type spin default 1 min 0 max 2
option name Single Reply Extension (PV nodes) type spin default 2 min 0 max 2
option name Single Reply Extension (non-PV nodes) type spin default 2 min 0 max 2
option name Mate Threat Extension (PV nodes) type spin default 0 min 0 max 2
option name Mate Threat Extension (non-PV nodes) type spin default 0 min 0 max 2
option name Pawn Push to 7th Extension (PV nodes) type spin default 1 min 0 max 2
option name Pawn Push to 7th Extension (non-PV nodes) type spin default 1 min 0 max 2
option name Passed Pawn Extension (PV nodes) type spin default 1 min 0 max 2
option name Passed Pawn Extension (non-PV nodes) type spin default 0 min 0 max 2
option name Pawn Endgame Extension (PV nodes) type spin default 2 min 0 max 2
option name Pawn Endgame Extension (non-PV nodes) type spin default 2 min 0 max 2
option name Full Depth Moves (PV nodes) type spin default 10 min 1 max 100
option name Full Depth Moves (non-PV nodes) type spin default 3 min 1 max 100
option name Threat Depth type spin default 5 min 0 max 100
option name Randomness type spin default 0 min 0 max 10
option name Minimum Split Depth type spin default 7 min 4 max 7
option name Maximum Number of Threads per Split Point type spin default 5 min 4 max 8
option name Threads type spin default 8 min 1 max 8
option name Hash type spin default 512 min 4 max 2048
option name Clear Hash type button
option name New Game type button
option name Ponder type check default true
option name OwnBook type check default true
option name MultiPV type spin default 1 min 1 max 500
option name UCI_ShowCurrLine type check default false
option name UCI_Chess960 type check default false
option name UCI_AnalyseMode type check default false
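The two option dumps are long and nearly identical, so a small script can confirm which defaults actually differ. This is a simplified sketch with hypothetical helper names: it assumes option names never contain the word "type" and that default values contain no spaces, which holds for the output above but not for UCI in general.

```python
def parse_option(line):
    """Parse one 'option name ... type ...' line into (name, fields), else None."""
    tokens = line.split()
    if tokens[:2] != ["option", "name"] or "type" not in tokens:
        return None
    t = tokens.index("type")
    name = " ".join(tokens[2:t])  # option names may contain spaces
    fields = {"type": tokens[t + 1], "var": []}
    rest = tokens[t + 2:]
    # The remaining tokens come in key/value pairs: default, min, max, var.
    for key, value in zip(rest[0::2], rest[1::2]):
        if key == "var":
            fields["var"].append(value)  # combo options list several values
        else:
            fields[key] = value
    return name, fields


def diff_defaults(dump_a, dump_b):
    """Return {option name: (default in a, default in b)} where they differ."""
    def defaults(dump):
        parsed = (parse_option(line) for line in dump.splitlines())
        return {p[0]: p[1].get("default") for p in parsed if p}

    a, b = defaults(dump_a), defaults(dump_b)
    return {name: (a[name], b[name]) for name in a if name in b and a[name] != b[name]}
```

Applied to the two dumps above, `diff_defaults` should report only "Minimum Split Depth", with defaults 4 and 7.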

mcostalba · Post by **mcostalba** » Sun Jan 03, 2010 2:58 pm

zullil wrote: Stockfish-1.6.2-mSD7 (8 threads) defeated Stockfish-1.6.2-mSD4 (8 threads) by a score of 291 to 120, with 589 draws.

Thanks, Louis!

For me, this is _the_ news of this Sunday.

Waiting for mSD7 vs mSD6.....
