Perft speed and depth questions

Sven · Post by **Sven** » Fri Jun 12, 2020 9:17 am

Bulk counting is the easiest tweak you can apply when already having a fully legal move generator. Your code would look like this:

Code: Select all

uint64 nodes = 0;

uint64 Board::Perft(int depth)
{
	MoveList moves;
	uint64 leaves = 0;

	assert(depth > 0);
	++nodes;
	int nMoves = genMoves(moves);
	if (depth == 1) return nMoves;

	for (int i = 0; i < nMoves; i++)
	{
		makeMove(moves.move[i]);
		leaves += Perft(depth - 1);
		undoMove(moves.move[i]);
	}
	return leaves;
}

I renamed "nodes" into "leaves" and added another global "nodes" counter to allow counting both the number of leaves (which is the number to check against the reference) and the number of inner nodes visited (which is more meaningful to me when talking about speed since it reflects how often genMoves() was actually called).

abulmo2 · Post by **abulmo2** » Fri Jun 12, 2020 11:49 am

fabianVDW wrote: ↑Thu Jun 11, 2020 11:59 pm
abulmo2 wrote: ↑Thu Jun 11, 2020 10:33 pm
brianr wrote: ↑Thu Jun 11, 2020 8:19 pm While correctly working code is most important, 14M nps seems a little on the slow side.
Tinker is far from the fastest at perft, but it is about 95M nps on a 3.6GHz CPU.
Move generation in Tinker is done with magic bitboards, BTW.
95M nps without bulk counting is very fast. 14 Mnps looks like a good start. On my computer, stockfish-8 does 180 Mnps, but with bulk counting. That's about 15M nps without bulk counting. So 14 M nps is not very far from the speed of the move generator of one of the strongest engine.
Not sure where you got those numbers from...
FabChess has ~150mnps on startpos with bulk counting, this turns out at ~35mnps without bulk counting. I am expecting SF speed to be similar higher, or if it is lower then it will be due to unnecessary work done at leafs for initiializing info like pinned pieces etc. (which is not needed in leaves in perft). Then it would be more fair to use bulk counting to evaluate SF's speed though

You are right I mistyped the speed without bulk counting, it was ~25Mnps. The number is from a modified copy of stockfish-8. According to your number it looks like the relative speed of perft with/without bulk counting depends on one's implementation.

mvanthoor · Post by **mvanthoor** » Fri Jun 12, 2020 2:17 pm

I'm in the stages of writing a chess engine myself. I've also done some perft optimizations a month or 2 ago:

http://talkchess.com/forum3/viewtopic.php?t=73577

In the very first version, my engine ran at 10 Mnps (i7 6700K), so your 14 Mnps is not bad to start out with; but you have some optimization to do. In the thread above, you can follow me and Terje optimizing our respective engines to the point that they are giving similar results on my machines; the engines are so close (+/- 2 seconds depending on the position), that any difference is basically because of compiler and linker differences. (Terje's engine Weiss is written in C, my engine Rustic is being written in Rust.)

We optimized without any 'tricks', and we don't strip out anything we actually need for playing chess.

No tricks:
- no hasing
- no bulk counting
- no specialties in the move generator such as omitting pinned pieces or a special in-check move generator. (That can be added later.

No stripping (this is stuff which isn't needed for perft, but is needed for chess playing):
- Keep material score for both sides.
- Incrementally calculate zobrist hash (later used for hash/transposition tables)
- keep 50 move rule counter
- keep full move number

Just plain old perft, that counts down to the leaves like you are doing now.

On my machine, we ended up at +/- 78 seconds for perft 7 in the starting position and +/- 200 seconds in the "kiwipete" position, which is about +/- 40 million leaves/sec.

Your single-threaded CPU-speed is similar to mine:
6700K: https://www.cpubenchmark.net/compare/In ... 2565vs2874
6850K: https://www.cpubenchmark.net/compare/In ... 2800vs2785

Your CPU is a smidge slower (+/- 3-4%) than mine with single threads, because of the higher core clock of the 6700K. Still, 38 million leaves/sec should be achievable for your engine on your CPU. As long as you haven't reached at least 30 million leaves/sec (that is my engine's speed when compiled in debug mode with opt-level 2), you can still go faster without any tricks. FabianVDW (FabChess, written in Rust), seems to confirm the numbers with his 35 million leaves/sec. I am assuming his CPU is comparable to a 6700K.

Then you could add bulk-counting if you wish; but it requires a fully legal move generator. Good for perft, bad for chess playing. (It requires to move the is_king_attacked() function from make_move to the movegenerator. It can be done with if-statements to determine where to execute this function when running perft or when playing chess.) Personally, I wouldn't be concerned with bulk counting, because it adds nothing to your engine. It's just that: a perft counting trick.

You can also add using a hash table. First: make sure your Zobrist calculations are always correct. Make a function to create the Zobrist-key from scratch and compare it with your incrementally updated key after each move. If you get no errors, your Zobrist-key would be correct. You can use this to create a hash-table. This can really boost perft speed with something like 85% or so I've got a thread about that as well, floating around in here; where H.G.Muller also participated and kindly explained some stuff with regard to hashing.

While this could also be seen as a "trick", it does add something to your engine:
- A good working zobrist module.
- A working hash table, which can be re-used/extended/redone, so it will become the transposition table which is used for actually speeding up the engine during playing.

Welcome, good luck, and while you've had a good start, you can definitely go three times as fast, even without any tricks

fabianVDW · Post by **fabianVDW** » Fri Jun 12, 2020 4:59 pm

abulmo2 wrote: ↑Fri Jun 12, 2020 11:49 am According to your number it looks like the relative speed of perft with/without bulk counting depends on one's implementation.

I definitely second this! If you do extra work on initializing extra info on your GameState struct when constructing it for instance, it won't hurt search performance at all (if it's always needed), like, let's say, pinned pieces info, since we almost always want the evaluation of a GameState and need pinned pieces info there too, but it will and should hurt perft performance significantly. In this case, I think it is the most fair to use bulk counting to compare performance.

mvanthoor wrote: ↑Fri Jun 12, 2020 2:17 pm FabianVDW (FabChess, written in Rust), seems to confirm the numbers with his 35 million leaves/sec. I am assuming his CPU is comparable to a 6700K.

I have an i5-6400. To be fair though, FabChess uses a completly legal movegeneration, so no illegal moves will ever be generated. I am also pretty sure this is not the best number I can get. And the number I reported is nodes / time_spent, so not actually leaves/sec. At the startpos with a branching factor of ~25(I think) this should make a difference of 1-1.5mnps.

mvanthoor wrote: ↑Fri Jun 12, 2020 2:17 pm Then you could add bulk-counting if you wish; but it requires a fully legal move generator. Good for perft, bad for chess playing. (It requires to move the is_king_attacked() function from make_move to the movegenerator. It can be done with if-statements to determine where to execute this function when running perft or when playing chess.) Personally, I wouldn't be concerned with bulk counting, because it adds nothing to your engine. It's just that: a perft counting trick.

A full legal movegenerator of some sort is needed anyway, or atleast you need some way to check legality of moves, or you go the route of even searching illegal moves in the search tree. I am also not convinced a full legal movegen is much slower if slower at all:

In full legal movegen, you need square_is_attacked_function exactly the amount of sensible king moves + the amount of possible castle moves times. I have no actual statistical data on this, but I am guessing this number to be pretty moderate in the midgame. If validating moves during search, we also need to validate this atleast one time. In cut nodes, about ~90% of the nodes get cut after 1 move. In all-nodes, you would need to validate much more often. I might investigate on this with actual data in the future, but my guess is that the performance diff is not too bad. And a legal move generation is much more comfortable to handle.

Atleast when I rewrote FabChess to use pseudolegal movegen, performance was much worse all across the board (in perft, without bulk counting).

bob · Post by **bob** » Fri Jun 12, 2020 6:11 pm

Some background here. I believe I was the first to use this. Back in the 80's. We rewrote the move generator in Cray Blitz in assembly language. It was a pain to debug. I decided on the "perft" approach solely to test/debug the move generator. We'd run two versions, one FORTRAN, one assembly, and we tested and debugged until they matched.

I carried this over into Crafty as early versions went through several different approaches on move generation. Starting with the Slate/Atkin approach, then rotated bit boards (which took some time to debug), and the magic. It was really intended solely for that purpose. Then several started to use it as a benchmark for speed. I never followed that path since move generation is a very small part of the overall CPU time burned.

Speed here is not so important. I doubt anyone's move generator takes more than 10% of total search time, which means a 20% improvement in perft numbers is only a 2% overall speed gain. I would not worry about anything but matching the node counts exactly...

ajaxuk · Post by **ajaxuk** » Fri Jun 12, 2020 7:23 pm

Very interesting, thank you for the background information bob. Although, I think it is similar to telling your child that they should / shouldn't do something. Invariably they ignore good advice and have to learn the hard way rather than by their parents experience

- I am sure I will look back and see how silly I was to still be focusing on squeezing out extra spurious performance.

fabianVDW · Post by **fabianVDW** » Fri Jun 12, 2020 7:27 pm

ajaxuk wrote: ↑Fri Jun 12, 2020 7:23 pm Very interesting, thank you for the background information bob. Although, I think it is similar to telling your child that they should / shouldn't do something. Invariably they ignore good advice and have to learn the hard way rather than by their parents experience - I am sure I will look back and see how silly I was to still be focusing on squeezing out extra spurious performance.

I don't think it's silly at all. First of all it's pretty fun, second it will make you a better programmer and third, in FabChess I got improvements in the order of 10-15% by improving the ecosystem around makemove/ generating moves and general chess related stuff, so it is definitely worthwhile, IMO. But it's hard to justify doing it before anything else in your system, as you will likely have to iterate on what you have anyway, since your needs will most likely change over the course of developing your engine.

ajaxuk · Post by **ajaxuk** » Fri Jun 12, 2020 7:28 pm

Sven wrote: ↑Fri Jun 12, 2020 9:17 am Bulk counting is the easiest tweak you can apply when already having a fully legal move generator. Your code would look like this:
Code: Select all
uint64 nodes = 0;

uint64 Board::Perft(int depth)
{
	MoveList moves;
	uint64 leaves = 0;

	assert(depth > 0);
	++nodes;
	int nMoves = genMoves(moves);
	if (depth == 1) return nMoves;

	for (int i = 0; i < nMoves; i++)
	{
		makeMove(moves.move[i]);
		leaves += Perft(depth - 1);
		undoMove(moves.move[i]);
	}
	return leaves;
}
I renamed "nodes" into "leaves" and added another global "nodes" counter to allow counting both the number of leaves (which is the number to check against the reference) and the number of inner nodes visited (which is more meaningful to me when talking about speed since it reflects how often genMoves() was actually called).

Ok, it is getting late and it is Friday, but I have pasted this in to my code and get pretty much the same results as my original Perft routine... I am obviously doing something silly... hopefully clarity will come in the morning.

odomobo · Post by **odomobo** » Fri Jun 12, 2020 9:30 pm

ajaxuk wrote: ↑Fri Jun 12, 2020 7:28 pm Ok, it is getting late and it is Friday, but I have pasted this in to my code and get pretty much the same results as my original Perft routine... I am obviously doing something silly... hopefully clarity will come in the morning.

Note that if you use "nodes" to calculate nodes per second, it will be basically the same. However, the bulk counting approach gives you 1 more ply nearly for free. You should find that perft at the same depth takes significantly less time.

Bulk counting is pointless anyhow, IMO. It's an artificial way to boost an artificial benchmark. Or another way to look at it, bulk counting is primarily measuring the speed of genMoves, and the direct approach is more heavily weighted toward makeMove/undoMove.

ajaxuk · Post by **ajaxuk** » Sat Jun 13, 2020 8:50 am

odomobo wrote: ↑Fri Jun 12, 2020 9:30 pm
ajaxuk wrote: ↑Fri Jun 12, 2020 7:28 pm Ok, it is getting late and it is Friday, but I have pasted this in to my code and get pretty much the same results as my original Perft routine... I am obviously doing something silly... hopefully clarity will come in the morning.
Note that if you use "nodes" to calculate nodes per second, it will be basically the same. However, the bulk counting approach gives you 1 more ply nearly for free. You should find that perft at the same depth takes significantly less time.

Bulk counting is pointless anyhow, IMO. It's an artificial way to boost an artificial benchmark. Or another way to look at it, bulk counting is primarily measuring the speed of genMoves, and the direct approach is more heavily weighted toward makeMove/undoMove.

Well after a restorative cup of tea this morning a found my silly mistake - I had renamed the runction to PerftBulk and called it from the main program but forgot it was recursive... changing the call within the function to PerftBul of course fixed the issue. It is indeed significantly quicker!

Perft speed and depth questions

Re: Perft speed and depth questions

Re: Perft speed and depth questions

Re: Perft speed and depth questions

Re: Perft speed and depth questions

Re: Perft speed and depth questions

Re: Perft speed and depth questions

Re: Perft speed and depth questions

Re: Perft speed and depth questions

Re: Perft speed and depth questions

Re: Perft speed and depth questions