Perft speed and depth questions

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

ajaxuk
Posts: 13
Joined: Tue Jun 09, 2020 6:44 pm
Full name: Mark Buisseret

Re: Perft speed and depth questions

Post by ajaxuk »

mvanthoor wrote: Fri Jun 12, 2020 2:17 pm I'm in the stages of writing a chess engine myself. I've also done some perft optimizations a month or 2 ago:

http://talkchess.com/forum3/viewtopic.php?t=73577

We optimized without any 'tricks', and we don't strip out anything we actually need for playing chess.

Just plain old perft, that counts down to the leaves like you are doing now.

On my machine, we ended up at +/- 78 seconds for perft 7 in the starting position and +/- 200 seconds in the "kiwipete" position, which is about +/- 40 million leaves/sec.

Your single-threaded CPU-speed is similar to mine:
6700K: https://www.cpubenchmark.net/compare/In ... 2565vs2874
6850K: https://www.cpubenchmark.net/compare/In ... 2800vs2785

You can also add using a hash table. First: make sure your Zobrist calculations are always correct. Make a function to create the Zobrist-key from scratch and compare it with your incrementally updated key after each move. If you get no errors, your Zobrist-key would be correct. You can use this to create a hash-table. This can really boost perft speed with something like 85% or so I've got a thread about that as well, floating around in here; where H.G.Muller also participated and kindly explained some stuff with regard to hashing.

Welcome, good luck, and while you've had a good start, you can definitely go three times as fast, even without any tricks :)
This is very helpful, thankyou!

I like your approach of bencmarking the Perft based on what is useful for the final engine and not just trying to minimise the time. Was your final time of 78 seconds based on compiling for a 32 bit or 64 bit platform? I am compiling for 32 bit. If i compile for 64 bit then time reduces from ~190 seconds to ~110 seconds. As you mention we have similar speed processors so I will feel much better looking for 20 second speed improvement over 110 to be comparable to you.

It may be mentioned in the thread you posted, which I will start reading for more tips :D
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Perft speed and depth questions

Post by Joost Buijs »

Usually a few positions are not enough to check with perft if your movegen/movemake combo is working properly. I've seen several times that everything looks fine with the usual positions like the initial and kiwipete, but the movegen is still broken on others. There are databases freely available on the internet which contain hundreds of positions with perft numbers, I don't have them, but I know they exist.

Perft speed doesn't mean much, perft is so small that it completely runs from the cache, in a full fledged engine you have a totally different situation.

My perft numbers on the initial position at depth 7 are: ~166 mnps with bulk-counting and ~60 mnps without bulk counting, this is with the latest MSVC compiler on a 3.7 Ghz. AMD-TR3 with precision boost disabled.

Here are some positions I use to check perft, maybe they are useful:

Code: Select all

{
		// initial position
		{"rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1", 6, 119060324},
		//{"rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR b KQkq - 0 1", 6, 119060324},
		{"rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1", 7, 3195901860},
		//{"rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR b KQkq - 0 1", 7, 3195901860},
		{"rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1", 8, 84998978956},
		//{"rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR b KQkq - 0 1", 8, 84998978956},

		// KiwiPete
		{"r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 1", 4, 4085603},
		{"r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 1", 5, 193690690},

		// SJE
		{"rnb1kbnr/ppp1pppp/8/3p4/1P6/P2P3q/2P1PPP1/RNBQKBNR b KQkq - 0 4", 7, 44950307154},
		{"rnb1kbnr/pp1pp1pp/1qp2p2/8/Q1P5/N7/PP1PPPPP/1RB1KBNR b Kkq - 2 4", 7, 14794751816},

		// old positions
		{"1Q5Q/8/3p1p2/2RpkpR1/2PpppP1/2QPQPB1/8/4K3 b - - 0 1", 7, 290063345},
		{"1Q5Q/8/3p1p2/2RpkpR1/2PpppP1/2QPQPB1/8/4K3 b - - 0 1", 8, 17665826996},
		{"r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 1", 5, 193690690},
		{"8/PPP4k/8/8/8/8/4Kppp/8 w - - 0 1", 6, 34336777},
		{"8/2p5/3p4/KP5r/1R3p1k/8/4P1P1/8 w - - 0 1", 7, 178633661},
		{"r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 1", 6, 8031647685},
		{"r3r1k1/1pq2pp1/2p2n2/1PNn4/2QN2b1/6P1/3RPP2/2R3KB b - - 0 1", 6, 15097513050},
		{"8/p1p1p3/8/1P1P2k1/1K2p1p1/8/3P1P1P/8 w - - 0 1", 7, 118590233},

		// new positions
		{"r3k2r/8/8/8/3pPp2/8/8/R3K1RR b KQkq e3 0 1", 6, 485647607},
		{"r3k2r/Pppp1ppp/1b3nbN/nP6/BBP1P3/q4N2/Pp1P2PP/R2Q1RK1 w kq - 0 1", 6, 706045033},
		{"8/7p/p5pb/4k3/P1pPn3/8/P5PP/1rB2RK1 b - d3 0 28", 6, 38633283},
		{"8/3K4/2p5/p2b2r1/5k2/8/8/1q6 b - - 1 67", 7, 493407574},
		{"rnbqkb1r/ppppp1pp/7n/4Pp2/8/8/PPPP1PPP/RNBQKBNR w KQkq f6 0 3", 6, 244063299},
		{"8/p7/8/1P6/K1k3p1/6P1/7P/8 w - - 0 1", 8, 8103790},
		{"n1n5/PPPk4/8/8/8/8/4Kppp/5N1N b - - 0 1", 6, 71179139},
		{"r3k2r/p6p/8/B7/1pp1p3/3b4/P6P/R3K2R w KQkq - 0 1", 6, 77054993},
		{"8/5p2/8/2k3P1/p3K3/8/1P6/8 b - - 0 1", 8, 64451405},
		{"r3k2r/pb3p2/5npp/n2p4/1p1PPB2/6P1/P2N1PBP/R3K2R w KQkq - 0 1", 5, 29179893},
		{"r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 1", 4, 4085603},
		{"rnbqkbnr/ppN5/3ppppp/8/B3P3/8/PPPP1PPP/R1BQK1NR b KQkq - 0 1", 7, 1029536265},
		{"k7/8/8/8/8/8/3b1P2/4K3 w - - 0 1", 10, 1555125531},
		{"r3k2r/1bp2pP1/5n2/1P1Q4/1pPq4/5N2/1B1P2p1/R3K2R b KQkq c3 0 1", 6, 8419356881},
		{"rrrrkr1r/rr1rr3/8/8/8/8/8/6KR w k - 0 1", 7, 1941335854},

		// avoid illegal ep (thanks to Steve Maughan):
		{"3k4/3p4/8/K1P4r/8/8/8/8 b - - 0 1", 6, 1134888},
		{"8/8/8/8/k1p4R/8/3P4/3K4 w - - 0 1", 6, 1134888},

		// avoid illegal ep #2
		{"8/8/4k3/8/2p5/8/B2P2K1/8 w - - 0 1", 6, 1015133},
		{"8/b2p2k1/8/2P5/8/4K3/8/8 b - - 0 1", 6, 1015133},

		// en passant capture checks opponent:
		{"8/8/1k6/2b5/2pP4/8/5K2/8 b - d3 0 1", 6, 1440467},
		{"8/5k2/8/2Pp4/2B5/1K6/8/8 w - d6 0 1", 6, 1440467},

		// short castling gives check:
		{"5k2/8/8/8/8/8/8/4K2R w K - 0 1", 6, 661072},
		{"4k2r/8/8/8/8/8/8/5K2 b k - 0 1", 6, 661072},

		// long castling gives check:
		{"3k4/8/8/8/8/8/8/R3K3 w Q - 0 1", 6, 803711},
		{"r3k3/8/8/8/8/8/8/3K4 b q - 0 1", 6, 803711},

		// castling (including losing cr due to rook capture):
		{"r3k2r/1b4bq/8/8/8/8/7B/R3K2R w KQkq - 0 1", 4, 1274206},
		{"r3k2r/7b/8/8/8/8/1B4BQ/R3K2R b KQkq - 0 1", 4, 1274206},

		// castling prevented:
		{"r3k2r/8/3Q4/8/8/5q2/8/R3K2R b KQkq - 0 1", 4, 1720476},
		{"r3k2r/8/5Q2/8/8/3q4/8/R3K2R w KQkq - 0 1", 4, 1720476},

		// promote out of check:
		{"2K2r2/4P3/8/8/8/8/8/3k4 w - - 0 1", 6, 3821001},
		{"3K4/8/8/8/8/8/4p3/2k2R2 b - - 0 1", 6, 3821001},

		// discovered check:
		{"8/8/1P2K3/8/2n5/1q6/8/5k2 b - - 0 1", 5, 1004658},
		{"5K2/8/1Q6/2N5/8/1p2k3/8/8 w - - 0 1", 5, 1004658},

		// promote to give check:
		{"4k3/1P6/8/8/8/8/K7/8 w - - 0 1", 6, 217342},
		{"8/k7/8/8/8/8/1p6/4K3 b - - 0 1", 6, 217342},

		// underpromote to check:
		{"8/P1k5/K7/8/8/8/8/8 w - - 0 1", 6, 92683},
		{"8/8/8/8/8/k7/p1K5/8 b - - 0 1", 6, 92683},

		// self stalemate:
		{"K1k5/8/P7/8/8/8/8/8 w - - 0 1", 6, 2217},
		{"8/8/8/8/8/p7/8/k1K5 b - - 0 1", 6, 2217},

		// stalemate/checkmate:
		{"8/k1P5/8/1K6/8/8/8/8 w - - 0 1", 7, 567584},
		{"8/8/8/8/1k6/8/K1p5/8 b - - 0 1", 7, 567584},

		// double check:
		{"8/8/2k5/5q2/5n2/8/5K2/8 b - - 0 1", 4, 23527},
		{"8/5k2/8/5N2/5Q2/2K5/8/8 w - - 0 1", 4, 23527},

		// capture-to-square
		{"3k4/8/8/2Pp3r/2K5/8/8/8 w - d6 0 1", 2, 112},
		{"1RR4K/3P4/8/8/8/8/3p4/4rr1k w - - 0 1", 6, 419523239},
		{"1RR4K/3P4/8/8/8/8/3p4/4rr1k b - - 1 1", 6, 395340738},
		{"1RR5/7K/3P4/8/8/3p4/7k/4rr2 w - - 0 1", 6, 310492012},
		{"1RR5/7K/3P4/8/8/3p4/7k/4rr2 b - - 1 1", 6, 302653359},
}
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Perft speed and depth questions

Post by Sven »

odomobo wrote: Fri Jun 12, 2020 9:30 pm
ajaxuk wrote: Fri Jun 12, 2020 7:28 pm Ok, it is getting late and it is Friday, but I have pasted this in to my code and get pretty much the same results as my original Perft routine... I am obviously doing something silly... hopefully clarity will come in the morning.
Note that if you use "nodes" to calculate nodes per second, it will be basically the same. However, the bulk counting approach gives you 1 more ply nearly for free. You should find that perft at the same depth takes significantly less time.

Bulk counting is pointless anyhow, IMO. It's an artificial way to boost an artificial benchmark. Or another way to look at it, bulk counting is primarily measuring the speed of genMoves, and the direct approach is more heavily weighted toward makeMove/undoMove.
Perft is kind of a verification method (relative to the correctness of given numbers). Bulk counting simply saves time to perform it. I'd call it pointless to call it pointless to do so 😉

And it has less impact on the benchmark if that benchmark is based on nodes (number of genMoves()) than on leaves.
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Perft speed and depth questions

Post by mvanthoor »

ajaxuk wrote: Sat Jun 13, 2020 8:58 am This is very helpful, thankyou!

I like your approach of bencmarking the Perft based on what is useful for the final engine and not just trying to minimise the time. Was your final time of 78 seconds based on compiling for a 32 bit or 64 bit platform? I am compiling for 32 bit. If i compile for 64 bit then time reduces from ~190 seconds to ~110 seconds. As you mention we have similar speed processors so I will feel much better looking for 20 second speed improvement over 110 to be comparable to you.

It may be mentioned in the thread you posted, which I will start reading for more tips :D
I'm compiling for 64-bit, and I also set RUSTFLAGS = "-Ctarget-cpu=skylake", to use BMI2 instructions. (If you're using a C or C++ compiler, it probably has a CFLAGS or CXXLAGS environmental variable with a similar option.)

If I compile as a 32-bit program, speed drops by 45-50% as you have noticed already.

The mentioned thread turned out to be a small optimization competition between me (Rustic) and Terje (Weiss), as our engines (or more accurate at this point, move generators) are very similar internally. This is not surprising: we both watched BlueFever's chess engine tutorials on YouTube. Weiss is actually derived from the tutorial engine VICE and then converted to use magic bitboards. Rustic was written in Rust from scratch, using magic bitboards from the start, but still with the VICE tutorials as a basis. In the end, Weiss and Rustic turned out to be equally fast, running at +/- 40 million leaves/sec on my CPU, with Weiss a bit faster in this position, Rustic a bit faster in that position.

We just wanted to see how much we could optimize make_move, unmake_move, add_move. The underlying magic bitboard implementation is basically the same. Note that Rustic isn't done yet. It has been on hold for a month for several reasons (life happened...), but development is now slowly resuming. Weiss is already a very strong engine of over 2800 ELO.

I would be surprised if the entire optimization round gained that engine more than 10 or 20 ELO :) Note that speeding up an engine by a factor of 2 (EVERYTHING, not only move generation) gains you only 50 ELO or so.

Don't fret about a few seconds in perft. As mentioned, perft serves a few purposes:
- Make sure the move generator is correct
- Make sure make_move, unmake_move and add_move are as fast as possible
- Make sure the move generator is correct
- Run enough positions (search for "perftsuite.epd") to make sure all the numbers match up and are correct
- So you can make sure move generation is correct.
- Are you sick yet? :D

If you would profile your program (for example, using Intel VTune, or perf / oprof in Linux), you'd see that make_move, unmake_move, square_attacked and add_move are the most used functions. This is logical, as you are only generating moves, and then making and unmaking them. One move generation round gives you 20-30 moves, so you call make_move and make_move 20-30 times.

Later, the search will cut out huge swats of the tree. Evaluation and search will become the dominant function. Move generation (and everything around it, which you're testing with perft) will be 10% or less of the total runtime. This means that if you optimize your move generator to run faster, that extra speed will be utilized only 10% or less of the total runtime.

Your speed of 15 million leaves/sec is perfectly fine for finishing your first version of your engine. (I could have had mine finished a month ago if I hadn't spend 6 weeks optimizing perft.. but hey, I wanted to.) If you know how to profile, then find the bottlenecks and remove them if it's not too much work. If, after recompiling as 64-bit, your engine is already running perft 7 from the starting position at around 110 seconds, you're already running at close to 30 million leaves/sec if I'm not miscalculating. That is more than fast enough to write a very strong engine.

In short:
if you want to play the optimization game, go ahead and see if you can reach around 38-40 million leaves/sec on your CPU.
If you want to finish your engine earlier and already achieve around 30 million leaves/sec, stop optimizing until you finished it.

PS:
(Move generation could be sped up a few percent if I would also use PEXT, but that would entail changes in the move generator code. That is an optimization for later.)
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
ajaxuk
Posts: 13
Joined: Tue Jun 09, 2020 6:44 pm
Full name: Mark Buisseret

Re: Perft speed and depth questions

Post by ajaxuk »

mvanthoor wrote: Mon Jun 15, 2020 10:48 pm
I'm compiling for 64-bit, and I also set RUSTFLAGS = "-Ctarget-cpu=skylake", to use BMI2 instructions. (If you're using a C or C++ compiler, it probably has a CFLAGS or CXXLAGS environmental variable with a similar option.)

If I compile as a 32-bit program, speed drops by 45-50% as you have noticed already.

Don't fret about a few seconds in perft. As mentioned, perft serves a few purposes:
- Make sure the move generator is correct
- Make sure make_move, unmake_move and add_move are as fast as possible
- Make sure the move generator is correct
- Run enough positions (search for "perftsuite.epd") to make sure all the numbers match up and are correct
- So you can make sure move generation is correct.
- Are you sick yet? :D

If you would profile your program (for example, using Intel VTune, or perf / oprof in Linux), you'd see that make_move, unmake_move, square_attacked and add_move are the most used functions. This is logical, as you are only generating moves, and then making and unmaking them. One move generation round gives you 20-30 moves, so you call make_move and make_move 20-30 times.

In short:
if you want to play the optimization game, go ahead and see if you can reach around 38-40 million leaves/sec on your CPU.
If you want to finish your engine earlier and already achieve around 30 million leaves/sec, stop optimizing until you finished it.

PS:
(Move generation could be sped up a few percent if I would also use PEXT, but that would entail changes in the move generator code. That is an optimization for later.)
64-bit is definately the way to go, I am using VS2017 and set the optimisation flags to max performance, not sure if there is anything else I can do to squeeze out a few seconds. Although I am now reasonably happy with the speed and so far all my perftsuite tests have been successful.

I went for a fully legal generate move function and when I profile my Make and Unmake move functions they ustilise approx 45%-50% of the CPU time (Not Bulk counting). I have implemented Zobrist hashing and this has caused a fair amount of overhead but this also seems to work fine. The Key at the end of perft (6) is the same as the key at the beginning and I also checked by recalulating the key from ground-up, so to speak, for each move and comparing with the incremental change key.

Implementing a hash table for perft is my current headache. it's quicker, but with the wrong answer, so it almost works :) I know I don't need a hash table for perft but it's good programming practice for later! ...well thats what I keep telling myself. I saw your thread on this topic and will probably revisit to improve hash table performance once(if) I can get it working.

I think I may have been sick to even start this project given my lack of programming experience :D
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Perft speed and depth questions

Post by mvanthoor »

ajaxuk wrote: Tue Jun 16, 2020 3:53 pm I went for a fully legal generate move function and when I profile my Make and Unmake move functions they ustilise approx 45%-50% of the CPU time (Not Bulk counting). I have implemented Zobrist hashing and this has caused a fair amount of overhead but this also seems to work fine. The Key at the end of perft (6) is the same as the key at the beginning and I also checked by recalulating the key from ground-up, so to speak, for each move and comparing with the incremental change key.
In perft you're not going to notice, but in a chess game, doing the legality check in the move generator wastes time. Because of alpha/beta and other pruning techniques you're going to use, it's very well possible that you are checking legality for moves that are never considered (and thus never executed) by your search function.

It's correct if make_move and unmake_move take most of your time during perft. Obviously, you can strip the checking of the zobrist hash after you confirm that the incremental updating works correctly. In my case, I cheat a bit. make_move calls remove_piece() and put_piece() in my board object. These functions to Zobrist updates. Because unmake_move restores the entire previous game state at once when undoing a move, including the zobrist hashing, I have two local remove_piece() and put_piece() functions that omit the zobrist update.

I tried the same thing with the material count, but it didn't work. By putting those four bytes in the GameState, the struct took 25 bytes instead of 21. (If I remember the counting correctly.) Rust rounds these values up to the next multiple of 8, or 24 and 32 bytes. Therefore putting the material count into the GameState, I ended up copying 7 useless bytes back and forth, which was slower than just keeping the incremental updates.
Implementing a hash table for perft is my current headache. it's quicker, but with the wrong answer, so it almost works :) I know I don't need a hash table for perft but it's good programming practice for later! ...well thats what I keep telling myself. I saw your thread on this topic and will probably revisit to improve hash table performance once(if) I can get it working.
You don't _need_ a hash table for perft (or the chess program, even), but perft is a great way of checking that it works correctly. To create the hash table, you need the Zobrist hashing to work correctly, which you have already verified. After you finish the first version of the engine, the hash table is one of the first things you should add. (At least, that's what I'm going to do in mine.)
I think I may have been sick to even start this project given my lack of programming experience :D
In that case, writing a chess engine is a high bar to set for yourself as one of the first projects. Good luck.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Perft speed and depth questions

Post by bob »

mvanthoor wrote: Mon Jun 15, 2020 10:48 pm
ajaxuk wrote: Sat Jun 13, 2020 8:58 am This is very helpful, thankyou!

I like your approach of bencmarking the Perft based on what is useful for the final engine and not just trying to minimise the time. Was your final time of 78 seconds based on compiling for a 32 bit or 64 bit platform? I am compiling for 32 bit. If i compile for 64 bit then time reduces from ~190 seconds to ~110 seconds. As you mention we have similar speed processors so I will feel much better looking for 20 second speed improvement over 110 to be comparable to you.

It may be mentioned in the thread you posted, which I will start reading for more tips :D
I'm compiling for 64-bit, and I also set RUSTFLAGS = "-Ctarget-cpu=skylake", to use BMI2 instructions. (If you're using a C or C++ compiler, it probably has a CFLAGS or CXXLAGS environmental variable with a similar option.)

If I compile as a 32-bit program, speed drops by 45-50% as you have noticed already.

The mentioned thread turned out to be a small optimization competition between me (Rustic) and Terje (Weiss), as our engines (or more accurate at this point, move generators) are very similar internally. This is not surprising: we both watched BlueFever's chess engine tutorials on YouTube. Weiss is actually derived from the tutorial engine VICE and then converted to use magic bitboards. Rustic was written in Rust from scratch, using magic bitboards from the start, but still with the VICE tutorials as a basis. In the end, Weiss and Rustic turned out to be equally fast, running at +/- 40 million leaves/sec on my CPU, with Weiss a bit faster in this position, Rustic a bit faster in that position.

We just wanted to see how much we could optimize make_move, unmake_move, add_move. The underlying magic bitboard implementation is basically the same. Note that Rustic isn't done yet. It has been on hold for a month for several reasons (life happened...), but development is now slowly resuming. Weiss is already a very strong engine of over 2800 ELO.

I would be surprised if the entire optimization round gained that engine more than 10 or 20 ELO :) Note that speeding up an engine by a factor of 2 (EVERYTHING, not only move generation) gains you only 50 ELO or so.

Don't fret about a few seconds in perft. As mentioned, perft serves a few purposes:
- Make sure the move generator is correct
- Make sure make_move, unmake_move and add_move are as fast as possible
- Make sure the move generator is correct
- Run enough positions (search for "perftsuite.epd") to make sure all the numbers match up and are correct
- So you can make sure move generation is correct.
- Are you sick yet? :D

If you would profile your program (for example, using Intel VTune, or perf / oprof in Linux), you'd see that make_move, unmake_move, square_attacked and add_move are the most used functions. This is logical, as you are only generating moves, and then making and unmaking them. One move generation round gives you 20-30 moves, so you call make_move and make_move 20-30 times.

Later, the search will cut out huge swats of the tree. Evaluation and search will become the dominant function. Move generation (and everything around it, which you're testing with perft) will be 10% or less of the total runtime. This means that if you optimize your move generator to run faster, that extra speed will be utilized only 10% or less of the total runtime.

Your speed of 15 million leaves/sec is perfectly fine for finishing your first version of your engine. (I could have had mine finished a month ago if I hadn't spend 6 weeks optimizing perft.. but hey, I wanted to.) If you know how to profile, then find the bottlenecks and remove them if it's not too much work. If, after recompiling as 64-bit, your engine is already running perft 7 from the starting position at around 110 seconds, you're already running at close to 30 million leaves/sec if I'm not miscalculating. That is more than fast enough to write a very strong engine.

In short:
if you want to play the optimization game, go ahead and see if you can reach around 38-40 million leaves/sec on your CPU.
If you want to finish your engine earlier and already achieve around 30 million leaves/sec, stop optimizing until you finished it.

PS:
(Move generation could be sped up a few percent if I would also use PEXT, but that would entail changes in the move generator code. That is an optimization for later.)
Crafty does both. I usually use "perf" rather than perft:

White(1): perft 6
total moves=119060324 time=3.09

White(1): perf 6
generated 80000000 moves, time=0.40 seconds
generated 200103552 moves per second
generated/made/unmade 80000000 moves, time=1.59 seconds
generated/made/unmade 50367936 moves per second
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Perft speed and depth questions

Post by mvanthoor »

bob wrote: Tue Jun 16, 2020 9:32 pm Crafty does both. I usually use "perf" rather than perft:

White(1): perft 6
total moves=119060324 time=3.09

White(1): perf 6
generated 80000000 moves, time=0.40 seconds
generated 200103552 moves per second
generated/made/unmade 80000000 moves, time=1.59 seconds
generated/made/unmade 50367936 moves per second
Hi :) I've noticed Crafty has a very fast move generator; it runs Perft 7 in about 57 seconds on my system. You've stated in other posts around the forum (and the internet, for that matter) that you don't use things such as hash, bulk counting or other specific tricks for perft, and that you do full make/unmake. What did you do to make the move generation so fast?

I don't see things such as piece pin detection and check evasions "tricks", because they actually enhance the move generator. (I tried this with knight pins only, but stuff got slower... for now, at least.)
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Perft speed and depth questions

Post by bob »

I am not sure how pin detection and such enhances move generation since it is rare when you consider all pieces on the board. However, I DO have a legal move generator that is used whenever the side on move is in check, as that is more efficient than generating all pseudo-legal moves and then using a make/reject/unmake mechanism to dump the bad ones...

As far as speed goes, it is what it is. Crafty has always been about performance, without removing important parts of course. The move generator is pretty "lean and mean."
alessandro
Posts: 49
Joined: Tue Aug 12, 2014 11:21 am
Location: Lund
Full name: Alessandro Iavicoli

Re: Perft speed and depth questions

Post by alessandro »

As already pointed out, the first thing is ensure a correct implementation. Optimization and anything else comes after it. What will you implements is related to your final goal.

As a matter of comparison, AdaChess perft with bulk count has 21M nps on my Dell Xps 13 with Ubuntu - Intel® Core™ i7-8565U CPU @ 1.80GHz × 8
I consider it relatively fast: I mean that, to obtain a faster implementation, something has to be removed and/or changed. And this is definitely not my goal.
AdaChess uses a 10x12 array board representation, it generates legal moves and uses a "check-investigation" algorithm just because it is cool to implement it. Almost every design optimization possible is implemented in the move generation. AdaChess is the origin of those information about checks, discovery checks, and so on that you see in the chess programming wiki at the perft results page.
The current available implementation does not have the bulk count, therefore is quite slow. But the new one will have it and will collect those stats also for the divide command, in a pretty print, formatted output - because the appearance matter ;-)

Good luck!
--
AdaChess - Smart Chess Engine - https://github.com/adachess/AdaChess

Image