I'm writing a chess engine myself, and I did some perft optimizations a month or two ago:
In the very first version, my engine ran at 10 Mnps (i7 6700K), so your 14 Mnps is not bad to start out with, but you have some optimizing to do. In the thread above, you can follow me and Terje optimizing our respective engines to the point where they give similar results on my machines; the engines are so close (+/- 2 seconds depending on the position) that any difference is basically down to compiler and linker differences. (Terje's engine Weiss is written in C; my engine Rustic is being written in Rust.)
We optimized without any 'tricks', and we don't strip out anything we actually need for playing chess.
- no hashing
- no bulk counting
- no specialties in the move generator, such as omitting pinned pieces or a special in-check move generator. (That can be added later.)
No stripping (this is stuff which isn't needed for perft, but is needed for chess playing):
- keep material score for both sides
- incrementally calculate the Zobrist hash (later used for hash/transposition tables)
- keep the 50-move rule counter
- keep the full move number
Just plain old perft, that counts down to the leaves like you are doing now.
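As a rough sketch of what "plain old perft" looks like in Rust: the `Position` trait and the `Toy` game below are hypothetical stand-ins, not code from Rustic or Weiss. The sketch assumes the legality check lives inside `make_move`, which simply refuses a move that would leave the mover's king attacked.

```rust
/// Hypothetical interface; a real engine exposes its own board type.
pub trait Position {
    type Move: Copy;
    /// Pseudo-legal moves for the side to move.
    fn generate_moves(&self) -> Vec<Self::Move>;
    /// Plays `m`. Returns false and leaves the position untouched if the
    /// move is illegal (in chess: it leaves the mover's king attacked).
    fn make_move(&mut self, m: Self::Move) -> bool;
    /// Takes back the last move that was successfully made.
    fn unmake_move(&mut self);
}

/// Plain perft: walk the tree all the way down and count every leaf.
pub fn perft<P: Position>(pos: &mut P, depth: u32) -> u64 {
    if depth == 0 {
        return 1;
    }
    let mut leaves = 0;
    for m in pos.generate_moves() {
        if pos.make_move(m) {
            leaves += perft(pos, depth - 1);
            pos.unmake_move();
        }
    }
    leaves
}

/// Toy stand-in for a chess position so the sketch compiles on its own:
/// a move takes 1, 2 or 3 from a counter and is illegal if it overdraws.
pub struct Toy {
    pub remaining: u32,
    pub history: Vec<u32>,
}

impl Position for Toy {
    type Move = u32;

    fn generate_moves(&self) -> Vec<u32> {
        vec![1, 2, 3]
    }

    fn make_move(&mut self, m: u32) -> bool {
        if m > self.remaining {
            return false;
        }
        self.remaining -= m;
        self.history.push(m);
        true
    }

    fn unmake_move(&mut self) {
        if let Some(m) = self.history.pop() {
            self.remaining += m;
        }
    }
}
```

Calling `perft(&mut Toy { remaining: 4, history: Vec::new() }, 2)` walks the toy tree two plies deep. A real engine would most likely use a fixed-size move list instead of allocating a `Vec` at every node.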
On my machine, we ended up at +/- 78 seconds for perft 7 in the starting position and +/- 200 seconds in the "kiwipete" position, which is about +/- 40 million leaves/sec.
Your single-threaded CPU-speed is similar to mine:
6700K: https://www.cpubenchmark.net/compare/In ... 2565vs2874
6850K: https://www.cpubenchmark.net/compare/In ... 2800vs2785
Your CPU is a smidge slower (+/- 3-4%) than mine in single-threaded work, because of the higher core clock of the 6700K. Still, 38 million leaves/sec should be achievable for your engine on your CPU. As long as you haven't reached at least 30 million leaves/sec (that is my engine's speed when compiled in debug mode with opt-level 2), you can still go faster without any tricks. FabianVDW (FabChess, written in Rust) seems to confirm the numbers with his 35 million leaves/sec. I am assuming his CPU is comparable to a 6700K.
Then you could add bulk counting if you wish, but it requires a fully legal move generator. Good for perft, bad for chess playing. (It requires moving the is_king_attacked() call from make_move to the move generator. You can use if-statements to decide where to execute this function, depending on whether you are running perft or playing chess.) Personally, I wouldn't be concerned with bulk counting, because it adds nothing to your engine. It's just that: a perft counting trick.
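For illustration, bulk counting could look roughly like this in Rust. The `LegalPosition` trait and the `LegalToy` game are made-up names for this sketch; the only assumption that matters is that the generator returns strictly legal moves, so the move count at depth 1 equals the leaf count.

```rust
/// Hypothetical interface with a fully legal move generator.
pub trait LegalPosition {
    type Move: Copy;
    /// Only legal moves: the king-attacked test happens in here,
    /// not in make_move.
    fn generate_legal_moves(&self) -> Vec<Self::Move>;
    fn make_move(&mut self, m: Self::Move);
    fn unmake_move(&mut self);
}

pub fn perft_bulk<P: LegalPosition>(pos: &mut P, depth: u32) -> u64 {
    if depth == 0 {
        return 1;
    }
    let moves = pos.generate_legal_moves();
    if depth == 1 {
        // Bulk counting: every legal move is exactly one leaf,
        // so the last ply is never made or unmade.
        return moves.len() as u64;
    }
    let mut leaves = 0;
    for m in moves {
        pos.make_move(m);
        leaves += perft_bulk(pos, depth - 1);
        pos.unmake_move();
    }
    leaves
}

/// Toy stand-in for a chess position: a move takes 1, 2 or 3 from a
/// counter, and only moves that do not overdraw are generated.
pub struct LegalToy {
    pub remaining: u32,
    pub history: Vec<u32>,
}

impl LegalPosition for LegalToy {
    type Move = u32;

    fn generate_legal_moves(&self) -> Vec<u32> {
        (1..=3).filter(|&m| m <= self.remaining).collect()
    }

    fn make_move(&mut self, m: u32) {
        self.remaining -= m;
        self.history.push(m);
    }

    fn unmake_move(&mut self) {
        if let Some(m) = self.history.pop() {
            self.remaining += m;
        }
    }
}
```

The saving comes from skipping make/unmake on the deepest ply, which is where most of the nodes are.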
You can also add a hash table. First, make sure your Zobrist calculations are always correct: write a function that creates the Zobrist key from scratch and compare its result with your incrementally updated key after each move. If you get no mismatches, your incremental key is very likely correct, and you can use it to build a hash table. This can really boost perft speed, by something like 85% or so. I've got a thread about that as well, floating around in here, where H.G.Muller also participated and kindly explained some stuff with regard to hashing.
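The from-scratch check can be sketched like this in Rust. Everything here is a simplified assumption for illustration: pieces are plain indices 0..12, squares are 0..64, castling rights and en passant are left out of the key, and the random keys come from a small SplitMix64 generator rather than whatever your engine uses.

```rust
/// SplitMix64, used here only to fill the key tables deterministically.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E3779B97F4A7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
    z ^ (z >> 31)
}

pub struct Zobrist {
    piece_square: [[u64; 64]; 12],
    side_to_move: u64,
}

impl Zobrist {
    pub fn new(seed: u64) -> Self {
        let mut state = seed;
        let mut piece_square = [[0u64; 64]; 12];
        for piece in piece_square.iter_mut() {
            for key in piece.iter_mut() {
                *key = splitmix64(&mut state);
            }
        }
        Zobrist { piece_square, side_to_move: splitmix64(&mut state) }
    }
}

/// Simplified board: no castling rights or en passant in the key.
pub struct Board {
    squares: [Option<usize>; 64],
    white_to_move: bool,
    pub key: u64,
}

impl Board {
    /// `pieces` is a list of (piece index, square index) pairs.
    pub fn new(z: &Zobrist, pieces: &[(usize, usize)]) -> Self {
        let mut board = Board { squares: [None; 64], white_to_move: true, key: 0 };
        for &(piece, square) in pieces {
            board.squares[square] = Some(piece);
        }
        board.key = board.key_from_scratch(z);
        board
    }

    /// Slow but simple: rebuild the key by looking at every square.
    pub fn key_from_scratch(&self, z: &Zobrist) -> u64 {
        let mut key = 0;
        for (square, piece) in self.squares.iter().enumerate() {
            if let Some(p) = piece {
                key ^= z.piece_square[*p][square];
            }
        }
        if !self.white_to_move {
            key ^= z.side_to_move;
        }
        key
    }

    /// Moves a piece (capturing whatever is on `to`) and updates the key
    /// incrementally, then cross-checks it in debug builds.
    pub fn make_move(&mut self, z: &Zobrist, from: usize, to: usize) {
        let piece = self.squares[from].take().expect("no piece on from-square");
        self.key ^= z.piece_square[piece][from];
        if let Some(captured) = self.squares[to] {
            self.key ^= z.piece_square[captured][to];
        }
        self.squares[to] = Some(piece);
        self.key ^= z.piece_square[piece][to];
        self.white_to_move = !self.white_to_move;
        self.key ^= z.side_to_move;
        debug_assert_eq!(self.key, self.key_from_scratch(z));
    }
}
```

Because the comparison uses `debug_assert_eq!`, it runs on every move in debug builds and costs nothing in release builds. Running a deep perft in debug mode is then a cheap way to exercise the incremental updates over a very large number of moves.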
While hashing could also be seen as a "trick", it does add something to your engine:
- A well-tested Zobrist module.
- A working hash table, which can be re-used/extended/redone to become the transposition table that actually speeds up the engine during play.
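A perft hash table can be very small to start with. The sketch below is one possible layout in Rust, not the one from any particular engine: `PerftEntry` and `PerftTable` are hypothetical names, there is one entry per slot, and new results always overwrite old ones.

```rust
/// One stored result: "from the position with this key, perft(depth)
/// found this many leaves".
#[derive(Clone, Copy)]
pub struct PerftEntry {
    key: u64,
    depth: u8,
    leaves: u64,
}

pub struct PerftTable {
    entries: Vec<Option<PerftEntry>>,
}

impl PerftTable {
    /// `size` is the number of slots, not bytes.
    pub fn new(size: usize) -> Self {
        assert!(size > 0, "table needs at least one slot");
        PerftTable { entries: vec![None; size] }
    }

    fn index(&self, key: u64) -> usize {
        (key % self.entries.len() as u64) as usize
    }

    /// Returns the stored leaf count only if both the full key and the
    /// depth match; a different position in the same slot is a miss.
    pub fn probe(&self, key: u64, depth: u8) -> Option<u64> {
        match self.entries[self.index(key)] {
            Some(e) if e.key == key && e.depth == depth => Some(e.leaves),
            _ => None,
        }
    }

    /// Always-replace scheme: whatever was in the slot is overwritten.
    pub fn store(&mut self, key: u64, depth: u8, leaves: u64) {
        let i = self.index(key);
        self.entries[i] = Some(PerftEntry { key, depth, leaves });
    }
}
```

In perft, you would call `probe` with the position's Zobrist key and the remaining depth before generating moves, return the stored count on a hit, and `store` the result after counting on a miss. Storing the depth matters, because the same position reached at a different remaining depth has a different leaf count.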
Welcome, and good luck. You've had a good start, but you can definitely go three times as fast, even without any tricks.