flok wrote: Mon Dec 10, 2018 3:00 pm
The move generator of stockfish is ~14x faster than the one of embla (compared via perft 6).
So I'm thinking there's some room for improvement in Embla.
Currently Embla implements an 8x8 array of pointers. Those pointers point to the 2 x 16 piece-objects. Those objects calculate the moves for their object-type and state. This is then stored in an array.
There are several alternative implementations that I know of. Bitboards, mailbox, 0x88, etc.
Question now is: which is faster?
Strictly for (pseudo legal) move generation. I would like to keep search & eval as they are (for now).

You are talking about three very different points:
1) board representation and its impact on overall engine speed,
2) move generation speed in the context of tree search,
3) perft speed.
Of course 1) influences 2) and 3). But, despite your remark "strictly for (...) move generation", 1) also influences search and eval. The amount of change required in search and eval code when switching to a different board representation depends on several factors, including the level of abstraction you have chosen to apply there. If your search and eval are written as "high level code", then you would not need many changes there to switch to bitboards. Basically, the implementation of loops over all pieces (or over a certain group, like all black knights) will differ between mailbox and bitboards. (By the way, in Jumbo I have implemented a "BoardIterator" class that serves as an abstraction of the board representation. It is one of several elements that allow me to support two Jumbo variants with two different board representations, bitboards and 0x88, while keeping the whole search and most of the eval code common to both variants.)
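To illustrate the idea of hiding the piece loop behind an abstraction, here is a minimal C++ sketch. It is not Jumbo's actual BoardIterator code; all names (`BitboardBoard`, `Board0x88`, `forEach`, `countCentral`) are made up for this example, and squares are assumed to be numbered 0..63 with a1 = 0. Both toy boards expose the same `forEach` member, so the "eval" function at the bottom compiles unchanged against either one.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical piece codes, just enough for the example.
enum Piece : std::uint8_t { Empty, WKnight, BKnight };

// Portable index of the lowest set bit (b must be non-zero).
inline int lowestSquare(std::uint64_t b) {
    int sq = 0;
    while (!(b & 1)) { b >>= 1; ++sq; }
    return sq;
}

// Bitboard variant: one 64-bit set per piece code.
struct BitboardBoard {
    std::array<std::uint64_t, 3> sets{};
    void put(Piece p, int sq64) { sets[p] |= 1ULL << sq64; }
    template <class F> void forEach(Piece p, F&& f) const {
        // b &= b - 1 clears the lowest set bit on each pass.
        for (std::uint64_t b = sets[p]; b; b &= b - 1) f(lowestSquare(b));
    }
};

// 0x88 variant: 128 cells, a cell is on the board iff (index & 0x88) == 0.
// A real engine would keep piece lists instead of scanning the whole board.
struct Board0x88 {
    std::array<Piece, 128> cells{};
    void put(Piece p, int sq64) { cells[(sq64 >> 3) * 16 + (sq64 & 7)] = p; }
    template <class F> void forEach(Piece p, F&& f) const {
        for (int sq = 0; sq < 128; ++sq) {
            if (sq & 0x88) continue;                        // off-board cell
            if (cells[sq] == p) f((sq >> 4) * 8 + (sq & 7)); // back to 0..63
        }
    }
};

// "High level" eval term, common to both representations:
// counts pieces of kind p on the 16 central squares (files c-f, ranks 3-6).
template <class Board>
int countCentral(const Board& board, Piece p) {
    int n = 0;
    board.forEach(p, [&](int sq) {
        int file = sq & 7, rank = sq >> 3;
        if (file >= 2 && file <= 5 && rank >= 2 && rank <= 5) ++n;
    });
    return n;
}
```

With a knight on d4 (square 27) and one on b1 (square 1), `countCentral` gives 1 for both board types. Only `forEach` knows how the pieces are stored.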
Many programmers, basically all but one, agree that bitboards are currently the first choice of board representation regarding overall engine speed. For 2) and 3), as two very special "disciplines", I think it is possible that mailbox-based representations like 0x88, or attack-map based ones, can result in a faster movegen + perft implementation if they are used in a clever way. But no chess engine is "only generating moves", and the speed of evaluation is much more important than the speed of movegen.
Other points that have already been mentioned in other replies are:
- a proper implementation of legality checking and
- the use of "bulk counting" for perft.
Comparing the perft speed of your engine to that of SF or other top engines should be postponed until at least these two have been achieved; otherwise you are making the well-known apples-to-oranges comparison.
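For readers who have not seen "bulk counting": at depth 1 the perft function returns the number of legal moves directly instead of making and unmaking each of them. This only works if the generator produces strictly legal moves, which is why the two points above belong together. Below is a minimal C++ sketch. A full chess move generator does not fit here, so a toy game stands in for chess (a pile of tokens, a move removes 1, 2 or 3 of them). `ToyPosition`, `g_makeCalls`, `perftPlain` and `perftBulk` are names invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

std::uint64_t g_makeCalls = 0;  // counts make() calls, for comparison only

// Toy stand-in for a chess position: a move takes 1..3 tokens from the pile.
struct ToyPosition {
    int pile;
    std::vector<int> legalMoves() const {
        std::vector<int> moves;
        for (int k = 1; k <= 3 && k <= pile; ++k) moves.push_back(k);
        return moves;
    }
    void make(int k)   { pile -= k; ++g_makeCalls; }
    void unmake(int k) { pile += k; }
};

// Plain perft: walks down to depth 0 and counts every leaf individually.
std::uint64_t perftPlain(ToyPosition& pos, int depth) {
    if (depth == 0) return 1;
    std::uint64_t nodes = 0;
    for (int m : pos.legalMoves()) {
        pos.make(m);
        nodes += perftPlain(pos, depth - 1);
        pos.unmake(m);
    }
    return nodes;
}

// Perft with bulk counting: at depth 1 the move count is the leaf count,
// so the last ply of make/unmake calls is skipped entirely.
std::uint64_t perftBulk(ToyPosition& pos, int depth) {
    if (depth == 0) return 1;
    std::vector<int> moves = pos.legalMoves();
    if (depth == 1) return static_cast<std::uint64_t>(moves.size());
    std::uint64_t nodes = 0;
    for (int m : moves) {
        pos.make(m);
        nodes += perftBulk(pos, depth - 1);
        pos.unmake(m);
    }
    return nodes;
}
```

Both functions return the same node count, but for a pile of 10 and depth 3 the plain version calls `make` 39 times and the bulk version only 12 times. In chess the last ply holds the vast majority of all nodes, so an engine with bulk counting reports a much higher "perft speed" than one without, even when the move generators are equally fast.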