- Pigeon is a small new C++ codebase, released under the MIT license, and available on GitHub here
- It was written with SIMD execution (and future migration to GPU) in mind
- There is a lot of work still to be done, but it was posted in the spirit of "release early/release often", and in the hope of learning from the members of this fine forum
welcome here at Talkchess CCC! A question to Pigoen's board representation. I don't get the PositionT declaration (position.h). It has the classical 12 Bitmask (Bitboard) elements for the 12 piece kinds (beside other game state stuff) where u64 is appropriate but template type SIMD used. Why? Why not an array? How do those scalars map to elements of a 256 or even 512 bit AVX2 or AVX-512 vector type?
So a "normal" single position would be PositionT<u64>. Anywhere the code just says Position, that's what it means (there's a typedef in defs.h).
For calculating (say) legal moves on 4 independent positions at once using AVX2, you'd have a PositionT<simd4_avx2>, in which each field contained the values for all 4 positions. So mWhitePawns would be a vector of {white pawns for first position, white pawns for second position, etc...}
You'd then call CalcMoveMap() on that multi-position, all the calculations would happen in parallel across the 4 SIMD lanes, and the result would go into a MoveMapT<simd4_avx2>, which is structured the same way.
For that to work, the code in CalcMoveMap() has to be branch-free, because data-dependent branching would have to happen on each lane. That's why it's kind of goofy in some places: branching has been replaced with predication/masking.
It's not all there yet. :/ These things can run in SIMD so far:
- mapping legal moves
- applying a move and updating the position/state
- calculating position hashes for the TT
- flipping black/white (as everything is coded from the white POV)
- evaluating a position (you get back a vector of n scores for the n positions)
Unpacking a move map (which is bit masks) into a variable-length move list is still a scalar operation, and the main flow in Engine::NegaMax() is not SIMD-aware yet. I'm working on hooking it up now, and then we'll see how well things scale. Fingers crossed.
But how can this kind of SIMD parallelism with multiple positions utilized in a serial alpha-beta approach?
The SIMD approach I was playing with around used two xmm (or one ymm) registers to store a quad bitboad as board representation, to apply almost branchless monochrome (color flipping the board after make move like you) movegen à la DirGolem.
StuartRiffle wrote:Unpacking a move map (which is bit masks) into a variable-length move list is still a scalar operation, and the main flow in Engine::NegaMax() is not SIMD-aware yet. I'm working on hooking it up now, and then we'll see how well things scale. Fingers crossed.
(Does that make sense?)
I think there is no necessity to unpack moves into variable-length move list.
My attacks map (and simple transformation it to legal moves set) is a matrix of 8 16-byte vectors. Each byte represent a single piece. I need 8 bytes to represent a chess-position.
It is easy find or filter out any kind of moves without any move-lists and without any sorting.
That sounds cool. I would like to get rid of that step too.
Unpacking the move map is currently the part where I classify the moves (winning capture, losing capture, etc etc) for ordering them... do you have that kind of info baked into your bit representation somehow?
That sounds cool. I would like to get rid of that step too.
Unpacking the move map is currently the part where I classify the moves (winning capture, losing capture, etc etc) for ordering them... do you have that kind of info baked into your bit representation somehow?
I have 16 (per each chess piece of the side to move) move bitboards, 1 bit = 1 move. No need to order anything, just select moves in MVV/LVA order and clear the move bit if the move have been searched.