New open source engine - Pigeon

StuartRiffle · Post by **StuartRiffle** » Thu Apr 07, 2016 9:40 pm

Hi all.

I already posted an announcement with some details about the engine in the general topics forum, so I will keep this one short and stick to the facts.

- Pigeon is a small new C++ codebase, released under the MIT license, and available on GitHub here
- It was written with SIMD execution (and future migration to GPU) in mind
- There is a lot of work still to be done, but it was posted in the spirit of "release early/release often", and in the hope of learning from the members of this fine forum

I'd be happy to hear any feedback!

Cheers,
-Stuart

Code: Select all

    /O_
    ||
   / \\
 =/__//
    ^^

(that's a pigeon)

Gerd Isenberg · Post by **Gerd Isenberg** » Thu Apr 07, 2016 10:26 pm

Hi Stuart,

welcome here at Talkchess CCC! A question to Pigoen's board representation. I don't get the PositionT declaration (position.h). It has the classical 12 Bitmask (Bitboard) elements for the 12 piece kinds (beside other game state stuff) where u64 is appropriate but template type SIMD used. Why? Why not an array? How do those scalars map to elements of a 256 or even 512 bit AVX2 or AVX-512 vector type?

Thanks,
Gerd

StuartRiffle · Post by **StuartRiffle** » Thu Apr 07, 2016 11:20 pm

Hi Gerd, thank you!

So a "normal" single position would be PositionT<u64>. Anywhere the code just says Position, that's what it means (there's a typedef in defs.h).

For calculating (say) legal moves on 4 independent positions at once using AVX2, you'd have a PositionT<simd4_avx2>, in which each field contained the values for all 4 positions. So mWhitePawns would be a vector of {white pawns for first position, white pawns for second position, etc...}

You'd then call CalcMoveMap() on that multi-position, all the calculations would happen in parallel across the 4 SIMD lanes, and the result would go into a MoveMapT<simd4_avx2>, which is structured the same way.

For that to work, the code in CalcMoveMap() has to be branch-free, because data-dependent branching would have to happen on each lane. That's why it's kind of goofy in some places: branching has been replaced with predication/masking.

It's not all there yet. :/ These things can run in SIMD so far:
- mapping legal moves
- applying a move and updating the position/state
- calculating position hashes for the TT
- flipping black/white (as everything is coded from the white POV)
- evaluating a position (you get back a vector of n scores for the n positions)

Unpacking a move map (which is bit masks) into a variable-length move list is still a scalar operation, and the main flow in Engine::NegaMax() is not SIMD-aware yet. I'm working on hooking it up now, and then we'll see how well things scale. Fingers crossed.

(Does that make sense?)

smatovic · Post by **smatovic** » Fri Apr 08, 2016 4:17 pm

Hi Stuart,

thanks for sharing and good luck with your engine....

...sometime someone has to get this gpu thingy finally running

--
Srdja

Gerd Isenberg · Post by **Gerd Isenberg** » Fri Apr 08, 2016 5:04 pm

But how can this kind of SIMD parallelism with multiple positions utilized in a serial alpha-beta approach?

The SIMD approach I was playing with around used two xmm (or one ymm) registers to store a quad bitboad as board representation, to apply almost branchless monochrome (color flipping the board after make move like you) movegen à la DirGolem.

StuartRiffle · Post by **StuartRiffle** » Mon Apr 11, 2016 2:07 am

Hi Gerd,

I finally got it running, and the answer to "how can it work" is "not as well as I had hoped"

It does provide a slight net speed improvement though. I'll take what I can get! Details are in a new thread here.

Cheers,

Aleks Peshkov · Post by **Aleks Peshkov** » Mon Apr 11, 2016 11:05 am

StuartRiffle wrote:Unpacking a move map (which is bit masks) into a variable-length move list is still a scalar operation, and the main flow in Engine::NegaMax() is not SIMD-aware yet. I'm working on hooking it up now, and then we'll see how well things scale. Fingers crossed.

(Does that make sense?)

I think there is no necessity to unpack moves into variable-length move list.

My attacks map (and simple transformation it to legal moves set) is a matrix of 8 16-byte vectors. Each byte represent a single piece. I need 8 bytes to represent a chess-position.

It is easy find or filter out any kind of moves without any move-lists and without any sorting.

StuartRiffle · Post by **StuartRiffle** » Mon Apr 11, 2016 3:39 pm

Hi Aleks,

That sounds cool. I would like to get rid of that step too.

Unpacking the move map is currently the part where I classify the moves (winning capture, losing capture, etc etc) for ordering them... do you have that kind of info baked into your bit representation somehow?

Aleks Peshkov · Post by **Aleks Peshkov** » Mon Apr 11, 2016 4:14 pm

StuartRiffle wrote:Hi Aleks,

That sounds cool. I would like to get rid of that step too.

Unpacking the move map is currently the part where I classify the moves (winning capture, losing capture, etc etc) for ordering them... do you have that kind of info baked into your bit representation somehow?

I have 16 (per each chess piece of the side to move) move bitboards, 1 bit = 1 move. No need to order anything, just select moves in MVV/LVA order and clear the move bit if the move have been searched.

New open source engine - Pigeon

New open source engine - Pigeon

Re: New open source engine - Pigeon

Re: New open source engine - Pigeon

Re: New open source engine - Pigeon

Re: New open source engine - Pigeon

Re: New open source engine - Pigeon

Re: New open source engine - Pigeon

Re: New open source engine - Pigeon

Re: New open source engine - Pigeon