New Engine announcement: Pygmalion

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

klx
Posts: 179
Joined: Tue Jun 15, 2021 8:11 pm
Full name: Emanuel Torres

Re: New Engine announcement: Pygmalion

Post by klx »

R. Tomasi wrote: Thu Sep 02, 2021 9:28 pm For lengths up to 32 it will use parallel sorting networks ...
What does your code look like that chooses the sorting network function to call?
[Moderation warning] This signature violated the rule against commercial exhortations.
smatovic
Posts: 2645
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: New Engine announcement: Pygmalion

Post by smatovic »

R. Tomasi wrote: Thu Sep 02, 2021 3:00 pm ...
Over the last years the "engine" was rewritten several times and ported to C++. I even played around with an OpenCL version in the hopes of harnessing the power of modern GPUs, but (as you may already guess) failed miserably at efficiently parallelizing the search in a way that fits these vector-based architectures of the GPUs. About two years ago I had a version that was playing reasonably strong (it consistently beat the FairyMax version bundled with Winboard at the time), but it was buggy, full of search instabilities and generally not anything I would show around without being ashamed of my work.
...
Curious, there are not that many OpenCL engines out there. What kind of design did you choose? Did you couple multiple gpu-threads to one worker? What kind of move generation?

--
Srdja
R. Tomasi
Posts: 307
Joined: Wed Sep 01, 2021 4:08 pm
Location: Germany
Full name: Roland Tomasi

Re: New Engine announcement: Pygmalion

Post by R. Tomasi »

klx wrote: Sun Sep 05, 2021 11:14 am
R. Tomasi wrote: Thu Sep 02, 2021 9:28 pm For lengths up to 32 it will use parallel sorting networks ...
What does your code look like that chooses the sorting network function to call?

Code: Select all

template<typename VALUE, typename SCORE>
class sortAlgorithm
{
private:
/* 
	[...]
*/
	typedef void SORTFUNCTION(VALUE* pValues, SCORE* pScores);
	constexpr static inline SORTFUNCTION* m_Sort[31]
	{
		&sort_N2,
		&sort_N3,
		&sort_N4,
		&sort_N5,
		&sort_N6,
		&sort_N7,
		&sort_N8,
		&sort_N9,
		&sort_N10,
		&sort_N11,
		&sort_N12,
		&sort_N13,
		&sort_N14,
		&sort_N15,
		&sort_N16,
		&sort_N17,
		&sort_N18,
		&sort_N19,
		&sort_N20,
		&sort_N21,
		&sort_N22,
		&sort_N23,
		&sort_N24,
		&sort_N25,
		&sort_N26,
		&sort_N27,
		&sort_N28,
		&sort_N29,
		&sort_N30,
		&sort_N31,
		&sort_N32
	};
	constexpr static size_t sort_tail() noexcept
	{
		return 32;
	}
public:
	constexpr static void sortValues(VALUE* pValues, SCORE* pScores, const size_t length) noexcept
	{
		if (length > 1)
		{
			if (length <= sort_tail())
			{
				m_Sort[length - 2](pValues, pScores);
			}
			else
			{
				quickSort(0, length - 1, pValues, pScores);
			}
		}
	}
}
R. Tomasi
Posts: 307
Joined: Wed Sep 01, 2021 4:08 pm
Location: Germany
Full name: Roland Tomasi

Re: New Engine announcement: Pygmalion

Post by R. Tomasi »

smatovic wrote: Sun Sep 05, 2021 11:55 am
R. Tomasi wrote: Thu Sep 02, 2021 3:00 pm ...
Over the last years the "engine" was rewritten several times and ported to C++. I even played around with an OpenCL version in the hopes of harnessing the power of modern GPUs, but (as you may already guess) failed miserably at efficiently parallelizing the search in a way that fits these vector-based architectures of the GPUs. About two years ago I had a version that was playing reasonably strong (it consistently beat the FairyMax version bundled with Winboard at the time), but it was buggy, full of search instabilities and generally not anything I would show around without being ashamed of my work.
...
Curious, there are not that many OpenCL engines out there. What kind of design did you choose? Did you couple multiple gpu-threads to one worker? What kind of move generation?

--
Srdja
My Idea was to break down the processing of a node into different phases (prologue, make move, returned from move, loop next move, epilogue) and have a kernel that can process such a phase. Then in essence process all moves of a node in a vector-parallel fashion using these kernels. Trouble is, that depending on what phase the kernel is in it will take different time to execute it, such that all moves always wait for the one that takes the longest to process. Also, the different threads branch differently, which is a huge performance hit on a GPU (you typically want neighbouring workers to branch the same way).
Concerning the move generation: well, I simply ported my C++ code to C99 and replaced anything recursive by equivalent iterations (since you cannot do recursion on the GPU).
It simply did not work out the way I hoped. Although it did play chess, it became relatively quickly clear that this design is totally inefficient.

Edit: I may have the code lying around somewhere. If you're interested I can try to dig it out, but I don't want to promise you that I still have it - I may have lost that code. I'm not sure.
smatovic
Posts: 2645
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: New Engine announcement: Pygmalion

Post by smatovic »

R. Tomasi wrote: Sun Sep 05, 2021 12:05 pm My Idea was to break down the processing of a node into different phases (prologue, make move, returned from move, loop next move, epilogue) and have a kernel that can process such a phase. Then in essence process all moves of a node in a vector-parallel fashion using these kernels. Trouble is, that depending on what phase the kernel is in it will take different time to execute it, such that all moves always wait for the one that takes the longest to process. Also, the different threads branch differently, which is a huge performance hit on a GPU (you typically want neighbouring workers to branch the same way).
Concerning the move generation: well, I simply ported my C++ code to C99 and replaced anything recursive by equivalent iterations (since you cannot do recursion on the GPU).
It simply did not work out the way I hoped. Although it did play chess, it became relatively quickly clear that this design is totally inefficient.

Edit: I may have the code lying around somewhere. If you're interested I can try to dig it out, but I don't want to promise you that I still have it - I may have lost that code. I'm not sure.
Not sure, maybe I had a similar idea, I called it SPPS, simple parallel processing scheme, couple one work-group, 128 threads, to one worker, process up to 128 moves of a node in parallel, store the moves as tree in a way that you can process the moves in parallel. SPPS was able to play chess, but IIRC I had to realize that I am wasting most of the threads during Quiescence-Search and such an approach can not work out...

--
Srdja
R. Tomasi
Posts: 307
Joined: Wed Sep 01, 2021 4:08 pm
Location: Germany
Full name: Roland Tomasi

Re: New Engine announcement: Pygmalion

Post by R. Tomasi »

smatovic wrote: Sun Sep 05, 2021 12:31 pm
R. Tomasi wrote: Sun Sep 05, 2021 12:05 pm My Idea was to break down the processing of a node into different phases (prologue, make move, returned from move, loop next move, epilogue) and have a kernel that can process such a phase. Then in essence process all moves of a node in a vector-parallel fashion using these kernels. Trouble is, that depending on what phase the kernel is in it will take different time to execute it, such that all moves always wait for the one that takes the longest to process. Also, the different threads branch differently, which is a huge performance hit on a GPU (you typically want neighbouring workers to branch the same way).
Concerning the move generation: well, I simply ported my C++ code to C99 and replaced anything recursive by equivalent iterations (since you cannot do recursion on the GPU).
It simply did not work out the way I hoped. Although it did play chess, it became relatively quickly clear that this design is totally inefficient.

Edit: I may have the code lying around somewhere. If you're interested I can try to dig it out, but I don't want to promise you that I still have it - I may have lost that code. I'm not sure.
Not sure, maybe I had a similar idea, I called it SPPS, simple parallel processing scheme, couple one work-group, 128 threads, to one worker, process up to 128 moves of a node in parallel, store the moves as tree in a way that you can process the moves in parallel. SPPS was able to play chess, but IIRC I had to realize that I am wasting most of the threads during Quiescence-Search and such an approach can not work out...

--
Srdja
Sounds like a similar concept, yes. Anyways, I dumped that idea relatively quickly. I think there still are ways to use the GPU in a chess engine, but imho it would have to be some sort of hybrid-aproach.
R. Tomasi
Posts: 307
Joined: Wed Sep 01, 2021 4:08 pm
Location: Germany
Full name: Roland Tomasi

Re: New Engine announcement: Pygmalion

Post by R. Tomasi »

pygmalion-mechanics

This module provides the framework for defining moves and how they are supposed to be made/unmade. It builds on top of pygmalion-state. There is no move generation or legality checks going on, that's all in the pygmalion-dynamics module which builds on top of pygmalion-mechanics. The mechanics module is first and foremost responsible for describing what exactly a certain movetype is supposed to do with the board. It also provides everything needed to keep the move history of a game and bloomfilters for repetition checks.

As already done with the pygmalion-state module it is configured using a description template:

Code: Select all

template<typename MOVE, size_t COUNT_BITS_BLOOMFILTER, size_t COUNT_VALUES_BLOOMFILTER>
class descriptor_mechanics;
The template arguments have the following semantics:
  • COUNT_BITS_BLOOMFILTER defines how many bits of a position hash should be used for the repetition bloomfilter.
  • COUNT_VALUES_BLOOMFILTER defines the range of the individual entries in the bloomfilter. Essentially this should be larger than the maximum number of moves a game can have, which obviously depends on the rules of the boardgame that is targetted.
  • MOVE is the class that defines a move of the game. It must inherit from the move template/class that is also defined in the pygmalion-mechanics module. Afaik there is no elegant way in C++17 to enforce template arguments inheriting a certain base class. I think C++20 provides ways to do that, but I have not really looked into that, so don't take me at face-value here.
That base template/class for all movetypes is defined as follows:

Code: Select all

template<typename BOARD, size_t COUNT_BITS, typename MOVEDATA, typename INSTANCE>
class move
{
public:
	using instanceType = INSTANCE;
	using boardType = BOARD;
	using descriptorState = typename boardType::descriptorState;
#include <pygmalion-state/include_state.h>
	constexpr static const size_t countBits{ COUNT_BITS };
	using movebitsType = uint_t<countBits, false>;
	using movedataType = MOVEDATA;
protected:
	constexpr move() noexcept = default;
	constexpr move(move&&) noexcept = default;
	constexpr move(const move&) noexcept = default;
	constexpr move& operator=(move&&) noexcept = default;
	constexpr move& operator=(const move&) noexcept = default;
	~move() noexcept = default;
public:
	std::string name() const noexcept
	{
		return static_cast<const instanceType*>(this)->name_Implementation();
	}
	constexpr movedataType doMove(boardType& position, const movebitsType& moveBits) const noexcept
	{
		return static_cast<const instanceType*>(this)->doMove_Implementation(position, moveBits);
	}
	constexpr void undoMove(boardType& position, const movedataType& data) const noexcept
	{
		static_cast<const instanceType*>(this)->undoMove_Implementation(position, data);
	}
	bool parse(const boardType& position, std::string& text, movebitsType& moveBits) const noexcept
	{
		return static_cast<const instanceType*>(this)->parse_Implementation(position, text, moveBits);
	}
	std::string toString(const boardType& position, const movebitsType& moveBits) const noexcept
	{
		return static_cast<const instanceType*>(this)->toString_Implementation(position, moveBits);
	}
	constexpr squaresType otherOccupancyDelta(const boardType& position, const movebitsType& moveBits) const noexcept
	{
		return static_cast<const instanceType*>(this)->otherOccupancyDelta_Implementation(position, moveBits);
	}
	constexpr squaresType ownOccupancyDelta(const boardType& position, const movebitsType& moveBits) const noexcept
	{
		return static_cast<const instanceType*>(this)->ownOccupancyDelta_Implementation(position, moveBits);
	}
	constexpr squareType fromSquare(const boardType& position, const movebitsType& moveBits) const noexcept
	{
		return static_cast<const instanceType*>(this)->fromSquare_Implementation(position, moveBits);
	}
	constexpr squareType toSquare(const boardType& position, const movebitsType& moveBits) const noexcept
	{
		return static_cast<const instanceType*>(this)->toSquare_Implementation(position, moveBits);
	}
};
Again it's using the CRTP for static polymorphism. The fromSquare and toSquare methods probably should be moved from this class into the inheriting class, since for some movetypes (like nullmove) these may not be well-defined. I plan to do that eventually, but for the moment I could not be bothered tidying this up. It's on the to-do list, though.

Building on top of that base-class the module defines some movetypes which one typically encounters in boardgames:
  • transportmove : a move that just transports a piece from one square to another
  • capturemove: a move that transports a piece from one square to another and removes the piece that occupies the target square
  • promotionmove: a move that removes the piece from it's initial square and drops a different piece on the target square
  • promocapturemove: a move that removes the piece from it's initial square and drops a different piece on the target square after removing the piece that occupies the target square
  • dropmove: a move that just spawns a piece on a square
  • killmove: a move that removes a piece from a square
  • nullmove: does nothing, except from changing which player is on the move
These classes will provide the framework with information like for example how many bits they require to store the information that is encoded in a move of that type. If the boardgame requires anything fancy on top of that (for example en-passant or castling in chess) one will have to derive a specialized class for that from the base move class.

Now, since typically you will need more than one of these movetypes in a boardgame (in chess for example you need transport, captures, promotions, promocaptures) another class that inherits from move is provided: disjunctive move. What it does is multiplexing different movetypes into a single movetype.

In chess, for example, my final move class looks like that:

Code: Select all

class combinedmoves :
		public pygmalion::mechanics::disjunctivemove<board, queenpromocapturemove, queenpromotionmove, rookpromocapturemove, rookpromotionmove, bishoppromocapturemove, bishoppromotionmove, knightpromocapturemove, knightpromotionmove, doublepushmove, enpassantmove, kingsidecastlemove, queensidecastlemove, capturemove, quietmove, nullmove>
This turns out to be a quite compact way of storing moves, for chess a move will require 16bit storage size and it could even be extended with movetypes for offering/accepting draws without increasing the storage size.
tcusr
Posts: 323
Joined: Tue Aug 31, 2021 10:32 pm
Full name: tcusr

Re: New Engine announcement: Pygmalion

Post by tcusr »

you can do a static_assert and use std::is_base_of to verify a class inherits from another one
R. Tomasi
Posts: 307
Joined: Wed Sep 01, 2021 4:08 pm
Location: Germany
Full name: Roland Tomasi

Re: New Engine announcement: Pygmalion

Post by R. Tomasi »

tcusr wrote: Sun Sep 05, 2021 4:13 pm you can do a static_assert and use std::is_base_of to verify a class inherits from another one
You're right! Thanks for pointing that out. I guess I'll incorporate static asserts of that kind. It's more or less a cosmetic thing, since when the inheritance does not fit, it will not compile anyways. But it's better to have one static assert failing instead of like 100 compiler errors without clear indication of what went wrong.
R. Tomasi
Posts: 307
Joined: Wed Sep 01, 2021 4:08 pm
Location: Germany
Full name: Roland Tomasi

Re: New Engine announcement: Pygmalion

Post by R. Tomasi »

Source code is now available on github under GPLv3 license:

https://github.com/rtomasi75/pygmalion