C++ coroutines and alphabeta

dangi12012 · Post by **dangi12012** » Tue Oct 04, 2022 12:13 pm

Minimal coroutine example:

#include <coroutine>
#include <optional>
#include <cstddef>
#include <iterator>
#include <iostream>

template <typename T>
class Generator
{
	class Promise
	{
	public:
		using value_type = std::optional<T>;

		Promise() = default;
		std::suspend_always initial_suspend() { return {}; }
		std::suspend_always final_suspend() noexcept { return {}; }
		void unhandled_exception() {
			std::rethrow_exception(std::move(std::current_exception()));
		}

		std::suspend_always yield_value(T value) {
			this->value = std::move(value);
			return {};
		}

		// void return_value(T value) {
		//     this->value = std::move(value);
		// }

		void return_void() {
			this->value = std::nullopt;
		}

		inline Generator get_return_object();

		value_type get_value() {
			return std::move(value);
		}

		bool finished() {
			return !value.has_value();
		}

	private:
		value_type value{};
	};

public:
	using value_type = T;
	using promise_type = Promise;

	explicit Generator(std::coroutine_handle<Promise> handle)
		: handle(handle)
	{}

	~Generator() {
		if (handle) { handle.destroy(); }
	}

	Promise::value_type next() {
		if (handle) {
			handle.resume();
			return handle.promise().get_value();
		}
		else {
			return {};
		}
	}

	struct end_iterator {};

	class iterator
	{
	public:
		using value_type = Promise::value_type;
		using difference_type = std::ptrdiff_t;
		using iterator_category = std::input_iterator_tag;

		iterator() = default;
		iterator(Generator& generator) : generator{ &generator }
		{}

		value_type operator*() const {
			if (generator) {
				return generator->handle.promise().get_value();
			}
			return {};
		}

		value_type operator->() const {
			if (generator) {
				return generator->handle.promise().get_value();
			}
			return {};
		}

		iterator& operator++() {
			if (generator && generator->handle) {
				generator->handle.resume();
			}
			return *this;
		}

		iterator& operator++(int) {
			if (generator && generator->handle) {
				generator->handle.resume();
			}
			return *this;
		}

		bool operator== (const end_iterator&) const {
			return generator ? generator->handle.promise().finished() : true;
		}

	private:
		Generator* generator{};
	};

	iterator begin() {
		iterator it{ *this };
		return ++it;
	}

	end_iterator end() {
		return end_sentinel;
	}

private:
	end_iterator end_sentinel{};
	std::coroutine_handle<Promise> handle;
};


template <typename T>
inline Generator<T> Generator<T>::Promise::get_return_object()
{
	return Generator{ std::coroutine_handle<Promise>::from_promise(*this) };
}


template<typename T>
Generator<T> range(T fromInclusive, T toExclusive) {
	for (T v = fromInclusive; v < toExclusive; ++v) {
		co_yield v;
	}
}

int main() {
	for (auto val : range(1, 10)) {
		std::cout << val.value() << '\n';
	}
}

Has anyone played around with a movegenerator that just co_yields the next move? Coroutines could be quite useful in never needing to allocate a movelist - and directly doing alphabeta search inside a foreach loop on a coroutine. Having a templated coroutine also could spit out the moves in order depending on the template parameter without any additional cost. So Capture moves, silent moves, or any other heuristics that control what is returned in what order.

Of course this can also be done with compiletime template - function inlining, but that would have to keep all the data in the stackframe (recursion etc) which a coroutine does not need to do, because it has a seperate storage mechanism.
Quote: https://en.cppreference.com/w/cpp/language/coroutines
"Coroutines are stackless: they suspend execution by returning to the caller and the data that is required to resume execution is stored separately from the stack."

So it seems that this is a 3rd alternative to the canonical *movelist++ = makemove()

mar · Post by **mar** » Tue Oct 04, 2022 2:55 pm

coroutines are just glorified cooperative multitasking with ugly syntax (as usual since this is modern C++) => you don't have enough control to serialize the state and so on, so the actual use case options are quite limited

I doubt it will be fast enough - the state it needs to maintain fake locals (and control flow index/code ptr) is allocated on the heap(!)
managing state transitions requires either a giant switch at the start of the coroutine (like in C#) or a function (code) pointer to continue after yield

but that's why what profiling tools are good for

so I don't see how that's superior to manually designed staged move generator, which does exactly the same (albeit without heap allocation) and you have full control over what's going on

we can argue that modern heap allocators use thread-local chunks for small blocks (not guaranteed), but when the heap needs to sync it may completely fry your multi-threaded performance

mar · Post by **mar** » Tue Oct 04, 2022 3:10 pm

one last thing: ok, you can work around the heap allocation by supplying a custom operator new/delete in your promise, assuming the compiler itself can't magically elide the allocation (wouldn't bet on that), still seems a bit too complicated/fragile to me

Ronald · Post by **Ronald** » Tue Oct 04, 2022 3:30 pm

How do you determine the order so that the next "best" move is generated, for instance: the next quiet move with the highest history score?

dangi12012 · Post by **dangi12012** » Tue Oct 04, 2022 8:45 pm

You can insert any (constexpr) heuristic you want - it obviously cannot depend on the relative values of the sibling nodes since that is not known yet and you are avoiding the sort.

On the coroutine the compiler (to my suprise) sees through it mostly:
simple - no difference: https://godbolt.org/z/n9vMnqdPq
middle - missed the Gauss equivalence which is bad! - https://godbolt.org/z/8c8Wj8oKW
complex - no difference again: https://godbolt.org/z/v5EMf8s8s

So this is a nice novelty but not for competitive chessprogramming at the moment imo.

dangi12012 · Post by **dangi12012** » Wed Oct 05, 2022 5:23 pm

For reference (as I visit these posts later sometimes) this is very much related and not used in chessprogramming too much yet.
https://stackoverflow.com/questions/197 ... r-networks

So if you go the route of having a list (which you really dont have to) a Templated Bose-Nelson sorting network for fixed movelist sizes is the way to go.
Static sort template:

Code: Select all

/*
 Copyright (c) 2020 Kang Yue Sheng Benjamin.
 
 Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
 
 The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
 
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */

#ifndef static_sort_h
#define static_sort_h

/*
 Adapted from the Bose-Nelson Sorting network code from:
 https://github.com/atinm/bose-nelson/blob/master/bose-nelson.c
 */

/**
 * A Functor class to create a sort for fixed sized arrays/containers with a
 * compile time generated Bose-Nelson sorting network.
 * \tparam NumElements  The number of elements in the array or container to sort.
 */
template <unsigned NumElements> class StaticSort
{
	// Default less than comparator
	struct LT
	{
		template <class A, class B>
		inline bool operator () (const A &a, const B &b) const
		{
			return a < b;
		}
	};
	
	template <class A, class C, int I0, int I1> struct Swap
	{
		template <class T> inline void s(T &v0, T &v1, C c)
		{
			// Explicitly code out the Min and Max to nudge the compiler
			// to generate branchless code where applicable.
			T t = c(v0, v1) ? v0 : v1; // Min
			v1 = c(v0, v1) ? v1 : v0; // Max
			v0 = t;
		}
		
		inline Swap(A &a, C c) { s(a[I0], a[I1], c); }
	};
	
	template <class A, class C, int I, int J, int X, int Y> struct PB
	{
		inline PB(A &a, C c)
		{
			enum { L = X >> 1, M = (X & 1 ? Y : Y + 1) >> 1, IAddL = I + L, XSubL = X - L };
			PB<A, C, I, J, L, M> p0(a, c);
			PB<A, C, IAddL, J + M, XSubL, Y - M> p1(a, c);
			PB<A, C, IAddL, J, XSubL, M> p2(a, c);
		}
	};
	
	template <class A, class C, int I, int J> struct PB <A, C, I, J, 1, 1>
	{
		inline PB(A &a, C c) { Swap<A, C, I - 1, J - 1> s(a, c); }
	};
	
	template <class A, class C, int I, int J> struct PB <A, C, I, J, 1, 2>
	{
		inline PB(A &a, C c) { Swap<A, C, I - 1, J> s0(a, c); Swap<A, C, I - 1, J - 1> s1(a, c); }
	};
	
	template <class A, class C, int I, int J> struct PB <A, C, I, J, 2, 1>
	{
		inline PB(A &a, C c) { Swap<A, C, I - 1, J - 1> s0(a, c); Swap<A, C, I, J - 1> s1(a, c); }
	};
	
	template <class A, class C, int I, int M, int Stop> struct PS
	{
		inline PS(A &a, C c)
		{
			enum { L = M >> 1, IAddL = I + L, MSubL = M - L};
			PS<A, C, I, L, (L <= 1)> ps0(a, c);
			PS<A, C, IAddL, MSubL, (MSubL <= 1)> ps1(a, c);
			PB<A, C, I, IAddL, L, MSubL> pb(a, c);
		}
	};
	
	template <class A, class C, int I, int M> struct PS <A, C, I, M, 1>
	{
		inline PS(A &a, C c) {}
	};
	
public:
	/**
	 * Sorts the array/container arr.
	 * \param  arr  The array/container to be sorted.
	 */
	template <class Container>
	inline void operator() (Container &arr) const
	{
		PS<Container, LT, 1, NumElements, (NumElements <= 1)> ps(arr, LT());
	};
	
	/**
	 * Sorts the array arr.
	 * \param  arr  The array to be sorted.
	 */
	template <class T>
	inline void operator() (T *arr) const
	{
		PS<T*, LT, 1, NumElements, (NumElements <= 1)> ps(arr, LT());
	};
	
	/**
	 * Sorts the array/container arr.
	 * \param  arr     The array/container to be sorted.
	 * \tparam Compare The less than comparator.
	 */
	template <class Container, class Compare>
	inline void operator() (Container &arr, Compare &lt) const
	{
		typedef Compare & C;
		PS<Container, C, 1, NumElements, (NumElements <= 1)> ps(arr, lt);
	};
	
	/**
	 * Sorts the array arr.
	 * \param  arr     The array to be sorted.
	 * \tparam Compare The less than comparator.
	 */
	template <class T, class Compare>
	inline void operator() (T *arr, Compare &lt) const
	{
		typedef Compare & C;
		PS<T*, C, 1, NumElements, (NumElements <= 1)> ps(arr, lt);
	};
	
	/**
	 * Sorts the array/container arr.
	 * \param  arr     The array/container to be sorted.
	 * \tparam Compare The less than comparator.
	 */
	template <class Container, class Compare>
	inline void operator() (Container &arr, const Compare &lt) const
	{
		typedef const Compare & C;
		PS<Container, C, 1, NumElements, (NumElements <= 1)> ps(arr, lt);
	};
	
	/**
	 * Sorts the array arr.
	 * \param  arr     The array to be sorted.
	 * \tparam Compare The less than comparator.
	 */
	template <class T, class Compare>
	inline void operator() (T *arr, const Compare &lt) const
	{
		typedef const Compare & C;
		PS<T*, C, 1, NumElements, (NumElements <= 1)> ps(arr, lt);
	};
};


/**
 * A Functor class to create a sort for fixed sized arrays/containers with a
 * compile time generated Bose-Nelson sorting network.
 * Inspired by TimSort, this scans through the array first.
 * It skips the sorting-network if it is strictly increasing or decreasing. ;)
 * \tparam NumElements  The number of elements in the array or container to sort.
 */
template <unsigned NumElements> class StaticTimSort
{
	// Default less than comparator
	struct LT
	{
		template <class A, class B>
		inline bool operator () (const A &a, const B &b) const
		{
			return a < b;
		}
	};
	
	template <class A, class C> struct Intro
	{
		template <class T>
		static inline void reverse(T _, A &a)
		{
			if (NumElements > 1) {
				unsigned left = 0, right = NumElements - 1;
				while (left < right) {
					T temp = a[left];
					a[left++] = a[right];
					a[right--] = temp;
				}
			}
		}
		
		template <class T>
		static inline bool sorted(T prev, A &a, C c)
		{
			if (NumElements < 8) return false;
			
			bool hasDecreasing = false;
			bool hasIncreasing = false;
			
			for (unsigned i = 1; i < NumElements; ++i) {
				T curr = a[i];
				if (c(curr, prev)) {
					hasDecreasing = true;
				}
				if (c(prev, curr)) {
					hasIncreasing = true;
				}
				prev = curr;
				if (NumElements > 22)
					if (hasIncreasing && hasDecreasing)
						return false;
			}
			if (!hasDecreasing) {
				return true;
			}
			if (!hasIncreasing) {
				reverse(a[0], a);
				return true;
			}
			return false;
		}
	};
	
	
	
public:
	/**
	 * Sorts the array/container arr.
	 * \param  arr  The array/container to be sorted.
	 */
	template <class Container>
	inline void operator() (Container &arr) const
	{
		if (!Intro<Container, LT>::sorted(arr[0], arr, LT()))
			StaticSort<NumElements>()(arr);
	};
	
	/**
	 * Sorts the array arr.
	 * \param  arr  The array to be sorted.
	 */
	template <class T>
	inline void operator() (T *arr) const
	{
		if (!Intro<T*, LT>::sorted(arr[0], arr, LT()))
			StaticSort<NumElements>()(arr);
	};
	
	/**
	 * Sorts the array/container arr.
	 * \param  arr     The array/container to be sorted.
	 * \tparam Compare The less than comparator.
	 */
	template <class Container, class Compare>
	inline void operator() (Container &arr, Compare &lt) const
	{
		typedef Compare & C;
		if (!Intro<Container, C>::sorted(arr[0], arr, lt))
			StaticSort<NumElements>()(arr, lt);
	};
	
	/**
	 * Sorts the array arr.
	 * \param  arr     The array to be sorted.
	 * \tparam Compare The less than comparator.
	 */
	template <class T, class Compare>
	inline void operator() (T *arr, Compare &lt) const
	{
		typedef Compare & C;
		if (!Intro<T*, C>::sorted(arr[0], arr, lt))
			StaticSort<NumElements>()(arr, lt);
	};
	
	/**
	 * Sorts the array/container arr.
	 * \param  arr     The array/container to be sorted.
	 * \tparam Compare The less than comparator.
	 */
	template <class Container, class Compare>
	inline void operator() (Container &arr, const Compare &lt) const
	{
		typedef const Compare & C;
		if (!Intro<Container, C>::sorted(arr[0], arr, lt))
			StaticSort<NumElements>()(arr, lt);
	};
	
	/**
	 * Sorts the array arr.
	 * \param  arr     The array to be sorted.
	 * \tparam Compare The less than comparator.
	 */
	template <class T, class Compare>
	inline void operator() (T *arr, const Compare &lt) const
	{
		typedef const Compare & C;
		if (!Intro<T*, C>::sorted(arr[0], arr, lt))
			StaticSort<NumElements>()(arr, lt);
	};
};
#endif

Ras · Post by **Ras** » Wed Oct 05, 2022 5:35 pm

dangi12012 wrote: ↑Wed Oct 05, 2022 5:23 pma Templated Bose-Nelson sorting network for fixed movelist sizes

First, you have static move ranking in a real chess engine (hash, MVV/LVA, history), and the highest ranked move should already cut in at least half of the cases. Swapping this move to the top and trying it is a better approach. Second, the code bloat that occurs for arbitrary (and potentially long) list lengths is hardly worth the speedup. In particular because the whole move generator is a rather irrelevant speed part of the engine, and the sorting (if done) even more so.

dangi12012 · Post by **dangi12012** » Wed Oct 05, 2022 5:42 pm

Ras wrote: ↑Wed Oct 05, 2022 5:35 pm
dangi12012 wrote: ↑Wed Oct 05, 2022 5:23 pma Templated Bose-Nelson sorting network for fixed movelist sizes
First, you have static move ranking in a real chess engine (hash, MVV/LVA, history), and the highest ranked move should already cut in at least half of the cases. Swapping this move to the top and trying it is a better approach. Second, the code bloat that occurs for arbitrary (and potentially long) list lengths is hardly worth the speedup. In particular because the whole move generator is a rather irrelevant speed part of the engine, and the sorting (if done) even more so.

Very true! - try the best move first - you could pad with nullmoves to use 3-4 networks only to avoid bloat (would need to test if that is bad)
A 3% speedup is a 3% speedup.
Measure and profile. General statements often do not apply to specific projects. Sorting networks dont apply to more than 18 items or so.

C++ coroutines and alphabeta

C++ coroutines and alphabeta

Re: C++ coroutines and alphabeta

Re: C++ coroutines and alphabeta

Re: C++ coroutines and alphabeta

Re: C++ coroutines and alphabeta

Re: C++ coroutines and alphabeta

Re: C++ coroutines and alphabeta

Re: C++ coroutines and alphabeta