The Plan-9 to finally solve chess :)

Sergei S. Markoff
Posts: 207
Joined: Mon Sep 12, 2011 9:27 pm
Location: Moscow, Russia

Post by Sergei S. Markoff » Sat May 11, 2019 4:52 pm

An idea for training an NN to play chess without RL, using only TBs.

1. Train a deep NN on the 7-man tablebases.
2. Define a trusted piece-count limit L: positions with at most L men are trusted to the NN. Initially L = 7.
3. Create an empty minibatch for the NN.
4. Generate a random position with L + 1 men and run an N-ply search on it (the best value of N is debatable):

Code:

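/* Values are always from the side to move's point of view.
   indefinite = not resolved within the search depth. */
typedef enum { loss, draw, win, indefinite } val;

/* The same value seen from the opponent's side. */
val neg_val(val value)
{
	if (value == indefinite) return indefinite;
	if (value == draw) return draw;
	if (value == loss) return win; else return loss;
}

/* True when alpha is certainly >= beta, so a cutoff is safe. */
bool is_at_least(val alpha, val beta)
{
	if (alpha == win) return true;
	if (beta == loss) return true;
	if (alpha == loss) return false;

	/* An indefinite bound proves nothing. */
	if (alpha == indefinite) return false;
	if (beta == indefinite) return false;

	/* alpha == draw here; a draw must not cut off while a win is still possible. */
	return beta == draw;
}

val trust_search(int depth, val alpha, val beta)
{
	if (checkmate()) return loss;	/* side to move is mated */
	if (stalemate()) return draw;
	if (repetition()) return draw;
	if (insufficient_material()) return draw;
	if (TB_pos()) return TB_eval();

	/* L men or fewer: trust the NN. */
	if (piece_count <= L) return NN_eval();

	if (depth <= 0) return indefinite;

	gen_moves();
	foreach (move in moves)
	{
		make_move(move);
		val v = neg_val(trust_search(depth - 1, neg_val(beta), neg_val(alpha)));
		unmake_move(move);

		if (v == win) return win;
		if (alpha == loss)
		{
			alpha = v;
		}
		else if (v == indefinite)
		{
			/* an unresolved move may still hide a win */
			alpha = indefinite;
		}

		if (is_at_least(alpha, beta))
		{
			return alpha;
		}
	}

	return alpha;
}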
5. If the trust_search value is not indefinite, add this position with this value to the NN minibatch. Repeat steps 4 and 5 until the minibatch is big enough.
6. Perform one learning step for the NN with this minibatch.
7. Repeat steps 3-7 until the NN error is less than some bound B (the value is debatable).
8. L = L + 1
9. Repeat steps 3-9 until L = 32.
10. PROFIT!
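
To make steps 3-9 a bit more concrete, here is a rough C++ sketch on a toy game tree rather than real chess positions. It is only an illustration under assumptions: Node, leaf, inner, Sample, collect_batch, run_curriculum and the train_step callback are hypothetical names that do not come from the post, a "trusted" node stands in for a TB hit or an NN evaluation at L men or fewer, and the alpha-beta window is left out so that only the four-valued logic remains.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Four-valued result, always from the side to move's point of view.
enum class Val { loss, draw, win, indefinite };

Val neg_val(Val v) {
    if (v == Val::win) return Val::loss;
    if (v == Val::loss) return Val::win;
    return v;  // draw and indefinite look the same from both sides
}

// Hypothetical toy position. A trusted node stands in for a TB hit or an NN
// evaluation at <= L men; any other node lists the positions after each move.
struct Node {
    bool trusted = false;
    Val value = Val::indefinite;  // meaningful only when trusted
    std::vector<Node> children;
};

Node leaf(Val v) { Node n; n.trusted = true; n.value = v; return n; }
Node inner(std::vector<Node> kids) { Node n; n.children = std::move(kids); return n; }

// Step 4 without the alpha-beta window: plain four-valued negamax.
Val trust_search(const Node& n, int depth) {
    if (n.trusted) return n.value;
    if (depth <= 0) return Val::indefinite;
    assert(!n.children.empty() && "toy inner nodes need at least one move");
    Val best = Val::loss;
    for (const Node& child : n.children) {
        Val v = neg_val(trust_search(child, depth - 1));
        if (v == Val::win) return Val::win;       // one winning move settles it
        if (best == Val::loss) best = v;          // anything replaces a loss
        else if (v == Val::indefinite) best = v;  // an unresolved move may hide a win
    }
    return best;
}

struct Sample { const Node* position; Val label; };

// Step 5: keep only the positions whose value the search could resolve.
std::vector<Sample> collect_batch(const std::vector<Node>& candidates, int depth) {
    std::vector<Sample> batch;
    for (const Node& pos : candidates) {
        Val v = trust_search(pos, depth);
        if (v != Val::indefinite) batch.push_back({&pos, v});
    }
    return batch;
}

// Steps 3-9 as a loop. train_step(L) is an assumed callback that builds one
// minibatch for trust limit L, does one learning step and returns the NN error.
// Returns the total number of learning steps performed.
template <class TrainStep>
int run_curriculum(int L, int max_men, double B, TrainStep train_step) {
    int steps = 0;
    for (; L < max_men; ++L) {       // steps 8-9: raise the trust limit
        double err;
        do {                          // steps 3-7: train until the error drops below B
            err = train_step(L);
            ++steps;
        } while (err >= B);
    }
    return steps;
}
```

In this toy setting, a position whose only unresolved line sits one ply beyond the search depth comes back as indefinite and is dropped from the minibatch; searching one ply deeper can turn it into a usable sample. That is the trade-off behind the choice of N in step 4: a larger N labels more positions but costs more per sample.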
The Force Be With You!
