MCTS implementation question

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
Post Reply
Posts: 314
Joined: Sat Feb 04, 2017 10:57 pm
Location: USA

MCTS implementation question

Post by MOBMAT » Fri Feb 22, 2019 8:45 pm

I'm working the code for an MCTS implementation.

I'm coding Tic-Tac-Toe as an easy GAME to work out the bugs.

I'm running into a situation where either:

1. The code is working fine and i'm just not sure what to do
2. I haven't implemented something correctly (usually this one! LOL)

The question has to do with "terminal nodes" and what to do with them when one is returned by the selection/traverse step.
here is my condensed main search loop:

Code: Select all

	while (AbortSearch() == false)
		iLeaf = Traverse(0);	// (0) is the root
		if (IsNodeTerminal(iLeaf) == false) {
			Reward = Rollout(iLeaf);
			BackPropagate(iLeaf, Reward);
		m_NumSimulations++;	// keep track of how many times we have looped
Abort fails if we run out of nodes (i use a fixed array rather than dynamic allocation), or it runs out of time (it never when debugging i have it search forever)
Traverse (select) appears to be working correctly, and can return a terminal node.
when this occurs, a rollout isn't done (we have end of game) and my thinking was, there isn't anything to back propagate, so i don't call backpropate.

with Tic-Tac-Toe, especially after a few moves have been made, it doesn't take long for all the nodes to be terminal, and the loop just keeps spinning it's wheels. The number of simulations goes up but no new nodes are created.

I haven't seen a good write up of what to do in this situation, assuming my search loop code is correct.

it seems like a waste of time to simply keep looping till time runs out, but i see the same type of behavior with chess engines when they find mate. they just keep on searching forever.
i7-6700K @ 4.00Ghz (using 6 threads), EGTBs on PCI SSD
Benchmark: Stockfish 11 64 bmi2 (nps): 2067669

Posts: 163
Joined: Fri Oct 22, 2010 7:47 pm
Location: Austria

Re: MCTS implementation question

Post by nionita » Sat Feb 23, 2019 7:05 am

The search aim is to approximate the node values. But once you have terminal nodes, there is no need to approximate anymore, their value is exact. So those nodes should not be visited again. Instead their parents should be aware of such descendants, such that:
- if one descendant is a win, the parent is a win (parent must be treated as a leaf)
- if all descendant are loss, the parent is a loss (parent must be treated as a leaf)
- otherwise expand only the other descendants (normal MCTS node)

This logic should propagate up to the root.

Posts: 515
Joined: Tue Aug 21, 2018 5:52 pm
Full name: Dietrich Kappe

Re: MCTS implementation question

Post by dkappe » Thu Feb 28, 2019 7:53 pm

Have a look at leela_lite for a ~100 line implementation.

Post Reply