Search Question
-
- Posts: 917
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Search Question
HGM,
You could probably evolve NEG quite easily into an MCTS-type engine that is a little bit smarter than pure MCTS, because the games it would play would not be totally random (or is NEG random? I can't remember; if so, give it a small amount of smarts). I think it could work like this. Play a game and store the moves in a giant tree structure, or maybe a giant hash, and keep track of W/L/D for each move. Play more games and modify the results of each position. Choose moves to play based on the accumulated W/L/D plus the initial eval of the position. Draws get a slight malus for both sides, causing different moves to be tried. Moves farther from the root get a bigger bonus or malus. The bigger malus values will cause those moves to be changed more quickly, but they can change back if other moves are no better or worse. Smaller malus values toward the root will cause those moves to be searched more before they change. If the initial move is good for any position it may never be changed, causing the algorithm to concentrate its effort where it needs to concentrate its effort before trying a different move two ply earlier for the worse side.
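For concreteness, here is a minimal sketch of how such a learning tree might look in C. The node layout, the 100-point result weight, and the DRAW_MALUS constant are all assumptions of mine, not anything from NEG:

```c
/* Minimal sketch of the learning tree described above (names and
 * constants are my own assumptions). Each node holds the accumulated
 * W/L/D for the move leading to it, plus the static eval of the
 * position when it was first reached. */
#define DRAW_MALUS 10               /* slight penalty for draws      */

typedef struct LearnNode {
    struct LearnNode *child;        /* first reply to this move      */
    struct LearnNode *sibling;      /* next alternative at this ply  */
    int move;                       /* move leading to this node     */
    int wins, losses, draws;        /* accumulated game results      */
    int eval;                       /* initial eval of the position  */
} LearnNode;

/* Score = initial eval + accumulated results; draws get a slight
 * malus so that different moves get tried over time. */
static int learn_score(const LearnNode *n)
{
    return n->eval + 100 * (n->wins - n->losses) - DRAW_MALUS * n->draws;
}

/* Always play the currently best-scoring move; it only changes once
 * results have penalized it below the next-best alternative. */
LearnNode *pick_move(const LearnNode *parent)
{
    LearnNode *best = parent->child;
    for (LearnNode *n = best; n; n = n->sibling)
        if (learn_score(n) > learn_score(best))
            best = n;
    return best;
}
```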
-
- Posts: 28163
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Search Question
True, the rollouts would be a lot better. But I doubt they would be good enough to compete with a simple evaluation (like PST only).
I do want to make N.E.G. tactically a little bit better; recognizing overloaded pieces and such, and then use it as an MCTS policy head.
-
- Posts: 917
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Search Question
hgm wrote: ↑Wed Jul 17, 2024 11:37 am
True, the rollouts would be a lot better. But I doubt they would be good enough to compete with a simple evaluation (like PST only).
I do want to make N.E.G. tactically a little bit better; recognizing overloaded pieces and such, and then use it as an MCTS policy head.

I don't know what rollout means. I think PST + SEE would work well to first order the moves, so that the MCTS never starts off randomly in any position reached. In every game played it just always plays the 'best' move until that move is penalized enough that the next best move is tried. In my experience with this method for after-game learning in 2006, it usually took far less than 100 games from any starting position to totally master the position and start winning every game, even against the best engines of the time. The best example was that I had Romi play 10 matches against Glaurung2 using the Nunn 10 PGN file. In the first match Romi scored 5%; in the 10th match Romi scored 95%. And that was after only 9 games from one side, plus the games it played from the other side for each starting position. Just imagine what it can learn in 100,000 or more games!
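A rough sketch of that first-pass ordering, assuming hypothetical pst_delta() and see() helpers (neither is from any particular engine):

```c
#include <stdlib.h>

typedef struct { int move; int score; } ScoredMove;

extern int pst_delta(int move);  /* PST gain of the move (assumed helper) */
extern int see(int move);        /* static exchange eval (assumed helper) */

/* qsort comparator: highest score first. */
static int cmp_desc(const void *a, const void *b)
{
    return ((const ScoredMove *)b)->score - ((const ScoredMove *)a)->score;
}

/* Order moves by PST gain + SEE so the first visit to any position
 * already starts from a sensible move instead of a random one. */
void order_moves(ScoredMove *list, int n)
{
    for (int i = 0; i < n; i++)
        list[i].score = pst_delta(list[i].move) + see(list[i].move);
    qsort(list, n, sizeof *list, cmp_desc);
}
```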
-
- Posts: 917
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Search Question
Mike Sherwin wrote: ↑Wed Jul 17, 2024 8:10 pm
I don't know what rollout means. I think PST + SEE would work well to first order the moves, so that the MCTS never starts off randomly in any position reached. […]

The nice thing about the giant tree structure in memory is that once the moves have been generated and placed in the tree, they never need to be generated again. The algorithm just follows a branch in the tree until a new position is reached, at which time it needs to generate moves. A pool of memory is initialized with the sibling pointer of each slot pointing to the next slot. Then, in building the tree, the next slot is removed from the pool for every move generated and connected with the sibling pointer. The first child is connected with the child pointer. If during the search any node can be deleted, it and its child nodes are reconnected to the memory pool.
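A minimal sketch of that pool scheme, with all names (Node, pool_init, node_alloc, free_subtree) my own:

```c
#include <stddef.h>

typedef struct Node {
    struct Node *child;    /* first child position       */
    struct Node *sibling;  /* next move in the same node  */
    int move;
} Node;

static Node *free_list;

/* Pre-link all slots through the sibling pointer (n must be > 0). */
void pool_init(Node *pool, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++)
        pool[i].sibling = &pool[i + 1];
    pool[n - 1].sibling = NULL;
    free_list = pool;
}

/* Pop the next free slot off the list; NULL means pool exhausted. */
Node *node_alloc(void)
{
    Node *n = free_list;
    if (n) { free_list = n->sibling; n->child = n->sibling = NULL; }
    return n;
}

/* Return a node and all its descendants to the pool. */
void free_subtree(Node *n)
{
    if (!n) return;
    free_subtree(n->child);
    free_subtree(n->sibling);
    n->sibling = free_list;
    free_list = n;
}
```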
-
- Posts: 28163
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Search Question
In pure MCTS, when you reach a position you have never reached before (by expanding the stored tree at the last node of the branch you were following), you continue the game along a single branch, which you do not store, until a result is reached. That result is then used as the evaluation of the new node in the stored part. This is called a rollout.
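A bare-bones sketch of such a rollout; game_over(), game_result(), play_random_move() and undo_all() are assumed helpers, not any real engine's API:

```c
extern int  game_over(void);
extern int  game_result(void);      /* +1 win, 0 draw, -1 loss */
extern void play_random_move(void);
extern void undo_all(int moves);

/* Play a single unstored branch to the end of the game and return
 * the result as the evaluation of the freshly expanded node. */
int rollout(void)
{
    int plies = 0;
    while (!game_over()) {
        play_random_move();
        plies++;
    }
    int result = game_result();
    undo_all(plies);                /* the rollout itself is never stored */
    return result;
}
```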
The disadvantage of storing the tree is that you run out of memory rather quickly. Jocly's AI uses MCTS (but with heuristic evaluation in the leaves of the stored tree). But in Tenjiku Shogi I can at most have it think 10 minutes. Then the memory of my PC is filled, and it starts swapping the tree to disk, which kills all performance, not only of Jocly but of the PC as a whole.
In this year's Tenjiku Correspondence Championship, Jocly even lost its game to the Interactive Diagram (which uses alpha-beta at about 1 knps, and was set to search 2.5 ply, which takes it about 1-2 min/move).
-
- Posts: 113
- Joined: Sat Jul 30, 2022 12:12 pm
- Full name: Jamie Whiting
Re: Search Question
I am interested to see how strong you could make an "HCE" policy. I managed something reasonably successful when I first started writing monty, using only a few bonuses for threats, captures, etc., plus tuning. However, it gets absolutely destroyed by even the simplest net arches I experimented with (and it's many hundreds of Elo worse than the current net arch).
You can mitigate this by engaging in some form of node pruning strategy. monty currently uses LRU (least recently used), whereby when the tree is full, the least recently accessed node is removed to make space (this is super uncompromising, but issues arise when multi-threading, so I'm currently changing this).
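For illustration, a sketch of that kind of LRU recycling; this rendering and all its names are my own invention, not monty's actual code. Nodes sit on a doubly linked recency list: every access moves a node to the front, and when the tree is full the tail is evicted:

```c
typedef struct LruNode {
    struct LruNode *prev, *next;   /* recency list links */
    /* ... tree data: children, visit counts, etc. ... */
} LruNode;

static LruNode *head, *tail;

/* Detach a node currently on the list. */
static void unlink_node(LruNode *n)
{
    if (n->prev) n->prev->next = n->next; else head = n->next;
    if (n->next) n->next->prev = n->prev; else tail = n->prev;
}

/* Insert a detached node at the most-recently-used end. */
static void push_front(LruNode *n)
{
    n->prev = NULL; n->next = head;
    if (head) head->prev = n; else tail = n;
    head = n;
}

/* Call on every access of a node already on the list. */
void touch(LruNode *n)
{
    unlink_node(n);
    push_front(n);
}

/* When the tree is full: reclaim the least recently accessed node. */
LruNode *evict(void)
{
    LruNode *n = tail;
    if (n) unlink_node(n);
    return n;
}
```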
-
- Posts: 917
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Search Question
hgm wrote: ↑Mon Jul 22, 2024 6:31 am
In pure MCTS, when you reach a position you have never reached before, you continue the game along a single branch, which you do not store, until a result is reached. That result is then used as the evaluation of the new node in the stored part. This is called a rollout. […]

I guess what I am suggesting is not MCTS at all. I don't know if it has a name. It just simply plays the "best move" for the entire game, but the best move changes all the time depending on results. And the search space is highly limited compared to MCTS. If it has a weakness, it would be not finding sacrifices when a normal move is good enough to win. In my opinion MCTS does not learn; it merely 'plays the odds'. What I am proposing really learns, because every move is guided by all the moves it has already tried.
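Finally, a sketch of the depth-weighted after-game update described earlier in this thread; the MoveStats type, the weights, and the side-to-move convention are all guesses of mine, not code from Romi or NEG:

```c
/* Walk the moves that were played and adjust each one's stats, with a
 * larger bonus/malus the farther the move is from the root, so deep
 * moves flip quickly while moves near the root only change after more
 * evidence has accumulated. */
typedef struct { int wins, losses, draws, eval; } MoveStats;

/* result: +1 win, 0 draw, -1 loss, from the root side's perspective. */
void update_line(MoveStats **line, int len, int result)
{
    for (int ply = 0; ply < len; ply++) {
        int weight = 1 + ply;            /* bigger effect deeper in   */
        int side = (ply & 1) ? -1 : 1;   /* +1 = root side to move    */
        if (result == 0) {
            line[ply]->draws++;
            line[ply]->eval -= weight;   /* draws penalize both sides */
        } else if (result * side > 0) {  /* the mover's side won      */
            line[ply]->wins++;
            line[ply]->eval += weight;
        } else {
            line[ply]->losses++;
            line[ply]->eval -= weight;
        }
    }
}
```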