Alpha-Beta as a rollouts algorithm

Michel · Post by **Michel** » Mon Jan 29, 2018 8:46 am

The rollouts are still alpha-beta not scout (PVS) but we don't really need that.

It seems PVS is described in Theorem 3. It is actually very elegant (more elegant than the standard implementation with zero window searches).

Michel · Post by **Michel** » Mon Jan 29, 2018 11:19 am

The PVS rollout mechanism always selects the move with the highest stored (mathematical) upper bound.

This is very similar to UCT where a move is chosen with the highest statistical upper bound.

This makes indeed possible to envision all kinds of interpolations between UCT and alpha/beta.

Daniel Shawul · Post by **Daniel Shawul** » Mon Jan 29, 2018 4:01 pm

Michel wrote:The PVS rollout mechanism always selects the move with the highest stored (mathematical) upper bound.

This is very similar to UCT where a move is chosen with the highest statistical upper bound.

This makes indeed possible to envision all kinds of interpolations between UCT and alpha/beta.

I thought they were going to implement MT-SSS* indirectly with null windows searches just like Aske Plaat did i.e using the MT driver passing (beta-1,beta) bounds at the root shown in Algorithm 2. I guess since PVS ~ MT-SSS*, PVS is sort of implemented indirectly. I quickly tried this and search hangs but is definately due to a bug i have, because move sorting should not have mattered. Will come back later with this.

---
For PVS, I was thinking of a more direct approach where we do the following:

1) Check if a given node has atleast one sibling with closed window (i.e. alpha bound determined), if so use a nullwindow for the rollout of the move.

2) When we backup and find out the score fails high, do research with full alpha-beta window instead of closing the window. This will force more rollouts on this node and at the root till the correct score is found.

This direct approach will also be useful for re-searching LMR reduced moves when they fail high, which i think it would be difficult with the other approach.

---

Theorem 2 always chooses the left-most child and its behviour is static for the duration of a given iteration in an ID framework. My selection policy is a gready one and always chooses the child with the highest score -- that means it is dynamic in a given iteration. I think this is better since it incorporates behaviour of dynamic move sorting heuristics.

If I did a uniform (random) selection of moves alpha-beta goes back toward minmax with a branching factor b^3/4 instead of b^1/2. So I was thinking whatever you do in the selection phase is equivalent to a move sorting heuristics -- and shouldn't lead to a new algorithm like going from alpha-beta to PVS, but maybe it does as you pointed out in Theorem 3. I need to digest this.

Daniel

Daniel Shawul · Post by **Daniel Shawul** » Mon Jan 29, 2018 6:01 pm

I quickly tried this and search hangs but is definately due to a bug i have

Fixed the bug and now using beta_c for selection runs fine. However, it seems it doesn't improve things compared to what i had before ( i.e using the actual score of the node). Infact using beta_c seems to be slower than score_c.

Dann Corbit · Post by **Dann Corbit** » Mon Jan 29, 2018 8:50 pm

Daniel Shawul wrote:
I quickly tried this and search hangs but is definately due to a bug i have
Fixed the bug and now using beta_c for selection runs fine. However, it seems it doesn't improve things compared to what i had before ( i.e using the actual score of the node). Infact using beta_c seems to be slower than score_c.

I guess the really interesting question is, "Why does score_c seem to work better?"

Daniel Shawul · Post by **Daniel Shawul** » Mon Jan 29, 2018 10:44 pm

I may have a bug that invalidates any comparison at this point. Actually, I am going back to no pruning, no aspiration version and try to work my way up again.

For those interested, there are more details in the full paper here.

Also if anyone want to experment with scorpio's implemenation and contribute, i could add you to the repo. Let's have fun with it.

Daniel

smatovic · Post by **smatovic** » Wed May 24, 2023 6:33 am

Daniel Shawul wrote: ↑Thu Jan 25, 2018 11:59 pm ...

Daniel, can you elaborate a bit what you do currently with Scorpio, I read here and there that you implemented all kinds of techniques, MCTS, AB NNUE, even Lc0 CNN batches via parallel AB, what did work best for you? Is it now AB Rollouts with NNUE?Thx.

--
Srdja

Alpha-Beta as a rollouts algorithm

Re: Alpha-Beta as a rollouts algorithm

Re: Alpha-Beta as a rollouts algorithm

Re: Alpha-Beta as a rollouts algorithm

Re: Alpha-Beta as a rollouts algorithm

Re: Alpha-Beta as a rollouts algorithm

Re: Alpha-Beta as a rollouts algorithm

Re: Alpha-Beta as a rollouts algorithm