### Minimax, noisy evaluations, PUCT

Posted:

**Wed Mar 20, 2019 2:58 pm**Minimax is known to have a bias which does not diminish with depth. The paper

uses KRK endings to investigate this.

Here's a simplified situation which I was able to analyse mathematically. Suppose the branching factor is 2, the true value of all nodes is 0, and that the evaluation function gives noisy estimates of this, independent random variables which are symmetric about 0, with cdf F. The minimax evaluation at the root tends towards F^-1((sqrt(5)-1)/2) ~= F^-1(.618) as the depth tends to infinity. It's funny to see the golden ratio appearing. If the noise follows a standard normal, the median root evaluations look like this.

In a real game, the bias will occur when there is more than one nearly optimal move. This is not necessarily a bad thing. Positions with several nearly optimal moves are arguably superior to ones with only one. I found this from 2007 viewtopic.php?f=7&t=13433&p=114597&hili ... ax#p114597, where H.G.Muller suggests adding extra noise to obtain the effect, apparently unaware of the bias that already exists.

For fixed-depth minimax, the bias does not affect the ordering of evaluations. I guess there is some interaction with quiescent search and the width of aspiration windows. But I guess the bias is not a big issue for traditional engines.

On the other hand, if the search is to variable depths, as in PUCT, the even-odd oscillations seem problematic. I wonder if this part of why DeepMind found mean() better than max() for PUCT.

*Bias and pathology in minimax search*, 2005, A. Sadikov, I. Bratko, I. Kononenkouses KRK endings to investigate this.

Here's a simplified situation which I was able to analyse mathematically. Suppose the branching factor is 2, the true value of all nodes is 0, and that the evaluation function gives noisy estimates of this, independent random variables which are symmetric about 0, with cdf F. The minimax evaluation at the root tends towards F^-1((sqrt(5)-1)/2) ~= F^-1(.618) as the depth tends to infinity. It's funny to see the golden ratio appearing. If the noise follows a standard normal, the median root evaluations look like this.

Code: Select all

```
Ply root eval
0 0 median of standard normal
1 0.545 median of max(child evals)
2 0.103
3 0.46
4 0.216
5 0.369
6 0.245
7 0.345
8 0.264
9 0.33
10 0.277
11 0.32
12 0.285
13 0.313
14 0.29
15 0.309
16 0.294
17 0.306
18 0.296
19 0.304
20 0.297
HUGE 0.3003214
```

In a real game, the bias will occur when there is more than one nearly optimal move. This is not necessarily a bad thing. Positions with several nearly optimal moves are arguably superior to ones with only one. I found this from 2007 viewtopic.php?f=7&t=13433&p=114597&hili ... ax#p114597, where H.G.Muller suggests adding extra noise to obtain the effect, apparently unaware of the bias that already exists.

For fixed-depth minimax, the bias does not affect the ordering of evaluations. I guess there is some interaction with quiescent search and the width of aspiration windows. But I guess the bias is not a big issue for traditional engines.

On the other hand, if the search is to variable depths, as in PUCT, the even-odd oscillations seem problematic. I wonder if this part of why DeepMind found mean() better than max() for PUCT.