Komodo 12.3 is out

Werewolf · Post by **Werewolf** » Fri Dec 21, 2018 7:47 pm

lkaufman wrote: ↑Fri Dec 21, 2018 6:22 pm
Werewolf wrote: ↑Fri Dec 21, 2018 9:35 am
lkaufman wrote: ↑Fri Dec 21, 2018 5:57 am
Dann Corbit wrote: ↑Fri Dec 21, 2018 5:26 am Has splitting the cores between MCS and ordinary alpha-beta been tried?

It looks to me like the MCS version is fast at finding a stable, good move and the alpha-beta version solves tricky positions faster.
I've been keen on that idea for years, long before we even started on MCTS, but it's not simple to implement and it's far from clear how to combine them. Since our MCTS version already uses short alpha-beta searches we already combine them to some extent now. I suppose if we stall out on MCTS we might try this, but so far no sign of that happening.
Dreaming a bit here but could this work: run MCTS on the GPU, alpha beta on the CPU cores. Both have the same E.F, the one with the higher eval gets picked?

Edit: or more refined - the MCTS always gets picked except when the alpha-beta one is a certain amount greater, say +1.00 (for tricky positions)
MCTS doesn't use the GPU, neural networks do, and we don't currently use neural networks. Splitting the CPU cores that way is possible, but the benefit is less clear than you might think, because Komodo MCTS is already pretty good tactically, though it could be better.

I've read 5 or 6 online projects of putting MCTS on a GPU. I can't vouch for their success but people are trying. Here's one:

https://pdfs.semanticscholar.org/fe90/c ... 7a3327.pdf
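
For what it's worth, the selection rule from the Edit quoted above (MCTS always gets picked unless alpha-beta is ahead by a set margin, say +1.00) could be sketched roughly like this. It assumes both searches report an eval in pawns on the same scale, which is exactly the part that isn't obvious in practice. `SearchResult`, `pick_move` and `margin` are made-up names for illustration, not anything from Komodo.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    move: str          # best move reported by the search
    eval_pawns: float  # score in pawns, from the side to move's point of view


def pick_move(mcts: SearchResult, alpha_beta: SearchResult, margin: float = 1.00) -> str:
    """Prefer the MCTS move unless alpha-beta rates its own move
    at least `margin` pawns higher (hypothetical rule from the thread)."""
    if alpha_beta.eval_pawns - mcts.eval_pawns >= margin:
        return alpha_beta.move
    return mcts.move


# Quiet position: the evals are close, so the MCTS move is kept.
quiet = pick_move(SearchResult("e4", 0.20), SearchResult("Nf3", 0.45))

# Tactical position: alpha-beta sees much more, so its move overrides.
tactical = pick_move(SearchResult("Qh5", 0.10), SearchResult("Bxh7+", 1.60))
```

Whether "at least the margin" or "strictly more than the margin" is the better cutoff is an arbitrary choice here.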

Nordlandia · Post by **Nordlandia** » Fri Dec 21, 2018 8:48 pm

lkaufman wrote: ↑Fri Dec 21, 2018 3:14 am
Nordlandia wrote: ↑Thu Dec 20, 2018 10:02 am As tablebase technology advances, will it make sense in the near future to allocate one core to tablebase handling / probing?

That is, allocating one core solely to consulting the tablebases, with all the benefits that brings.

Idea #1: EGTB probing could be triggered by an intelligent algorithm only when it makes sense, rather than probing very early in the game and sacrificing speed.

Is this something to consider?
Maybe I misunderstand your idea, but if you are just trying to save the time it takes to determine whether we are down to seven men or less on the board when an evaluation is done, that amount of time is too tiny to be of interest.

My idea is that one CPU core is reserved solely for tablebase handling throughout the game, so the speed of the remaining cores isn't affected.

Concept example with 10 cores:

One core for the OS.
-
One core for tablebase handling/querying. -> Its 2000 knps (or whatever speed one core delivers) could be used, among other things, to keep engine speed stable during heavy endgame probing, which otherwise compromises speed. So the question is whether it's worth sacrificing one core to tablebases so that the remaining cores can run without the speed penalty of tablebase probing.
-
Resulting in 8 cores running at full speed.

grahamj · Post by **grahamj** » Fri Dec 21, 2018 10:19 pm

Werewolf wrote: ↑Fri Dec 21, 2018 7:47 pm
I've read 5 or 6 online projects of putting MCTS on a GPU. I can't vouch for their success but people are trying. Here's one:

https://pdfs.semanticscholar.org/fe90/c ... 7a3327.pdf

There is MCTS and MCTS. The so-called MCTS used by AlphaZero and LC0 is a deterministic algorithm which is hard to parallelise, and IMO is better called PUCT (prediction plus upper confidence tree) search. The article you linked to uses random (Monte Carlo) rollouts, an approach that lends itself to parallelisation. I'd be interested in links to the other MCTS/GPU projects you mention, especially if they are PUCT-type MCTS.
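
To make the "deterministic" point concrete, here is a minimal sketch of a PUCT-style child selection in the form commonly described for AlphaZero-like searches. The exact formula and the constant differ between implementations, so `c_puct = 1.5` and the dict layout are assumptions for illustration only. Note that no random numbers appear anywhere: the same statistics always select the same child.

```python
import math


def puct_select(children, c_puct=1.5):
    """Return the move of the child maximising Q + U.

    children: list of dicts with keys 'move', 'prior', 'visits', 'value_sum'
    (a hypothetical layout). Ties go to the first child in the list.
    """
    parent_visits = sum(ch["visits"] for ch in children)

    def score(ch):
        # Q: mean value seen so far (0.0 for an unvisited child).
        q = ch["value_sum"] / ch["visits"] if ch["visits"] else 0.0
        # U: exploration bonus, large for high-prior, rarely visited children.
        u = c_puct * ch["prior"] * math.sqrt(parent_visits) / (1 + ch["visits"])
        return q + u

    return max(children, key=score)["move"]


children = [
    {"move": "e4", "prior": 0.5, "visits": 10, "value_sum": 5.5},
    {"move": "d4", "prior": 0.3, "visits": 2, "value_sum": 1.0},
    {"move": "c4", "prior": 0.2, "visits": 0, "value_sum": 0.0},
]
selected = puct_select(children)
```

A rollout-based search would instead play random moves to the end of the game from the selected node, and those independent playouts are what parallelise easily.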

lkaufman · Post by **lkaufman** » Fri Dec 21, 2018 10:24 pm

Using another core to access the tablebase won't save any time if the normal search has to wait for the result of the probe to decide what to do next. This is the fundamental problem of MP search in chess; the alpha-beta algorithm is basically sequential; two cores aren't as good as one core at twice the speed.
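
A back-of-the-envelope way to see both points, under loudly stated assumptions: the first function models a search that blocks on every probe, and the second is plain Amdahl's law, a generic model of parallel speedup and not a description of how alpha-beta actually loses efficiency. All numbers and names are invented for illustration.

```python
def wall_time_blocking(search_ms: float, probe_ms: float, probes: int) -> float:
    """Wall-clock time when the search waits for every probe result.

    Which core executes the probe does not appear in the formula:
    the waiting time is paid by the search thread either way.
    """
    return search_ms + probes * probe_ms


def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup when only part of the work can run in parallel."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


# 1000 ms of search plus 200 blocking probes of 0.5 ms each.
blocked = wall_time_blocking(1000.0, 0.5, 200)

# Two cores only match "one core at twice the speed" if nothing is sequential.
ideal = amdahl_speedup(1.0, 2)
mostly_parallel = amdahl_speedup(0.8, 2)
```

A dedicated probing core would only pay off if the search had useful work to do while a probe is in flight.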

lkaufman · Post by **lkaufman** » Fri Dec 21, 2018 10:39 pm

Werewolf wrote: ↑Fri Dec 21, 2018 7:47 pm
I've read 5 or 6 online projects of putting MCTS on a GPU. I can't vouch for their success but people are trying. Here's one:

https://pdfs.semanticscholar.org/fe90/c ... 7a3327.pdf

Thanks. This one appears to use playouts so it's not directly relevant for Komodo, which uses short searches to estimate the win prob. But it does suggest that MCTS might benefit from GPU without using NN, so there is something to hope for.

Karol Majewski · Post by **Karol Majewski** » Sat Dec 22, 2018 8:46 am

lkaufman wrote: ↑Wed Dec 19, 2018 9:37 pm
Karol Majewski wrote: ↑Wed Dec 19, 2018 8:15 pm Hi Larry,

why is the MCTS search in Komodo non-deterministic despite using only 1 thread? I see some randomness during the search. Each run gives slightly different output. Leela's search, for example, is deterministic on a single thread. Here is Komodo 12.3 on 1 CPU in the starting position:

First run:
0.02: +0.39/9 1.e4 Nc6 2.d4 d5
0.04: +0.35/11 1.e4 e5 2.Nc3 Nc6
0.06: +0.33/12 1.e4 e5 2.Nf3 Nc6 3.Bc4 Nf6
0.10: +0.28/13 1.e4 e5 2.Nf3 Nc6 3.Bc4 Nf6 4.O-O
0.18: +0.20/14 1.e4 e5 2.Nf3 Nc6 3.Bc4 Nf6 4.Nc3 Nxe4
0.32: +0.17/15 1.e4 e5 2.Nf3 Nc6 3.Bc4 Nf6 4.d3 Bc5 5.O-O
0.36: +0.18/16 1.e4 e5 2.Nf3 Nc6 3.Bc4 Nf6 4.d3 Bc5 5.O-O
0.50: +0.17/16 1.c4 c5 2.e3 e6 3.a3 Nf6
1.02: +0.20/17 1.c4 c5 2.e3 e6 3.a3 Nf6
1.18: +0.20/17 1.e3 Nf6 2.d4 d5 3.c4 c6 4.cxd5 cxd5
1.56: +0.20/18 1.e3 c5 2.Nf3 d5 3.c4 d4 4.exd4 cxd4 5.b4 a5
3.06: +0.21/19 1.e3 c5 2.Nf3 a6 3.d4 e6 4.d5
3.46: +0.23/20 1.e3 c5 2.Nf3 a6 3.d4 e6 4.d5
4.48: +0.20/20 1.Nf3 c5 2.e4 e6 3.c4 Nc6 4.Be2 Nf6 5.Nc3 d5 6.exd5 exd5 7.cxd5
7.24: +0.18/21 1.Nf3 c5 2.e4 e6 3.h3 Nc6 4.Bb5 Nd4 5.Be2 d5 6.Nxd4
8.10: +0.18/22 1.Nf3 c5 2.e4 e6 3.h3 Be7 4.Nc3 Nc6 5.Bb5 Nf6 6.e5

Second run:
0.02: +0.39/9 1.e4 Nc6 2.d4 d5
0.04: +0.35/11 1.e4 e5 2.Nc3 Nc6
0.06: +0.33/12 1.e4 e5 2.Nf3 Nc6 3.Bc4 Nf6
0.12: +0.25/13 1.e4 e5 2.Nf3 Nc6 3.Bc4 Nf6 4.O-O
0.18: +0.20/14 1.e4 e5 2.Nf3 Nc6 3.Bc4 Nf6 4.Nc3 Nxe4
0.32: +0.17/15 1.e4 e5 2.Nf3 Nc6 3.Bc4 Nf6 4.d3 Bc5 5.O-O
0.36: +0.17/16 1.e4 e5 2.Nf3 Nc6 3.Bc4 Nf6 4.d3 Bc5 5.O-O
0.52: +0.17/16 1.c4 c5 2.e3 e6 3.a3 Nf6
1.02: +0.20/17 1.c4 c5 2.e3 e6 3.a3 Nf6
1.18: +0.20/17 1.e3 Nf6 2.d4 d5 3.c4 c6 4.cxd5 cxd5
1.56: +0.20/18 1.e3 c5 2.Nf3 d5 3.c4 d4 4.exd4 cxd4 5.b4 a5
3.06: +0.22/19 1.e3 c5 2.Nf3 a6 3.d4 e6 4.d5
3.44: +0.24/20 1.e3 c5 2.Nf3 a6 3.d4 e6 4.d5
4.46: +0.20/20 1.Nf3 c5 2.e4 e6 3.c4 Nc6 4.Be2 Nf6 5.Nc3 d5 6.exd5 exd5 7.cxd5
7.20: +0.18/21 1.Nf3 c5 2.e4 e6 3.Nc3 Nc6 4.Bb5 Nd4 5.O-O a6 6.Bd3 Nxf3+ 7.Qxf3
11.04: +0.17/22 1.Nf3 c5 2.e4 e6 3.h3 Nc6 4.Bb5 Nd4 5.Be2 d5 6.d3 Be7

Best
Karol
Maybe Mark has another answer, but since the two outputs are so similar it looks to me to be just due to the cutoff being based on time or approximate node counts. With normal Komodo on one thread an N ply search should give identical results every time, but with MCTS "Ply" is just an arbitrary function of nodes, and I don't believe that an N ply search will always cut off at exactly the same number of nodes. When time is involved, results will vary because NPS is not constant.

In the given example, the difference is very small, but in some other positions it's not that small. And this is a bit annoying, because very often I need to reproduce the analysis output. Is it possible to fix this? Make it (single thread MCTS) deterministic without losing Elo?
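
For checking reproducibility, a small script along these lines can report where two runs first differ. It is a sketch that assumes the line layout shown in the output above (`time: eval/depth pv`) and ignores the time column, since timestamps will differ anyway. The function names are my own, and the sample data is just the first four lines of each run.

```python
import re

# time: eval/depth pv   e.g. "0.10: +0.28/13 1.e4 e5 2.Nf3 Nc6"
LINE = re.compile(r"^(?P<time>\d+\.\d+): (?P<eval>[+-]\d+\.\d+)/(?P<depth>\d+) (?P<pv>.+)$")


def parse(line):
    """Return (eval, depth, pv) for one output line, dropping the time."""
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    return m["eval"], int(m["depth"]), m["pv"]


def first_divergence(run_a, run_b):
    """Index of the first line whose eval, depth or PV differs.

    Returns None if the runs agree on every line they have in common.
    """
    for i, (a, b) in enumerate(zip(run_a, run_b)):
        if parse(a) != parse(b):
            return i
    return None


first_run = [
    "0.02: +0.39/9 1.e4 Nc6 2.d4 d5",
    "0.04: +0.35/11 1.e4 e5 2.Nc3 Nc6",
    "0.06: +0.33/12 1.e4 e5 2.Nf3 Nc6 3.Bc4 Nf6",
    "0.10: +0.28/13 1.e4 e5 2.Nf3 Nc6 3.Bc4 Nf6 4.O-O",
]
second_run = [
    "0.02: +0.39/9 1.e4 Nc6 2.d4 d5",
    "0.04: +0.35/11 1.e4 e5 2.Nc3 Nc6",
    "0.06: +0.33/12 1.e4 e5 2.Nf3 Nc6 3.Bc4 Nf6",
    "0.12: +0.25/13 1.e4 e5 2.Nf3 Nc6 3.Bc4 Nf6 4.O-O",
]
diverged_at = first_divergence(first_run, second_run)
```

The index is zero-based, so the fourth line (depth 13) is where these two runs first disagree.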

lkaufman · Post by **lkaufman** » Sun Dec 23, 2018 8:48 pm

Karol Majewski wrote: ↑Sat Dec 22, 2018 8:46 am
In the given example, the difference is very small, but in some other positions it's not that small. And this is a bit annoying, because very often I need to reproduce the analysis output. Is it possible to fix this? Make it (single thread MCTS) deterministic without losing Elo?

Actually, there would be a benefit for us if we could fix this, as it would make speedups much easier to test. Perhaps we could make it so that if you specify N ply, it would calculate the number of nodes and stop at exactly that number. But I don't know how costly it would be to check the node count much more frequently. It might be practical for fixed-depth searches but might lose more Elo than we would accept in normal timed games. If we did this, then to get this feature for infinite analysis you would just set fixed depth to its maximum value.
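
As a toy illustration of the trade-off (not Komodo's actual loop, which I'm not describing here): a search that checks its stop condition only every `check_interval` nodes overshoots the limit, while checking on every node stops exactly but pays for the check each time. `run_search` and its parameters are hypothetical names.

```python
def run_search(node_limit: int, check_interval: int = 1) -> int:
    """Visit nodes until the limit is noticed; return how many were visited.

    The stop condition is only examined every `check_interval` nodes,
    mimicking an engine that polls its limits infrequently to save time.
    """
    nodes = 0
    while True:
        nodes += 1  # stand-in for expanding one node
        if nodes % check_interval == 0 and nodes >= node_limit:
            return nodes


exact = run_search(1500)                       # checked on every node
coarse = run_search(1500, check_interval=1024)  # checked every 1024 nodes
```

In this toy the overshoot is itself reproducible because everything is counted in nodes. The run-to-run variation comes in once the infrequent check looks at the clock instead, since the number of nodes reached by a given time is not constant.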
