LC0 on 43 cores had a ~2700 CCRL ELO performance.


jkiliani
Posts: 143
Joined: Wed Jan 17, 2018 1:26 pm

Re: LC0 on 43 cores had a ~2700 CCRL ELO performance.

Post by jkiliani »

Daniel Shawul wrote:
jkiliani wrote:
Daniel Shawul wrote:
mirek wrote:
Daniel Shawul wrote: If you use MCTS alone, you will suffer from tactical problems; even 100x more time won't solve a 7-ply trap.
If we are speaking about A0, that is only true if the search-guiding NN doesn't recognize the patterns and realize that such a trap may be there. Obviously the NN can fail at recognizing such a pattern, just as e.g. the null-move heuristic can fail at zugzwang detection, but the idea is that most of the time when a 7-ply trap is there, it will be recognized by a properly trained NN. And this must be the case with A0, otherwise it couldn't be nearly as strong with nps so low.
No, I would be surprised if a neural network can recognize even quiescence-level tactics. It is just a big evaluation function. What was suggested was that the policy network could help to pick the right kind of moves to avoid these tactical problems. But by definition, traps are bad-looking moves that turn out to be good after x plies of search. The policy network will not pick the bad-looking move and hence falls into the trap.
AlphaZero would disagree with you. The answer to this problem is actually simple: a large enough neural net, trained with enough reinforcement learning, will be able to tell when a position looks dangerous and adjust its policy priors to search these moves. Otherwise, Stockfish would have constantly found tactics against AZ, which it didn't.
That is what astounded me the first time they reported their result, i.e. why Stockfish was not able to exploit A0's tactically weak MCTS search. LCZero still has tactical problems at 43 cores, so that speaks volumes about the severity of the problem. According to the paper I posted here, even if you give MCTS 100x more time than a corresponding full-width search, it might not ever find a 7-ply trap. This is due to the simulation allocation policy favouring the best-looking moves. If on the other hand you have a uniform policy, it would find the tactics quicker, but still not as fast as an alpha-beta engine. The problem is that an MCTS searcher converges to a minimax tree, not an alpha-beta pruned tree. Alpha-beta rollouts MCTS, on the other hand, can find it as fast as standard alpha-beta engines, and also lets you immediately import heuristics such as LMR and null move into a rollouts version. With A0's MCTS search, there are always going to be some deep tactics that A0 misses and Stockfish finds. A0 may alleviate the problem enough to beat Stockfish using massive hardware like 4 TPUs, but it is always going to have this tactical problem and will look silly sometimes.
An engine based on MCTS playouts is a completely different thing from a UCT+NN engine like AlphaZero or Leela. Clearly, the net in AlphaZero was strong enough to counter SF's tactics in almost all cases. You can already see now that Leela with its current nets is a lot more resilient to tactics than it was a few weeks ago on the 64x6 architecture. A large and well-trained neural net won't need to search very deep to see tactical opportunities; they are effectively baked into the network itself. Leela will get a lot better at tactics with more training, and especially with larger architectures, IN SPITE of the loss in node calculation speed.
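
An aside for readers: the "simulation allocation policy" both posts refer to is the PUCT selection rule from the AlphaZero paper. A toy sketch of it follows; the priors, Q values and constant are invented for illustration, not A0's real numbers.

Code: Select all

import math

def puct(prior, q, child_visits, parent_visits, c_puct=1.5):
    # Q rewards moves that have scored well; the exploration term U
    # rewards moves with a high prior and few visits so far.
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u

# A "trap" move looks bad to the policy (low prior) and scores badly
# until it is searched deeply enough to reveal the refutation.
moves = {
    "quiet_move": {"prior": 0.60, "q": 0.05, "visits": 0},
    "trap_move":  {"prior": 0.02, "q": -0.30, "visits": 0},
}

parent_visits = 1
for _ in range(1000):
    best = max(moves, key=lambda m: puct(moves[m]["prior"], moves[m]["q"],
                                         moves[m]["visits"], parent_visits))
    moves[best]["visits"] += 1
    parent_visits += 1

print({m: d["visits"] for m, d in moves.items()})
# The low-prior trap line gets only a handful of the 1000 simulations,
# which is Daniel's point; jkiliani's counterpoint is that training
# raises that prior whenever the net recognises the danger pattern.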
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: LC0 on 43 cores had a ~2700 CCRL ELO performance.

Post by Daniel Shawul »

jkiliani wrote:
An engine based on MCTS playouts is a completely different thing from a UCT+NN engine like AlphaZero or Leela. Clearly, the net in AlphaZero was strong enough to counter SF's tactics in almost all cases. You can already see now that Leela with its current nets is a lot more resilient to tactics than it was a few weeks ago on the 64x6 architecture. A large and well-trained neural net won't need to search very deep to see tactical opportunities; they are effectively baked into the network itself. Leela will get a lot better at tactics with more training, and especially with larger architectures, IN SPITE of the loss in node calculation speed.
Though I don't agree with you, I like your confidence that the NN alone will improve its tactics. As far as I can tell, it is the improvement in hardware that helped it improve its tactical ability.

I initially thought Leela had bugs that would leave a piece hanging. My guess is that if you now trained a network from CCRL games, just like Gary did initially, it would probably be on the same tactical level as the current L0 trained from self-play games.
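
For concreteness, "training from CCRL games" would mean supervised targets extracted from a PGN database instead of self-play. A minimal sketch using the python-chess library; the file name is a placeholder, and the board encoder and training loop are omitted.

Code: Select all

# Turn a PGN database (e.g. CCRL games) into (position, played-move)
# pairs for supervised policy training; the game result would label
# the value head. "ccrl.pgn" is a placeholder name.
import chess.pgn  # pip install python-chess

def training_pairs(pgn_path):
    with open(pgn_path) as handle:
        while True:
            game = chess.pgn.read_game(handle)
            if game is None:          # end of file
                break
            result = game.headers.get("Result", "*")
            board = game.board()
            for move in game.mainline_moves():
                yield board.fen(), move.uci(), result
                board.push(move)

for fen, move, result in training_pairs("ccrl.pgn"):
    pass  # feed into the board encoder / training loop of your choice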
CMCanavessi
Posts: 1142
Joined: Thu Dec 28, 2017 4:06 pm
Location: Argentina

Re: LC0 on 43 cores had a ~2700 CCRL ELO performance.

Post by CMCanavessi »

Daniel Shawul wrote:
Ras wrote:
Daniel Shawul wrote: Why is it that GPU FLOPS are different from CPU ones, again :)
I have explained that several times now. Please re-read, I won't explain it over and over.
GPU flops are different from CPU flops because you need to have a different algorithm?? Are you serious? What a dumb "explanation" that is.

FLOPS is a performance metric (floating-point operations per second), period.

The TOP500 ranks supercomputers using the FLOPS metric, based on LINPACK or something like that, without worrying about what the machine is built from (CPU, GPU, KNL, etc.).

Doing a dot product is not even such a hard algorithm to implement anyway.
I will ask you this question again, which was not answered in the other thread:

If GPUs provide so much "advantage", then why don't you (or any other chess engine programmer) re-write Scorpio to use the GPU instead of the CPU? You can't tell me it would not be efficient, cause "flops are flops", right? If you ran Scorpio on a GPU, it would be WAY weaker than it really is. The same happens to Leela when it runs on CPUs...
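
An aside on what the metric actually counts, since both sides keep invoking it. The operation count below is exact; the device rates are illustrative assumptions, not measurements.

Code: Select all

# FLOPs are counted the same way on any device: a dot product of
# length n costs n multiplies plus n-1 adds.
def dot_flops(n):
    return 2 * n - 1

# A dense layer (m-by-n matrix times an n-vector) is m dot products.
def dense_flops(m, n):
    return m * dot_flops(n)

# Assumed sustained rates, for illustration only:
GPU = 10e12   # ~10 TFLOPS on large matrix work
CPU = 100e9   # ~100 GFLOPS on a desktop CPU

work = dense_flops(1024, 1024)   # one hypothetical layer
print("CPU: %.2e s   GPU: %.2e s" % (work / CPU, work / GPU))
# The unit is identical on both devices; the dispute is about how
# close each device gets to its peak on a given workload.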
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: LC0 on 43 cores had a ~2700 CCRL ELO performance.

Post by Milos »

jkiliani wrote: AlphaZero would disagree with you. The answer to this problem is actually simple: a large enough neural net, trained with enough reinforcement learning, will be able to tell when a position looks dangerous and adjust its policy priors to search these moves. Otherwise, Stockfish would have constantly found tactics against AZ, which it didn't.
That is plain wrong, i.e. Daniel is totally right.
If you extrapolate the performance of A0 from Figure 2, assuming that the Elo curves for it and SF don't diverge even further as you reduce the TC, you'd get that SF performs at least 400 Elo better than A0 at TC = 0.0125 ms/move. 0.0125 ms is the average time required for a single eval of A0.
OTOH, in 0.0125 ms SF8 can on average search 875 nodes.
That is roughly the number of nodes searched by an SF8 alpha-beta search to depth 5. So a depth-5 SF8 alpha-beta search is at least 400 Elo stronger than the A0 NN eval alone.
Quiescence search alone on SF8 visits roughly 100 nodes, so three doublings take QS to the size of the depth-5 search. Even if we assume 120 Elo per doubling of nodes searched, bare QS is only about 360 Elo weaker than the depth-5 search, while the A0 eval is at least 400 Elo weaker.
Therefore, it is safe to assume that the QS of SF8 alone is more powerful than the NN eval of A0.
So much for the "magic" strength of NN eval.
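
The back-of-envelope numbers above can be rechecked directly; the nps figures are the ones commonly cited for the match (~80 knps for A0, ~70 Mnps for SF8), and the rest follows the stated assumptions.

Code: Select all

import math

a0_nps = 80_000        # A0 evals per second (commonly cited figure)
sf_nps = 70_000_000    # SF8 nodes per second on the match hardware

t_eval = 1 / a0_nps             # time for one A0 eval
print(t_eval * 1000)            # 0.0125 ms, as stated above

sf_nodes = sf_nps * t_eval      # SF8 nodes in the same time
print(sf_nodes)                 # 875.0, roughly a depth-5 search

qs_nodes = 100                  # assumed size of a bare QS
doublings = math.log2(sf_nodes / qs_nodes)
print(doublings)                # ~3.13 doublings from QS to depth 5
print(doublings * 120)          # ~375 Elo; rounded to 3 x 120 = 360 above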
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: LC0 on 43 cores had a ~2700 CCRL ELO performance.

Post by Daniel Shawul »

CMCanavessi wrote:
I will ask you this question again, which was not answered in the other thread:

If GPUs provide so much "advantage", then why don't you (or any other chess engine programmer) re-write Scorpio to use the GPU instead of the CPU? You can't tell me it would not be efficient, cause "flops are flops", right? If you ran Scorpio on a GPU, it would be WAY weaker than it really is. The same happens to Leela when it runs on CPUs...
Scorpio can't run on the GPU; LCZero can run on both the CPU and the GPU/TPU. The only difference is that the latter gives it a significant advantage, because it makes its bulky NN eval much faster than if it were running on a CPU.

This is nothing new in that regard; Deep Blue also had special hardware for accelerating its eval, using FPGAs. Are you going to ask me now why Scorpio didn't have an FPGA eval too? Deep Blue was a hardware success story, and so could the A0 success be too -- though I didn't think of it like that originally.

Just to piss you off though, here is a Scorpio GPU version https://github.com/dshawul/GpuHex/tree/chess

It actually runs the MCTS on the GPU, unlike A0 or L0; my challenge to the LCZero developers is to do the MCTS on the GPU too.
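
For context on that challenge: A0 and L0 keep the search tree on the CPU and batch leaf positions into single GPU calls for evaluation. A runnable toy of that split follows, with a FakeNet standing in for the GPU-hosted network; none of this is lc0's actual code.

Code: Select all

import math, random

class Node:
    def __init__(self, prior):
        self.prior, self.visits, self.value_sum = prior, 0, 0.0
        self.children = {}
    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

class FakeNet:
    """Stand-in for the network; in L0 this call runs on the GPU."""
    def evaluate(self, leaves):
        # One batched call: (uniform priors over 4 toy moves, value).
        return [([0.25] * 4, random.uniform(-1, 1)) for _ in leaves]

def select(root):
    # CPU: walk the tree by PUCT; count a virtual visit along the path
    # so the next selection in the same batch explores another line.
    node, path = root, [root]
    while node.children:
        node = max(node.children.values(),
                   key=lambda c: c.q() + 1.5 * c.prior *
                   math.sqrt(node.visits + 1) / (1 + c.visits))
        path.append(node)
    for n in path:
        n.visits += 1
    return path

root, net = Node(1.0), FakeNet()
for _ in range(20):
    paths = [select(root) for _ in range(8)]           # CPU work
    evals = net.evaluate([p[-1] for p in paths])       # batched "GPU" call
    for path, (priors, value) in zip(paths, evals):
        leaf = path[-1]
        if not leaf.children:                          # expand once
            leaf.children = {m: Node(pr) for m, pr in enumerate(priors)}
        for n in path:                                 # back up the value
            n.value_sum += value
print(root.visits)   # 160 = 20 iterations x batch of 8

GpuHex moves the tree itself onto the GPU, which is the part of the challenge above.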
CMCanavessi
Posts: 1142
Joined: Thu Dec 28, 2017 4:06 pm
Location: Argentina

Re: LC0 on 43 cores had a ~2700 CCRL ELO performance.

Post by CMCanavessi »

Daniel Shawul wrote:Scorpio can't run on the GPU
Exactly, and why would that be? :roll:
If, as you are saying, GPUs provide such a big advantage GFLOP-wise, why doesn't your engine (or any other serious engine) run on GPUs? You of course know the answer to that.

LCZero can run on both the CPU and the GPU/TPU. The only difference is that the latter gives it a significant advantage, because it makes its bulky NN eval much faster than if it were running on a CPU.
Again, exactly. It was _DESIGNED_ to run on GPU/TPU, and it only runs on CPUs for a very specific reason: so that people with no GPU can run some training games and contribute to the project. In the future, when the network size is increased again, CPUs will be so slow that it will be completely impractical to even run training on them. And you, of course, again know that perfectly well.

This is nothing new in that regard; Deep Blue also had special hardware for accelerating its eval, using FPGAs. Are you going to ask me now why Scorpio didn't have an FPGA eval too? Deep Blue was a hardware success story, and so could the A0 success be too -- though I didn't think of it like that originally.
Well, both projects were just promotional things, that's absolutely true. That doesn't mean there's no truth in them, or no cool things to take from both. As for why Scorpio doesn't run on an FPGA: maybe because it would not be convenient... right? It might be faster, but who would use it? 1% of the users? Everything is done for a reason, and, for the 3rd time, you ofc know that :)

Just to piss you off though, here is a Scorpio GPU version https://github.com/dshawul/GpuHex/tree/chess

It actually runs the MCTS on the GPU, unlike A0 or L0; my challenge to the LCZero developers is to do the MCTS on the GPU too.


That's a cool one!! How does it perform? Does it run fully on the GPU, or does it rely on the CPU for some stuff?
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: LC0 on 43 cores had a ~2700 CCRL ELO performance.

Post by corres »

[quote="George Tsavdaris"]

But SF is based on huge training too. The fishtest methodology that currently SF's strength comes from has millions of games per month and zillions of CPU hours spent for it.

[/quote]

In your opinion, how much does the development of a chess engine like Komodo or Houdini cost? And the teaching of AlphaZero?
I think the difference is like sky and earth.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: LC0 on 43 cores had a ~2700 CCRL ELO performance.

Post by corres »

[quote="mhull"]

So we're back to the "uniform platform" school of computer chess competition. However, limiting A0 to 64 scalar CPUs is completely arbitrary and biased toward scalar-optimized chess projects. Demanding a non-scalar optimized project to un-optimize itself "to make it fair" is not fair.
It would be as if in the 1980's , we demanded a fair contest only if Cray Blitz would run on the same hardware as Mephisto <insert version name>.
I don't think so.

[/quote]

It is obvious that A0 is not a chess engine but a chess machine, as were Cray Blitz and the Mephistos(!).
So the match between Stockfish and A0 was not an engine vs. engine match. Nevertheless, a lot of people think about this competition(?) as if it were the usual engine-engine games.
The main difference between a classical chess engine and a chess machine based on NN hardware and software lies not in the hardware but in the concept of how it works. This difference is much larger than the difference between Cray Blitz and the Mephistos ever was.
So a comparison based on the hardware used by A0 and by Stockfish is rather misleading.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: LC0 on 43 cores had a ~2700 CCRL ELO performance.

Post by corres »

[quote="Daniel Shawul"]

No, I would be surprized if a neural network can recognize even a quescence level tactics. It is just a big evaluation function....

[/quote]

I am afraid you are wrong.
The NN of A0 is a system for recognizing formations.
Such systems are used in machines for face recognition, visual recognition of enemy targets, etc.
A0 attaches probabilities to the formations of pieces, one for every possible move.
If, during the teaching process, A0 meets a trap position often enough, it will attach very low probabilities to the bad moves in that position.
So the teaching process and the number of layers are the key to recognizing traps.
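
A toy version of this point, with invented move names and numbers (this is not A0's actual training rule): the policy head outputs a probability per move, and training that repeatedly punishes the trap line drives its prior down.

Code: Select all

import math

def softmax(logits):
    z = {m: math.exp(v) for m, v in logits.items()}
    s = sum(z.values())
    return {m: v / s for m, v in z.items()}

# Before training, the tempting capture that walks into the trap
# gets the highest prior:
logits = {"Nf3": 0.5, "Bb5": 0.4, "trap_capture": 0.6}
print(softmax(logits))

# Each time self-play falls into the trap and loses, the update
# nudges the trap move's logit down (schematic only):
for _ in range(50):
    logits["trap_capture"] -= 0.1

print(softmax(logits))   # the trap move's prior is now tiny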