LC0 on 43 cores had a ~2700 CCRL ELO performance.

George Tsavdaris · Post by **George Tsavdaris** » Thu Apr 19, 2018 11:45 am

corres wrote: During the discussion you forget an important thing:
NN is not only an instrument to replace the static evaluation of standard chess engines. NN behaves like a dynamic opening AND middle game book. The quality and extent of this dynamic book depend on the learning of the NN.
The chess power of these MACHINES (and not engines!) depends greatly on the power and time used for teaching the NN, that is, for making this dynamic book.
So making a really well established comparison between A0 and Stockfish is a hopeless thing.

But SF is based on huge training too. The fishtest methodology that SF's strength currently comes from runs millions of games per month and has zillions of CPU hours spent on it.

GregNeto · Post by **GregNeto** » Thu Apr 19, 2018 2:21 pm

please correct the following if it is wrong, it is just my conclusion after following the discussions here and on reddit/cbaduk:

LCzero is a NN which has some kind of superdeep static eval which even learns and knows about tactics and stuff humans may not even consider (some kind of pattern recognition?). The deeper this NN gets through training (updating the blocks and filters when progress stalls) the slower this static evaluation gets.

This static eval is currently used with MCTS for playouts but could be used in an alpha-beta searcher and may be especially interesting for move ordering.

The ultimate goal for an NN should be the best result possible with as few computational resources as necessary (not counting the resources for creating the NN). So in my opinion it makes a lot of sense for testing Lczero on one cpu vs xy engine on one cpu!

In go the older more mature cousin leelazero now plays at an equal level with programs which were top level one year ago using only 200 playouts (or visits or nodes, I do not know the difference). If the chess version follows in these footsteps (and I have few doubts that it will) we might see a smartphone version of lczero which plays on par with stockfish or komodo running on a PC.

In my limited testing network 63 running on one cpu had a rating of 1450 with a time control of 1 minute plus 1 second per move, against established engines running on one core. After a couple of hundred games network 122 had around 2000, at the same level as good old Faile or Gerbil. Let's see ...

Daniel Shawul · Post by **Daniel Shawul** » Thu Apr 19, 2018 3:20 pm

mirek wrote:
Daniel Shawul wrote:
They had no option but to use MCTS, not because it is better.
That is because it was getting 80,000 nodes/s even on 4 TPUs. With that nps, a full-width alpha-beta search is stuck at the search depths engines used to reach in the 90s. That brings up tactical problems, which they minimized with massive hardware -- it annoys me that there is no mention of this in the arXiv paper. Saying "yes, there is a problem that could be exploited by a tactical engine, but we solved it with massive hardware" would have been enough. The kind of tactical mistakes Leela Zero made on a 48-core TCEC machine speaks loudly about this problem.
The tactical problem of A0 is there only if we are speaking of very short time controls.

Also speaking of details that are "not being explicitly mentioned" it seems to me you are overly concerned with tactical vulnerabilities present only at short time controls

The hardware used for A0 is 4 TPUs, so 1s there could be an eternity elsewhere. L0 made silly tactical mistakes running on the 43-core TCEC hardware, which is considered high-end, up to now that is. A0 will probably show the same tactical problems, and might even be worse given its 8x bigger net, if it were run on the TCEC hardware. We know the problem is the MCTS search, which is exactly the same in both A0 and L0. They raised the minimum hardware requirement so high to mask these tactical problems without a single mention of it. You can get a super Stockfish on a mobile processor, but with L0 you would need 43 cores to get to 2700, and how many more cores to reach 3500?

And you can't compare the level LC0 is at right now with A0. 1 month ago LC0 was making much more horrible tactical blunders, so if you now extrapolate to the future I think it should be clear what the correct conclusions should be.

Even now on the 43-core TCEC machine it makes serious tactical blunders. Even the developers of L0 know this pretty well. You, on the other hand, don't seem to understand the seriousness of the problem and the amount of hardware needed to solve it.

So if 1s / move or engine bullet games is your thing then sure, A0 will suck there on consumer HW for quite some time. If on the other hand you are more inclined towards LTC, then clearly the A0 approach is the way to go. I mean if they had made it a regular 120 min / 40 moves + 30 sec increment match + proper time management, SF8 would probably have lost by even more than just 100 Elo.
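
For readers unsure what a gap of 100 Elo means in game results, the standard logistic Elo model can be sketched as below. The function name `expected_score` is only illustrative, and the rating gaps in the loop are examples taken from numbers mentioned in this thread, not new measurements.

```python
def expected_score(rating_diff: float) -> float:
    """Expected score (win = 1, draw = 0.5, loss = 0) for the side rated
    `rating_diff` Elo points above its opponent, under the standard
    logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


if __name__ == "__main__":
    # 100 = the margin discussed for the A0 vs SF8 match;
    # 550 = the gap between the 1450 and 2000 ratings quoted earlier.
    for diff in (0, 100, 550):
        print(f"{diff:+4d} Elo -> expected score {expected_score(diff):.3f}")
```

Under this model a 100 Elo advantage corresponds to an expected score of roughly 64%.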

Daniel Shawul · Post by **Daniel Shawul** » Thu Apr 19, 2018 3:22 pm

duncan wrote:
Daniel Shawul wrote:
Ok, let's assume A0 has to run on 4 TPUs for some reason unknown to me; then to be fair (based on flops) they have to give Stockfish 180x64 = 11520 cores, not just 64 ...
What about A0 running on 4 TPUs and adding on to Stockfish the extra Elo it would have got if it had 11,520 cores?

Would that be fair? And how much extra Elo would it have got?

Of course it wouldn't, but that is the point. We know alpha-beta engines don't scale well beyond a few hundred cores, so that would be taking advantage of that fact ...
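
The core count in this exchange is plain arithmetic, spelled out here as a minimal sketch. The factor of 180 is the flops ratio quoted in the post (the thread does not show how it was derived), 64 is the number of cores Stockfish ran on, and the function name is hypothetical.

```python
def flops_equivalent_cores(flops_ratio: int, baseline_cores: int) -> int:
    """Cores needed to match the other side's theoretical flops,
    assuming flops scale linearly with core count (an idealization)."""
    return flops_ratio * baseline_cores


if __name__ == "__main__":
    # 180x flops ratio and 64 cores are the figures quoted in the post.
    print(flops_equivalent_cores(180, 64))
```

The linear-scaling assumption is exactly what the reply disputes: the arithmetic yields a core count at which alpha-beta engines are said not to scale.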

Daniel Shawul · Post by **Daniel Shawul** » Thu Apr 19, 2018 3:29 pm

Ras wrote:
Daniel Shawul wrote: Ok, let's assume A0 has to run on 4 TPUs for some reason unknown to me; then to be fair (based on flops)
Flops are irrelevant, and not only because Stockfish runs integer math. During the match, it was 4 TPUs and not 40. Just like Stockfish was developed with MUCH more computing power than it ran in the match.

4 TPUs were used for the match, I know that. But the point is they could have said we are going to use 40 TPUs for the match and Stockfish could use 120000 cores if it wants to, etc. This would be nonsense, as is using just 4 TPUs, because then Stockfish would have to run on 11000 cores, which we all know is hard for an alpha-beta engine to scale on.

Both the GPU and TPU numbers I used for my calculations are theoretical flops, so I don't see why that matters.
Because GPU flops are not the same as CPU flops. Actually, that is why GPUs exist at all. To make use of GPU flops, you need to have an algorithm that performs the same operation on a lot of data. GPU flops cannot be used as randomly as CPU flops.

Why is it that GPU FLOPS are different from CPU ones, again?

FLOPS measures floating-point operations per second no matter what kind of hardware you use.

Daniel Shawul · Post by **Daniel Shawul** » Thu Apr 19, 2018 3:34 pm

mirek wrote:
Daniel Shawul wrote: If you use MCTS alone, you will suffer from tactical problems; even 100x more time won't solve a 7-ply trap.
If we are speaking of A0, that is only true if the search-guiding NN won't recognize the patterns and realize that such a trap may be there. And obviously the NN can fail at recognizing such a pattern, similarly to how e.g. the null-move heuristic can fail at zugzwang detection, but the idea is that most of the time when the 7-ply trap is there it will be recognized by the properly trained NN. And this must be the case with A0, otherwise it couldn't be nearly as strong with nps so low.

No, I would be surprised if a neural network could recognize even quiescence-level tactics. It is just a big evaluation function. What was suggested was that the policy network could help to pick the right kind of moves to avoid these tactical problems. But by definition, traps are bad-looking moves that turn out to be good after x plies of search. The policy network will not pick the bad-looking move and hence fails for the trap.

jkiliani · Post by **jkiliani** » Thu Apr 19, 2018 3:51 pm

Daniel Shawul wrote:
mirek wrote:
Daniel Shawul wrote: If you use MCTS alone, you will suffer from tactical problems; even 100x more time won't solve a 7-ply trap.
If we are speaking of A0, that is only true if the search-guiding NN won't recognize the patterns and realize that such a trap may be there. And obviously the NN can fail at recognizing such a pattern, similarly to how e.g. the null-move heuristic can fail at zugzwang detection, but the idea is that most of the time when the 7-ply trap is there it will be recognized by the properly trained NN. And this must be the case with A0, otherwise it couldn't be nearly as strong with nps so low.
No, I would be surprised if a neural network could recognize even quiescence-level tactics. It is just a big evaluation function. What was suggested was that the policy network could help to pick the right kind of moves to avoid these tactical problems. But by definition, traps are bad-looking moves that turn out to be good after x plies of search. The policy network will not pick the bad-looking move and hence fails for the trap.

AlphaZero would disagree with you. The answer to this problem is actually simple: a large enough neural net, trained with enough reinforcement learning, will be able to tell when a position looks dangerous and adjust its policy priors to search these moves. Otherwise, Stockfish would have constantly found tactics against AZ, which it didn't.

Ras · Post by **Ras** » Thu Apr 19, 2018 4:36 pm

Daniel Shawul wrote: Why is it that GPU FLOPS are different from CPU ones, again?

I have explained that several times now. Please re-read, I won't explain it over and over.

Daniel Shawul · Post by **Daniel Shawul** » Thu Apr 19, 2018 4:47 pm

jkiliani wrote:
Daniel Shawul wrote:
mirek wrote:
Daniel Shawul wrote: If you use MCTS alone, you will suffer from tactical problems; even 100x more time won't solve a 7-ply trap.
If we are speaking of A0, that is only true if the search-guiding NN won't recognize the patterns and realize that such a trap may be there. And obviously the NN can fail at recognizing such a pattern, similarly to how e.g. the null-move heuristic can fail at zugzwang detection, but the idea is that most of the time when the 7-ply trap is there it will be recognized by the properly trained NN. And this must be the case with A0, otherwise it couldn't be nearly as strong with nps so low.
No, I would be surprised if a neural network could recognize even quiescence-level tactics. It is just a big evaluation function. What was suggested was that the policy network could help to pick the right kind of moves to avoid these tactical problems. But by definition, traps are bad-looking moves that turn out to be good after x plies of search. The policy network will not pick the bad-looking move and hence fails for the trap.
AlphaZero would disagree with you. The answer to this problem is actually simple: a large enough neural net, trained with enough reinforcement learning, will be able to tell when a position looks dangerous and adjust its policy priors to search these moves. Otherwise, Stockfish would have constantly found tactics against AZ, which it didn't.

That is what astounded me the first time they reported their result, i.e. why Stockfish was not able to exploit A0's tactically weak MCTS search. LCzero still has tactical problems at 43 cores, so that speaks volumes about the severity of the problem. According to the paper I posted here, even if you give MCTS 100x more time than a corresponding full-width search, it might never find a 7-ply trap. This is due to the simulation allocation policy favouring the best-looking moves. If on the other hand you have a uniform policy, then it would find the tactics quicker, but not as fast as an alpha-beta engine. The problem is that an MCTS searcher converges to a minimax tree, not an alpha-beta pruned tree. Alpha-beta rollouts MCTS, on the other hand, can find it as fast as standard alpha-beta engines, and also allows you to immediately import heuristics such as LMR + null move into a rollouts version. With A0's MCTS search, there are always going to be some deep tactics that A0 misses and Stockfish finds. A0 may alleviate the problem enough to beat Stockfish using massive hardware like 4 TPUs, but it is always going to have this tactical problem and look silly sometimes.
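
The allocation effect described in this post can be illustrated with a toy, single-level version of the PUCT selection rule used in AlphaZero-style MCTS. This is a sketch under strong assumptions, not the search from A0 or LCzero: there is no tree below the root, each move always returns the same fixed "shallow" value, and the priors, values and `c_puct = 1.5` are made-up numbers. The helper names `puct_select` and `allocate_visits` are hypothetical.

```python
import math


def puct_select(priors, visits, value_sums, c_puct=1.5):
    """Return the index of the move maximizing Q + U.

    Q is the mean value seen so far; U is the exploration bonus
    c_puct * P * sqrt(N_total + 1) / (1 + N_move). The "+ 1" under the
    root is a simplification so the priors matter on the first visit.
    """
    total = sum(visits)
    best, best_score = 0, -math.inf
    for i, prior in enumerate(priors):
        q = value_sums[i] / visits[i] if visits[i] else 0.0
        u = c_puct * prior * math.sqrt(total + 1) / (1 + visits[i])
        if q + u > best_score:
            best, best_score = i, q + u
    return best


def allocate_visits(priors, shallow_values, simulations):
    """Run `simulations` root-level selections and return visit counts.

    Assumption: every visit to move i returns shallow_values[i], i.e.
    the move keeps looking as good or bad as it does at first sight.
    """
    visits = [0] * len(priors)
    value_sums = [0.0] * len(priors)
    for _ in range(simulations):
        i = puct_select(priors, visits, value_sums)
        visits[i] += 1
        value_sums[i] += shallow_values[i]
    return visits


if __name__ == "__main__":
    # Move 2 plays the role of the trap: it looks bad at shallow depth.
    shallow_values = [0.1, 0.0, -0.5]
    skewed = allocate_visits([0.6, 0.39, 0.01], shallow_values, 200)
    uniform = allocate_visits([1 / 3, 1 / 3, 1 / 3], shallow_values, 200)
    print("skewed prior :", skewed)
    print("uniform prior:", uniform)
```

With the skewed prior the bad-looking move receives at most one of the 200 simulations, while the uniform prior keeps returning to it. In a real tree those extra visits are what would eventually expose a deep refutation, at the cost of spending simulations on moves that really are bad.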

Daniel Shawul · Post by **Daniel Shawul** » Thu Apr 19, 2018 5:12 pm

Ras wrote:
Daniel Shawul wrote: Why is it that GPU FLOPS are different from CPU ones, again?
I have explained that several times now. Please re-read, I won't explain it over and over.

GPU flops are different from CPU flops because you need to have a different algorithm?? Are you serious? What a dumb "explanation" that is.

FLOPS is a performance metric (floating-point operations per second), period.

The TOP500 list ranks supercomputers by a FLOPS metric based on LINPACK or something like that, without worrying about what they are built from (CPU, GPU, KNL, etc.).

Doing a dot product is not even such a hard algorithm to implement anyway.
