LC0 on 43 cores had a ~2700 CCRL ELO performance.


George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

LC0 on 43 cores had a ~2700 CCRL ELO performance.

Post by George Tsavdaris »

So after the conclusion of the 46 games in the tournaments on TCEC, we have the following results for LC0 on 44 cores against Scorpio 2.79, Stockfish 1.0 and Fruit 2.1:

LC0 - Scorpio 2.79: +4 =3 -13 (-168 Elo)
LC0 - Stockfish 1.0: +7 =3 -10 (-53 Elo)
LC0 - Fruit 2.2: +4 =1 -1 (+191 Elo)

The overall performance is 2706 CCRL Elo.
That is a big surprise, at least to me.
I expected LC0 to lose about 19-1 or 18-2 against Scorpio, and something like that (a little less badly) against SF 1.0, but a 2700 CCRL Elo performance was interesting.
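
For reference, those per-match Elo figures follow from the standard logistic Elo model. A minimal Python sketch, just redoing the arithmetic from the scores above:

Code:

import math

def elo_diff(wins, draws, losses):
    # Elo difference implied by a match score, using the standard
    # logistic model: expected score = 1 / (1 + 10**(-diff/400)).
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    return 400 * math.log10(score / (1 - score))

# The match results quoted above
for name, w, d, l in [("Scorpio 2.79", 4, 3, 13),
                      ("Stockfish 1.0", 7, 3, 10),
                      ("Fruit", 4, 1, 1)]:
    print(f"LC0 vs {name}: {elo_diff(w, d, l):+.0f} Elo")

# Prints roughly -168, -53 and +191, matching the figures above.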

Perhaps LC0, in order to play well, needs really strong hardware. So perhaps our test matches at short time controls, and even at long time controls on medium or even relatively good hardware, are "flawed" in the sense that they don't show the real capabilities of LC0.
43 CPU cores must be roughly equivalent to one GTX 1080, so one needs a good GPU to run LC0.

And the result could easily have been much better for LC0, since in easily drawn positions with a perpetual available it decided to reject the draw and lost.
LC0 seemed to have super strong middlegame knowledge, with incredible "heuristics" for attacking the opponent's King, but it was very weak at tactics and very bad, even abysmal, in endgames.


Also, looking at the LC0 networks and their relative Elo versus the number of self-play games, as well as the same data for A0 from DeepMind's paper, we can see that A0 had a BIG regression and deceleration when it reached 2300 CCRL Elo, which kept it between 2300 and 2500 CCRL Elo for 3 million games (from 3 million training games to 6 million training games).
But after it passed that, its strength grew exponentially and went from 2500 Elo to 3000 in just 1 million games.
LC0 shows a similar regression, and the big question is whether it will overcome it, now that it too is approaching 6 million games.


The x-axis is the number of games trained.
The y-axis is the Elo given for each engine.
The Elo values for the two engines are not comparable to each other; they just show how strength varies with the number of training games. A0's Elo, however, does show the corresponding CCRL Elo that A0 had at each stage of its development.
Hai
Posts: 598
Joined: Sun Aug 04, 2013 1:19 pm

Re: LC0 on 43 cores had a ~2700 CCRL ELO performance.

Post by Hai »

The newest LC0 has already gained +50 Elo and +32 Elo improvements, so it is already 82 Elo stronger. And much more on better hardware.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: LC0 on 43 cores had a ~2700 CCRL ELO performance.

Post by Daniel Shawul »

But then isn't this a hardware trick? What makes it different from Deep Blue, which had hardware to accelerate its evaluation? They could have added any chess knowledge they could think of without worrying about slowing it down -- as long as their FPGA kept the eval time constant. I am sure Stockfish has been removing eval terms through the years because they were slowing it down, though those eval features would have been a major win had the eval time remained constant through some hardware support. E.g. evaluating space in a detailed way would, I am sure, add more and more Elo if it didn't cost time. For all we know, Leela Zero could already be a 3500 Elo engine if it were run on 4 TPUs... Not impressed ...
Werewolf
Posts: 1796
Joined: Thu Sep 18, 2008 10:24 pm

Re: LC0 on 43 cores had a ~2700 CCRL ELO performance.

Post by Werewolf »

Daniel Shawul wrote: But then isn't this a hardware trick? What makes it different from Deep Blue, which had hardware to accelerate its evaluation? [...] For all we know, Leela Zero could already be a 3500 Elo engine if it were run on 4 TPUs... Not impressed ...
That's a really good point, except the hardware we could potentially get for LCZero is more easily obtainable at a certain price.

I can basically make a supercomputer for it by spending £2700 on a Titan V. But spending the same amount of money on a Xeon workstation doesn't get you very much for an alpha-beta engine.
duncan
Posts: 12038
Joined: Mon Jul 07, 2008 10:50 pm

Re: LC0 on 43 cores had a ~2700 CCRL ELO performance.

Post by duncan »

Daniel Shawul wrote: But then isn't this a hardware trick? [...] I am sure Stockfish has been removing eval terms through the years because they were slowing it down, though those eval features would have been a major win had the eval time remained constant through some hardware support. [...]
If Scorpio were given 10 times the time to play against LC0, would you change the code to put more eval in, or keep things as they are to gain extra search and maximise the win?
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: LC0 on 43 cores had a ~2700 CCRL ELO performance.

Post by Daniel Shawul »

duncan wrote:
Daniel Shawul wrote: But then isn't this a hardware trick? What makes it different from Deep Blue, which had hardware to accelerate its evaluation? [...]
If Scorpio were given 10 times the time to play against LC0, would you change the code to put more eval in, or keep things as they are to gain extra search and maximise the win?
I estimated Scorpio's eval to be about 100x faster than the 10x128 NN eval of LC0.
Against one of the MCTS versions of Scorpio that I estimate to be 2300-2400, Leela performs 300-400 Elo weaker in single-core matches at a time control of 80+0.2, as I posted elsewhere.
But this LC0 engine could actually be 3500 Elo on 4 TPUs; that just seems hard to attribute to an algorithmic breakthrough ...
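
To put that ratio in perspective, here is a back-of-envelope count of the work in a single 10x128 net evaluation. It assumes the usual residual-tower layout (10 blocks of two 3x3 convolutions with 128 filters each) and ignores the input convolution, batch norm and the policy/value heads, so treat it as an order-of-magnitude estimate only:

Code:

# Rough FLOP count for one forward pass of a 10x128 net on an 8x8 board.
BOARD_SQUARES = 8 * 8
FILTERS = 128
BLOCKS = 10
KERNEL = 3 * 3

# One 3x3 convolution: for every square and output channel, a 3x3x128
# dot product (one multiply and one add per weight, hence the factor 2).
flops_per_conv = BOARD_SQUARES * FILTERS * FILTERS * KERNEL * 2
flops_per_block = 2 * flops_per_conv      # two convolutions per residual block
total_flops = BLOCKS * flops_per_block

print(f"~{total_flops / 1e6:.0f} MFLOPs per evaluated position")   # ~377 MFLOPs

A handcrafted eval is at most a few thousand operations per position, so how much slower the net ends up being in practice depends almost entirely on the hardware that runs it, which is rather the point of this thread.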

Well, for alpha-beta engines faster search (tactics) seems to have won over bulky but accurate evaluation. But if Scorpio were to go up against A0 or LC0, it is probably best to add in more evaluation, because clearly these engines are not going to screw you with tactics...

I would add that even A0 probably couldn't have used full-width alpha-beta at its 80 knps, because then it would have to evaluate all possible branches. So the optimum for them is to use a very selective search (and risk tactical exposure) and compensate for that with a big NN evaluation, while pushing the minimum hardware requirement beyond many people's reach. Oh well, GPUs are getting cheaper, which plays in their favour ...
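
As a rough illustration of why full-width search is out of reach at 80 knps: the branching factor of 35 and the one-minute budget below are my own assumptions, not numbers from the games.

Code:

import math

NPS = 80_000        # A0's reported evaluation speed (nodes per second)
MOVE_TIME = 60      # assumed thinking time per move, in seconds
BRANCHING = 35      # typical chess branching factor (assumption)

budget = NPS * MOVE_TIME                  # ~4.8 million nodes per move

# A pure full-width (minimax) search visits roughly b**d nodes to reach depth d.
full_width_depth = math.log(budget) / math.log(BRANCHING)

# Perfect alpha-beta roughly doubles the reachable depth (~b**(d/2) nodes).
alpha_beta_depth = 2 * full_width_depth

print(f"node budget:            {budget:,}")
print(f"full-width depth:       ~{full_width_depth:.1f} plies")
print(f"ideal alpha-beta depth: ~{alpha_beta_depth:.1f} plies")

# Roughly 4 plies full-width and 8-9 plies even with perfect alpha-beta:
# far short of what modern AB engines reach, hence the very selective,
# policy-guided search compensated by a big NN evaluation.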
carldaman
Posts: 2283
Joined: Sat Jun 02, 2012 2:13 am

Re: LC0 on 43 cores had a ~2700 CCRL ELO performance.

Post by carldaman »

For what it's worth, I'm getting performances of 2400+ for Leela 0.6 net129, running on 4 physical cores at 40/40 repeating time control.

It's been beating up on learning-enabled RomiChess P3N at this time control, but the total number of games is still low. The quality of the games is good, sometimes spectacular. (I'm always more interested in the quality of the games than their quantity).

CL
duncan
Posts: 12038
Joined: Mon Jul 07, 2008 10:50 pm

Re: LC0 on 43 cores had a ~2700 CCRL ELO performance.

Post by duncan »

Daniel Shawul wrote: Well, for alpha-beta engines faster search (tactics) seems to have won over bulky but accurate evaluation. But if Scorpio were to go up against A0 or LC0, it is probably best to add in more evaluation, because clearly these engines are not going to screw you with tactics...

Would you say the same thing about an alpha-beta engine?
Let's say Scorpio scores 60% against Stockfish 1, and 80% against a super-smart but slow Stockfish with triple the code in its eval function. Would Scorpio score more than 80% if it thinned out its eval function and became faster, because clearly the super-smart but slow Stockfish is not going to screw you with tactics?

Intuitively I would say no, but then I know nothing about computer chess.
mirek
Posts: 52
Joined: Sat Mar 24, 2018 4:18 pm

Re: LC0 on 43 cores had a ~2700 CCRL ELO performance.

Post by mirek »

Daniel Shawul wrote: But this LC0 engine could actually be 3500 Elo on 4 TPUs; that just seems hard to attribute to an algorithmic breakthrough ...
I think this greatly depends on time control.
If, for example, we are talking about 1 min/move as in the A0 paper, we can estimate the expected performance of A0 on a single high-end consumer GPU like a 1080 Ti.

4x TPU = 80 kN/s
1x 1080 Ti = 2 kN/s (estimated value for the big A0 net)

That means at 1 min per move we would get 120,000 nodes in total on the 1080 Ti.
Looking at the scaling graph in the A0 paper, that puts it at around 1.5 s of thinking time on 4x TPU. And incidentally, at that point A0 on 4x TPU is performing basically comparably to SF8 on a 64-core machine at 1 minute per move.
So if you are after performance per dollar and a time control of >= 1 minute per move, then A0 seems to be the winner there.
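
The arithmetic behind that comparison, spelled out (the throughput figures are just the estimates quoted above):

Code:

TPU4_NPS = 80_000        # 4x TPU, as reported for A0
GTX1080TI_NPS = 2_000    # rough estimate for the big A0 net on a 1080 Ti
MOVE_TIME = 60           # seconds per move on the 1080 Ti

nodes_on_1080ti = GTX1080TI_NPS * MOVE_TIME          # 120,000 nodes
equivalent_tpu_time = nodes_on_1080ti / TPU4_NPS     # 1.5 s on the 4x TPU setup

print(f"nodes per move on the 1080 Ti: {nodes_on_1080ti:,}")
print(f"equivalent 4x TPU time:        {equivalent_tpu_time:.1f} s")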
Daniel Shawul wrote: Well, for alpha-beta engines faster search (tactics) seems to have won over bulky but accurate evaluation.
When we are speaking about A0, it's not only the better evaluation but also the much better pruning.
If you only improve the eval, making it more expensive in time without updating how variations get pruned, then of course the overall effect will be a weaker Elo. However, if you have "something" that can effectively guide the search (a NN, for example), which in turn reduces the search's branching factor, then it's clear that this must be beneficial to the resulting strength, at least in the limiting case of thinking time approaching infinity :)

This should be obvious, since any inefficiency in the eval function means only a constant slowdown, while a lower branching factor means an exponential speedup which, given enough thinking time, will overcome any possible constant slowdown.
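
A toy calculation of that argument; the branching factors and the 100x eval slowdown below are illustrative assumptions only:

Code:

# Toy model: reaching depth d costs roughly eval_cost * b**d node evaluations.
def time_to_depth(depth, branching, eval_cost):
    return eval_cost * branching ** depth

FAST_EVAL, SLOW_EVAL = 1.0, 100.0   # the slow eval costs 100x more per node (assumption)
B_WIDE, B_NARROW = 3.0, 2.0         # effective branching factors (assumption)

for depth in (10, 20, 30):
    wide = time_to_depth(depth, B_WIDE, FAST_EVAL)       # cheap eval, weaker pruning
    narrow = time_to_depth(depth, B_NARROW, SLOW_EVAL)   # expensive eval, better pruning
    print(f"depth {depth:2d}: cheap/wide = {wide:.2e}   expensive/narrow = {narrow:.2e}")

# At depth 10 the cheap eval still wins (5.9e4 vs 1.0e5), but by depth 20 the
# lower branching factor dominates (3.5e9 vs 1.0e8), and the gap keeps growing
# exponentially with depth.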

As it stands these days: A0 plus a 1080 Ti is just enough to overcome that constant slowdown and be competitive with Stockfish 8 at 1 min per move (and possibly even stronger at longer time controls with proper time management). However, it's more or less clear that more powerful GPUs will hit the market in the upcoming years, so the NN's constant slowdown will become less and less of a problem, while I don't expect any considerable progress on the CPU side of things for traditional alpha-beta engines. You most likely can't scale effectively much beyond 64 cores, not today and not in 5 years (and looking at the "progress" desktop/server CPUs have made in terms of instructions per clock doesn't leave much optimism either), while scaling is no problem in the A0 approach and more powerful hardware is more or less certain to arrive.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: LC0 on 43 cores had a ~2700 CCRL ELO performance.

Post by Milos »

mirek wrote:
Daniel Shawul wrote: But this LC0 engine could actually be 3500 Elo on 4 TPUs; that just seems hard to attribute to an algorithmic breakthrough ...
I think this greatly depends on time control. [...] Looking at the scaling graph in the A0 paper, that puts it at around 1.5 s of thinking time on 4x TPU. And incidentally, at that point A0 on 4x TPU is performing basically comparably to SF8 on a 64-core machine at 1 minute per move. So if you are after performance per dollar and a time control of >= 1 minute per move, then A0 seems to be the winner there. [...]
A0 was not better than SF8 even with 4 TPUs. Rigged tests and cherry-picked results mean nothing.
Claiming that A0 on a 1080 Ti would be on par with or better than SF8 on 64 cores at 1 min/move is just a baseless claim (figure 2 from the paper is obviously fake, since SF doesn't scale as shown, and this was demonstrated long ago).
What Daniel keeps pointing out is that alpha-beta as a general search might not have a future, but there is absolutely zero evidence that NN evaluation is the key. A0 got its performance from a massive number of rollouts thanks to massive hardware. If you used a better MCTS with AB for leaf evaluation instead of a NN, you could get better performance on comparably massive hardware (1k CPU cores).