Is AlphaZero-LC0 (Leela0) A chess Engine?
Moderators: hgm, Rebel, chrisw
-
- Posts: 267
- Joined: Fri Mar 17, 2006 8:01 am
- Location: Russia
- Full name: Vladimir Medvedev
Re: Is AlphaZero-LC0 (Leela0) A chess Engine?
Sorry for the off-topic question, but could anybody give me a link to the lc0 network (weights file) that played in the TCEC-15 Superfinal? As far as I know, it is not a regular net available from the official lc0 site.
-
- Posts: 177
- Joined: Wed May 23, 2018 9:29 pm
Re: Is AlphaZero-LC0 (Leela0) A chess Engine?
-
- Posts: 1242
- Joined: Sat Jul 05, 2014 7:54 am
- Location: Southwest USA
Re: Is AlphaZero-LC0 (Leela0) A chess Engine?
Thank you... great answer, my friend. Then why are there people all over the internet and on YouTube boasting every day about how SMART and BRILLIANT LC0/Leela0/AlphaZero is/was compared to Stockfish and other chess engines (e.g. Houdini, Komodo, SugaR)?
Robert Pope wrote: ↑Tue Jun 04, 2019 7:50 pm
They are huge for one key reason: Neural nets are a very GENERAL solution for a very specific problem.
supersharp77 wrote: ↑Tue Jun 04, 2019 7:00 pm
Ok Dann, nice answer. Then why are these successful neural nets so HUGE (e.g. a 90-million-game NN), and why does speed factor so large in the performance results? Thx AR
Dann Corbit wrote: ↑Sat Feb 23, 2019 12:42 am
It does not work like that. It starts with the rules of the game.
It does self play to learn what things are good and what things are bad.
It is not memorizing board positions.
It writes out a file of numbers called a network that contains the values for different board features.
It uses this network to judge board positions.
The MCTS sampling is just a search method where it looks at future positions (alpha-beta also does this, but not by sampling).
It is a different approach. But it is still a chess engine.
Three implications that come out of that:
1. There is probably a much more effective way to encode the knowledge of chess, but WE don't KNOW it. So for now, we use a giant sausage grinder, and A LOT of meat.
2. There is NO "SMARTS" to how the information in the net is coded. The neural net doesn't have ANY LOGIC to let it generalize like that, so it needs a huge number of neurons (and games) to figure it out. Part of that is intentional, though. A neural net will figure out on its own that if it can get a head start for its pawn, the enemy king will never stop it from promoting. 100/300/300/500/900 will never be able to teach that.
3. Because it is a general solution, you need A LOT of DIFFERENT EXAMPLES to fine-tune the solution. 200,000 positions might be enough to tune a 64x64 piece-square table in a traditional engine. Tuning a 64x64x64x64 piece-square table would be a whole different can of worms.
Point #2: a thought came to my mind after all this discussion of card speeds, GPUs, TPUs, CPUs, GeForce cards, overclocking, and super systems to "maximize LC0". Here's the thought: "If someone has an old VW Beetle and then puts a Ferrari engine, suspension, and transmission in it, plus special wheels and tires and a nitro system for performance, is it still a VW, or is it now really a Ferrari? Where's the cutoff?" Thx AR
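The "64x64 piece-square table" tuning mentioned in point 3 of the quoted explanation can be sketched in a few lines. This is a toy illustration, not code from any real engine: the board encoding, bonus values, and function names are all made up for the example.

```python
# Toy "traditional" evaluation: fixed material values (100/300/300/500/900)
# plus a piece-square table bonus. A real engine tunes thousands of such
# numbers from game data; here the table is just a flat central-square bonus.

MATERIAL = {"P": 100, "N": 300, "B": 300, "R": 500, "Q": 900, "K": 0}

CENTER = {27, 28, 35, 36}  # d4, e4, d5, e5 on a 0..63 board


def pst_bonus(square: int) -> int:
    """Piece-square bonus: +10 for a central square, 0 elsewhere."""
    return 10 if square in CENTER else 0


def evaluate(pieces) -> int:
    """pieces: list of (piece_letter, square, is_white).
    Returns a score in centipawns; positive is good for White."""
    score = 0
    for piece, square, is_white in pieces:
        value = MATERIAL[piece] + pst_bonus(square)
        score += value if is_white else -value
    return score


# White knight on d4 (centralized) vs. black knight on a1 (cornered):
print(evaluate([("N", 27, True), ("N", 0, False)]))  # 310 - 300 = 10
```

The point of the quoted comparison: even this tiny table already has 64 tunable entries per piece; a table conditioned on pairs or quadruples of squares (64x64, 64x64x64x64) blows up combinatorially, which is roughly the regime a neural net operates in.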
-
- Posts: 558
- Joined: Sat Mar 25, 2006 8:27 pm
Re: Is AlphaZero-LC0 (Leela0) A chess Engine?
Its generality also makes it capable of discovering connections that human programmers would never stumble on in a million years. Imagine that human programmers start with the piece values 100/300/300/500/900. Then eventually piece-square tables are discovered. Then different piece-square tables start to be used in endgames. Then certain specific material combinations are found to be weaker than others.
All of those "discoveries" are important to help the evaluation be more accurate, but each is only a tiny sliver of the whole picture, and it gets harder and harder for programmers to find new slivers of importance as engines get more refined. The neural nets take a whole different tack, by using a massive network to try to approximate the whole "truth of chess", within the capabilities of its framework. Its understanding of the truth is poor to start, but every game that is added helps it get a more accurate gestalt.
And the brilliant part comes in because there are basically no missing "slivers" of knowledge that it lacks, only an incomplete (and improving) understanding of all slivers as a whole.
-
- Posts: 12541
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Is AlphaZero-LC0 (Leela0) A chess Engine?
It really is rather amazing what LC0 and Alpha0 manage to do.
Consider the decades that people have spent refining alpha-beta chess search, and the enormous effort spent figuring out how to evaluate correctly.
In a few short years, the GPU approach has caught up.
That's kind of astonishing.
On the other hand, if you look at the pure millions of instructions per second produced by the GPU cards compared even to the strongest multi-core CPUs, then it is a terrible return (Elo per instruction executed).
The RTX 2080 does 20.14 TFLOPS in 16-bit FP mode.
Two of those produce 40 TFLOPS {trillion floating-point operations per second}.
The Geekbench base processor is an Intel Core i5-2520M @ 2.50 GHz:
13.80 GFlops = 2500 Geekbench points.
Here is the record score:
Nov 08, 2018, Dell Inc. PowerEdge R840, Intel Xeon Platinum 8180M 3800 MHz (112 cores), Linux 64-bit, mattl, 4700, 155050
62.02× the base processor score gives 855.876 GFlops {billion floating-point operations per second}.
So two RTX 2080 cards are 46 times faster than the fastest machine ever benched on Geekbench in terms of Flops from a CPU.
Two RTX cards help LC0 to be just stronger than Stockfish. Not at all remarkable, considering the pure compute power thrown at it.
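The back-of-the-envelope arithmetic above can be reproduced directly from the figures quoted in the post (the FLOPS numbers are the post's, not independently verified here):

```python
# Reproducing the Flops comparison from the post above.
rtx_2080_fp16_tflops = 20.14        # one card, FP16, figure quoted in the post
two_cards_tflops = 2 * rtx_2080_fp16_tflops   # the post rounds this to 40

base_gflops = 13.80                 # Geekbench base: Core i5-2520M @ 2.50 GHz
base_points = 2500
record_points = 155050              # Xeon Platinum 8180M record, per the post

ratio = record_points / base_points             # 62.02
record_gflops = ratio * base_gflops             # ~855.876 GFlops

# GPU pair vs. record CPU, in raw Flops (TFLOPS -> GFLOPS conversion):
gpu_vs_cpu = (two_cards_tflops * 1000) / record_gflops
print(round(ratio, 2), round(record_gflops, 3), round(gpu_vs_cpu, 1))
```

With the unrounded 40.28 TFLOPS the ratio comes out slightly above the post's "46 times"; the post's figure follows from rounding the GPU pair down to 40 TFLOPS first.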
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
- Posts: 1242
- Joined: Sat Jul 05, 2014 7:54 am
- Location: Southwest USA
Re: Is AlphaZero-LC0 (Leela0) A chess Engine?
The flaw in that logic is that we have all been told that there are "infinite numbers of possibilities in the game of chess." How large does that NN (neural net) have to be to encapsulate that infinite number of move combinations and permutations? And how fast the machine? ...Thx AR
-
- Posts: 4319
- Joined: Tue Apr 03, 2012 4:28 pm
Re: Is AlphaZero-LC0 (Leela0) A chess Engine?
Do you know the mean error between the actual output of the value head and the target train value?
-
- Posts: 565
- Joined: Thu Nov 13, 2014 12:03 pm
Re: Is AlphaZero-LC0 (Leela0) A chess Engine?
Well, duh, it's a Ferrari in a Beetle shell.
supersharp77 wrote: ↑Wed Jun 05, 2019 8:24 pm "If someone has an old VW Beetle and then puts a Ferrari engine, suspension, and transmission in it... is it still a VW, or is it now really a Ferrari? Where's the cutoff?" Thx AR
-
- Posts: 558
- Joined: Sat Mar 25, 2006 8:27 pm
Re: Is AlphaZero-LC0 (Leela0) A chess Engine?
That would be dependent on the exact net you choose to examine, as well as the specific data you are comparing against. Also, not really relevant to his question or my response.
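For anyone who wants to measure it on a specific net and dataset, the "mean error" in question is just the average over positions of the gap between the value head's output and the training target. A minimal sketch, using made-up numbers rather than real Leela data:

```python
# Hypothetical illustration: mean absolute error between a value head's
# outputs and the training targets (self-play game results in [-1, 1]).
# These numbers are invented; real values depend on the exact net and
# dataset, as noted in the reply above.

predictions = [0.31, -0.72, 0.05, 0.88, -0.10]   # value-head outputs
targets     = [1.0,  -1.0,  0.0,  1.0,   0.0]    # game results (win/draw/loss)

mae = sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)
print(round(mae, 3))
```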
-
- Posts: 1470
- Joined: Mon Apr 23, 2018 7:54 am
Re: Is AlphaZero-LC0 (Leela0) A chess Engine?
Dann Corbit wrote: ↑Wed Jun 05, 2019 9:33 pm
Consider the decades that people have spent refining alpha-beta chess search, and the enormous effort spent figuring out how to evaluate correctly.
In a few short years, the GPU approach has caught up.
That's kind of astonishing.
Neural networks are many decades old, in fact roughly the same age as alpha-beta, so we should not think this has happened rapidly.
Dann Corbit wrote: ↑Wed Jun 05, 2019 9:33 pm
On the other hand, if you look at the pure millions of instructions per second produced by the GPU cards compared even to the strongest multi-core CPUs, then it is a terrible return (Elo per instruction executed).
So two RTX 2080 cards are 46 times faster than the fastest machine ever benched on Geekbench in terms of Flops from a CPU.
Two RTX cards help LC0 to be just stronger than Stockfish. Not at all remarkable, considering the pure compute power thrown at it.
In this sense, NN engines rely completely on brute-force computing power.
Last edited by jp on Thu Jun 06, 2019 4:42 pm, edited 4 times in total.