Alphazero news

noobpwnftw · Post by **noobpwnftw** » Mon Dec 10, 2018 7:39 pm

matthewlai wrote: ↑Mon Dec 10, 2018 7:06 pm I actually have a lot of interesting science that still needs to be done

Do something useful for this world.

I have some questions. Does the number of 800 playouts per move during training have any correlation with the input model, the network size, or your training target?
Essentially, the training is trying to compress the information of an 800-node search into 1 node. With that in mind, how would you know whether the network can actually take this much information, or whether it can take much more than that and you are slowing down training by not feeding it enough? What happens if you train a 10b network or an 80b network?

Talking about science, it seems to me that the Leela folks are actually doing a lot more research than what appears in your paper. Most of your numbers and formulas are given arbitrarily and explained by "we tried others and they failed" or something similar. But aren't you guys supposed to be working on the principles and doing fundamental research on NNs in general, with the Leela folks being the ones bragging about their victory over Stockfish?

It's so ironic that your roles seem to be swapped in terms of doing useful things for this world.

jp · Post by **jp** » Mon Dec 10, 2018 7:53 pm

noobpwnftw wrote: ↑Mon Dec 10, 2018 7:39 pm
matthewlai wrote: ↑Mon Dec 10, 2018 7:06 pm
Talking about science, it seems to me that the Leela folks are actually doing a lot more research than what appears in your paper. Most of your numbers and formulas are given arbitrarily and explained by "we tried others and they failed" or something similar. But aren't you guys supposed to be working on the principles and doing fundamental research on NNs in general, with the Leela folks being the ones bragging about their victory over Stockfish?

Does the final paper say that sort of thing? Where?

hgm · Post by **hgm** » Mon Dec 10, 2018 7:55 pm

The NN does not try to remember the entire search. It just tries to extract patterns that correlate with good positions, or with moves that deserve to be searched. So the question is not how many nodes there are in the searches, but how many patterns there are in chess that give a clue to how well you are doing and what you should try next. This determines the size of the NN. A larger set of training samples just enhances the probability that relevant patterns indeed occur.
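A minimal sketch of what the network is actually fitted to, assuming the usual AlphaZero-style setup in which the policy target is the normalized root visit counts and the value target is the game result. The function name `training_target` and the example visit counts are hypothetical; the point is only that the target is a small vector, not the search tree itself.

```python
def training_target(root_visits, game_result, temperature=1.0):
    """Turn the root visit counts of one search into a (policy, value) target.

    root_visits: dict mapping move -> visit count at the root (made-up data here).
    game_result: +1, 0 or -1 from the point of view of the side to move.
    temperature: exponent 1/temperature is applied to the counts (assumed > 0).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    weights = {move: count ** (1.0 / temperature) for move, count in root_visits.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("search produced no visits")
    # Only the normalized visit distribution survives; the tree below the root is discarded.
    policy = {move: weight / total for move, weight in weights.items()}
    return policy, float(game_result)


# 800 visits spread over three candidate moves (illustrative numbers only).
visits = {"e2e4": 500, "d2d4": 250, "g1f3": 50}
policy, value = training_target(visits, game_result=1)
print(policy)
print(value)
```

Whatever the node count, the sample handed to the trainer has the same shape: one probability per legal move plus one scalar. A larger search changes how sharp or accurate that distribution is, not how much of the tree the network is asked to store.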

Dann Corbit · Post by **Dann Corbit** » Mon Dec 10, 2018 8:12 pm

nabildanial wrote: ↑Sat Dec 08, 2018 12:39 pm
Alexander Schmidt wrote: ↑Sat Dec 08, 2018 12:29 pm Maybe people will go to jail because an AI thinks he will someday do a crime. One day autonomous robots will decide which person to kill, on a battlefield or to prevent a possible crime. Maybe one day an AI will press the red button.
We have the so-called "Ethics of artificial intelligence" to prevent those things from happening.

Unfortunately, the military is already building machines designed to make their own decisions to kill people.

I guess they never read Asimov's laws of robotics.

noobpwnftw · Post by **noobpwnftw** » Mon Dec 10, 2018 8:12 pm

I do not apply my human imagination to explain the training process. I see only that it tries to fit the output of an 800-node search with 1 node, and does so over many iterations.

If you ask how that correlates to chess or any other game, then there is an easy way out: that is domain knowledge.

Now let's make it simple. Say we have a 10x10 Game of Life board: what is the mathematics behind designing an NN and choosing the optimal or near-optimal training configuration for it, so that it can estimate the outcome of any board state after 10K iterations with good accuracy?

I think this is the science and mathematics we need, not Elo estimates vs Stockfish.
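To make the Game of Life thought experiment concrete, here is a small sketch of how labelled samples for such a network could be generated. It rests on assumptions the post does not state: the board wraps around at the edges (a torus), and "outcome" is taken to mean the number of live cells after the given number of iterations. The names `step`, `outcome` and `make_sample` are hypothetical.

```python
import random

SIZE = 10  # 10x10 board, as in the post


def step(board):
    """Advance one generation on a wrap-around (toroidal) board; an assumption."""
    new = [[0] * SIZE for _ in range(SIZE)]
    for r in range(SIZE):
        for c in range(SIZE):
            neighbours = sum(
                board[(r + dr) % SIZE][(c + dc) % SIZE]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            if board[r][c]:
                new[r][c] = 1 if neighbours in (2, 3) else 0  # survival
            else:
                new[r][c] = 1 if neighbours == 3 else 0  # birth
    return new


def outcome(board, iterations):
    """Return the board reached after the given number of generations."""
    for _ in range(iterations):
        board = step(board)
    return board


def make_sample(rng, iterations=10_000, density=0.3):
    """Return (start_board, live_cell_count_after_iterations) for a random start."""
    start = [[1 if rng.random() < density else 0 for _ in range(SIZE)]
             for _ in range(SIZE)]
    final = outcome(start, iterations)
    return start, sum(map(sum, final))


# A short run keeps the example quick; the post's 10K iterations work the same way.
start, live_cells = make_sample(random.Random(0), iterations=50)
print(live_cells)
```

This only produces the data. The question raised in the post, how to size the network and pick the training configuration from first principles, is exactly the part this sketch leaves open.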

duncan · Post by **duncan** » Mon Dec 10, 2018 8:22 pm

Dann Corbit wrote: ↑Mon Dec 10, 2018 8:12 pm
nabildanial wrote: ↑Sat Dec 08, 2018 12:39 pm
Alexander Schmidt wrote: ↑Sat Dec 08, 2018 12:29 pm Maybe people will go to jail because an AI thinks he will someday do a crime. One day autonomous robots will decide which person to kill, on a battlefield or to prevent a possible crime. Maybe one day an AI will press the red button.
We have the so-called "Ethics of artificial intelligence" to prevent those things from happening.
Unfortunately, the military is already building machines designed to make their own decisions to kill people.

I guess they never read Asimov's laws of robotics.

Rule no. 1 has an inbuilt contradiction and so is open to interpretation.
https://www.auburn.edu/~vestmon/robotics.html

Daniel Shawul · Post by **Daniel Shawul** » Mon Dec 10, 2018 8:37 pm

noobpwnftw wrote: ↑Mon Dec 10, 2018 8:12 pm I do not apply my human imagination to explain the training process. I see only that it tries to fit the output of an 800-node search with 1 node, and does so over many iterations.

If you ask how that correlates to chess or any other game, then there is an easy way out: that is domain knowledge.

Now let's make it simple. Say we have a 10x10 Game of Life board: what is the mathematics behind designing an NN and choosing the optimal or near-optimal training configuration for it, so that it can estimate the outcome of any board state after 10K iterations with good accuracy?

I think this is the science and mathematics we need, not Elo estimates vs Stockfish.

This is like tuning an evaluation function with qsearch(), search(d=1), etc... So 800 nodes may be the equivalent of tuning with qsearch i.
In that regard you don't want to tune the NN with 800000-node searches (never mind the practicality of it) because you would be trying to memorize a search with tactics in an NN, which it isn't good at. Tuning a hand-written evaluation function with an alpha-beta search of depth=20 is probably a bad idea too.

noobpwnftw · Post by **noobpwnftw** » Mon Dec 10, 2018 9:08 pm

Daniel Shawul wrote: ↑Mon Dec 10, 2018 8:37 pm
noobpwnftw wrote: ↑Mon Dec 10, 2018 8:12 pm I do not apply my human imagination to explain the training process. I see only that it tries to fit the output of an 800-node search with 1 node, and does so over many iterations.

If you ask how that correlates to chess or any other game, then there is an easy way out: that is domain knowledge.

Now let's make it simple. Say we have a 10x10 Game of Life board: what is the mathematics behind designing an NN and choosing the optimal or near-optimal training configuration for it, so that it can estimate the outcome of any board state after 10K iterations with good accuracy?

I think this is the science and mathematics we need, not Elo estimates vs Stockfish.
This is like tuning an evaluation function with qsearch(), search(d=1), etc... So 800 nodes may be the equivalent of tuning with qsearch i.
In that regard you don't want to tune the NN with 800000-node searches (never mind the practicality of it) because you would be trying to memorize a search with tactics in an NN, which it isn't good at. Tuning a hand-written evaluation function with an alpha-beta search of depth=20 is probably a bad idea too.

Given that per-node evaluation is much more expensive here, shouldn't we care about choosing the number of nodes wisely? It is not even known whether using 2x more samples from 400-node searches (thus keeping the total resources needed about the same) would make any difference at all. It seems that people with 5000 TPUs around have nothing much to say about it; maybe someone just came up with that value during a smoke break.
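The trade-off in question can be written down as simple arithmetic. The sketch below assumes one network evaluation per visit and a fixed average game length; the game count and moves per game are hypothetical placeholders, not figures from the paper.

```python
def network_evals(games, moves_per_game, visits_per_move):
    """Approximate NN evaluations spent on self-play (one evaluation per visit assumed)."""
    return games * moves_per_game * visits_per_move


def training_positions(games, moves_per_game):
    """Number of (position, target) samples produced, one per move played."""
    return games * moves_per_game


GAMES = 1_000_000  # hypothetical number of self-play games
MOVES = 100        # hypothetical average number of moves per game

budget_800 = network_evals(GAMES, MOVES, 800)
budget_400 = network_evals(2 * GAMES, MOVES, 400)

print(budget_800 == budget_400)  # same evaluation budget either way
print(training_positions(GAMES, MOVES), training_positions(2 * GAMES, MOVES))
```

The budget is identical, but the 400-visit run yields twice as many training positions with noisier targets. Which side of that quantity-versus-quality trade wins is an empirical question that this arithmetic cannot settle.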

Albert Silver · Post by **Albert Silver** » Mon Dec 10, 2018 9:41 pm

noobpwnftw wrote: ↑Mon Dec 10, 2018 9:08 pm
Daniel Shawul wrote: ↑Mon Dec 10, 2018 8:37 pm
noobpwnftw wrote: ↑Mon Dec 10, 2018 8:12 pm I do not apply my human imagination to explain the training process. I see only that it tries to fit the output of an 800-node search with 1 node, and does so over many iterations.

If you ask how that correlates to chess or any other game, then there is an easy way out: that is domain knowledge.

Now let's make it simple. Say we have a 10x10 Game of Life board: what is the mathematics behind designing an NN and choosing the optimal or near-optimal training configuration for it, so that it can estimate the outcome of any board state after 10K iterations with good accuracy?

I think this is the science and mathematics we need, not Elo estimates vs Stockfish.
This is like tuning an evaluation function with qsearch(), search(d=1), etc... So 800 nodes may be the equivalent of tuning with qsearch i.
In that regard you don't want to tune the NN with 800000-node searches (never mind the practicality of it) because you would be trying to memorize a search with tactics in an NN, which it isn't good at. Tuning a hand-written evaluation function with an alpha-beta search of depth=20 is probably a bad idea too.
Given that per-node evaluation is much more expensive here, shouldn't we care about choosing the number of nodes wisely? It is not even known whether using 2x more samples from 400-node searches (thus keeping the total resources needed about the same) would make any difference at all. It seems that people with 5000 TPUs around have nothing much to say about it; maybe someone just came up with that value during a smoke break.

Actually, I too wonder whether visit counts other than 800 were tested in any depth. In his live talk during Game 7 of the World Championship, Demis Hassabis openly said that there was no certainty that 800 nodes was indeed the best choice, and that another value might be better. He brushed it off as a matter of quantity versus quality and the time needed to generate the games, but this didn't seem like a serious reason. After all, if DeepMind can generate 44 million games using 800 visits over the course of a few hours, testing 1600 visits once or twice, especially after they had fine-tuned the entire NN training process, couldn't be much of a burden on their resources.

matthewlai · Post by **matthewlai** » Tue Dec 11, 2018 1:11 am

noobpwnftw wrote: ↑Mon Dec 10, 2018 7:39 pm
matthewlai wrote: ↑Mon Dec 10, 2018 7:06 pm I actually have a lot of interesting science that still needs to be done

Do something useful for this world.
I have some questions. Does the number of 800 playouts per move during training have any correlation with the input model, the network size, or your training target?
Essentially, the training is trying to compress the information of an 800-node search into 1 node. With that in mind, how would you know whether the network can actually take this much information, or whether it can take much more than that and you are slowing down training by not feeding it enough? What happens if you train a 10b network or an 80b network?

Talking about science, it seems to me that the Leela folks are actually doing a lot more research than what appears in your paper. Most of your numbers and formulas are given arbitrarily and explained by "we tried others and they failed" or something similar. But aren't you guys supposed to be working on the principles and doing fundamental research on NNs in general, with the Leela folks being the ones bragging about their victory over Stockfish?

It's so ironic that your roles seem to be swapped in terms of doing useful things for this world.

I almost thought you had a genuine question and then I finished reading the post. Have fun!
