Alphazero news

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

noobpwnftw
Posts: 560
Joined: Sun Nov 08, 2015 11:10 pm

Re: Alphazero news

Post by noobpwnftw »

matthewlai wrote: Mon Dec 10, 2018 7:06 pm I actually have a lot of interesting science that still needs to be done

Do something useful for this world.
I have some questions. Does the number of 800 playouts per move during training have any correlation with the input model, network size or your training target?
Essentially the training is trying to compress the information of an 800-node search into 1 node. With that in mind, how would you know whether the network can actually take this much information, or whether it can take much more than that and you are slowing down the training by not feeding it enough? What happens if you train a 10b network or an 80b network?

Talking about science, it seems to me that the Leela folks are actually doing a lot more research than what appears in your paper. Most of your numbers and formulas are given arbitrarily and explained by "we tried others and they failed" or something, but aren't you guys supposed to work on the principles and do fundamental research on NNs in general, with the Leela folks bragging about their victory over Stockfish?

It's so ironic that your roles seem to be swapped in terms of doing useful things for this world. :D
jp
Posts: 1470
Joined: Mon Apr 23, 2018 7:54 am

Re: Alphazero news

Post by jp »

noobpwnftw wrote: Mon Dec 10, 2018 7:39 pm
matthewlai wrote: Mon Dec 10, 2018 7:06 pm
Talking about science, it seems to me that the Leela folks are actually doing a lot more research than what appears in your paper. Most of your numbers and formulas are given arbitrarily and explained by "we tried others and they failed" or something, but aren't you guys supposed to work on the principles and do fundamental research on NNs in general, with the Leela folks bragging about their victory over Stockfish?
Does the final paper say that sort of thing? Where?
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Alphazero news

Post by hgm »

The NN does not try to remember the entire search. It just tries to extract patterns that correlate with good positions, or with moves that deserve to be searched. So the question is not how many nodes there are in the searches, but how many patterns there are in chess that give a clue to how well you are doing and what you should try next. This determines the size of the NN. A larger set of training samples just enhances the probability that relevant patterns indeed occur.
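To make the "800 nodes into 1" point concrete: in AlphaZero-style training the policy head is fit to the normalized visit counts at the root of each search, not to the search tree itself. The following is only a minimal sketch of that normalization; the function name, the `temperature` parameter and the example visit counts are illustrative assumptions, not values from the paper.

```python
import numpy as np

def policy_target_from_visits(visit_counts, temperature=1.0):
    """Turn per-move root visit counts from one search into a probability vector.

    Hypothetical helper: each move's share is proportional to N ** (1 / temperature).
    """
    counts = np.asarray(visit_counts, dtype=np.float64)
    if counts.sum() <= 0:
        raise ValueError("search produced no visits")
    scaled = counts ** (1.0 / temperature)
    return scaled / scaled.sum()

# Hypothetical root position with four legal moves and 800 visits in total.
visits = [500, 200, 80, 20]
target = policy_target_from_visits(visits)
print(target)
```

Whatever the search size, the sample handed to the network is one distribution over moves (plus the game result for the value head), which is why the number of useful patterns rather than the number of nodes is the relevant quantity.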
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Alphazero news

Post by Dann Corbit »

nabildanial wrote: Sat Dec 08, 2018 12:39 pm
Alexander Schmidt wrote: Sat Dec 08, 2018 12:29 pm Maybe people will go to jail because an AI thinks they will someday commit a crime. One day autonomous robots will decide which person to kill, on a battlefield or to prevent a possible crime. Maybe one day an AI will press the red button.
We have the so-called "Ethics of artificial intelligence" to prevent those things from happening.
Unfortunately, the military is already building machines designed to make their own decisions to kill people.

I guess they never read Asimov's laws of robotics.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
noobpwnftw
Posts: 560
Joined: Sun Nov 08, 2015 11:10 pm

Re: Alphazero news

Post by noobpwnftw »

I do not apply my human imagination in order to explain the training process; I see only that it tries to fit the output of an 800-node search with 1 node, and does so over many iterations.

If you ask how that correlates to chess or any other games, then there is an easy way out: they are domain knowledge.

Now let's make it simple. Say we have a 10x10 Game of Life board: what is the mathematics behind designing an NN and choosing the optimal or near-optimal training configuration for it to be able to estimate the outcome of any board state after 10K iterations with good accuracy?

I think this is the science and mathematics we need, not Elo estimates vs Stockfish. :D
Last edited by noobpwnftw on Mon Dec 10, 2018 8:30 pm, edited 3 times in total.
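For anyone who wants to try the Game of Life experiment proposed above, the input/target pairs are cheap to generate. This is only a rough sketch under assumptions the post does not state: the board wraps around at the edges, starting boards are uniformly random, and the names `life_step`, `run_life` and `make_dataset` are made up for illustration. It says nothing about how to size or train the network, which is the actual open question.

```python
import numpy as np

def life_step(board):
    """Advance a Game of Life board by one generation (wrap-around edges assumed)."""
    neighbours = sum(
        np.roll(np.roll(board, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # A cell is alive next generation with 3 neighbours, or with 2 if already alive.
    alive = (neighbours == 3) | ((board == 1) & (neighbours == 2))
    return alive.astype(np.uint8)

def run_life(board, iterations):
    """Apply life_step repeatedly and return the final board."""
    for _ in range(iterations):
        board = life_step(board)
    return board

def make_dataset(num_samples, size=10, iterations=10_000, seed=0):
    """Random start boards paired with their state after `iterations` generations."""
    rng = np.random.default_rng(seed)
    inputs = rng.integers(0, 2, size=(num_samples, size, size), dtype=np.uint8)
    targets = np.stack([run_life(b, iterations) for b in inputs])
    return inputs, targets

# Small demo run; the post's 10K iterations is the default of make_dataset.
inputs, targets = make_dataset(4, iterations=100)
print(inputs.shape, targets.shape)
```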
duncan
Posts: 12038
Joined: Mon Jul 07, 2008 10:50 pm

Re: Alphazero news

Post by duncan »

Dann Corbit wrote: Mon Dec 10, 2018 8:12 pm
nabildanial wrote: Sat Dec 08, 2018 12:39 pm
Alexander Schmidt wrote: Sat Dec 08, 2018 12:29 pm Maybe people will go to jail because an AI thinks they will someday commit a crime. One day autonomous robots will decide which person to kill, on a battlefield or to prevent a possible crime. Maybe one day an AI will press the red button.
We have the so-called "Ethics of artificial intelligence" to prevent those things from happening.
Unfortunately, the military is already building machines designed to make their own decisions to kill people.

I guess they never read Asimov's laws of robotics.
Rule no. 1 has an inbuilt contradiction and so is open to interpretation.
https://www.auburn.edu/~vestmon/robotics.html
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Alphazero news

Post by Daniel Shawul »

noobpwnftw wrote: Mon Dec 10, 2018 8:12 pm I do not apply my human imagination in order to explain the training process; I see only that it tries to fit the output of an 800-node search with 1 node, and does so over many iterations.

If you ask how that correlates to chess or any other games, then there is an easy way out: they are domain knowledge.

Now let's make it simple. Say we have a 10x10 Game of Life board: what is the mathematics behind designing an NN and choosing the optimal or near-optimal training configuration for it to be able to estimate the outcome of any board state after 10K iterations with good accuracy?

I think this is the science and mathematics we need, not Elo estimates vs Stockfish. :D
This is like tuning an evaluation function with qsearch(), search(d=1), etc... So 800 nodes may be the equivalent of tuning with qsearch i. In that regard you don't want to tune the NN with 800000-node searches (never mind the practicality of it) because you would be trying to memorize a search with tactics in an NN, which it isn't good at. Tuning a hand-written evaluation function with an alpha-beta search of depth=20 is probably a bad idea too.
noobpwnftw
Posts: 560
Joined: Sun Nov 08, 2015 11:10 pm

Re: Alphazero news

Post by noobpwnftw »

Daniel Shawul wrote: Mon Dec 10, 2018 8:37 pm
noobpwnftw wrote: Mon Dec 10, 2018 8:12 pm I do not apply my human imagination in order to explain the training process; I see only that it tries to fit the output of an 800-node search with 1 node, and does so over many iterations.

If you ask how that correlates to chess or any other games, then there is an easy way out: they are domain knowledge.

Now let's make it simple. Say we have a 10x10 Game of Life board: what is the mathematics behind designing an NN and choosing the optimal or near-optimal training configuration for it to be able to estimate the outcome of any board state after 10K iterations with good accuracy?

I think this is the science and mathematics we need, not Elo estimates vs Stockfish. :D
This is like tuning an evaluation function with qsearch(), search(d=1), etc... So 800 nodes may be the equivalent of tuning with qsearch i. In that regard you don't want to tune the NN with 800000-node searches (never mind the practicality of it) because you would be trying to memorize a search with tactics in an NN, which it isn't good at. Tuning a hand-written evaluation function with an alpha-beta search of depth=20 is probably a bad idea too.
Given that per-node evaluation is much more expensive here, shouldn't we care about choosing the number of nodes wisely? It is not even known whether using 2x more samples from 400-node searches (thus keeping the total resources needed about the same) would make any difference at all. It seems that people with 5000 TPUs around have nothing much to say about it; maybe someone just came up with that value during a smoke break.
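The trade-off in question is simple arithmetic if one assumes that self-play cost is dominated by network evaluations and that each visit costs one evaluation. A back-of-the-envelope sketch follows; the budget figure and the function name are hypothetical, chosen only to show the scaling.

```python
def positions_for_budget(total_evals, visits_per_move):
    """Number of training positions a fixed evaluation budget can produce."""
    return total_evals // visits_per_move

# Hypothetical budget: enough evaluations for one million positions at 800 visits.
budget = 800 * 1_000_000

for visits in (400, 800, 1600):
    print(visits, positions_for_budget(budget, visits))
```

Under that assumption, halving the visit count doubles the number of positions for the same compute. Whether twice as many noisier samples train a better network is the empirical question, and this arithmetic cannot answer it.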
Albert Silver
Posts: 3019
Joined: Wed Mar 08, 2006 9:57 pm
Location: Rio de Janeiro, Brazil

Re: Alphazero news

Post by Albert Silver »

noobpwnftw wrote: Mon Dec 10, 2018 9:08 pm
Daniel Shawul wrote: Mon Dec 10, 2018 8:37 pm
noobpwnftw wrote: Mon Dec 10, 2018 8:12 pm I do not apply my human imagination in order to explain the training process; I see only that it tries to fit the output of an 800-node search with 1 node, and does so over many iterations.

If you ask how that correlates to chess or any other games, then there is an easy way out: they are domain knowledge.

Now let's make it simple. Say we have a 10x10 Game of Life board: what is the mathematics behind designing an NN and choosing the optimal or near-optimal training configuration for it to be able to estimate the outcome of any board state after 10K iterations with good accuracy?

I think this is the science and mathematics we need, not Elo estimates vs Stockfish. :D
This is like tuning an evaluation function with qsearch(), search(d=1), etc... So 800 nodes may be the equivalent of tuning with qsearch i. In that regard you don't want to tune the NN with 800000-node searches (never mind the practicality of it) because you would be trying to memorize a search with tactics in an NN, which it isn't good at. Tuning a hand-written evaluation function with an alpha-beta search of depth=20 is probably a bad idea too.
Given that per-node evaluation is much more expensive here, shouldn't we care about choosing the number of nodes wisely? It is not even known whether using 2x more samples from 400-node searches (thus keeping the total resources needed about the same) would make any difference at all. It seems that people with 5000 TPUs around have nothing much to say about it; maybe someone just came up with that value during a smoke break.
Actually, I too wonder whether visit counts other than 800 were tested in any depth. In his live talk during Game 7 of the World Championship, Demis Hassabis openly said that there was no certainty that 800 nodes was indeed the best choice, and that another value might be better. He explained it away as a matter of quantity versus quality and the time needed to generate the games, but this didn't seem like a serious reason. After all, if DeepMind can generate 44 million games using 800 visits over the course of a few hours, testing 1600 visits once or twice, especially after they had fine-tuned the entire NN training process, couldn't be much of a burden on their resources.
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Alphazero news

Post by matthewlai »

noobpwnftw wrote: Mon Dec 10, 2018 7:39 pm
matthewlai wrote: Mon Dec 10, 2018 7:06 pm I actually have a lot of interesting science that still needs to be done

Do something useful for this world.
I have some questions. Does the number of 800 playouts per move during training have any correlation with the input model, network size or your training target?
Essentially the training is trying to compress the information of an 800-node search into 1 node. With that in mind, how would you know whether the network can actually take this much information, or whether it can take much more than that and you are slowing down the training by not feeding it enough? What happens if you train a 10b network or an 80b network?

Talking about science, it seems to me that the Leela folks are actually doing a lot more research than what appears in your paper. Most of your numbers and formulas are given arbitrarily and explained by "we tried others and they failed" or something, but aren't you guys supposed to work on the principles and do fundamental research on NNs in general, with the Leela folks bragging about their victory over Stockfish?

It's so ironic that your roles seem to be swapped in terms of doing useful things for this world. :D
I almost thought you had a genuine question and then I finished reading the post. Have fun!
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.