In the wake of the news about AlphaZero, we are indeed not particularly interested in Romi's learning. Nor are we interested in Crafty's learning or in any of the many other position-based learning implementations that existed before AlphaZero. All those approaches simply have nothing to do with AlphaZero.
It is AlphaZero where our interest lies when we are reading and posting in AlphaZero-related threads. That should not come as a surprise.
As I tried to explain to you earlier, the only reason why people are posing critical questions to you is your amazing insistence that there is some sort of strong relationship between AlphaZero and Romi's learning. Clearly, there is not.
Have you seen anyone else here who is hijacking the AlphaZero threads by continuously pushing their own program?
Google's AlphaGo team has been working on chess
Moderators: hgm, Rebel, chrisw
-
- Posts: 5563
- Joined: Tue Feb 28, 2012 11:56 pm
-
- Posts: 3196
- Joined: Fri May 26, 2006 3:00 am
- Location: WY, USA
- Full name: Michael Sherwin
Re: Google's AlphaGo team has been working on chess
syzygy wrote:
In the wake of the news about AlphaZero, we are indeed not particularly interested in Romi's learning. [...]

I know quite a bit about RL; that comes from my experience with RomiChess. All I'm pushing is the opinion that A0's strength is due to RL. It does not matter whether it is NN+MCTS+RL or A/B+RL: both approaches benefit greatly from RL. The main point I'm trying to get people to understand is that SF+RL, fully trained (however many training games that would take), would have crushed A0 at this point in A0's development. All I have is the knowledge gained with RC, so that is why I talk about RC so much. But of course my experience with RC matters zero to you and many others. RL is the miracle in A0, more than the playing algorithm, and if a top engine like SF, K, H, etc. were to adopt RL, then what I say will be proven. So attack me and RC all you want; it does not address what I'm trying to get people to understand.
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
-
- Posts: 3196
- Joined: Fri May 26, 2006 3:00 am
- Location: WY, USA
- Full name: Michael Sherwin
Re: Google's AlphaGo team has been working on chess
RL can be added to a UCI engine in the following manner: the engine keeps its own record of the game. Then, at the end of the game, the engine can determine the outcome for itself and update its RL database.
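The bookkeeping described above can be sketched in a few lines. This is an illustrative outline only, not RomiChess's actual scheme: the class name, the JSON file format, and the flat ±1 reward per move are all assumptions, and a real engine would key entries by Zobrist hash rather than FEN strings.

```python
import json
import os

class LearningStore:
    """Tiny persistent win/loss statistics table keyed by (position, move).

    A minimal sketch of the RL bookkeeping a UCI engine could keep; the
    storage format and reward scheme here are invented for illustration.
    """

    def __init__(self, path="rl_stats.json"):
        self.path = path
        self.stats = {}
        if os.path.exists(path):
            with open(path) as f:
                self.stats = json.load(f)

    def update_game(self, line, result):
        # line:   list of (fen, move) pairs played by our side this game
        # result: +1 win, 0 draw, -1 loss, judged by the engine itself
        for fen, move in line:
            key = fen + " " + move
            self.stats[key] = self.stats.get(key, 0) + result
        with open(self.path, "w") as f:
            json.dump(self.stats, f)

    def bonus(self, fen, move):
        # Score adjustment the search could add to this move at the root,
        # steering toward lines that have won and away from lines that lost.
        return self.stats.get(fen + " " + move, 0)
```

The engine would call `update_game` when it detects mate, stalemate, or a `ucinewgame` after an unfinished game, and consult `bonus` during root move ordering or scoring.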
-
- Posts: 405
- Joined: Sat Jul 02, 2011 10:49 pm
Re: Google's AlphaGo team has been working on chess
Michael Sherwin wrote:
" ... A0's strength is due to RL. It does not matter whether it is NN+MCTS+RL or A/B+RL."
from the paper ...
https://arxiv.org/pdf/1712.01815.pdf
this is what alphazero uses:
non-linear function approximation
deep neural network (NN)
reinforcement learning algorithm (RL)
MCTS (averages over approximation errors)
gradient descent (parameter adjustment)
mean-squared error
cross-entropy
weight regularisation
also from the paper...
prior reinforcement learning in computer chess
NeuroChess
neural network (evaluated positions)
temporal-difference (learning)
KnightCap
neural network (evaluated positions)
temporal-difference (leaf)
Meep
linear evaluation function (evaluated positions)
temporal-difference (TreeStrap)
Giraffe
neural network (evaluated positions)
temporal-difference (leaf) [self-play]
DeepChess
neural network (trained to perform evaluation)
Hex
dual policy and value networks
and finally from the paper...
chess programs using traditional MCTS were much weaker than alpha-beta search programs...
while alpha-beta programs based on neural networks have previously been unable to compete with faster, handcrafted evaluation functions...
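For reference, the paper combines several of the items listed above into a single training objective: the network parameters θ are adjusted by gradient descent on a loss mixing the mean-squared error between the predicted value v and the game outcome z, the cross-entropy between the MCTS visit distribution π and the policy output p, and an L2 weight-regularisation term with constant c:

$$ l = (z - v)^2 - \boldsymbol{\pi}^{\top} \log \mathbf{p} + c \, \lVert \theta \rVert^{2} $$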
-
- Posts: 3196
- Joined: Fri May 26, 2006 3:00 am
- Location: WY, USA
- Full name: Michael Sherwin
Re: Google's AlphaGo team has been working on chess
I can't tell if this is pro, con, or just information. Maybe a match between Romi and one of the programs mentioned is in order?
-
- Posts: 27793
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Google's AlphaGo team has been working on chess
Michael Sherwin wrote:
All I'm pushing is the opinion that A0's strength is due to RL.

And there is about as much need for that as for pushing the claim that the sky is blue, water is wet, or checkmate is a win in chess. It is what the AlphaZero paper claims in the first place, and what the whole experiment was set up to show: that chess could be self-taught by reinforcement learning, without any knowledge other than the rules being given to it by any other means. Everything it did better than a random mover was due to RL.
-
- Posts: 6991
- Joined: Thu Aug 18, 2011 12:04 pm
Re: Google's AlphaGo team has been working on chess
mcostalba wrote:
I have read the paper: the result is impressive! Honestly, I didn't think it was possible, because my understanding was that chess is more "computer friendly" than Go... I was wrong. It is true that SF is not meant to play at its best without a book, and especially that one fixed minute per move cuts out the whole time management; it would be more natural to play under tournament conditions. Nevertheless, I think these are secondary aspects: what has been accomplished is huge.

Michael Sherwin wrote:
Marco, A0 did not win a match against SF; A0 with RL won a match against SF. Or, said another way, A0 won a match against SF because SF does not have RL. Or, thought of a different way, a group of programmers identified a deficiency that exists in a competitive field and took advantage of it by eliminating that deficiency in their entry. Or one can turn that thought around and say that RL does not belong in competitive chess because it covers up the underlying strength and correctness of the algorithm; in that case the A0 vs. SF match is a non sequitur and meaningless. Then there is the fan who wants RL but is ignored, because the fan is not important and what the fan thinks or wants is not meaningful.

But what you can't say is that "what has been accomplished is huge" in terms of a chess-playing algorithm. You might say that what A0 has demonstrated in Go, chess, and shogi is a huge demonstration that an NN with RL may conquer humanity some day; I won't argue against that. Concerning chess, though, the A/B algorithm is not inferior to NN+MC. It is inferior to NN+MC+RL. A/B+RL is far superior to NN+MC+RL.

And I said all that without mentioning RomiChess even one time!

CheckersGuy wrote:
That alpha-beta search + reinforcement learning is indeed better than MCTS+NN+reinforcement learning is still something that has to be proven. Assertions and a bulk of text don't help; only a match between engines using those two different algorithms can be considered a definitive answer. Everything else is based on certain assumptions.

To simplify things, I suppose it's not hard to imagine that with learning (from a given position) it's quite simple to get a 100% score against your own engine very soon.
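The point about quickly reaching a 100% score against your own deterministic engine from a fixed position is easy to demonstrate with a toy. In the sketch below (the game, the reward scheme, and all names are invented for illustration, not RomiChess's actual mechanism), a learner plays Nim against a fixed opponent that always takes one stone; moves on losing lines are penalised, so the learner steers away from known losses until it stumbles on a winning line, which it then repeats forever.

```python
def legal(pos):
    # In this Nim variant you may take 1-3 stones; taking the last one wins.
    return [m for m in (1, 2, 3) if m <= pos]

def play_game(Q):
    """One game of Nim(10): the learner moves first, the opponent always takes 1."""
    pos, line = 10, []
    while True:
        # Learner: greedy on learned move values; smallest move breaks ties.
        m = max(legal(pos), key=lambda mv: (Q.get((pos, mv), 0), -mv))
        line.append((pos, m))
        pos -= m
        if pos == 0:
            return True, line          # learner took the last stone: win
        pos -= 1                       # deterministic opponent always takes 1
        if pos == 0:
            return False, line         # opponent took the last stone: loss

def train(n_games=20):
    Q, results = {}, []
    for _ in range(n_games):
        won, line = play_game(Q)
        for key in line:               # reward a winning line, punish a losing one
            Q[key] = Q.get(key, 0) + (1 if won else -1)
        results.append(won)
    return results
```

Against a deterministic opponent this amounts to memorising a refutation rather than understanding the game, which is exactly the caveat being raised here.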
-
- Posts: 3196
- Joined: Fri May 26, 2006 3:00 am
- Location: WY, USA
- Full name: Michael Sherwin
Re: Google's AlphaGo team has been working on chess
Michael Sherwin wrote:
Technically correct, but not practically correct: demonstrably, there is strong evidence supporting what I posted. It was demonstrated by R_m_C_e_s_ that hundreds of Elo can be gained from very few training games in real competition, and over 1000 Elo in a very restrictive test with even fewer training games. Against a truly massive opening book and against six top engines, it was demonstrated that 50 Elo is gained per 5,000 games of training, and the gain was linear over the scope of the test. So unless it is believed that a 2400 Elo engine can benefit this way but a 3400 Elo engine cannot, it can be assumed that the 3400 Elo engine will do quite well. In the case of SF, that would mean victory against A0.

CheckersGuy wrote:
Can you point to a link? Was a sufficient number of test games played? I would like to see the statistics.

From page 21 of "Test no.1 (DiscoCheck w32 comparative)":
http://www.talkchess.com/forum/viewtopi ... highlight=

Like you can see in the capture below, after 3600 games in my tournament, the only effective learning functions seem to be implemented in:
-RomiChess P3L
-KnightCap 3.7e
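The Elo claims above can be sanity-checked with the standard logistic Elo model (this is the usual rating formula, nothing specific to these tests):

```python
import math

def expected_score(elo_diff):
    """Expected score for the stronger side under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_from_score(score):
    """Inverse: rating difference implied by an observed match score (0 < score < 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)
```

For example, a 200-Elo edge corresponds to roughly a 76% expected score, so learning gains of "hundreds of Elo" would show up as a very lopsided match result.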
-
- Posts: 6991
- Joined: Thu Aug 18, 2011 12:04 pm
Re: Google's AlphaGo team has been working on chess
Daniel Shawul wrote:
Most of us here suspected that this could happen once Giraffe showed it could beat Stockfish's eval.

Is your opinion based on Giraffe's results in the STS test suite?
From his thesis:
Page 24
Figure 4 shows the result of running the test periodically as training progresses. With the material only bootstrap, it achieves a score of approximately 6000/15000. As training progresses, it gradually improved to approximately 9500/15000, with peaks above 9700/15000, proving that it has managed to gain a tremendous amount of positional understanding.
Page 25
It is clear that Giraffe's evaluation function now has at least comparable positional understanding compared to evaluation functions of top engines in the world
Page 25
Since Giraffe discovered all the evaluation features through self-play, it is likely that it knows about patterns that have not yet been studied by humans, and hence not included in the test suite.
-
- Posts: 1563
- Joined: Thu Jul 16, 2009 10:47 am
- Location: Almere, The Netherlands
Re: Google's AlphaGo team has been working on chess
The only thing it shows is that Giraffe's performance on the 1500 positions of the STS test set is comparable with that of top engines.

In the past I used an older version of STS to tune my evaluation function (and never changed it since). I also see a performance on STS comparable with top engines, but I'm pretty sure my engine doesn't have the same positional understanding as, e.g., Stockfish, Komodo, and Houdini, to name a few.

I once did an experiment and replaced my evaluation function with the one from an older Stockfish version; it gave me about 150 Elo, yet the score on STS remained in the same ballpark. The score on STS doesn't tell you much: 1500 positions (very similar within each of the 15 categories) are far too few to say anything about the positional understanding of an evaluation function.