I can't believe that so many people don't get it!

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: I can't believe that so many people don't get it!

Post by Rebel »

hgm wrote:The 100 games all started from the normal start position.
Nothing of that in the document.

But if so, then where do all the random openings come from? Nothing in the paper. And I assume there is no randomness in SF. We have only seen 10 games, and I would hope there are no duplicates with 1 min per move and then BOOM -> forced move.

Instead we have:

Table 2 analyses the most common human openings (those played more than 100,000 times in an online database of human chess games (1)). Each of these openings is independently discovered and played frequently by AlphaZero during self-play training. When starting from each human opening, AlphaZero convincingly defeated Stockfish, suggesting that it has indeed mastered a wide spectrum of chess play.
hgm
Posts: 27795
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: I can't believe that so many people don't get it!

Post by hgm »

One question: was Romi Chess in any way learning while it was playing its opponent? Or was it purely learning from self-play, and only using the thus-learned knowledge against Rybka?

And a second question: the learn file was said to contain WDL statistics of positions. How would Romi Chess benefit from that information once it got to a position that was not in the file?
FWCC
Posts: 117
Joined: Wed Aug 22, 2007 4:39 pm

Re: I can't believe that so many people don't get it!

Post by FWCC »

I don't think you get it, sir. AlphaZero was using a new algorithm to help it learn chess and Go; there is more to it than learn files, there is deeper mathematics involved. As I have said before, the possibilities are endless with this algorithm: they can use AlphaZero to help cure diseases, to work on stellar mapping, and also to come up with new propulsion for space travel.

FWCC
syzygy
Posts: 5563
Joined: Tue Feb 28, 2012 11:56 pm

Re: I can't believe that so many people don't get it!

Post by syzygy »

Rodolfo Leoni wrote:As this conceptual experience got somehow backward-propagated up to the start position, the system is the same as RomiChess's.
No, it's not.

Outbooking a deterministic opponent by repeating openings until a winning line is found is an obvious and ancient technique that has nothing to do with AlphaZero. OliThink used it on FICS more than 20 years ago. Many human players preceded OliThink.
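The outbooking trick described here can be sketched as a simple loop. The `play_game` and `candidate_moves` callables below are hypothetical stand-ins for a real engine match, purely to illustrate the idea:

```python
import random

def outbook(play_game, candidate_moves, max_depth=4, max_games=1000):
    """Find a winning opening line against a deterministic opponent by
    replaying and varying openings. play_game(line) returns 'win' or
    'loss'; candidate_moves(line) returns our options at that point.
    Both callables are hypothetical stand-ins for a real engine match."""
    refuted = set()                      # lines known to lose
    for _ in range(max_games):
        line = ()
        # Extend the line, skipping continuations already refuted.
        while len(line) < max_depth:
            fresh = [m for m in candidate_moves(line)
                     if line + (m,) not in refuted]
            if not fresh:
                refuted.add(line)        # every continuation loses
                break
            line = line + (random.choice(fresh),)
        if play_game(line) == 'win':
            return line                  # deterministic opponent: replay forever
        refuted.add(line)
    return None
```

Because the opponent is deterministic, a line that wins once wins every time, which is exactly why this has nothing to do with what AlphaZero does.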
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I can't believe that so many people don't get it!

Post by Michael Sherwin »

hgm wrote:One question: was Romi Chess in any way learning while it was playing its opponent? Or was it purely learning from self-play, and only using the thus-learned knowledge against Rybka?

And a second question: the learn file was said to contain WDL statistics of positions. How would Romi Chess benefit from that information once it got to a position that was not in the file?
I took a short nap and my eyes are working better.

1. Asked and answered several times. But okay, once more. Part a) Yes, in a way, because before the search all prior knowledge is loaded into the hash table; the search then learns from that data and selects a move. Part b) No self-learning was employed against Rybka. However, that is immaterial, because Romi's learning opposes Romi's natural evaluation function and causes it to return a different result if Romi is losing.

2. The WDL data is used to play the learned best line only while the stats remain good. When that ends and there is absolutely no subtree left to load into the hash table, Romi has at least reached that point via a line it has performed better in before, so Romi is still better off at that point than it would be without the learning.
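The preloading scheme described above can be sketched roughly as follows. This is an illustration of the idea only, not RomiChess's actual code; all names are made up:

```python
# Sketch: before each search, learned WDL statistics for positions in
# the reachable subtree are turned into score adjustments and seeded
# into the transposition table, biasing the normal search toward lines
# that scored well in the past. Illustrative only, not Romi's code.

learn_file = {}   # position_key -> (wins, draws, losses)

def learned_bias(stats, scale=10):
    """Convert WDL counts into a small evaluation nudge."""
    wins, draws, losses = stats
    games = wins + draws + losses
    return scale * (wins - losses) / games if games else 0

def preload_hash(hash_table, reachable_keys):
    """Seed the hash table with bonuses/penalties from past results."""
    for key in reachable_keys:
        if key in learn_file:
            hash_table[key] = {'score_adjust': learned_bias(learn_file[key])}

# During search, an entry's score_adjust is added to the evaluation.
# Once the game leaves positions present in the learn file, the table
# holds no adjustments and the engine plays on its unmodified eval.
```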
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I can't believe that so many people don't get it!

Post by Michael Sherwin »

FWCC wrote:I don't think you get it, sir. AlphaZero was using a new algorithm to help it learn chess and Go; there is more to it than learn files, there is deeper mathematics involved. As I have said before, the possibilities are endless with this algorithm: they can use AlphaZero to help cure diseases, to work on stellar mapping, and also to come up with new propulsion for space travel.

FWCC
And yet, if it did not store the experience it gained from the simulations, at least for some time, it could not do any of that.
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: I can't believe that so many people don't get it!

Post by Rebel »

Daniel Shawul wrote: Let's dissect what you have achieved according to CPW.

=========================================================
RomiChess is famous for its learning approach [2]
Monkey see, monkey do. Romi remembers winning lines, regardless of which side played the moves, incorporates them into its opening book, and can play them back instantly, up to 180 plies, as long as the stats for that line remain good.
I guess the fact that you replay them for 180 plies must be the real invention. I never thought of that.
Pavlov's dog experiments adapted to computer chess. Each side's moves are given a slight bonus if that side has won, and the other side's moves are given a slight penalty. So good moves can get a slight penalty and bad moves a slight bonus; over time, however, those are corrected. These bonuses/penalties are loaded into the hash table before each move by the computer. If Romi is losing game after game, this causes Romi to 'fish' for better moves to play until Romi starts to win.
Ok, so we are now loading the data into the hash table, not the book. Clever!
==========================================

Daniel
Maybe you underestimate what can be done by simple hashing. A couple of years ago I created an opening book of 150 million positions (1.6 GB), made from CCRL/CEGT games (max 30 moves) and positions analysed by Dann Corbit, and got a 102 Elo improvement [link].
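A position-keyed book of that sort is conceptually just a big hash map from a position key to move statistics. A toy sketch with Zobrist-style keys (purely illustrative, not Rebel's actual implementation):

```python
import random

random.seed(42)
# Toy Zobrist-style hashing: one random 64-bit key per (piece, square).
PIECES = 'PNBRQKpnbrqk'
ZOBRIST = {(p, sq): random.getrandbits(64) for p in PIECES for sq in range(64)}

def position_key(board):
    """board: dict of square index -> piece letter. XOR the keys."""
    key = 0
    for sq, piece in board.items():
        key ^= ZOBRIST[(piece, sq)]
    return key

# The book maps a 64-bit position key to a recommended move plus stats.
book = {}

def book_probe(board):
    return book.get(position_key(board))
```

Because XOR is order-independent and self-inverse, the same position always yields the same key regardless of move order, and a key can be updated incrementally as pieces move. That is what makes books of 150 million positions cheap to probe.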

And this isn't even Reinforcement Learning as Mike showed in his Qxc3! example elsewhere.

If you start a game with Scorpio against SF, giving the first 5 moves of the Ruy Lopez in advance, you will likely lose. You then apply reinforcement learning and replay the game; maybe you need to do that 100-1000 times, but in the end Scorpio will win. You repeat the process until Scorpio wins that line in all variations, say 500 times in a row.

It's not nonsense.
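The replay process described above can be sketched as a loop. The `play_match` and `update_learning` callables are hypothetical stand-ins, not any engine's actual API:

```python
# Sketch of the replay experiment: play the same fixed opening against
# the opponent over and over, feeding each result back into the learn
# data, until the learning side wins that line some number of times in
# a row. Illustrative only.

def replay_until_won(play_match, update_learning, opening,
                     needed_wins=500, max_games=100_000):
    streak = 0
    for _ in range(max_games):
        result = play_match(opening)      # 'win', 'draw' or 'loss'
        update_learning(opening, result)  # back-propagate into learn data
        streak = streak + 1 if result == 'win' else 0
        if streak >= needed_wins:
            return True                   # line is considered solved
    return False
```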

Hence Tord's complaint makes sense when he asked: why no opening book?
hgm
Posts: 27795
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: I can't believe that so many people don't get it!

Post by hgm »

Rebel wrote:
hgm wrote:The 100 games all started from the normal start position.
Nothing of that in the document.
Well, it should have been mentioned if they had started from non-standard positions. The 10 games they published from that match all started from the standard position.
But if so, then where do all the random openings come from? Nothing in the paper. And I assume there is no randomness in SF. We have only seen 10 games, and I would hope there are no duplicates with 1 min per move and then BOOM -> forced move.
This is a good question, as in the games they did publish it is often Stockfish that deviates. So with 64 threads there seems to be a lot of randomness. AlphaZero is also seen to play two different initial moves, and after the same first move, two different second moves. So I guess there is a legitimate concern about whether the games are sufficiently independent to be counted as 100 games when calculating rating error bars.
Instead we have:

Table 2 analyses the most common human openings (those played more than 100,000 times in an online database of human chess games (1)). Each of these openings is independently discovered and played frequently by AlphaZero during self-play training. When starting from each human opening, AlphaZero convincingly defeated Stockfish, suggesting that it has indeed mastered a wide spectrum of chess play.
This is no mystery. These are different matches, not starting from the standard opening position, each with its own result. And indeed they do give the starting position in this case. So there have actually been 1300 games, from 13 different starting positions: the 12 given popular openings plus the standard opening position.
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I can't believe that so many people don't get it!

Post by Michael Sherwin »

syzygy wrote:
Rodolfo Leoni wrote:As this conceptual experience got somehow backward-propagated up to the start position, the system is the same as RomiChess's.
No, it's not.

Outbooking a deterministic opponent by repeating openings until a winning line is found is an obvious and ancient technique that has nothing to do with AlphaZero. OliThink used it on FICS more than 20 years ago. Many human players preceded OliThink.
Except that is not what Romi's reinforcement learning is doing. How many times do I have to explain this to people who won't 'hear'? The subtree with the reinforcement values is loaded into the hash table, so sorry, no book. The reinforcement values just guide the search better on average. The more experience Romi has with a position, the more info is loaded into the hash and the better Romi plays.

P.S. Rodolfo understands the way it works better than most programmers, because he worked with RomiChess for years and then for years with The Baron, after Richard incorporated Romi's RL but not Romi's MSMDL. And Rodolfo has worked with many more learning engines, and he knows the differences between them. He at least knows what he is talking about when he talks about Romi's learning!
Last edited by Michael Sherwin on Mon Dec 18, 2017 9:40 pm, edited 1 time in total.
syzygy
Posts: 5563
Joined: Tue Feb 28, 2012 11:56 pm

Re: I can't believe that so many people don't get it!

Post by syzygy »

Michael Sherwin wrote:
syzygy wrote:
Rodolfo Leoni wrote:As this conceptual experience got somehow backward-propagated up to the start position, the system is the same as RomiChess's.
No, it's not.

Outbooking a deterministic opponent by repeating openings until a winning line is found is an obvious and ancient technique that has nothing to do with AlphaZero. OliThink used it on FICS more than 20 years ago. Many human players preceded OliThink.
Except that is not what Romi's reinforcement learning is doing. How many times do I have to explain this to people who won't 'hear'? The subtree with the reinforcement values is loaded into the hash table, so sorry, no book. The reinforcement values just guide the search better on average. The more experience Romi has with a position, the more info is loaded into the hash and the better Romi plays.
And all of that has nothing to do with AlphaZero. So excuse me for not even reading.