I can't believe that so many people don't get it!

Discussion of anything and everything relating to chess playing software and machines.

Moderators: bob, hgm, Harvey Williamson

Michael Sherwin
Posts: 3041
Joined: Fri May 26, 2006 1:00 am
Location: WY, USA
Full name: Michael Sherwin

I can't believe that so many people don't get it!

Post by Michael Sherwin » Mon Dec 18, 2017 3:04 pm

AlphaZ beat SF by the use of a 'simple trick' called a learn file with reinforcement learning. RomiChess demonstrated the same 'simple trick' 11 years ago by beating Rybka, the world's strongest chess engine at the time.

It has been established that A0 has a learn file that it saves all its trained games in and stores wins, losses, draws and a percentage chance to win. RomiChess does the exact same thing. Here is a record from Romi's learn file.

Record 1 sib 487 chd 2 fs 12 ts 28 t 0 f 0 d 15 s 0 score 17 w 283 L 264 d 191

Record: Record Number
sib: First Sibling Record
chd: First Child Record
fs: From Square
ts: To Square
t: Type of Move
f: Flags
d: Depth
s: Status
score: Score, reinforcement learning rewards/penalties
w: White Wins
L: Black Wins
d: Draws
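
As a rough illustration of the record layout above, here is a minimal Python sketch that reads the sample line into named fields. The class name, field names and the win-percentage formula are assumptions made for this example; RomiChess's actual on-disk format and scoring are not shown in this thread.

```python
from dataclasses import dataclass


@dataclass
class LearnRecord:
    """Hypothetical in-memory view of one learn-file record."""
    record: int      # Record Number
    sibling: int     # First Sibling Record
    child: int       # First Child Record
    from_sq: int     # From Square
    to_sq: int       # To Square
    move_type: int   # Type of Move
    flags: int       # Flags
    depth: int       # Depth
    status: int      # Status
    score: int       # reinforcement learning rewards/penalties
    white_wins: int
    black_wins: int
    draws: int


def parse_record(line: str) -> LearnRecord:
    tokens = line.split()
    # Tokens alternate label, value. The label "d" appears twice
    # (depth and draws), so the values are read by position.
    values = [int(v) for v in tokens[1::2]]
    if len(values) != 13:
        raise ValueError("expected 13 label/value pairs")
    return LearnRecord(*values)


rec = parse_record(
    "Record 1 sib 487 chd 2 fs 12 ts 28 t 0 f 0 d 15 s 0 "
    "score 17 w 283 L 264 d 191"
)
games = rec.white_wins + rec.black_wins + rec.draws
# Assumed formula: a draw counts as half a win for White.
white_pct = (rec.white_wins + 0.5 * rec.draws) / games
```

Reading by position rather than by label is a deliberate choice here, because the printed labels are not unique.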

Store a million complete games that have been guided by the stats in the learn file, and tactics an unlimited number of plies deep can be found, stored and played back, or the search can be guided to find them. It is just a 'simple trick'.
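
The idea of storing complete games and backing their results up into per-move stats can be sketched as follows. This is only a sketch under stated assumptions: it uses nested dictionaries instead of the sibling/child record links shown in the record above, and the `REWARD` value, function names and selection rule are hypothetical, not RomiChess's actual code.

```python
from dataclasses import dataclass, field

REWARD = 1  # hypothetical reward/penalty size per game


@dataclass
class Node:
    white_wins: int = 0
    black_wins: int = 0
    draws: int = 0
    score: int = 0  # from the point of view of the side that played the move
    children: dict = field(default_factory=dict)


def store_game(root: Node, moves: list, result: str) -> None:
    """Walk the game's moves and update the stats of every move played.

    result is '1-0', '0-1' or '1/2-1/2'.
    """
    node = root
    for ply, move in enumerate(moves):
        node = node.children.setdefault(move, Node())
        white_moved = ply % 2 == 0
        if result == "1-0":
            node.white_wins += 1
            node.score += REWARD if white_moved else -REWARD
        elif result == "0-1":
            node.black_wins += 1
            node.score += -REWARD if white_moved else REWARD
        else:
            node.draws += 1


def best_learned_move(root: Node, moves_so_far: list):
    """Return the stored move with the highest score, or None if unknown."""
    node = root
    for move in moves_so_far:
        node = node.children.get(move)
        if node is None:
            return None
    if not node.children:
        return None
    return max(node.children, key=lambda m: node.children[m].score)


root = Node()
store_game(root, ["e4", "e5", "Nf3"], "1-0")
store_game(root, ["d4", "d5"], "0-1")
```

In this sketch a stored line can be played back directly with `best_learned_move`, or its score could instead be fed to a search as a bonus; which of the two an engine does is a design choice not settled by the text above.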

I put 'simple trick' in single quotes because it is a valid trick and not some swindle. If an engine is programmed to do this then more power to it! The wins are legit and if an engine like SF, K or H etc. loses because it doesn't have this type of learning then tough cookies!
I hate if statements. Pawns demand if statements. Therefore I hate pawns.

Ras
Posts: 1142
Joined: Tue Aug 30, 2016 6:19 pm

Re: I can't believe that so many people don't get it!

Post by Ras » Mon Dec 18, 2017 3:09 pm

Michael Sherwin wrote:It has been established that A0 has a learn file that it saves all its trained games in
Calling the adjusted weights of a NN a "learn file" seems stretching words quite far, IMO.

Besides, did RomiChess get its learn file by self play? Did you really not code anything in RomiChess other than the rules of the game and let the engine figure out the rest by itself?

Michael Sherwin
Posts: 3041
Joined: Fri May 26, 2006 1:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I can't believe that so many people don't get it!

Post by Michael Sherwin » Mon Dec 18, 2017 3:24 pm

Ras wrote:
Michael Sherwin wrote:It has been established that A0 has a learn file that it saves all its trained games in
Calling the adjusted weights of a NN a "learn file" seems stretching words quite far, IMO.

Besides, did RomiChess get its learn file by self play? Did you really not code anything in RomiChess other than the rules of the game and let the engine figure out the rest by itself?
You are dealing with selective reporting and not the whole picture. Yes, A0 has an NN that performs the function of guiding the search using the information in the learn file. That is its function. The miracle is stored in the learn file.

Romi's learn file causes RomiChess to go against its normal evaluation function and play something different, something learned. So your quip, "Did you really not code anything in RomiChess other than the rules of the game and let the engine figure out the rest by itself?", is really a non sequitur. Anyway, I did not say AlphaZ is identical to RomiChess. I said that they both use the same learn-file trick to win games that they could not win otherwise. I have 11+ years of experience with reinforcement learning. How many years of experience do you have?

hgm
Posts: 23615
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller

Re: I can't believe that so many people don't get it!

Post by hgm » Mon Dec 18, 2017 3:41 pm

This is not what the paper describes. They used each game to adjust the weights in the neural network a tiny bit, in a direction determined by the game result. After that, they discarded the game and never looked at it again. The eventual NN that results depends on all the game positions, for sure, but that result is not stored by position. It only contains knowledge of the kind "If you can capture a Queen with a Knight, it is on average a good idea to do so". There is no way whatsoever to trace back from which positions this knowledge came. The NN doesn't even remember which positions it has seen. Just the notable characteristics from the total average of all positions it has seen.
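
The kind of update described here, a tiny nudge of the weights per game with nothing stored per position, can be sketched in a few lines of Python. This is a deliberately simplified illustration: it assumes a single linear layer with a tanh output and hand-made feature vectors, whereas the network in the paper is a deep network with policy and value outputs. The function names and the learning rate are hypothetical.

```python
import math


def predict(weights, features):
    """Predicted game outcome in (-1, 1) for one position's feature vector."""
    return math.tanh(sum(w * x for w, x in zip(weights, features)))


def train_on_game(weights, positions, result, lr=0.01):
    """Nudge the weights toward the game result, then forget the game.

    positions: list of feature vectors, one per position in the game
    result:    +1.0 (win), 0.0 (draw) or -1.0 (loss)
    """
    for features in positions:
        v = predict(weights, features)
        # Gradient step on the squared error (result - v)^2 / 2.
        grad_scale = (result - v) * (1.0 - v * v)
        for i, x in enumerate(features):
            weights[i] += lr * grad_scale * x
    # Nothing is stored per position: only the weights have changed.


weights = [0.0, 0.0]
train_on_game(weights, [[1.0, 0.0]], result=1.0)
```

After the call, the only trace of the game is the slightly changed first weight; the position itself cannot be recovered from it.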

Michael Sherwin
Posts: 3041
Joined: Fri May 26, 2006 1:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I can't believe that so many people don't get it!

Post by Michael Sherwin » Mon Dec 18, 2017 3:50 pm

But is that correct? There is evidence that they keep a learn file with wins, losses, draws and a percentage chance of winning. Are there enough neurons to remember millions of these stats or is there more to it? In a human brain data storage and neurons are synonymous, but in a computer brain neurons and data storage are not synonymous. And yet if one wants to use the human brain function as a parable, then memory and neurons would be talked about as if they were synonymous. So yes, the games as a separate entity have been discarded, but the 'memory' of the games is stored in a learn file.

Dirt
Posts: 2851
Joined: Wed Mar 08, 2006 9:01 pm
Location: Irvine, CA, USA

Re: I can't believe that so many people don't get it!

Post by Dirt » Mon Dec 18, 2017 3:54 pm

Would this work to beat top professionals at go? I don't think so. I very much doubt that Google switched to RomiChess' methods when training for chess.
Deasil is the right way to go.

Ras
Posts: 1142
Joined: Tue Aug 30, 2016 6:19 pm

Re: I can't believe that so many people don't get it!

Post by Ras » Mon Dec 18, 2017 3:57 pm

Michael Sherwin wrote:But is that correct?
That is how an NN works. The only common factor with RomiChess is that there is some way of reinforcement learning, but the rest has nothing in common. That's probably what so many people got. ;-)
Are there enough neurons to remember millions of these stats
That's not how an NN works. Memorising is one technique that we humans can do with our brains, but actually, it's the least powerful way even we humans learn. It's about pattern recognition without precise position match, which I guess is exactly what RomiChess does not perform.

Michael Sherwin
Posts: 3041
Joined: Fri May 26, 2006 1:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I can't believe that so many people don't get it!

Post by Michael Sherwin » Mon Dec 18, 2017 4:00 pm

Dirt wrote:Would this work to beat top professionals at go? I don't think so. I very much doubt that Google switched to RomiChess' methods when training for chess.
In Go they would have to break the board up into chunks with recognizable patterns and play the highest percentage moves on those formations, but yes it would work.

Michael Sherwin
Posts: 3041
Joined: Fri May 26, 2006 1:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I can't believe that so many people don't get it!

Post by Michael Sherwin » Mon Dec 18, 2017 4:16 pm

Ras wrote:
Michael Sherwin wrote:But is that correct?
That is how an NN works. The only common factor with RomiChess is that there is some way of reinforcement learning, but the rest has nothing in common. That's probably what so many people got. ;-)
Are there enough neurons to remember millions of these stats
That's not how an NN works. Memorising is one technique that we humans can do with our brains, but actually, it's the least powerful way even we humans learn. It's about pattern recognition without precise position match, which I guess is exactly what RomiChess does not perform.
And here are some quotes by some that seem to be in the know.

Truls Edvard Stokke
"Hey Michael, very interesting stuff, this seems like a table-based monte carlo policy evaluation. Impressive that you would independently discover such a thing on your own." "However this is indeed a first step towards the policy evaluation used in A0."

Then in his simulation of A0 on a PC he publishes a chart of a search tree with backed up values. And then in other subjects it is mentioned by more than one that A0 stores wins, losses, draws and a winning percentage and you guys don't argue against it. It can't store all that data in the NN. It has to be storing w,l,d,p data somewhere, either in memory or on a hard drive. And to say NN does not work that way is ridiculous. NN can analyze stored data. I might not be 100% correct, but what you guys are saying is like those that tell me God does not work like that. Well I got news for you, God can work any way he likes and so can NN. You might be right but don't say stupid things like NN does not work that way, lol. Is there an emoji for frustration?

Rebel
Posts: 4663
Joined: Thu Aug 18, 2011 10:04 am

Re: I can't believe that so many people don't get it!

Post by Rebel » Mon Dec 18, 2017 4:17 pm

hgm wrote:This is not what the paper describes.
Indeed. But we are not dealing with academics. We are dealing with a commercial company with a bad reputation. They have their mouth full of ethics but their actions are criminal, like the copying of books, like buying YouTube while knowing about its massive amount of illegally copied copyrighted material, and only God knows what they do with the data we publicly entrust to the internet and thus to them. My name might as well be colored red after this post :lol:

Ranting aside, let's talk about what the paper doesn't reveal, the start positions of the 100 games. Those 50 start positions certainly can be learned as Mike described.
