I can't believe that so many people don't get it!

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Rodolfo Leoni
Posts: 545
Joined: Tue Jun 06, 2017 4:49 pm
Location: Italy

Re: I can't believe that so many people don't get it!

Post by Rodolfo Leoni »

Michael Sherwin wrote: ...........................................................................
What are the expected gains if a top AB searcher like SF adopted similar learning?
...........................................................................
Hi Mike,

What you describe above can be done without modifying SF. It might not be easy, but it is possible with a dedicated GUI. Of course it would differ from hashing trees the way Romi does it. The GUI could only handle a learning book, but what does that matter? When an engine uses learning it sends an output and waits for an input; whether that input comes from the hash or from a file is the same thing. (So the discussion about book learning versus hash learning isn't really relevant, as all learning engines use a file for storing positions and hashing them.)

To make SF learn by self-play, it would suffice to set a "book contempt". That would avoid repeating the same drawn game over and over in automatic self-play.

That is of course not Romi's behavior, but it would give you a learning-SF effect. Hardware limits remain, though. Our pockets are too empty to buy 5000 TPUs! :wink:
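
A minimal sketch in C of what such a GUI-side learning book could look like. All names here (learn_entry, book_contempt, probe, update, book_score) are hypothetical, taken neither from SF nor from any existing GUI; the only assumption is that the GUI can compute a 64-bit key for each position.

Code:

/* Sketch of a GUI-side learning book: the GUI, not the engine, keeps
 * per-(position, move) statistics and a "book contempt" that steers
 * self-play away from lines that only ever drew.  No full-table or
 * persistence handling -- it is only an illustration. */

#include <stdint.h>
#include <stdio.h>

#define BOOK_SIZE     (1 << 20)   /* open-addressed table, ~1M entries */
#define BOOK_CONTEMPT 0.10        /* a draw counts as slightly < 0.5   */

typedef struct {
    uint64_t key;                 /* 64-bit position hash (e.g. Zobrist)    */
    uint16_t move;                /* move played from this position         */
    uint32_t wins, draws, losses; /* results from the mover's point of view */
} learn_entry;

static learn_entry book[BOOK_SIZE];

/* Find the entry for (position, move), inserting an empty one if needed. */
static learn_entry *probe(uint64_t key, uint16_t move)
{
    uint64_t i = (key ^ move) & (BOOK_SIZE - 1);
    while (book[i].key && (book[i].key != key || book[i].move != move))
        i = (i + 1) & (BOOK_SIZE - 1);          /* linear probing */
    book[i].key = key;
    book[i].move = move;
    return &book[i];
}

/* After each game the GUI walks it once and updates every (position, move)
 * it saw; result is 1 = win, 0 = draw, -1 = loss for the side that moved. */
static void update(uint64_t key, uint16_t move, int result)
{
    learn_entry *e = probe(key, move);
    if (result > 0)      e->wins++;
    else if (result < 0) e->losses++;
    else                 e->draws++;
}

/* Expected score of a book move.  The contempt term devalues draws so that
 * repeated self-play does not keep replaying the same drawn line. */
static double book_score(const learn_entry *e)
{
    double n = (double)e->wins + e->draws + e->losses;
    if (n == 0.0) return 0.5;                   /* unknown move: neutral */
    return (e->wins + (0.5 - BOOK_CONTEMPT) * e->draws) / n;
}

int main(void)
{
    uint64_t start = 0x463b96181691fc9cULL;     /* some key for the start position  */
    /* moves below are packed as from*256 + to, squares counted from a1 = 0 */
    update(start, 0x0b1b, 0);                   /* say 1.d4 (d2-d4) ended in a draw */
    update(start, 0x0c1c, 1);                   /* and 1.e4 (e2-e4) in a win        */
    printf("d4: %.3f  e4: %.3f\n",
           book_score(probe(start, 0x0b1b)),
           book_score(probe(start, 0x0c1c)));
    return 0;
}

The BOOK_CONTEMPT constant is one simple way to implement the "book contempt" idea: a draw scores a bit below 0.5, so repeated self-play drifts toward moves that still have winning chances instead of replaying the same drawn line.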
F.S.I. Chess Teacher
Rebel
Posts: 6997
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: I can't believe that so many people don't get it!

Post by Rebel »

Daniel Shawul wrote:Come on Ed,
:wink:

Allow me to answer HGM first; it's important for me to explain where I stand in this discussion, in case you missed my initial post on this topic in CTF. I will answer later.
TommyTC
Posts: 38
Joined: Thu Mar 30, 2017 8:52 am

Re: I can't believe that so many people don't get it!

Post by TommyTC »

Rebel wrote:
hgm wrote:
Rebel wrote:
hgm wrote:The 100 games all started from the normal start position.
Nothing of that in the document.
Well, it should have been if they started from non-standard positions. The 10 games they published from that match all started from the standard position.
Nope.

Game-4 AZ-SF 1. d4 e6
Game-5 AZ-SF 1. d4 Nf6
I just started SF with multi-pv = 2 and played 1. d4. During the first minute of analysis, it continually flip-flopped between best moves: 1...e6 and 1...Nf6. Eventually it evaluated both at +0.06.

I'd say you've supplied information that the games did actually start from the initial position with no book!
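
For anyone who wants to repeat that check, the UCI command sequence is roughly the following (a sketch; the exact numbers depend on the Stockfish version, hash size and hardware):

Code:

uci
setoption name MultiPV value 2
isready
position startpos moves d2d4
go movetime 60000

After the minute is up, the two "info ... multipv" lines give the two best replies; in the run described above both 1...e6 and 1...Nf6 eventually settled at +0.06 (score cp 6).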
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: I can't believe that so many people don't get it!

Post by corres »

Michael Sherwin wrote:
AlphaZ beat SF by the use of a 'simple trick' called a learn file with reinforcement learning. RomiChess demonstrated the same 'simple trick' 11 years ago against the world's strongest chess engine at the time beating Rybka.
It has been established that A0 has a learn file that it saves all its trained games in and stores wins, losses, draws and a percentage chance to win. RomiChess does the exact same thing. Here is a record from Romi's learn file.
Record 1  sib 487  chd 2  fs 12  ts 28  t 0  f 0  d 15  s 0  score 17  w 283  L 264  d 191

Record - Record Number
sib    - First Sibling Record
chd    - First Child Record
fs     - From Square
ts     - To Square
t      - Type of Move
f      - Flags
d      - Depth
s      - Status
score  - Score (reinforcement learning rewards/penalties)
w      - White Wins
L      - Black Wins
d      - Draws
Store a million complete games that have been guided by the stats in the learn file, and tactics unlimited plies deep can be found, stored and played back, or the search can be guided to find them. It is just a 'simple trick'.
I put 'simple trick' in single quotes because it is a valid trick and not some swindle. If an engine is programmed to do this then more power to it! The wins are legit and if an engine like SF, K or H etc. lose because they don't have this type of learning then tough cookies!

Basically you are right.
But can you estimate the size of the learning file that would make Romi a 3400 Elo engine?
It is a pity, but the DeepMind team did not give me any information about the amount of (programmable) memory AlphaZero uses for its neural network. I am afraid a Romi-type engine at 3400 Elo would need a much bigger memory for its learning file than AlphaZero has.
Moreover a system based on neural network is more flexible and effective than using a learning file only.
Rebel
Posts: 6997
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: I can't believe that so many people don't get it!

Post by Rebel »

hgm wrote:
Rebel wrote:All of the document can be true, except that a paragraph about how AZ first learned SF8 was left out.
That would make them die-hard liars. Lying by omission is still lying. It would be considered gross scientific fraud.
Yep.

If I remember correctly you have been doing this stuff even longer than I have, and I would say this AZ thing (provided the conditions of the match meet the scientific standard) is by far the biggest breakthrough in computer chess ever. Would you not agree with me? And the paper doesn't meet the scientific standard. Hence I prefer (as announced in CTF) to stick to my DA (devil's advocate) role for the moment and discuss every detail until everything is said. People might see that as strange, but I feel it as an obligation.

The paper then. Reading it, I would say the author(s) have a good understanding of computer chess in general and an excellent understanding of the inner workings of a chess program. Some members of the DeepMind team are (maybe even long-time) members here and lurk, because they likely know this is the place where the programmers meet and where their document will be scrutinized. And yet I am supposed to believe they don't know how to play a fair match properly? Is that stupidity? If not stupidity, then what is it?

There are indeed reasons to believe (we discussed it) that all 100 games were played from the start position. How stupid is that? And if not stupidity, then what is it? Did they not know that you play either from predefined opening lines or from an opening book, if only to avoid duplicate games? They did not know?

Did they not know by doing so they favored AZ?

From the paper we read that AZ learned the most common openings while SF was left in the dark, not being allowed an opening book. They did not know that is unfair?

Of course they knew.

And yet they decided as they decided.

Why?

I consider the "why" question one of the most important questions in life. Everything happens for a reason.

~~~~~

I proposed a working model: learning an opponent from the start position. We even have a proven case from the past (Mchess 5, during the RGCC 96/97 period).

Not showing us all 100 games and the fixed 1-minute TC all fit well into this picture.

Adding everything up, I am a sceptic for good reasons.

hgm wrote:
I was told that at the Free University (or was it UvA) only two thesis defenses in all of the history of the university had not resulted in granting the Ph.D. degree. In one of them the student appeared stone drunk. The other was for a thesis that discussed an experimental treatment of a certain kind of cancer, which by the 10 case studies treated in the thesis looked very good. And then during questioning, it turned out that the fact that 90 other patients submitted to this same treatment had died had been omitted...
Terrible indeed.
Rebel
Posts: 6997
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: I can't believe that so many people don't get it!

Post by Rebel »

TommyTC wrote:
Rebel wrote:
hgm wrote:
Rebel wrote:
hgm wrote:The 100 games all started from the normal start position.
Nothing of that in the document.
Well, it should have been if they started from non-standard positions. The 10 games they published from that match all started from the standard position.
Nope.

Game-4 AZ-SF 1. d4 e6
Game-5 AZ-SF 1. d4 Nf6
I just started SF with multi-pv = 2 and played 1. d4. During the first minute of analysis, it continually flip-flopped between best moves: 1...e6 and 1...Nf6. Eventually it evaluated both at +0.06.

I'd say you've supplied information that the games did actually start from the initial position with no book!
Correct, I did the same yesterday.
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I can't believe that so many people don't get it!

Post by Michael Sherwin »

corres wrote:
Michael Sherwin wrote: AlphaZ beat SF by the use of a 'simple trick' called a learn file with reinforcement learning. RomiChess demonstrated the same 'simple trick' 11 years ago against the world's strongest chess engine at the time beating Rybka.
It has been established that A0 has a learn file that it saves all its trained games in and stores wins, losses, draws and a percentage chance to win. RomiChess does the exact same thing. Here is a record from Romi's learn file.
Record 1  sib 487  chd 2  fs 12  ts 28  t 0  f 0  d 15  s 0  score 17  w 283  L 264  d 191

Record - Record Number
sib    - First Sibling Record
chd    - First Child Record
fs     - From Square
ts     - To Square
t      - Type of Move
f      - Flags
d      - Depth
s      - Status
score  - Score (reinforcement learning rewards/penalties)
w      - White Wins
L      - Black Wins
d      - Draws
Store a million complete games that have been guided by the stats in the learn file, and tactics unlimited plies deep can be found, stored and played back, or the search can be guided to find them. It is just a 'simple trick'.
I put 'simple trick' in single quotes because it is a valid trick and not some swindle. If an engine is programmed to do this then more power to it! The wins are legit and if an engine like SF, K or H etc. lose because they don't have this type of learning then tough cookies!
Basically you are right.
But can you estimate the size of the learning file that would make Romi a 3400 Elo engine?
It is a pity, but the DeepMind team did not give me any information about the amount of (programmable) memory AlphaZero uses for its neural network. I am afraid a Romi-type engine at 3400 Elo would need a much bigger memory for its learning file than AlphaZero has.
Moreover a system based on neural network is more flexible and effective than using a learning file only.
I'm not sure what you are asking but I will give as much information as I can.

Romi's learn file is stored on the hard drive. It is modified on the hard drive. The only part of it that is brought into memory is the subtree of the current position if there is one. And that is stored in the hash table so no extra memory footprint is created.

Romi, being only a 2425 CCRL Elo engine, needs to learn a lot of good moves to win games against much stronger engines. A top engine can take advantage of much less learning, simply because one move may be all it needs. A top engine will show a positive learning curve much sooner.

"Moreover a system based on neural network is more flexible and effective than using a learning file only."

Romi does not use a learn file only. Technically there is no learning in a learn file; it is just data recording results. The real learning happens when the nodes are moved from the data tree into the hash table. The data moved into the hash is what allows the search to learn and hopefully play better moves. Each node moved into the hash is a little nugget of accumulated knowledge that goes beyond the understanding of the eval and results in superhuman-looking play. If an engine that reaches 3800 Elo can already play near-perfect chess, then RL may not help much. If instead the Elo ceiling is 5000 or higher, then RL can produce giant gains in Elo with enough games. Romi's Elo gain is linear in the range of 1 to 1000 Elo over only 400 games, using only 10 starting positions against one opponent. That is 2.5 Elo per game. Against a humongous book and, IIRC, 6 top engines, Romi's Elo gain was 50 Elo per 5000 games.
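
A small C sketch of that idea. This is not Romi's actual source: the record layout just follows the field listing quoted earlier in the thread, and the make_move / position-key routines are crude stand-ins so the example compiles and runs.

Code:

/* Sketch of "move the learn subtree into the hash table before searching".
 * Not Romi's code; field names follow the record listing quoted above. */

#include <stdint.h>
#include <stdio.h>

typedef struct {                  /* one record of the on-disk learn tree */
    int32_t  sibling;             /* first sibling record (-1 = none)     */
    int32_t  child;               /* first child record   (-1 = none)     */
    uint8_t  from, to;            /* from square, to square (a1 = 0)      */
    uint8_t  type, flags;         /* type of move, flags                  */
    int16_t  depth;               /* depth the learned score is good for  */
    uint8_t  status;
    int16_t  score;               /* RL reward/penalty accumulated so far */
    uint32_t white_wins, black_wins, draws;
} LearnRec;

typedef struct { uint64_t key; int16_t score, depth; } HashEntry;

#define HASH_SIZE (1 << 20)
static HashEntry hash_table[HASH_SIZE];

static void hash_store(uint64_t key, int16_t score, int16_t depth)
{
    HashEntry *e = &hash_table[key & (HASH_SIZE - 1)];
    e->key = key; e->score = score; e->depth = depth;
}

/* Crude stand-ins for the engine's own make/unmake and position key:
 * the "key" is just an XOR over the moves played, enough for the demo. */
static uint64_t pos_key = 0x463b96181691fc9cULL;
static uint64_t move_hash(uint8_t f, uint8_t t)
{
    return (uint64_t)(f * 64 + t + 1) * 0x9e3779b97f4a7c15ULL;
}
static void make_move(uint8_t f, uint8_t t)   { pos_key ^= move_hash(f, t); }
static void unmake_move(uint8_t f, uint8_t t) { pos_key ^= move_hash(f, t); }

/* Walk the learn subtree below the current position and preload every
 * node's learned score into the hash table, so the normal search that
 * follows is already biased toward (or away from) lines seen before. */
static void seed_hash(const LearnRec *recs, int32_t idx)
{
    for (; idx >= 0; idx = recs[idx].sibling) {   /* every move here... */
        const LearnRec *r = &recs[idx];
        make_move(r->from, r->to);
        hash_store(pos_key, r->score, r->depth);
        seed_hash(recs, r->child);                /* ...and its replies */
        unmake_move(r->from, r->to);
    }
}

int main(void)
{
    /* Two-node toy tree: the quoted record's move and stats (fs 12, ts 28 = e2-e4)
     * plus one reply (...e5); sibling/child links adjusted for this tiny array. */
    LearnRec recs[2] = {
        { -1,  1, 12, 28, 0, 0, 15, 0, 17, 283, 264, 191 },
        { -1, -1, 52, 36, 0, 0, 14, 0, -5,  10,  12,  30 },
    };
    seed_hash(recs, 0);
    printf("learn subtree seeded into the hash table\n");
    return 0;
}

The interesting part is seed_hash: before the regular alpha-beta search starts, every learned node below the current position is dropped into the hash table, which is how the learn file can steer the search without any extra memory footprint beyond the hash the engine already uses.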
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Rebel
Posts: 6997
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: I can't believe that so many people don't get it!

Post by Rebel »

Daniel Shawul wrote:
Rebel wrote:
Daniel Shawul wrote:
Rebel wrote: Maybe you underestimate what can be done by simple hashing. A couple of years ago I created an opening book of 150 million positions (1.6 Gb), made from CCRL/CEGT games (max 30 moves) and from positions analysed by Dann Corbit, and got a 102 Elo improvement [link].
No, I don't underestimate the value of book learning, especially with a deterministic opponent. What I am opposed to is claiming "I did what AlphaGo did" when the only common denominator is "learning". Book learning as everybody knows it (whether it is stored in the book or the hash table) is specific to a certain position -- with a 64-bit hash key.
There are enough signs to think that is indeed what happened: learning how to beat SF8 in all variations, either from the start position (as HGM suggests happened), which makes the learning even easier, or from predefined openings to avoid duplicates.
Daniel Shawul wrote:AlphaGo's learning (NN training) is learning a general evaluation function. This can be compared to automatic parameter tuning done in chess programs, with the only difference being that the NN actually constructs the important features, while we have to code in the passed-pawn and king-safety features ourselves.
I know what the paper says: its 3400+ Elo strength comes from 44 million self-play games. How believable is that? It also means you, I and everybody else can do the same; the paper describes how it is done.

When I take a look at the games (have you?) I see SF8 slaughtered in Morphy style, but... without AZ making any calculation mistake, and if there were calculation holes in AZ, a program like SF8 would immediately punish them. And so AZ comes across as the perfect Morphy. And I don't buy it.
Come on Ed, it is not so shocking if you look at how they got to this stage: AlphaGo -> AlphaGoZero -> AlphaZero. If you don't trust Google, believe Gian-Carlo Pascutto (our chess programmer colleague), who is trying to reproduce AlphaGoZero's success with Leela Zero and was already reaching 2000 Elo with it last time I checked. Of course he doesn't have the luxury of 5000 TPUs for training, so he is using distributed computing to get to that level in a month.

In the first AlphaGo, which mixed in supervised learning, it was clearly shown that supervised learning contributes less to its strength than reinforcement learning. Everybody was asking why they used human games for training anyway, and indeed in their next paper AlphaGoZero showed the human games were what was holding it back from reaching higher levels! So the second achievement that you are very suspicious of was not that surprising for those who follow developments closely.

The third one (with its application to chess and, don't forget, shogi, which I believe they did to shut up doubters) demonstrates the generality of the approach across different games -- especially those full of tactics, which is the weak point of MCTS. Remember that Go has tactics too, like ladders -- a lot of people asked how they solved that during the Lee Sedol match. Its application to chess was then predicted by many in this forum. I am still not so convinced by MCTS for chess because of my little stint with it, but I have already learned something that greatly improved MCTS for chess. Even with all my questions, I am convinced their approach could work for chess too and would in no way think they are lying.

Think about it: lying and lying through these three papers to get attention (if you believe that is what they are striving for), or simple Occam's razor -- what they got and published in two Nature papers is actually true. I would go for the latter. Screaming conspiracy at everything that could have been done better is not so productive IMO.

Daniel
I think I have explained myself in my answer to HGM.

Regarding AlphaGo, I already stated elsewhere that I have no good reason to doubt that. The Go thing I can believe: white and black stones only, formation is everything, doable (believable). In chess, however, a white pawn on a2 or a3 can make all the difference, a big difference.

Furthermore I am thrilled to hear Gian-Carlo picked this up and might carry on where Giraffe stopped. It's clear it will take a lot longer, given the hardware needed to play those 44 million training games, but I would say (following the logic of the paper) that the first 100,000 games would already produce an engine of (say) 2000 Elo, 1 million games 2500, something in that order. Keep us informed about his progress.
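
As a back-of-the-envelope check on the 150-million-position book mentioned in the quote above: 1.6 GB divided by 150 million is roughly 11 bytes per position, which is what a 64-bit position key plus a few bytes of payload costs. A hypothetical packed entry in C (not the book's actual format), just to illustrate the arithmetic:

Code:

#include <stdint.h>
#include <stdio.h>

#pragma pack(push, 1)
typedef struct {
    uint64_t key;        /* 64-bit position hash (Zobrist/Polyglot style)    */
    int16_t  score;      /* stored evaluation in centipawns                  */
    uint8_t  flags;      /* e.g. source: engine analysis vs. game statistics */
} BookEntry;             /* 11 bytes when packed                             */
#pragma pack(pop)

int main(void)
{
    printf("entry size  : %zu bytes\n", sizeof(BookEntry));
    printf("150M entries: %.2f GB\n", 150e6 * sizeof(BookEntry) / 1e9);
    return 0;
}

Run, this prints 11 bytes per entry and about 1.65 GB for 150 million entries, in line with the quoted size.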
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: I can't believe that so many people don't get it!

Post by corres »

Michael Sherwin wrote:
Romi's learn file is stored on the hard drive. It is modified on the hard drive. The only part of it that is brought into memory is the subtree of the current position if there is one. And that is stored in the hash table so no extra memory footprint is created.

Sorry, I did not write RAM or hash but memory, and the hard drive is also a (very slow) memory. If you like, read it this way: how big a hard drive does Romi need to reach 3400 Elo? It is obvious to me that the connection between the amount of memory (pardon: hard drive) and Romi's gain in Elo is not a linear function...

Michael Sherwin wrote:
Romi, being only a 2425 CCRL Elo engine, needs to learn a lot of good moves to win games against much stronger engines. A top engine can take advantage of much less learning, simply because one move may be all it needs. A top engine will show a positive learning curve much sooner.

Naturally a very strong engine needs less help (if any) than a weaker one does. Moreover, every learning process has some bad side effects:
it slows down the search, and the result depends on what its knowledge was acquired from.

"Moreover a system based on neural network is more flexible and effective than using a learning file only."

Michael Sherwin wrote:
Romi does not use a learn file only. Technically there is no learning in a learn file; it is just data recording results. The real learning happens when the nodes are moved from the data tree into the hash table. The data moved into the hash is what allows the search to learn and hopefully play better moves. Each node moved into the hash is a little nugget of accumulated knowledge that goes beyond the understanding of the eval and results in superhuman-looking play. If an engine that reaches 3800 Elo can already play near-perfect chess, then RL may not help much. If instead the Elo ceiling is 5000 or higher, then RL can produce giant gains in Elo with enough games. Romi's Elo gain is linear in the range of 1 to 1000 Elo over only 400 games, using only 10 starting positions against one opponent. That is 2.5 Elo per game. Against a humongous book and, IIRC, 6 top engines, Romi's Elo gain was 50 Elo per 5000 games.

Thanks for the detailed explanation.
But whether we call it a "learning file" or a "learn file" is only a question of definition. The essence is what the differences are between AlphaZero and Romi in terms of Elo gain and the demands on memory, time, technical background, energy, etc.
The higher flexibility of neural network is doubtless.
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I can't believe that so many people don't get it!

Post by Michael Sherwin »

corres wrote:
Michael Sherwin wrote: Romi's learn file is stored on the hard drive. It is modified on the hard drive. The only part of it that is brought into memory is the subtree of the current position if there is one. And that is stored in the hash table so no extra memory footprint is created.
Sorry, I did not write RAM or hash but memory, and the hard drive is also a (very slow) memory. If you like, read it this way: how big a hard drive does Romi need to reach 3400 Elo? It is obvious to me that the connection between the amount of memory (pardon: hard drive) and Romi's gain in Elo is not a linear function...
Michael Sherwin wrote: Romi, being only a 2425 CCRL Elo engine, needs to learn a lot of good moves to win games against much stronger engines. A top engine can take advantage of much less learning, simply because one move may be all it needs. A top engine will show a positive learning curve much sooner.
Naturally a very strong engine needs less help (if any) than a weaker one does. Moreover, every learning process has some bad side effects:
it slows down the search, and the result depends on what its knowledge was acquired from.

"Moreover a system based on neural network is more flexible and effective than using a learning file only."
Michael Sherwin wrote: Romi does not use a learn file only. Technically there is no learning in a learn file; it is just data recording results. The real learning happens when the nodes are moved from the data tree into the hash table. The data moved into the hash is what allows the search to learn and hopefully play better moves. Each node moved into the hash is a little nugget of accumulated knowledge that goes beyond the understanding of the eval and results in superhuman-looking play. If an engine that reaches 3800 Elo can already play near-perfect chess, then RL may not help much. If instead the Elo ceiling is 5000 or higher, then RL can produce giant gains in Elo with enough games. Romi's Elo gain is linear in the range of 1 to 1000 Elo over only 400 games, using only 10 starting positions against one opponent. That is 2.5 Elo per game. Against a humongous book and, IIRC, 6 top engines, Romi's Elo gain was 50 Elo per 5000 games.
Thanks for the detailed explanation.
But whether we call it a "learning file" or a "learn file" is only a question of definition. The essence is what the differences are between AlphaZero and Romi in terms of Elo gain and the demands on memory, time, technical background, energy, etc.
The higher flexibility of neural network is doubtless.
Stockfish with reinforcement learning (no doubt improved over Romi's), if fully trained, would be about 5,000 Elo, assuming the Elo ceiling is indeed significantly higher. The truth, despite "The higher flexibility of neural network is doubtless", is that AlphaZero is nowhere near as strong as an SF with such learning would be.
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through