hgm wrote:Lyudmil Tsvetkov wrote:Did it not? What about all those tuning games on the Framework?
Is this not a kind of reinforcement learning?
No, it is not.
* Reinforcement learning is where you let the system do its thing, and then change it to encourage or discourage what it did, depending on whether it did what you want.
* Supervised learning is where you give the system examples of what it should do, and encourage it to do the same.
* Tuning is where you first change the system, and then see if it now does better what you want, and keep the change if it does. (A toy sketch contrasting the three follows below.)
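To make the distinction concrete, here is a toy sketch in Python (names and numbers are purely illustrative, nothing from AlphaZero), for a 'system' that is just one adjustable number that we want to end up near some target:

```python
import random

LEARNING_RATE = 0.1
TARGET = 3.0                         # what we want the system to produce

def act(param):                      # the "system doing its thing"
    return param

def reinforcement_step(param):
    action = act(param) + random.gauss(0, 0.5)   # system tries something
    reward = -(action - TARGET) ** 2             # score what it actually did
    baseline = -(act(param) - TARGET) ** 2       # score of its usual behaviour
    if reward > baseline:                        # encourage what scored better
        param += LEARNING_RATE * (action - param)
    return param

def supervised_step(param):
    example = TARGET                             # we are *told* the right answer
    error = example - act(param)
    return param + LEARNING_RATE * error         # nudge directly toward the example

def tuning_step(param):
    candidate = param + random.uniform(-0.5, 0.5)        # change the system first
    if abs(candidate - TARGET) < abs(param - TARGET):    # does it now do better?
        return candidate                                 # keep the change
    return param                                         # otherwise discard it
```

Note where the information comes from in each case: from a score of what the system did itself, from an example it was given, or from a before/after comparison of the changed system.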
Lyudmil Tsvetkov wrote:All by itself? So you still insist there was no code involved in Alpha apart from the game rules?
Of course. That is what they say, and there is no reason at all to doubt them.
Lyudmil Tsvetkov wrote:You got me totally confused, what the hell is the NN, is it code, a machine, a combination of patterns, or a self-learning oddity?
That seems to be your natural state...
In principle, a NN is a machine, very similar to the human brain. It consists of 'cells' that can be stimulated to get active or stay passive by other cells connected to them. The strength of the connections ('weights') is adaptable, and by changing them the NN can be programmed ('trained') to generate a response (like a move) from a certain input (like an image or a Chess diagram).
In practice the NN is simulated as a virtual machine inside another computer. This means that there is a program that specifies how the cells are connected, keeps track of the current weight of all the connections, and shuttles the activation signals around.
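A minimal sketch of such a simulation (my own toy version, orders of magnitude smaller than any real NN) could look like this in Python:

```python
import random

def make_layer(n_inputs, n_cells):
    # one weight per connection, initially random (no useful knowledge yet)
    return [[random.uniform(-1.0, 1.0) for _ in range(n_inputs)]
            for _ in range(n_cells)]

def activate(layer, inputs):
    outputs = []
    for weights in layer:                        # for each cell in the layer...
        stimulus = sum(w * x for w, x in zip(weights, inputs))
        outputs.append(max(0.0, stimulus))       # ...it fires if stimulated enough
    return outputs

# shuttle a (toy) input signal through a two-layer net:
net = [make_layer(4, 8), make_layer(8, 2)]
signal = [1.0, 0.0, 0.5, 0.25]
for layer in net:
    signal = activate(layer, signal)
```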
CPUs like those found in PCs are not very good at doing the things needed to simulate a NN; most of their transistors are doing other things that are not useful for the NN at all. So Google developed TPUs, which are chips that only do what is needed to simulate a NN. A given NN therefore isn't slowed down as much when running on a TPU as it would be when running on an ordinary CPU. The TPU is programmed to simulate a NN of general capabilities (but adapted in size to the board of the game it will be used for), and to calculate what the output of the NN would be when a given Chess position is presented to the net as input. Initially the NN contains no useful knowledge (all connections have random weights).
Lyudmil Tsvetkov wrote:So, according to you, there is no code at all involved in Alpha?
There is code to simulate the NN, which in principle can be configured to handle an NN of any size, and in practice is programmed to simulate an NN with a design good for learning board games. This code must know the board size and the number of participating piece types.
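For illustration, the way board size and piece types parameterize the input could look something like the sketch below (my own toy encoding; AlphaZero's actual input representation differs in detail, but is likewise built from one plane per piece type per side):

```python
BOARD_SIZE = 8                                  # configurable per game
PIECE_TYPES = ['P', 'N', 'B', 'R', 'Q', 'K']    # configurable per game

def encode_position(piece_at):
    # piece_at((rank, file)) -> (piece_type, side) or None
    planes = []
    for side in ('white', 'black'):
        for ptype in PIECE_TYPES:
            plane = [[0] * BOARD_SIZE for _ in range(BOARD_SIZE)]
            for r in range(BOARD_SIZE):
                for f in range(BOARD_SIZE):
                    if piece_at((r, f)) == (ptype, side):
                        plane[r][f] = 1
            planes.append(plane)
    return planes   # 2 * len(PIECE_TYPES) planes, each BOARD_SIZE x BOARD_SIZE
```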
There is code to perform a Monte-Carlo Tree Search, presenting the positions that the search encounters to the NN, forcing legality upon the moves suggested by the NN in response (i.e. ignoring any illegal suggestions), and then searching the legal moves that it suggests. This code knows the rules for moving pieces, checking, and game end.
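The move-selection step of such a search could look roughly like the following sketch (names like net.suggest and the statistics tables are placeholders of mine; the formula AlphaZero actually uses is similar in spirit but differs in detail):

```python
import math

def select_move(position, net, legal_moves, visits, values, c_puct=1.5):
    priors = net.suggest(position)      # NN's suggested probability per move
    total = sum(visits.get(m, 0) for m in legal_moves)
    best, best_score = None, -math.inf
    for m in legal_moves:               # illegal suggestions never get here,
        n = visits.get(m, 0)            # so they are ignored automatically
        q = values.get(m, 0.0) / n if n else 0.0         # average result so far
        u = c_puct * priors.get(m, 0.0) * math.sqrt(total + 1) / (1 + n)
        if q + u > best_score:
            best, best_score = m, q + u
    return best                         # most promising legal move to search next
```

Note that 'forcing legality' is nothing more than only ever iterating over the legal moves; whatever probability the NN assigned to illegal moves simply never enters the loop.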
Eventually all knowledge about strategy and tactics is in the weights of the NN, which were tuned during self-play according to the programmed rules, on the basis of the programmed recognition of the game result. Without any human touch.
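Schematically, that self-play tuning amounts to something like this (all names are placeholders of mine; the real thing updates the weights by gradient descent over large batches of games):

```python
def self_play_update(net, play_game, learning_rate=0.01):
    # play_game(net) lets the net play itself under the programmed rules and
    # returns the visited positions plus the recognized result (+1, -1 or 0)
    positions, result = play_game(net)
    for pos in positions:
        prediction = net.evaluate(pos)          # what the net expected here
        error = result - prediction             # how wrong it turned out to be
        net.adjust_weights(pos, learning_rate * error)   # pull toward the outcome
    return net
```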