7x7 variants alpha-zero style engines


catugocatugocatugo
Posts: 41
Joined: Thu Feb 16, 2023 12:56 pm
Full name: Florea Aurelian

7x7 variants alpha-zero style engines

Post by catugocatugocatugo »

Hello,
I have posted here before about my ambition to build an engine for my upcoming 10x10 variants. Because of external circumstances I started with two 7x7 variants instead, and it works (mostly).

First, the two games:
Common ground:
the boards are 7x7
there is no queen
the back rank is:
RNBKNBR
the right side bishop is switched with the right side knight so that the two bishops would not be bound to the same color.
the second rank is:
PPPGPPP
G stands for general
the third rank has a pawn ahead of the general.
The Silver Game
The General is a silver general
The Knight moves in its forward half like an orthodox knight, and can also step one square laterally or one square back
The Gold Game
The General is a gold general
The Knight moves in its forward half like an orthodox knight, and can also step one square diagonally back
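
A minimal Python sketch of the two knight variants described above. The (file, rank) coordinate convention, the `forward` sign for each side, and the names `SILVER_KNIGHT`, `GOLD_KNIGHT` and `knight_targets` are illustrative assumptions and are not taken from the actual engine. "One square back" is read here as straight back. The sketch only filters off-board squares; it does not check whether a target is occupied by a friendly piece.

```python
BOARD_SIZE = 7

# Offsets are (d_file, d_rank) from the mover's point of view; +rank is "forward".
FORWARD_KNIGHT = [(-2, 1), (-1, 2), (1, 2), (2, 1)]

# Silver Game: forward knight jumps, one step sideways, or one step straight back.
SILVER_KNIGHT = FORWARD_KNIGHT + [(-1, 0), (1, 0), (0, -1)]

# Gold Game: forward knight jumps, or one step diagonally back.
GOLD_KNIGHT = FORWARD_KNIGHT + [(-1, -1), (1, -1)]


def knight_targets(file, rank, offsets, forward=1):
    """Return the on-board target squares for a knight on (file, rank).

    forward is +1 for the side moving up the board and -1 for the other side
    (an assumed convention).
    """
    targets = []
    for d_file, d_rank in offsets:
        f, r = file + d_file, rank + d_rank * forward
        if 0 <= f < BOARD_SIZE and 0 <= r < BOARD_SIZE:
            targets.append((f, r))
    return targets
```

For example, `knight_targets(3, 3, SILVER_KNIGHT)` lists seven squares for a knight in the center of the board, and the same call with `GOLD_KNIGHT` lists six.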
More common ground:
To win, a player must capture or bare the enemy king. The victory is immediate (no counter-baring).
The first repetition of a position is a draw. There is a 35-move rule (akin to the 50-move rule). Stalemate is a loss for the stalemated player. Also, for practical purposes, a game that lasts 200 turns is a draw.
There is no pawn double move, hence no en passant. There is no castling.
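
The end-of-game rules above could be checked roughly as in the following sketch. Several details are assumptions, because the post does not spell them out: the 35-move rule is taken to count plies without a capture or pawn move (70 plies), the 200-turn cap is taken to mean 200 full moves (400 plies), and the function name, arguments and piece-string representation are hypothetical.

```python
def game_result(pieces, side_to_move, has_legal_move, position_seen_before,
                quiet_plies, total_plies,
                quiet_limit_plies=70, max_plies=400):
    """Return 'white', 'black', 'draw', or None if the game goes on.

    pieces maps 'white'/'black' to a string of piece letters still on the
    board, e.g. {"white": "KRNP", "black": "KP"} (hypothetical format).
    The check runs right after a move, so only the side to move can have
    just lost its king or been bared.
    """
    opponent = "black" if side_to_move == "white" else "white"
    own = pieces[side_to_move]

    # King captured or bared by the last move: immediate loss, no counter-baring.
    if "K" not in own or own == "K":
        return opponent

    # Stalemate is a loss for the stalemated player.
    if not has_legal_move:
        return opponent

    # First repetition, 35-move rule, and the practical length cap are draws.
    if position_seen_before:
        return "draw"
    if quiet_plies >= quiet_limit_plies or total_plies >= max_plies:
        return "draw"
    return None
```

Because there is no checking rule, `has_legal_move` here simply means that the side to move has at least one move available, including moves that expose its own king.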

For everything that follows, Gemini 3.1 has helped at every step.

First, for each of the two games I created a database of 2500 games played by a search 3 moves deep with no pruning. Then I trained a simple MLP on them. My original plan was to build an AlphaZero-style engine for the games, but I was also curious about RBF neural networks, as nobody seems to have used those, so I tried one. Because of the still pretty large number of moves that need to be checked, I changed the activation function from a Gaussian to a triangle function. That did not work, as there were no intersections between different centers. So I used a "piecewise triangle" (i.e. a triangle on top of a trapezoid, with the trapezoid having a very large difference between its bases).

I then also built two MLPs, a bit larger than before: a deeper MLP, and an MLP that is one layer shallower but has double the neurons in the first hidden layer. I decided to pit the 3 engines against each other in a round robin so they can learn from each other. The round robin was meant to contain 3 champions (the old versions) and 3 challengers (the versions trained on the latest games), for a total of 6 players playing a sextuple round robin (90 games in total). Whenever a challenger placed above the champion of its type, it was supposed to replace it. But due to a misunderstanding the champions ended up being trained for 3 more epochs from their current weights, and the challengers were retrained from scratch for ten epochs. That has proven to be a very good strategy.

RBF has proved the best network, probably due to its stability. I think I still have some small bugs. Also, the games were not timed; I'll do that later. The good thing is that the engines seem to improve (challengers win more often than not). I am very excited by this, even if it is more of a learning tool for me for upcoming engines.
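
One possible reading of the "triangle on a trapezoid" activation is sketched below in plain Python. The exact shape used in the engine is not given in the post, so the breakpoints `a` and `b`, the plateau height `h`, and the function names are all assumptions. The idea shown is that a narrow triangle gives a sharp peak near the center, while the wide, low trapezoid underneath keeps the response non-zero far away, so that the supports of different centers overlap.

```python
import math


def tri_on_trapezoid(d, a=1.0, b=20.0, h=0.2):
    """Piecewise-linear stand-in for a Gaussian, as a function of distance d.

    0 <= d <= a : steep triangle, falling from 1 at the center to h at d = a
    a <  d <  b : long shallow slope of the trapezoid, falling from h to 0
    d >= b      : zero
    All three shape parameters are illustrative, not the engine's values.
    """
    d = abs(d)
    if d <= a:
        return 1.0 - (1.0 - h) * d / a
    if d < b:
        return h * (b - d) / (b - a)
    return 0.0


def rbf_layer(x, centers, **shape):
    """Activation of every center for input vector x (Euclidean distance)."""
    return [tri_on_trapezoid(math.dist(x, c), **shape) for c in centers]
```

With these example parameters a plain triangle of half-width `a` would return 0 for any input farther than 1.0 from a center, whereas the version above still returns a small positive value out to a distance of 20.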
hgm
Posts: 28503
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: 7x7 variants alpha-zero style engines

Post by hgm »

You describe the goal as King capture rather than checkmate, which implies there is no checking rule. That means you could bare your opponent through a move that exposes your King. You say there is no exception for counter baring, but is there also no rule for 'counter King capture'?
catugocatugocatugo
Posts: 41
Joined: Thu Feb 16, 2023 12:56 pm
Full name: Florea Aurelian

Re: 7x7 variants alpha-zero style engines

Post by catugocatugocatugo »

No. The reason was to keep the game as simple as possible. But this has actually given me a lot of headaches, because once random elements were introduced, kings started wandering coolly into check.
catugocatugocatugo
Posts: 41
Joined: Thu Feb 16, 2023 12:56 pm
Full name: Florea Aurelian

Re: 7x7 variants alpha-zero style engines

Post by catugocatugocatugo »

I have been at this project for a while now, and things haven't worked that well. First, I had a bug in my depth-3 position generator: the king was wandering into check very happily, and even stayed there. It seems that came from the random move generator. Gemini had proposed choosing a random move among the top 3, which was a noob mistake. We solved this by forcing the engine to be deterministic when the gap between the top two moves is 0.1 or more. In addition, the moves are now chosen from a Boltzmann distribution. So, in the end I generated 10000 games for both Silver and Gold at depth 3. The evaluation function was very basic, including only piece values and a small bonus for advancing pawns. This is intended to teach the very basics to the neural architectures.
But even though the games produced that way were "good" (i.e. no laughable moves), the 3 NN architectures I use are still leaving pieces en prise, sometimes even the king. This probably happens because there are not enough such losing positions in the training data. And this is where I am today.
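
A minimal sketch of the move selection described above: play the best move outright when it leads the runner-up by 0.1 or more, otherwise sample from a Boltzmann (softmax) distribution over the scores. How the two rules are combined in the real generator is an assumption, as are the `temperature` value, the score units, and the function name.

```python
import math
import random


def pick_move(scored_moves, temperature=0.5, gap=0.1, rng=random):
    """Pick a move from a list of (move, score) pairs; higher score is better.

    temperature is a hypothetical parameter; the post does not give one.
    """
    if not scored_moves:
        raise ValueError("no moves to choose from")
    ranked = sorted(scored_moves, key=lambda ms: ms[1], reverse=True)

    # Clear favourite (or only move): be deterministic.
    if len(ranked) == 1 or ranked[0][1] - ranked[1][1] >= gap:
        return ranked[0][0]

    # Otherwise sample with probability proportional to exp(score / T).
    # Subtracting the best score keeps the exponentials numerically stable.
    best = ranked[0][1]
    weights = [math.exp((score - best) / temperature) for _, score in ranked]
    moves = [move for move, _ in ranked]
    return rng.choices(moves, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes generated games reproducible, which helps when hunting for bugs like the wandering king.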
Some technicalities:
The 3 architectures I mentioned above are:
An MLP, which I named MLPDeep, with 512, 256, 128 and 64 neurons in the hidden layers. ReLU activation function.
An MLP, which I named MLPWide, with 1024, 256 and 64 neurons in the hidden layers. ReLU activation function.
An RBF network, which I named RBF, with 128 centers and a triangle-above-a-trapezoid activation function that replaced the Gaussian for speed reasons.
Each of them has both a champion and a challenger set of weights.
The six NNs play a multiple round-robin tournament.
The champion is replaced by its challenger should it be surpassed by the challenger in the final classification. Then the champion is trained for 10 epochs at a 0.0003 learning rate. The challenger's weights are perturbed by 20%, and then it is trained for 20 epochs at a 0.0006 learning rate, should the generation number not be divisible by 5. Otherwise it is trained for 50 generations from scratch at a learning rate of 0.001.
For fairness (so that they move in approximately the same time) the 3 architectures examine a fixed number of positions: 1000 for the RBF, 1500 for the deep and 1600 for the wide architecture.
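
The tournament and retraining schedule above can be summarized in a short Python sketch. The function and dictionary names are hypothetical, alternating colors between cycles is an assumption, and the "50 generations from scratch" is read here as 50 epochs. Six cycles matches the 90 games mentioned in the first post.

```python
from itertools import combinations

# Positions examined by each architecture, as listed in the post.
NODE_BUDGET = {"RBF": 1000, "MLPDeep": 1500, "MLPWide": 1600}


def round_robin(players, cycles):
    """Every pair meets once per cycle; colors swap on odd cycles (assumed)."""
    games = []
    for cycle in range(cycles):
        for a, b in combinations(players, 2):
            games.append((a, b) if cycle % 2 == 0 else (b, a))
    return games


def challenger_promoted(champion_rank, challenger_rank):
    """Rank 1 is best; the challenger takes over if it finished higher."""
    return challenger_rank < champion_rank


def training_plan(role, generation):
    """Retraining settings for one network after a tournament."""
    if role == "champion":
        return {"epochs": 10, "lr": 0.0003, "from_scratch": False, "perturb": 0.0}
    if generation % 5 != 0:
        # Challenger: perturb weights by 20%, then continue training.
        return {"epochs": 20, "lr": 0.0006, "from_scratch": False, "perturb": 0.20}
    # Every 5th generation the challenger restarts from scratch
    # (50 epochs is an assumed reading of "50 generations").
    return {"epochs": 50, "lr": 0.001, "from_scratch": True, "perturb": 0.0}


players = [f"{arch}-{role}" for arch in NODE_BUDGET for role in ("champion", "challenger")]
schedule = round_robin(players, cycles=6)
```

With six players there are 15 distinct pairings, so `len(schedule)` is 90 for six cycles.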