tpetzke wrote: In fact the solution found after 100 generations (each one 8*7 games)
This is only 5,600 games in total. Consider the error bar that each pairing has. You are probably expecting too much given the small number of games that you play. The parameters are still largely influenced by randomness.
You have to play more generations, more games per generation and, depending on the number of values you want to tune, a population size that is big enough. The downside to this is that there is no overnight run.
In my eval tuning I use 1000 generations, about 600 games per generation (an increasing number in later generations) and a population size of 256. It takes 3 to 4 weeks to complete, but then it gets good results.
But it takes more than 600,000 games.
Thomas...
Yes, you're right. This first experiment has a time constraint, because tomorrow the IGT tournament starts in Torino and I have to be there with a playing engine. I will go back to hand-made values for now.
Your numbers are very important to me; they are a good starting point for the next tuning session.
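In outline, a tuning run like yours would look something like the sketch below (all names are hypothetical; only the population size, generation count and game budget are taken from your numbers):

```cpp
#include <algorithm>
#include <random>
#include <vector>

struct Individual {
    std::vector<int> params;   // the evaluation weights being tuned
    double fitness = 0.0;      // accumulated over one generation's games
};

constexpr int kPopulation  = 256;
constexpr int kGenerations = 1000;
constexpr int kGamesPerGen = 600;   // could be raised in later generations

// Placeholder: play one game and update both players' fitness.
void playGame(Individual& a, Individual& b) { /* engine match goes here */ }

int main() {
    std::mt19937 rng(42);
    std::vector<Individual> pop(kPopulation);

    for (int gen = 0; gen < kGenerations; ++gen) {
        for (auto& ind : pop) ind.fitness = 0.0;

        // Spend the generation's game budget on random pairings.
        std::uniform_int_distribution<int> pick(0, kPopulation - 1);
        for (int g = 0; g < kGamesPerGen; ++g)
            playGame(pop[pick(rng)], pop[pick(rng)]);

        // Selection: keep the fitter half, refill with copies of it.
        std::sort(pop.begin(), pop.end(),
                  [](const Individual& x, const Individual& y) {
                      return x.fitness > y.fitness;
                  });
        for (int i = kPopulation / 2; i < kPopulation; ++i)
            pop[i] = pop[i - kPopulation / 2];  // mutation would go here
    }
}
```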
Another problem in my approach is the evaluation of each result. Let's say that player A plays against player B. I stop at move 50, and by then the game may or may not be decided. If not, I run a fast iteration at depth 4 (because of the time constraints mentioned above) on the final position, just to assign a value to the two engines. This value is from White's side, so I add it to the White player (A) and subtract it from the Black player (B).
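Schematically, that scoring step looks like this (a rough sketch; all names are placeholders, not my actual code):

```cpp
struct Position { /* board state of the final position */ };
struct Individual { double fitness = 0.0; /* tuned parameters, etc. */ };

// Placeholder for the fast depth-4 iteration; the score is returned
// from White's point of view, in centipawns.
int evaluateFinalPosition(const Position& pos, int depth) { return 0; }

// Called when the game is stopped at move 50 without a decisive result.
void scoreUnfinishedGame(const Position& finalPos,
                         Individual& white,   // player A
                         Individual& black) { // player B
    int score = evaluateFinalPosition(finalPos, 4);
    white.fitness += score;   // add the White-side score to A
    black.fitness -= score;   // subtract it from B
}
```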
The problems are:
1) the final evaluation is based on player A's parameters and doesn't take player B's point of view into account
2) more generally: maybe it would be better to choose an "external player" as evaluator; maybe another engine, or Satana with hand-made values, for example
Point 1 is not so important, because the parameters tend to be almost equal among the engines in the population.
Point 2 is more complex. If I choose an external evaluator, the whole population would tend to imitate that engine. That's not my goal.
Maybe ignoring the value, or considering only the material in the final position, could be a solution.
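A material-only adjudication, for example, would look roughly like this (a sketch; the piece values and names are just placeholders):

```cpp
#include <array>

// Piece values in centipawns: pawn, knight, bishop, rook, queen.
constexpr std::array<int, 5> kPieceValue = {100, 320, 330, 500, 900};

// Piece counts remaining on the board in the final position.
struct MaterialCount { std::array<int, 5> white{}, black{}; };

// Positive score = White is ahead on material.
int materialScore(const MaterialCount& c) {
    int score = 0;
    for (int p = 0; p < 5; ++p)
        score += kPieceValue[p] * (c.white[p] - c.black[p]);
    return score;
}
```

The appeal is that no tuned parameter from any individual enters the verdict, so the population is not pulled toward anybody's evaluation.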
I would add another point:
3) stopping after an arbitrary number of moves prevents the right parameters from being chosen for endgames or long middlegames; maybe only complete games, or a bigger move limit, would be better
I think this is an unexplored field that will grow in the future. Just talking about yet another way to gain 10 Elo points with an alphabeta-FLDSMDFR algorithm is becoming almost annoying.
