A few weeks ago I started work on a new engine called "Leonidas". Leonidas is basically a stripped-down and cleaned-up version of my variant engine Sjaak, and like Sjaak it plays chess variants. Specifically, it plays Spartan Chess and some variations thereof (Double Spartan, Reverse Spartan, and Double Persian, a.k.a. normal Chess). I may expand the list of variants to Berolina (almost working already) and possibly Seirawan. For the moment, I'm focussing on Spartan.
While writing the evaluation function, I find myself lifting bits and pieces from both Sjaak and Jazz (my normal chess engine), which has the unfortunate effect of making the evaluations very similar. I don't like that too much: I want the programs to be different enough that I can have them play against each other without feeling I'm watching a self-play match, and I'm having a bit of difficulty achieving that.
Right now, Sjaak has somewhat more sophisticated code for dealing with different pawn structures and a slightly different way of doing mobility. Leonidas has more elaborate material-imbalance evaluation during the middle game (although it's still very crude), while Sjaak has some extra end-game terms. King safety is handled almost identically in the two programs, and they are about equally strong right now.
To people who have been in the situation of working on two different programs that play the same variants: short of forgetting everything you did before and reinventing everything from scratch, what is the easiest way to make the two programs different from each other? Put differently: which evaluation features determine the playing style of the program? Piece-square tables? Handling of reductions/extensions? Evaluation of pawn structure?
The main reason I want the programs to be different is that I want to get a better idea of where the holes in the evaluation function are. These are harder to find with self-play, because the program will not recognise an opportunity when it presents itself. It also will not defend against it, so once a hole is found, taking advantage of it is not that hard.
For normal chess I'd just use a selection of different engines, but for chess variants there aren't that many.
How to make evaluation different for new program
Moderators: hgm, Dann Corbit, Harvey Williamson
Evert
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Ferdy
- Posts: 4833
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: How to make evaluation different for new program
List all the eval features that you know, plus those suggested by others. Implement them all, then enable and disable them in each program such that no two programs are the same. You can also vary the bonus and penalty values.
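To make this concrete, here is a minimal C sketch of the idea. It is my own illustration, not code from any engine mentioned in this thread, and all feature names and weights are made up: every eval term gets an on/off switch and a tunable weight, so two builds of the same code base can be configured into genuinely different players.
Code:
#include <stdbool.h>

/* One entry per evaluation feature: a switch plus a tunable weight. */
typedef struct {
    bool enabled;   /* feature on/off */
    int  weight;    /* centipawn bonus/penalty; can differ per program */
} EvalTerm;

/* Hypothetical feature set; the names are illustrative only. */
enum { FT_BISHOP_PAIR, FT_PASSED_PAWN, FT_KING_SHIELD, FT_MOBILITY, FT_COUNT };

static const EvalTerm program_a[FT_COUNT] = {
    [FT_BISHOP_PAIR] = { true,  40 },
    [FT_PASSED_PAWN] = { true,  20 },
    [FT_KING_SHIELD] = { true,  15 },
    [FT_MOBILITY]    = { false,  0 },  /* disabled in program A */
};

static const EvalTerm program_b[FT_COUNT] = {
    [FT_BISHOP_PAIR] = { false,  0 },  /* disabled in program B */
    [FT_PASSED_PAWN] = { true,  35 },  /* same feature, different weight */
    [FT_KING_SHIELD] = { true,  10 },
    [FT_MOBILITY]    = { true,   5 },
};

/* raw[f] holds the unweighted count of feature f for the side to move. */
static int evaluate(const EvalTerm terms[], const int raw[])
{
    int score = 0;
    for (int f = 0; f < FT_COUNT; f++)
        if (terms[f].enabled)
            score += terms[f].weight * raw[f];
    return score;
}
With this layout, flipping a few table entries is enough to get two distinguishable players out of one engine.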
Antonio Torrecillas
- Posts: 90
- Joined: Sun Nov 02, 2008 4:43 pm
- Location: Barcelona
Re: How to make evaluation different for new program
Since there have not been many contributions to your interesting questions, here are my two cents.
In my case, I have a conventional engine and an experimental one where everything is quite different: search algorithm, evaluation, tuning system. So their behavior and strength are totally different.
I could not tell you which sections of the eval define the style of play; even finding out which sections carry the most weight in the evaluation took some thought.
My method is to evaluate a set of positions, breaking the score down into each section plus the total. I then compute the correlation between each section and the total value; the best correlation marks the dominant section of the evaluation (Material, I hope). Then I repeat the process combining Material with another section, pick the next section, and so on (a small code sketch of this procedure follows the results below).
Here are some results as examples:
Code:
Simplex:
Opening = Material + Development + Pawn + Pst + Rooks + KingSafety + Knights + Bishops + Queens + PawnCoverage.
Endings = Material + Pawns + Pst + Rooks + Knights + Bishops + Queens + PawnCoverage.
Fruit 2.1:
Opening = Material + KingSafety + Pst + piece + passer + pawn + patterns.
Endings = Material + passer + piece + pawn + pst + patterns.
Glaurung 1.2.1:
Opening = Material + Passed Pawn + King Safety + misc + pst + shield + pawn + piece placement + development + storm.
Ending = Material + Passed pawn + mobility + pawn + king activity + piece placement + pst.
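For what it's worth, here is a minimal sketch of the correlation procedure above, assuming the engine can report a per-section breakdown of its evaluation for each test position. This is my reconstruction rather than the exact code used for the tables above; the section list and array sizes are illustrative.
Code:
#include <math.h>
#include <stdio.h>

#define N_SECTIONS 4
#define MAX_POS    1000

static const char *section_name[N_SECTIONS] =
    { "Material", "Pst", "KingSafety", "PawnStructure" };

/* Pearson correlation between x[] and y[] over n samples. */
static double correlation(const double *x, const double *y, int n)
{
    double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
    for (int i = 0; i < n; i++) {
        sx  += x[i];        sy  += y[i];
        sxx += x[i] * x[i]; syy += y[i] * y[i];
        sxy += x[i] * y[i];
    }
    return (sxy - sx * sy / n) /
           sqrt((sxx - sx * sx / n) * (syy - sy * sy / n));
}

/* section[s][i] is the score of section s in position i, total[i] the
   full evaluation.  The section correlating best with the total is the
   dominant one; to continue the study, correlate (dominant + each
   remaining section) against the total and keep the best combination. */
static void rank_sections(double section[N_SECTIONS][MAX_POS],
                          const double *total, int n)
{
    for (int s = 0; s < N_SECTIONS; s++)
        printf("%-14s r = %+.3f\n",
               section_name[s], correlation(section[s], total, n));
}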
This study helped me to arrange the sections for a more effective lazy eval.
On occasion I have used the following methods to find holes. For a weak engine, simply evaluate a set of positions with a strong enough engine as an oracle. Compare its scores with our evaluation and extract the positions that we think are won (> 150cp) but the oracle says are lost (< -150cp), or vice versa; just looking at the top 10 should be sufficient to identify a hole. Watching games, we can also see whether certain configurations appear more frequently in lost games.
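A rough sketch of the oracle comparison, again my own illustration with made-up names: both scores are in centipawns from the same point of view, and we pull out the positions where the two evaluations sit on opposite sides of the 150cp threshold.
Code:
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    const char *fen;  /* the position */
    int ours;         /* our (weak) engine's evaluation, in cp */
    int oracle;       /* the strong engine's score, in cp */
} Sample;

/* Size of the disagreement if we call the position won (> 150cp) while
   the oracle calls it lost (< -150cp), or vice versa; 0 otherwise. */
static int disagreement(const Sample *s)
{
    if ((s->ours >  150 && s->oracle < -150) ||
        (s->ours < -150 && s->oracle >  150))
        return abs(s->ours - s->oracle);
    return 0;
}

static int by_gap_desc(const void *a, const void *b)
{
    return disagreement((const Sample *)b) - disagreement((const Sample *)a);
}

/* Print the top 10 disagreements for manual inspection. */
static void report_holes(Sample *set, int n)
{
    qsort(set, n, sizeof *set, by_gap_desc);
    for (int i = 0; i < n && i < 10; i++)
        if (disagreement(&set[i]) > 0)
            printf("%s  ours=%+d  oracle=%+d\n",
                   set[i].fen, set[i].ours, set[i].oracle);
}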
As for self-play, it can be useful for telling how much knowledge a new feature adds, or whether it is already covered by other sections of the evaluation. To put it simply, take an example: we develop a new feature, the bishop pair, and assign it a value of +10; in another variant of the same program we assign -10. Since both programs steer towards positions with the bishop pair on one side, most of the games arrive at such positions. The two being the same program otherwise, nothing can influence the outcome except the knowledge introduced. A match will tell us whether this knowledge is enough to win, or whether we need additional knowledge to swing the result.
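The whole experiment can be as small as one conditional constant. Something like this hypothetical fragment, with the rest of the engine left untouched, gives the two "variants" to match against each other:
Code:
/* Two builds of the same engine differing only in this term:
   cc -DVARIANT_PRO_PAIR ...  versus a plain  cc ...          */
#ifdef VARIANT_PRO_PAIR
#define BISHOP_PAIR_BONUS  (+10)  /* this build seeks the bishop pair */
#else
#define BISHOP_PAIR_BONUS  (-10)  /* this build avoids it */
#endif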