Engine Testing with Various Openings

Cheney · Post by **Cheney** » Fri Jul 07, 2017 8:10 pm

Hi

I have been working on improving my engine's evaluation and noticed something related to using different openings.

I have been using an opening file with a mixture of openings that are 8 moves or 12 moves deep, pulled from Kirr's opening chess samples, and testing with 500 games of each. I would tune the eval parameters until the newer version was at least 20 Elo stronger than the previous version.

Recently, I decided to download the entire PGN suite of 6, 8, and 12 moves and play a few thousand games from each opening suite...

The 6-move openings: the newer version's winning percentage/Elo increase matches my original tests, if not better.

The 8-move and 12-move openings: the engines break even, or the newer one has a few extra losses (< 5 Elo loss), but I chalk this up to possible randomness.
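To judge whether a difference of a few Elo is within noise, the error margin can be estimated from the win/draw/loss counts. The sketch below is only an illustration, not something taken from this thread: it assumes the usual logistic Elo model and a normal approximation with independent games, and the function names are hypothetical.

```python
import math


def elo_from_score(score):
    """Elo difference implied by a score fraction in (0, 1), logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for a match result.

    Assumption: games are independent and the mean score is roughly
    normal, so z=1.96 gives an approximate 95% interval.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Variance of a single game result (1, 0.5 or 0) around the mean score.
    var = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / n
    se = math.sqrt(var / n)
    # Clamp so the logarithm stays defined for very lopsided results.
    eps = 1e-9
    low = min(max(score - z * se, eps), 1.0 - eps)
    high = min(max(score + z * se, eps), 1.0 - eps)
    return elo_from_score(score), elo_from_score(low), elo_from_score(high)


if __name__ == "__main__":
    elo, low, high = elo_with_margin(300, 400, 300)
    print(f"Elo {elo:+.1f}, interval [{low:+.1f}, {high:+.1f}]")
```

With 300 wins, 400 draws and 300 losses, for instance, the margin comes out at roughly ±17 Elo, so a difference below 5 Elo cannot be told apart from noise at that sample size.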

I am not really sure what to think about this.

* I could conclude that the newer engine is just not stronger with deeper opening lines, but then how is the older version stronger?

* I checked the opening lines to make sure there were no duplicates (there are duplicate ECO codes, but all openings are unique to the last move).

* Maybe a position that deep into an opening is more strategic than materialistic, or just gives more of a positional advantage to a given side. I thought this because I repeat each opening, giving each engine a turn on each side, and it seems the same side wins in the repeated opening.

I suspect it is the last, but if any of you have any thoughts or ideas on this, I'd appreciate it.
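One way to check the last suspicion is to count how often the same color wins both games of a repeated opening. The following is a rough sketch under assumptions of my own: results are given as PGN result strings from White's point of view, one pair per opening, and the function names are hypothetical.

```python
from collections import Counter


def classify_pair(first, second):
    """Classify one opening played twice with the engines swapping colors.

    Both arguments are PGN result strings ("1-0", "0-1", "1/2-1/2").
    """
    if first == second and first in ("1-0", "0-1"):
        # The same color won twice, so each engine won once:
        # the opening itself probably favors that side.
        return "same_color_won_both"
    if {first, second} == {"1-0", "0-1"}:
        # Different colors won, so one engine won both games.
        return "same_engine_won_both"
    return "other"


def summarize_pairs(pairs):
    """Return the fraction of pairs falling into each category."""
    counts = Counter(classify_pair(a, b) for a, b in pairs)
    total = len(pairs)
    keys = ("same_color_won_both", "same_engine_won_both", "other")
    return {key: counts[key] / total for key in keys}


if __name__ == "__main__":
    sample = [("1-0", "1-0"), ("1-0", "0-1"), ("1/2-1/2", "1-0"), ("0-1", "0-1")]
    print(summarize_pairs(sample))
```

A high share of `same_color_won_both` pairs would suggest that the opening decides many games regardless of which engine plays it, which leaves less room for an eval change to show up in the score.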

Thank you!
Cheney

Ferdy · Post by **Ferdy** » Sat Jul 08, 2017 12:56 am

Engine tuning has limits: you may tune to a number of positions, but there are positions where your tuned values will not shine. Possibly this is over-tuning to certain positions.

Cheney · Post by **Cheney** » Mon Jul 10, 2017 2:23 pm

Thank you! I do feel I am at the point where tuning a particular value might overtune, and maybe I need to tune all values together.

I have been checking out some of the tuning methods out there but have not started working on one yet.

As for openings to test from, I have read various links and all say to test with thousands of games; however, I do not recall seeing a reference saying how deep a set of openings should be. Should I always use a set of 12-deep opening lines, deeper maybe, or mix them up? For example, 1000 games with a book of 4-deep openings, 1000 at 6 depth, and so on?

Thanks again

cdani · Post by **cdani** » Mon Jul 10, 2017 10:27 pm

Cheney wrote: Thank you! I do feel I am at the point where tuning a particular value might overtune, and maybe I need to tune all values together.

I have been checking out some of the tuning methods out there but have not started working on one yet.

As for openings to test from, I have read various links and all say to test with thousands of games; however, I do not recall seeing a reference saying how deep a set of openings should be. Should I always use a set of 12-deep opening lines, deeper maybe, or mix them up? For example, 1000 games with a book of 4-deep openings, 1000 at 6 depth, and so on?

Thanks again

I prefer to use shallow lines, to let the engine play on its own, and at the same time to have a lot of starting positions (thousands or tens of thousands), so that the positions are very varied.

Cheney · Post by **Cheney** » Tue Jul 11, 2017 12:39 am

Thank you

I have another question or two on this topic. Let's say I have 2000 opening lines. Is it important to play those 2000 lines and then repeat the test a few times?

I have tried this, and I notice the results are very consistent, with maybe +/- 3 in the win/loss column, and when this happens it happens only once in five iterations of the 2000-game test.

So if I get more lines, like you mention, up to 10,000, should I repeat that test set even when the consistency seems fairly solid?

Dirt · Post by **Dirt** » Tue Jul 11, 2017 3:55 am

Cheney wrote: So if I get more lines, like you mention, up to 10,000, should I repeat that test set even when the consistency seems fairly solid?

There is never a good reason to reuse an opening with an unchanged engine. Your 10,000 will give you 20,000 unique games, which should be enough, but if you need more then make them.

Of course, if you are testing changes that mostly affect the end game you should use a different set of positions that start near where you expect to see a difference.
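The arithmetic behind "10,000 openings give 20,000 unique games" is simply that each opening is played once per color assignment. A minimal sketch of such a schedule follows; the engine labels and the function name are placeholders, not options of any real tournament manager.

```python
def build_schedule(openings, engine_a="new", engine_b="old"):
    """Return (opening, white, black) tuples, each opening used exactly twice.

    Every opening is played once with engine_a as White and once with
    the colors reversed, so no (opening, color assignment) repeats.
    """
    schedule = []
    for opening in openings:
        schedule.append((opening, engine_a, engine_b))  # engine_a has White
        schedule.append((opening, engine_b, engine_a))  # colors reversed
    return schedule


if __name__ == "__main__":
    games = build_schedule([f"opening_{i}" for i in range(10000)])
    print(len(games))
```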

Ferdy · Post by **Ferdy** » Tue Jul 11, 2017 6:24 am

Cheney wrote: As for openings to test from, I have read various links and all say to test with thousands of games; however, I do not recall seeing a reference saying how deep a set of openings should be. Should I always use a set of 12-deep opening lines, deeper maybe, or mix them up? For example, 1000 games with a book of 4-deep openings, 1000 at 6 depth, and so on?

I use varying depth; it is not efficient, but it should work for thousands of positions. To be efficient, the opening suite should be carefully selected so that:

1. You avoid end-position duplicates (normally done, of course).

2. You avoid positions of the same type, i.e. the same or almost the same pawn structure, the same number of pieces and almost the same piece locations, which may transpose to the same positions later in the game. These positions only waste testing time. For a long time I have been thinking about automating this selection process but have not succeeded yet. It is not really the number of plies deep that counts but the uniqueness of the positions, such that each position may lead to a different path or goal: one line may aim for a passer advantage, one for increased mobility, one for a king attack, and so on for other themes. Obviously we also need a good player here to identify the different setups.

3. The approach I am currently working on is extracting interesting positions from games of top engines played at long TC with more cores. One example is this:
http://www.talkchess.com/forum/viewtopi ... ew=threads
These are typically mid-phase and early end-phase positions. I believe positions reached from these games (used as starting positions) would be beneficial to lower-rated engines. Perhaps lower-rated engines will not be able to reach such high-quality positions if we just use common positions 12 plies deep or so from the starting position. Exposing lower-rated engines to these positions may help the author improve the evaluation tuning. The result of such games could be just a draw, but there are a lot of interesting positions here that can be collected.
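Points 1 and 2 can be partly automated on FEN strings alone. The sketch below is a rough illustration and not the selection process described above: it drops exact end-position duplicates and then caps how many positions share an identical pawn structure. The function names and the `max_per_structure` parameter are hypothetical, and "almost the same" structures or piece placements are not detected.

```python
from collections import defaultdict


def position_key(fen):
    """First four FEN fields: placement, side to move, castling, en passant.

    The move counters are dropped so transpositions compare equal.
    """
    return " ".join(fen.split()[:4])


def pawn_signature(fen):
    """Board placement with every non-pawn piece blanked out."""
    rows = []
    for rank in fen.split()[0].split("/"):
        row = []
        for ch in rank:
            if ch.isdigit():
                row.extend("." * int(ch))  # run of empty squares
            elif ch in "Pp":
                row.append(ch)             # keep pawns
            else:
                row.append(".")            # hide other pieces
        rows.append("".join(row))
    return "/".join(rows)


def filter_openings(fens, max_per_structure=1):
    """Keep unique positions, at most max_per_structure per pawn structure."""
    seen = set()
    per_structure = defaultdict(int)
    kept = []
    for fen in fens:
        key = position_key(fen)
        if key in seen:
            continue  # exact end-position duplicate
        seen.add(key)
        signature = pawn_signature(fen)
        if per_structure[signature] >= max_per_structure:
            continue  # this pawn structure is already represented
        per_structure[signature] += 1
        kept.append(fen)
    return kept
```

An exact pawn-structure match is a crude stand-in for "same typical position"; judging whether two setups lead to the same themes would still need the kind of human selection mentioned in point 2.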

hgm · Post by **hgm** » Tue Jul 11, 2017 7:34 am

Dirt wrote: There is never a good reason to reuse an opening with an unchanged engine.

You can when the engine randomizes. Or when its opponent randomizes.

Dirt · Post by **Dirt** » Tue Jul 11, 2017 7:46 am

hgm wrote:
Dirt wrote: There is never a good reason to reuse an opening with an unchanged engine.
You can when the engine randomizes. Or when its opponent randomizes.

You can, and that's much better than nothing, but using unique opening positions is better still and easier.
