CLOP on Stockfish

gladius · Post by **gladius** » Sat Mar 10, 2012 3:07 pm

I've been working on the ThreatAttacks evaluation in Stockfish, trying to come up with an improvement. So, I decided to try CLOP out.

I CLOPtomized the 5 parameters (code diff: https://github.com/glinscott/Stockfish/ ... te_threats). After 7500 games, CLOP was reporting excellent numbers: a weighted Elo of 28, with a 95% LCB of 19.

The Max tab was showing a Mean Elo of 21 though, with 95% LCB of -3. There was no Max found.

Running 1500 games at 40/2sec+0.01 against base Stockfish, ended up at -3 ELO. So, it's close to being good, but not an improvement yet.

Any thoughts on where I could be going wrong with my CLOP usage? Just not enough games in CLOP land? I tested CLOP at 40/1sec initially, and the values I got tested at about -30 Elo. So it's definitely doing better at 40/2sec, but it's still not beating base Stockfish.

mcostalba · Post by **mcostalba** » Sat Mar 10, 2012 3:51 pm

I am very interested in your tests with CLOP, I have looked at your patch and here are some quick comments:

1) 7500 games is definitely too low a number for tuning 5 parameters.

2) You have not only added tuning code but also changed the semantics, unifying pawn threats with other pieces' threats and implementing the MultipleThreatBonus idea. I'd suggest a more step-by-step approach. First rewrite the code the way you like while preserving the same functionality (verify with ./stockfish bench). This is very important so that you don't go crazy following multiple patterns at the same time; remember that you are chasing a 5 Elo increase at most, so it is mandatory to keep the external noise level to a minimum. Then, once you have added your tuning code and are sure the functionality is the same, start tuning, but I'd suggest starting with 2-3 parameters at most or, OTOH, running much longer, say 30K-40K games.

3) If instead you want to test new ideas, I'd suggest testing them one by one.

4) CLOP works no miracles; tuning is hard and requires a lot of self-discipline, and given that SF is a mature engine it is even harder. But don't give up: efforts will be rewarded in the long term. You just need a lot of "patience and perseverance", which could well be the chess engine developer's motto.
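
The bench check from point 2 could be scripted along these lines. This is only a sketch: it assumes the text printed by `./stockfish bench` contains a line of the form `Nodes searched : N`, and the function names are made up for illustration. You would capture the output of the base and the refactored binary yourself and pass both strings in.

```python
import re


def bench_nodes(bench_output: str) -> int:
    """Extract the total node count from captured bench output.

    Assumes a line like 'Nodes searched  : 123456' is present.
    """
    match = re.search(r"Nodes searched\s*:\s*(\d+)", bench_output)
    if match is None:
        raise ValueError("no 'Nodes searched' line found in bench output")
    return int(match.group(1))


def same_functionality(base_output: str, patched_output: str) -> bool:
    """True when both builds searched exactly the same number of nodes."""
    return bench_nodes(base_output) == bench_nodes(patched_output)
```

An identical node count does not prove the two builds are equivalent, but a different one shows immediately that the refactoring changed behaviour.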

gladius · Post by **gladius** » Sat Mar 10, 2012 4:40 pm

Thanks Marco! I agree that the patch is attempting to take on too much in one go. It was my first attempt with CLOP, so more of an experiment.

Verifying with bench was indeed very important. With all parameters initially set to 100, the result was the same as the default. And CLOP did cluster the values around 100, so it seems to be working well.

I'll try breaking it up into smaller chunks and seeing how it goes.

Rémi Coulom · Post by **Rémi Coulom** » Sat Mar 10, 2012 4:41 pm

The win rates CLOP reports are biased towards optimistic values. So you should expect that the strength of the program you get at the end is inferior to what CLOP reports.

If you wish to optimize for small improvements (a few Elo points), then you need to play many more games.

Rémi
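
To put rough numbers on "many more games": the sketch below estimates the 95% error margin, in Elo, of a plain head-to-head match. It is not how CLOP computes its own estimates. It uses a normal approximation and the usual logistic Elo formula, and the 40% draw ratio is an assumption picked for illustration, not a figure from this thread.

```python
import math


def elo_from_score(score: float) -> float:
    """Convert an expected score in (0, 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_margin_95(games: int, draw_ratio: float, score: float = 0.5) -> float:
    """Approximate half-width, in Elo, of a 95% confidence interval."""
    win_ratio = score - draw_ratio / 2.0
    # Each game scores 1, 0.5 or 0; variance = E[x^2] - E[x]^2.
    variance = win_ratio + 0.25 * draw_ratio - score ** 2
    stderr = math.sqrt(variance / games)
    low = elo_from_score(score - 1.96 * stderr)
    high = elo_from_score(score + 1.96 * stderr)
    return (high - low) / 2.0


# Assumed draw ratio of 0.4; game counts taken from the discussion above.
for n in (1500, 7500, 40000):
    print(n, round(elo_margin_95(n, draw_ratio=0.4), 1))
```

Under these assumptions a 1500-game match has a margin of roughly ±14 Elo, so a measured -3 Elo cannot be told apart from a small gain, while 40000 games bring the margin below ±3 Elo.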

gladius · Post by **gladius** » Sat Mar 10, 2012 7:42 pm

Rémi Coulom wrote:The win rates CLOP reports are biased towards optimistic values. So you should expect that the strength of the program you get at the end is inferior to what CLOP reports.

If you wish to optimize for small improvements (a few Elo points), then you need to play many more games.

Rémi

Thanks Rémi, good to know that the Elo is a bit optimistic. CLOP is a fantastic tool btw! Thanks for releasing it for us all to play with.

gladius · Post by **gladius** » Mon Mar 12, 2012 8:26 pm

Rémi Coulom wrote:The win rates CLOP reports are biased towards optimistic values. So you should expect that the strength of the program you get at the end is inferior to what CLOP reports.

If you wish to optimize for small improvements (a few Elo points), then you need to play many more games.

Rémi

A question regarding the opening book to use for CLOP. Do you think it's better to use different neutral opening positions (based on CLOP seed), or just start from the start position?

I was using fixed depth testing with the clop-cutechess adapter (which is awesome, thanks Ilari!), and with only small changes to the parameters, was ending up with the same game a lot of the time. So, I switched it to use a different opening position for each trial. Not sure yet how it's working out.

mcostalba · Post by **mcostalba** » Mon Mar 12, 2012 8:34 pm

gladius wrote: A question regarding the opening book to use for CLOP. Do you think it's better to use different neutral opening positions (based on CLOP seed), or just start from the start position?

We use varied.bin or performance.bin

http://wbec-ridderkerk.nl/html/download.htm

But we never made a serious opening book test: different tournaments by same players with different opening books / start positions (or even without any book) and verifying the results were consistent.

Rémi Coulom · Post by **Rémi Coulom** » Mon Mar 12, 2012 11:03 pm

gladius wrote:A question regarding the opening book to use for CLOP. Do you think it's better to use different neutral opening positions (based on CLOP seed), or just start from the start position?

I was using fixed depth testing with the clop-cutechess adapter (which is awesome, thanks Ilari!), and with only small changes to the parameters, was ending up with the same game a lot of the time. So, I switched it to use a different opening position for each trial. Not sure yet how it's working out.

It is good to introduce variety in the opening.

It is not extremely important to have balanced openings if you use replications: just play one game as white, and one game as black for each opening.

I don't know if your cutechess scripts alternated colors or not. But if the program always plays white, then that may explain some optimistic bias in the win rate reported by CLOP, too.

Rémi

gladius · Post by **gladius** » Tue Mar 13, 2012 3:19 pm

mcostalba wrote:We use varied.bin or performance.bin

http://wbec-ridderkerk.nl/html/download.htm

But we never made a serious opening book test: different tournaments by same players with different opening books / start positions (or even without any book) and verifying the results were consistent.

Thanks Marco. Testing with varied.bin on fixed node count games on parameters that have a big Elo impact (like weighting material from 0-100%) worked very well. It quickly figured out that low material % is bad.

gladius · Post by **gladius** » Tue Mar 13, 2012 3:21 pm

Rémi Coulom wrote:It is good to introduce variety in the opening.

It is not extremely important to have balanced openings if you use replications: just play one game as white, and one game as black for each opening.

I don't know if your cutechess scripts alternated colors or not. But if the program always plays white, then that may explain some optimistic bias in the win rate reported by CLOP, too.

Rémi

I was using the "Replications 2" parameter - however, I was not setting the cutechess srand seed based on clop seed, so it was not useful! I fixed that, and the resulting test (using material as in the post above) was very successful.

Thanks!
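
For anyone hitting the same problem, here is a minimal sketch of deriving the opening and the colour from the CLOP seed so that both replications of a sample share one opening. It assumes the two replications receive consecutive seeds (2k and 2k+1), which is an assumption about CLOP and not something confirmed in this thread; `plan_game` and `num_openings` are hypothetical names.

```python
from typing import Tuple


def plan_game(clop_seed: int, num_openings: int) -> Tuple[int, bool]:
    """Map a CLOP seed to (opening index, tuned engine plays White).

    Seeds 2k and 2k+1 are assumed to belong to the same replication pair:
    they get the same opening, with colours swapped.
    """
    pair = clop_seed // 2
    opening_index = pair % num_openings
    tuned_is_white = clop_seed % 2 == 0
    return opening_index, tuned_is_white


# Two consecutive seeds: same opening, opposite colours.
print(plan_game(10, 100))
print(plan_game(11, 100))
```

The pair number (`clop_seed // 2`) could equally be passed to cutechess-cli as its `-srand` value so that the book picks the same line for both games of a pair.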
