CLOP on Stockfish

gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

CLOP on Stockfish

Post by gladius »

I've been working on the ThreatAttacks evaluation in Stockfish, trying to come up with an improvement. So, I decided to try CLOP out.

I CLOPtomized the 5 parameters (code diff: https://github.com/glinscott/Stockfish/ ... te_threats). After 7500 games, CLOP was reporting excellent numbers: a weighted Elo of 28, with a 95% LCB of 19.

The Max tab was showing a Mean Elo of 21 though, with 95% LCB of -3. There was no Max found.

A 1500-game match at 40/2sec+0.01 against base Stockfish ended up at -3 Elo. So, it's close to being good, but not an improvement yet.

Any thoughts on where I could be going wrong with my CLOP usage? Just not enough games in CLOP land? I tested CLOP at 40/1sec initially, and the values I got tested at about -30 Elo. So it's definitely doing better at 40/2sec, but still not beating base Stockfish :).
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: CLOP on Stockfish

Post by mcostalba »

I am very interested in your tests with CLOP. I have looked at your patch, and here are some quick comments:

1) 7500 games is definitely too low a number for tuning 5 parameters

2) You have not only added tuning code but also changed the semantics, unifying pawn threats with other pieces' threats and implementing the MultipleThreatBonus idea. I'd suggest a more step-by-step approach. First, rewrite the code the way you like while preserving the same functionality (verify with ./stockfish bench). This is very important to avoid going crazy chasing multiple patterns at the same time: remember that you are chasing a 5 Elo increase at most, so it is mandatory to keep external noise to a minimum. Then, once you have added your tuning code and are sure that the functionality is unchanged, start tuning. I'd suggest starting with 2-3 parameters at most or, OTOH, running much longer, say 30K-40K games.

3) If instead you want to test new ideas, I'd suggest testing them one by one.

4) CLOP works no miracles; tuning is hard and requires a lot of self-discipline, and since SF is a mature engine it is even harder. But don't give up: efforts will be rewarded in the long term. You just need a lot of "patience and perseverance", which could well be the chess engine developer's motto :-)
gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: CLOP on Stockfish

Post by gladius »

Thanks Marco! I agree that the patch is attempting to take on too much in one go. It was my first attempt with CLOP, so more of an experiment.

Verifying with bench was indeed very important. When all parameters were set to 100 initially, the result was the same as the default. And CLOP did cluster the values around 100, so it seems to be working well.

I'll try breaking it up into smaller chunks and seeing how it goes.
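
For readers following along, here is a minimal sketch of the "all parameters at 100 reproduce the default" idea, assuming each tunable parameter is a percentage weight applied to a base bonus. The names `scale`, `scaled_table` and the values in `BASE_BONUSES` are hypothetical and are not taken from the Stockfish patch.

```python
# Hypothetical base bonuses; NOT the real Stockfish ThreatAttacks values.
BASE_BONUSES = [7, 24, 39, 48]


def scale(base_bonus: int, weight_percent: int) -> int:
    """Scale a non-negative bonus by a percentage weight (100 = unchanged)."""
    return base_bonus * weight_percent // 100


def scaled_table(weights):
    """Apply one percentage weight per base bonus."""
    return [scale(b, w) for b, w in zip(BASE_BONUSES, weights)]


# With every weight at 100 the table is identical to the defaults, so the
# tuning code by itself should not change the engine's behaviour.
unchanged = scaled_table([100, 100, 100, 100])
```

Starting the optimizer from a point that is known to be functionally identical to the default makes it possible to check the tuning plumbing separately from the tuned values.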
Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Re: CLOP on Stockfish

Post by Rémi Coulom »

The win rates CLOP reports are biased towards optimistic values. So you should expect that the strength of the program you get at the end is inferior to what CLOP reports.

If you wish to optimize for small improvements (a few Elo points), then you need to play many more games.

Rémi
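
As a rough illustration of why a few Elo points need many games, the sketch below estimates the 95% error margin of a plain head-to-head match between two nearly equal engines. It assumes independent games, a score near 50%, and a draw ratio of 0.5 (an assumed figure, not one measured in this thread). It describes a verification match only, not CLOP's internal regression, and the function names are made up for this example.

```python
import math

# Slope of the Elo curve at a 50% score: d(Elo)/d(score) = 1600 / ln(10).
ELO_PER_SCORE = 1600 / math.log(10)


def elo_margin_95(games: int, draw_ratio: float = 0.5) -> float:
    """Approximate 95% error margin (in Elo) of a match near a 50% score."""
    # Per-game score variance at 50%: wins*1 + draws*0.25 - 0.25.
    variance = 0.25 - draw_ratio / 4
    return 1.96 * math.sqrt(variance / games) * ELO_PER_SCORE


def games_for_margin(margin_elo: float, draw_ratio: float = 0.5) -> int:
    """Games needed so that the 95% margin shrinks to margin_elo."""
    variance = 0.25 - draw_ratio / 4
    return math.ceil(variance * (1.96 * ELO_PER_SCORE / margin_elo) ** 2)


margin_1500 = elo_margin_95(1500)   # margin of a 1500-game match
needed_for_5 = games_for_margin(5)  # games for a +/-5 Elo margin
```

Under these assumptions a 1500-game match has a margin of roughly 12 Elo, so a measured -3 Elo does not separate the tuned version from the base one in either direction.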
gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: CLOP on Stockfish

Post by gladius »

Rémi Coulom wrote:The win rates CLOP reports are biased towards optimistic values. So you should expect that the strength of the program you get at the end is inferior to what CLOP reports.

If you wish to optimize for small improvements (a few Elo points), then you need to play many more games.

Rémi
Thanks Rémi, good to know that the Elo is a bit optimistic. CLOP is a fantastic tool btw! Thanks for releasing it for us all to play with :).
gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: CLOP on Stockfish

Post by gladius »

Rémi Coulom wrote:The win rates CLOP reports are biased towards optimistic values. So you should expect that the strength of the program you get at the end is inferior to what CLOP reports.

If you wish to optimize for small improvements (a few Elo points), then you need to play many more games.

Rémi
A question regarding the opening book to use for CLOP. Do you think it's better to use different neutral opening positions (based on CLOP seed), or just start from the start position?

I was using fixed depth testing with the clop-cutechess adapter (which is awesome, thanks Ilari!), and with only small changes to the parameters, was ending up with the same game a lot of the time. So, I switched it to use a different opening position for each trial. Not sure yet how it's working out.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: CLOP on Stockfish

Post by mcostalba »

gladius wrote: A question regarding the opening book to use for CLOP. Do you think it's better to use different neutral opening positions (based on CLOP seed), or just start from the start position?
We use varied.bin or performance.bin

http://wbec-ridderkerk.nl/html/download.htm

But we never ran a serious opening book test, i.e. playing different tournaments between the same players with different opening books / start positions (or even without any book) and verifying that the results were consistent.
Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Re: CLOP on Stockfish

Post by Rémi Coulom »

gladius wrote:A question regarding the opening book to use for CLOP. Do you think it's better to use different neutral opening positions (based on CLOP seed), or just start from the start position?

I was using fixed depth testing with the clop-cutechess adapter (which is awesome, thanks Ilari!), and with only small changes to the parameters, was ending up with the same game a lot of the time. So, I switched it to use a different opening position for each trial. Not sure yet how it's working out.
It is good to introduce variety in the opening.

It is not extremely important to have balanced openings if you use replications: just play one game as white, and one game as black for each opening.

I don't know if your cutechess scripts alternated colors or not. But if the program always plays white, then that may explain some optimistic bias in the win rate reported by CLOP, too.

Rémi
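
One way to picture the replication scheme is the sketch below. It assumes the two replications of a sample receive consecutive seeds (2k and 2k+1), which is an assumption made for this example and not confirmed behaviour of CLOP or of the cutechess adapter. `game_setup` is a hypothetical helper, not part of either tool.

```python
def game_setup(seed: int, num_openings: int):
    """Map a seed to (opening index, whether the tuned engine plays White).

    ASSUMPTION: seeds 2k and 2k+1 belong to the same pair of replications.
    """
    pair = seed // 2                       # both games of a pair share this
    opening_index = pair % num_openings    # same opening for both games
    tuned_plays_white = (seed % 2 == 0)    # colours alternate within a pair
    return opening_index, tuned_plays_white


first = game_setup(10, 100)   # first game of pair 5
second = game_setup(11, 100)  # second game of pair 5, colours reversed
```

Because both games of a pair start from the same opening with colours reversed, an unbalanced opening favours the tuned engine once and its opponent once.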
gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: CLOP on Stockfish

Post by gladius »

mcostalba wrote:We use varied.bin or performance.bin

http://wbec-ridderkerk.nl/html/download.htm

But we never made a serious opening book test: different tournaments by same players with different opening books / start positions (or even without any book) and verifying the results were consistent.
Thanks Marco. Testing with varied.bin in fixed-node-count games, on parameters that have a big Elo impact (like weighting material from 0-100%), worked very well. It quickly figured out that low material % is bad :).
gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: CLOP on Stockfish

Post by gladius »

Rémi Coulom wrote:It is good to introduce variety in the opening.

It is not extremely important to have balanced openings if you use replications: just play one game as white, and one game as black for each opening.

I don't know if your cutechess scripts alternated colors or not. But if the program always plays white, then that may explain some optimistic bias in the win rate reported by CLOP, too.

Rémi
I was using the "Replications 2" parameter - however, I was not setting the cutechess srand seed based on clop seed, so it was not useful! I fixed that, and the resulting test (using material as in the post above) was very successful.

Thanks!