I've been working on the ThreatAttacks evaluation in Stockfish, trying to come up with an improvement, so I decided to try CLOP.
I CLOPtomized the 5 parameters (code diff: https://github.com/glinscott/Stockfish/ ... te_threats). After 7500 games, CLOP was reporting an excellent weighted Elo of 28, with a 95% LCB of 19.
The Max tab, though, was showing a Mean Elo of 21 with a 95% LCB of -3, and no Max was found.
Running 1500 games at 40/2sec+0.01 against base Stockfish, the tuned version ended up at -3 Elo. So it's close to being good, but not an improvement yet.
Any thoughts on where I could be going wrong with my CLOP usage? Is it just not enough games in CLOP land? I tested CLOP at 40/1sec initially, and the values I got tested at about -30 Elo. So it's definitely doing better at 40/2sec, but it's still not beating base Stockfish.
CLOP on Stockfish
-
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: CLOP on Stockfish
I am very interested in your tests with CLOP. I have looked at your patch, and here are some quick comments:
1) 7500 games is definitely too low a number for tuning 5 parameters.
2) You have not only added tuning code but also changed the semantics, unifying pawn threats with other pieces' threats and implementing the MultipleThreatBonus idea. I'd suggest proceeding step by step. First rewrite the code the way you like while preserving the same functionality (verify with ./stockfish bench). This is very important so that you don't go crazy chasing multiple patterns at the same time. Remember that you are chasing an increase of 5 Elo at most, so it is mandatory to keep external noise to a minimum. Then, once you have added your tuning code and are sure the functionality is unchanged, start tuning, but I'd suggest starting with 2-3 parameters at most or, OTOH, running much longer, say 30K-40K games.
3) If instead you want to test new ideas, I'd suggest testing them one by one.
4) CLOP works no miracles. Tuning is hard and requires a lot of self-discipline, and given that SF is a mature engine it is even harder. But don't give up: efforts will be rewarded in the long term. You just need a lot of "patience and perseverance", which could well be the chess engine developer's motto.
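The "verify with ./stockfish bench" step can be automated by comparing the node counts of the base and refactored builds. The sketch below is only an illustration: it assumes the bench summary contains a line of the form "Nodes searched : N" (the exact wording may differ between versions), and the function names are hypothetical.

```python
import re


def bench_nodes(output: str) -> int:
    """Extract the total node count from the text printed by a bench run.

    Assumes a summary line like 'Nodes searched  : 12345'.
    """
    match = re.search(r"Nodes searched\s*:\s*(\d+)", output)
    if match is None:
        raise ValueError("no 'Nodes searched' line found in bench output")
    return int(match.group(1))


def same_functionality(base_output: str, patched_output: str) -> bool:
    """A pure refactoring should search exactly the same number of nodes."""
    return bench_nodes(base_output) == bench_nodes(patched_output)
```

Capture the text printed by each binary's bench run and pass both strings to `same_functionality`. Any difference in node count means the rewrite changed behaviour, not just structure.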
-
- Posts: 568
- Joined: Tue Dec 12, 2006 10:10 am
- Full name: Gary Linscott
Re: CLOP on Stockfish
Thanks Marco! I agree that the patch is attempting to take on too much in one go. It was my first attempt with CLOP, so more of an experiment.
Verifying with bench was indeed very important. When all parameters were initially set to 100, it gave the same result as the default. And CLOP did cluster the values around 100, so it seems to be working well.
I'll try breaking it up into smaller chunks and seeing how it goes.
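One way to get the "all parameters at 100 equals default" property is to treat each tuned parameter as a percentage applied to the existing constant. This is a minimal sketch under that assumption; the function name and the numbers in the usage note are made up for illustration and are not taken from the patch.

```python
def scaled(base: int, percent: int) -> int:
    """Scale an evaluation constant by a tuning percentage.

    With percent == 100 the original constant is returned unchanged, so a
    build with every parameter at 100 should match the default bench.
    """
    return base * percent // 100
```

For example, `scaled(50, 120)` gives 60, while `scaled(50, 100)` leaves the constant at 50.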
-
- Posts: 438
- Joined: Mon Apr 24, 2006 8:06 pm
Re: CLOP on Stockfish
The win rates CLOP reports are biased towards optimistic values. So you should expect that the strength of the program you get at the end is inferior to what CLOP reports.
If you wish to optimize for small improvements (a few Elo points), then you need to play many more games.
Rémi
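To get a rough feel for what "many more games" means, the sketch below estimates the 95% error margin of a match result in Elo, using the standard logistic Elo formula and a normal approximation. The 40% draw ratio in the usage note is an assumed figure, not one reported in this thread, and the function names are hypothetical.

```python
import math


def elo_from_score(score: float) -> float:
    """Convert an expected score in (0, 1) to an Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_margin_95(games: int, draw_ratio: float, score: float = 0.5) -> float:
    """Approximate 95% half-width, in Elo, of a match of `games` games.

    Assumes independent games and a fixed draw ratio (an assumption).
    """
    win = score - draw_ratio / 2.0
    loss = 1.0 - win - draw_ratio
    # Variance of a single game result (1, 0.5 or 0) around the mean score.
    variance = (win * (1.0 - score) ** 2
                + draw_ratio * (0.5 - score) ** 2
                + loss * (0.0 - score) ** 2)
    stderr = math.sqrt(variance / games)
    return elo_from_score(score + 1.96 * stderr) - elo_from_score(score)
```

Under these assumptions, `elo_margin_95(1500, 0.4)` comes out at roughly 13 to 14 Elo, so a 1500-game result of -3 Elo cannot be told apart from a small gain. Shrinking the margin to a few Elo takes tens of thousands of games.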
-
- Posts: 568
- Joined: Tue Dec 12, 2006 10:10 am
- Full name: Gary Linscott
Re: CLOP on Stockfish
Rémi Coulom wrote: The win rates CLOP reports are biased towards optimistic values. So you should expect that the strength of the program you get at the end is inferior to what CLOP reports.
If you wish to optimize for small improvements (a few Elo points), then you need to play many more games.
Rémi
Thanks Rémi, good to know that the Elo is a bit optimistic. CLOP is a fantastic tool btw! Thanks for releasing it for us all to play with.
-
- Posts: 568
- Joined: Tue Dec 12, 2006 10:10 am
- Full name: Gary Linscott
Re: CLOP on Stockfish
Rémi Coulom wrote: The win rates CLOP reports are biased towards optimistic values. So you should expect that the strength of the program you get at the end is inferior to what CLOP reports.
If you wish to optimize for small improvements (a few Elo points), then you need to play many more games.
Rémi
A question regarding the opening book to use for CLOP: do you think it's better to use different neutral opening positions (based on the CLOP seed), or to just start from the start position?
I was using fixed-depth testing with the clop-cutechess adapter (which is awesome, thanks Ilari!), and with only small changes to the parameters I was ending up with the same game a lot of the time. So I switched it to use a different opening position for each trial. Not sure yet how it's working out.
-
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: CLOP on Stockfish
gladius wrote: A question regarding the opening book to use for CLOP. Do you think it's better to use different neutral opening positions (based on CLOP seed), or just start from the start position?
We use varied.bin or performance.bin
http://wbec-ridderkerk.nl/html/download.htm
But we have never run a serious opening book test: different tournaments between the same players with different opening books / start positions (or even without any book), verifying that the results were consistent.
-
- Posts: 438
- Joined: Mon Apr 24, 2006 8:06 pm
Re: CLOP on Stockfish
gladius wrote: A question regarding the opening book to use for CLOP. Do you think it's better to use different neutral opening positions (based on CLOP seed), or just start from the start position?
I was using fixed depth testing with the clop-cutechess adapter (which is awesome, thanks Ilari!), and with only small changes to the parameters, was ending up with the same game a lot of the time. So, I switched it to use a different opening position for each trial. Not sure yet how it's working out.
It is good to introduce variety in the opening.
It is not extremely important to have balanced openings if you use replications: just play one game as white, and one game as black for each opening.
I don't know if your cutechess scripts alternated colors or not. But if the program always plays white, then that may explain some optimistic bias in the win rate reported by CLOP, too.
Rémi
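The "one game as white, one game as black for each opening" scheme can be sketched as a mapping from the trial seed to an opening and a colour. This is only an illustration: it assumes the two replications of a trial receive consecutive seeds 2k and 2k+1, which may not match how CLOP or the adapter script actually hand out seeds, and the function name is hypothetical.

```python
from typing import Tuple


def game_plan(seed: int, num_openings: int) -> Tuple[int, bool]:
    """Map a trial seed to (opening index, tuned engine plays white).

    Assumption: seeds 2k and 2k+1 form one replication pair, so both games
    share an opening and differ only in colour.
    """
    pair = seed // 2                   # same value for both games of a pair
    opening = pair % num_openings      # opening index shared by the pair
    plays_white = seed % 2 == 0        # even seed: white, odd seed: black
    return opening, plays_white
```

With this mapping, seeds 6 and 7 both select opening 3, once with the tuned engine as white and once as black, so an unbalanced opening favours each side once and largely cancels out.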
-
- Posts: 568
- Joined: Tue Dec 12, 2006 10:10 am
- Full name: Gary Linscott
Re: CLOP on Stockfish
mcostalba wrote: We use varied.bin or performance.bin
http://wbec-ridderkerk.nl/html/download.htm
But we never made a serious opening book test: different tournaments by same players with different opening books / start positions (or even without any book) and verifying the results were consistent.
Thanks Marco. Testing with varied.bin on fixed node count games, on parameters that have a big Elo impact (like weighting material from 0-100%), worked very well. It quickly figured out that a low material % is bad.
-
- Posts: 568
- Joined: Tue Dec 12, 2006 10:10 am
- Full name: Gary Linscott
Re: CLOP on Stockfish
Rémi Coulom wrote: It is good to introduce variety in the opening.
It is not extremely important to have balanced openings if you use replications: just play one game as white, and one game as black for each opening.
I don't know if your cutechess scripts alternated colors or not. But if the program always plays white, then that may explain some optimistic bias in the win rate reported by CLOP, too.
Rémi
I was using the "Replications 2" parameter. However, I was not setting the cutechess srand seed based on the CLOP seed, so it was not useful! I fixed that, and the resulting test (using material as in the post above) was very successful.
Thanks!