Python script for TTM



Re: Python script for TTM

Post by lucasart »

I just did the logistic regression, and found the best fit for:

score = 1 / (1 + exp(-lambda * qsearch))

where:
* score = 0, 0.5, 1 is the result of the game
* qsearch scale is 200 = 1 Pawn
* lambda = 0.00105

For example, when the qsearch says we're a rook down (qsearch = -1080 = -5.4 pawns), we still have a predicted score of 24.3%.

This is a bit surprising, and I expected a somewhat higher value for lambda.

What kind of values do people get?

I'll have a look at depth=1 and depth=2, to see how much more predictive it gets. qsearch looks quite noisy.
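
For reference, a minimal sketch of such a fit in Python (the positions.csv input with one "qsearch,result" pair per line, and the use of scipy, are assumptions):

Code: Select all

import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical input: qsearch in the engine's internal scale (200 = 1 pawn),
# result in {0, 0.5, 1}.
data = np.loadtxt("positions.csv", delimiter=",")
qsearch, result = data[:, 0], data[:, 1]

def mse(lam):
    # Mean squared error between game results and the logistic prediction.
    pred = 1.0 / (1.0 + np.exp(-lam * qsearch))
    return np.mean((result - pred) ** 2)

best = minimize_scalar(mse, bounds=(1e-5, 1e-2), method="bounded")
print("lambda =", best.x, "mse =", best.fun)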
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.

Re: Python script for TTM

Post by cetormenter »

For Nirvana I get a lambda of ~0.00279, which gives a predicted score of ~5% for being a rook down. That's still a bit higher than I would have expected. I don't think exact tuning of this value is really needed; it's probably something that only needs to be "good enough" to kick things off.

Re: Python script for TTM

Post by lucasart »

cetormenter wrote:For Nirvana I get a lambda of ~0.00279, which gives a predicted score of ~5% for being a rook down. That's still a bit higher than I would have expected. I don't think exact tuning of this value is really needed; it's probably something that only needs to be "good enough" to kick things off.
Thanks. I think there can be 2 reasons why my lambda is so low:
1. Training positions are of poor quality
2. My qsearch is not very predictive of game results (ie. my eval sucks).

I have a 2300 elo patzer, and generated the training positions from depth=6 self-play games, so it's worth testing hypothesis 1. I'll try again with better positions (generated with a stronger engine).
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.

Re: Python script for TTM

Post by Ferdy »

lucasart wrote:I ended up writing it in C++ inside the engine, including concurrency. On my 4 core i7 (using 8 threads), it runs one iteration of 7.83m positions in 5.35s (1.46m qsearch/s). I think speed is key here, because the optimization process must run lots of iterations.
Speed matters if you are in a hurry :). Also, what I have (and perhaps others too) can be done incrementally: run for a couple of hours, stop, and test the current best parameters in actual games. Then continue the tuning (starting from the best parameters and errors found last time), take another sample, test it, and so on. It can run over days, weeks, or months until the error becomes zero or can no longer be improved. But from experience doing this, including with other variant games, it is the selection of training positions that is most important.
Ferdinand: what kind of speed do you get with a script (in qsearch/s)?
From an old log on an i7-2600K, a 4-core processor.
Using 3 threads, each thread handles 680k positions, taking 115s.
So 3 x 680,000 ~ 2M positions in 115s ~ 17K qsearch/s.

This 115s is only for trying 1 value of a given parameter. Every parameter takes up to 4 steps, i.e. +1, +2, -1, -2 (times the increment), before moving on to the next parameter.
If step 1 already succeeds in reducing the error, that saves time, as it advances straight to the next parameter to be tuned.

Also, after profiling, it turns out the engine's speed at initializing and setting up the positions is critical. I remember mine is slower than Stockfish's.

Code: Select all

for fen in fen_list:
    send("ucinewgame")
    send("position fen " + fen)  # bottleneck
    send("go depth 0")
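
A minimal sketch of what that loop could look like over a UCI engine's stdin/stdout with Python's subprocess module (the engine path, the wait_for() helper and the placeholder fen_list are illustrative, not the actual script):

Code: Select all

import subprocess

# Illustrative engine path; replace with the real binary.
engine = subprocess.Popen(["./myengine"], stdin=subprocess.PIPE,
                          stdout=subprocess.PIPE, universal_newlines=True)

def send(cmd):
    engine.stdin.write(cmd + "\n")
    engine.stdin.flush()

def wait_for(prefix):
    # Read engine output until a line starting with the given prefix appears.
    while True:
        line = engine.stdout.readline()
        if line.startswith(prefix):
            return line.strip()

send("uci")
wait_for("uciok")

fen_list = ["rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"]  # placeholder
for fen in fen_list:
    send("ucinewgame")
    send("position fen " + fen)  # the setup bottleneck mentioned above
    send("go depth 0")
    wait_for("bestmove")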
Sample log.

Code: Select all

['MidGameMobilityPercent', 100, 6, 0, 1000] 
<param>, <default>, <increment>, <min>, <max>
param to tune:
['MidGameMobilityPercent', 100, 6, 0, 1000]
['EndGameMobilityPercent', 100, 6, 0, 1000]
['ConnectedPasserPercent', 16, 6, 0, 100]
['BlockedPasserPenaltyPercent', 70, 4, 0, 100]
['TwoBishopAdv', 50, 1, 20, 80]
['Rook7thRankMg', 15, 1, 0, 200]
['Rook7thRankEg', 50, 1, 0, 200]
['Rook8thRankMg', 15, 1, 0, 200]
['Rook8thRankEg', 50, 1, 0, 200]
['PawnValueMg', 89, 1, 70, 110]
['PawnValueEg', 118, 1, 90, 150]
['RookThreat', 100, 4, 0, 1000]
['QueenThreat', 100, 4, 0, 1000]
['ThreatOnWeak', 100, 8, 0, 1000]
['ThreatOnStrong', 50, 8, 0, 1000]
StartE: 1000.0000000000

Get initial AveError ...
Thread 1, time: 115s, aveError: 89.5723168436

Thread 3, time: 115s, aveError: 89.4593985319

Thread 2, time: 115s, aveError: 89.7424590188


startAveError: 89.5913914648

cycle: 1

param history sorted score:
['MidGameMobilityPercent', 0]
['EndGameMobilityPercent', 0]
['ConnectedPasserPercent', 0]
['BlockedPasserPenaltyPercent', 0]
['TwoBishopAdv', 0]
['Rook7thRankMg', 0]
['Rook7thRankEg', 0]
['Rook8thRankMg', 0]
['Rook8thRankEg', 0]
['PawnValueMg', 0]
['PawnValueEg', 0]
['RookThreat', 0]
['QueenThreat', 0]
['ThreatOnWeak', 0]
['ThreatOnStrong', 0]

new param order:
['MidGameMobilityPercent', 100, 6, 0, 1000]
['EndGameMobilityPercent', 100, 6, 0, 1000]
['ConnectedPasserPercent', 16, 6, 0, 100]
['BlockedPasserPenaltyPercent', 70, 4, 0, 100]
['TwoBishopAdv', 50, 1, 20, 80]
['Rook7thRankMg', 15, 1, 0, 200]
['Rook7thRankEg', 50, 1, 0, 200]
['Rook8thRankMg', 15, 1, 0, 200]
['Rook8thRankEg', 50, 1, 0, 200]
['PawnValueMg', 89, 1, 70, 110]
['PawnValueEg', 118, 1, 90, 150]
['RookThreat', 100, 4, 0, 1000]
['QueenThreat', 100, 4, 0, 1000]
['ThreatOnWeak', 100, 8, 0, 1000]
['ThreatOnStrong', 50, 8, 0, 1000]

cycle 1, param to optimize: MidGameMobilityPercent
Try MidGameMobilityPercent = 106

History: cycle = 1, good = 0, step = +1, bestAveError = 89.5913914648

Thread 1, time: 116s, aveError: 89.5662467506
Thread 2, time: 116s, aveError: 89.7380670670


Thread 3, time: 116s, aveError: 89.4556463857


newAveError: 89.5866534011

newAveError is better!!

Param values update:
['MidGameMobilityPercent', 106, 6, 0, 1000]
['EndGameMobilityPercent', 100, 6, 0, 1000]
['ConnectedPasserPercent', 16, 6, 0, 100]
['BlockedPasserPenaltyPercent', 70, 4, 0, 100]
['TwoBishopAdv', 50, 1, 20, 80]
['Rook7thRankMg', 15, 1, 0, 200]
['Rook7thRankEg', 50, 1, 0, 200]
['Rook8thRankMg', 15, 1, 0, 200]
['Rook8thRankEg', 50, 1, 0, 200]
['PawnValueMg', 89, 1, 70, 110]
['PawnValueEg', 118, 1, 90, 150]
['RookThreat', 100, 4, 0, 1000]
['QueenThreat', 100, 4, 0, 1000]
['ThreatOnWeak', 100, 8, 0, 1000]
['ThreatOnStrong', 50, 8, 0, 1000]

cycle 1, param to optimize: EndGameMobilityPercent
Try EndGameMobilityPercent = 106
...
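
In pseudo-Python, the local search that this log implies could look roughly like the following (avg_error() is only a stub here; the real tuner would send the parameter values to the engine and average the error over all training positions):

Code: Select all

# Rough sketch of the +1/+2/-1/-2 coordinate search shown in the log above.
def avg_error(params):
    # Stub for illustration only: pretend the optimum sits at made-up targets.
    targets = {"MidGameMobilityPercent": 112, "EndGameMobilityPercent": 94}
    return sum((value - targets[name]) ** 2 for name, value, *_ in params)

# [name, value, increment, min, max], as in the log above.
params = [
    ["MidGameMobilityPercent", 100, 6, 0, 1000],
    ["EndGameMobilityPercent", 100, 6, 0, 1000],
]

best_error = avg_error(params)
for p in params:
    name, value, inc, lo, hi = p
    for step in (+1, +2, -1, -2):
        p[1] = max(lo, min(hi, value + step * inc))
        err = avg_error(params)
        if err < best_error:
            best_error = err   # keep the improvement ...
            break              # ... and move on to the next parameter
        p[1] = value           # revert and try the next step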

Re: Python script for TTM

Post by lucasart »

lucasart wrote:
cetormenter wrote:For Nirvana I get a lambda of ~0.00279, which gives a predicted score of ~5% for being a rook down. That's still a bit higher than I would have expected. I don't think exact tuning of this value is really needed; it's probably something that only needs to be "good enough" to kick things off.
Thanks. I think there can be 2 reasons why my lambda is so low:
1. Training positions are of poor quality
2. My qsearch is not very predictive of game results (ie. my eval sucks).

I have a 2300 elo patzer, and generated the training positions from depth=6 self-play games, so it's worth testing hypothesis 1. I'll try again with better positions (generated with a stronger engine).
I regenerated some positions out of better quality games, and already my lambda has improved. Now lambda = 0.0013, which gives an expected score of 19.7% for being a rook down. Still high, but better, and the error (average squared difference to logistic curve) has also noticeably reduced.

However, I still cannot get anything sensible out of this tuning method. I tried a few things, every time reducing the error, but they were all losing elo :cry:
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.

Re: Python script for TTM

Post by cetormenter »

Hopefully you aren't taking the values as they are. For example, for large arrays (such as mobility) I created a function to fit the values. This proved to give MUCH better results than simply tuning each entry individually (and it was much quicker as well!). Even with millions of games it is very easy to end up with overtuned values, simply because the cases in which they are used are too rare (how many times per game does a rook have a mobility value of exactly 10, 11, etc.?).

My function was pretty simple: min + (max - min) * pow(i, slope) / pow(arrSize - 1, slope).
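
In Python, that fitted fill could look like this (the concrete lo/hi/slope numbers below are only illustrative):

Code: Select all

def fill_table(arr_size, lo, hi, slope):
    # lo + (hi - lo) * pow(i, slope) / pow(arr_size - 1, slope), as above.
    return [lo + (hi - lo) * pow(i, slope) / pow(arr_size - 1, slope)
            for i in range(arr_size)]

# Illustrative rook-mobility table: 15 entries, but only lo, hi and slope are tuned.
print([round(v) for v in fill_table(15, -30, 40, 0.7)])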

Re: Python script for TTM

Post by Ferdy »

lucasart wrote:
lucasart wrote:
cetormenter wrote:For Nirvana I get a lambda of ~0.00279, which gives a predicted score of ~5% for being a rook down. That's still a bit higher than I would have expected. I don't think exact tuning of this value is really needed; it's probably something that only needs to be "good enough" to kick things off.
Thanks. I think there can be 2 reasons why my lambda is so low:
1. Training positions are of poor quality
2. My qsearch is not very predictive of game results (ie. my eval sucks).

I have a 2300 elo patzer, and generated the training positions from depth=6 self-play games, so it's worth testing hypothesis 1. I'll try again with better positions (generated with a stronger engine).
I regenerated some positions out of better quality games, and already my lambda has improved. Now lambda = 0.0013, which gives an expected score of 19.7% for being a rook down. Still high, but better, and the error (average squared difference to logistic curve) has also noticeably reduced.

However, I still cannot get anything sensible out of this tuning method. I tried a few things, every time reducing the error, but they were all losing elo :cry:
Try this method of measuring performance of the tuned engine.

1. Create a match, engine_untrained vs engine_trained, using some positions from your training sets as game starting positions. We expect that engine_trained would win such a match, because this engine was trained on those specific positions. If engine_trained wins, you should already be happy with it. I remember training an engine on Sicilian defense positions, and when I tested it on Sicilian openings, it won.

2. Now create game starting positions from your training sets plus positions that are not in your training sets. Make it 50-50 and run a match based on those starting positions. Call it the 50/50 test: 50% training positions / 50% unseen positions (see the sketch after this list).

3. Finally, try the 0/100 test, the ultimate test, which is very difficult to pass.
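
A minimal sketch of building such a 50/50 opening set in Python (the .epd file names and sample size are hypothetical):

Code: Select all

import random

# Hypothetical inputs: one FEN/EPD per line, openings used for tuning vs. not.
with open("training.epd") as f:
    trained = f.read().splitlines()
with open("unseen.epd") as f:
    unseen = f.read().splitlines()

n = 500  # starting positions drawn from each set (must not exceed either set)
mix = random.sample(trained, n) + random.sample(unseen, n)
random.shuffle(mix)

with open("openings_50_50.epd", "w") as f:
    f.write("\n".join(mix) + "\n")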

Re: Python script for TTM

Post by AlvaroBegue »

lucasart wrote:I regenerated some positions out of better quality games, and already my lambda has improved. Now lambda = 0.0013, which gives an expected score of 19.7% for being a rook down. Still high, but better, and the error (average squared difference to logistic curve) has also noticeably reduced.

However, I still cannot get anything sensible out of this tuning method. I tried a few things, every time reducing the error, but they were all losing elo :cry:
Reducing the error in the training set could mean that you are overfitting your data. The standard approach in the machine learning community is to divide your data into three sets: training, validation and test. You could use something like 60% for the training set, 20% for the validation set and 20% for the test set.

The training set is where you actively minimize the error function. The validation set is used to measure how well variants of your training work. Once you are satisfied with the performance of your algorithm, you can measure its true out-of-sample performance using the test set. This should only be done once, and then you can be sure the measure is correct.

https://en.wikipedia.org/wiki/Test_set
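
A minimal sketch of that 60/20/20 split in Python (a flat positions.csv of "qsearch,result" rows is an assumption):

Code: Select all

import numpy as np

data = np.loadtxt("positions.csv", delimiter=",")  # hypothetical training data
rng = np.random.default_rng(0)
rng.shuffle(data)  # shuffle rows before splitting

n = len(data)
train = data[:int(0.6 * n)]               # 60%: minimize the error here
valid = data[int(0.6 * n):int(0.8 * n)]   # 20%: compare tuning variants
test  = data[int(0.8 * n):]               # 20%: final out-of-sample check, used once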

Re: Python script for TTM

Post by lucasart »

cetormenter wrote:Hopefully you aren't taking the values as they are. For example, for large arrays (such as mobility) I created a function to fit the values. This proved to give MUCH better results than simply tuning each entry individually (and it was much quicker as well!). Even with millions of games it is very easy to end up with overtuned values, simply because the cases in which they are used are too rare (how many times per game does a rook have a mobility value of exactly 10, 11, etc.?).

My function was pretty simple: min + (max - min) * pow(i, slope) / pow(arrSize - 1, slope).
I tried 3 separate patches:
* increase opening pawn value from 0.8 to 1 (relative to endgame pawn value = 1)
* increase bishop pair value in the opening from 0.4 to 0.5
* decrease rook and queen material value by 0.1

So I don't think it's a case of "too rare to tune". But, of course, you make a good point here.

Each patch improved the logistic fit (whether in isolation or accumulated). Each was tested individually, and failed to show an elo gain (SPRT, elo0=0, elo1=4, alpha=beta=0.05). And they failed pretty hard too (ie. we're not talking about a tiny but hard to measure gain).

I'll regenerate the training positions, with random openings this time (e.g. 3 random moves after the starting position, no filtering, accepting even clearly winning or losing positions).
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.

Re: Python script for TTM

Post by lucasart »

AlvaroBegue wrote:
lucasart wrote:I regenerated some positions out of better quality games, and already my lambda has improved. Now lambda = 0.0013, which gives an expected score of 19.7% for being a rook down. Still high, but better, and the error (average squared difference to logistic curve) has also noticeably reduced.

However, I still cannot get anything sensible out of this tuning method. I tried a few things, every time reducing the error, but they were all losing elo :cry:
Reducing the error in the training set could mean that you are overfitting your data. The standard approach in the machine learning community is to divide your data into three sets: training, validation and test. You could use something like 60% for the training set, 20% for the validation set and 20% for the test set.

The training set is where you actively minimize the error function. The validation set is used to measure how well variants of your training work. Once you are satisfied with the performance of your algorithm, you can measure its true out-of-sample performance using the test set. This should only be done once, and then you can be sure the measure is correct.

https://en.wikipedia.org/wiki/Test_set
Good point about out-of-sample validation. I wonder if this could be done more rigorously, by computing error bars on the average squared error. Can this be done analytically, or does one need to resort to Monte Carlo methods?
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.