I just did the logistic regression, and found the best fit for:
score = 1 / (1 + exp(-lambda * qsearch))
where:
* score = 0, 0.5, 1 is the result of the game
* qsearch scale is 200 = 1 Pawn
* lambda = 0.00105
For example, when the qsearch says we're a rook down (qsearch = -1080 = -5.4 pawns), we still have a predicted score of 24.3%.
This is a bit surprising; I expected a somewhat higher value for lambda.
What kind of values do people get?
I'll have a look at depth=1 and depth=2, to see how much more predictive it gets. qsearch looks quite noisy.
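As a sanity check, the mapping above is easy to reproduce in Python (a minimal sketch; the lambda and the rook-down example are the values quoted in the post):

```python
import math

def predicted_score(qsearch, lam=0.00105):
    """Logistic mapping from a qsearch score (internal units, 200 = 1 pawn)
    to an expected game score in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-lam * qsearch))

# A rook down: -5.4 pawns = -1080 internal units.
print(round(predicted_score(-1080), 3))  # 0.243
```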
Python script for TTM
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Python script for TTM
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
-
- Posts: 170
- Joined: Sun Oct 28, 2012 9:46 pm
Re: Python script for TTM
For Nirvana I get a lambda of ~0.00279, which gives a predicted score of ~5%. That's still a bit higher than I would have expected. I don't think that exact tuning of this value is really needed. It's probably something that only needs to be "good enough" to kick things off.
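For reference, fitting lambda needs no heavy machinery: the mean squared error is a smooth one-dimensional function of lambda, so a simple bracketing search suffices. A sketch on synthetic data (the golden-section search and the synthetic positions are illustrative, not either engine's actual code):

```python
import math

def mse(lam, data):
    """Mean squared error between game results and the logistic prediction."""
    return sum((r - 1.0 / (1.0 + math.exp(-lam * q))) ** 2
               for q, r in data) / len(data)

def fit_lambda(data, lo=1e-5, hi=1e-2, iters=60):
    """Golden-section search for the lambda minimizing mse on [lo, hi]."""
    phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - phi * (b - a), a + phi * (b - a)
        if mse(c, data) < mse(d, data):
            b = d
        else:
            a = c
    return (a + b) / 2

# Synthetic (qsearch, result) pairs generated with a true lambda of 0.002:
true_lam = 0.002
data = [(q, 1.0 / (1.0 + math.exp(-true_lam * q)))
        for q in range(-2000, 2001, 40)]
print(round(fit_lambda(data), 5))  # recovers 0.002
```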
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Python script for TTM
cetormenter wrote: For Nirvana I get a lambda of ~0.00279, which gives a predicted score of ~5%. That's still a bit higher than I would have expected. I don't think that exact tuning of this value is really needed. It's probably something that only needs to be "good enough" to kick things off.

Thanks. I think there can be two reasons why my lambda is so low:
1. Training positions are of poor quality
2. My qsearch is not very predictive of game results (i.e. my eval sucks).
I have a 2300 Elo patzer and generated the training positions by self-play at depth=6, so hypothesis 1 is worth testing. I'll try again with better positions (generated with a stronger engine).
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
-
- Posts: 4833
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: Python script for TTM
lucasart wrote: I ended up writing it in C++ inside the engine, including concurrency. On my 4 core i7 (using 8 threads), it runs one iteration of 7.83m positions in 5.35s (1.46m qsearch/s). I think speed is key here, because the optimization process must run lots of iterations.

Speed matters if you are in a hurry. Also, what I have (and perhaps others too) can be run incrementally: run for a couple of hours, stop, test the current best parameters in actual games, then continue the tuning (starting from the best parameters and errors found last time), sample again, test again, and so on. It can run for days, weeks, even months, until the error reaches zero or can no longer be improved. But from experience doing this (including for other variant games), it is the selection of training positions that is most important.
lucasart wrote: Ferdinand: what kind of speed do you get with a script (in qsearch/s)?

From an old log on an i7-2600K, a 4-core processor:
Using 3 threads, each thread handles 680k positions, taking 115s.
So 3 x 680000 ~ 2M positions in 115s ~ 17K qsearch/s.
These 115s cover tuning just one value of a given parameter. Every parameter takes up to 4 steps, i.e. +1, +2, -1, -2 times its increment, before moving on to the next parameter.
If step 1 already reduces the error, that saves time, as the tuner advances immediately to the next parameter.
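The loop described above (try +1, +2, -1, -2 times the increment, keep the first improvement, move on) can be sketched like this; `eval_error` is a hypothetical stand-in for the expensive pass over all training positions:

```python
def tune_step(params, eval_error):
    """One cycle of the +1/+2/-1/-2 local search described above.

    params: list of [name, value, increment, lo, hi] entries.
    eval_error: function mapping the params list to an average error.
    """
    best = eval_error(params)
    for p in params:
        name, value, inc, lo, hi = p
        for step in (1, 2, -1, -2):        # order used in the log below
            trial = value + step * inc
            if not lo <= trial <= hi:      # respect the parameter bounds
                continue
            p[1] = trial
            err = eval_error(params)
            if err < best:                 # first improvement wins,
                best, value = err, trial   # then move to the next param
                break
            p[1] = value                   # no improvement: revert
    return best

# Toy usage: one parameter, quadratic error minimized at 53.
params = [['TwoBishopAdv', 50, 1, 20, 80]]
print(tune_step(params, lambda ps: (ps[0][1] - 53) ** 2), params[0][1])  # 4 51
```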
Also, after profiling, it is the engine's speed at initializing and setting up the positions that is critical. I remember mine being slower than Stockfish's.
Code:
for fen in fen_list:
    send("ucinewgame")
    send("position fen " + fen)  # bottleneck
    send("go depth 0")
Code:
['MidGameMobilityPercent', 100, 6, 0, 1000]
<param>, <default>, <increment>, <min>, <max>
param to tune:
['MidGameMobilityPercent', 100, 6, 0, 1000]
['EndGameMobilityPercent', 100, 6, 0, 1000]
['ConnectedPasserPercent', 16, 6, 0, 100]
['BlockedPasserPenaltyPercent', 70, 4, 0, 100]
['TwoBishopAdv', 50, 1, 20, 80]
['Rook7thRankMg', 15, 1, 0, 200]
['Rook7thRankEg', 50, 1, 0, 200]
['Rook8thRankMg', 15, 1, 0, 200]
['Rook8thRankEg', 50, 1, 0, 200]
['PawnValueMg', 89, 1, 70, 110]
['PawnValueEg', 118, 1, 90, 150]
['RookThreat', 100, 4, 0, 1000]
['QueenThreat', 100, 4, 0, 1000]
['ThreatOnWeak', 100, 8, 0, 1000]
['ThreatOnStrong', 50, 8, 0, 1000]
StartE: 1000.0000000000
Get initial AveError ...
Thread 1, time: 115s, aveError: 89.5723168436
Thread 3, time: 115s, aveError: 89.4593985319
Thread 2, time: 115s, aveError: 89.7424590188
startAveError: 89.5913914648
cycle: 1
param history sorted score:
['MidGameMobilityPercent', 0]
['EndGameMobilityPercent', 0]
['ConnectedPasserPercent', 0]
['BlockedPasserPenaltyPercent', 0]
['TwoBishopAdv', 0]
['Rook7thRankMg', 0]
['Rook7thRankEg', 0]
['Rook8thRankMg', 0]
['Rook8thRankEg', 0]
['PawnValueMg', 0]
['PawnValueEg', 0]
['RookThreat', 0]
['QueenThreat', 0]
['ThreatOnWeak', 0]
['ThreatOnStrong', 0]
new param order:
['MidGameMobilityPercent', 100, 6, 0, 1000]
['EndGameMobilityPercent', 100, 6, 0, 1000]
['ConnectedPasserPercent', 16, 6, 0, 100]
['BlockedPasserPenaltyPercent', 70, 4, 0, 100]
['TwoBishopAdv', 50, 1, 20, 80]
['Rook7thRankMg', 15, 1, 0, 200]
['Rook7thRankEg', 50, 1, 0, 200]
['Rook8thRankMg', 15, 1, 0, 200]
['Rook8thRankEg', 50, 1, 0, 200]
['PawnValueMg', 89, 1, 70, 110]
['PawnValueEg', 118, 1, 90, 150]
['RookThreat', 100, 4, 0, 1000]
['QueenThreat', 100, 4, 0, 1000]
['ThreatOnWeak', 100, 8, 0, 1000]
['ThreatOnStrong', 50, 8, 0, 1000]
cycle 1, param to optimize: MidGameMobilityPercent
Try MidGameMobilityPercent = 106
History: cycle = 1, good = 0, step = +1, bestAveError = 89.5913914648
Thread 1, time: 116s, aveError: 89.5662467506
Thread 2, time: 116s, aveError: 89.7380670670
Thread 3, time: 116s, aveError: 89.4556463857
newAveError: 89.5866534011
newAveError is better!!
Param values update:
['MidGameMobilityPercent', 106, 6, 0, 1000]
['EndGameMobilityPercent', 100, 6, 0, 1000]
['ConnectedPasserPercent', 16, 6, 0, 100]
['BlockedPasserPenaltyPercent', 70, 4, 0, 100]
['TwoBishopAdv', 50, 1, 20, 80]
['Rook7thRankMg', 15, 1, 0, 200]
['Rook7thRankEg', 50, 1, 0, 200]
['Rook8thRankMg', 15, 1, 0, 200]
['Rook8thRankEg', 50, 1, 0, 200]
['PawnValueMg', 89, 1, 70, 110]
['PawnValueEg', 118, 1, 90, 150]
['RookThreat', 100, 4, 0, 1000]
['QueenThreat', 100, 4, 0, 1000]
['ThreatOnWeak', 100, 8, 0, 1000]
['ThreatOnStrong', 50, 8, 0, 1000]
cycle 1, param to optimize: EndGameMobilityPercent
Try EndGameMobilityPercent = 106
...
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Python script for TTM
lucasart wrote: Thanks. I think there can be 2 reasons why my lambda is so low:
1. Training positions are of poor quality
2. My qsearch is not very predictive of game results (i.e. my eval sucks).
I have a 2300 elo patzer, and generated training positions with self play at depth=6, so it's worth testing hypothesis 1. I'll try again with better positions (generated with a stronger engine).

I regenerated some positions from better quality games, and my lambda has already improved. Now lambda = 0.0013, which gives an expected score of 19.7% for being a rook down. Still high, but better, and the error (average squared difference to the logistic curve) has also noticeably decreased.
However, I still cannot get anything sensible out of this tuning method. I tried a few things, every time reducing the error, but they were all losing Elo.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
-
- Posts: 170
- Joined: Sun Oct 28, 2012 9:46 pm
Re: Python script for TTM
Hopefully you aren't taking the values as they are. For large arrays (such as mobility), I created a function to fit the values. This proved to give MUCH better results than simply tuning each entry separately (and it was much quicker as well!). Even with millions of games it is very easy to get overtuned values, simply because the cases in which they are used are too rare (how many times per game does a rook have a mobility value of exactly 10, 11, etc.?).
My function was pretty simple: min + (max - min) * pow(i, slope) / pow(arrSize - 1, slope);
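In runnable form, the shaping function above looks like this (a sketch; the name `mobility_array` is illustrative):

```python
def mobility_array(lo, hi, size, slope):
    """Fill an array of `size` entries from lo to hi along a power curve,
    so only lo, hi and slope need tuning instead of every single entry."""
    return [lo + (hi - lo) * pow(i, slope) / pow(size - 1, slope)
            for i in range(size)]

print(mobility_array(0, 50, 5, 2.0))  # [0.0, 3.125, 12.5, 28.125, 50.0]
```

With slope > 1 the curve rises slowly at first and steeply at the end; slope < 1 gives the opposite shape, and slope = 1 is linear.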
-
- Posts: 4833
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: Python script for TTM
lucasart wrote: I regenerated some positions from better quality games, and my lambda has already improved. Now lambda = 0.0013, which gives an expected score of 19.7% for being a rook down. Still high, but better, and the error (average squared difference to the logistic curve) has also noticeably reduced. However, I still cannot get anything sensible out of this tuning method. I tried a few things, every time reducing the error, but they were all losing elo.

Try this method of measuring the performance of the tuned engine:
1. Create a match, engine_untrained vs engine_trained, using some positions from your training set as game starting positions. We expect engine_trained to win such a match because it was trained on those specific positions. If engine_trained wins, you should already be happy with it. I remember training an engine on Sicilian Defense positions, and when I tested it on Sicilian openings, it won.
2. Now create game starting positions from your training set plus positions that are not in it. Make it 50-50 and run a match from those starting positions. Call it the 50/50 test: 50% training positions, 50% unseen positions.
3. Finally, try the 0/100 test (all unseen positions), the ultimate test, which is very difficult to pass.
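Step 2's 50-50 mix is straightforward to script. A sketch, assuming the positions are available as two lists of FEN strings (the names are illustrative):

```python
import random

def mixed_openings(trained, untrained, n, seed=1):
    """Build an n-position opening set: half sampled from the training
    positions, half from positions the tuner never saw, then shuffled."""
    rng = random.Random(seed)
    mix = rng.sample(trained, n // 2) + rng.sample(untrained, n - n // 2)
    rng.shuffle(mix)
    return mix

trained = ['t%d' % i for i in range(100)]    # stand-ins for FEN strings
untrained = ['u%d' % i for i in range(100)]
book = mixed_openings(trained, untrained, 10)
print(len(book))  # 10
```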
-
- Posts: 931
- Joined: Tue Mar 09, 2010 3:46 pm
- Location: New York
- Full name: Álvaro Begué (RuyDos)
Re: Python script for TTM
lucasart wrote: I regenerated some positions from better quality games, and my lambda has already improved. Now lambda = 0.0013, which gives an expected score of 19.7% for being a rook down. Still high, but better, and the error (average squared difference to the logistic curve) has also noticeably reduced. However, I still cannot get anything sensible out of this tuning method. I tried a few things, every time reducing the error, but they were all losing elo.

Reducing the error on the training set could mean that you are overfitting your data. The standard approach in the machine learning community is to divide your data into three sets: training, validation, and test. You could use something like 60% for the training set, 20% for the validation set, and 20% for the test set.
The training set is where you actively minimize the error function. The validation set is used to measure how well variants of your training procedure work. Once you are satisfied with the performance of your algorithm, you measure its true out-of-sample performance on the test set. This should be done only once; then you can trust that the measurement is unbiased.
https://en.wikipedia.org/wiki/Test_set
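A sketch of the 60/20/20 split described above, assuming the training positions are held in a single list:

```python
import random

def split_positions(positions, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle and split into training / validation / test sets
    (60/20/20 by default, as suggested above)."""
    pos = positions[:]                       # don't mutate the caller's list
    random.Random(seed).shuffle(pos)
    n_test = int(len(pos) * test_frac)
    n_val = int(len(pos) * val_frac)
    return (pos[n_test + n_val:],            # training
            pos[n_test:n_test + n_val],      # validation
            pos[:n_test])                    # test (touch only once!)

train, val, test = split_positions(list(range(100)))
print(len(train), len(val), len(test))  # 60 20 20
```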
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Python script for TTM
I tried 3 separate patches:cetormenter wrote:Hopefully you aren't taking the values as they are. For example for large arrays (such a mobility) I created a function to fit the values. This proved to give MUCH better results than simply tuning each parameter themselves (and it was much quicker as well!). Even with millions of games it is very easy to have overtuned valued simply because the cases in which they are used are simply too rare (how many times per game is a does a rook have a mobility value of exactly 10, 11, etc).
My function was pretty simple. min + (max - min) * pow(i, slope) / (pow (arrSize - 1, slope);
* increase opening pawn value from 0.8 to 1 (relative to endgame pawn value = 1)
* increase bishop pair value in the opening from 0.4 to 0.5
* decrease rook and queen material value by 0.1
So I don't think it's a case of "too rare to tune". But, of course, you make a good point here.
Each patch improved the logistic fit (whether in isolation or accumulated). Each was tested individually, and failed to show an elo gain (SPRT, elo0=0, elo1=4, alpha=beta=0.05). And they failed pretty hard too (ie. we're not talking about a tiny but hard to measure gain).
I'll regenerate the training positions, with random openings this time (eg. 3 random moves after the starting positions, no filtering, accept clearly winning or losing positions).
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Python script for TTM
AlvaroBegue wrote: Reducing the error in the training set could mean that you are overfitting your data. The standard approach in the machine learning community is to divide your data into three sets: training, validation and test. You could use something like 60% for the training set, 20% for the validation set and 20% for the test set. The training set is where you actively minimize the error function. The validation set is used to measure how well variants of your training work. Once you are satisfied with the performance of your algorithm, you can measure its true out-of-sample performance using the test set.
https://en.wikipedia.org/wiki/Test_set

Good point about out-of-sample validation. I wonder if this could be done more rigorously, by computing error bars on the average squared error. Can this be done analytically, or does one need to resort to Monte Carlo methods?
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
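For what it's worth, both routes are workable: analytically, the standard error of the average squared error is s / sqrt(N), where s is the sample standard deviation of the per-position squared errors; the Monte Carlo alternative is a bootstrap. A sketch on synthetic residuals (the data is made up for illustration, and the two estimates should roughly agree):

```python
import math
import random

def mean_and_se(sq_errors):
    """Analytic error bar: standard error of the mean of the
    per-position squared errors, s / sqrt(N)."""
    n = len(sq_errors)
    mean = sum(sq_errors) / n
    var = sum((e - mean) ** 2 for e in sq_errors) / (n - 1)
    return mean, math.sqrt(var / n)

def bootstrap_se(sq_errors, resamples=200, seed=0):
    """Monte Carlo alternative: resample with replacement and
    report the spread of the resampled means."""
    rng = random.Random(seed)
    n = len(sq_errors)
    means = [sum(rng.choices(sq_errors, k=n)) / n for _ in range(resamples)]
    mu = sum(means) / resamples
    return math.sqrt(sum((m - mu) ** 2 for m in means) / (resamples - 1))

rng = random.Random(7)
errs = [rng.uniform(0.0, 1.0) ** 2 for _ in range(5000)]  # synthetic
mean, se = mean_and_se(errs)
print(mean, se, bootstrap_se(errs))
```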