I just did the logistic regression, and found the best fit for:
score = 1 / (1 + exp(-lambda * qsearch))
where:
* score = 0, 0.5, 1 is the result of the game
* qsearch scale is 200 = 1 Pawn
* lambda = 0.00105
For example, when the qsearch says we're a rook down (qsearch = -1080 = -5.4 pawns), we still have a predicted score of 24.3%.
This is a bit surprising; I expected a somewhat higher value for lambda.
What kind of values do people get?
I'll have a look at depth=1 and depth=2, to see how much more predictive it gets. qsearch looks quite noisy.
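As a sanity check, the mapping above is easy to reproduce in Python (a minimal sketch; the lambda and the rook-down example are the values quoted in the post):

```python
import math

def predicted_score(qsearch, lam=0.00105):
    """Logistic mapping from a qsearch score (internal units, 200 = 1 pawn)
    to an expected game score in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-lam * qsearch))

# A rook down: -5.4 pawns = -1080 internal units.
print(round(predicted_score(-1080), 3))  # 0.243
```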
Python script for TTM
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Python script for TTM
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
-
- Posts: 170
- Joined: Sun Oct 28, 2012 9:46 pm
Re: Python script for TTM
For Nirvana I get a lambda of ~0.00279, which gives a predicted score of ~5%. That's still a bit higher than I would have expected. I don't think that exact tuning of this value is really needed. It's probably something that only needs to be "good enough" to kick things off.
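For reference, fitting lambda needs no heavy machinery: the mean squared error is a smooth one-dimensional function of lambda, so a simple bracketing search suffices. A sketch on synthetic data (the golden-section search and the synthetic positions are illustrative, not either engine's actual code):

```python
import math

def mse(lam, data):
    """Mean squared error between game results and the logistic prediction."""
    return sum((r - 1.0 / (1.0 + math.exp(-lam * q))) ** 2
               for q, r in data) / len(data)

def fit_lambda(data, lo=1e-5, hi=1e-2, iters=60):
    """Golden-section search for the lambda minimizing mse on [lo, hi]."""
    phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - phi * (b - a), a + phi * (b - a)
        if mse(c, data) < mse(d, data):
            b = d
        else:
            a = c
    return (a + b) / 2

# Synthetic (qsearch, result) pairs generated with a true lambda of 0.002:
true_lam = 0.002
data = [(q, 1.0 / (1.0 + math.exp(-true_lam * q)))
        for q in range(-2000, 2001, 40)]
print(round(fit_lambda(data), 5))  # recovers 0.002
```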
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Python script for TTM
cetormenter wrote: For Nirvana I get a lambda of ~0.00279, which gives a predicted score of ~5%. That's still a bit higher than I would have expected. I don't think that exact tuning of this value is really needed. It's probably something that only needs to be "good enough" to kick things off.

Thanks. I think there can be two reasons why my lambda is so low:
1. Training positions are of poor quality
2. My qsearch is not very predictive of game results (i.e. my eval sucks).
I have a 2300 Elo patzer and generated the training positions by self-play at depth=6, so hypothesis 1 is worth testing. I'll try again with better positions (generated with a stronger engine).
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
-
- Posts: 4833
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: Python script for TTM
lucasart wrote: I ended up writing it in C++ inside the engine, including concurrency. On my 4 core i7 (using 8 threads), it runs one iteration of 7.83m positions in 5.35s (1.46m qsearch/s). I think speed is key here, because the optimization process must run lots of iterations.

Speed matters if you are in a hurry. Also, what I have (and perhaps others too) can be run incrementally: run for a couple of hours, stop, test the current best parameters in actual games, then continue the tuning (starting from the best parameters and errors found last time), sample again, test again, and so on. It can run for days, weeks, even months, until the error reaches zero or can no longer be improved. But from experience doing this (including for other variant games), it is the selection of training positions that is most important.
lucasart wrote: Ferdinand: what kind of speed do you get with a script (in qsearch/s)?

From an old log on an i7-2600K, a 4-core processor:
Using 3 threads, each thread handles 680k positions, taking 115s.
So 3 x 680000 ~ 2M positions in 115s ~ 17K qsearch/s.
These 115s cover tuning just one value of a given parameter. Every parameter takes up to 4 steps, i.e. +1, +2, -1, -2 times its increment, before moving on to the next parameter.
If step 1 already reduces the error, that saves time, as the tuner advances immediately to the next parameter.
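The loop described above (try +1, +2, -1, -2 times the increment, keep the first improvement, move on) can be sketched like this; `eval_error` is a hypothetical stand-in for the expensive pass over all training positions:

```python
def tune_step(params, eval_error):
    """One cycle of the +1/+2/-1/-2 local search described above.

    params: list of [name, value, increment, lo, hi] entries.
    eval_error: function mapping the params list to an average error.
    """
    best = eval_error(params)
    for p in params:
        name, value, inc, lo, hi = p
        for step in (1, 2, -1, -2):        # order used in the log below
            trial = value + step * inc
            if not lo <= trial <= hi:      # respect the parameter bounds
                continue
            p[1] = trial
            err = eval_error(params)
            if err < best:                 # first improvement wins,
                best, value = err, trial   # then move to the next param
                break
            p[1] = value                   # no improvement: revert
    return best

# Toy usage: one parameter, quadratic error minimized at 53.
params = [['TwoBishopAdv', 50, 1, 20, 80]]
print(tune_step(params, lambda ps: (ps[0][1] - 53) ** 2), params[0][1])  # 4 51
```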
Also, after profiling, it is the engine's speed at initializing and setting up the positions that is critical. I remember mine being slower than Stockfish's.
Code:
for fen in fen_list:
    send("ucinewgame")
    send("position fen " + fen)  # bottleneck
    send("go depth 0")
Code:
['MidGameMobilityPercent', 100, 6, 0, 1000]
<param>, <default>, <increment>, <min>, <max>
param to tune:
['MidGameMobilityPercent', 100, 6, 0, 1000]
['EndGameMobilityPercent', 100, 6, 0, 1000]
['ConnectedPasserPercent', 16, 6, 0, 100]
['BlockedPasserPenaltyPercent', 70, 4, 0, 100]
['TwoBishopAdv', 50, 1, 20, 80]
['Rook7thRankMg', 15, 1, 0, 200]
['Rook7thRankEg', 50, 1, 0, 200]
['Rook8thRankMg', 15, 1, 0, 200]
['Rook8thRankEg', 50, 1, 0, 200]
['PawnValueMg', 89, 1, 70, 110]
['PawnValueEg', 118, 1, 90, 150]
['RookThreat', 100, 4, 0, 1000]
['QueenThreat', 100, 4, 0, 1000]
['ThreatOnWeak', 100, 8, 0, 1000]
['ThreatOnStrong', 50, 8, 0, 1000]
StartE: 1000.0000000000
Get initial AveError ...
Thread 1, time: 115s, aveError: 89.5723168436
Thread 3, time: 115s, aveError: 89.4593985319
Thread 2, time: 115s, aveError: 89.7424590188
startAveError: 89.5913914648
cycle: 1
param history sorted score:
['MidGameMobilityPercent', 0]
['EndGameMobilityPercent', 0]
['ConnectedPasserPercent', 0]
['BlockedPasserPenaltyPercent', 0]
['TwoBishopAdv', 0]
['Rook7thRankMg', 0]
['Rook7thRankEg', 0]
['Rook8thRankMg', 0]
['Rook8thRankEg', 0]
['PawnValueMg', 0]
['PawnValueEg', 0]
['RookThreat', 0]
['QueenThreat', 0]
['ThreatOnWeak', 0]
['ThreatOnStrong', 0]
new param order:
['MidGameMobilityPercent', 100, 6, 0, 1000]
['EndGameMobilityPercent', 100, 6, 0, 1000]
['ConnectedPasserPercent', 16, 6, 0, 100]
['BlockedPasserPenaltyPercent', 70, 4, 0, 100]
['TwoBishopAdv', 50, 1, 20, 80]
['Rook7thRankMg', 15, 1, 0, 200]
['Rook7thRankEg', 50, 1, 0, 200]
['Rook8thRankMg', 15, 1, 0, 200]
['Rook8thRankEg', 50, 1, 0, 200]
['PawnValueMg', 89, 1, 70, 110]
['PawnValueEg', 118, 1, 90, 150]
['RookThreat', 100, 4, 0, 1000]
['QueenThreat', 100, 4, 0, 1000]
['ThreatOnWeak', 100, 8, 0, 1000]
['ThreatOnStrong', 50, 8, 0, 1000]
cycle 1, param to optimize: MidGameMobilityPercent
Try MidGameMobilityPercent = 106
History: cycle = 1, good = 0, step = +1, bestAveError = 89.5913914648
Thread 1, time: 116s, aveError: 89.5662467506
Thread 2, time: 116s, aveError: 89.7380670670
Thread 3, time: 116s, aveError: 89.4556463857
newAveError: 89.5866534011
newAveError is better!!
Param values update:
['MidGameMobilityPercent', 106, 6, 0, 1000]
['EndGameMobilityPercent', 100, 6, 0, 1000]
['ConnectedPasserPercent', 16, 6, 0, 100]
['BlockedPasserPenaltyPercent', 70, 4, 0, 100]
['TwoBishopAdv', 50, 1, 20, 80]
['Rook7thRankMg', 15, 1, 0, 200]
['Rook7thRankEg', 50, 1, 0, 200]
['Rook8thRankMg', 15, 1, 0, 200]
['Rook8thRankEg', 50, 1, 0, 200]
['PawnValueMg', 89, 1, 70, 110]
['PawnValueEg', 118, 1, 90, 150]
['RookThreat', 100, 4, 0, 1000]
['QueenThreat', 100, 4, 0, 1000]
['ThreatOnWeak', 100, 8, 0, 1000]
['ThreatOnStrong', 50, 8, 0, 1000]
cycle 1, param to optimize: EndGameMobilityPercent
Try EndGameMobilityPercent = 106
...
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Python script for TTM
lucasart wrote: Thanks. I think there can be 2 reasons why my lambda is so low:
1. Training positions are of poor quality
2. My qsearch is not very predictive of game results (i.e. my eval sucks).
I have a 2300 elo patzer, and generated training positions with self play at depth=6, so it's worth testing hypothesis 1. I'll try again with better positions (generated with a stronger engine).

I regenerated some positions from better quality games, and my lambda has already improved. Now lambda = 0.0013, which gives an expected score of 19.7% for being a rook down. Still high, but better, and the error (average squared difference to the logistic curve) has also noticeably decreased.
However, I still cannot get anything sensible out of this tuning method. I tried a few things, every time reducing the error, but they were all losing Elo.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
-
- Posts: 170
- Joined: Sun Oct 28, 2012 9:46 pm
Re: Python script for TTM
Hopefully you aren't taking the values as they are. For large arrays (such as mobility), I created a function to fit the values. This proved to give MUCH better results than simply tuning each entry separately (and it was much quicker as well!). Even with millions of games it is very easy to get overtuned values, simply because the cases in which they are used are too rare (how many times per game does a rook have a mobility value of exactly 10, 11, etc.?).
My function was pretty simple: min + (max - min) * pow(i, slope) / pow(arrSize - 1, slope);
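In runnable form, the shaping function above looks like this (a sketch; the name `mobility_array` is illustrative):

```python
def mobility_array(lo, hi, size, slope):
    """Fill an array of `size` entries from lo to hi along a power curve,
    so only lo, hi and slope need tuning instead of every single entry."""
    return [lo + (hi - lo) * pow(i, slope) / pow(size - 1, slope)
            for i in range(size)]

print(mobility_array(0, 50, 5, 2.0))  # [0.0, 3.125, 12.5, 28.125, 50.0]
```

With slope > 1 the curve rises slowly at first and steeply at the end; slope < 1 gives the opposite shape, and slope = 1 is linear.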
-
- Posts: 4833
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: Python script for TTM
lucasart wrote: I regenerated some positions from better quality games, and my lambda has already improved. Now lambda = 0.0013, which gives an expected score of 19.7% for being a rook down. Still high, but better, and the error (average squared difference to the logistic curve) has also noticeably reduced. However, I still cannot get anything sensible out of this tuning method. I tried a few things, every time reducing the error, but they were all losing elo.

Try this method of measuring the performance of the tuned engine:
1. Create a match, engine_untrained vs engine_trained, using some positions from your training set as game starting positions. We expect engine_trained to win such a match because it was trained on those specific positions. If engine_trained wins, you should already be happy with it. I remember training an engine on Sicilian Defense positions, and when I tested it on Sicilian openings, it won.
2. Now create game starting positions from your training set plus positions that are not in it. Make it 50-50 and run a match from those starting positions. Call it the 50/50 test: 50% training positions, 50% unseen positions.
3. Finally, try the 0/100 test (all unseen positions), the ultimate test, which is very difficult to pass.
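Step 2's 50-50 mix is straightforward to script. A sketch, assuming the positions are available as two lists of FEN strings (the names are illustrative):

```python
import random

def mixed_openings(trained, untrained, n, seed=1):
    """Build an n-position opening set: half sampled from the training
    positions, half from positions the tuner never saw, then shuffled."""
    rng = random.Random(seed)
    mix = rng.sample(trained, n // 2) + rng.sample(untrained, n - n // 2)
    rng.shuffle(mix)
    return mix

trained = ['t%d' % i for i in range(100)]    # stand-ins for FEN strings
untrained = ['u%d' % i for i in range(100)]
book = mixed_openings(trained, untrained, 10)
print(len(book))  # 10
```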
-
- Posts: 931
- Joined: Tue Mar 09, 2010 3:46 pm
- Location: New York
- Full name: Álvaro Begué (RuyDos)
Re: Python script for TTM
lucasart wrote: I regenerated some positions from better quality games, and my lambda has already improved. Now lambda = 0.0013, which gives an expected score of 19.7% for being a rook down. Still high, but better, and the error (average squared difference to the logistic curve) has also noticeably reduced. However, I still cannot get anything sensible out of this tuning method. I tried a few things, every time reducing the error, but they were all losing elo.

Reducing the error on the training set could mean that you are overfitting your data. The standard approach in the machine learning community is to divide your data into three sets: training, validation, and test. You could use something like 60% for the training set, 20% for the validation set, and 20% for the test set.
The training set is where you actively minimize the error function. The validation set is used to measure how well variants of your training procedure work. Once you are satisfied with the performance of your algorithm, you measure its true out-of-sample performance on the test set. This should be done only once; then you can trust that the measurement is unbiased.
https://en.wikipedia.org/wiki/Test_set
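A sketch of the 60/20/20 split described above, assuming the training positions are held in a single list:

```python
import random

def split_positions(positions, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle and split into training / validation / test sets
    (60/20/20 by default, as suggested above)."""
    pos = positions[:]                       # don't mutate the caller's list
    random.Random(seed).shuffle(pos)
    n_test = int(len(pos) * test_frac)
    n_val = int(len(pos) * val_frac)
    return (pos[n_test + n_val:],            # training
            pos[n_test:n_test + n_val],      # validation
            pos[:n_test])                    # test (touch only once!)

train, val, test = split_positions(list(range(100)))
print(len(train), len(val), len(test))  # 60 20 20
```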
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Python script for TTM
I tried 3 separate patches:cetormenter wrote:Hopefully you aren't taking the values as they are. For example for large arrays (such a mobility) I created a function to fit the values. This proved to give MUCH better results than simply tuning each parameter themselves (and it was much quicker as well!). Even with millions of games it is very easy to have overtuned valued simply because the cases in which they are used are simply too rare (how many times per game is a does a rook have a mobility value of exactly 10, 11, etc).
My function was pretty simple. min + (max - min) * pow(i, slope) / (pow (arrSize - 1, slope);
* increase opening pawn value from 0.8 to 1 (relative to endgame pawn value = 1)
* increase bishop pair value in the opening from 0.4 to 0.5
* decrease rook and queen material value by 0.1
So I don't think it's a case of "too rare to tune". But, of course, you make a good point here.
Each patch improved the logistic fit (whether in isolation or accumulated). Each was tested individually, and failed to show an elo gain (SPRT, elo0=0, elo1=4, alpha=beta=0.05). And they failed pretty hard too (ie. we're not talking about a tiny but hard to measure gain).
I'll regenerate the training positions, with random openings this time (eg. 3 random moves after the starting positions, no filtering, accept clearly winning or losing positions).
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Python script for TTM
AlvaroBegue wrote: Reducing the error in the training set could mean that you are overfitting your data. The standard approach in the machine learning community is to divide your data into three sets: training, validation and test. You could use something like 60% for the training set, 20% for the validation set and 20% for the test set. The training set is where you actively minimize the error function. The validation set is used to measure how well variants of your training work. Once you are satisfied with the performance of your algorithm, you can measure its true out-of-sample performance using the test set.
https://en.wikipedia.org/wiki/Test_set

Good point about out-of-sample validation. I wonder if this could be done more rigorously, by computing error bars on the average squared error. Can this be done analytically, or does one need to resort to Monte Carlo methods?
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
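For what it's worth, both routes are workable: analytically, the standard error of the average squared error is s / sqrt(N), where s is the sample standard deviation of the per-position squared errors; the Monte Carlo alternative is a bootstrap. A sketch on synthetic residuals (the data is made up for illustration, and the two estimates should roughly agree):

```python
import math
import random

def mean_and_se(sq_errors):
    """Analytic error bar: standard error of the mean of the
    per-position squared errors, s / sqrt(N)."""
    n = len(sq_errors)
    mean = sum(sq_errors) / n
    var = sum((e - mean) ** 2 for e in sq_errors) / (n - 1)
    return mean, math.sqrt(var / n)

def bootstrap_se(sq_errors, resamples=200, seed=0):
    """Monte Carlo alternative: resample with replacement and
    report the spread of the resampled means."""
    rng = random.Random(seed)
    n = len(sq_errors)
    means = [sum(rng.choices(sq_errors, k=n)) / n for _ in range(resamples)]
    mu = sum(means) / resamples
    return math.sqrt(sum((m - mu) ** 2 for m in means) / (resamples - 1))

rng = random.Random(7)
errs = [rng.uniform(0.0, 1.0) ** 2 for _ in range(5000)]  # synthetic
mean, se = mean_and_se(errs)
print(mean, se, bootstrap_se(errs))
```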