Tapered Evaluation and MSE (Texel Tuning)

Discussion of chess software programming and technical issues.


Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

Ferdy wrote: Thu Jan 14, 2021 12:11 pm
Desperado wrote: Wed Jan 13, 2021 1:04 pm Hello everybody,

To understand what is going on, I thought I could use a database that I did not generate myself.
So I used ccrl-40-15-elo-3200.epd from https://rebel13.nl/misc/epd.html.

Setup:

1. material only evaluator
2. cpw algorithm
3. scaling factor K=1.0
4. pawn values are anchor for mg and eg [100,100]
5. starting values [300,300],[300,300],[500,500],[1000,1000]
6. loss function uses squared error
7. 50K sample size
8. phase value computation

Code: Select all

    int phase = 1 * Bit::popcnt(Pos::minors(pos));
    phase += 2 * Bit::popcnt(Pos::rooks(pos));
    phase += 4 * Bit::popcnt(Pos::queens(pos));
    phase = min(phase, 24);

    int s = (score.mg * phase + score.eg * (24 - phase)) / 24;
    return pos->stm == WHITE ? s : -s;
Just checking, given this training position,

Code: Select all

1r6/1p1r4/2p1p3/2PBP1Bk/R2P2bP/5p2/1R1K4/8 b - -,1-0
Your material eval would return a score of -300 cp; the point of view is the side to move, or spov.

Notice the side to move and result.

Check your score and formula; see if your sigmoid is the same as mine.

If the side to move is black, set score to -score, then apply the sigmoid.

Code: Select all

K=1.0
sigmoid = 1.0/(1.0 + 10.0**(-K*score/400.0))

Code: Select all

sigmoid: 0.8490204427886767
result = 1.0
error = 1.0 - sigmoid
squared_error = error*error
Thanks for the hint; I have checked the POV situation more times than I can count... lol
The formula is the same. As I pointed out in my last two posts for you, the choice of positions might have a big influence (the draws). The randomization of the N-K batch is interesting but does not have a big influence on the situation. I already know that the draws change things a lot, but I need to measure a full run now.

My internal interface

Code: Select all

void ISearch::staticEval(pos_t* pos)
{
    resultDepth = 0;
    resultScore = Eval::full(pos);
    resultScore = pos->stm == WHITE ? resultScore : -resultScore;
    resultMove = NO_MOVE;
    resultNodes = 0;
    Line::clear(&resultPv);
}
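For reference, the pieces above combine into the following minimal loss computation. This is a sketch only: the Sample struct is an assumed placeholder for however the tuner stores a position's white-POV score and result, not the actual engine code.

Code: Select all

#include <cmath>
#include <vector>

// Hypothetical container: whitePovScore is the tapered material eval from
// White's point of view, result is the game outcome from White's point of
// view (1.0 win, 0.5 draw, 0.0 loss).
struct Sample { int whitePovScore; double result; };

// Mean squared error over the training set; K = 1.0 in the setup above.
double meanSquaredError(const std::vector<Sample>& samples, double K)
{
    double total = 0.0;
    for (const Sample& s : samples) {
        double sig = 1.0 / (1.0 + std::pow(10.0, -K * s.whitePovScore / 400.0));
        double err = s.result - sig;
        total += err * err;
    }
    return total / samples.size();
}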
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Ferdy »

Desperado wrote: Thu Jan 14, 2021 12:04 pm
Ferdy wrote: Thu Jan 14, 2021 11:46 am
Desperado wrote: Thu Jan 14, 2021 10:51 am Hi Ferdy, from your readme
Positions are saved with the following conditions:
* If the move in the game is not a capture, not a checking move, and not a promotion move, the side to move is not in check, and the game has either a 1-0 or 0-1 result.
Does that mean you ignore draws, and positions whose move in the epd/pgn is of the mentioned type?
Yes.
So you check the move in the epd for its attributes like capture, promotion or check, too. At least that is what I would expect now.
Yes, I generated those positions from the games in the pgn file. The game move can be seen in those games.
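A filter implementing these conditions might look like the sketch below; the predicate helpers are hypothetical placeholders standing in for engine functionality, not the actual generator code:

Code: Select all

#include <string>

struct Position;               // engine position type (placeholder)
using Move = unsigned int;     // encoded move (placeholder)

// Hypothetical predicates standing in for engine functionality.
bool isCapture(const Position&, Move);
bool isPromotion(Move);
bool givesCheck(const Position&, Move);
bool inCheck(const Position&);

// Keep a position only if the game move is quiet, the side to move is not
// in check, and the game was decisive -- the conditions quoted above.
bool keepPosition(const Position& pos, Move gameMove, const std::string& result)
{
    if (isCapture(pos, gameMove) || isPromotion(gameMove) || givesCheck(pos, gameMove))
        return false;
    if (inCheck(pos))
        return false;
    return result == "1-0" || result == "0-1";   // skip draws
}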
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

Ferdy wrote: Thu Jan 14, 2021 12:25 pm
Desperado wrote: Thu Jan 14, 2021 12:04 pm
Ferdy wrote: Thu Jan 14, 2021 11:46 am
Desperado wrote: Thu Jan 14, 2021 10:51 am Hi Ferdy, from your readme
Positions are saved with the following conditions:
* If the move in the game is not a capture, not a checking move, and not a promotion move, the side to move is not in check, and the game has either a 1-0 or 0-1 result.
Does that mean you ignore draws, and positions whose move in the epd/pgn is of the mentioned type?
Yes.
So you check the move in the epd for its attributes like capture, promotion or check, too. At least that is what I would expect now.
Yes, I generated those positions from the games in the pgn file. The game move can be seen in those games.
They are also included in the corresponding epd already, but it wouldn't be a problem anyway.
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Ferdy »

Desperado wrote: Thu Jan 14, 2021 12:00 pm Hello Ferdy,

I have some questions, because looking into https://github.com/fsmosca/Piece-Value- ... aster/data
I notice more and more differences compared to what I am doing.

1. Choice of positions related to the game result (I saw just now that you answered it already... thx)

a. Do you use draws, or do you ignore them for the ccrl database?
I generated it twice: first everything, second selecting only those positions with material imbalance.
b. Are there any other restrictions included?
None.
2. Data presentation: R4n2/4k3/P3n2p/5R2/8/7P/2r3PK/8 b - -,2,-2,0,1,0,1

a. I don't see how you interpolate the phase score out of the delta. I mean, you can generate

mg = 2 * 100 + -2 * 300 + 1 * 500
eg = 2 * 110 + -2 * 290 + 1 * 550
Is the error computation separated for mg/eg? Or do you include logic to interpolate before computing the error, like eval() in the engine does? At the beginning I pointed out that "my" problem does not occur if I compute the phases separately (it doesn't matter whether I do that in parallel or in sequence).

b. The imbalance is represented differently

"2,1,0,-1,0" is very imbalanced because it is not "0,0,0,0,0" while my eval() would consider the resulting score "0" as balanced.
So,the resulting test set would look very different, i guess.
In that line, the only fields useful for texel tuning are the epd (first field) and the result (last field), 1 meaning 1-0. Converted below.

Code: Select all

R4n2/4k3/P3n2p/5R2/8/7P/2r3PK/8 b - -,1-0
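Reading such a converted line back is then straightforward; a minimal sketch, assuming the "epd,result" format shown above:

Code: Select all

#include <string>
#include <utility>

// Split an "epd,result" training line at the last comma and map the
// result string to a white-POV target value.
std::pair<std::string, double> parseLine(const std::string& line)
{
    std::size_t comma = line.rfind(',');
    std::string epd = line.substr(0, comma);
    std::string res = line.substr(comma + 1);
    double target = (res == "1-0") ? 1.0 : (res == "0-1") ? 0.0 : 0.5;
    return {epd, target};
}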
And a question related to your ccrl computation: you said that you computed the pawn value too, both mg and eg?
That is true when I use the ccrl3200 material imbalance. This plot:

[plot image]
Ronald
Posts: 160
Joined: Tue Jan 23, 2018 10:18 am
Location: Rotterdam
Full name: Ronald Friederich

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Ronald »

Desperado wrote:
Conclusions:

At first I was looking for a bug, but there is no bug.
I have tried to understand what is going on and have come to the following conclusions.

1. the data contains a natural imbalance between mg and eg scores.
By natural I mean that the positions were randomly selected from pgn games.

2. phase model: (mg * phase + eg * (maxphase - phase)) / maxphase.

The average values in the data set can be mg(13), eg(11), max(24).
This forces the tuner to bring the mg score to its minimum and maximize the eg score.
The average scores (mg+eg)/2 are fine. Especially when mg-error > eg-error.

This is a correct behavior of the tuner.
I guess there must be something wrong with your code.

The reasoning behind this is that the MG and EG parts are not opposites of each other, they are the same property. If in a position the error is lowered by increasing the value of the property, increasing the value of either the MG or the EG property lowers the error. So a difference in average phase of the data set can not be the reason for this outcome.

If the result could be explained by the "imbalance" of the data set, this would mean that nearly all "MG positions" should encourage lowering the value of the parameter, which would mean that positively balanced positions end in a draw/loss and negatively balanced positions in a draw/win, and EG positions the other way around. Because the MG values are so low, this would have to occur for nearly every imbalanced position. That would not be a really good data set... You can easily check if this is the case for your dataset.

This leaves the code. Your tuning code doesn't seem to contain errors. Repeatedly changing the same parameter before going to the next might amplify the behavior, but is probably not the cause. I don't know what you're doing in the eval/qs part, however. It might be something with applying the corrected values in the eval, caching of values, or phase calculation/caching, etc. You could try to reverse the parameter sequence in the tuning code, starting with the last parameter etc., to test whether repeating the change for the same parameter is the cause. You could also try to tune only the MG parameters (with lower EG values) and test whether the tuner still lowers the MG values.
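The average-phase claim discussed above is easy to measure directly; a quick diagnostic sketch, reusing the 0..24 phase formula quoted earlier (the position loader and phase function are assumed to come from the engine):

Code: Select all

#include <vector>

struct pos_t;                          // engine position type (placeholder)
int computePhase(const pos_t* pos);    // the 0..24 phase from the quoted code

// Mean phase of the training set. A mean well above 12 means the mg terms
// dominate the interpolated scores on average, and vice versa for eg.
double meanPhase(const std::vector<pos_t*>& positions)
{
    long sum = 0;
    for (const pos_t* p : positions)
        sum += computePhase(p);
    return (double)sum / positions.size();
}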
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Ferdy »

Desperado wrote: Thu Jan 14, 2021 12:22 pm
Ferdy wrote: Thu Jan 14, 2021 12:11 pm
Desperado wrote: Wed Jan 13, 2021 1:04 pm Hello everybody,

To understand what is going on, I thought I could use a database that I did not generate myself.
So I used ccrl-40-15-elo-3200.epd from https://rebel13.nl/misc/epd.html.

Setup:

1. material only evaluator
2. cpw algorithm
3. scaling factor K=1.0
4. pawn values are anchor for mg and eg [100,100]
5. starting values [300,300],[300,300],[500,500],[1000,1000]
6. loss function uses squared error
7. 50K sample size
8. phase value computation

Code: Select all

    int phase = 1 * Bit::popcnt(Pos::minors(pos));
    phase += 2 * Bit::popcnt(Pos::rooks(pos));
    phase += 4 * Bit::popcnt(Pos::queens(pos));
    phase = min(phase, 24);

    int s = (score.mg * phase + score.eg * (24 - phase)) / 24;
    return pos->stm == WHITE ? s : -s;
Just checking, given this training position,

Code: Select all

1r6/1p1r4/2p1p3/2PBP1Bk/R2P2bP/5p2/1R1K4/8 b - -,1-0
Your material eval would return a score of -300 cp; the point of view is the side to move, or spov.

Notice the side to move and result.

Check your score and formula; see if your sigmoid is the same as mine.

If the side to move is black, set score to -score, then apply the sigmoid.

Code: Select all

K=1.0
sigmoid = 1.0/(1.0 + 10.0**(-K*score/400.0))

Code: Select all

sigmoid: 0.8490204427886767
result = 1.0
error = 1.0 - sigmoid
squared_error = error*error
Thanks for the hint; I have checked the POV situation more times than I can count... lol
The formula is the same. As I pointed out in my last two posts for you, the choice of positions might have a big influence (the draws). The randomization of the N-K batch is interesting but does not have a big influence on the situation. I already know that the draws change things a lot, but I need to measure a full run now.

My internal interface

Code: Select all

void ISearch::staticEval(pos_t* pos)
{
    resultDepth = 0;
    resultScore = Eval::full(pos);
    resultScore = pos->stm == WHITE ? resultScore : -resultScore;
    resultMove = NO_MOVE;
    resultNodes = 0;
    Line::clear(&resultPv);
}
That is your eval.

Given the training position,
1r6/1p1r4/2p1p3/2PBP1Bk/R2P2bP/5p2/1R1K4/8 b - -,1-0
where it is black to move, you have to negate the score again.

If that static eval returns -300 cp, you have to use

score = -1 * (-300) or 300

and use that score in the sigmoid formula.
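In other words, the sigmoid must always see the score from White's point of view, because the 1-0/0-1 result it is compared against is a white-POV label. As a one-function sketch (names assumed):

Code: Select all

// spov -> wpov: a side-to-move score must be negated when Black is to
// move, since the game result it is compared against is white-POV.
int whitePovScore(int spovScore, bool blackToMove)
{
    return blackToMove ? -spovScore : spovScore;
}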
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

Ferdy wrote: Thu Jan 14, 2021 4:54 pm
Desperado wrote: Thu Jan 14, 2021 12:22 pm
Ferdy wrote: Thu Jan 14, 2021 12:11 pm
Desperado wrote: Wed Jan 13, 2021 1:04 pm Hello everybody,

To understand what is going on, I thought I could use a database that I did not generate myself.
So I used ccrl-40-15-elo-3200.epd from https://rebel13.nl/misc/epd.html.

Setup:

1. material only evaluator
2. cpw algorithm
3. scaling factor K=1.0
4. pawn values are anchor for mg and eg [100,100]
5. starting values [300,300],[300,300],[500,500],[1000,1000]
6. loss function uses squared error
7. 50K sample size
8. phase value computation

Code: Select all

    int phase = 1 * Bit::popcnt(Pos::minors(pos));
    phase += 2 * Bit::popcnt(Pos::rooks(pos));
    phase += 4 * Bit::popcnt(Pos::queens(pos));
    phase = min(phase, 24);

    int s = (score.mg * phase + score.eg * (24 - phase)) / 24;
    return pos->stm == WHITE ? s : -s;
Just checking, given this training position,

Code: Select all

1r6/1p1r4/2p1p3/2PBP1Bk/R2P2bP/5p2/1R1K4/8 b - -,1-0
Your material eval would return a score of -300 cp; the point of view is the side to move, or spov.

Notice the side to move and result.

Check your score and formula; see if your sigmoid is the same as mine.

If the side to move is black, set score to -score, then apply the sigmoid.

Code: Select all

K=1.0
sigmoid = 1.0/(1.0 + 10.0**(-K*score/400.0))

Code: Select all

sigmoid: 0.8490204427886767
result = 1.0
error = 1.0 - sigmoid
squared_error = error*error
Thanks for the hint; I have checked the POV situation more times than I can count... lol
The formula is the same. As I pointed out in my last two posts for you, the choice of positions might have a big influence (the draws). The randomization of the N-K batch is interesting but does not have a big influence on the situation. I already know that the draws change things a lot, but I need to measure a full run now.

My internal interface

Code: Select all

void ISearch::staticEval(pos_t* pos)
{
    resultDepth = 0;
    resultScore = Eval::full(pos);
    resultScore = pos->stm == WHITE ? resultScore : -resultScore;
    resultMove = NO_MOVE;
    resultNodes = 0;
    Line::clear(&resultPv);
}
That is your eval.

Given the training position,
1r6/1p1r4/2p1p3/2PBP1Bk/R2P2bP/5p2/1R1K4/8 b - -,1-0
where it is black to move, you have to negate the score again.

If that static eval returns -300 cp, you have to use

score = -1 * (-300) or 300

and use that score in the sigmoid formula.
That is exactly what happens.

Code: Select all

    pos_t pos;
    Pos::setup(&pos, (char*) "fen 1r6/1p1r4/2p1p3/2PBP1Bk/R2P2bP/5p2/1R1K4/8 b - -");
    ISearch::staticEval(&pos);
    printf("\nScore POV side to move: %d ", ISearch::resultScore);
    Dbg::printPosition(&pos);

Code: Select all

Score POV white to move: 300

CTM : BLACK
woo : 0
wooo: 0
boo : 0
booo: 0
R50 : 0
EPT : --
-- br -- -- -- -- -- --
-- bp -- br -- -- -- --
-- -- bp -- bp -- -- --
-- -- wp wb wp -- wb bk
wr -- -- wp -- -- bb wp
-- -- -- -- -- bp -- --
-- wr -- wk -- -- -- --
-- -- -- -- -- -- -- --
Eval() always returns with the POV of the side to move. So only if it is black to move does the sign need to be updated.

Code: Select all

int Eval::full(){
...
return pos->stm == WHITE ? s : -s;
}

ISearch::staticEval() {
...
 resultScore = Eval::full();
 resultScore = pos->stm == WHITE ? resultScore : -resultScore;
 ...
}
The output of my sigmoid(300, 400) is 0.849020 for 300 cp:

Code: Select all

    printf("\n%f ", Math::sigmoid(300, 400));
    printf("\n%f ", computeError(1.0, 300));
    printf("\n%f ", computeError(0.5, 300));
    printf("\n%f ", computeError(0.0, 300));
    getchar();
0.849020
0.022795
0.121815
0.720836

Code: Select all

double Math::sigmoid(double x, double y)
{
    return((double) 1 / (1 + pow(10, -x / y)));
}

double Tuner::computeError(double result, int score)
{
    double err = result - Math::sigmoid(scalingFactorK * score, 400);
    return err * err;
}
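These numbers are internally consistent: sigmoid(300, 400) = 1/(1 + 10^(-300/400)) = 1/(1 + 10^(-0.75)) ≈ 0.849020, so computeError(1.0, 300) = (1 - 0.849020)^2 ≈ 0.022795, computeError(0.5, 300) = (0.5 - 0.849020)^2 ≈ 0.121815, and computeError(0.0, 300) = (0 - 0.849020)^2 ≈ 0.720836, exactly the printed values.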
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

Ronald wrote: Thu Jan 14, 2021 1:01 pm
Desperado wrote:
Conclusions:

At first I was looking for a bug, but there is no bug.
I have tried to understand what is going on and have come to the following conclusions.

1. the data contains a natural imbalance between mg and eg scores.
By natural I mean that the positions were randomly selected from pgn games.

2. phase model: (mg * phase + eg * (maxphase - phase)) / maxphase.

The average values in the data set can be mg(13), eg(11), max(24).
This forces the tuner to bring the mg score to its minimum and maximize the eg score.
The average scores (mg+eg)/2 are fine. Especially when mg-error > eg-error.

This is a correct behavior of the tuner.
I guess there must be something wrong with your code.

The reasoning behind this is that the MG and EG parts are not opposites of each other, they are the same property. If in a position the error is lowered by increasing the value of the property, increasing the value of either the MG or the EG property lowers the error. So a difference in average phase of the data set can not be the reason for this outcome.

If the result could be explained by the "imbalance" of the data set, this would mean that nearly all "MG positions" should encourage lowering the value of the parameter, which would mean that positively balanced positions end in a draw/loss and negatively balanced positions in a draw/win, and EG positions the other way around. Because the MG values are so low, this would have to occur for nearly every imbalanced position. That would not be a really good data set... You can easily check if this is the case for your dataset.

This leaves the code. Your tuning code doesn't seem to contain errors. Repeatedly changing the same parameter before going to the next might amplify the behavior, but is probably not the cause. I don't know what you're doing in the eval/qs part, however. It might be something with applying the corrected values in the eval, caching of values, or phase calculation/caching, etc. You could try to reverse the parameter sequence in the tuning code, starting with the last parameter etc., to test whether repeating the change for the same parameter is the cause. You could also try to tune only the MG parameters (with lower EG values) and test whether the tuner still lowers the MG values.
Hello Ronald,

unfortunately I have no more time today, and I agree that it looks like something is going completely wrong in my code.
It must be a devilish little thing somewhere in the helper functions, because the big things have been simplified and I use publicly known algorithms and formulas now.

Based on your post, I got some more rough numbers today. Tomorrow I have to continue.

I made a new file out of the ccrl3200 epd with the following restrictions.
Every position that included a capture, a promotion or a checking move was thrown out. I also removed draws.

Then I counted the number of positions that showed an advantage for the vector (100,300,300,500,1000) but at the same time reported a loss.

Code: Select all

        r = Epd::getResultFromString(i);
        if(r == 0.0 && ISearch::resultScore >= 400) bad++;
        else if(r == 1.0 && ISearch::resultScore <= -400) bad++;
        else ok++;
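For context, the surrounding pass might look roughly like the sketch below; the container and loop scaffolding are assumed, while the engine calls are the ones shown earlier in the thread:

Code: Select all

#include <cstdio>
#include <vector>

// Assumed scaffolding around the counting snippet: walk all filtered epd
// lines, evaluate each position statically (white POV), and compare the
// advantage threshold against the stored game result.
void countBadPositions(const std::vector<char*>& epdLines, int threshold)
{
    int ok = 0, bad = 0;
    for (char* i : epdLines) {
        pos_t pos;
        Pos::setup(&pos, i);                     // engine call, as shown earlier
        ISearch::staticEval(&pos);               // fills ISearch::resultScore
        double r = Epd::getResultFromString(i);  // 1.0 = white win, 0.0 = black win
        if (r == 0.0 && ISearch::resultScore >= threshold) bad++;
        else if (r == 1.0 && ISearch::resultScore <= -threshold) bad++;
        else ok++;
    }
    printf("total %d ok %d bad %d\n", ok + bad, ok, bad);
}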
The resulting numbers for thresholds of 100, 200, 300 and 400 cp are:

total / % bad / ok / bad // threshold
492067 0.213690 386917 105150 // 100
492067 0.074913 455205 36862 // 200
492067 0.024489 480017 12050 // 300
492067 0.008472 487898 4169 // 400

I need to leave for today.

Meanwhile, I would really be happy to find a bug to report!

Thanks everybody for the support and interest.
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by hgm »

If you doubt your code / algorithm, just test it on a simple case with a known outcome. E.g. let it run on a set of 10 positions: K + P, N, B, R or Q vs K, which you evaluate as 100, 300, 350, 500 and 975, and a FIDE setup from which you deleted a black P, N, B, R or Q, evaluated as 80, 325, 325, 475 and 900, respectively. And see what it makes of it. Should only take minutes. If it cannot get good values from that, the code sucks. If it does get good values for this case, your data set sucks.
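One possible encoding of such a synthetic set, with results labeled so the stronger side wins (FENs written here for illustration, not taken from any of the data sets above):

Code: Select all

4k3/8/8/8/8/8/8/3QK3 w - -,1-0
4k3/8/8/8/8/8/8/3RK3 w - -,1-0
4k3/8/8/8/8/8/8/2B1K3 w - -,1-0
4k3/8/8/8/8/8/8/2N1K3 w - -,1-0
4k3/8/8/8/8/8/4P3/4K3 w - -,1-0
rnbqkbnr/pppp1ppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -,1-0
rnbqkb1r/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -,1-0
rn1qkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -,1-0
1nbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQk -,1-0
rnb1kbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -,1-0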
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by Desperado »

hgm wrote: Thu Jan 14, 2021 6:46 pm If you doubt your code / algorithm, just test it on a simple case with a known outcome. E.g. let it run on a set of 10 positions: K + P, N, B, R or Q vs K, which you evaluate as 100, 300, 350, 500 and 975, and a FIDE setup from which you deleted a black P, N, B, R or Q, evaluated as 80, 325, 325, 475 and 900, respectively. And see what it makes of it. Should only take minutes. If it cannot get good values from that, the code sucks. If it does get good values for this case, your data set sucks.
Hello HG,

Of course I did that at the very beginning, and because I don't trust myself, every hour again after that lol...

I will gain some distance and then, well rested, analyze the problem further.