Question for Bob Hyatt

bob · Post by **bob** » Thu Jan 14, 2010 9:03 pm

diep wrote:
lkaufman wrote:First, as this is my first post here, let me introduce myself to those who don't know me. I am a chess Grandmaster, World Senior Champion in 2008, and although not a programmer myself I have been a consultant on chess programs (testing, writing opening books, assigning parameter values, proposing evaluation terms, etc.) from 1967 (MacHack) to the present (Rybka 3 and now Doch with Don Dailey).
Now for my question. I understand that you have tuned the evaluation function of Crafty very finely by playing perhaps millions of games at fast time limits. What I would like to know is to what extent do the optimum values depend on the time limit (or depth)? If there is measurable dependence, which values (individual piece values, mobility, king safety, pawn structure, etc.) go up with increased depth and which ones go down, and roughly by how much per ply? Do the values approach an asymptote at some modest level or do they appear to continue to change up to the maximum depth at which you test?
Thanks in advance for your answer.
As chess programs have progressed from their 60s/70s incarnations to today's level of knowledge, the piece values seem to have gone up.

One of the first to discover this was Chrilly Donninger. Therefore, back in 1998 already, he set the piece values at:

pawn = 100
knight = 420
bishop = 420
rook = 620
queen = 1250 - 1300

Note that by then not a single engine still used the Max Euwe values, let alone the small modification Fischer later made to them. Those had already been refuted long before.

A queen is in general stronger than 2 rooks, and definitely not weaker.
Only in a few exceptional cases are 2 rooks stronger,
and as it happens, chess literature describes only those exceptions.

There is a similar refutation of the idea that queen+knight is stronger than queen+bishop. Another bad factorization by human chess players.

I guess it was around 2004 that I noticed that Fruit's biggest problem was a totally wrong tuning, especially of its material, and I reported that to Fabien.

Fabien then started some sort of massive tuning project and produced Fruit 2.1, with the values:

pawn 100
knight 406
bishop 406
rook 625
queen 1250

In today's chess software we still find the Donninger values.

The latest Stockfish 1.6.2, for example, has these values:

const Value PawnValueMidgame   = Value(0x0C6); // 198  = 1.00 pawn
const Value KnightValueMidgame = Value(0x331); // 817  = 4.126 pawns
const Value BishopValueMidgame = Value(0x344); // 836  = 4.22 pawns
const Value RookValueMidgame   = Value(0x4F6); // 1270 = 6.41 pawns
const Value QueenValueMidgame  = Value(0x9D9); // 2521 = 12.73 pawns

So that is again very close to the Donninger values.
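The ratio column above comes from dividing each hex constant by the pawn's value. A minimal C++ sketch of that conversion; the constant and function names are hypothetical, and only the hex values are taken from the quoted Stockfish 1.6.2 lines.

```cpp
#include <cassert>

// Stockfish 1.6.2 midgame material values as quoted above (internal units).
// The constant names are hypothetical; only the hex values come from the post.
constexpr int kPawnMg   = 0x0C6; // 198
constexpr int kKnightMg = 0x331; // 817
constexpr int kBishopMg = 0x344; // 836
constexpr int kRookMg   = 0x4F6; // 1270
constexpr int kQueenMg  = 0x9D9; // 2521

// Express an internal value in pawns by dividing by the pawn's internal value.
double in_pawns(int value, int pawn_value) {
    assert(pawn_value > 0);
    return static_cast<double>(value) / pawn_value;
}
```

For example, `in_pawns(kBishopMg, kPawnMg) - in_pawns(kKnightMg, kPawnMg)` is about 0.096, so the bishop-knight gap in this table is under a tenth of a pawn.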

The interesting thing about the Stockfish values is that a bishop is valued higher than a knight. That can probably be explained by Stockfish not having much chess knowledge about bishops and knights. In general, chess players overvalue the bishop. For me, someone who has played for the bishops his entire life, it was of course a big shock to find out that nowadays most programs, when in doubt, always prefer a knight;
this can now be explained by objective analysis rather than by subjective analysis.
edit: In Stockfish the difference between bishop and knight is really small, so it definitely picks a centralized knight over a bishop.

Chess players are really good at knowing when a bishop is stronger and when a knight is doing fine. Chess programs are of course a lot more stupid; they have no clue. So it is safer for them to pick the knight, because for a chess program, evaluating whether a knight is strong is a lot simpler than evaluating whether a bishop is bad.

Kasparov's games in the 80s already clearly show that in a lot of cases Kasparov prefers a knight over a bishop in positions where the 'average western' chess player prefers a bishop. A good example is Kasparov-Anand.

So I would argue that the chess knowledge a chess program possesses has a far bigger impact on the values than the search depth or other factors.

But generally speaking, the Donninger values still hold true for most of today's engines.

In Diep, of course, material is a lot more complex, so it is not easy to give the values. They are a lot lower than all this, thanks to all kinds of material rules that I introduced at the start of the 21st century, not long after the 2000 World Championship. Recently those rules needed a lot of bugfixing, and they still do.

But roughly, here are the values:

{ 1000, 3875, 3875, 6175, 12350 }, /* 0 */

Please realize that in past years those values only got *higher*.

I started at piece = 3.5 somewhere in the 90s.

That doesn't mean that 3.5 was correct in the 90s; on the contrary, it just means we are now learning better what the values ought to be.

Because of other rules, a piece is effectively worth 5.175 pawns as of now.

Thanks,
Vincent

One note. I believe the inflated piece values were a direct response to programs trading a knight for 3 pawns and ending up in hopeless positions, and such. I added the "bad trade" idea to Crafty to avoid this, since it addresses the issue directly rather than indirectly through modified piece values.
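To illustrate the general shape of a "bad trade" term, here is a minimal C++ sketch. It is not Crafty's code: the `Material` struct, the `bad_trade` function and the 50-centipawn bonus are all hypothetical assumptions. The idea is to reward the side holding an extra minor piece when the other side has only pawns for it, instead of inflating the piece values themselves.

```cpp
#include <cassert>

// Hypothetical material counts for one side (not Crafty's data structures).
struct Material {
    int pawns, minors, rooks, queens;
};

// Illustrative bonus size in centipawns; an assumption, not a tuned value.
constexpr int kBadTradeBonus = 50;

// Returns the bonus from White's point of view.
int bad_trade(const Material& white, const Material& black) {
    // Only fire when the major pieces are equal, so the term isolates
    // a piece-for-pawns imbalance such as knight for 3 pawns.
    if (white.rooks != black.rooks || white.queens != black.queens) return 0;
    const int minor_diff = white.minors - black.minors;
    const int pawn_diff  = white.pawns - black.pawns;
    if (minor_diff > 0 && pawn_diff < 0) return kBadTradeBonus;   // White kept the piece
    if (minor_diff < 0 && pawn_diff > 0) return -kBadTradeBonus;  // Black kept the piece
    return 0;
}
```

With White {5 pawns, 2 minors} against Black {8 pawns, 1 minor} and equal rooks and queens, the term returns +50 for White. The piece values stay unchanged.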

I suspect that piece values for humans have no relationship to piece values for computers, any more than the shapes and sizes of our respective search trees do; those are not comparable either.

bnemias · Post by **bnemias** » Thu Jan 14, 2010 9:06 pm

diep wrote:I guess it was around 2004 that i noticed that fruit's biggest problem was a total wrong tuning of especially its material and reported that to Fabien.

Fabien then started some sort of massive tuningsproject and produced Fruit 2.1 having the values:

pawn 100
knight 406
bishop 406
rook 625
queen 1250

Interesting read, including the rest of it that I snipped. But I have to mention that I watched lots of games where Fruit lost because it overvalued the queen. It was happily giving up 2 minors + rook for the queen. I never looked at the code to see why, but your post explains it for me.

Actually, I guess it doesn't... the values add up to more than the Q. Anyway, I watched Fruit do it numerous times and was always puzzled.
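The arithmetic behind "the values add up to more than the Q" can be checked directly from the Fruit 2.1 table quoted above. A minimal C++ sketch; the names are hypothetical, and the values are the ones listed in the thread.

```cpp
#include <cassert>

// Fruit 2.1 material values as listed above, in centipawns.
constexpr int kFruitKnight = 406;
constexpr int kFruitBishop = 406;
constexpr int kFruitRook   = 625;
constexpr int kFruitQueen  = 1250;

// Raw material balance for the side that gives up two minors and a rook
// to win the queen; a negative result means the table calls it a loss.
constexpr int queen_for_two_minors_and_rook() {
    return kFruitQueen - (kFruitKnight + kFruitBishop + kFruitRook);
}

static_assert(queen_for_two_minors_and_rook() == -187,
              "by the raw table this trade costs 1.87 pawns");
```

So the material table alone would not make Fruit want this trade; some other evaluation term has to outweigh 1.87 pawns.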

diep · Post by **diep** » Thu Jan 14, 2010 9:18 pm

bob wrote:
One note. I believe the inflated piece values were a direct response to programs trading a knight for 3 pawns and ending up in hopeless positions, and such. I added the "bad trade" idea to Crafty to avoid this, since it addresses the issue directly rather than indirectly through modified piece values.

I suspect that piece values for humans have no relationship to piece values for computers, any more than the shapes and sizes of our respective search trees do; those are not comparable either.

It is very important to factor out clearly which term gives which bonus.

Using the logic you display above, you could just as well argue that a piece is worth 1 pawn.

If a pawn is worth nearly a piece because it is a far-advanced passer about to promote, then the reason is the promotion pattern. It is not that a piece is worth just 1 pawn when you calculate the value of that piece relative to the pawn.

So, in short, we must take the values in the material table as the factual truth for that specific program.

The logic of saying that this table is wrong because of pattern X or Y is dead wrong. Hence we have a refutation of your entire post here, and we can easily conclude from it that a piece is worth around 4.0 to 4.2 pawns, as that is about what every commercial program has as its base value.

It is a matter of being good at factorization: *what* causes *what*.
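A minimal C++ sketch of the factorization argument: the material table keeps a fixed piece value, and a far-advanced passer earns its extra value from a separate passed-pawn term. All names and numbers here are hypothetical; the piece value is simply picked inside the 4.0 to 4.2 range mentioned above.

```cpp
#include <cassert>

// Hypothetical evaluation split into independent terms.
constexpr int kPawnValue  = 100;
constexpr int kPieceValue = 410; // illustrative, inside the 4.0-4.2 pawn range

// Illustrative passed-pawn bonus by rank (index 0 = 2nd rank ... 5 = 7th rank),
// in centipawns. These numbers are assumptions, not any engine's values.
constexpr int kPassedBonus[6] = {10, 20, 40, 80, 160, 280};

// Material term: depends only on how many pawns and pieces there are.
int material(int pawns, int pieces) {
    return pawns * kPawnValue + pieces * kPieceValue;
}

// Pattern term: depends on where a passed pawn stands, not on material.
int passed_pawn_term(int rank_index) {
    assert(rank_index >= 0 && rank_index < 6);
    return kPassedBonus[rank_index];
}
```

Here a pawn on the 7th rank scores 100 + 280 = 380, close to a piece, while the material term still says a piece is worth 4.1 pawns.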

Thanks,
Vincent

diep · Post by **diep** » Thu Jan 14, 2010 9:31 pm

Please note that I'd argue the bad trade bonus is part of the material logic in Crafty, as you can build from it a material table that is pattern-independent and depends only on material.
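Because a bad-trade style term depends only on piece counts, it can be folded into a table indexed by material alone. A minimal C++ sketch, simplified to pawns and minor pieces; the indexing scheme, constants and function names are hypothetical assumptions, not Crafty's implementation.

```cpp
#include <array>
#include <cassert>

// Hypothetical limits and values (centipawns), simplified to pawns and minors.
constexpr int kMaxPawns = 8, kMaxMinors = 4;
constexpr int kPawn = 100, kMinor = 410, kBadTrade = 50;

// Material score from White's point of view, including a bad-trade adjustment.
int material_eval(int wp, int wm, int bp, int bm) {
    int score = (wp - bp) * kPawn + (wm - bm) * kMinor;
    if (wm > bm && wp < bp) score += kBadTrade;
    if (wm < bm && wp > bp) score -= kBadTrade;
    return score;
}

constexpr int kEntries =
    (kMaxPawns + 1) * (kMaxMinors + 1) * (kMaxPawns + 1) * (kMaxMinors + 1);
using MaterialTable = std::array<int, kEntries>;

// Flatten (white pawns, white minors, black pawns, black minors) into one index.
int index_of(int wp, int wm, int bp, int bm) {
    return ((wp * (kMaxMinors + 1) + wm) * (kMaxPawns + 1) + bp) * (kMaxMinors + 1) + bm;
}

// Precompute every material combination once; lookups are then pattern-free.
MaterialTable build_table() {
    MaterialTable t{};
    for (int wp = 0; wp <= kMaxPawns; ++wp)
        for (int wm = 0; wm <= kMaxMinors; ++wm)
            for (int bp = 0; bp <= kMaxPawns; ++bp)
                for (int bm = 0; bm <= kMaxMinors; ++bm)
                    t[index_of(wp, wm, bp, bm)] = material_eval(wp, wm, bp, bm);
    return t;
}
```

For 5 pawns and 2 minors against 8 pawns and 1 minor, the entry is -300 + 410 + 50 = 160.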

thanks,
Vincent

diep · Post by **diep** » Thu Jan 14, 2010 10:15 pm

bnemias wrote:
Interesting read, including the rest of it that I snipped. But I have to mention that I watched lots of games where Fruit lost because it overvalued the queen. It was happily giving up 2 minors + rook for the queen. I never looked at the code to see why, but your post explains it for me.

Actually, I guess it doesn't... the values add up to more than the Q. Anyway, I watched Fruit do it numerous times and was always puzzled.

Again, a wrong conclusion. It doesn't overvalue the queen at all; it overvalues passed pawns. In the positions where it gives up 2 rooks for the queen and the 2 rooks happen to be stronger, it simply overvalues its passed pawn(s), which the rooks then collect, and Fruit loses.

That happens a lot with Fruit, yeah.

A good example of that is Diep-Fruit at the 2005 World Championship.

Fruit gets the queen, and already at the moment it wins the queen, its main line contains the push a5-a4-a3, so it basically evaluated the queen against 2 rooks with its pawn already on a3. Needless to say, it never got that far in reality. Diep collected the passers and won easily.

So it isn't the material table there that loses it for Fruit.

Thanks,
Vincent

Kempelen · Post by **Kempelen** » Thu Jan 14, 2010 10:45 pm

I have even seen piece values that change depending on the game phase. I don't remember where, but I plan to test it in my engine.
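One common way to make piece values depend on the game phase is a "tapered" evaluation: each piece has a midgame and an endgame value, and the engine interpolates between them using a phase counter computed from the remaining material. A minimal C++ sketch; the phase weights (minor 1, rook 2, queen 4) and the example values are assumptions, not taken from any engine in this thread.

```cpp
#include <cassert>

// A value pair: midgame and endgame score (centipawns).
struct Score { int mg, eg; };

// Phase contribution per piece type (assumed weights). With both sides'
// full material (8 minors, 4 rooks, 2 queens) the phase is 8 + 8 + 8 = 24.
constexpr int kMinorPhase = 1, kRookPhase = 2, kQueenPhase = 4;
constexpr int kMaxPhase = 24;

// Counts are for both sides combined.
int game_phase(int minors, int rooks, int queens) {
    const int phase = minors * kMinorPhase + rooks * kRookPhase + queens * kQueenPhase;
    return phase > kMaxPhase ? kMaxPhase : phase; // clamp, e.g. after promotions
}

// Linear interpolation: phase == kMaxPhase gives mg, phase == 0 gives eg.
int tapered(Score s, int phase) {
    assert(phase >= 0 && phase <= kMaxPhase);
    return (s.mg * phase + s.eg * (kMaxPhase - phase)) / kMaxPhase;
}
```

With an illustrative knight of {320, 290}, the knight is worth 320 at the start, 290 with no pieces left, and 305 halfway (phase 12).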

Uri Blass · Post by **Uri Blass** » Thu Jan 14, 2010 10:59 pm

I think that it is not easy to give the values of the pieces for existing programs,
and the interesting question is not what the right value is but whether the optimal value is smaller or bigger at longer time controls.

Even if you take a stupid piece-square-table program (with no knowledge about pawn structure), it may have different values for pawns on different squares, so the first problem is to define the value of a pawn for that program.

You can average over all the squares, but
I doubt that is correct, because the squares do not have identical probabilities of holding a pawn.
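The difference between a plain average over squares and an average weighted by how often pawns actually occupy each square can be sketched in C++ as below. The function names are hypothetical, and the occupancy frequencies would have to be measured from real games; none are given in this thread, so the usage example uses a toy 4-square table.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain mean: every square counts the same.
double plain_mean(const std::vector<int>& pst) {
    assert(!pst.empty());
    long sum = 0;
    for (int v : pst) sum += v;
    return static_cast<double>(sum) / pst.size();
}

// Weighted mean: each square counts by how often a pawn stands on it.
double weighted_mean(const std::vector<int>& pst, const std::vector<double>& freq) {
    assert(pst.size() == freq.size());
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < pst.size(); ++i) {
        num += pst[i] * freq[i];
        den += freq[i];
    }
    assert(den > 0.0);
    return num / den;
}
```

For the toy table {90, 100, 110, 130}, the plain mean is 107.5, but if the low-valued squares are occupied more often (weights 4, 3, 2, 1), the weighted mean drops to 101.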

The problem is even harder with programs that have bonuses for passed pawns and penalties for weak pawns.

The basic value of a pawn in Stockfish tells me nothing, because it is possible to have the same program with a different basic value and some small penalty for a pawn not being passed.

Note also that the average practical value of a 3-pawn advantage (when you count only pawns and the opponent may have a bishop or a knight as compensation) may be more than 3 times the average value of a pawn, because there is a high probability that some of those pawns are passed pawns.
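A minimal C++ sketch of why the basic pawn value alone is not meaningful: a base value plus a passed-pawn bonus can be rewritten as a larger base value minus a "not passed" penalty, and the two give identical evaluations. The function names and numbers are illustrative assumptions.

```cpp
#include <cassert>

// Parameterization A: base pawn value plus a bonus per passed pawn.
int eval_with_bonus(int pawns, int passed) {
    constexpr int kBase = 100, kPassedBonus = 30;
    return pawns * kBase + passed * kPassedBonus;
}

// Parameterization B: larger base value minus a penalty per non-passed pawn.
// Algebraically 130p - 30(p - q) = 100p + 30q, the same as A.
int eval_with_penalty(int pawns, int passed) {
    constexpr int kBase = 130, kNotPassedPenalty = 30;
    return pawns * kBase - (pawns - passed) * kNotPassedPenalty;
}
```

Both versions score 8 pawns with 2 passers as 860, even though one says "a pawn is worth 100" and the other says "130".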

Uri

lkaufman · Post by **lkaufman** » Thu Jan 14, 2010 11:12 pm

I'll propose some tests for Crafty; of course, feel free to run as many or as few as you like. Try doubling whatever time you used for your current values, and then:
1. optimize the value of the rook
2. optimize the value of the knight
3. optimize the value of the bishop
4. optimize the value of bishop mobility
5. optimize the value of rook mobility
6. if possible, optimize the value of pawn structure all together (maybe you can't do this).

I suspect that the optimum piece values may differ by a non-trivial amount at different time limits, but I lack sufficient testing resources to confirm or disprove this hypothesis myself.

Thanks in advance,
Larry

lkaufman · Post by **lkaufman** » Thu Jan 14, 2010 11:18 pm

I disagree with your statement that in general the queen is stronger than two rooks, definitely not weaker. All of my research, both in database stats and in optimizing the piece values for both Rybka and Doch, indicates that the queen is nearly equal to two rooks, but is worth a bit less (maybe a quarter pawn or so), not a bit more. Looking at the values assigned by different programs to the pieces is only useful if you know that they are well "centered", which is not always the case. I know it is not at all the case in Stockfish, the pawn is worth more in that program than its stated value.
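The queen-versus-two-rooks question can be read off each table quoted in this thread by computing Q - 2R in pawn units. A minimal C++ sketch; the struct and function names are hypothetical, the Donninger entry uses the lower 1250 queen value, and the Stockfish entry uses its midgame values only, which (as noted above) may not be well centered.

```cpp
#include <cassert>

// One material table from the thread, in that table's own units.
struct PieceTable { const char* name; double pawn, rook, queen; };

constexpr PieceTable kTables[] = {
    {"Donninger (queen 1250)", 100.0,  620.0,  1250.0},
    {"Fruit 2.1",              100.0,  625.0,  1250.0},
    {"Stockfish 1.6.2 mg",     198.0, 1270.0,  2521.0},
    {"Diep",                  1000.0, 6175.0, 12350.0},
};

// Queen minus two rooks, in pawns; positive means the table prefers the queen.
constexpr double queen_minus_two_rooks(const PieceTable& t) {
    return (t.queen - 2.0 * t.rook) / t.pawn;
}
```

This gives +0.10 for Donninger, 0 for Fruit 2.1 and Diep, and about -0.10 for the Stockfish midgame values.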

Uri Blass · Post by **Uri Blass** » Thu Jan 14, 2010 11:20 pm

I am afraid that the difference in rating points is going to be too small, and even Bob does not play enough games to measure it.

I suspect that you may need to multiply the time control by 10 to find a significant difference in the optimal values.
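A rough C++ sketch of why small rating differences are expensive to measure: it converts between score and Elo with the logistic model and estimates how many games make the expected score shift equal to `z` standard errors. It assumes a simple binomial model with draws ignored, which overstates the noise somewhat, so the game counts are only an order-of-magnitude guide. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Logistic Elo model: rating difference implied by a score fraction.
double elo_from_score(double score) {
    assert(score > 0.0 && score < 1.0);
    return -400.0 * std::log10(1.0 / score - 1.0);
}

// Expected score fraction for a given rating difference.
double score_from_elo(double elo) {
    return 1.0 / (1.0 + std::pow(10.0, -elo / 400.0));
}

// Games needed so that the score shift caused by `elo` equals `z` standard
// errors of the measured score near 50% (binomial, draws ignored).
double games_needed(double elo, double z) {
    const double ds = score_from_elo(elo) - 0.5;
    assert(ds > 0.0);
    const double sigma_one_game = 0.5; // sqrt(0.5 * 0.5) with no draws
    const double n = z * sigma_one_game / ds;
    return n * n;
}
```

Under these assumptions, `games_needed(5.0, 2.0)` comes out at roughly 19,000 games, and halving the Elo difference roughly quadruples that number.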

Uri
