Question for Bob Hyatt

jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: Question for Bob Hyatt

Post by jwes »

bob wrote:
jwes wrote:
bob wrote:One note. I believe the inflated piece values were a direct response to programs trading knight for 3 pawns and ending up in hopeless positions, and such. I did the "bad trade" idea in Crafty to avoid this, since the bad trade idea directly addresses the issue rather than indirectly thru modifying piece values.
I wonder to what extent it is that programs do not understand how to play with material differences, e.g. with 3 pawns vs. a piece, you need to use the pawns aggressively.
That is one thing that makes this tuning stuff so difficult. I remember many years ago that we simply could not come up with a scheme to handle some of the openings where the program would play g3/g6 and then Bg2/Bg7. The bishop is often critical, and trading it for a knight is generally not a good idea unless the knight is causing lots of problems where it stands. So we simply tuned the opening book to avoid such lines and did just fine (this was a Cray Blitz issue, by the way). Very early Crafty versions used the old CB book, but as I worked on king safety, slowly this problem went away. Yet the book avoided the Bg2 type positions and would instead go into something that became even more problematic.

Bottom line is that as the evaluation is modified, all terms suddenly become suspect. Sort of like optimizing for speed. As one peak gets driven down by optimizations you apply, others rise to take its place, and the process is actually never completed, just continually improved/refined...
It would be an interesting (and tedious) experiment to collect a few thousand relatively even positions with unbalanced material, e.g. N v PPP,
and tune a version of crafty specifically for those positions to see how much better it would play in those positions than regular crafty.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Question for Bob Hyatt

Post by diep »

jwes wrote:
It would be an interesting (and tedious) experiment to collect a few thousand relatively even positions with unbalanced material, e.g. N v PPP,
and tune a version of crafty specifically for those positions to see how much better it would play in those positions than regular crafty.
I feel you've been missing what has happened to Crafty over the past dozens of months.

With just 'a few positions' you aren't going to be able to approximate the millions of 'Monte Carlo type' data points Crafty has already been tuned to by means of millions of games.

Assuming you don't fix the chess knowledge but just tune parameters, you can already estimate that in the first few months of your experiment you will most likely lose 200 Elo or so, not gain anything.

Sure, I have heard some programmers say that Bob is wasting massive system time and could use it more effectively, but let's face it: he has the Monte Carlo effect working in his favour. You won't be able to design another tuning method 'just like that' that does the same. Setting up a better method is really full-time professional work.

Note that I try to do that as well, of course.

Please realize how effective that Monte Carlo effect is. Even for a position that is totally misevaluated by both Crafty and its opponent, you still have a chance of scoring it correctly, because the game result decides it. You just need tons of similar positions whose results differ from the evaluation function, and the automatic Crafty tuner will still tune it correctly.

Can you tune more effectively at first, so that *initially* you avoid playing games?

Oh sure, that's what I try to do as well, of course. However, things suddenly get very complex then.

What Bob's brainchild undergoes is simple yet very, very effective. It's far more effective than most here notice.

I see Crafty play daily and I can assure you that Crafty has been BETTER tuned than Rybka. Far superior. It's just a difference in CHESS KNOWLEDGE that gives Rybka its current momentum. Also, I consider Rybka's search a piece of crap. Just mainline checking and never trying to find a better move. Cowardchess would be a far better name.

It's not clear to me whether it is because of this CHESS KNOWLEDGE that Rybka scales better than Crafty on bigger hardware and at slower time controls, or whether something else is the cause, such as Bob having 8 cores while most Rybkas have 4.

That is still trying to find truths in the mud.

So for now, with respect to Crafty, I'd say: he has the hardware, so I would heavily disagree with calling it a waste of system time, especially because I know what else would run on that hardware over there. Every minute of it is well spent on Crafty.

Vincent
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Question for Bob Hyatt

Post by lkaufman »

"i consider rybka's search a piece of crap. Just mainline checking and never trying to find a better move." I'm not saying you are right or wrong, but Vas would never have used this type of search in Rybka if it did not win against a similar evaluation using a conventional search. Do you have any data to support your apparent belief that Rybka would be even stronger with a conventional search? Has anyone else tried this sort of search with an eval used in a conventional program to see if it helps or hurts?
Uri Blass
Posts: 10279
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Question for Bob Hyatt

Post by Uri Blass »

diep wrote:
I feel you've been missing what has happened to Crafty over the past dozens of months.

With just 'a few positions' you aren't going to be able to approximate the millions of 'Monte Carlo type' data points Crafty has already been tuned to by means of millions of games.

Assuming you don't fix the chess knowledge but just tune parameters, you can already estimate that in the first few months of your experiment you will most likely lose 200 Elo or so, not gain anything.

Vincent
1) If you tune the parameters of your program only for a specific small set and are stupid enough to use these values everywhere, then you may lose Elo, but I do not think you can lose 200 Elo, or even only 80 Elo, by tuning your program to play well in many N v PPP positions.

2) If you tune your program for N vs PPP positions, you do not need to use the resulting values everywhere.

You can have a preprocessor and use a different evaluation when the position on the board is of the N vs PPP type.

Uri
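
A minimal sketch of the kind of dispatch described above, assuming material is tracked as simple piece counts. The `Side` and `Material` structs, the `EvalKind` enum and `select_eval` are hypothetical names for illustration and are not taken from any engine in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical piece counts for one side (kings omitted). */
typedef struct {
    int pawns, knights, bishops, rooks, queens;
} Side;

typedef struct {
    Side white, black;
} Material;

typedef enum { EVAL_GENERAL, EVAL_N_VS_PPP } EvalKind;

/* True when `a` has exactly one more knight than `b`, `b` has exactly
 * three more pawns than `a`, and all other piece counts are equal. */
static bool knight_vs_three_pawns(const Side *a, const Side *b)
{
    return a->knights - b->knights == 1 &&
           b->pawns - a->pawns == 3 &&
           a->bishops == b->bishops &&
           a->rooks == b->rooks &&
           a->queens == b->queens;
}

/* Pick which (separately tuned) evaluation to use for this material balance. */
EvalKind select_eval(const Material *m)
{
    assert(m != NULL);
    if (knight_vs_three_pawns(&m->white, &m->black) ||
        knight_vs_three_pawns(&m->black, &m->white))
        return EVAL_N_VS_PPP;
    return EVAL_GENERAL;
}
```

The selector only decides which parameter set applies; the specially tuned values never leak into ordinary positions, which is the point of item 2.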
Albert Silver
Posts: 3019
Joined: Wed Mar 08, 2006 9:57 pm
Location: Rio de Janeiro, Brazil

Re: Question for Bob Hyatt

Post by Albert Silver »

Uri Blass wrote:
1) If you tune the parameters of your program only for a specific small set and are stupid enough to use these values everywhere, then you may lose Elo, but I do not think you can lose 200 Elo, or even only 80 Elo, by tuning your program to play well in many N v PPP positions.

2) If you tune your program for N vs PPP positions, you do not need to use the resulting values everywhere.

You can have a preprocessor and use a different evaluation when the position on the board is of the N vs PPP type.

Uri
That is what is done with top NN backgammon programs such as Snowie or GNUBG. In Snowie 4's case, no fewer than 9 different neural nets are created, tuned, and used to cover different game situations. For example, GNUBG has a race neural net which covers situations where the checkers of both players can no longer hit each other, so that it is a pure race to bring the checkers home and remove them. The race NN ensures almost perfect play with no wastage. Other NNs cover other types of situations.

Naturally, the number of nodes is completely different. By comparison, GNUBG has no fewer than 128 thousand nodes that it uses in its balancing acts. The NNs are developed and then subjected to lookahead, which is typically 2 plies. The differences between the two games are fundamental, no question about it, but the point is that the value of certain strategic advantages will vary according to the situation, and the various NNs are designed to deal with those phases accordingly. I have no idea if this could be done in chess with any reasonable success, but perhaps with very low depths (5-7 plies), achieved in milliseconds, something could be achieved.
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
Uri Blass
Posts: 10279
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Question for Bob Hyatt

Post by Uri Blass »

lkaufman wrote:"i consider rybka's search a piece of crap. Just mainline checking and never trying to find a better move." I'm not saying you are right or wrong, but Vas would never have used this type of search in Rybka if it did not win against a similar evaluation using a conventional search. Do you have any data to support your apparent belief that Rybka would be even stronger with a conventional search? Has anyone else tried this sort of search with an eval used in a conventional program to see if it helps or hurts?
I believe that the main reason for rybka's strength is the search.

I remember that I tried changing the evaluation of Strelka 2.0 (based on Rybka 1 beta) to a piece-square-table evaluation, and I was amazed at how strong it was at fast time controls in spite of the bad piece-square values (it did not have the material tables, so it clearly overvalued the pawns). The only knowledge it had was to average the opening and endgame piece-square tables based on the stage of the game.
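
The averaging between opening and endgame tables can be sketched as a linear interpolation. This is only an illustration: how Strelka measures the stage of the game is not shown in this thread, so the `phase` scale and `MAX_PHASE = 24` are assumptions, and `tapered_score` is a hypothetical name.

```c
#include <assert.h>

/* Assumed phase scale: MAX_PHASE = full opening material, 0 = bare endgame. */
enum { MAX_PHASE = 24 };

/* Blend an opening score and an endgame score according to the game phase. */
int tapered_score(int opening, int endgame, int phase)
{
    assert(phase >= 0 && phase <= MAX_PHASE);
    /* Integer division truncates toward zero; real engines may round instead. */
    return (opening * phase + endgame * (MAX_PHASE - phase)) / MAX_PHASE;
}
```

With an opening score of 100 and an endgame score of 20, a halfway phase of 12 gives 60, while phases 24 and 0 return the opening and endgame scores unchanged.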

It could beat Joker 1.14 at a 1+1 time control by 24-11 with 5 draws, and Joker has a CCRL rating of more than 2400 at 40/40. (For the test I used all the possible positions after 1 ply, that is, the position after 1.a3, the position after 1.a4, ...)

It may get worse results at long time controls, but I will not be surprised if a piece-square-table version of Rybka 3 can get a CCRL rating above 2500.

Uri
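
For scale, the match result quoted above (24 wins, 11 losses, 5 draws) can be converted into an approximate rating difference under the usual logistic Elo model. This is a back-of-the-envelope reading that assumes the 40 games are independent; it is not a figure given in the thread.

```latex
s = \frac{W + D/2}{N} = \frac{24 + 2.5}{40} = 0.6625,
\qquad
\Delta_{\mathrm{Elo}} = 400 \log_{10}\frac{s}{1 - s}
                      = 400 \log_{10}\frac{0.6625}{0.3375} \approx 117
```

With only 40 games the standard error of the score is about 0.07, so one standard deviation either way spans roughly 65 to 175 Elo.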
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Question for Bob Hyatt

Post by lkaufman »

That is quite interesting, but unless we know how a conventional top program like Stockfish would compare with only piece square tables we don't know how to interpret your result. I'm also not clear on what you mean by no material tables. Surely you don't mean that it valued pawns, pieces, and queens all about the same, that would play ridiculously badly, so I suppose the piece square tables had some reasonable values for the pieces built in? In theory I could run the same test using Doch, but unless I had the same piece square tables it wouldn't mean much.
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Question for Bob Hyatt

Post by hgm »

I guess there is a bit of a danger of confusion here, as there are three conventional ways to do evaluations: piece (base) values, piece-square tables and material tables. The first is the most basic, but if you have either of the other two, you could eliminate it by including the base values in either the piece-square table or the material table.

As it would take you an extra addition plus a lookup in a small table per piece to implement base values, it seems more efficient not to do it. But there could still be reasons why you want to do it anyway, e.g. to save space in your PST or material table. (E.g. if you want centi-Pawn resolution in the PST, the value of a Queen would be around 1000 and would no longer fit in a single byte. The extra L1 cache loading caused by a larger PST might very well outweigh the savings on the addition of the base values.)

If you implement full material tables it would matter enormously if you needed one byte or 2 bytes per entry. So when people talk about material tables, I guess they tacitly assume that you only store a small deviation from the base values there, to use as few bits per piece composition as possible. Which means the base values either have to go into the PST, or be programmed as base values.

In my engine HaQiKi D, I use partial material tables plus PST, but no base values. I can easily accommodate the base value in the PST, as I use a 32-bit PST: the 16 upper bits are the real PST values, while two groups of 8 bits keep track of the King-Safety points for white and black, so that they are updated 'for free' (except for the extra cache load) together with the differential eval.
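
A minimal sketch of such a packed entry, assuming the layout "PST value in bits 16-31, black king-safety points in bits 8-15, white king-safety points in bits 0-7". The bit order of the two counters and all function names are assumptions for illustration; this is not HaQiKi D's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a signed 16-bit positional score and two small unsigned
 * king-safety counters into one 32-bit table entry. */
static uint32_t pack_entry(int pst, unsigned ks_white, unsigned ks_black)
{
    assert(pst >= -32768 && pst <= 32767);
    assert(ks_white < 256 && ks_black < 256);
    return ((uint32_t)(uint16_t)pst << 16) | ((uint32_t)ks_black << 8) | ks_white;
}

/* Recover the signed score from the upper 16 bits of an accumulated sum. */
static int unpack_pst(uint32_t acc)
{
    int v = (int)(acc >> 16);
    return v >= 0x8000 ? v - 0x10000 : v;
}

static unsigned unpack_ks_white(uint32_t acc) { return acc & 0xFFu; }
static unsigned unpack_ks_black(uint32_t acc) { return (acc >> 8) & 0xFFu; }
```

A single unsigned add or subtract of an entry then updates all three fields at once. This only works while each king-safety total stays below 256, so that no carry or borrow crosses into the neighbouring field.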
Uri Blass
Posts: 10279
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Question for Bob Hyatt

Post by Uri Blass »

lkaufman wrote:That is quite interesting, but unless we know how a conventional top program like Stockfish would compare with only piece square tables we don't know how to interpret your result. I'm also not clear on what you mean by no material tables. Surely you don't mean that it valued pawns, pieces, and queens all about the same, that would play ridiculously badly, so I suppose the piece square tables had some reasonable values for the pieces built in? In theory I could run the same test using Doch, but unless I had the same piece square tables it wouldn't mean much.
I meant no material imbalance tables.
strelka2 is free and

The piece square table values that I used are based on the following code


const __int16 PawnFileOpening = 181;
const __int16 PawnFileEndgame = -97;
const __int16 KnightCentrOpening = 347;
const __int16 KnightCentrEndgame = 56;
const __int16 KnightRankOpening = 358;
const __int16 KnightTrapped = 3200;
const __int16 BishopCentrOpening = 147;
const __int16 BishopCentrEndgame = 49;
const __int16 BishopBackRankOpening = 251;
const __int16 BishopDiagonalOpening = 378;
const __int16 RookFileOpening = 104;
const __int16 QueenCentrOpening = 98;
const __int16 QueenCentrEndgame = 108;
const __int16 QueenBackRankOpening = 201;
const __int16 KingFileOpening = 469;
const __int16 KingCentrEndgame = 401;

const __int16 PawnFile[8] = {-3,-1,0,1,1,0,-1,-3};
const __int16 KnightLine[8] = {-4,-2,0,1,1,0,-2,-4};
const __int16 KnightRank[8] = {-2,-1,0,1,2,3,2,1};
const __int16 BishopLine[8] = {-3,-1,0,1,1,0,-1,-3};
const __int16 RookFile[8] = {-2,-1,0,1,1,0,-1,-2};
const __int16 QueenLine[8] = {-3,-1,0,1,1,0,-1,-3};
const __int16 KingLine[8] = {-3,-1,0,1,1,0,-1,-3};
const __int16 KingFile[8] = {3,4,2,0,0,2,4,3};

void init_evals()
{
  int i, j;

  // PieceSquareValue and the piece constants (WhitePawn, ...) are not defined in this excerpt.
  // Last index: [0] = opening value, [1] = endgame value. File = i & 7, rank = i >> 3.
  for (i = 0; i < 64; i++) {
    PieceSquareValue[WhitePawn-2][i][0] = PawnFile[i & 7] * PawnFileOpening;
    PieceSquareValue[WhitePawn-2][i][1] = PawnFile[i & 7] * PawnFileEndgame;
    PieceSquareValue[WhiteKnight-2][i][0] = KnightLine[i & 7]  * KnightCentrOpening +
                                            KnightLine[i >> 3] * KnightCentrOpening +
                                            KnightRank[i >> 3] * KnightRankOpening;
    PieceSquareValue[WhiteKnight-2][i][1] = KnightLine[i & 7]  * KnightCentrEndgame +
                                            KnightLine[i >> 3] * KnightCentrEndgame;
    PieceSquareValue[WhiteBishop-2][i][0] = BishopLine[i & 7]  * BishopCentrOpening +
                                            BishopLine[i >> 3] * BishopCentrOpening;
    PieceSquareValue[WhiteBishop-2][i][1] = BishopLine[i & 7]  * BishopCentrEndgame +
                                            BishopLine[i >> 3] * BishopCentrEndgame;
    PieceSquareValue[WhiteRook-2][i][0] = RookFile[i & 7] * RookFileOpening;
    PieceSquareValue[WhiteRook-2][i][1] = 0;
    PieceSquareValue[WhiteQueen-2][i][0] = QueenLine[i & 7]  * QueenCentrOpening +
                                           QueenLine[i >> 3] * QueenCentrOpening;
    PieceSquareValue[WhiteQueen-2][i][1] = QueenLine[i & 7]  * QueenCentrEndgame +
                                           QueenLine[i >> 3] * QueenCentrEndgame;
    PieceSquareValue[WhiteKing-2][i][0] = KingFile[i & 7] * KingFileOpening;
    PieceSquareValue[WhiteKing-2][i][1] = KingLine[i & 7]  * KingCentrEndgame +
                                          KingLine[i >> 3] * KingCentrEndgame;
  }
  // Opening-only adjustments for individual squares.
  PieceSquareValue[WhitePawn-2][35][0] += 74;
  PieceSquareValue[WhitePawn-2][36][0] += 74;
  PieceSquareValue[WhiteKnight-2][56][0] -= KnightTrapped;
  PieceSquareValue[WhiteKnight-2][63][0] -= KnightTrapped;
  for (i = 0; i < 8; i++) {
    // Squares 0..7 are the back rank; (i << 3) | i and its rank mirror are the two long diagonals.
    PieceSquareValue[WhiteBishop-2][i][0] -= BishopBackRankOpening;
    PieceSquareValue[WhiteBishop-2][(i << 3) | i][0] += BishopDiagonalOpening;
    PieceSquareValue[WhiteBishop-2][((i << 3) | i) ^ 070][0] += BishopDiagonalOpening;
    PieceSquareValue[WhiteQueen-2][i][0] -= QueenBackRankOpening;
  }
  // Fill each odd (black) table from the even (white) one: flip the rank with j ^ 070 and negate.
  for (i = 0; i < 12; i += 2) for (j = 0; j < 64; j++) {
    PieceSquareValue[i+1][j][0] = -PieceSquareValue[i][j^070][0];
    PieceSquareValue[i+1][j][1] = -PieceSquareValue[i][j^070][1];
  }
}
Uri
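
The `j^070` index in the last loop of the code above is what turns each white table into the matching black one. A minimal sketch of that square mirroring, using the same numbering as the listing (file = `sq & 7`, rank = `sq >> 3`); `mirror_rank` is a hypothetical helper name, not part of Strelka.

```c
#include <assert.h>

/* Flip a square 0..63 to the same file on the vertically mirrored rank.
 * Octal 070 == 56 == binary 111000: XOR inverts the three rank bits
 * and leaves the three file bits untouched. */
int mirror_rank(int sq)
{
    assert(sq >= 0 && sq < 64);
    return sq ^ 070;
}
```

For example, square 12 (file 4, rank 1) maps to square 52 (file 4, rank 6), and applying the function twice returns the original square.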
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Question for Bob Hyatt

Post by diep »

Uri Blass wrote:
I believe that the main reason for rybka's strength is the search.

I remember that I tried changing the evaluation of Strelka 2.0 (based on Rybka 1 beta) to a piece-square-table evaluation, and I was amazed at how strong it was at fast time controls in spite of the bad piece-square values (it did not have the material tables, so it clearly overvalued the pawns). The only knowledge it had was to average the opening and endgame piece-square tables based on the stage of the game.

It could beat Joker 1.14 at a 1+1 time control by 24-11 with 5 draws, and Joker has a CCRL rating of more than 2400 at 40/40. (For the test I used all the possible positions after 1 ply, that is, the position after 1.a3, the position after 1.a4, ...)

It may get worse results at long time controls, but I will not be surprised if a piece-square-table version of Rybka 3 can get a CCRL rating above 2500.

Uri
This is an uninteresting experiment: the normal rating is 3100, so you lost 600, and you also played against another engine with practically zero knowledge, which is a big flaw in the experiment. Furthermore, you use great low-level code from many, many different top programmers against an engine from a professor in Amsterdam who did not put much effort into getting speed out of Joker. At superfast time controls speed matters, and I bet you ran single core. So the experiment is totally flawed from every direction.

Equip it with Crafty's eval and play it against Crafty 23.1, I'd say at a somewhat more mature time control, as it'll be single core. But I bet you won't do that, as it is an afternoon of hacking for you to get that done; I guess you prefer an afternoon of posts to an afternoon of effort on a more serious attempt.