bob wrote: One note. I believe the inflated piece values were a direct response to programs trading a knight for 3 pawns and ending up in hopeless positions, and the like. I did the "bad trade" idea in Crafty to avoid this, since it addresses the issue directly rather than indirectly through modified piece values.
jwes wrote: I wonder to what extent the problem is that programs do not understand how to play with material differences, e.g. with 3 pawns vs. a piece, you need to use the pawns aggressively.
bob wrote: That is one thing that makes this tuning stuff so difficult. I remember many years ago that we simply could not come up with a scheme to handle some of the openings where the program would play g3/g6 and then Bg2/Bg7. The bishop is often critical, and trading it for a knight is generally not a good idea unless the knight is causing lots of problems where it stands. So we simply tuned the opening book to avoid such lines and did just fine (this was a Cray Blitz issue, by the way). Very early Crafty versions used the old CB book, but as I worked on king safety, this problem slowly went away. Yet the book still avoided the Bg2 type positions and would instead go into something that became even more problematic.
Bottom line is that as the evaluation is modified, all terms suddenly become suspect. Sort of like optimizing for speed. As one peak gets driven down by the optimizations you apply, others rise to take its place, and the process is never actually completed, just continually improved/refined...
jwes wrote: It would be an interesting (and tedious) experiment to collect a few thousand relatively even positions with unbalanced material, e.g. N v PPP, and tune a version of Crafty specifically for those positions to see how much better it would play in them than regular Crafty.
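As a rough illustration of the position-collection step in that proposal, here is a minimal sketch that picks out "N v PPP" positions from FEN strings. The function names are hypothetical and nothing here comes from Crafty. It checks only the material imbalance; whether a position is "relatively even" would still need a separate evaluation or search.

```python
from collections import Counter

def material_counts(fen):
    """Count pieces per side from the board field of a FEN string."""
    board = fen.split()[0]
    white = Counter(c for c in board if c.isalpha() and c.isupper())
    black = Counter(c.upper() for c in board if c.isalpha() and c.islower())
    return white, black

def is_knight_vs_three_pawns(fen):
    """True if one side is exactly a knight up and the other exactly
    three pawns up, with bishops, rooks and queens balanced."""
    white, black = material_counts(fen)
    diff = {p: white[p] - black[p] for p in "PNBRQ"}
    others_equal = diff["B"] == 0 and diff["R"] == 0 and diff["Q"] == 0
    return others_equal and (
        (diff["N"] == 1 and diff["P"] == -3)
        or (diff["N"] == -1 and diff["P"] == 3)
    )
```

Running a filter like this over a large game database would be one way to gather the few thousand candidate positions before any tuning starts.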
I feel you've missed what has happened to Crafty over the past few dozen months.
With just a few thousand positions you aren't going to be able to approximate the millions of 'Monte Carlo type' data points Crafty has already been tuned on through millions of games.
Assuming you don't change the chess knowledge but only tune parameters, you can expect that for the first few months of your experiment you will most likely lose 200 Elo or so rather than gain anything.
Sure, I have heard some programmers say that Bob is wasting massive amounts of system time and could use it more effectively, but let's face it: he has the Monte Carlo effect working in his favour. You won't design another tuning method 'just like that' which achieves the same thing. Setting up a method that is better is really full-time professional work.
Note that I am trying to do that as well, of course.
Please realize how effective that Monte Carlo effect is. Even when a position is totally misevaluated by both Crafty and its opponent, you still have a chance of scoring it correctly, because the game result decides. All you need is tons of similar positions whose results differ from what the evaluation function says, and the automatic Crafty tuner will still tune it correctly.
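To make that concrete, here is a toy sketch of how game results can override a wrong evaluation. This is not Crafty's actual tuner; the logistic mapping with a scale of 400, the function names, and the sample data are all assumptions chosen for illustration.

```python
import math

def expected_score(eval_cp, scale=400.0):
    """Logistic mapping from a centipawn evaluation to an expected score.
    The scale of 400 is an assumed constant, not a Crafty parameter."""
    return 1.0 / (1.0 + 10.0 ** (-eval_cp / scale))

def implied_eval(results, scale=400.0):
    """Centipawn value that would predict the observed average score."""
    mean = sum(results) / len(results)
    mean = min(max(mean, 0.01), 0.99)  # clamp to avoid log of zero
    return scale * math.log10(mean / (1.0 - mean))

# Hypothetical data: the static eval says -50 cp for this type of position,
# yet the side in question scored 60% over 100 games
# (1.0 = win, 0.5 = draw, 0.0 = loss).
static_eval = -50.0
results = [1.0] * 50 + [0.5] * 20 + [0.0] * 30

# Positive correction: the results say the eval is too pessimistic here.
correction = implied_eval(results) - static_eval
```

The point of the sketch is only that the results carry the information: with enough games from similar positions, the average score shows which way a term has to move, even if neither engine evaluated those positions correctly during play.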
Can you tune more effectively at first, so that *initially* you avoid playing games?
Oh sure, that's what I try to do as well, of course. Things suddenly get very complex then, however.
What Bob's brainchild undergoes is simple yet very, very effective. It's far more effective than most people here notice.
I watch Crafty play daily, and I can assure you that Crafty is BETTER tuned than Rybka. Far superior. It's just a difference in CHESS KNOWLEDGE that gives Rybka its current momentum. I also consider Rybka's search a piece of crap: it just checks the mainline and never tries to find a better move. Cowardchess would be a far better name.
It's not clear to me whether this CHESS KNOWLEDGE is the reason Rybka scales better than Crafty on bigger hardware and at slower time controls, or whether something else is going on, such as Bob having 8 cores while most Rybka users have 4.
That is still trying to find truths in the mud.
So for now, with respect to Crafty, I'd say: he has the hardware, and I heavily disagree with calling it a waste of system time, especially because I know what else would run on that hardware over there. Every minute of it is well spent on Crafty.
Vincent