matthewlai wrote:
> Laskos wrote:
> > It shows an English symmetrical four knights with 4.e4 and a slightly negative eval (I suppose this means less than 50% performance). My database shows slightly below 50% performance on this too.
>
> Yeah, +100.00 is the max positive non-mate score, and -100.00 is the max negative non-mate score.
>
> I find that as training progresses, it usually starts by preferring 1. d4, then eventually switches to 1. e4, then 1. c4 (not counting the very beginning, where it just plays randomly).
>
> With the a7 pawn missing it's about +7.00.
> b7 => +12.00
> c7 => +13.00
> d7 => +13.00
> e7 => +13.00
> f7 => +20.00
> g7 => +18.00
> h7 => +11.50
>
> Not sure why it thinks the f7 and g7 pawns are so important. Probably because of easier king-side attacks.

The more I look at Giraffe, the more amazing it seems to me. The pawn handicaps you show here are in line with what the best engines, like Komodo and Stockfish, show after several minutes of search. We ran some play-out tests determining the ELO value of handicaps (at reasonably long time controls), and they showed that the a2+c2 handicap is actually larger ELO-wise than the f7 handicap: 560 ELO points versus 510. Most strong engines, Fruit, Komodo, and Stockfish included, exhibit the opposite evaluation, with f7 as the larger handicap.
Here is the link to our recent discussion with Larry:
http://www.talkchess.com/forum/viewtopi ... 5&start=51
Larry Kaufman wrote:
> Based on a one minute search on 4 cores of the handicap starting positions, f7 actually shows as more of a handicap than even a2 + c2, 1.32 vs 1.14. Probably this means something isn't quite right about the material vs. positional scoring in Komodo, although it is well-tuned. Something we need to investigate. f7 is quite a large handicap because Black's development must be very modest due to tactical problems, but still a2 + c2 seems larger (and scores as higher in your tests). I think that Komodo's eval tricked me into underestimating the two pawn handicaps.

Giraffe shows the correct eval:
f7: +20.00
a2+c2: -22.00
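Putting the quoted Giraffe numbers side by side makes the comparison explicit (a quick sketch; the values are just the ones reported in this thread, and the sign convention, evals from White's point of view, is my assumption):

```python
# Giraffe's evals with one black pawn removed from the starting position,
# as quoted above (from White's point of view, in pawns).
black_pawn_missing = {
    "a7": 7.00, "b7": 12.00, "c7": 13.00, "d7": 13.00,
    "e7": 13.00, "f7": 20.00, "g7": 18.00, "h7": 11.50,
}
# Eval with White's a2 and c2 pawns removed (negative: Black is better).
white_a2_c2_missing = -22.00

# Comparing magnitudes, Giraffe ranks a2+c2 as the bigger handicap,
# matching the play-out ELO ordering (560 vs 510), unlike most engines.
print(abs(white_a2_c2_missing) > black_pawn_missing["f7"])
print(sorted(black_pawn_missing, key=black_pawn_missing.get, reverse=True))
```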
Then I tested several opening positions against large databases of human games, and Giraffe's evals seem more faithful to the database outcome statistics than those of top engines like Komodo or Stockfish.
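One rough way to make such a comparison concrete (my own sketch, not anything Giraffe does internally) is to map an eval to an expected score with a logistic curve and set it against the database win rate; the scale constant here is an illustrative guess:

```python
import math

def expected_score(eval_pawns, scale=4.0):
    """Map an engine evaluation (in pawns) to an expected score in [0, 1]
    via a logistic curve. The scale constant is an arbitrary choice for
    illustration, not a parameter of any actual engine."""
    return 1.0 / (1.0 + 10.0 ** (-eval_pawns / scale))

# A slightly negative eval maps to a bit under 50%, which is the kind of
# agreement with database statistics described above.
print(round(expected_score(-0.15), 3))
```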
I am also curious about its scaling (ELO strength as a function of time). To a first approximation, conventional engines exhibit logarithmic scaling: the ELO increase is proportional to the number of plies searched, which is basically log(nodes). Taking "diminishing returns" into account, it is actually very close to
ELO ~ log[log(nodes)]
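To see what the two scaling laws imply in practice, here is a small sketch; the proportionality constants are arbitrary, chosen only for illustration:

```python
import math

def elo_log(nodes, k=100.0):
    # Naive model: strength grows with search depth, i.e. with log(nodes).
    return k * math.log2(nodes)

def elo_loglog(nodes, k=1000.0):
    # "Diminishing returns" model: ELO ~ log[log(nodes)].
    return k * math.log(math.log(nodes))

# ELO gained by doubling the node budget, at a small and a large budget:
for model in (elo_log, elo_loglog):
    gain_small = model(2**20) - model(2**19)
    gain_large = model(2**40) - model(2**39)
    print(model.__name__, round(gain_small, 1), round(gain_large, 1))
# Under log(nodes) the gain per doubling is constant; under
# log[log(nodes)] it shrinks as the budget grows.
```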
I understand that Giraffe, when learning, first evaluates a position for some time and then plays against itself for several more moves (incomplete play-outs). It then adjusts the model so that the eval of the original position gets closer to the evaluations of the leaf positions. It also uses iterative deepening. If so, I suspect its ELO scaling should be fairly similar to that of other engines: log(nodes) at first glance, and maybe the same log[log(nodes)] once "diminishing returns" are taken into account. I am not sure about the second, nor about the proportionality factors; at first sight Giraffe seems to get comparatively stronger at longer time controls.
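The adjustment scheme described above is in the spirit of TD-style bootstrapping. A toy sketch of the idea (not Giraffe's actual code: the linear model and the noisy play-out stand-in are my own simplifications):

```python
import random

random.seed(42)

# "True" position values come from a hidden linear function of the
# features; a play-out gives a noisy but informative estimate of it.
hidden = [1.0, -2.0, 0.5]
weights = [0.0, 0.0, 0.0]   # the model being trained

def evaluate(w, feats):
    return sum(wi * f for wi, f in zip(w, feats))

def playout_score(feats, noise=0.1):
    # Stand-in for "search a bit and play several more moves": a noisy
    # estimate of the true value of the position.
    return evaluate(hidden, feats) + random.gauss(0.0, noise)

def train(steps=5000, lr=0.05):
    for _ in range(steps):
        feats = [random.uniform(-1, 1) for _ in hidden]
        target = playout_score(feats)             # leaf evaluation
        error = target - evaluate(weights, feats)
        for i, f in enumerate(feats):             # pull root eval toward leaf eval
            weights[i] += lr * error * f

train()
print([round(w, 1) for w in weights])  # approaches the hidden values
```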
I hope you will continue to work on this amazing engine. Even if it is not the strongest engine overall, it will be very useful in areas like building opening books and other positional tasks where top conventional engines struggle.