rbarreira wrote:For testing purposes, it might be a bad idea to make games dependent by not clearing the hash (thus breaking the elo model).
Also, if the theory is right that hash hits from previous games are what helps, then this really is a form of book learning. Most testing groups do not want to include book learning in their testing, but the effect on the established rating lists may be small because they use longer time controls and limited hash.
Eelco
rbarreira wrote:For testing purposes, it might be a bad idea to make games dependent by not clearing the hash (thus breaking the elo model).
I agree. Therefore I will clear the hash between games for future testing, especially because I test with slow computers, ultra fast time control, default TT size and self play.
// Bonuses for enemy's safe checks
...
const int QueenCheckBonus = 3;
const int RookCheckBonus = 2;
const int BishopCheckBonus = 1;
const int KnightCheckBonus = 1;
I'd say KnightCheckBonus should be higher than 1: a safe check from a knight is more dangerous than a check from a bishop, because a knight check cannot be blocked, so the only way to escape it is to move the king. KnightCheckBonus = 2 works for me in ultra fast games. (~ +5 Elo)
Thanks Ralph, I will add your tweak to our test queue.
Because a safe knight check forces the king to move away, it may even be appropriate to double the bonus if it's the opponent's turn, as in the contact check cases. (not tested)
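To make the two ideas concrete, here is a minimal sketch, not Stockfish's actual evaluation code. It assumes each bonus is multiplied by the number of safe checking squares and that "opponent's turn" means the checking side is to move. The struct `SafeChecks`, the function `safeCheckUnits` and its flags are hypothetical names made up for illustration.

```cpp
#include <cassert>

// Bonuses for enemy's safe checks. Values are those quoted above, except
// KnightCheckBonus, which uses Ralph's proposed 2 instead of 1.
const int QueenCheckBonus  = 3;
const int RookCheckBonus   = 2;
const int BishopCheckBonus = 1;
const int KnightCheckBonus = 2;

// Hypothetical container: how many safe checking squares each enemy
// piece type has against our king.
struct SafeChecks {
    int queen;
    int rook;
    int bishop;
    int knight;
};

// Hypothetical helper (not a Stockfish function): total attack units
// contributed by safe checks. When doubleKnightWhenToMove is set and the
// checking side is to move, the knight bonus is doubled (the untested idea).
int safeCheckUnits(const SafeChecks& c, bool checkerToMove, bool doubleKnightWhenToMove) {
    assert(c.queen >= 0 && c.rook >= 0 && c.bishop >= 0 && c.knight >= 0);
    const int knightFactor = (doubleKnightWhenToMove && checkerToMove) ? 2 : 1;
    return QueenCheckBonus  * c.queen
         + RookCheckBonus   * c.rook
         + BishopCheckBonus * c.bishop
         + KnightCheckBonus * c.knight * knightFactor;
}
```

Keeping the doubling behind a flag makes it easy to test the plain KnightCheckBonus = 2 tweak and the doubled variant as separate patches.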
// Bonuses for enemy's safe checks
...
const int QueenCheckBonus = 3;
const int RookCheckBonus = 2;
const int BishopCheckBonus = 1;
const int KnightCheckBonus = 1;
I'd say KnightCheckBonus should be higher than 1: a safe check from a knight is more dangerous than a check from a bishop, because a knight check cannot be blocked, so the only way to escape it is to move the king. KnightCheckBonus = 2 works for me in ultra fast games. (~ +5 Elo)
After 6226 games:
Mod vs Orig 968 - 1007 - 4251 ELO -2 (+- 3.5)
FWIW, we know that king safety code is TC dependent, so you may want to test that part of the evaluation with a longer TC.
Anyhow thanks for the idea and for your testing effort.
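For anyone who wants to check the Elo figure, here is a small sketch of the usual conversion from a win/loss/draw result to an Elo difference under the standard logistic model. It assumes the three numbers in the result line are wins, losses and draws from Mod's point of view; the function name `eloDiff` is made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Elo difference implied by a match result under the standard logistic
// Elo model: expected score p = 1 / (1 + 10^(-diff / 400)).
// Draws count as half a point. Requires a score strictly between 0 and 1.
double eloDiff(int wins, int losses, int draws) {
    const int games = wins + losses + draws;
    assert(games > 0);
    const double p = (wins + 0.5 * draws) / games;
    assert(p > 0.0 && p < 1.0);
    return -400.0 * std::log10(1.0 / p - 1.0);
}
```

With the result above, eloDiff(968, 1007, 4251) comes out at roughly -2.2, in line with the reported -2. The sketch does not try to reproduce the +- 3.5 margin, since the post does not say how it was computed.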
Ralph Stoesser wrote:Is there any special reason for not clearing the TT after a new game has started?
Short answer: Because it is stronger in long matches.
Long answer: Last year we started a long match against Rybka 3 and noticed that SF started well, but then, after several hundred games, its score decreased. We observed this behaviour many times, so we stopped clearing the hash at each new game, and suddenly the score didn't decrease anymore.
That's almost unbelievable! You are saying that positions remained in the hash table from several of the previously played games! I don't doubt what you say but I'm surprised!
So lessons learned were:
1) Not clearing the TT at each new game gives an edge in long matches.
2) Very possibly Rybka does not clear the TT either.
BTW, as you know, SF supports changing parameter values on the fly during a game. This makes your assert fail anyway, because the user can always tweak the weights during a match so that the same position gives a different evaluation score.
Don wrote:That's almost unbelievable! You are saying that positions remained in the hash table from several of the previously played games! I don't doubt what you say but I'm surprised!
I didn't verify it directly, only indirectly by playing games.
Perhaps it would be interesting to set up an experiment: store in the TT, together with the usual entry, a sequence number incremented at each new game, then sample the hash table after, say, 10 matches and plot a histogram of the TT entries' ages.
Don wrote:That's almost unbelievable! You are saying that positions remained in the hash table from several of the previously played games! I don't doubt what you say but I'm surprised!
I didn't verify it directly, only indirectly by playing games.
Perhaps it would be interesting to set up an experiment: store in the TT, together with the usual entry, a sequence number incremented at each new game, then sample the hash table after, say, 10 matches and plot a histogram of the TT entries' ages.
My tester always restarts programs but I could do a test with xboard to check this out.
I suspect that if your hash table is way over-provisioned it could be a factor.
Another experiment one could do is to increase the size of the "age" field in the hash table records. I don't know what you call this in SF but it's the thing I increment after each search so that I know a score is coming from a different search. Increase it enough to span a few games. Then you could LOG any hit that has a very old age and get a rough sense of how often that is happening. Of course refreshes might defeat that.
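A toy sketch of both experiments (the per-game stamp with a histogram, and the age reported on a hit) might look like the following. It is not Stockfish's transposition table: `ToyEntry`, `ToyTT`, the always-replace policy and the use of key 0 as "empty" are all assumptions made to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

struct ToyEntry {
    std::uint64_t key  = 0;  // 0 marks an empty slot in this sketch
    std::uint32_t game = 0;  // game counter when the entry was last written
};

class ToyTT {
public:
    explicit ToyTT(std::size_t slots) : table(slots) { assert(slots > 0); }

    // Called at the start of each new game instead of clearing the table.
    void newGame() { ++currentGame; }

    // Always-replace store; rewriting an entry refreshes its game stamp.
    void store(std::uint64_t key) {
        assert(key != 0);
        ToyEntry& e = table[key % table.size()];
        e.key  = key;
        e.game = currentGame;
    }

    // Returns the age in games of the matching entry, or -1 on a miss.
    int probe(std::uint64_t key) const {
        const ToyEntry& e = table[key % table.size()];
        if (key == 0 || e.key != key) return -1;
        return static_cast<int>(currentGame - e.game);
    }

    // Maps age in games -> number of occupied slots with that age.
    std::map<std::uint32_t, std::size_t> ageHistogram() const {
        std::map<std::uint32_t, std::size_t> hist;
        for (const ToyEntry& e : table)
            if (e.key != 0)
                ++hist[currentGame - e.game];
        return hist;
    }

private:
    std::vector<ToyEntry> table;
    std::uint32_t currentGame = 0;
};
```

Logging every probe() result of 1 or more would give the rough count of hits coming from earlier games. As noted, refreshes blur the picture: in this sketch a store always resets the stamp, so an entry that keeps being rewritten looks young even if it first appeared games ago.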
If it is true that hash entries from previous games transpose to positions from future games often enough, then maybe it would be beneficial to load some sort of large generic hash table when first loading the engine, containing entries for common openings.
Instinctively, I do feel that something is not correct about this theory, as there are so many positions, making chess too complex for it to be true.
I suppose somebody could clear this up with a test:
Figure out how to play a long match (varied opening book) between 2 identical versions of Stockfish, with one version clearing hash after each game, the other version not clearing hash. The result will answer the question.
mhalstern wrote:If it is true that hash entries from previous games transpose to positions from future games often enough, then maybe it would be beneficial to load some sort of large generic hash table when first loading the engine, containing entries for common openings.
Instinctively, I do feel that something is not correct about this theory, as there are so many positions, making chess too complex for it to be true.
I suppose somebody could clear this up with a test:
Figure out how to play a long match (varied opening book) between 2 identical versions of Stockfish, with one version clearing hash after each game, the other version not clearing hash. The result will answer the question.
Actually with my tester I could easily do this - assuming Stockfish honors the clear_hash command or whatever it's called. And I think someone said it does.