fast vs slow games in testing

Discussion of chess software programming and technical issues.

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: fast vs slow games in testing

Post by bob »

rjgibert wrote:
bob wrote:This has been discussed a good bit, but I just ran into a case that I thought might be interesting. I slightly modified the default piece values. The original values were P=1.0, N/B=3.25, R=5.0 and Q=9.7; I changed the R/Q values to 5.5 and 10.7. I then ran a couple of fast-game tests (32,000 games per run) at a 10s+0.1s time control and found that this was worth about +10 Elo. I then re-ran the same test, but changed the time control to 5m+5s (much slower). It took a lot longer, but the interesting thing was that this was now a -20 Elo change. Nothing else changed between the two versions; both runs were 32,000 games against the usual opponents and positions.

We've had this discussion in the past, where some claim that they've never seen a case where a program was better at fast games than at slow games or vice versa. Here is a simple change that produces exactly that. Version A (original) is 10 Elo weaker than version B (new material scores) at very fast games, but version A is 20 Elo stronger at longer games. A 30 Elo swing.

Goes to show that fast games alone are not enough.
I think the issue isn't just whether or not this can happen, but also, if it does, why? I suspect at least two things may be going on here.

One, a deeper search can offset small errors by a program. Two, some eval changes can speed up or slow down the search.

Perhaps search depth matters relatively more at faster time controls, while a better eval matters relatively more at slower time controls.

Is there a significant difference in time-to-depth between the two versions? If not, something else, maybe more subtle, is going on.
The changes I tested make no difference in execution speed. In particular, just changing the material values makes no difference whatsoever in raw speed. Some test positions will go a bit faster and some a bit slower, though, because changing the material values changes the shape of the tree itself...

I have always known humans who exhibited this same behaviour. Somehow their snap judgement is better than another player's, while in longer games the two are pretty equal. I used to watch IM Mike Valvo play Danny Kopec time-odds matches and win most of them.

In fast games, it might well be that making the queen more valuable tends to keep it on the board longer, which lets the faster search find better ways to infiltrate, whereas in longer games this tactical advantage is offset by other things the opponent does.

I just re-ran the test, with the same results: +10 Elo at 10s+0.1s, -20 Elo at 5m+5s.
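
For scale: with 32,000 games per run, the 95% error bar on a measured rating is only about ±3 Elo, so differences of +10 and -20 are well outside noise. A minimal Python sketch of that arithmetic, using the usual logistic Elo conversion (the W/D/L numbers in the example are assumed for illustration, not bob's actual results):

    import math

    def elo_with_error(wins, draws, losses, z=1.96):
        """Elo estimate and ~95% error bar (in Elo) from a W/D/L record."""
        n = wins + draws + losses
        s = (wins + 0.5 * draws) / n                # mean score per game
        elo = -400.0 * math.log10(1.0 / s - 1.0)    # logistic Elo of the score
        # per-game variance of the score, from the W/D/L proportions
        var = (wins * (1.0 - s) ** 2 + draws * (0.5 - s) ** 2
               + losses * s ** 2) / n
        se = math.sqrt(var / n)                     # standard error of s
        slope = 400.0 / (math.log(10.0) * s * (1.0 - s))  # dElo/ds at s
        return elo, z * se * slope

    # e.g. a 32,000-game run at an assumed ~30% draw rate:
    print(elo_with_error(11660, 9600, 10740))       # about (+10.0, +/- 3.2)

So the +10/-20 split is a real fast/slow effect, not sampling noise.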
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: fast vs slow games in testing

Post by bob »

MattieShoes wrote:With more datapoints, I suppose it'd be possible to construct a regression of optimal piece values based on TC*speed. If there's a 30 Elo difference with just those two settings, having piece values tuned to TC*speed could show a fairly significant strength jump, yes?

On a related note, have you experimented with changing piece values as the game progresses? Just from watching games, it seems to become much more important to have at least the same number of pieces since you eventually run out of pieces to protect your pawns...
Yes, but probably not in the way you are thinking about it. In Crafty, a piece's value is

val = material_value + positional_score

where material_value is static but positional_score is not. So the actual value of the piece varies all over the place depending on the position, the remaining pieces, king safety, pawn structure, etc.
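
A minimal sketch of that split (illustrative Python with made-up names and numbers, not Crafty's actual code):

    # static material values in centipawns
    MATERIAL = {'P': 100, 'N': 325, 'B': 325, 'R': 500, 'Q': 970}

    # toy positional term: centralization for a knight, advancement for a pawn
    def positional_score(piece, file, rank):
        center_dist = max(abs(file - 3.5), abs(rank - 3.5))
        if piece == 'N':
            return int(20 - 10 * center_dist)   # knights like the center
        if piece == 'P':
            return 4 * rank                     # pawns gain as they advance
        return 0

    def piece_value(piece, file, rank):
        """material_value + positional_score, as described above."""
        return MATERIAL[piece] + positional_score(piece, file, rank)

    print(piece_value('N', 3, 3))   # centralized knight: 325 + 15 = 340
    print(piece_value('N', 0, 0))   # cornered knight:    325 - 15 = 310

In a real engine the positional term would also fold in mobility, king safety, pawn structure and so on, which is why the effective value "varies all over the place".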



MattieShoes wrote:Hmm, that'd be another interesting thing, adjusting piece values based on the number of pawns you have left... The most obvious case would be something like KNP vs KB, where the pawn is worth far more than the knight. Search would find this, but correcting the default values based on the situation could perhaps help in less obvious cases.
That's the Kaufman idea. We've tried this but had no success at all.
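
The idea bob is referring to, from Larry Kaufman's material-imbalance work, is to shift minor and major piece values with the pawn count: knights gain value in pawn-heavy, closed positions, rooks gain value as pawns come off and files open. A sketch with illustrative slopes (Kaufman suggested roughly 1/16 of a pawn per pawn away from five for the knight, with an opposite adjustment for the rook; the exact numbers vary by formulation):

    KNIGHT, ROOK = 325, 500     # base values in centipawns

    def pawn_count_adjusted(own_pawns):
        """Shift N/R values around a 5-pawn baseline (illustrative)."""
        delta = own_pawns - 5           # pawns above/below the baseline
        knight = KNIGHT + 6 * delta     # knights like closed, pawn-heavy play
        rook   = ROOK   - 6 * delta     # rooks like open files, fewer pawns
        return knight, rook

    for pawns in (8, 5, 2):
        print(pawns, pawn_count_adjusted(pawns))
    # 8 (343, 482)   5 (325, 500)   2 (307, 518)

Given bob's result above, treat the slopes as a hypothesis to test, not a known gain.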
MattieShoes
Posts: 718
Joined: Fri Mar 20, 2009 8:59 pm

Re: fast vs slow games in testing

Post by MattieShoes »

Heh, nothing new under the sun indeed :-)

I imagine all engines include some positional score -- pcsq tables, perhaps mobility, perhaps weighted mobility... I was considering ways to change the positional score based on the phase of the game, like phasing from middlegame to endgame pcsq tables, and so forth. It just occurred to me that one could tackle the problem from the other side and adjust the base value directly, rather than the positional score, based on game state. For instance, if there are passers on both flanks, bishops probably increase in value relative to knights. If your opponent has a passer, your bishop that controls their promotion square would be slightly more valuable than otherwise. I'm sure this has all been hashed out ad nauseam, but I'm having fun rediscovering the wheel :-)
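
The middlegame-to-endgame phasing described here is the standard "tapered eval" interpolation. A minimal Python sketch (the phase weights are a common convention; the scores are made up):

    PHASE_WEIGHTS = {'N': 1, 'B': 1, 'R': 2, 'Q': 4}   # pawns/kings excluded
    MAX_PHASE = 24          # 4 minors + 4 rooks + 2 queens all on the board

    def game_phase(piece_counts):
        """0 = bare endgame, MAX_PHASE = full middlegame material."""
        phase = sum(PHASE_WEIGHTS[p] * n for p, n in piece_counts.items()
                    if p in PHASE_WEIGHTS)
        return min(phase, MAX_PHASE)

    def tapered(mg_score, eg_score, phase):
        """Blend middlegame and endgame scores linearly by phase."""
        return (mg_score * phase + eg_score * (MAX_PHASE - phase)) // MAX_PHASE

    # a pcsq entry worth 20 in the middlegame but only 8 in the endgame:
    print(tapered(20, 8, 24))   # 20 with all the pieces on
    print(tapered(20, 8, 12))   # 14 halfway to the endgame
    print(tapered(20, 8, 0))    # 8 in a pawn ending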

The thing that's annoying me is, let's say I want to increase the value of a bishop that controls the promotion square of an enemy passer: how much do I increase it? I don't have the facilities to simply run tens of thousands of games with different numbers to actually get the answer, so I've been trying to figure out other ways, and nothing seems much better than the old "hmm, this feels about right..." method.

I played with the adjusteval program that came with gradualtest (it will send commands to iterate through possible values, then test them against an EPD suite and, in theory, just maybe give you a reasonable approximate value), but I can't seem to get it to work at all. It comes up with the same results every time, and as far as I can tell you can't produce debug output. Maybe I'll have to revisit writing my own eval tuner.
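
The sweep MattieShoes describes is easy to sketch in outline (hypothetical Python, not adjusteval's actual internals; run_suite and my_epd_runner stand in for "restart the engine with this value and score it against the EPD suite"):

    def tune_parameter(candidates, run_suite):
        """One-dimensional sweep: try each value, keep the best score."""
        results = {}
        for value in candidates:
            results[value] = run_suite(value)   # e.g. EPD positions solved
            print(f"value={value}: score {results[value]}")
        best = max(results, key=results.get)
        return best, results[best]

    # usage sketch: sweep a bishop bonus from 0 to 50 centipawns by tens
    # best, score = tune_parameter(range(0, 51, 10), run_suite=my_epd_runner)

As bob notes below, though, a value that looks best on a test suite still has to be confirmed in games at the time control you care about.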
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: fast vs slow games in testing

Post by bob »

MattieShoes wrote:Heh, nothing new under the sun indeed :-)

I imagine all engines include some positional score -- pcsq tables, perhaps mobility, perhaps weighted mobility... I was considering ways to change the positional score based on the phase of the game, like phasing from middlegame to endgame pcsq tables, and so forth. It just occurred to me that one could tackle the problem from the other side and adjust the base value directly, rather than the positional score, based on game state. For instance, if there are passers on both flanks, bishops probably increase in value relative to knights. If your opponent has a passer, your bishop that controls their promotion square would be slightly more valuable than otherwise. I'm sure this has all been hashed out ad nauseam, but I'm having fun rediscovering the wheel :-)

The thing that's annoying me is, let's say I want to increase the value of a bishop that controls the promotion square of an enemy passer: how much do I increase it? I don't have the facilities to simply run tens of thousands of games with different numbers to actually get the answer, so I've been trying to figure out other ways, and nothing seems much better than the old "hmm, this feels about right..." method.

I played with the adjusteval program that came with gradualtest (it will send commands to iterate through possible values, then test them against an EPD suite and, in theory, just maybe give you a reasonable approximate value), but I can't seem to get it to work at all. It comes up with the same results every time, and as far as I can tell you can't produce debug output. Maybe I'll have to revisit writing my own eval tuner.
Tuning is a project, to say the least. For many years, I simply analyzed long games and then did manual tuning to eliminate a positional mistake. For example, blockading a passed pawn: the quicker you blockade, the better, in that you have more moves before it potentially promotes if you later want to move the blockader for some attacking opportunity. I always went for the smallest bonus that would stop the problematic move from being played, in the hope that the change had the least effect overall on other positions that differ in some way. It is not easy. In fact, it is not easy even with a cluster, as different time controls change the effect of eval or search changes as well, which is yet another pain.

The best testing is to test at the exact time control you are going to play at normally, knowing that you are (possibly) going to hurt performance somewhat at other time controls.

And none of that is easy, as long time controls make for unreasonable amounts of time to complete the testing.
krazyken

Re: fast vs slow games in testing

Post by krazyken »

I would expect that longer time controls reduce the variability you would get in results (especially if you are avoiding randomization provided by the opening book). Thus I'd suspect the actual number of games needed to show a difference would be smaller with longer time controls.
Gian-Carlo Pascutto
Posts: 1260
Joined: Sat Dec 13, 2008 7:00 pm

Re: fast vs slow games in testing

Post by Gian-Carlo Pascutto »

krazyken wrote:I would expect that longer time controls reduce the variability you would get in results (especially if you are avoiding randomization provided by the opening book).
Why?
krazyken

Re: fast vs slow games in testing

Post by krazyken »

Gian-Carlo Pascutto wrote:
krazyken wrote:I would expect that longer time controls reduce the variability you would get in results (especially if you are avoiding randomization provided by the opening book).
Why?
How many sources of randomness do you have in your chess program? Isn't the main reason you switch moves that a deeper search finds something better? As your search goes deeper, you approach a point of diminishing returns and become less likely to change moves, I would think.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: fast vs slow games in testing

Post by bob »

krazyken wrote:I would expect that longer time controls reduce the variability you would get in results (especially if you are avoiding randomization provided by the opening book). Thus I'd suspect the actual number of games needed to show a difference would be smaller with longer time controls.
Unfortunately you would be wrong. :) From a ton of prior testing...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: fast vs slow games in testing

Post by bob »

krazyken wrote:
Gian-Carlo Pascutto wrote:
krazyken wrote:I would expect that longer time controls reduce the variability you would get in results (especially if you are avoiding randomization provided by the opening book).
Why?
How many sources of randomness do you have in your chess program? Isn't the main reason you switch moves that a deeper search finds something better? As your search goes deeper, you approach a point of diminishing returns and become less likely to change moves, I would think.
Every move you make is the result of a tree that can vary somewhat in size due to timing variables. I've tried playing the same position 100 times at both fast and slow time controls, and both exhibit extreme variance, not the same result over and over...
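
The mechanism is easy to demonstrate: any search that stops on wall-clock time visits a slightly different number of nodes on every run, because the deadline check lands at slightly different points under OS scheduling jitter. A toy Python illustration (a busy loop standing in for a real search):

    import time

    def timed_search(budget_s=0.05):
        """Count 'nodes' until a wall-clock deadline passes."""
        deadline = time.perf_counter() + budget_s
        nodes = 0
        while time.perf_counter() < deadline:
            nodes += 1              # stand-in for visiting one node
        return nodes

    print([timed_search() for _ in range(5)])
    # five different counts on every run; in a real engine a different
    # node count can mean a different best move, hence a different game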
krazyken

Re: fast vs slow games in testing

Post by krazyken »

bob wrote:
krazyken wrote:
Gian-Carlo Pascutto wrote:
krazyken wrote:I would expect that longer time controls reduce the variability you would get in results (especially if you are avoiding randomization provided by the opening book).
Why?
How many sources of randomness do you have in your chess program? Isn't the main reason you switch moves that a deeper search finds something better? As your search goes deeper, you approach a point of diminishing returns and become less likely to change moves, I would think.
Every move you make is the result of a tree that can vary somewhat in size due to timing variables. I've tried playing the same position 100 times at both fast and slow time controls, and both exhibit extreme variance, not the same result over and over...
I guess I'm wrong then, although I'm not sure of your definition of fast and slow. I'm going from my experience of analyzing games with engine assistance: usually the engine will lock onto a move and stick with it after a certain amount of time, somewhere around a minute a move on fast hardware, I'd say.

Edit: I'd love to see the results if you have time to run the test again. Something like: pick a set of quiet positions, all with several possible good moves; run each position for 1 second, 1000 times, recording the moves picked and the number of times each was picked; then repeat for 5, 10, 30, 60, and 120 seconds to see if there is convergence.
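
For anyone wanting to try it, a sketch of that harness using the python-chess library to drive any UCI engine (the engine path and FEN are placeholders, and at the longer budgets this is a cluster-sized job):

    from collections import Counter
    import chess
    import chess.engine

    FENS = [
        # quiet positions with several playable moves go here
        "r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 2 3",
    ]
    BUDGETS = [1, 5, 10, 30, 60, 120]    # seconds per move, as proposed
    RUNS = 1000                          # repetitions per position and budget

    with chess.engine.SimpleEngine.popen_uci("/path/to/engine") as engine:
        for fen in FENS:
            for budget in BUDGETS:
                tally = Counter()
                for _ in range(RUNS):
                    board = chess.Board(fen)
                    # a fresh game marker so the engine clears state
                    # (hash tables) between repetitions
                    result = engine.play(board,
                                         chess.engine.Limit(time=budget),
                                         game=object())
                    tally[result.move.uci()] += 1
                # if longer searches converge, one move should come to
                # dominate the tally as the budget grows
                print(fen, budget, tally.most_common(3))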