does low time control testing work?


DrRibosome
Posts: 19
Joined: Tue Mar 12, 2013 5:31 pm


Post by DrRibosome »

Are short time control test games indicative of long time control game results?

I have an engine at ~2100 Elo by CCRL standards, but its evaluation is largely untuned, and I am wondering what a good testing time control would be. Ideally I would like to play lots of games against itself (and maybe against other engines), which seems to require time controls under a second per move. However, I worry that I might end up with an engine that is tuned to play well only at those time controls. Has anyone studied the correlation between results at short time controls and results at more realistic ones?
PK
Posts: 893
Joined: Mon Jan 15, 2007 11:23 am
Location: Warsza

Re: does low time control testing work?

Post by PK »

This question has been asked several times, and there are some systematic tests by Prof. Hyatt that back up the answer, if I'm not mistaken.

Most of the time fast tests are OK, but it depends on what you are testing and how stable your engine is. For example, things like material values, mobility and passed pawn values are OK to test at high speed (depth 5 is enough to get a meaningful result). King safety probably needs more depth to kick in (you need to give a program a couple of plies to see whether it can prepare an attack).

There are also some search changes that require greater depth. For example, if your null move reduction changes from 2 plies to 3 plies at depth 6, you need to reach that depth consistently in the test games. If it changes from 3 to 4 at depth 10, bad luck: you need to reach that depth in order to test that change.
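To make the depth argument concrete, here is a minimal sketch, assuming a null-move reduction R that is chosen purely from the remaining depth. The functions `old_r`, `new_r` and `depths_where_change_is_visible` and their thresholds are hypothetical, picked to mirror the example above; a real engine also has extensions and other reductions that blur the picture. If test games never search past depth 8, the two schemes never disagree, so a match at that speed cannot measure the change.

```python
def null_move_reduction(depth: int, threshold: int, low_r: int, high_r: int) -> int:
    """Reduction R for the null-move search at a given remaining depth."""
    return high_r if depth >= threshold else low_r


def old_r(depth: int) -> int:
    # Hypothetical current scheme: R=2 below depth 6, R=3 from depth 6 on.
    return null_move_reduction(depth, threshold=6, low_r=2, high_r=3)


def new_r(depth: int) -> int:
    # Hypothetical candidate: identical below depth 10, R=4 from depth 10 on.
    if depth >= 10:
        return 4
    return old_r(depth)


def depths_where_change_is_visible(max_depth: int) -> list[int]:
    """Remaining depths up to max_depth at which old and new schemes differ."""
    return [d for d in range(1, max_depth + 1) if old_r(d) != new_r(d)]


if __name__ == "__main__":
    # Fast games that only reach about depth 8 never exercise the change.
    print(depths_where_change_is_visible(8))
    # Slower games that reach depth 12 do.
    print(depths_where_change_is_visible(12))
```

Comparing the two reduction functions directly, rather than playing games, is just a cheap way to see which search depths a test must reach before a change can matter at all.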

Generally speaking, the more "magic" is involved in the feature you test, the more you need to test at several time controls. For really complex stuff (like late move reductions) I sometimes run a series of matches (5 s per game + 0.05 s per move, then 10 s + 0.1 s, then something like 20 s + 0.2 s). This helps to see whether a change is still beneficial at longer time controls.
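When reading a series of matches like this, it helps to put an error bar on each result before concluding that a change helps or hurts at longer time controls. The sketch below is an illustration, not PK's actual tooling: it turns a W/D/L result into a logistic Elo difference with an approximate 95% interval, assuming independent games and a normal approximation. The W/D/L counts in the usage part are hypothetical placeholders, not real results.

```python
import math


def _clamp(p: float, eps: float = 1e-9) -> float:
    """Keep a score strictly inside (0, 1) so the logarithm stays finite."""
    return min(max(p, eps), 1.0 - eps)


def elo_from_score(score: float) -> float:
    """Logistic Elo difference corresponding to an expected score."""
    s = _clamp(score)
    return -400.0 * math.log10(1.0 / s - 1.0)


def elo_with_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (estimate, lower, upper) in Elo for a W/D/L match result."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score: each game scores 1, 0.5 or 0.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(var / n)
    return (elo_from_score(score),
            elo_from_score(score - z * stderr),
            elo_from_score(score + z * stderr))


if __name__ == "__main__":
    # Hypothetical placeholder results (W, D, L) for a 5+0.05 / 10+0.1 / 20+0.2
    # series; replace them with the real counts from each match.
    results = {
        (5, 0.05): (540, 1000, 460),
        (10, 0.1): (530, 1020, 450),
        (20, 0.2): (515, 1040, 445),
    }
    for (base, inc), (w, d, l) in results.items():
        est, lo, hi = elo_with_interval(w, d, l)
        print(f"{base:g}s+{inc:g}s: {est:+.1f} Elo (95% CI {lo:+.1f} .. {hi:+.1f})")
```

If the intervals at the longer time controls still exclude zero on the same side, the change probably holds up; if they straddle zero, more games are needed before saying anything about scaling.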
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: does low time control testing work?

Post by Don »

PK wrote: This question has been asked several times and there are some systematic tests by prof. Hyatt to back the answer if I'm not mistaken. [...]
That is a good answer.

Fast testing is not good for measuring the difference between programs, but it is generally fine for self-testing program improvements. If it's too fast, as you already mentioned, you may not be exercising all the search features.

We have also seen evidence that not all evaluation weights and features scale the same way, but that is difficult to measure.
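One reason it is hard to measure is sample size. Here is a rough back-of-envelope sketch, assuming near-even matches, independent games and a normal approximation: each time control's Elo gain has its own error bar, and the difference between two such gains carries √2 times the error of either one. The function names are hypothetical, and the draw ratio of 0.4 and the 5 Elo difference in the usage part are arbitrary example inputs, not measurements.

```python
import math

# Slope of the logistic Elo curve at a 50% score: 400 / (ln 10 * 0.5 * 0.5).
ELO_PER_SCORE_AT_EVEN = 1600.0 / math.log(10)  # about 695 Elo per unit of score


def elo_stderr(games: int, draw_ratio: float) -> float:
    """Approximate 1-sigma Elo error of a near-even match of `games` games."""
    # Near a 0.5 score, decisive games deviate by 0.5 and draws by 0,
    # so the per-game score variance is (1 - draw_ratio) / 4.
    score_stderr = math.sqrt((1.0 - draw_ratio) / (4.0 * games))
    return ELO_PER_SCORE_AT_EVEN * score_stderr


def games_to_resolve_scaling(delta_elo: float, draw_ratio: float,
                             sigmas: float = 2.0) -> int:
    """Games per time control so that a `delta_elo` difference between the gains
    measured at two time controls lies `sigmas` standard errors from zero."""
    # elo_stderr(N) == k / sqrt(N); the difference of two independent estimates
    # has error sqrt(2) * k / sqrt(N), which must not exceed delta_elo / sigmas.
    k = ELO_PER_SCORE_AT_EVEN * math.sqrt(1.0 - draw_ratio) / 2.0
    return math.ceil(2.0 * (sigmas * k / delta_elo) ** 2)


if __name__ == "__main__":
    n = games_to_resolve_scaling(delta_elo=5.0, draw_ratio=0.4)
    print(f"about {n} games per time control")
    print(f"1-sigma Elo error at that size: {elo_stderr(n, 0.4):.2f}")
```

With these example inputs the answer is roughly 23,000 games at each time control, and the longer time controls are exactly where such game counts are most expensive.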

It will probably always be the case that programs scale differently. A program that now plays an entire game in 1 second is like a program of 30 years ago playing at tournament time controls, and even back then we were concerned about scaling when testing, thinking that if we could just test at a slightly longer time control it would be OK. So my feeling is that no matter how fast computers get, there will always be some programs that play better at 5 seconds than at 5 minutes. There is no threshold beyond which it is all the same.