Quick Performance Test

Discussion of chess software programming and technical issues.

Moderator: Ras

Uri Blass
Posts: 11152
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Quick Performance Test

Post by Uri Blass »

Don wrote:
Henk wrote:If tuning is done by a tuner (algorithm) you can also try to find a better tuning algorithm. Or try to constrain the parameters to be tuned to a smaller domain. I guess not all combinations are allowed.
There has been progress, but nobody has come up with anything better than just playing games for testing a single change. One optimization algorithm that has some merit is CLOP, but guess what? It's based on playing thousands of games.

Consider this. To measure a small ELO change, you must play thousands of games. This is using the most direct measure possible, playing actual games.

Now, is it reasonable to expect that you could measure this just as accurately using some indirect method that requires far less effort?

Of course it isn't. If you want to explore this further, try to figure out how to get more out of your testing procedure. CLOP is one such way, and HG proposed another method called orthogonal multi-tuning. Both of these are based on playing games but try to squeeze more information out of those games.
You may need many games for most changes, but it depends on the change, and there are changes for which it is better not to use games to test them.

For example, if you make a change that gives a 1% speed improvement, then I think playing many games to prove it may be a waste of time; you can simply see that the program searches the same number of nodes slightly faster to a fixed depth in many positions.
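That fixed-depth check can be sketched as follows (a hypothetical helper, not from the post; the (nodes, seconds) pairs are assumed to come from running a fixed-depth search on the same positions with each build):

```python
# Hedged sketch: compare fixed-depth runs of an old and a new build.
# The (nodes, seconds) pairs per position are an assumed data format.

def compare_fixed_depth(old, new):
    """old, new: lists of (nodes, seconds) per position, in the same order.

    Returns the overall speedup (old time / new time). Raises if any node
    count changed, since a pure speedup must not alter the search tree.
    """
    for i, ((n_old, _), (n_new, _)) in enumerate(zip(old, new)):
        if n_old != n_new:
            raise ValueError(f"position {i}: node count changed "
                             f"({n_old} -> {n_new}); not a pure speedup")
    t_old = sum(t for _, t in old)
    t_new = sum(t for _, t in new)
    return t_old / t_new  # > 1.0 means the new build is faster

# Illustrative numbers only:
old = [(123456, 1.00), (654321, 2.02)]
new = [(123456, 0.99), (654321, 2.00)]
speedup = compare_fixed_depth(old, new)
```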

Another example is fixing a bug that you know is relevant only in some rare endgame.

The speed after fixing the bug is the same; you have simply changed an evaluation that is incorrect for some tablebase positions to the correct evaluation in some rare cases.

Again, my opinion is that testing it in many games is a waste of time.
Henk
Posts: 7251
Joined: Mon May 27, 2013 10:31 am

Re: Quick Performance Test

Post by Henk »

So if testing is slow, slow down the generation. I mean: only test promising changes. I would not spend so much computing power on minor changes, even if I could afford it. But I'm in a different position.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Quick Performance Test

Post by Don »

Uri Blass wrote:
Don wrote:
Henk wrote:If tuning is done by a tuner (algorithm) you can also try to find a better tuning algorithm. Or try to constrain the parameters to be tuned to a smaller domain. I guess not all combinations are allowed.
There has been progress, but nobody has come up with anything better than just playing games for testing a single change. One optimization algorithm that has some merit is CLOP, but guess what? It's based on playing thousands of games.

Consider this. To measure a small ELO change, you must play thousands of games. This is using the most direct measure possible, playing actual games.

Now, is it reasonable to expect that you could measure this just as accurately using some indirect method that requires far less effort?

Of course it isn't. If you want to explore this further, try to figure out how to get more out of your testing procedure. CLOP is one such way, and HG proposed another method called orthogonal multi-tuning. Both of these are based on playing games but try to squeeze more information out of those games.
You may need many games for most changes, but it depends on the change, and there are changes for which it is better not to use games to test them.

For example, if you make a change that gives a 1% speed improvement, then I think playing many games to prove it may be a waste of time; you can simply see that the program searches the same number of nodes slightly faster to a fixed depth in many positions.

Another example is fixing a bug that you know is relevant only in some rare endgame.

The speed after fixing the bug is the same; you have simply changed an evaluation that is incorrect for some tablebase positions to the correct evaluation in some rare cases.

Again, my opinion is that testing it in many games is a waste of time.
Naturally we would never play 50,000 games to test a free 1% speedup. I'm only talking about interesting changes where we don't already know in advance that it's an improvement. And that is probably 98% of the tests we do.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Quick Performance Test

Post by Don »

Henk wrote:So if testing is slow, slow down the generation. I mean: only test promising changes. I would not spend so much computing power on minor changes, even if I could afford it. But I'm in a different position.
We do not make any changes that we do not test, but we follow the basic principle you outline here by not MAKING changes that we don't consider promising.

We are in a position where we don't have promising 50 ELO ideas though.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
User avatar
Andres Valverde
Posts: 596
Joined: Sun Feb 18, 2007 11:07 pm
Location: Almeria. SPAIN
Full name: Andres Valverde Toresano

Re: Quick Performance Test

Post by Andres Valverde »

Don wrote: Right now I am running a test at a time control of 3s+0.3 fischer and getting about 36 games per minute using a 6 core i7 and utilizing all 6 cores.
At that time control an average game could last (60 moves x 0.3 secs/move + 3 secs) x 2 = 42 secs, if not more. This is about 1.43 games/min/core, or 8.57 games/min using 6 cores. Am I missing something?
Saludos, Andres
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Quick Performance Test

Post by Don »

Andres Valverde wrote:
Don wrote: Right now I am running a test at a time control of 3s+0.3 fischer and getting about 36 games per minute using a 6 core i7 and utilizing all 6 cores.
At that time control an average game could last (60 moves x 0.3 secs/move + 3 secs) x 2 = 42 secs, if not more. This is about 1.43 games/min/core, or 8.57 games/min using 6 cores. Am I missing something?
It was a misprint. The test is actually 3s + 0.03s, not 3s + 0.3s.
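With the corrected time control, Andres's formula (60 moves per side assumed, as in his post) roughly matches the reported throughput:

```python
# Re-running the estimate with 3s + 0.03s instead of 3s + 0.3s:
secs_per_game = (60 * 0.03 + 3) * 2        # ~9.6 s per game
games_per_min_core = 60 / secs_per_game    # ~6.25 games/min/core
games_per_min_6c = games_per_min_core * 6  # ~37.5, close to the observed 36
```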

Don
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
lucasart
Posts: 3243
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Quick Performance Test

Post by lucasart »

Henk wrote:Developing a program means generate and test.
If generating is fast, testing should be fast too. Is there a quick way to test whether your chess program has improved?
By "generate" do you mean writing code?
Henk wrote: Search depth and number of nodes does not say much.
In general no, but it depends. In the early stages of development, when you have a basic PVS with no search instabilities, the more you improve your time to depth, the more you improve your engine. This becomes less and less clear as you start to introduce some "tradeoffs".
Henk wrote: Playing games takes much time.
Setting up a database with chess positions is a lot of developing effort.
Maybe, but it's the only way. There are two kinds of code patches:

1/ Non-functional patches: here you need to test for bugs. You simply set up a deterministic run of a few positions at a given depth and look at the number of nodes. If it hasn't changed, then you can be almost sure that no bug was introduced.

2/ Functional changes: you have to play games. There's no other way to find out whether your engine is stronger than to play games. Forget about the "database of chess positions": it is not necessarily time-consuming (you can find an EPD from an external source), but it's completely the wrong way to tune your engine. If you tune your engine to solve chess problems, you are optimizing for the wrong target. Playing games is not that bad. Here's the testing methodology I apply with an 8-core machine (I play 7 games concurrently, as 8 introduces a very measurable amount of noise):
* 5000 games in 5"+0.05": if the result is < 50% (regardless of error bar) I stop here and reject the patch. That test takes on average (with aggressive draw and resign adjudication in cutechess-cli) 2*(5"+0.05"*60)/7*5000 = 11429" = 3.17h.
* 10000 games in 10"+0.1" (if the previous pre-selection test was successful). That one takes 2*(10"+0.1"*60)/7*10000 = 45714" = 12.7h. I just leave it overnight or start it in the morning before going to work.
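The two wall-clock estimates above can be checked mechanically (same assumptions as the post: 60 moves per side, 7 concurrent games):

```python
# Sketch of the duration formula 2*(base + inc*moves)*games/concurrency,
# with the post's assumption of 60 moves per side.
def hours(base_s, inc_s, games, concurrency=7, moves=60):
    per_game_s = 2 * (base_s + inc_s * moves)  # both players' clocks run
    return per_game_s * games / concurrency / 3600

stage1 = hours(5, 0.05, 5000)    # pre-selection stage, ~3.17 h
stage2 = hours(10, 0.1, 10000)   # confirmation stage, ~12.7 h
```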
If you use git and create branches for your test patches, it forces you to be organized, and that really pays in terms of productivity. While the test is running I create other branches. Sometimes I stop tests early when they are clearly not going well, and I kill the branch and run the next one etc. Eventually one branch is good and I merge it into the master branch etc.

In short: NO there is no easy way.

PS: SPRT is provably the best method, but the cutechess-cli version uses a hardcoded drawelo instead of estimating it out of sample. I need to hack cutechess-cli before I can use it (drawelo has a big impact on stopping time).
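For illustration, a minimal SPRT sketch (not cutechess-cli's code) using the fixed-drawelo BayesElo model the post describes; elo0/elo1 are the hypotheses in BayesElo units, and alpha/beta the allowed error rates:

```python
from math import log

def bayeselo_probs(elo, drawelo):
    """Win/draw/loss probabilities under the BayesElo model."""
    win = 1.0 / (1.0 + 10.0 ** ((drawelo - elo) / 400.0))
    loss = 1.0 / (1.0 + 10.0 ** ((drawelo + elo) / 400.0))
    return win, 1.0 - win - loss, loss

def sprt_llr(wins, draws, losses, elo0=0.0, elo1=5.0, drawelo=200.0):
    """Log-likelihood ratio of H1 (elo1) vs H0 (elo0) for a W/D/L record."""
    w0, d0, l0 = bayeselo_probs(elo0, drawelo)
    w1, d1, l1 = bayeselo_probs(elo1, drawelo)
    return (wins * log(w1 / w0) + draws * log(d1 / d0)
            + losses * log(l1 / l0))

def sprt_decision(llr, alpha=0.05, beta=0.05):
    if llr >= log((1 - beta) / alpha):   # upper bound, ~ +2.94 at 0.05/0.05
        return "accept H1"               # patch is an improvement; stop
    if llr <= log(beta / (1 - alpha)):   # lower bound, ~ -2.94
        return "accept H0"               # no improvement; stop
    return "continue"                    # keep playing games
```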
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
tpetzke
Posts: 686
Joined: Thu Mar 03, 2011 4:57 pm
Location: Germany

Re: Quick Performance Test

Post by tpetzke »

There are many algorithms for finding local maxima, but for global maxima I know only simulated annealing and branch and bound. Well, if testing is the bottleneck, I think I can certainly forget neural networks.
What do neural networks have to do with a possible testing bottleneck?

With testing, you change a single small thing in your program and then run a lot of games to see whether the change was good.

If you use an optimization algorithm, the algorithm takes some time to produce a result. I'm currently using a genetic algorithm, and it takes about 2 weeks to converge to a solution. At the end I also perform a single test of whether the produced solution is stronger than my original solution.

But then I have not changed only a small thing in my program; I assign new values to all of my evaluation parameters at the same time. Others use CLOP for that, but I'd rather use my own stuff.
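As a toy illustration of that kind of run (not Thomas's actual setup): the fitness function here is a stand-in for match results, so the loop converges in seconds instead of two weeks, but the select/crossover/mutate structure is the same.

```python
import random

random.seed(1)
TARGET = [100, 320, 330, 500, 900]   # hypothetical "best" eval parameters

def fitness(genome):
    # Stand-in for "play games and measure strength": negative squared error.
    return -sum((g - t) ** 2 for g, t in zip(genome, TARGET))

def evolve(pop_size=40, generations=200, mut=15):
    pop = [[random.randint(0, 1000) for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]        # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(TARGET))
            child = a[:cut] + b[cut:]                   # one-point crossover
            i = random.randrange(len(child))
            child[i] += random.randint(-mut, mut)       # single-gene mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
```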

Thomas...