fixed nodes testing

Don · Post by **Don** » Wed Jul 20, 2011 10:56 pm

Laskos wrote:
Don wrote:
We do not confine our testing to a single method or time control

I think the first rule of thumb is self-testing with identical time use. Then all others.

Kai

We are hoping to use this to test changes that (should) have little impact on the nodes and speed of the program. We don't know if it will help but we wont' know if we don't study it. We probably would not use this for search changes. Recompiling does affect the speed and thus any test involving time introduces the problem we are trying to at least partially solve.

I don't think our computers are consistent either. I can run a timing 3 times and get a different result each time. I could probably help things a lot by customizing the kernel and turning off most services but we use our computers for many things and I don't want to do this just in the chance it might help some.

bob · Post by **bob** » Wed Jul 20, 2011 11:16 pm

Don wrote:Larry and I are experimenting with doing some of our testing at fixed nodes. It seems that time testing is not giving us very consistent results. Fixed depth testing is very consistent but has it's own problems and issues and is not how program naturally play.

I fixed Komodo up to support fixed node levels and now it does this perfectly, searching EXACTLY the number of nodes that the user sets and stopping.

One problem is that the strong programs that we need for testing have lousy support for this. Some don't have this support at all, and others such as stockfish may exceed the nodes by an order of magnitude or more, or else abort much earlier. Because of this, we seem to be reduced to self-testing and we prefer testing against a variety of opponents.

The UCI standard says, "search x nodes only" so I would like to suggest that the authors implement this level correctly. The next release of Komodo will have this working correctly of course. Komodo does require at least a 1 ply search to complete before it will abort, primarily to ensure that an actual move is returned and the result not be totally random.

Does anyone have any experience with this kind of testing? For the most part I think I am aware of the pros and cons but I would like to hear from others on this.

I have played with it some. One down-side is that you eliminate any time-control influence, which means that things like "fail low" or "easy move" become meaningless. Other issue is how many nodes for each program. The same number will give repeatable results, but probably won't accurately reflect the strength difference between the two programs if one searches way slower, but they both get the same number of nodes.

Another issue is "time skew". If your program speeds up (or slows down) as the game progresses, then you can get bad results. For example, suppose your program is something like Ferret, which had a 4x-5x speedup as the endgame approaches. That speedup disappears in fixed node testing, you just move faster but search the same sized trees. So in testing with a program like that, you lose significantly in the endgame, because your faster speed does not help you in fixed node testing. If your endgame skills depend on that speed gain, you could well tune the wrong way, because you would want to be more aggressive and do what you can in the middlegame, before you get to the endgame where you effectively slow down and lose more games than expected...

Don · Post by **Don** » Wed Jul 20, 2011 11:42 pm

bob wrote:
Don wrote:Larry and I are experimenting with doing some of our testing at fixed nodes. It seems that time testing is not giving us very consistent results. Fixed depth testing is very consistent but has it's own problems and issues and is not how program naturally play.

I fixed Komodo up to support fixed node levels and now it does this perfectly, searching EXACTLY the number of nodes that the user sets and stopping.

One problem is that the strong programs that we need for testing have lousy support for this. Some don't have this support at all, and others such as stockfish may exceed the nodes by an order of magnitude or more, or else abort much earlier. Because of this, we seem to be reduced to self-testing and we prefer testing against a variety of opponents.

The UCI standard says, "search x nodes only" so I would like to suggest that the authors implement this level correctly. The next release of Komodo will have this working correctly of course. Komodo does require at least a 1 ply search to complete before it will abort, primarily to ensure that an actual move is returned and the result not be totally random.

Does anyone have any experience with this kind of testing? For the most part I think I am aware of the pros and cons but I would like to hear from others on this.
I have played with it some. One down-side is that you eliminate any time-control influence, which means that things like "fail low" or "easy move" become meaningless.

I understand this, but it's our hope that we can deal with time control issues separately using time control games. Our plan is to use this sparingly, when we think it makes most sense.

We have easy move and trigger conditions for searching longer, but presumably these have only the most minor interactions with minor evaluation changes for instance. In other words if we measure an improvement we probably don't need to see if the improvement interacts badly with the easy move algorithm but we know it's possible, for instance it can imagine it happening with certain kinds of aggressive extensions.

Other issue is how many nodes for each program. The same number will give repeatable results, but probably won't accurately reflect the strength difference between the two programs if one searches way slower, but they both get the same number of nodes.

We rarely use the tester to measure other programs (or where we stand relative to them.) Our tester allows us to set time control independently for each entity. We can even mix fixed depth with fischer, etc.

So generally we see how some baseline version tests against 3 foreign opponents, then we test other versions of Komodo to compare how they do under the same conditions.

Another issue is "time skew". If your program speeds up (or slows down) as the game progresses, then you can get bad results.

Yes, I am aware of that problem. In real games we spend less time per move on endings. Since we speedup up in the endings, hopefully it won't be too skewed, but I'm sure it won't be the same.

For example, suppose your program is something like Ferret, which had a 4x-5x speedup as the endgame approaches. That speedup disappears in fixed node testing, you just move faster but search the same sized trees. So in testing with a program like that, you lose significantly in the endgame, because your faster speed does not help you in fixed node testing. If your endgame skills depend on that speed gain, you could well tune the wrong way, because you would want to be more aggressive and do what you can in the middlegame, before you get to the endgame where you effectively slow down and lose more games than expected...

I have not done a careful measurement of how much we speed up in the endgame, if any. If it's modest, then as I said it will hopefully not be too far off. But if the speedup is dramatic, it could be a big issue. I'll check on this.

Thanks for the feedback Bob.

michiguel · Post by **michiguel** » Thu Jul 21, 2011 12:23 am

Don wrote:Larry and I are experimenting with doing some of our testing at fixed nodes. It seems that time testing is not giving us very consistent results. Fixed depth testing is very consistent but has it's own problems and issues and is not how program naturally play.

I fixed Komodo up to support fixed node levels and now it does this perfectly, searching EXACTLY the number of nodes that the user sets and stopping.

One problem is that the strong programs that we need for testing have lousy support for this. Some don't have this support at all, and others such as stockfish may exceed the nodes by an order of magnitude or more, or else abort much earlier. Because of this, we seem to be reduced to self-testing and we prefer testing against a variety of opponents.

The UCI standard says, "search x nodes only" so I would like to suggest that the authors implement this level correctly. The next release of Komodo will have this working correctly of course. Komodo does require at least a 1 ply search to complete before it will abort, primarily to ensure that an actual move is returned and the result not be totally random.

Does anyone have any experience with this kind of testing? For the most part I think I am aware of the pros and cons but I would like to hear from others on this.

I advocated this at least a couple of times here. This first one that encouraged other programmers to do this, AFAIK was Uri. When I mentioned it, you did not like better than limited depth testing, though. Of course, it has cons, but the pros outweigh them, by far. You can always confirm later with time limited TC, nodes-limited testing is IMHO, the best when you are screening certain parameters. For instance, you can play REALLY fast games that will be very unreliable with any other method (because of the noise). Sometimes, I self-test with 10knodes per move.

Gaviota supports this, but it is way much weaker than Komodo, so it won't help you. But if Komodo supports this, it will help me

Miguel

BubbaTough · Post by **BubbaTough** » Thu Jul 21, 2011 1:12 am

michiguel wrote:
Don wrote:Larry and I are experimenting with doing some of our testing at fixed nodes. It seems that time testing is not giving us very consistent results. Fixed depth testing is very consistent but has it's own problems and issues and is not how program naturally play.

I fixed Komodo up to support fixed node levels and now it does this perfectly, searching EXACTLY the number of nodes that the user sets and stopping.

One problem is that the strong programs that we need for testing have lousy support for this. Some don't have this support at all, and others such as stockfish may exceed the nodes by an order of magnitude or more, or else abort much earlier. Because of this, we seem to be reduced to self-testing and we prefer testing against a variety of opponents.

The UCI standard says, "search x nodes only" so I would like to suggest that the authors implement this level correctly. The next release of Komodo will have this working correctly of course. Komodo does require at least a 1 ply search to complete before it will abort, primarily to ensure that an actual move is returned and the result not be totally random.

Does anyone have any experience with this kind of testing? For the most part I think I am aware of the pros and cons but I would like to hear from others on this.
I advocated this at least a couple of times here. This first one that encouraged other programmers to do this, AFAIK was Uri. When I mentioned it, you did not like better than limited depth testing, though. Of course, it has cons, but the pros outweigh them, by far. You can always confirm later with time limited TC, nodes-limited testing is IMHO, the best when you are screening certain parameters. For instance, you can play REALLY fast games that will be very unreliable with any other method (because of the noise). Sometimes, I self-test with 10knodes per move.

Gaviota supports this, but it is way much weaker than Komodo, so it won't help you. But if Komodo supports this, it will help me

Miguel

I think Miguel convinced me to use it at least for self tuning during a TCEC chat

. Since I am the one that took it out of Hannibal, I guess I can at least do a speed test before the next release and put it back in assuming it does no harm. Now, all we have to do is improve Hannibal a couple hundred elo so it can be a reasonable test partner.

-Sam

Don · Post by **Don** » Thu Jul 21, 2011 3:34 am

michiguel wrote:
Don wrote:Larry and I are experimenting with doing some of our testing at fixed nodes. It seems that time testing is not giving us very consistent results. Fixed depth testing is very consistent but has it's own problems and issues and is not how program naturally play.

I fixed Komodo up to support fixed node levels and now it does this perfectly, searching EXACTLY the number of nodes that the user sets and stopping.

One problem is that the strong programs that we need for testing have lousy support for this. Some don't have this support at all, and others such as stockfish may exceed the nodes by an order of magnitude or more, or else abort much earlier. Because of this, we seem to be reduced to self-testing and we prefer testing against a variety of opponents.

The UCI standard says, "search x nodes only" so I would like to suggest that the authors implement this level correctly. The next release of Komodo will have this working correctly of course. Komodo does require at least a 1 ply search to complete before it will abort, primarily to ensure that an actual move is returned and the result not be totally random.

Does anyone have any experience with this kind of testing? For the most part I think I am aware of the pros and cons but I would like to hear from others on this.
I advocated this at least a couple of times here. This first one that encouraged other programmers to do this, AFAIK was Uri. When I mentioned it, you did not like better than limited depth testing, though. Of course, it has cons, but the pros outweigh them, by far. You can always confirm later with time limited TC, nodes-limited testing is IMHO, the best when you are screening certain parameters. For instance, you can play REALLY fast games that will be very unreliable with any other method (because of the noise). Sometimes, I self-test with 10knodes per move.

Gaviota supports this, but it is way much weaker than Komodo, so it won't help you. But if Komodo supports this, it will help me

Miguel

Yes, I remember this and this is probably why it was on my mind because it was either you or someone else that said it did not work quite right on Komodo. So I spent a few minute building support for it in my tester too.

Desperado · Post by **Desperado** » Thu Jul 21, 2011 10:39 am

Hello Don,

because i am working mainly on my tester currently, this is a _hot_ topic
for me. I ignored nodecount testing so far. But thinking more closely about
it one could build a _nearly_ 1:1 behaviour compared to time based games.

Why must it be an exact node count ?
A normal time based mechanism works basically like
"currentTime >= timeLimit" which can be translated if you want
into "currentNodeCount >= nodeCountLimit".

The option to let an engine finish iterations is very important, if it likes
to do so also in time based games. This can have significantly influence on
the strength.

So maybe it only would be necessary to give engines an option of
_nodecount resolution_ to check for a stop.

In an more advanced nodcount testing frame, time management
arguements can _also_ be translated very easy. As example
in a fail low situation raise the nodecount limit by an amount of...
(instead the timelimit, self explaining i think).

so, just some ideas...

Michael

edit:

(But it can also be more important first, to get a standard to count nodes...)

Desperado · Post by **Desperado** » Thu Jul 21, 2011 12:04 pm

Meanwhile i was thinking about the following points:

_basic_ concept:
=================

* nodecount resolution is a concrete nodecount number passed as option.

check every _amountOfNodes_ for a stop..., with the given arguements
of my last post.

_improved_ concept (maybe):
===========================

nodeCountLimit = (n/t) * factorRelativNodeCount

A: nodes to search
===================

Komodo: nodeCountLimit = 1500 Kn/s * 0.001 = 1.5K
Nemo : nodeCountLimit = 2500 Kn/s * 0.001 = 2.5K

so Komodo would search 1500 nodes and Nemo would search 2500 nodes.
That makes counting nodes in different ways irrelevant.

B: nodecount resolution
========================

nodeCountResolution <= nodeCountLimit

Komodo: 1500 nodes / 0.1 = 15 n
Nemo : 1500 nodes / 0.1 = 25 n

so komodo checks every 15 nodes and nemo checks every 25 nodes as example.

C: recapitulation
========================

- engines should support 2 options:

* option_1: relative nodecount factor
* option_2: nodecount resolution factor

- independant how nodes are counted!
- avoiding handicaps, relative strength should be equal
- timebased controllers can be translated into nodecount equivalent controllers.

I think people who are interested in building their own testers should
be stimulated now to add _2 more options_.

I cannot await some of you guys rupture my ideas. Let s see

.

Michael

hgm · Post by **hgm** » Thu Jul 21, 2011 12:59 pm

Well, the first mistake of course is that the engine is UCI.

So the first step would be to convert it to WB, and then implement the highly superior nps command. You could then use any time contol you want (so stay away from the highly inefficient fixed per move, and use classical or incremental, thus not bypassing the engine's time control). Just play at 40 moves/min, and send the engine nps 1000 to have it convert that to 40 moves/60000 nodes (so 1500 nodes/move on average).

Desperado · Post by **Desperado** » Thu Jul 21, 2011 2:32 pm

hgm wrote:Well, the first mistake of course is that the engine is UCI. So the first step would be to convert it to WB, and then implement the highly superior nps command. You could then use any time contol you want (so stay away from the highly inefficient fixed per move, and use classical or incremental, thus not bypassing the engine's time control). Just play at 40 moves/min, and send the engine nps 1000 to have it convert that to 40 moves/60000 nodes (so 1500 nodes/move on average).

do you really think that avarageNodes/move timeBased controlled is the
same as averageNodes/move nodecountBased controlled ?
(especially in _ultraFast_ games ? )

My guess is that nodecount resolution will be much more accurate than
time based resolution. How many overhead(nodes) will you have when controlling the interruption with a resolution of 0.001 seconds compared
to check for an interrupt every 40 nodes ? And arguments like _checking
every 40 nodes_ is also overhead, isnt an argument at all because there
is no time limit. So if someone likes he can check 100 times the same
condition without influencing any search result. Well it needs of course
no further explanation, but time based controllers (especially in ultra fast
games) _can_ but must not be a handicap.

Further there will never be a time loss, what can be a big pro comparing engines _real_ strength on ultra fast games,
but always in mind that the behaviour of this nodecount type search is very similar (maybe almost identical) in the most important aspects.

As ever just some first thoughts ...

maybe this is finickiness (hope that is the right word for it), maybe not.

Michael

fixed nodes testing

Re: fixed nodes testing

Re: fixed nodes testing

Re: fixed nodes testing

Re: fixed nodes testing

Re: fixed nodes testing

Re: fixed nodes testing

Re: fixed nodes testing

Re: fixed nodes testing

Re: fixed nodes testing

Re: fixed nodes testing