I have lots of ideas, but I don't have the resources to test them all. I don't want to rely on what I think is good, for example an eval parameter, as it often turns out not to be good in real games. To find out which of my ideas actually work, I really have no choice but to test them.
In one of the threads I read that Allard, the author of Bright, tested his engine with its full eval against a version with a simplified eval at fixed nodes, and it seemed to produce results close to those from long time controls.
I'm thinking that if the search is the same and only the eval is being tuned, fixed-nodes testing seems feasible. What is the experts' opinion on this? What are the advantages and disadvantages of this approach?
Testing at fixed nodes to tune eval
Moderator: Ras
-
- Posts: 803
- Joined: Mon Jul 17, 2006 5:53 am
- Full name: Edsel Apostol
-
- Posts: 28359
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Testing at fixed nodes to tune eval
Fixed-node testing, like fixed-time testing, is a bit unnatural, as an engine has no way to allocate extra time when it encounters a move that really needs it. For this reason I developed an alternative in WinBoard that serves the same purpose but does not have this disadvantage: instead of giving a fixed number of nodes per move, I give a number of nodes per game or per N-move session (N >> 1).
The only modification this needs in the engine is that it does not read the clock but its node count, and converts that to 'virtual seconds'. It can then use all its normal time management, and you can use all the normal time-control types (classical, incremental, sudden-death) as you are used to.
If you want to give the engine very few nodes, just use a small factor to convert nodes to seconds. E.g. if you play at 5 (virtual) min + 1 (virtual) sec/move, you can divide the node count by 1000 to derive the virtual seconds. On an engine that runs at 1M nps that would on average make it search about 6000 nodes per move. If at the same TC setting you divided the node count by 100 instead, it would search an average of 600 nodes.
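For concreteness, here is a minimal sketch of what such a virtual clock could look like. It assumes a hypothetical engine with a single global node counter and time management that works in milliseconds; the names and the factor of 1000 nodes per virtual second are illustrative, not actual WinBoard or engine code.

```c
/* Virtual clock driven by the node count instead of the wall clock.
 * 'nodes' stands in for the engine's real node counter. */
#include <stdint.h>

uint64_t nodes;                     /* incremented by the search */
static uint64_t nps_factor = 1000;  /* nodes per virtual second */

/* Replaces the usual clock read: virtual milliseconds spent on this move. */
static uint64_t virtual_elapsed_ms(uint64_t nodes_at_move_start)
{
    return (nodes - nodes_at_move_start) * 1000 / nps_factor;
}

/* The normal time check is unchanged, except that it reads virtual time. */
static int out_of_time(uint64_t nodes_at_move_start, uint64_t budget_ms)
{
    return virtual_elapsed_ms(nodes_at_move_start) >= budget_ms;
}
```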
Bob has pointed out that this might still not fully simulate time-based play, as many engines speed up in terms of nps in the end-game. But this effect is already orders of magnitude smaller than it is at fixed depth. And if both engines enjoy the same speed-up in the end-game, the average effect on the result might be very small. In fact, for eval testing one could argue that we actually want the engines to play faster in the end-game.
If you don't want the engine to speed up, and you are testing your own engine, it is easy to equip it with a 'virtual node counter' that gives a more realistic impression of its time use. E.g. if you know a QS node takes only half the time of a full-width node, you could count full-width nodes double and QS nodes single. If you know a TB probe on average takes 10 times as long as searching a node, you could increment the virtual-node count by 10 for every TB probe. If you do that carefully enough, you should be able to eliminate almost all artifacts and use the node count as a reliable sub-microsecond internal clock.
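A sketch of what such a weighted counter could look like, with purely illustrative weights taken from the example above (in practice you would measure the ratios on your own engine):

```c
/* Weighted 'virtual node' counter: a QS node counted as half a
 * full-width node, a TB probe as ten full-width nodes. */
static unsigned long long virtual_nodes = 0;

static void tick_fullwidth_node(void) { virtual_nodes += 2;  }  /* full-width node: 2 ticks */
static void tick_qs_node(void)        { virtual_nodes += 1;  }  /* QS node: 1 tick */
static void tick_tb_probe(void)       { virtual_nodes += 20; }  /* TB probe: ~10 full-width nodes */
```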
-
- Posts: 803
- Joined: Mon Jul 17, 2006 5:53 am
- Full name: Edsel Apostol
Re: Testing at fixed nodes to tune eval
hgm wrote: Fixed-node testing, like fixed-time testing, is a bit unnatural, as an engine has no way to allocate extra time when it encounters a move that really needs it.
I agree with you on this, but I'm willing to ignore that problem, as I'm not testing my time management. I will only be doing tests to see which eval features and parameters are best. As long as the versions have the same search, I think it is safe to do fixed-nodes testing.
I did some tests, running a round-robin tournament between versions of my engine from a set of starting positions, with reversed colors also being played. The only difference between the versions is their eval terms and features. I noticed that the version that scored best in this tournament also scored best when I ran a gauntlet of them against other opponents.
So I think this kind of testing methodology works, at least in my case. I have played only a few hundred games, so my results may not be that accurate.
The other ideas you mentioned in your post seem feasible as well, but I think they are too complex a solution for a simple problem. My opinion is that as long as you eliminate randomness from the test (in my test, for example, the only difference is in the scoring of eval terms), you can trust the results with confidence.
Edsel Apostol
https://github.com/ed-apostol/InvictusChess
-
- Posts: 901
- Joined: Sun Nov 19, 2006 9:16 pm
- Location: Russia
- Full name: Aleks Peshkov
Re: Testing at fixed nodes to tune eval
I think it would be better to collect a set of free engines that implement a node-count parameter.
Each experimental engine should play against each engine in the collection. Their relative strength can be adjusted by a node handicap.
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: Testing at fixed nodes to tune eval
Edsel Apostol wrote: I have lots of ideas, but I don't have the resources to test them all. I don't want to rely on what I think is good, for example an eval parameter, as it often turns out not to be good in real games. To find out which of my ideas actually work, I really have no choice but to test them.
In one of the threads I read that Allard, the author of Bright, tested his engine with its full eval against a version with a simplified eval at fixed nodes, and it seemed to produce results close to those from long time controls.
I'm thinking that if the search is the same and only the eval is being tuned, fixed-nodes testing seems feasible. What is the experts' opinion on this? What are the advantages and disadvantages of this approach?
When I test, I try to focus on a single thing, not the overall program. The overall program and the interaction between its pieces are important, but that should be studied separately in most cases. The reason is that you cannot easily account for every interaction, and you lose understanding.
For example, suppose you are just tuning weights in your chess program. Fixed depth testing works well for this - because the information you get back from your testing is not subject to interpretation. Even if you do something with evaluation that slows the search down, you want to KNOW that it improved the evaluation, but slowed the search down. You want to know this because you might be able to do something about it. It's unsatisfying to know that it just made the program weaker in general without knowing why.
Fixed-node testing is something I have always championed, but it has problems too. Some pruning algorithms have a large enough impact on nodes per second that you can think you made an improvement when you haven't. If you do fixed-node testing without regard to time, you can add any amount of evaluation slowdown for free. So you are forced to look at time no matter what you do, in my opinion.
But I think it's a reasonable compromise. If you implement a level based on nodes, I recommend treating nodes as time and using your normal time-control algorithms rather than just stopping cold. But you should also measure in your testing whether some change made it take longer to search the same number of nodes.
Another advantage of nodes is that you do not have to worry about how loaded your machine is; you can run tests on computers of different speeds and get the same results. When I do this I use one machine as a reference point for time. My tester reports how much time per game each program takes.
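To illustrate Don's point about still keeping an eye on the clock, here is a small sketch that records wall time next to a fixed-node search, so an eval change that slows the search at equal node counts still shows up. The helper is invented for this example; you would time your own search entry point and pass it the results.

```c
/* Report real time spent on a fixed-node search, to catch changes that
 * keep the node count constant but make each node more expensive. */
#include <stdio.h>
#include <time.h>

/* Monotonic wall-clock time in seconds (POSIX clock_gettime). */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

/* Call now_seconds() before and after the fixed-node search, then pass
 * the node count and elapsed time here. */
void report_fixed_node_timing(unsigned long long nodes, double seconds)
{
    fprintf(stderr, "nodes=%llu time=%.3fs nps=%.0f\n",
            nodes, seconds, seconds > 0.0 ? nodes / seconds : 0.0);
}
```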
-
- Posts: 28359
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Testing at fixed nodes to tune eval
Don wrote: If you implement a level based on nodes, I recommend treating nodes as time and using your normal time-control algorithms rather than just stopping cold. But you should also measure in your testing whether some change made it take longer to search the same number of nodes.
This is exactly what WinBoard / XBoard supports. The only thing you have to do to use it is to have the engine understand the nps N command of WB protocol, and change the routine that normally returns the time so that it instead returns the node count divided by the given parameter (if a parameter has been given).
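A rough sketch of what that could look like in an engine speaking WB protocol. The variable and helper names are invented for this example, and the units are simply seconds; only the nps N command itself comes from the protocol.

```c
/* When "nps N" has been received, the single routine the engine uses to
 * read elapsed time returns node count / N instead of the real clock. */
#include <stdio.h>
#include <string.h>
#include <time.h>

unsigned long long nodes;           /* maintained by the search */
static long long nps_setting = 0;   /* 0 = no nps command seen, use the real clock */

static double real_elapsed_seconds(void)
{
    return (double)clock() / CLOCKS_PER_SEC;   /* placeholder clock read */
}

void handle_nps_command(const char *line)      /* e.g. line = "nps 1000" */
{
    if (strncmp(line, "nps ", 4) == 0)
        sscanf(line + 4, "%lld", &nps_setting);
}

/* All time management reads time through this one routine. */
double elapsed_seconds(void)
{
    if (nps_setting > 0)
        return (double)nodes / (double)nps_setting;   /* virtual seconds */
    return real_elapsed_seconds();
}
```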
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: Testing at fixed nodes to tune eval
hgm wrote: This is exactly what WinBoard / XBoard supports. The only thing you have to do to use it is to have the engine understand the nps N command of WB protocol, and change the routine that normally returns the time so that it instead returns the node count divided by the given parameter (if a parameter has been given).
That's an excellent feature to have.
A feature I've always thought would be nice is a skittles mode, where the program adjusts its time to its opponent. My desire for this goes back to the days when programs were pretty weak. A strong player would offer to play my program, and when negotiating a time control the player would say, "Just make it play fast, I'll promise to play fast and match it."
But that never happened. The computer would get an advantage, then the human would slow down to a crawl. He would take whatever time he needed to beat the computer, while it was playing instantly.
I never implemented this, but I always wanted to see it. For skittles mode you start with a pool of time that would produce relatively fast play, such as 2 minutes. The total amount of time the human uses is added to this pool, and the computer uses some fraction of the pool, such as 1/20. I would put some kind of limit on the maximum time charged against the human, since this is skittles and the opponent may have walked away from the board for a while to take a break. This could be configurable with two parameters, starting_pool_of_time and max_time_charged.
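A possible sketch of that pool, using the two parameter names Don gives. The concrete numbers, the function names, and the choice to deduct the engine's budget from the pool are assumptions made for illustration.

```c
/* Skittles-mode time pool: the human's thinking time feeds the pool,
 * and the engine budgets a fraction of the pool for each of its moves. */
static double starting_pool_of_time = 120.0; /* seconds: a 2-minute starting pool */
static double max_time_charged      = 60.0;  /* cap per human move (opponent may walk away) */
static double pool;

void skittles_init(void)
{
    pool = starting_pool_of_time;
}

/* Called after each human move with the time the human actually spent. */
void skittles_add_human_time(double human_seconds)
{
    if (human_seconds > max_time_charged)
        human_seconds = max_time_charged;
    pool += human_seconds;
}

/* The engine takes 1/20 of the pool as its budget for the next move. */
double skittles_move_budget(void)
{
    double budget = pool / 20.0;
    pool -= budget;        /* assumption: the engine's time comes out of the pool */
    return budget;
}
```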
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Testing at fixed nodes to tune eval
Don wrote: That's an excellent feature to have.
A feature I've always thought would be nice is a skittles mode, where the program adjusts its time to its opponent. My desire for this goes back to the days when programs were pretty weak. A strong player would offer to play my program, and when negotiating a time control the player would say, "Just make it play fast, I'll promise to play fast and match it."
But that never happened. The computer would get an advantage, then the human would slow down to a crawl. He would take whatever time he needed to beat the computer, while it was playing instantly.
I never implemented this, but I always wanted to see it. For skittles mode you start with a pool of time that would produce relatively fast play, such as 2 minutes. The total amount of time the human uses is added to this pool, and the computer uses some fraction of the pool, such as 1/20. I would put some kind of limit on the maximum time charged against the human, since this is skittles and the opponent may have walked away from the board for a while to take a break. This could be configurable with two parameters, starting_pool_of_time and max_time_charged.
I did this years ago in Crafty, but there is a way to exploit it that made me get rid of it: I take a long time to work out a tactical plan that is complex but not fool-proof, then start moving very quickly since I have it worked out, and the program now also moves quickly and fails to see what it is voluntarily walking into until it is too late... I did this back in the days when opponents could almost trivially beat every program on ICC (ICS at the time) by simply playing very quickly and running the program out of time.
I found it better to just "play chess" and _always_ use a clock, and let things go beyond that point...
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: Testing at fixed nodes to tune eval
bob wrote: I did this years ago in Crafty, but there is a way to exploit it that made me get rid of it: I take a long time to work out a tactical plan that is complex but not fool-proof, then start moving very quickly since I have it worked out, and the program now also moves quickly and fails to see what it is voluntarily walking into until it is too late... I did this back in the days when opponents could almost trivially beat every program on ICC (ICS at the time) by simply playing very quickly and running the program out of time.
I found it better to just "play chess" and _always_ use a clock, and let things go beyond that point...
Yes, but when the human takes a lot of time to work out a plan, he is adding a lot of time to the computer. The only way this really works in any consistent way is if the human is already winning, and we are assuming that is not the case. The human spends a lot of time to work out something "tricky" that will make the computer blunder - a risky strategy even years ago. If a player back then was skilled enough to make this work, he was probably the better player anyway, and I believe tricks like these are counter-productive.
I agree with you about just using clocks if you can get the human opponent to do that. But sometimes they don't want to because some people (at least back then) felt pressured by a clock and didn't want a formal match - just a friendly game.
And the issue I wanted to specifically address is where the human with all good intentions to play fast gets into trouble and starts playing slow. In almost every case, once he started playing slowly he continued to play slowly.
Of course when the computer starts playing slower, this slows the human down even more. A trick good players use in tournaments when they are losing is to play fast like you are giving up - you can usually sucker the opponent into also playing fast and hope for a blunder. Mature players usually don't fall for this however.
I'm not so concerned with skittles mode these days, as humans rarely can beat decent programs.
Re: Testing at fixed nodes to tune eval
bob wrote: I did this years ago in Crafty, but there is a way to exploit it that made me get rid of it: I take a long time to work out a tactical plan that is complex but not fool-proof, then start moving very quickly since I have it worked out, and the program now also moves quickly and fails to see what it is voluntarily walking into until it is too late... I did this back in the days when opponents could almost trivially beat every program on ICC (ICS at the time) by simply playing very quickly and running the program out of time.
I found it better to just "play chess" and _always_ use a clock, and let things go beyond that point...
Using a tactic like that to beat a good engine is still very hard, since engines see many, many plies in a fraction of a second. Surely it was worse back when you tried it, when people didn't have trouble beating the engines in the first place.
I find it rather interesting as a means to force the player to play rather quickly, without the risk of simply losing on time. The penalty you get for playing slowly is facing stronger moves.