Hi,
Suppose you have made two changes to a chess-playing program, each of them showing, say, a 20 Elo increase in strength. Now you test the version with both accepted changes against the version without them. What result, in terms of Elo, do you expect? So far I have never gotten anything near the sum of the gains.
testing: Elo gain out of two changes
-
- Posts: 908
- Joined: Mon Jan 15, 2007 11:23 am
- Location: Warsza
-
- Posts: 454
- Joined: Sat Apr 04, 2009 6:44 pm
- Location: Bulgaria
Re: testing: Elo gain out of two changes
You may consider it a bizarre situation, but sometimes combining "2 changes" can end up worse than either of them separately.
This is not an exact science (still).
-
- Posts: 3026
- Joined: Wed Mar 08, 2006 9:57 pm
- Location: Rio de Janeiro, Brazil
Re: testing: Elo gain out of two changes
Mincho Georgiev wrote:
You may consider it a bizarre situation, but sometimes combining "2 changes" can end up worse than either of them separately.
This is not an exact science (still).

Unless they overlap, meaning one interferes with another, I don't see how that is possible.
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
-
- Posts: 454
- Joined: Sat Apr 04, 2009 6:44 pm
- Location: Bulgaria
Re: testing: Elo gain out of two changes
Albert Silver wrote:
Mincho Georgiev wrote:
You may consider it a bizarre situation, but sometimes combining "2 changes" can end up worse than either of them separately.
This is not an exact science (still).
Unless they overlap, meaning one interferes with another, I don't see how that is possible.

Exactly!
The way the question was asked just doesn't exclude that worst-case scenario, i.e. almost no information was given.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: testing: Elo gain out of two changes
PK wrote:
Hi,
Suppose you have made two changes to a chess-playing program, each of them showing, say, a 20 Elo increase in strength. Now you test the version with both accepted changes against the version without them. What result, in terms of Elo, do you expect? So far I have never gotten anything near the sum of the gains.

Depends. As an example, last year I measured the gain for Null-move and LMR. I took a baseline version of Crafty with both disabled, then enabled Null-move only and got +80. I then disabled NM again and enabled LMR and also got +80. For the final test I enabled both, and got +120. Thinking about it, the two methods have significant overlap, since both use reduced-depth searches to speed things up.
If you have two changes, A and B, that are totally independent, and each produces +20, you should get +40 when you test both together. However, quite often there is an unexpected interaction between the two that will cause you to get less than this.
We have seen both extremes in our cluster testing: sometimes A+B gives _more_ than the sum of the improvements measured for A and B separately, and sometimes less. And if you are not careful, you can add either A or B and get +20, and when you add them both you still get just +20 (or even less).
It depends on how "connected" the two terms happen to be. I always try to test just one change at a time, and if it is good, that becomes the new baseline for testing the next change.
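To put a number on the overlap in the Null-move/LMR example: if the gains were purely additive, the combined version would be expected at 80 + 80 = +160, so the measured +120 falls 40 Elo short. A minimal sketch of that arithmetic in Python; the function name is only an illustrative choice and does not come from Crafty or any testing tool.

```python
def interaction_elo(gain_a, gain_b, gain_both):
    """Elo left over after subtracting the additive expectation.

    Negative: the changes overlap (combined gain is less than the sum).
    Zero:     the changes behave as independent.
    Positive: the changes reinforce each other.
    """
    return gain_both - (gain_a + gain_b)


# Null-move and LMR figures quoted in the post above.
print(interaction_elo(80, 80, 120))  # -40

# The "totally independent" case: +20 and +20 giving +40.
print(interaction_elo(20, 20, 40))   # 0
```

A result of zero only says the measured numbers are consistent with independence; it says nothing about how noisy each measurement was.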
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: testing: Elo gain out of two changes
Albert Silver wrote:
Mincho Georgiev wrote:
You may consider it a bizarre situation, but sometimes combining "2 changes" can end up worse than either of them separately.
This is not an exact science (still).
Unless they overlap, meaning one interferes with another, I don't see how that is possible.

One term can pull you toward king attacks. The other term could make you more defense-minded by having better pawn structure. Added together, you can become way too aggressive or way too passive. It is not that uncommon.
-
- Posts: 454
- Joined: Sat Apr 04, 2009 6:44 pm
- Location: Bulgaria
Re: testing: Elo gain out of two changes
bob wrote:
PK wrote:
Hi,
Suppose you have made two changes to a chess-playing program, each of them showing, say, a 20 Elo increase in strength. Now you test the version with both accepted changes against the version without them. What result, in terms of Elo, do you expect? So far I have never gotten anything near the sum of the gains.
Depends. As an example, last year I measured the gain for Null-move and LMR. I took a baseline version of Crafty with both disabled, then enabled Null-move only and got +80. I then disabled NM again and enabled LMR and also got +80. For the final test I enabled both, and got +120. Thinking about it, the two methods have significant overlap, since both use reduced-depth searches to speed things up.
If you have two changes, A and B, that are totally independent, and each produces +20, you should get +40 when you test both together. However, quite often there is an unexpected interaction between the two that will cause you to get less than this.
We have seen both extremes in our cluster testing: sometimes A+B gives _more_ than the sum of the improvements measured for A and B separately, and sometimes less. And if you are not careful, you can add either A or B and get +20, and when you add them both you still get just +20 (or even less).
It depends on how "connected" the two terms happen to be. I always try to test just one change at a time, and if it is good, that becomes the new baseline for testing the next change.

Besides, let's not forget something else. Very few people are able to run tests as large as yours. For the rest of us, who lack the hardware, +20 is not really +20 (it could just as well be +5 or -5) in probably 75% of the tests, so it is no wonder if the result of combining the two changes turns out highly unexpected.
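A rough sketch of why a measured +20 can be so shaky in a small test. It converts a match result into an Elo difference under the usual logistic model and attaches an approximate 95% interval using a normal approximation to the per-game score. The function names and the 1000-game result are made-up illustrations, not data from anyone's tests in this thread.

```python
import math


def elo_from_score(score):
    """Elo difference implied by a score fraction (0 < score < 1), logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(wins, draws, losses, z=1.96):
    """Return (estimate, low, high) in Elo for a match result.

    Uses a normal approximation to the mean per-game score, so it assumes
    enough games that score +/- z*se stays strictly between 0 and 1.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Variance of a single game result, where a game scores 1, 0.5 or 0.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return (elo_from_score(score),
            elo_from_score(score - z * se),
            elo_from_score(score + z * se))


# Hypothetical 1000-game match: 340 wins, 380 draws, 280 losses (53% score).
est, low, high = elo_interval(340, 380, 280)
print(f"estimate {est:+.1f} Elo, approx. 95% interval {low:+.1f} to {high:+.1f}")
```

With these hypothetical numbers the point estimate is about +21 Elo, but the interval runs from roughly +4 to roughly +38, so two such measurements can easily combine into something quite different from their nominal sum.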