bob wrote: My comment is simply that such a thing is a result of poor testing, because one should _never_ test multiple changes at the same time.

Mangar wrote: Hi,
this is not always true. There might be single changes that don't improve anything on their own, but a combination of them that does.
Recently I had the situation that 4 changes in reduction/extension, tested separately, didn't bring anything, but all together gained a good amount of Elo. I think that testing only single changes gives you a high chance of getting stuck in a local maximum.
Greetings Volker

Don wrote: I suppose it's in the interpretation. I would view this as a single change or, if you prefer, a single "compound change", because it's all part of the same thing.
You and Bob bring up an interesting issue: can changes be tested in combination? H.G. Muller suggested something called orthogonal multi-testing many months (perhaps years) ago.
It may be that you CAN combine changes if you set up your testing accordingly, but you are still testing them individually, as it is required that you separate them. You would not test 2 separate things combined into a single change unless you were convinced that it makes sense and you were looking specifically for interactions, but you could even do that with multi-testing. You could look for pair-wise interactions of everything you test, for that matter.

The idea is sound. The implementation details are difficult. You have to be certain that there is no correlation between the two things you are testing, so that you can get away with the reduced number of games. If you are making a search change and an eval change, you might well get away with this. But I do not do that kind of test very often. I'm either working on search changes or testing eval changes, but I don't do both at the same time. Makes life far simpler.
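The pair-wise interaction idea from this exchange can be sketched as a 2x2 factorial comparison: measure the baseline, each change alone, and both together, then separate the main effects from the interaction term. This is only an illustration, not H.G. Muller's actual scheme; the function name and the Elo numbers are made up for the example.

```python
def factorial_2x2(base, a_only, b_only, both):
    """Return (main effect of A, main effect of B, interaction) in Elo.

    Each argument is the measured Elo of one configuration, all taken
    against the same reference. Hypothetical helper, not from the thread.
    """
    # Main effect: average gain from switching the change on,
    # once with the other change off and once with it on.
    effect_a = ((a_only - base) + (both - b_only)) / 2
    effect_b = ((b_only - base) + (both - a_only)) / 2
    # Interaction: how much the gain from B differs with A present.
    interaction = (both - a_only) - (b_only - base)
    return effect_a, effect_b, interaction


# Made-up numbers: each change looks bad alone, the pair looks good.
print(factorial_2x2(base=0.0, a_only=-15.0, b_only=-10.0, both=30.0))
```

If the interaction term stays within the error margin of the measurements, the two changes can be treated as independent; a large term is the case Volker describes.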
Singular Extensions
Moderator: Ras
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Singular Extensions - long games
-
- Posts: 65
- Joined: Thu Jul 08, 2010 9:16 am
Re: Singular Extensions - long games
Hi Bob,
IMHO most things in search/eval are related, and it's hard to see how strongly they are related. There is a high risk of misjudging this relation.
I expect that one of the main reasons why newly written chess engines are able to get this strong is that there is a strong relation between different search terms and eval, coming from pruning techniques like LMR.
All engines that had been heavily optimized with a search and eval without LMR have a huge drawback, as this optimization is counterproductive. I think that there is no way to optimize toward an LMR search only by proving one change after the other.
Greetings Volker
Mangar Spike Chess
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Singular Extensions - long games
Mangar wrote: Hi Bob,
IMHO most things in search/eval are related, and it's hard to see how strongly they are related. There is a high risk of misjudging this relation.
I expect that one of the main reasons why newly written chess engines are able to get this strong is that there is a strong relation between different search terms and eval, coming from pruning techniques like LMR.
All engines that had been heavily optimized with a search and eval without LMR have a huge drawback, as this optimization is counterproductive. I think that there is no way to optimize toward an LMR search only by proving one change after the other.
Greetings Volker

There is certainly some correlation. But once an engine is running, I can't imagine adding feature A and getting nothing, then adding B and getting nothing, and then adding A+B and getting something significant. Of course, if the two changes are directly related, this would happen: in one place you evaluate pawn structure, then you use this information in another place. But those are obviously part of one big change.
But I can't visualize how +A is no better, +B is no better, but +AB is significantly better.
-
- Posts: 65
- Joined: Thu Jul 08, 2010 9:16 am
Re: Singular Extensions - long games
Hi,
in "my" case the four changes were all changes to move reduction. The underlying question was why Stockfish searches so much deeper than Spike. I tested "static nullmove pruning", "value based pruning", a more aggressive late move reduction (even at the root), and far fewer extensions. Every single change gave a drop of 10-20 Elo in Spike. All together they gave me a gain of about 60 Elo.
My tests are far from perfect. A version is usually tested with 1200 games (50 different positions, 12 opponent engines, 60s + 1s time control). The +60 Elo is (for me) proven, as I have already tested about 20 single changes on top of the mentioned 4 changes (all of them with 1200 games) that landed in about the same Elo range.
Greetings Volker
Mangar Spike Chess
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Singular Extensions - long games
This pretty much sounds like a case of not enough games. All of your changes look positively correlated. You need to have something like an extension and a reduction together (i.e. negatively correlated), one increasing depth and the other reducing it, so the optimum could be anywhere. If they all go the same way, you could have tuned each and every change with the correct parameters to get a benefit out of it. Also, 1200 games is far too few.
Note that the factor you use in the combined test and in the individual tests is different. For example, if you tested with a factor of 1 for each of the 4 changes individually, then when you use them together the effect will be roughly 4x. Assuming the combination is a success, using a factor of 4x for each individual test could give you a boost. Hope I am clear enough.
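To put the "not enough games" remark in numbers, here is a rough sketch of the 95% error margin of a match result as a function of the number of games. It assumes independent games, the usual logistic Elo formula, and a normal approximation; the function name and the draw-ratio parameter are illustrative and are not taken from any tester mentioned in the thread.

```python
import math


def elo_margin_95(games, draw_ratio=0.0, score=0.5):
    """Approximate 95% error margin (in Elo) of a match score.

    Assumes independent games scored 1 / 0.5 / 0. Illustrative only.
    """
    win = score - draw_ratio / 2
    loss = 1.0 - win - draw_ratio
    # Per-game variance of the score around its mean.
    variance = (win * (1.0 - score) ** 2
                + draw_ratio * (0.5 - score) ** 2
                + loss * (0.0 - score) ** 2)
    std_err = math.sqrt(variance / games)
    # Slope of Elo = 400 * log10(s / (1 - s)) with respect to the score s.
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return 1.96 * std_err * slope


for n in (1200, 4800, 30000):
    print(n, round(elo_margin_95(n), 1), round(elo_margin_95(n, draw_ratio=0.3), 1))
```

Under these assumptions the margin for 1200 games is between roughly 16 and 20 Elo depending on the draw ratio, which is the same size as the 10-20 Elo differences being discussed. Quadrupling the number of games halves the margin.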
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: Singular Extensions - long games
Don wrote: What I'm saying is that all you do is talk. That makes me believe it won't matter how we do our testing, you would find it flawed and always be able to produce some reason why it's not the way you think it should be.
You claim that Bob mysteriously stopped his test, etc. Of course it was suspicious since it did not match what you expected.

Daniel Shawul wrote: What a stupid thing to say! If you can't read properly what people posted, just stop posting. I never ever said anything like that.
Recheck the threads and maybe you will find something like that in Ralph's post.

My mistake, yes it was something Ralph said. I take back all those bad things I said about you.
I have been running the singular test on Komodo and the results are not so hot after all. It shows as a small net improvement only.
Here is what I get:
Code:
Total games played: 4324
Total this run: 4324
Matches in progress: 4
PGN file: sing.pgn
Total run time: 75:38:48
Games per minute: 0.95

   RANK      ELO    +/-     Tme/Gme  Tot Gms  PLAYER
-------  -------  -----  ----------  -------  ----------------
      1   3000.0   10.6     109.835     4324  komodo 1.2
      2   2994.4   10.6     108.736     4324  komodo 1.2-noSing

           1      2
       -----  -----
   1.     --   50.8   50.8 percent of 4324 games
   2.   49.2     --   49.2 percent of 4324 games
If this doesn't scale, then it may be a problem. This was run at roughly 60 seconds per game on a core 2 duo laptop.
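For readers who want to check how the 50.8 percent score relates to the rating gap in the table, the standard logistic Elo formula can be applied directly. This is a minimal sketch; the function name is made up here, and the tester used in the post may compute its ratings differently.

```python
import math


def elo_diff(score):
    """Elo difference implied by a score fraction (0 < score < 1)."""
    return 400.0 * math.log10(score / (1.0 - score))


# 50.8 percent over 4324 games, as reported in the table.
print(round(elo_diff(0.508), 1))
```

The result is about 5.6 Elo, which agrees with the gap between 3000.0 and 2994.4 and is well inside the reported +/- 10.6.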
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Singular Extensions - long games
Don wrote: My mistake, yes it was something Ralph said. I take back all those bad things I said about you.

Thanks, this is the nicest thing I have heard in a while. I was contemplating taking a couple of months off CC due to the number of "flames" I seem to get myself into. I will take the break anyway due to other obligations.
I admit that I also misunderstood you. When you said "government cover up", I took it literally.
I sometimes miss sarcasm. English is my second language.

Don wrote: I have been running the singular test on Komodo and the results are not so hot after all. It shows as a small net improvement only.
Here is what I get:
Code:
Total games played: 4324
Total this run: 4324
Matches in progress: 4
PGN file: sing.pgn
Total run time: 75:38:48
Games per minute: 0.95

   RANK      ELO    +/-     Tme/Gme  Tot Gms  PLAYER
-------  -------  -----  ----------  -------  ----------------
      1   3000.0   10.6     109.835     4324  komodo 1.2
      2   2994.4   10.6     108.736     4324  komodo 1.2-noSing

           1      2
       -----  -----
   1.     --   50.8   50.8 percent of 4324 games
   2.   49.2     --   49.2 percent of 4324 games

And since the error margins are relatively high I cannot even say with any confidence that this helps.
If this doesn't scale, then it may be a problem. This was run at roughly 60 seconds per game on a core 2 duo laptop.

Thanks for the update. I tested some variants of SE, and none of them seem to help (all tested at a hyper-blitz setting, which could be the problem). I hope one of the variants being tested by Bob gives something significant.
regards,
Daniel
-
- Posts: 408
- Joined: Sat Mar 06, 2010 9:28 am
Re: Singular Extensions - long games
Don wrote: What I'm saying is that all you do is talk. That makes me believe it won't matter how we do our testing, you would find it flawed and always be able to produce some reason why it's not the way you think it should be.
You claim that Bob mysteriously stopped his test, etc. Of course it was suspicious since it did not match what you expected.

Daniel Shawul wrote: What a stupid thing to say! If you can't read properly what people posted, just stop posting. I never ever said anything like that.
Recheck the threads and maybe you will find something like that in Ralph's post.

Don wrote: My mistake, yes it was something Ralph said. I take back all those bad things I said about you.
I have been running the singular test on Komodo and the results are not so hot after all. It shows as a small net improvement only.

I said nothing about a mystery, nor a government cover up. That originated from your humor. I said (not literally) that it was no surprise to me that he does not want to look deeper into the scaling issue. Or should I say non-issue?
He stated in this thread that he's mainly interested in myth debunking, so it seemed natural that he would not look deeper into the question "does ttSE scale with longer TC and, if yes, how much", after the Elo gain had quadrupled compared to the 5+5 results.
Nothing to get angry about. Finally he can do what he wants with his cluster (as long as the government agents do not abort his tests to free up cpu time for their ww4 simulation).

-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Singular Extensions - long games
Daniel Shawul wrote: 5 + 5 gives enough depth so why ask for more ??

Ralph Stoesser wrote: Because 10+10 was measured (comparatively much) stronger, with an increasing tendency? Don't you believe in holy cluster test results??
But suddenly the test was stopped ... surprise, surprise.

bob wrote: where is this "much" stronger coming from? I got roughly +5 at one time control, +17 at another. That is not "much stronger".

Ralph Stoesser wrote: Why tell roughly about exact measurements?
TC 5+5: +4 Elo
bob wrote: I've aborted this test. the ttSE version is +4 Elo stronger, maybe. I have started a 10min+10s match although with a lot fewer games. Report tomorrow although I am not sure how long it will run.
Code:
1 Stockfish 1.8 64bit 2850 4 4 30193 82% 2550 20%
2 Stockfish 1.8noSE 64bit 2846 4 4 30246 82% 2551 21%
TC 10+10: +18 Elo
bob wrote: I finally stopped the test last night, error bar was down to +/- 8, difference was +18 Elo. Not insignificant, but also not in line with what claims I had seen on freechess. One person there claimed +100 or so which would be remarkable for any change.
Each time the latest results reported by yourself.
In absolute terms +18 Elo difference may look tiny, but in relative terms it's much more compared to the 5+5 results. Roughly TC doubled, Elo gain quadrupled.
Isn't that something worth looking deeper into?

Not particularly. I ran almost 1,000 games at 60+60 (on hardware about 2x faster to boot) before aborting, when asked to do an A/C stress test again... Elo was +16, but with a bigger error bar. At 540 games at a time, 3-4 hours per game per cpu, 1000 took something around 8 hours. 30,000 was not worth it... That +4 could be anywhere between 0 and +8, that +18 could have been anywhere between +10 and +26. Hard to get too excited when logic says that trend is impossible (2x = 4x, 4x = 16x, 16x = 256x). Pretty soon you are talking serious Elo.
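The "TC doubled, Elo gain quadrupled" reading can be extrapolated to see why it is hard to take as a lasting trend. The sketch below simply assumes the gain grows with the square of the time-control multiplier, starting from the +4 Elo measured at 5+5; that quadratic rule and the function name are assumptions made for illustration, not measurements.

```python
def extrapolated_gain(base_gain, tc_multiplier):
    """Elo gain if 'double the TC, quadruple the gain' held indefinitely."""
    return base_gain * tc_multiplier ** 2


# Starting from +4 Elo at 5+5 and scaling the time control up.
for mult in (1, 2, 4, 16):
    print(f"{mult:>2}x time control: +{extrapolated_gain(4, mult)} Elo")
```

At 16 times the original time control this rule would already predict a gain of over a thousand Elo from a single extension, which is why two data points with overlapping error bars are weak evidence for such a trend.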

-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Singular Extensions - long games
Daniel Shawul wrote:
Don wrote: My mistake, yes it was something Ralph said. I take back all those bad things I said about you.
Thanks, this is the nicest thing I have heard in a while. I was contemplating taking a couple of months off CC due to the number of "flames" I seem to get myself into. I will take the break anyway due to other obligations.
I admit that I also misunderstood you. When you said "government cover up", I took it literally.
I sometimes miss sarcasm. English is my second language.
Don wrote: I have been running the singular test on Komodo and the results are not so hot after all. It shows as a small net improvement only.
Here is what I get:
Code:
Total games played: 4324
Total this run: 4324
Matches in progress: 4
PGN file: sing.pgn
Total run time: 75:38:48
Games per minute: 0.95

   RANK      ELO    +/-     Tme/Gme  Tot Gms  PLAYER
-------  -------  -----  ----------  -------  ----------------
      1   3000.0   10.6     109.835     4324  komodo 1.2
      2   2994.4   10.6     108.736     4324  komodo 1.2-noSing

           1      2
       -----  -----
   1.     --   50.8   50.8 percent of 4324 games
   2.   49.2     --   49.2 percent of 4324 games

And since the error margins are relatively high I cannot even say with any confidence that this helps.
If this doesn't scale, then it may be a problem. This was run at roughly 60 seconds per game on a core 2 duo laptop.
Thanks for the update. I tested some variants of SE, and none of them seem to help (all tested at a hyper-blitz setting, which could be the problem). I hope one of the variants being tested by Bob gives something significant.
regards,
Daniel

So far, you can wish in one hand, crap in the other, and see which one fills up first.

Still testing and tweaking on the last idea I explained. but so far, no glory...
I'm getting more pessimistic and am beginning to think that the extra overhead is simply not worth it.