Singular Extensions

bob · Post by **bob** » Sun Aug 01, 2010 4:00 am

Daniel Shawul wrote: Don't forget to bring down the error bar to 1 elo until Marco gets convinced

I am now trying to use my old guns [single reply and recapture extensions and mate threat]
with full depth increment at upper parts of the tree. I excluded those from my SE test so
maybe that was where the benefit comes from. Seems interesting so far..

We used to use those and at one point placed restrictions on them, probably something similar to what you suggested. But once we had the ability to really do accurate testing, it became quite obvious (quite quickly too) that except for giving check, the rest hurt. Not a lot. And the give check is not a big win itself. They seemed to be much more important 10-15 years ago...

Still got lots of testing to do, but I have just finished a search rewrite to get rid of SearchRoot() to make future SE tests easier. Doing the same for QuiesceChecks() right now as well... then back to testing. The new search is done, and I think the new quiesce just passed validation tests as well...

bob · Post by **bob** » Sun Aug 01, 2010 4:21 am

Name                      Elo    +    - games score oppo. draws
Stockfish 1.8 64bit      2849    5    5 20337   82%  2551   20%
Stockfish 1.8noSE 64bit  2846    4    4 20397   82%  2551   21%

This is pretty much a settled issue at 5+5. I will start a 10min+10s match in a bit, but will not be able to use both clusters so it will take longer.

bob · Post by **bob** » Sun Aug 01, 2010 6:35 am

I've aborted this test. The ttSE version is +4 Elo stronger, maybe. I have started a 10min+10s match, although with a lot fewer games. I will report tomorrow, although I am not sure how long it will run.

Rank Name                      Elo    +    - games score oppo. draws
   1 Stockfish 1.8 64bit      2850    4    4 30193   82%  2550   20%
   2 Stockfish 1.8noSE 64bit  2846    4    4 30246   82%  2551   21%

bob · Post by **bob** » Sun Aug 01, 2010 8:17 am

This is much slower going, but here is the 10 min + 10 s increment run. I will update tomorrow after the error bar gets down to something reasonable... Right now it is +12, but the error bars are huge. Same gauntlet run as before, except it will stop much sooner, as 30K games at this speed using half of one cluster would take forever.

Name                      Elo    +    - games score oppo. draws
Stockfish 1.8 64bit      2776   46   46   135   73%  2602   38% 
Stockfish 1.8noSE 64bit  2764   44   44   141   71%  2606   45%
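
To see why the error bars are so wide after about 140 games, here is a minimal Python sketch that turns a row of the table (games, score, draw rate) into an Elo difference against the opponent average plus an approximate 95% margin. It assumes the plain logistic Elo formula and a normal approximation over win/draw/loss results; the rating tool that produced the tables uses its own model, so the numbers will only agree roughly. The function names are hypothetical.

```python
import math


def elo_diff(score):
    """Elo difference implied by a score fraction in (0, 1), logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))


def elo_with_margin(games, score, draws, z=1.96):
    """Return (diff, minus, plus) in Elo relative to the opponent average.

    games : number of games played
    score : fraction of points scored (wins + draws / 2)
    draws : fraction of games drawn
    z     : normal quantile; 1.96 for roughly 95% is an assumption
    """
    wins = score - draws / 2.0
    losses = 1.0 - score - draws / 2.0
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2)
    half = z * math.sqrt(var / games)
    mid = elo_diff(score)
    return mid, mid - elo_diff(score - half), elo_diff(score + half) - mid


if __name__ == "__main__":
    # Rows copied from the 10 min + 10 s table above.
    rows = [("Stockfish 1.8", 135, 0.73, 0.38),
            ("Stockfish 1.8noSE", 141, 0.71, 0.45)]
    for name, n, s, d in rows:
        diff, minus, plus = elo_with_margin(n, s, d)
        print(f"{name}: {diff:+.0f} Elo vs opponents, -{minus:.0f} / +{plus:.0f}")
```

With margins of this size on each version, a 12 Elo gap between the two is well inside the noise, which is the point made in the post.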

Daniel Shawul · Post by **Daniel Shawul** » Sun Aug 01, 2010 11:23 am

This is going to be a waste of time, definitely not a +70 elo idea. The point is not to make something work at some time control because we would like it to.. I bet no one tested it at 10+10 because it just takes too much valuable time. 5 + 5 is already long enough IMO. The maximum TC I ever used is 2+2.

It is in fact a good thing that there is a plus elo to the idea. (5) without tuning lost a couple of elos in my tests ..

bob · Post by **bob** » Sun Aug 01, 2010 7:50 pm

I'm only running the test to try to eliminate "urban legend" again. This is a common argument, that the thing needs to search deeper to reap the full effect. As in the LMR discussion. And this one. I've explained my testing approach several times, and when I say that 99% of the changes are just as effective at short time controls as at long time controls, it isn't a guess. It is an experimental observation. I have seen significantly more examples of where something works better at _shorter_ time controls than at longer ones. Personally, I'm happy to see this effect since it makes it possible to play these 30K game matches in an hour and get rapid feedback that is reliable. To date, I do not believe I have found more than 2-3 examples of where a long T/C is better. Unless you factor in changes to your time allocation, which needs to be tested at several time limits for obvious reasons.

I've found it perfectly workable to test at short time controls, and then occasionally run a longer one as a sanity check / verification. So far none of those that I have done changed anything.

Current results:

Name                      Elo    +    - games score oppo. draws
Stockfish 1.8 64bit      2796   16   16  1255   75%  2600   38% 
Stockfish 1.8noSE 64bit  2781   15   15  1325   72%  2602   41%

Goes slow with only 50 nodes being used. 100 games in parallel, but each game lasts something on the order of 30 minutes or more, so maybe 100 games an hour at best... Will let it run thru today as I have tossed the ttSE idea for the time being and am moving on to the next SE approach (still saving Hsu's ideas until the end since they are a pain.)

Did get search and q-search completely rewritten so that I now have just Search() and Quiesce() rather than having significant duplicated code. (Do still have a SearchParallel as merging that would make the code look messier, so that's a project for another day maybe). At least adding code is now much more straightforward as I make whatever SE changes seem interesting.

On OpenChess one of the many IP* cloners claimed +100 Elo for the ttSE stuff. Something told me that was nonsense. Seems like it is in the +10 to +20 range at best, but I also suspect it is something that is used to offset a weakness elsewhere. For example, I don't particularly think highly of the idea of having extensions offsetting poor reductions. Seems more reasonable to work on the poor reductions and reduce those rather than depending on a kludge to counteract them. But I have not looked at the ip* code enough to see what it does with respect to reductions.
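
Going back to the throughput estimate of roughly 100 games an hour: a rough Python sketch of how long the 10 min + 10 s run would need to reach a tighter error bar. It assumes the error bar shrinks in proportion to 1/sqrt(games) with score and draw rate staying about the same, which is only an approximation, and the function names are hypothetical. The starting point (about +/-16 after 1255 games) and the games-per-hour figure are taken from this post.

```python
import math


def games_for_margin(known_games, known_margin, target_margin):
    """Games needed for target_margin, assuming margin scales as 1/sqrt(games)."""
    return math.ceil(known_games * (known_margin / target_margin) ** 2)


def hours_needed(games, games_per_hour):
    """Wall-clock hours at a steady rate of finished games per hour."""
    return games / games_per_hour


if __name__ == "__main__":
    for target in (8, 4):
        n = games_for_margin(1255, 16, target)
        print(f"+/-{target} Elo: about {n} games, "
              f"about {hours_needed(n, 100):.0f} hours at 100 games/hour")
```

Halving the error bar costs four times the games, which is why the long time control is only practical as an occasional sanity check.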

Don · Post by **Don** » Sun Aug 01, 2010 11:47 pm

This is pretty bizarre. Singular extensions based on the hash table move only in PV nodes proved to be a pretty huge benefit to Komodo at time controls way faster than this. I think it just shows that every program is different and what works in one program may not work in another. It could very well be that the same basic ideas can be covered in more than one way.

bob · Post by **bob** » Mon Aug 02, 2010 12:16 am

I have one more test to run, but once this one finishes I can get a sneak preview. That is sf vs sf-noSE head-to-head, which I believe is a flawed way of testing. It may well look much better there, not sure. Once this test finishes, I can extract just the sf vs sf-noSE games and see how they match up, though I will have a total of only 3K games (I believe)...

I _really_ don't see why this would be a big gainer. Random extensions here and there seem flawed from the outset, as opposed to Hsu's SE approach, which is applied everywhere equally. I managed to eventually get 3-4 Elo with Crafty, and about +10 or so with Stockfish 1.8 in my testing. 10+10 is about as slow as I am willing to test here, as the games take forever no matter how many are played in parallel.

There is always the possibility that I have tuned my LMR stuff so carefully it doesn't rely on the crutch of occasionally extending a hash move. In this case, I care much less about the "why" as it could take months to figure out why it works in one but not in another. Someone on OpenChess claimed +100. I'd certainly suspect bugs in testing to get that kind of change. Regular SE will likely not get even a part of that increase, since the check extension is worth less than 20 Elo itself... When my numbers were nowhere near even the +40 I had seen in Stockfish claims, I thought it better to test that program with and without before assuming that mine had some sort of unknown bug or efficiency issue...

Mangar · Post by **Mangar** » Mon Aug 02, 2010 9:25 am

Hi,

For the Stockfish implementation of singular extensions, I would have expected a better result at faster time controls. Stockfish extends at a remaining depth of 8 for non-PV nodes and 6 for PV nodes. At fast time controls I would expect most of the relevant hash entries not to be overwritten. At long time controls, with the same hash size, it should get more and more random.

Greetings Volker
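
For readers unfamiliar with the hash-move variant discussed here, the following is a minimal Python sketch of such a check. Only the two depth thresholds (8 for non-PV, 6 for PV) come from the post above. The margin, the hash-depth slack, the half-depth exclusion search and all names are assumptions made for illustration, not Stockfish's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Remaining-depth thresholds quoted in the post (in plies).
SE_DEPTH_PV = 6
SE_DEPTH_NON_PV = 8

# Hypothetical values, chosen only to make the sketch concrete.
SE_MARGIN = 32       # how far alternatives must fall below the hash score
TT_DEPTH_SLACK = 3   # how much shallower than `depth` the hash entry may be


@dataclass
class TTEntry:
    move: str             # best move stored in the hash table
    value: int            # stored score
    depth: int            # depth the entry was searched to
    is_lower_bound: bool  # True if the stored score is a lower bound


def should_extend(entry: Optional[TTEntry], move: str, depth: int, is_pv: bool,
                  search_excluding: Callable[[str, int, int], int]) -> bool:
    """Decide whether the hash move gets a singular extension.

    search_excluding(excluded_move, beta, depth) is a caller-supplied
    zero-window search that skips excluded_move and returns a score.
    """
    min_depth = SE_DEPTH_PV if is_pv else SE_DEPTH_NON_PV
    if entry is None or depth < min_depth:
        return False
    if move != entry.move or not entry.is_lower_bound:
        return False
    # A missing, overwritten or too-shallow entry says little about this node.
    if entry.depth < depth - TT_DEPTH_SLACK:
        return False
    beta = entry.value - SE_MARGIN
    # If every alternative fails low against the lowered bound,
    # the hash move is treated as singular.
    return search_excluding(move, beta, depth // 2) < beta


if __name__ == "__main__":
    entry = TTEntry(move="e2e4", value=100, depth=8, is_lower_bound=True)
    all_others_bad = lambda excluded, beta, d: beta - 50
    print(should_extend(entry, "e2e4", 8, False, all_others_bad))
```

In a sketch like this the extension depends entirely on a usable hash entry being present, which is the dependency on hash contents raised in the post.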

bob · Post by **bob** » Mon Aug 02, 2010 4:22 pm

I finally stopped the test last night. The error bar was down to +/- 8 and the difference was +18 Elo. Not insignificant, but also not in line with the claims I had seen on freechess. One person there claimed +100 or so, which would be remarkable for any change.
