Stockfish 4

Don · Post by **Don** » Fri Aug 23, 2013 4:38 pm

IWB wrote:
Don wrote:
It wouldn't be fair to run a 32 bit Stockfish and nobody would be happy with this solution and I would be on Stockfish's side if you had done this.

On the other hand it also doesn't seem fair to have a list of compiles that you choose from where you pick the best one for your hardware. Why would that not be fair? Because I am not allowed to do that. I tried to do that and was rebuked for it. Our plan was to provide an official version in the distribution for each processor family, including multiple ones for older Intel and newer Intel if we could make it work - and YOU told me that only 1 official version could be used on your test.

So I'm more than a little annoyed that so much effort is spent on finding the one perfect compile of Stockfish that you can use - that the rules changed for Stockfish and that it's even a question which one you should use. Even if there is a link to multiple compiles on their website that seems like a workaround - for example if someone produces a special Haswell compile that only worked well on Haswell then are you saying that all they have to do is add that to their growing list of binaries on that site and that it's all ok?

To be completely clear, I think there SHOULD be multiple compiles for multiple platforms and that this SHOULD be allowed - so I'm not saying Stockfish users should not enjoy this advantage and I advocate this kind of free choice - but what I am saying is that if it's ok for them it should be ok for any of us and not JUST them.

As I said before this is an issue that is not going to go away because I expect future processors to have more and more capabilities that might be utilized in chess programs just as popcount has been. So I strongly suggest that all the major testers agree upon some rules here and this not become an ad-hoc per-program decision.

I propose a liberal policy that lets a program author include in his release as many binaries as the author wishes to produce - but that for official tests this is confined to what is available at release time. For casual users there is no problem with allowing compiles to be added to a growing list if that can benefit the end-users. But for testing it is equivalent to having a new release every time a new compile is produced and I don't think anyone wants that.

What is NOT acceptable is for a program author to work with each tester to make a specialized compile just for the test - which is not what I'm saying is happening here but I could see this happening in the future if there are no rules.
I vaguely remember that we discussed that problem with your compiles, and I don't recall it completely anymore (I would have to check my emails), but I think the reason can only have been
a) to avoid confusion on the customer side and
b) that one compile suitable for 95% of all users would be best.

I have to apologize because you are annoyed by my decision, but what else could I have done? "Officially" I could only have used the 32 bit version, and you yourself wrote that this would be wrong and that you would support the Stockfish team in that. So I had to make a reasonable decision, and that was to use a compile from a site that even the Stockfish team was referring to (and taking their compiles from). Right now I cannot get closer to "official" than this. So, sorry again, but what else to do?

You misunderstand. You did the right thing and I am not blaming you or your decision in this particular case, but what really didn't seem right and what annoyed me was the discussion on what was best for you to use for AMD. For example would you have used an AMD optimized version had one been available?

Neither am I against the Stockfish team for providing such choices - I think they SHOULD provide choices - that is never bad. However, I remember that in more than one case I was not allowed to do so (not just by you) and that really annoyed me.

Besides that, I don't think making special compiles for individual CPUs is a good solution. I think a "general" compile, maybe detecting some hardware specialties for 64 and 32 bit, is enough even if it might lose 2, 3, 4% of performance. If you start with individual compiles you have to do a lot of work with every new generation AND you will get requests for this special Samsung CPU, Android devices, Intel and AMD low power versions ... it will never end. Do you really want to chase every fraction of a percent to please every possible CPU owner?

Sorry for having upset you, but I don't think it is something special to Stockfish; it is more about finding a practical solution - which basically all rating lists have chosen, as far as I have heard ...

I was not upset by you or any particular individual, it was the implicit assumption that you should use the best compile for your platform - an assumption that I happen to agree with on general principle but only if the rules are the same for everyone. Am I wrong to expect that?

It is easy for a distribution to provide a number of binaries and have a front-end script select that based on the hardware. The front end script could even be an executable. Basically the script makes the determination and it runs the appropriate binary.

I think this is a lot more problematic on Windows, which treats scripts and native executables differently - but Unix tends to have much better abstractions - a script is no different than an executable. I tried running a chess program from a batch file years ago in Windows and Fritz couldn't load the script as if it were the program, but in Linux there is no distinction that would defeat any GUI. But if there is a way to work around this in Windows, a simple solution is that the program author provides a single entry point (such as a Windows batch file) called "stockfish-4.bat" that figures out which DISTRIBUTION PROVIDED binary to run and just runs it transparently.
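A minimal sketch of such a front end, written in Python for a Linux system. Everything named here is an assumption made for illustration: the binary file names are hypothetical, and the detection only looks at the word size and at the `popcnt` flag in `/proc/cpuinfo`. It is not how any actual distribution does it.

```python
import os
import platform
from pathlib import Path

# Hypothetical file names; a real distribution would ship its own.
BIN_X64_POPCNT = "stockfish-4-x64-popcnt"
BIN_X64 = "stockfish-4-x64"
BIN_X32 = "stockfish-4-x32"


def read_cpu_flags(cpuinfo_path="/proc/cpuinfo"):
    """Return the CPU feature flags listed by Linux, or an empty set."""
    try:
        with open(cpuinfo_path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass  # not Linux, or the file is unreadable: assume no extras
    return set()


def pick_binary(is_64bit, flags):
    """Choose one of the distribution-provided binaries."""
    if not is_64bit:
        return BIN_X32
    if "popcnt" in flags:
        return BIN_X64_POPCNT
    return BIN_X64


def launch(args):
    """Replace this process with the chosen engine binary."""
    is_64bit = platform.machine() in ("x86_64", "AMD64")
    name = pick_binary(is_64bit, read_cpu_flags())
    path = Path(__file__).resolve().parent / name
    # execv keeps stdin/stdout attached, so a GUI talks to the engine directly.
    os.execv(str(path), [str(path)] + list(args))
```

A real entry point would end by calling `launch(sys.argv[1:])`. Because the launcher replaces itself with the engine process, the GUI only ever sees a single program.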

Anyway, that is just one possibility, but what is important to me is that there are consistent rules that testers must follow - or at least that each testing group publishes a clear policy on this matter.

Bye
Ingo

Don · Post by **Don** » Fri Aug 23, 2013 4:45 pm

P.S. to Ingo:

I apologize to you if I made it seem that you were being unfair - that was not my intention - my source of "annoyance" was that the entire discussion seemed to be about picking the ideal binary for your platform - something I have assumed was not permitted by any of the testing sites.

IWB · Post by **IWB** » Fri Aug 23, 2013 5:08 pm

Hmm, here is a problem of principle. I don't like the idea of mixing different hardware for a test (one reason why I started the official IPON).
Using a specialized compile for a specific hardware would make the whole concept of "my" proper testing obsolete, because in the end I would make a list which is basically correct only for my setup. So maybe the idea of making a list with different hardware has its advantages too - but then someone has to take care that the percentage of games is distributed equally over the available hardware - which is done nowhere today. (One engine might have 90% i7, the next just old Athlons, and then with adapted timings based on one single old bench made with an obsolete compiler ...)
I like to have a compile which runs on the vast majority of hardware.
If I have a compile which runs everywhere and a compile for my AMDs which only differs 1, 2, 3% from the generic one, I might have used the special version. If I have a compile which is 25% faster than the generic one, I would have declined it. If someone produces an "autodetection" compile which runs everywhere and would run 25% faster on AMD, I would not see that (most likely I would have thought that someone removed the usual compiler penalty for AMD).

It is some kind of catch-22. Whatever you do, it might be wrong ...

I have to 'ponder' about that situation.

Bye
Ingo

Don · Post by **Don** » Fri Aug 23, 2013 5:51 pm

You will never solve the platform issue because it's a fact of life that each chess program AND compile too will run differently relative to other programs and compiles. We know for sure that SSE4.2 helps some programs very little and others a lot.

It only makes sense that someone releasing software, such as the Stockfish team, wants to see it run as well as possible on his customers' machines, whether his customers are commercial or open source downloaders.

So personally I see no problem with TESTERS using the most appropriate binary for their platforms too. The only caveat here is that these binaries should come as part of the official distribution and not specially prepared beyond the release date to perform well. This is in the same spirit as how the options are set - you probably don't allow the authors of programs you test to have special settings just for your particular test either, right?

SSE is not an issue for us because we auto-detect that at run time. But it COULD have been an issue. Which binary would you run if we had two?

For Komodo and some programs SSE is a very big deal and for other programs it isn't, but the point is that it does matter and only because of auto-detection do we avoid getting penalized. If it were not for that I would have refused to provide a special non-SSE binary.

But I think you have to accept and admit that it's not reasonable to restrict us to a single binary - otherwise you would have to make a special exception for 32 bit binaries and for testing we would all have to NOT provide a 32 bit binary for fear the testers would use it - but they all recognize why they shouldn't do that. So we are ALREADY making some concessions to one size fits all.

A simple rule is that you don't accept releases more frequently than every 3 months (or whatever your cutoff is) and that a release can come with multiple binaries optimized for various platforms but that for testing purposes you will not use ones that are added AFTER the release date because that is the same as having frequent releases. You probably would not re-test every time someone tells you they produced a faster binary would you?
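The cutoff part of that rule can be sketched as a simple filter. This is only an illustration: the binary names and dates below are made up, and `eligible_for_testing` is a hypothetical helper, not part of any tester's tooling.

```python
from datetime import date


def eligible_for_testing(binaries, release_date):
    """Keep only binaries published on or before the release date.

    `binaries` is a list of (name, published_on) pairs.
    """
    return [name for name, published_on in binaries
            if published_on <= release_date]


# Hypothetical example data: two binaries shipped with the release,
# one added to the download page a month later.
release = date(2013, 8, 1)
binaries = [
    ("engine-x64-popcnt", date(2013, 8, 1)),
    ("engine-x32", date(2013, 8, 1)),
    ("engine-x64-haswell", date(2013, 9, 1)),
]
print(eligible_for_testing(binaries, release))
```

Anything added later is still available to casual users; it just does not enter the test until the next release.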

Of course whatever you decide we will honor, it's your test run your way and I am just making a suggestion here and also I wonder what others think about this.

zamar · Post by **zamar** » Fri Aug 23, 2013 6:17 pm

I think that the intention of the Stockfish team, and what makes the most sense (take a look at the Makefile!), is to provide three binaries:

x64-with-popcnt
x64-without-popcnt
x32
(+ x32-legacy, but this is almost irrelevant as this is only for VERY old hardware)

I think this is the most reasonable compromise between simplicity and optimization. I dislike the idea of specialized versions for different hardware, but because popcnt is so important in computer chess, it makes sense to make an exception in this area...

Graham Banks · Post by **Graham Banks** » Fri Aug 23, 2013 6:58 pm

Don wrote:......We know for sure that SSE4.2 helps some programs very little and others a lot......

Do we?
I think that this is a fallacy. It makes minimal difference from what I've seen.

Reminds me of the age-old argument over the value of tablebases in adding ELO to ratings.

Don · Post by **Don** » Fri Aug 23, 2013 7:25 pm

Graham Banks wrote:
Don wrote:......We know for sure that SSE4.2 helps some programs very little and others a lot......
Do we?
I think that this is a fallacy. It makes minimal difference from what I've seen.

It is not a fallacy, believe me as developers we are 100% aware of the issues and it's not a couple of ELO but something like 7 or 8 ELO for us and 2 or 3 for Stockfish I think.

For a test like the one you run there is not enough resolution to make it obvious. On the CCRL 40/40 list I show Komodo in clear second place with an ELO of 3207 and an error margin of +/- 25 ELO. It's far enough ahead not to make any difference in this particular test as far as rank is concerned but it does mean the ELO has a real chance of being off by quite a bit. Your test cannot resolve the difference between programs that are within 10 or 20 ELO of each other.

However, imagine what is going to happen when Stockfish 4 is tested. On every list I have seen so far (at least the longer time controls tests) we are virtually tied. What that means is that who is in second place is virtually RANDOM and could hinge on 1 or 2 ELO (in fact in one list Stockfish is 1 or 2 ELO ahead of Komodo when I last looked.)

For people who understand what they are looking at it won't matter, but most people do not have a clue. If Komodo ends up 1 ELO ahead of Stockfish some people will assume it's the better program when in fact the error margins are ridiculously too high to make such a judgement.

With an error margin of 25, when it's a close call, SSE4.2 is a major thing. There were times when Critter was either slightly ahead or slightly behind Komodo and so it mattered a lot there too. If it's not a close call it doesn't matter.

A very crude interpretation of what a 25 point error margin means is that there is about a 1/20 chance the program is at least 25 ELO OFF in one direction or the other. So Komodo has a non-trivial chance of being 25 ELO weaker than your test reports or conversely 25 ELO stronger. Since the error is based on a curve, the odds that it's off by at least 15 or 20 ELO are pretty high, certainly a lot higher than in the 25 ELO case.
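To put rough numbers on this crude interpretation, here is a minimal sketch. It assumes the +/- 25 is a two-sided 95% interval of a normally distributed error, and, for the ordering question, that the errors of the two programs are independent. Both are simplifying assumptions and not necessarily how any rating list computes its margins; the function names are made up for the example.

```python
import math

Z95 = 1.959964  # two-sided 95% quantile of the standard normal


def sigma_from_margin(margin_95):
    """Standard deviation implied by a +/- margin read as a 95% interval."""
    return margin_95 / Z95


def prob_off_by_at_least(delta, margin_95):
    """P(|measured - true| >= delta) under the normal error model."""
    sigma = sigma_from_margin(margin_95)
    return math.erfc(delta / (sigma * math.sqrt(2)))


def prob_wrong_order(true_diff, margin_a, margin_b):
    """P(the program that is truly weaker by true_diff is listed ahead)."""
    sigma = math.hypot(sigma_from_margin(margin_a),
                       sigma_from_margin(margin_b))
    return 0.5 * math.erfc(true_diff / (sigma * math.sqrt(2)))


for delta in (15, 20, 25):
    print(delta, round(prob_off_by_at_least(delta, 25), 3))

# Two programs 2 ELO apart, each measured with a +/- 25 margin.
print(round(prob_wrong_order(2, 25, 25), 3))
```

Under these assumptions the chance of being off by at least 15 ELO comes out at roughly 24%, and by at least 20 ELO at roughly 12%, against the 5% for 25 ELO. With a true gap of only a couple of ELO the listed order is close to a coin flip.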

melajara · Post by **melajara** » Fri Aug 23, 2013 7:52 pm

tjfroh wrote:Fern, I am buggered. I can only do 18 to 20 ply searches in my ape's brain. I will never draw this program unless I use the ARB system.

Low Grade Krell Moron

fern wrote:My goodness, around 30 ply in middle game!!!

What a degree of pruning!!!

Fern

Nice to meet you, Chelsea Manning!

Modern Times · Post by **Modern Times** » Fri Aug 23, 2013 8:12 pm

Don wrote:
Graham Banks wrote:
Don wrote:......We know for sure that SSE4.2 helps some programs very little and others a lot......
Do we?
I think that this is a fallacy. It makes minimal difference from what I've seen.
It is not a fallacy, believe me as developers we are 100% aware of the issues and it's not a couple of ELO but something like 7 or 8 ELO for us and 2 or 3 for Stockfish I think.

I'm with Graham on this. But good evidence to the contrary would convince me.

Putting aside popcount, I think you're making a mountain out of a molehill. Any difference with processor-specific compiles will be minuscule. But again, if you have any good evidence, I have an open mind.

Graham Banks · Post by **Graham Banks** » Fri Aug 23, 2013 8:17 pm

Don wrote:
Graham Banks wrote:
Don wrote:......We know for sure that SSE4.2 helps some programs very little and others a lot......
Do we?
I think that this is a fallacy. It makes minimal difference from what I've seen.
It is not a fallacy, believe me as developers we are 100% aware of the issues and it's not a couple of ELO but something like 7 or 8 ELO for us and 2 or 3 for Stockfish I think.......

That falls into my category of minimal.
