IWB wrote:
Don wrote:
...
Although I have some complaints, I like your test most of all. Something really bothers me about a bunch of people testing programs on all sorts of different hardware under not-so-strict conditions. The result is that any given program will look stronger if more of its games are played on hardware it "likes." Of course, the positive aspect is that they can test a huge variety of programs and get very large samples. So it's always a trade-off.
First of all, thanks!
"Different people" could be a problem, but I do not see any problem with the existing lists in that respect. My biggest concern, and the main reason I started my list, was the use of different hardware with time controls that are adapted (IF they are adapted) using just one benchmark, which means they have to be wrong for nearly every engine except Crafty.
Different people using different books might shift a result. I have no doubt that nobody there intends this, and the same is true of my selection of opening positions: no intention, but possible!
After running my list for a while, I have to admit that I was surprised how similar my results were to the CEGT 40/20. Basically, the main difference is that the IPON is a bit faster with the top engines. That's it!
(And by the way, I compare all lists, and regardless of the time control there is no major difference for ANY engine. As long as the time control is not too short, and all lists from 40/3 upwards are long enough nowadays, an extraordinary increase in playing strength for a particular engine is not visible. I think this "more time and my preferred engine will get better" idea is a myth (when properly tested)! Of course, the quality of the games IS getting better, but that is true for all engines.)
Bye
Ingo
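Ingo's point about benchmark-adapted time controls can be made concrete with a minimal sketch. It assumes a tester scales the base time by the ratio of a reference machine's benchmark speed to the local machine's speed, measured with a single benchmark program. The function names and all speed figures are hypothetical, chosen only for illustration; no rating list's actual procedure is implied.

```python
def adapted_time(base_seconds: float, ref_speed: float, local_speed: float) -> float:
    """Scale a base time control so the local machine searches about as much
    as the reference machine would. Speeds are results from ONE benchmark
    program (an assumption of this sketch), e.g. nodes per second."""
    return base_seconds * ref_speed / local_speed


def effective_time(adapted_seconds: float, engine_ref_speed: float,
                   engine_local_speed: float) -> float:
    """Convert the time an engine actually gets on the local machine into
    'reference-machine seconds' using that engine's OWN speed ratio."""
    return adapted_seconds * engine_local_speed / engine_ref_speed


# Hypothetical numbers: the benchmark runs 20% slower on the local machine,
# so a 20-minute base becomes 25 minutes there.
base = 20 * 60
t = adapted_time(base, ref_speed=1.0, local_speed=0.8)

# An engine that slows down exactly like the benchmark gets the intended time.
print(round(effective_time(t, 1.0, 0.8) / 60, 2))  # prints 20.0

# An engine that slows down only 10% on the same machine effectively gets more.
print(round(effective_time(t, 1.0, 0.9) / 60, 2))  # prints 22.5
```

In this made-up case, the second engine effectively plays at 22.5 reference minutes instead of 20. How large such mismatches are in practice depends on real per-engine benchmark data, which this sketch does not supply.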
People get "anal" about all sorts of things, and it's true that various factors can make a +/- 5-10 Elo difference, but in practice there is not much we can do about that, and most of these things make only a small difference in the big picture. It's a matter of semantics what you consider the "reference" point, the normal case. Does certain hardware make you look bad, or is it the "other" hardware that makes you look good? Semantics.
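For scale, the standard logistic Elo model relates a rating difference D to an expected score E = 1/(1 + 10^(-D/400)). The minimal sketch below (function names are hypothetical) shows why a 5-10 Elo effect is hard to pin down. Under this model, 10 Elo moves the expected score only to about 51.4%. Resolving such an edge at two standard errors then takes roughly 4,800 games, and 5 Elo takes roughly 19,000. The game-count estimate assumes a per-game standard deviation of 0.5, the maximum, reached with no draws. Draws lower the variance, so treat the figure as a conservative upper bound.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score for the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score: float) -> float:
    """Inverse of expected_score; score must be strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def games_needed(elo_diff: float, z: float = 2.0, sigma: float = 0.5) -> int:
    """Rough number of games for the score edge of `elo_diff` to exceed
    z standard errors, assuming a per-game standard deviation of `sigma`
    (0.5 is the maximum, reached when there are no draws)."""
    edge = expected_score(elo_diff) - 0.5
    return math.ceil((z * sigma / edge) ** 2)


for d in (5, 10):
    print(d, round(expected_score(d), 4), games_needed(d))
```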
1. Time control - I think it is not a major issue as long as it's "long enough." There is a very clear difference between programs once you go below 2 or 3 minutes on modern hardware, but that tapers off. I think it makes a difference of 5-10 Elo or more going up to really long time controls too, but that is very difficult to prove, and the curve is very gentle.
2. Book - I think that is a bigger issue. We test with a huge opening book of thousands of openings culled from master play, only to ply 10 (5 moves for each side). We want Komodo to play most of the game itself against other engines. However, I have seen good opening books seemingly play most of the game FOR the engine. I don't really know how deep the books the testers use are, but since I am an engineer and the goal is to make a strong overall program, I want the book to get out of my way.
3. Ponder - hotly debated, but I don't think it's as important as most of the other things on this list. If you have the resources, it's better to have ponder than not. Pondering is going to help the stronger program more than the weaker programs, so it might widen the spread from the lowest to the highest rated program a little.
4. Hardware - of course, each program responds differently to different hardware. We also must not forget that how a program is compiled is a very similar issue. I think most compilers have settings that make different trade-offs to optimize for specific hardware, and the classic example is the Intel compiler, specifically designed to make AMD look bad. I don't know whether that is still an issue.
When we speak of hardware and compilation, SSE4 (or, more specifically, ABM, which stands for Advanced Bit Manipulation) makes a huge difference in some programs, such as Komodo, and less in others. So if you do not have SSE4 hardware, Komodo is crippled, or, if you prefer, Komodo has an advantage if you do.
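One likely source of the ABM gain is its POPCNT instruction, which counts the set bits of a 64-bit word in a single instruction. Bitboard engines need that count constantly, for example when counting attacked squares for mobility. The minimal Python sketch below assumes a bitboard is a 64-bit integer with bit 0 = a1. It shows the portable software fallback a program needs without POPCNT and checks the result against Python's `bin(x).count("1")`. The helper names are hypothetical, and Python cannot show the hardware speed difference itself. In C or C++, a compiler builtin such as GCC's `__builtin_popcountll` can compile to the single POPCNT instruction when the target supports it, and to a multi-step fallback when it does not.

```python
def popcount_software(bb: int) -> int:
    """Count set bits by clearing the lowest one on each pass
    (Kernighan's method): one loop iteration per set bit."""
    count = 0
    while bb:
        bb &= bb - 1  # clear the least significant set bit
        count += 1
    return count


def popcount_reference(bb: int) -> int:
    """Reference count via the binary string, used only as a cross-check."""
    return bin(bb).count("1")


# Hypothetical bitboard: all of rank 1 (bits 0-7) plus e4 (bit 3*8 + 4 = 28).
RANK_1 = 0xFF
E4 = 1 << 28
bitboard = RANK_1 | E4

print(popcount_software(bitboard), popcount_reference(bitboard))  # prints 9 9
```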
It's amazing that the lists mostly agree within a few Elo, given all these factors and probably many more.