Don wrote: The Intel JA compile is actually worth quite a bit of ELO improvement on Komodo and I would assume on just about any program.
This may explain the discrepancy.
There are 2 other considerations:
1. Crafty is not JA compiled either - so this is more or less a wash. Unless of course Bob is using the Intel compiler.
As I have said many times, that is _all_ I use. Unless I run on AMD. For reasons unknown, gcc seems to be better for AMD processors _every_ time I compare them... Particularly for SMP code. But on Intel boxes, and all of our stuff is currently Intel, icc is far better. And PGO actually works for multi-threaded code as well. gcc crashes and burns.
2. In experiments I have done in the past, head to head ratings come out about the same, within experimental error. If we can demonstrate that head to head is really that biased, then we have to adjust accordingly.
This depends on circumstances. For example, take A, B and C (let A be stockfish). Suppose B and C are -200 head-to-head with stockfish. But suppose B is +100 head-to-head with C (and this is not that uncommon). Elo can't give an accurate rating, because when you choose any two programs and compare their Elos, the rating should predict the result between them, with reasonable accuracy. There is no rating you can assign to A, B and C that will be accurate. B has to be +100 over C, yet B and C both have to be -200 to A. So you only get a very rough approximation for B and C, and depending on how many games you play between which programs, you could get A=2600, B=2450 and C=2350. B and C have the right interval between them, but neither A/B nor A/C does; each is off by 50.
Nothing can be done for that case except to use the head-to-head rating if you have it. USCF/FIDE doesn't use that kind of data that I am aware of.
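The A/B/C case above can be made concrete with a small sketch. It is only an illustration: it fits ratings by weighted least squares on the three head-to-head differences, which is a simplification and not how BayesElo or any rating list actually computes ratings. The function name fit_b_c and the weights (standing in for how many games each pair played) are hypothetical.

```python
def fit_b_c(anchor_a, d_ab, d_ac, d_bc, w_ab=1.0, w_ac=1.0, w_bc=1.0):
    """Weighted least-squares ratings for B and C with A pinned at anchor_a.

    d_xy is the head-to-head rating difference (X minus Y).
    w_xy is an assumed weight, e.g. proportional to the number of
    X-vs-Y games. All weights must be positive.
    """
    # Normal equations for minimizing
    #   w_ab*(a-b-d_ab)^2 + w_ac*(a-c-d_ac)^2 + w_bc*(b-c-d_bc)^2
    # over b and c, with a fixed at anchor_a.
    r1 = w_ab * (anchor_a - d_ab) + w_bc * d_bc
    r2 = w_ac * (anchor_a - d_ac) - w_bc * d_bc
    det = (w_ab + w_bc) * (w_ac + w_bc) - w_bc ** 2
    # Solve the 2x2 system by Cramer's rule.
    b = (r1 * (w_ac + w_bc) + w_bc * r2) / det
    c = ((w_ab + w_bc) * r2 + w_bc * r1) / det
    return b, c


# A = 2600; B and C are each -200 against A, but B is +100 against C.
equal = fit_b_c(2600, 200, 200, 100)
heavy_bc = fit_b_c(2600, 200, 200, 100, w_bc=1000.0)
print("equal weights:   B=%.1f C=%.1f" % equal)
print("mostly B-vs-C:   B=%.1f C=%.1f" % heavy_bc)
```

Under these assumptions no choice of weights satisfies all three differences at once. With equal weights the fit lands near B=2433, C=2367, and as the B-vs-C weight grows it approaches the B=2450, C=2350 from the example, where the B/C gap is right and both gaps to A are off by 50.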
On the other hand, why are we comparing to Stockfish 1.8 instead of Rybka 4? The lists are showing over 300 ELO difference between these two programs.
300 points between 1.8 and Rybka? Where? I am certain that is wrong.
What we have to do is see how far Crafty was from the top on our two reference dates. Why is this so complicated?
It isn't, but I can't run head-to-head matches with Rybka 4. In the lists I have seen, stockfish 1.8 is pretty close to R3/4.
It is disturbing to me that this simple thing is not good enough for Bob and that he continues to produce his own numbers - in this case by running more private tests and rejecting existing data. Doesn't that make anyone wonder what is going on here?
Only makes one wonder what your "hidden agenda" is. I am not rejecting anything. I am trying to produce results with as few outside influences as possible, given the facilities that I have. A large rating list doesn't say a thing about how far apart two specific programs are, because the ratings are an average sampling over games between various pairs of programs in the list. I've already explained this. To precisely measure A vs B, you play A vs B and run it thru BayesElo or your favorite Elo calculator. The more programs you use, the less accurate a rating becomes. It gives you a better idea of how any program stacks up against the entire population as a whole, but a single Elo number can't predict the outcome between any two opponents with high accuracy, unless only results from those two opponents playing each other are used...
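As a rough illustration of measuring A vs B directly, the sketch below converts a head-to-head match result into an Elo difference using the standard logistic Elo formula. It is not BayesElo's model, which treats draws and color advantage differently; the function name elo_diff and the sample game counts are made up for the example.

```python
import math


def elo_diff(wins, draws, losses):
    """Elo difference (A minus B) implied by A's head-to-head score.

    Uses the plain logistic Elo curve: expected score
    s = 1 / (1 + 10 ** (-diff / 400)), solved for diff.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        # A 0% or 100% score implies an unbounded difference.
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


# Hypothetical 6000-game match result, for illustration only.
print(round(elo_diff(3200, 1600, 1200), 1))
```

Only the games between the two programs enter the calculation, so the result is not pulled around by how either program does against the rest of a rating list.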
I simply set out to roughly quantify hardware vs software. You, on the other hand, have continually shifted the discussion, made outright false statements (Crafty data shows software more important...) and then brought in deep blue for no valid reason at all. So, what's your game here, instead of questioning what mine is???
mcostalba wrote: bob wrote: Since a "couple here" have some pretty uninformed views, here's a different rating metric.
6000 games, Crafty-23.4, stockfish 1.8.whatever is current
Did you compile SF with the profiling option?
The publicly tested version is JA compiled, and this carries some weight...
You can get something similar, but not as fast, if you do:
make profile-build ARCH=x86-64-modern
if your cluster is 32 bits then do: