That is not "two doublings". This is, once again, apples and oranges. SMP overhead comes in and this changes things.

Don wrote:
Bob,
I'm not following this too closely any longer. I don't know to what extent you have taken these 2 things into consideration - maybe you already have but if not, here goes:
Crafty gets 100 ELO going from 1 to 4 processors. That is 2 doublings, and that means you get 50 ELO per doubling. If you go with MORE processors you get even less ELO per doubling. So the point is that you cannot mix and match any way you want and call it science. I'm not saying you are doing that, as I am only quickly skimming these discussions. But if you talk about nodes per second, number of cores, or speedup per core, you have to separate them and make sure you are being scientifically rigorous, at least as much as tests like this permit.
So can we use some sensible numbers? Going from 1 cpu to 4 is about a 3x speedup. But not knowing the hardware makes even this inexact, as there are good SMP boxes and bad SMP boxes. And some take an SMP version and run it on NUMA, which is significantly worse than optimal if Crafty doesn't know about it. Etc. So 3x = +100 is "in the range". But it isn't 2 doublings; it is log2(3), closer to 1.6.
If you recall, in our hardware discussion I gave both an "NPS speed", which is just raw hardware, but then I factored in SMP overhead to go from 1 i7 core to 6, and ended up with 4.5x rather than 6x.
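To make that arithmetic concrete, here is a minimal sketch in C; the 3x, 4.5x-on-6-cores, and +100 Elo figures are just the ones quoted in this thread, nothing more:

```c
/* Minimal sketch of the doubling arithmetic above. The 3x, 4.5x and
 * +100 Elo numbers are the figures quoted in this thread, not data. */
#include <math.h>
#include <stdio.h>

int main(void) {
    double elo_gain = 100.0;                /* quoted gain going 1 -> 4 cpus */
    double speedup_4cpu = 3.0;              /* ~3x, not 4x, due to SMP loss  */
    double doublings = log2(speedup_4cpu);  /* ~1.58 doublings, not 2        */
    printf("1->4 cpus: %.2f doublings, ~%.0f Elo per doubling\n",
           doublings, elo_gain / doublings);

    /* Same idea for the i7 example: 6 cores at ~4.5x effective speed. */
    printf("1->6 cores: raw log2(6)=%.2f doublings, effective log2(4.5)=%.2f\n",
           log2(6.0), log2(4.5));
    return 0;
}
```

Compiled with -lm, this prints about 1.58 doublings and roughly 63 Elo per doubling for the 1-to-4 case, not 50.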
I don't believe that, because I have tried it. In fact, as we get more speculative in the search, going 2x faster today is not worth what going 2x faster was worth 10 years ago, because some fraction of that 2x goes up in smoke due to error. I just ran some "doubling" experiments for the old version. I'll crank out a couple for the new code to compare, again.
The other issue is that how much you get per doubling is not a constant either. Modern programs have excellent branching factors compared to the older programs. This is software improvement. In fact I don't really think there is a good way to resolve this. But consider this:
If you take a 1995 program and test it with a bunch of doublings, you won't come up with a very impressive number in terms of ELO. If you could do the same with a modern program, you would come up with much more impressive numbers.
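A rough model behind this point, assuming search cost grows as EBF^depth (an assumption for illustration, not a measurement from this thread): doubling the time buys about 1/log2(EBF) extra plies, so the EBF sets how much depth each doubling purchases. The EBF values below are illustrative guesses:

```c
/* Rough model only: if time ~ EBF^depth, doubling the time buys
 * 1/log2(EBF) extra plies. Both EBF values are assumed, not measured. */
#include <math.h>
#include <stdio.h>

int main(void) {
    double ebf_1995   = 6.0;   /* assumed EBF for a mid-90s program */
    double ebf_modern = 2.0;   /* assumed EBF for a modern program  */
    printf("1995-era: +%.2f ply per doubling\n", 1.0 / log2(ebf_1995));
    printf("modern:   +%.2f ply per doubling\n", 1.0 / log2(ebf_modern));
    return 0;
}
```

On those assumed EBFs, a modern program gains a full ply per doubling where a 1995 program gained about 0.4 of a ply, which is why Elo-per-doubling measured on today's engines partly reflects software gains rather than hardware alone.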
Already done. Always been on my ftp machine. I can stick the modified version I tested there, but again, I am not certain at all that it will run with xboard, the protocol has changed a lot over the years. 23.3 is already available and everyone interested has a copy, and it is still on my ftp box. 23.4 is maybe 5-6 Elo stronger and isn't available, but I don't see how that makes enough difference to matter.
It would be a real mistake to attribute all of this to hardware by observing how much modern programs improve with a speedup. I know how you think, and you are going to consider this completely fair. But it isn't, because this kind of improvement was just not possible without the software improvements that lowered the branching factor so much.
Also, you really NEED to give us all of your source code and binaries, for both the old program and the new.
I did just copy my 10.x version over, and I called it 10.x to keep it from being confused with 10.18. Test at your own risk. It requires 64-bit hardware, and it works with my cluster referee program; no idea about xboard/winboard...
Funny guy. Who was making claims about Komodo as well during the discussion? And where, exactly, is _your_ source? I know, this is a one-way street. In any case, my source has been available the whole time on ftp.cis.uab.edu/pub/hyatt/source, you just choose the version numbers to look at. I just put my crafty-10.x.tar over there, which may work for you, or not. The normal 10.18 certainly will not, as I could not get it to compile and run without making quite a few changes.
You can of course do anything you want, but I have to say that I am extremely uncomfortable with YOU being in complete control of all the tests, designed by YOU under YOUR conditions, when these tests are designed so that YOU can make a point. Nobody was given any feedback on how these tests were run.
So unless you hand over the sources and binaries involved so that there can be some transparency, I think it would be foolish on our part to continue to entertain this.
No, it is _lazy_ for you to not run the tests yourself. I try to be open, and post everything relevant, and you want to imply that I am being secretive and not open. Again, where's your source? Got something to hide? Clearly I don't...
It's stupid for us to keep bringing up issues and then wait for you to tell us whether we are right or wrong based on some private testing that you decided on.
bob wrote:
It had to, as that was a requirement. And IIRC some of those games were not rated, because USCF had rules about "matches" between computers and humans, because of the commercial bullshit from years before, where they would pay someone to play an "arranged match" to give them a high rating for their advertising...

Uri Blass wrote:
In this case maybe Deep Thought performed better against GMs relative to other players.

bob wrote:
The tournaments don't count "in toto". The Fredkin stage 2 prize required a 2550+ rating over 24 consecutive games against GM players only. However, if you look at old USCF rating reports, you can find a 2551 rating in 1988, although I don't have 'em for the entire year.
I do not remember that Deep Thought played against 24 GMs in 1988,
so it would be nice to have a list of the GMs who played against it in 1988.
I do not remember the names of all the GMs. I do remember it playing Byrne, Browne and Spraggett, but I suspect this must be recorded somewhere in the announcement of their winning the prize.