The latest Crafty versions

bob · Post by **bob** » Thu Nov 27, 2008 5:38 pm

SzG wrote:From v22.4 to v.22.7 were there changes affecting playing strength or were they only memory leakage fixes?

Every version probably had a slight eval tweak. So 22.7 is the strongest. There is probably a 20 Elo spread from 22.4 to 22.7. So not that significant...

bob · Post by **bob** » Thu Nov 27, 2008 6:33 pm

SzG wrote:
bob wrote:
SzG wrote:From v22.4 to v.22.7 were there changes affecting playing strength or were they only memory leakage fixes?
Every version probably had a slight eval tweak. So 22.7 is the strongest. There is probably a 20 Elo spread from 22.4 to 22.7. So not that significant...
Thank you very much. Your reply was important from the point of view of a tester who works for a rating list.

I suspect we will be making significant progress in each new release. Tracy just found another 7 Elo in a change we just finished testing, a different way of implementing the old "bad trade" penalty in Crafty... Hopefully the cluster-testing we do for validation will keep us from taking any steps backward...

glorfindel · Post by **glorfindel** » Thu Nov 27, 2008 8:19 pm

bob wrote:...Tracy just found another 7 Elo in a change we just finished testing...

Are you really confident you can measure such small changes in playing strength?

bob · Post by **bob** » Thu Nov 27, 2008 8:35 pm

glorfindel wrote:
bob wrote:...Tracy just found another 7 Elo in a change we just finished testing...
Are you really confident you can measure such small changes in playing strength?

Yes. My small test has a +/-4 error. My big test is +/- 1...

I'm reporting conservative numbers. For example, if the last test was 2590+/-4 I count that as 2594. If the current test is 2608+/-4 I count that as 2604. Which is a +10 difference. It is likely that the difference is greater, but using the distance between the two "edges" gives a conservative value which is what I have been quoting...
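The "edge to edge" arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the calculation in the post; the function name `conservative_gain` is made up for the example and is not part of Crafty or its test tooling.

```python
def conservative_gain(old_rating, old_margin, new_rating, new_margin):
    """Distance between the nearest edges of two rating intervals.

    The old version is credited with the top of its interval and the new
    version with the bottom of its interval, so the result understates
    the likely difference.
    """
    old_upper_edge = old_rating + old_margin   # 2590 + 4 -> 2594
    new_lower_edge = new_rating - new_margin   # 2608 - 4 -> 2604
    return new_lower_edge - old_upper_edge


# Numbers taken from the post: 2590 +/- 4 versus 2608 +/- 4.
print(conservative_gain(2590, 4, 2608, 4))  # 10
```

A result of zero or less means the two intervals touch or overlap, so the test alone does not separate the versions.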

Some changes require that I use a longer test. And for some tuning, when I am making very small (final) adjustments, we need more accuracy and run a bigger test...
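As a rough guide to why the bigger test costs so much more, here is a sketch based on a general statistical rule rather than anything stated in this thread: if the error margin is dominated by sampling noise and the games are independent, the margin shrinks with the square root of the number of games. The function name `games_multiplier` is hypothetical, and the actual game counts used on the cluster are not given here.

```python
def games_multiplier(current_margin, target_margin):
    """Factor by which the game count must grow to shrink the margin.

    Assumes margin is proportional to 1 / sqrt(games), which holds for
    independent games when sampling noise is the main source of error.
    """
    return (current_margin / target_margin) ** 2


# Going from the +/-4 test to the +/-1 test under that assumption.
print(games_multiplier(4, 1))  # 16.0
```

Under that assumption, tightening the margin from +/-4 to +/-1 takes about sixteen times as many games.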

Most of what we have done in the past few months has been clearly detectable with our one-hour test run (when the cluster is completely idle, that is, which is most of the time).

bob · Post by **bob** » Tue Dec 02, 2008 6:44 pm

SzG wrote:From v22.4 to v.22.7 were there changes affecting playing strength or were they only memory leakage fixes?

There were playing strength changes as well. Any eval tuning that was in progress was included since I make a change, test, and if it is better in our cluster testing, that change is instantly kept as a part of the current version, else it is removed and we continue testing other changes...
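The change, test, keep-or-remove cycle described above can be sketched as a simple greedy loop. This is a minimal illustration, not Crafty's actual tooling: `greedy_tune`, the parameter dictionary, and the `measure` callback (standing in for a full cluster test run that returns a rating) are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable

Params = Dict[str, int]


def greedy_tune(baseline: Params,
                changes: Iterable[Params],
                measure: Callable[[Params], float]) -> Params:
    """Try each change in turn; keep it only if the measured rating improves."""
    current = dict(baseline)
    current_score = measure(current)
    for change in changes:
        trial = {**current, **change}      # apply the candidate change
        trial_score = measure(trial)       # stand-in for a cluster test run
        if trial_score > current_score:
            # Better: the change becomes part of the current version.
            current, current_score = trial, trial_score
        # Otherwise the change is dropped and testing continues.
    return current


# Toy stand-in for a test run: the "rating" peaks when x == 3.
toy_measure = lambda p: -abs(p["x"] - 3)
print(greedy_tune({"x": 0}, [{"x": 2}, {"x": 5}, {"x": 3}], toy_measure))
# {'x': 3}
```

In practice the comparison would also have to account for the error margin of each run, as discussed earlier in the thread.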

There is at least a +20 Elo improvement between 22.4 and current version...

Graham Banks · Post by **Graham Banks** » Tue Dec 02, 2008 7:16 pm

bob wrote:
SzG wrote:From v22.4 to v.22.7 were there changes affecting playing strength or were they only memory leakage fixes?
There were playing strength changes as well. Any eval tuning that was in progress was included since I make a change, test, and if it is better in our cluster testing, that change is instantly kept as a part of the current version, else it is removed and we continue testing other changes...

There is at least a +20 Elo improvement between 22.4 and current version...

We've kept 22.4 and 22.5 separate, but will combine 22.7 and 22.8 under 22.8 as suggested.
Thanks Bob.

bob · Post by **bob** » Tue Dec 02, 2008 8:26 pm

Graham Banks wrote:
bob wrote:
SzG wrote:From v22.4 to v.22.7 were there changes affecting playing strength or were they only memory leakage fixes?
There were playing strength changes as well. Any eval tuning that was in progress was included since I make a change, test, and if it is better in our cluster testing, that change is instantly kept as a part of the current version, else it is removed and we continue testing other changes...

There is at least a +20 Elo improvement between 22.4 and current version...
We've kept 22.4 and 22.5 separate, but will combine 22.7 and 22.8 under 22.8 as suggested.
Thanks Bob.

That's up to you. The problem, from my perspective, is that at times, new versions come slowly. Took a major effort to create 21.0 by renaming/numbering all the bits to match X86. Took an even larger effort to create 22.0 by removing all the white/black duplication. So versions come out slowly. Usually the changes are chess-related and do not have associated bugs to speak of because they are tested thoroughly. But in 22.x we switched back to threads to address a Windows issue with smpnice=1, and I can't test those changes since I don't run Windows. And as a result, I was up to my armpits in issues with the Windows thread changes. And that produced a flurry of new versions. I hope that has now slowed down. In fact, we might not do anything until 23.0 comes out either late this month or perhaps in January. I am working on a q-search change that might be helpful, and the evaluation tweaks and a change to statically initialized Zobrist random numbers will require a new version since books will be incompatible between 22.x and 23.x.
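To illustrate why changing the Zobrist random numbers would break book compatibility, here is a minimal sketch. It assumes a book keyed by the position's Zobrist hash; the table layout, seeds, piece indices, and the names `make_table` and `zobrist_key` are invented for the example and do not reflect Crafty's real data structures.

```python
import random


def make_table(seed, squares=64, pieces=12):
    """Build a table of 64-bit random numbers, one per (square, piece)."""
    rng = random.Random(seed)
    return [[rng.getrandbits(64) for _ in range(pieces)]
            for _ in range(squares)]


def zobrist_key(table, position):
    """XOR together the random numbers for every (square, piece) pair."""
    key = 0
    for square, piece in position:
        key ^= table[square][piece]
    return key


# Two different sets of random numbers (hypothetical "22.x" and "23.x").
old_table = make_table(seed=1)
new_table = make_table(seed=2)

# A toy position: a few (square, piece) pairs with made-up indices.
position = [(4, 5), (60, 11), (12, 0)]

# A book built with the old numbers stores moves under the old keys.
book = {zobrist_key(old_table, position): "e2e4"}

# The same position hashed with the new numbers will, with overwhelming
# probability, produce a key the old book has never seen.
print(zobrist_key(new_table, position) in book)
```

The position is identical in both cases; only the random numbers differ, which is enough to make every stored key useless.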
