SzG wrote: From v22.4 to v22.7 were there changes affecting playing strength or were they only memory leakage fixes?
Every version probably had a slight eval tweak. So 22.7 is the strongest. There is probably a 20 Elo spread from 22.4 to 22.7. So not that significant...
The latest Crafty versions
Moderator: Ras
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: The latest Crafty versions
SzG wrote: From v22.4 to v22.7 were there changes affecting playing strength or were they only memory leakage fixes?
bob wrote: Every version probably had a slight eval tweak. So 22.7 is the strongest. There is probably a 20 Elo spread from 22.4 to 22.7. So not that significant...
SzG wrote: Thank you very much. Your reply was important from the point of view of a tester who works for a rating list.
I suspect we will be making significant progress in each new release. Tracy just found another 7 Elo in a change we just finished testing, a different way of implementing the old "bad trade" penalty in Crafty... Hopefully there will not be any steps backward because of the cluster testing we do for validation...
Re: The latest Crafty versions
bob wrote: ...Tracy just found another 7 Elo in a change we just finished testing...
Are you really confident you can measure such small changes in playing strength?
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: The latest Crafty versions
bob wrote: ...Tracy just found another 7 Elo in a change we just finished testing...
glorfindel wrote: Are you really confident you can measure such small changes in playing strength?
Yes. My small test has a +/-4 error. My big test is +/-1...
I'm reporting conservative numbers. For example, if the last test was 2590+/-4, I count that as 2594 (its upper edge). If the current test is 2608+/-4, I count that as 2604 (its lower edge). That is a +10 difference. It is likely that the true difference is greater, but using the distance between the two "edges" gives a conservative value, which is what I have been quoting...
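The edge-to-edge arithmetic in that example can be written out as a small sketch. The function name `conservative_gain` is hypothetical and is not taken from Crafty's test tooling; it only reproduces the calculation described in the post.

```python
def conservative_gain(old_elo, old_margin, new_elo, new_margin):
    """Distance between the upper edge of the old result and the
    lower edge of the new result (both given as rating +/- margin)."""
    old_upper = old_elo + old_margin   # e.g. 2590 + 4 = 2594
    new_lower = new_elo - new_margin   # e.g. 2608 - 4 = 2604
    return new_lower - old_upper


# The numbers quoted in the post: 2590+/-4 versus 2608+/-4.
print(conservative_gain(2590, 4, 2608, 4))  # 10
# For comparison, the difference of the two central estimates is 18.
print(2608 - 2590)  # 18
```

A positive result means the two intervals do not overlap at all, which is why the figure understates the likely gain.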
Some changes require that I use a longer test. And for some tuning, when I am making very small (final) adjustments, we need more accuracy and run a bigger test...
Most of what we have done in the past few months has been clearly detectable with our one-hour test run (when the cluster is completely idle, that is, which is most of the time).
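As a rough guide to why the bigger test takes so much longer, here is a sketch that assumes the error margin shrinks with the inverse square root of the number of games (the usual behaviour for independent games). The thread does not state the actual game counts, so only the ratio is computed; the function names are hypothetical.

```python
import math


def games_factor(current_margin, target_margin):
    """How many times more games are needed to shrink the error margin,
    assuming margin is proportional to 1 / sqrt(number_of_games)."""
    return (current_margin / target_margin) ** 2


def margin_after_scaling(current_margin, factor):
    """Error margin after playing `factor` times as many games,
    under the same 1 / sqrt(N) assumption."""
    return current_margin / math.sqrt(factor)


# Going from the +/-4 test to the +/-1 test quoted above.
print(games_factor(4, 1))           # 16.0
print(margin_after_scaling(4, 16))  # 1.0
```

Under that assumption, the +/-1 run needs roughly sixteen times as many games as the +/-4 run.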
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: The latest Crafty versions
SzG wrote: From v22.4 to v22.7 were there changes affecting playing strength or were they only memory leakage fixes?
There were playing strength changes as well. Any eval tuning that was in progress was included since I make a change, test, and if it is better in our cluster testing, that change is instantly kept as a part of the current version, else it is removed and we continue testing other changes...
There is at least a +20 Elo improvement between 22.4 and current version...
-
- Posts: 44653
- Joined: Sun Feb 26, 2006 10:52 am
- Location: Auckland, NZ
Re: The latest Crafty versions
SzG wrote: From v22.4 to v22.7 were there changes affecting playing strength or were they only memory leakage fixes?
bob wrote: There were playing strength changes as well. Any eval tuning that was in progress was included since I make a change, test, and if it is better in our cluster testing, that change is instantly kept as a part of the current version, else it is removed and we continue testing other changes... There is at least a +20 Elo improvement between 22.4 and current version...
We've kept 22.4 and 22.5 separate, but will combine 22.7 and 22.8 under 22.8 as suggested.
Thanks Bob.
gbanksnz at gmail.com
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: The latest Crafty versions
SzG wrote: From v22.4 to v22.7 were there changes affecting playing strength or were they only memory leakage fixes?
bob wrote: There were playing strength changes as well. Any eval tuning that was in progress was included since I make a change, test, and if it is better in our cluster testing, that change is instantly kept as a part of the current version, else it is removed and we continue testing other changes... There is at least a +20 Elo improvement between 22.4 and current version...
Graham Banks wrote: We've kept 22.4 and 22.5 separate, but will combine 22.7 and 22.8 under 22.8 as suggested. Thanks Bob.
That's up to you. The problem, from my perspective, is that at times, new versions come slowly. Took a major effort to create 21.0 by renaming/numbering all the bits to match X86. Took an even larger effort to create 22.0 by removing all the white/black duplication. So versions come out slowly. Usually the changes are chess-related and do not have associated bugs to speak of because they are tested thoroughly.
But in 22.x we switched back to threads to address a Windows issue with smpnice=1, and I can't test those changes since I don't run Windows. As a result, I was up to my armpits in issues with the Windows thread changes, and that produced a flurry of new versions. I hope that has now slowed down. In fact, we might not do anything until 23.0 comes out either late this month or perhaps in January. I am working on a q-search change that might be helpful, and the evaluation tweaks and a change to statically initialized Zobrist random numbers will require a new version since books will be incompatible between 22.x and 23.x.
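To illustrate why changing the Zobrist random numbers would break book compatibility, here is a minimal sketch. It assumes book entries are looked up by the Zobrist hash of the position; the table layout, seeds, and the toy `book` dictionary are all hypothetical and are not Crafty's actual data structures.

```python
import random

PIECES = "PNBRQKpnbrqk"


def make_table(seed):
    """Build a table of 64-bit random numbers, one per (piece, square).
    Different seeds stand in for different sets of Zobrist numbers."""
    rng = random.Random(seed)
    return {(p, sq): rng.getrandbits(64) for p in PIECES for sq in range(64)}


def zobrist_key(position, table):
    """XOR together the random number for every piece on its square."""
    key = 0
    for piece, square in position:
        key ^= table[(piece, square)]
    return key


old_table = make_table(1)  # stands in for the 22.x random numbers
new_table = make_table(2)  # stands in for the 23.x random numbers

# A toy position: (piece, square index) pairs.
position = [("K", 4), ("k", 60), ("P", 12)]

# A toy "book" built with the old table, keyed by hash.
book = {zobrist_key(position, old_table): "e2e4"}

# The same table finds the entry.
print(zobrist_key(position, old_table) in book)  # True
# With a different table the same position hashes to an unrelated key,
# so a book built with the old numbers is effectively unreadable.
print(zobrist_key(position, new_table) in book)
```

The position itself is unchanged; only the numbers used to hash it differ, which is enough to make every stored key useless.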