Bob: Crafty improvements 23.2 / 23.3 / 23.4

Discussion of anything and everything relating to chess playing software and machines.


Frank Quisinsky
Posts: 6808
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky


Post by Frank Quisinsky »

Hello Bob,

if I remember correctly ...

1.
You wrote that Crafty 23.3 is around 55 Elo stronger than Crafty 23.2.
At 40 moves in 10 minutes, SWCR shows +44 Elo.

2.
You wrote that Crafty 23.4 is around 15 Elo stronger than Crafty 23.3.
At 40 moves in 10 minutes, SWCR shows +38 Elo, but there are not enough games yet (275 games played, 725 games still to be played by December 14th, 2010).

OK, theory and practice, but ...
It seems you test your engine with very short time controls. It is possible that your testing can't show larger improvements that only appear at longer time controls.

This could be a bigger problem with all the changes you have made to Crafty in recent years, because I think Crafty has the potential for a much higher rating.

Now ...
If your thousands of very fast games show that Crafty 23.4 is 15 Elo stronger than Crafty 23.3, you have a good chance to find out more.

Because it seems that Crafty 23.4 is much more than 15 Elo stronger than Crafty 23.3. It is possible that one of your latest changes gave Crafty a bigger step than you expected from your test results.

Crafty SWCR results so far ...

Rank Name                          Elo    +    - games score oppo. draws
  74 Crafty 23.4 JA x64           2638   35   36   275   38%  2720   33% NEW +  38 ELO
  93 Crafty 23.3 JA x64           2600   18   18  1120   35%  2713   35% 
 100 Crafty 23.3 JA w32           2566   19   20   960   31%  2705   32% 
 103 Crafty 23.2 JA x64           2556   18   19  1120   28%  2720   30% 
 105 Crafty 23.3 JA x64 NP        2553   21   21   880   27%  2722   32% (test without ponder) 
Thinking out loud ...
But let's wait for more games ...

Best
Frank

PS: For the x64 version, after 275 games ...
No game lost on time. The w32 version of Crafty 23.3 had more problems with losses on time; I will test the w32 version of Crafty 23.4 after the currently running SWCR-64 tourney. Time management looks very, very good in Crafty 23.4 x64!
Engin
Posts: 918
Joined: Mon Jan 05, 2009 7:40 pm
Location: Germany
Full name: Engin Üstün

Re: Bob: Crafty improvements 23.2 / 23.3 / 23.4

Post by Engin »

That is why I don't talk about differences between versions: you can't really say. It has to be proven at different time controls and with many, many games played.

It is not enough to test with a few games at a short time control to see whether the new version makes progress or not.

I only trust tournament results at longer time controls, but I don't have the time to test this with, let's say, 1000 or 10000 games ;)
Frank Quisinsky
Posts: 6808
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: Bob: Crafty improvements 23.2 / 23.3 / 23.4

Post by Frank Quisinsky »

Hi Engin,

I don't know who spread the story that we need thousands of games for a good rating. First question: what is a good rating?

I have an Elo development curve for each of the 98 tested engines.

After around 520 games, only Naum 4.1 w32 and GullChess 1.0a x64 (statistical noise) moved by more than 20 Elo by game 1,000 or later.

+/-10 is, in my opinion, good enough for a rating.
I think it makes no sense to play 5,000 games for a +/-5 rating if I can play 500 games for a +/-10 rating.

That means, according to my statistics, only in 1 of 52 cases does an engine's rating move by more than 20 Elo after game 500!

What matters for such statistics is that games are played against many different engines. That is much more important than the number of games.
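The +/-10 versus +/-5 trade-off is about error bars, which shrink with the square root of the number of games. A rough sketch, assuming independent games against roughly equal opposition, a 30% draw rate and a simple normal approximation (not the rating model that produced the SWCR table above):

```python
import math

def elo_margin_95(games: int, score: float = 0.5, draw_rate: float = 0.3) -> float:
    """Approximate 95% error bar, in Elo, for a rating measured over `games` games.

    Assumptions: independent games, the given average score and draw rate,
    a normal approximation, and the delta method to map score error to Elo.
    """
    win_rate = score - draw_rate / 2.0            # score = wins + draws / 2
    loss_rate = 1.0 - win_rate - draw_rate
    if min(win_rate, loss_rate) < 0 or not 0 < score < 1:
        raise ValueError("score and draw_rate are inconsistent")
    # Per-game variance of the result (1, 0.5 or 0 points).
    variance = win_rate + 0.25 * draw_rate - score ** 2
    score_se = math.sqrt(variance / games)
    # Slope of the logistic Elo curve Elo = -400 * log10(1/score - 1) at this score.
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return 1.96 * score_se * slope

for n in (500, 2000, 5000, 30000):
    print(f"{n:6d} games: +/- {elo_margin_95(n):4.1f} Elo")
```

Under these assumptions, quadrupling the number of games halves the margin, so going from +/-10 to +/-5 costs roughly four times as many games. The absolute values depend heavily on the draw rate and on how many different opponents are involved.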

In the currently running SWCR-64 tourney, the interesting rating-building phase now begins for each of the eleven new engines I picked. At the moment the new engines have played 275 games :-)

I agree that different time controls will give us different ratings. I think a handful of the TOP-24 engines will have a higher rating at longer time controls (for example 40 in 120), perhaps by 30 Elo, and a handful of other engines will have a lower rating at longer time controls, perhaps also by 30 Elo. But nobody can test this in detail; a group of people would need around 40 quad-core machines to test such things.

But we do know, or rather we can suppose, what happens if we compare, for example, 40 in 2 with 40 in 10. I think so!

Best
Frank

PS: This is a big topic in German chess forums among others who maintain such lists. There are many different opinions there. But I look at my statistics and I am happy with them ... even more with the games still being played; I prefer to watch the running games when I have time.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Bob: Crafty improvements 23.2 / 23.3 / 23.4

Post by bob »

Frank Quisinsky wrote:
It seems you test your engine with very short time controls. It is possible that your testing can't show larger improvements that only appear at longer time controls.

Remember that my testing is very much unlike what you are doing. I am forcing Crafty to play all sorts of openings, openings I would never want it to play in real games. My intent is to improve its overall play in all positions, and then for tournaments we tune the book to take it into positions it plays best.

It is quite possible that by avoiding some oddball openings, it would show up as stronger than what I report here. And I do, on occasion, run long time controls. I have a 30+30 run going now since there is no other testing "in the works" at the moment...

The opening book really has a significant impact on results, but everyone has known that for years. But for my testing, I am not trying to test our book tuning/learning at all.

I might try a "book on" test here next to use a normal opening book + learning, to see what happens, just for fun. But I like the consistent set of positions we use without the added noise introduced by tuned variations and learning...
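A sketch of the kind of schedule described here: the test version plays a fixed set of forced starting positions against each gauntlet opponent, with no book choice. Playing every position once with each color is an assumption in this sketch (a common way to balance unequal openings, not something the post specifies), and the engine, opponent and position names are placeholders.

```python
from itertools import product
from typing import List, NamedTuple

class Game(NamedTuple):
    white: str
    black: str
    position: str  # forced starting position, e.g. a FEN string

def gauntlet(engine: str, opponents: List[str], positions: List[str]) -> List[Game]:
    """Schedule every forced starting position against every opponent,
    once with each color, so the tested version cannot avoid an opening."""
    games = []
    for opponent, position in product(opponents, positions):
        games.append(Game(engine, opponent, position))  # test version has White
        games.append(Game(opponent, engine, position))  # same position, colors reversed
    return games

schedule = gauntlet("crafty-test", ["opp-1", "opp-2", "opp-3"], ["pos-1", "pos-2"])
print(len(schedule))  # 3 opponents x 2 positions x 2 colors = 12 games
```

Running the identical schedule for version A and version A' keeps the set of positions constant, so book tuning and learning add no noise to the comparison.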
Frank Quisinsky
Posts: 6808
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: Bob: Crafty improvements 23.2 / 23.3 / 23.4

Post by Frank Quisinsky »

Hi Bob,

Very interesting what you wrote about your testing method.
It seems logical and full of good ideas.

OK, now I will wait for my final results.
At the end of the day, the Prof has the best results :-)

But if Crafty is still around 40 Elo stronger after 1,000 SWCR games, I think it is possible that one of your latest changes will give Crafty a little push toward a higher rating at longer time controls.

For months I have been reading what you wrote in the OpenChess forum about LMR / null-move settings. It was a big posting with very interesting test results of your own; I have never seen such a posting from a chess programmer. If I understand correctly, it is not easy to find a setting here that is good for both shorter and longer time controls. I was thinking that if you changed something here again in the newer Crafty 23.4 (I am not sure that you did), it could be good for longer time controls.

So I wrote my TalkChess message on this topic.

Best
Frank
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Bob: Crafty improvements 23.2 / 23.3 / 23.4

Post by bob »

Frank Quisinsky wrote:
I don't know who spread the story that we need thousands of games for a good rating. First question: what is a good rating?
...
I agree that different time controls will give us different ratings.

It all depends on _what_ you are trying to measure.

In your case, you want to know whether program A is stronger than B or not, and usually the difference between A and B is more than "just a few Elo". That doesn't take as many games as what I am measuring, which is which version is better, A or A', where A' is a minor modification to A. That takes tens of thousands of games because the Elo gain is often 1, 2, 3..., and a +/- 10 error bar is useless for measuring that.
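To put numbers on why a 1 to 3 Elo change needs tens of thousands of games, the error-bar estimate can be inverted to give the games required for a target margin. This is a rough sketch, not the actual test tooling: it assumes both versions play the same number of independent games against the same gauntlet, score near 50% with a 30% draw rate, and are rated independently, so the variance of their difference is twice that of a single rating.

```python
import math

def games_needed(margin_elo: float, draw_rate: float = 0.3, z: float = 1.96) -> int:
    """Games each version must play so that the z-sigma error bar on
    Elo(A') - Elo(A) is about `margin_elo`.

    Assumptions: independent games, scores close to 50%, and two independent
    rating measurements, so the variance of their difference doubles.
    """
    score = 0.5
    variance = (1.0 - draw_rate) / 4.0                      # per-game variance at a 50% score
    slope = 400.0 / (math.log(10) * score * (1.0 - score))  # d(Elo)/d(score) at 50%
    sd_elo_one_game = math.sqrt(variance) * slope
    return math.ceil(2.0 * (z * sd_elo_one_game / margin_elo) ** 2)

for margin in (10, 5, 3, 1):
    print(f"+/- {margin:2d} Elo on the difference: {games_needed(margin):,} games per version")
```

The required count grows with the inverse square of the margin, which is why resolving a gain of a few Elo quickly pushes the game count well past what a +/- 10 rating list needs.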

As far as what type of game time to use, again, this is about what are you trying to measure.

I have found, with hundreds of millions of games of testing now, that if you compare A and B at a fast time control (by playing against a gauntlet of different opponents), and then you compare A and B at a slow time control against the same opponents, in general the value abs(Elo(A) - Elo(B)) is pretty constant. You might get 2600 and 2640 in slow games and 2700 and 2740 in fast games, but I don't care about the absolute Elo; I only care which version plays better. And when we have proven (by actually running the test using both fast and slow games, not by guessing) that the rating difference between the two versions doesn't change as the time control is varied, even if the absolute Elo values do change, that is certainly justification for relying mainly (not exclusively, however) on fast games, because the testing can be completed so much more quickly.

It does work, even if your testing shows that the rating improvements are larger (or smaller) than what I report. A positive slope is all I want.
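The criterion here compares the gap between two versions, not their absolute ratings. A small sketch of that check, using the illustrative numbers from the post and hypothetical +/- 5 Elo (95%) error bars on each gap: the absolute ratings shift by 100 Elo between time controls, yet the gaps agree, and a positive gap at both time controls is the "positive slope" being looked for.

```python
import math

def differences_agree(diff_fast: float, margin_fast: float,
                      diff_slow: float, margin_slow: float, z: float = 1.96) -> bool:
    """True if two measured Elo differences agree within their combined z-sigma margin."""
    # Convert each 95% margin back to a standard error and combine them in quadrature.
    combined_se = math.hypot(margin_fast / z, margin_slow / z)
    return abs(diff_fast - diff_slow) <= z * combined_se

fast_diff = 2740 - 2700   # Elo(A') - Elo(A) at the fast time control
slow_diff = 2640 - 2600   # the same two versions at the slow time control
print(fast_diff, slow_diff, differences_agree(fast_diff, 5, slow_diff, 5))
```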
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Bob: Crafty improvements 23.2 / 23.3 / 23.4

Post by bob »

Frank Quisinsky wrote:
But if Crafty is still around 40 Elo stronger after 1,000 SWCR games, I think it is possible that one of your latest changes will give Crafty a little push toward a higher rating at longer time controls.

Just don't forget the book. For example, suppose Crafty simply cannot handle positions where it is Black and has played g6 and Bg7. If I don't address that (it actually doesn't have a serious problem; this is just an example), then in my testing the overall result will not show as much improvement when I change something like pawn structure code or whatever, because my testing _forces_ Crafty to play some openings with g6/Bg7 and my new changes don't help those games at all. But in a real tournament, we would simply not play openings that lead to g6/Bg7, and our rating would show a stronger jump than in my testing. For my testing, though, this is fine, because over time we slowly address every weakness we see, so that we play better when we are forced to play openings we would prefer to avoid, when someone wants to do a thematic tournament or whatever...