Series Chess programs Test! Share your opinion

playwaycool

Post by playwaycool »

Hey guys,


I believe I brought this topic up in the past, but I haven't seen anyone take it into account!



I have seen a lot of folks run tests to measure the overall strength of a chess engine. Some used a general opening book, and others used a sort-of tuned book or the Fritz/Shredder or Hiarcs opening books.



Here is a list of suggestions that I believe will make testing more interesting to watch, and that could lead to some improvement in a program's playing strength! Otherwise we test the same programs over and over, Rybka always comes out on top, and every engine tournament gives similar results. That alone can be boring to some of us; it would be worth it if we could find a slight improvement ;)



List for testing:

1) The use of 3-4-5-6 piece Nalimov tablebases. (Okay, it is understandable that the 6-piece set is hard to get because of its large size, but it will certainly improve the endgame and solve most endings. It would also be interesting to see whether a few engines take better advantage of it than others.) It may improve engines by maybe 3-5 Elo, if not slightly more!


2) Hand-tuned opening book. (This is very necessary. Honestly, I believe people who use a general opening book for testing will not get a definitive picture of a program's strength. Think about it this way: a human player in an over-the-board tournament has his own openings that differ from those of the other players. Someone needs to create or find openings that fit each program's style exactly!!)




3) Program settings! (For example Rybka or Hiarcs. It is also worth trying different settings; I am certain there are some hidden settings that are better than the defaults that come with the program. Someone can try that by running engine tournaments, and after enough testing an improvement will show up!)





4) Contempt. (Now this is important, and I haven't seen folks share games or tournament results from experimenting with contempt, although it's recommended to leave it at 0. I believe it has an impact: you can use a + value for a weaker program so that it favors draws, especially when it is playing stronger programs, and the opposite, a - value, for stronger programs to see whether they can win more and avoid draws against weaker or medium-strength programs!)





I would like everyone's opinion. I believe running all four kinds of tests will give more interesting results. I will be trying it myself to see. As for hardware, that is not much of an issue: we can assume all programs are running on quads and 64-bit Windows. So this is an even stronger push to see the real strength of all the programs out there!
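Since the gains discussed in this post are only a few Elo, it helps to see how precisely a match result pins down an Elo difference. The following is a minimal sketch, assuming the standard logistic Elo formula and a normal approximation for the error margin. The function names are made up for illustration and do not come from any testing tool.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for a match seen from one engine's side.

    Uses the per-game variance of the score and a normal approximation;
    z=1.96 is roughly a 95% interval. Assumes the interval stays inside (0, 1).
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Variance of a single game result around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return (elo_from_score(score),
            elo_from_score(score - z * se),
            elo_from_score(score + z * se))

elo, low, high = elo_with_margin(wins=40, draws=20, losses=40)
print(round(elo), round(low), round(high))
```

Under these assumptions a level 100-game match (40 wins, 20 draws, 40 losses) has a margin of roughly ±60 Elo, so confirming a 3-5 Elo gain would take far more games than a single small tournament provides.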
playwaycool

Re: Series Chess programs Test! Share your opinion

Post by playwaycool »

Heh, I guess the title is probably confusing, my bad, or the topic could be boring :roll:
Dann Corbit
Posts: 12803
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Series Chess programs Test! Share your opinion

Post by Dann Corbit »

playwaycool wrote: Hey guys,


I believe I brought this topic up in the past, but I haven't seen anyone take it into account!



I have seen a lot of folks run tests to measure the overall strength of a chess engine. Some used a general opening book, and others used a sort-of tuned book or the Fritz/Shredder or Hiarcs opening books.



Here is a list of suggestions that I believe will make testing more interesting to watch, and that could lead to some improvement in a program's playing strength! Otherwise we test the same programs over and over, Rybka always comes out on top, and every engine tournament gives similar results. That alone can be boring to some of us; it would be worth it if we could find a slight improvement ;)



List for testing:

1) The use of 3-4-5-6 piece Nalimov tablebases. (Okay, it is understandable that the 6-piece set is hard to get because of its large size, but it will certainly improve the endgame and solve most endings. It would also be interesting to see whether a few engines take better advantage of it than others.) It may improve engines by maybe 3-5 Elo, if not slightly more!
All experiments I have seen to date show no strength improvement for EGTB. There is perhaps 50 Elo for bitbase files. It appears that the cost of the lookup is about equal to the cost of the calculation. Also, you have to get to the endgame for them to come into play. Most games are decided before the board is clear.


2) Hand-tuned opening book. (This is very necessary. Honestly, I believe people who use a general opening book for testing will not get a definitive picture of a program's strength. Think about it this way: a human player in an over-the-board tournament has his own openings that differ from those of the other players. Someone needs to create or find openings that fit each program's style exactly!!)
There are lots of book experiments. You can read about some on the testing forum just around the corner.




3) Program settings! (For example Rybka or Hiarcs. It is also worth trying different settings; I am certain there are some hidden settings that are better than the defaults that come with the program. Someone can try that by running engine tournaments, and after enough testing an improvement will show up!)
People have also tried this. ChessMaster has been tested very extensively. Open source code for personalities can be found in the Beowulf code base.





4) Contempt. (Now this is important, and I haven't seen folks share games or tournament results from experimenting with contempt, although it's recommended to leave it at 0. I believe it has an impact: you can use a + value for a weaker program so that it favors draws, especially when it is playing stronger programs, and the opposite, a - value, for stronger programs to see whether they can win more and avoid draws against weaker or medium-strength programs!)
Contempt is not very important and while it may win you a few Elo I don't think it is very interesting from a scientific standpoint.




I would like everyone's opinion. I believe running all four kinds of tests will give more interesting results. I will be trying it myself to see. As for hardware, that is not much of an issue: we can assume all programs are running on quads and 64-bit Windows. So this is an even stronger push to see the real strength of all the programs out there!
playwaycool

Re: Series Chess programs Test! Share your opinion

Post by playwaycool »

Yes, I mostly agree with you. However, regarding the tablebases, it makes sense to have them. Let's not think about how many Elo they will add; it wouldn't do any harm to have 3-4-5-6, and the 6-piece set is good for solving more complex endings. I believe that in the Rybka match against Zappa, Rybka missed a mate in 40! If it had had the 6-piece tablebases it would have seen it. So it's clearly worth having them.



Speaking of opening books: we both seem to agree that there are experts on those! But I think you missed my point. I haven't seen anyone run an engine tournament that allows a custom opening book for each engine, instead of using one custom or general opening book for all engines in the tournament. Each engine needs to use a separate custom book that fits the style of the engine, just like in international computer events or human OTB tournaments.




With my idea about contempt, if we can find a good adjustment it may gain a few Elo. Along with finding interesting settings for each engine, which may add 5-10 Elo (someone found good settings for Toga, if I recall), this adds to the other things we modify for a program, and the opening book alone is a huge factor. Part of the reason Shredder and Fritz are strong programs is their strong opening books! But there is still room to improve a chess program further by using a highly advanced opening book, assuming it is already at the top of the rankings and uses multiple processors! (Openings have a big influence on how middlegames will be played.)


Thanks for your views Dann!
Graham Banks
Posts: 44891
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: Series Chess programs Test! Share your opinion

Post by Graham Banks »

Fiddling with the contempt settings is generally a recipe for disaster.
Whilst it may enable a better overall result against much weaker engines, it will backfire against those of higher strength, similar strength or even slightly lower.
gbanksnz at gmail.com
Dann Corbit
Posts: 12803
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Series Chess programs Test! Share your opinion

Post by Dann Corbit »

playwaycool wrote: Yes, I mostly agree with you. However, regarding the tablebases, it makes sense to have them. Let's not think about how many Elo they will add; it wouldn't do any harm to have 3-4-5-6, and the 6-piece set is good for solving more complex endings. I believe that in the Rybka match against Zappa, Rybka missed a mate in 40! If it had had the 6-piece tablebases it would have seen it. So it's clearly worth having them.
That assumes it would not have lost the game because of spending a ton of time searching through EGTB files and missing some easy tactic because of it. I will believe that they are helpful when I see an experiment that demonstrates it. I do believe that if you have 100 GB of RAM and make the whole pile memory resident, it will simply have to help. But it will be a while before that much RAM becomes cheap.


Speaking of opening books: we both seem to agree that there are experts on those! But I think you missed my point. I haven't seen anyone run an engine tournament that allows a custom opening book for each engine, instead of using one custom or general opening book for all engines in the tournament. Each engine needs to use a separate custom book that fits the style of the engine, just like in international computer events or human OTB tournaments.
I suppose that you are talking about matching up against certain opponents (e.g. your database says that Shredder plays poorly against the French, so play the French when possible against Shredder). That could easily be handled via statistical collection. I suppose that it is interesting from a competition point of view, but not that interesting from a scientific one. You are just exploiting bugs in the other program's opening book or eval or search. I would rather concentrate on making my program better than looking for weaknesses in other programs.
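The "statistical collection" mentioned here could look something like the sketch below. It assumes game records are available as (opponent, opening, result) tuples, where result is our own score of 1, 0.5 or 0. The record format and both function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def score_by_opening(games):
    """Tally results per (opponent, opening).

    games: iterable of (opponent, opening, result), result being 1, 0.5 or 0.
    Returns {(opponent, opening): (games_played, average_score)}.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for opponent, opening, result in games:
        entry = totals[(opponent, opening)]
        entry[0] += 1        # games played
        entry[1] += result   # points scored
    return {key: (n, pts / n) for key, (n, pts) in totals.items()}

def best_opening_against(stats, opponent, min_games=1):
    """Opening with the highest average score against one opponent, or None."""
    candidates = [(avg, opening)
                  for (opp, opening), (n, avg) in stats.items()
                  if opp == opponent and n >= min_games]
    return max(candidates)[1] if candidates else None

# Made-up records, purely to show the call pattern (not real results).
sample = [("Shredder", "French", 1), ("Shredder", "French", 0.5),
          ("Shredder", "Sicilian", 0), ("Hiarcs", "French", 0.5)]
stats = score_by_opening(sample)
print(best_opening_against(stats, "Shredder"))
```

The `min_games` threshold is there because an average over a handful of games says very little; a real collection would need many games per opening before the choice means anything.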




With my idea about contempt, if we can find a good adjustment it may gain a few Elo. Along with finding interesting settings for each engine, which may add 5-10 Elo (someone found good settings for Toga, if I recall), this adds to the other things we modify for a program, and the opening book alone is a huge factor. Part of the reason Shredder and Fritz are strong programs is their strong opening books! But there is still room to improve a chess program further by using a highly advanced opening book, assuming it is already at the top of the rankings and uses multiple processors! (Openings have a big influence on how middlegames will be played.)
The whole idea of contempt is simply not interesting to me, but I imagine that it is very interesting to others. Most tournaments tend to be largely against equal or nearly equal players. The few players who would get a large contempt factor are almost certainly going to lose anyway, so I do not see how a big benefit can come from it.

Thanks for your views Dann!
playwaycool

Re: Series Chess programs Test! Share your opinion

Post by playwaycool »

I agree with Graham that contempt could be a disaster, but it needs careful handling and only small adjustments. Someone can certainly take a risk with it for weaker programs against stronger opponents.
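The reasoning behind favoring or avoiding draws can be made concrete with the standard Elo expected-score formula. This is only a sketch of the arithmetic, not of how any particular engine implements contempt, and the function names are made up for illustration.

```python
def expected_score(elo_diff):
    """Expected score for a player rated elo_diff points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def draw_gain(elo_diff):
    """Points a draw gains (positive) or costs (negative) versus the expectation."""
    return 0.5 - expected_score(elo_diff)

for diff in (-200, -50, 0, 50, 200):
    print(diff, round(expected_score(diff), 3), round(draw_gain(diff), 3))
```

For a program 200 Elo weaker than its opponent the expected score is about 0.24, so a draw is a clear gain; for the stronger side it is an equal loss. Near a rating difference of 0 the gain is close to zero, which fits the observation in this thread that contempt matters little between near-equal engines.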



Okay, Dann. Let me give you an example of what I mean, which should make it easier to understand. http://64.68.157.89/forum/viewtopic.php ... 18&t=18391

I will use an easier example. Let's say we are running an engine tournament with 6 programs: 1) Rybka 2.3.2a 32 MP 2) Rybka 2.3.2a 64 MP 3) Zappa Mexico X64 4) Hiarcs 11.2 MP 5) Deep Shredder 11 X64 6) Deep Gandalf 7 (Deep Gandalf was in beta testing and played in an international event).

Say we are using the same time control, 4m+2. It is probably wiser to try a much longer time control, but the idea remains the same for the testing I am suggesting.


Also, let's say we are using EGTB 3-4-5-6 and the tournament is a round robin of 10 rounds (150 games).
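The 150-game figure can be checked with a short sketch: 6 engines give 15 distinct pairings, and 10 rounds of those pairings give 150 games. The function name and the colour-alternation rule are assumptions made for illustration, not something specified in the post.

```python
from itertools import combinations

def round_robin_games(engines, rounds):
    """List (white, black) games: each pair meets once per round,
    with colours swapped on every other round."""
    games = []
    for r in range(rounds):
        for a, b in combinations(engines, 2):
            games.append((a, b) if r % 2 == 0 else (b, a))
    return games

engines = ["Rybka 2.3.2a 32 MP", "Rybka 2.3.2a 64 MP", "Zappa Mexico X64",
           "Hiarcs 11.2 MP", "Deep Shredder 11 X64", "Deep Gandalf 7"]
games = round_robin_games(engines, rounds=10)
print(len(games))  # 15 pairings x 10 rounds = 150
```

Each engine plays 50 games in this schedule, 25 of them with White.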



I know some programs do not support all 8 CPUs yet, so let's use more uniform hardware: let's assume all programs will be running on fast Intel quads!




Now to the critical part. Daniel used Sedat Perfect X Beta as the general opening book for all 6 programs, and I am assuming he wanted to see how the 6 programs would perform on 8 CPUs and how they would place.
I am sure it's good to use such a great general opening book, with exactly the same tablebases and hardware, to see which program leads the field. The results look fairly reasonable.



But they do not show the true strength of each program. Now, if I were to run the test, the two Rybkas would use an opening book customized for Rybka's style, so that it plays its own openings. The reason I added two Rybkas is that we can edit the settings for the 32-bit Rybka and use a slightly - contempt value (Rybka is the strongest, so setting the value to avoid draws is reasonable!), or we can try that with the 64-bit one and leave the other at default!




Now with Hiarcs, Gandalf, Shredder and Zappa Mexico, we will use different opening books that fit their styles, so for all 6 programs we will use 5 different custom books (assuming they are very advanced books and each plays best with its engine!). We can even create duplicates and allow more than 6 entries, to try various settings: we leave one copy at default, modify the duplicate, and compare which one performs better!


We are also assuming all 6 programs support all the 3-4-5-6 tablebases. With Deep Gandalf we can use a high contempt value (to favor draws) and tweak its settings, because we know the other 5 should score higher than Deep Gandalf!
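The setup described so far can be written down as a small configuration sketch. Everything concrete in it is a placeholder: the book file names, the contempt numbers and the `EngineEntry` and `tuned_copy` names are invented for illustration. The contempt sign follows the convention used in this thread (+ favors draws, - avoids them), which may be the opposite of what a given engine's own option uses.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class EngineEntry:
    name: str
    book: str                    # placeholder file name, not a real book
    contempt: int = 0            # thread's convention: + favors draws, - avoids them
    tablebases: str = "3-4-5-6"  # same tablebase set for every entry

def tuned_copy(entry, label, **changes):
    """Duplicate an entry under a new name with some settings changed."""
    return replace(entry, name=f"{entry.name} [{label}]", **changes)

rybka_book = "rybka_style.book"  # shared by both Rybka builds
field = [
    EngineEntry("Rybka 2.3.2a 32 MP", rybka_book, contempt=-10),       # placeholder value
    EngineEntry("Rybka 2.3.2a 64 MP", rybka_book),                     # left at default
    EngineEntry("Zappa Mexico X64", "zappa_style.book"),
    EngineEntry("Hiarcs 11.2 MP", "hiarcs_style.book"),
    EngineEntry("Deep Shredder 11 X64", "shredder_style.book"),
    EngineEntry("Deep Gandalf 7", "gandalf_style.book", contempt=30),  # placeholder value
]
# Optional extra entry: a tuned duplicate to compare against its default twin.
field.append(tuned_copy(field[3], "tuned", contempt=5))

print(len({e.book for e in field}))  # 5 distinct books
```

Keeping the default entry and its tuned duplicate in the same field is what allows the head-to-head comparison of settings suggested above.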




The testing should be highly interesting. What we did is simple: each engine uses a different opening book that we know improves it slightly, and we may also have found settings with which a program plays stronger than at default. Rybka will not have an easy ride; it will still finish first, but all the programs will put up a strong fight, including Gandalf!



Do you guys see my point of view? Why use a general opening book for every engine in a tournament, when each engine could use a different opening book that fits its style and lets it play better?
playwaycool

Re: Series Chess programs Test! Share your opinion

Post by playwaycool »

One additional note carrying on from my previous post. A chess program's strength is not based only on how strong it is in middlegames and/or endings. Its opening play is a big factor too. This is a big factor in top grandmaster games: the difference between a super GM and a GM could be opening preparation! It's the same with chess programs!


That being said, if a chess program likes to play tactically or favors tactical positions, then choose openings that lead to tactical shots. If it favors purely positional games, then choose positional openings, and the list goes on for the different styles of programs. You don't want to play a tactical opening when your program is known to play better positionally!



Therefore a general opening book doesn't fit well at all; you would feel the engine is lacking something, and it may lose more games than it should!
Dann Corbit
Posts: 12803
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Series Chess programs Test! Share your opinion

Post by Dann Corbit »

Many, perhaps most chess tournaments use the program's own books which (presumably) have been tuned for that particular engine.

The value of opening books is well known and that is why the big chess contests consider both the book writers and the chess programmers as co-authors.

All chess engines like tactics, though some are smart enough to know you should develop first before looking for tactical insights.

I am not sure that what you want to do is not already being done. There are some people who put a good deal of energy into contempt calculation. There are lots of people who spend endless hours of book-building (I have built a few books myself, though I am not as good at it as other people are).

The SSDF (for instance) uses native books when possible instead of generic books.

I think that there is interest and value in just about any sort of chess contest. If you have something very specific in mind (somehow, I only get generalities from your descriptions) then why not set it up yourself and see how it comes out.
playwaycool

Re: Series Chess programs Test! Share your opinion

Post by playwaycool »

Thank you for the replies Dann! I was hoping for more folks to share their opinions as well.


I think the point here is that I was hoping to see folks share results on the CCC forum from different kinds of testing, instead of the ordinary kind where a single generic opening book is used for all the programs in an engine tournament!



This is a hobby for most people here, with the exception of some who take it to a serious level. To share a little fact: take Fritz, for instance. Fritz does perform better when it plays its own book. If you put Fritz 11 in a tournament among others with a general book, it will not perform as well. So I think that is saying something.




Bottom line: if anyone is able to take requests, run such tests, and share the results with us, that would be nice to see!



I am not speaking for authors or programmers here; I am speaking in terms of testing and finding a useful improvement with what we have right now, for hobby/fun. Programmers have their own top-of-the-line hardware and use tournament books for international events.


Yes, you've got a point there 8-) I will test it myself, but I was waiting to get a new CPU and hard drive for the TBs. I was hoping anyone with a good CPU would be able to share results.