Question for the all the TD's out there

opraus
Posts: 166
Joined: Wed Mar 08, 2006 9:49 pm
Location: S. New Jersey, USA

Question for the all the TD's out there

Post by opraus »

Could/would/have you categorize[d] the engines by strengths and weaknesses, e.g. in 'Activity', 'King Attacks', 'Passers', 'Endings'?

And, does NNSSL.epd [Dann's SherwinSilver suite] have any 'pattern' to it? And if not, could it be sorted by opening 'class'?

If we [authors] could tell, e.g., that we do worst against strong king attackers, especially in q-side games ... it would help to focus on certain areas of eval().

E.g., you take the outcome of a tournament division and compare each engine's results against, say, the 'King Slayers' of the bunch, then against the 'Pawn Pushers', and again against some other category of 'known strength' players.
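
A minimal sketch of that breakdown, assuming the tournament results are already available as (white, black, result) tuples and that someone has hand-labelled each engine with a style. The engine names, the labels and the `STYLE` mapping are all made up for illustration; none of them come from a real tournament.

```python
from collections import defaultdict

# Hypothetical style labels; in practice a TD would supply these by hand.
STYLE = {
    "EngineA": "King Slayer",
    "EngineB": "Pawn Pusher",
    "EngineC": "King Slayer",
}

# Points scored by White for each PGN result string.
WHITE_POINTS = {"1-0": 1.0, "1/2-1/2": 0.5, "0-1": 0.0}


def score_by_opponent_style(games, engine, style=STYLE):
    """Return {style: (points, games)} for `engine` against each opponent style."""
    totals = defaultdict(lambda: [0.0, 0])
    for white, black, result in games:
        if result not in WHITE_POINTS or engine not in (white, black):
            continue  # skip unfinished games and games the engine did not play
        white_pts = WHITE_POINTS[result]
        if engine == white:
            opponent, pts = black, white_pts
        else:
            opponent, pts = white, 1.0 - white_pts
        bucket = totals[style.get(opponent, "Unlabelled")]
        bucket[0] += pts
        bucket[1] += 1
    return {name: (pts, n) for name, (pts, n) in totals.items()}


# Made-up results for a hypothetical "EngineZ".
games = [
    ("EngineZ", "EngineA", "0-1"),
    ("EngineA", "EngineZ", "1/2-1/2"),
    ("EngineZ", "EngineB", "1-0"),
    ("EngineC", "EngineZ", "1-0"),
]
print(score_by_opponent_style(games, "EngineZ"))
```

The hard part is not the counting but the labels: the numbers are only as good as the TD's judgement of who is a 'King Slayer' and who is a 'Pawn Pusher'.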

Maybe do the same with regard to opening 'kinds'.

This way, from the results of a single tournament, we could see much more than 'Engine Z got better!' We could say 'Engine G got better in closed positions' or 'with regard to king safety', simply by parsing out the results of a single tournament.

But we would need a reasonable sense of who was better at what. [Which, I imagine, many of you already have.]

Thoughts?

Thanks.

David
Graham Banks
Posts: 44606
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: Question for the all the TD's out there

Post by Graham Banks »

This would make it too much like work and less like fun for many, I suspect.

I can tell you that with later versions of Xpdnt, there is some kind of bug that kicks in at longer time controls that causes Xpdnt to give up its queen for nothing, often on h2 or h7.
This doesn't seem to happen early in the search and therefore Xpdnt gets quite good blitz results.

Regards, Graham.
gbanksnz at gmail.com

Re: Question for the all the TD's out there

Post by opraus »

Hi Graham,

Yes, true. Maybe this was more a 'provocation' for ideas ...

With regard to my first question, at least: could you say which engines are better at negotiating king attacks, endgames, or piece activity?

I might then use Scid to compare results of Xpdnt to various 'sorts' of engines within a tournament of otherwise similar strengths.

Thanks too for the heads up about the Queen thingy :)

Is that the latest version too, or too early to say yet?

-David

Re: Question for the all the TD's out there

Post by Graham Banks »

I've not used the latest version of Xpdnt yet, so can't say if the queen giveaway is still a problem, but it affected all releases between 061030 and the latest.
Something you introduced or changed obviously caused it.

With 5 tourneys and 3 gauntlets on the go at once, I must say I don't get the same chance as I used to of studying the games, so categorising strengths and weaknesses is more difficult these days.

Xpdnt has quite a nice style of play, but opens itself up too much and loses from better positions as a consequence. It gets caught at the back so to speak.
It must value mobility and control quite highly because it seems happy to cede a slight material imbalance for control.
If you can tighten up on that and solve the queen giveaway (if not already done), you should get better results.

With regard to styles of other engines, I could only make such general comments based on my observations and memory. :P

Regards, Graham.
gbanksnz at gmail.com
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Question for the all the TD's out there

Post by Michael Sherwin »

opraus wrote:And, does NNSSL.epd [Dann's SherwinSilver suite] have any 'pattern' to it? And if not, could it be sorted by opening 'class'?

As a programmer I was not happy with the way my program tested with the Silver set, so I decided to create my own set of positions for testing. The philosophy behind my set was simple.

1) at most only one set of pawns exchanged
2) all pieces remaining on the board
3) many pawn formations
4) many different opening lines (that do not violate 1, 2 or 3)
5) interesting early opening to late middle game positions
6) any position winnable or losable against other engines that are close in strength
7) positions that do not require a lot of knowledge to play well, positions that do, and everything in between
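
Rules 1 and 2 are mechanical enough to check from a FEN string. The sketch below is only one possible reading, not the procedure actually used to build the set: it takes "at most one set of pawns exchanged" to mean that each side still has at least seven pawns, and "all pieces remaining" to mean that both sides still have exactly their starting non-pawn material. The function name and `FULL_SET` are made up for illustration.

```python
# Non-pawn material each side starts with.
FULL_SET = {"k": 1, "q": 1, "r": 2, "b": 2, "n": 2}


def fits_material_rules(fen):
    """True if no piece has left the board and each side has lost at most one pawn."""
    placement = fen.split()[0]  # first FEN field: piece placement
    for side in (str.upper, str.lower):  # White uses upper case, Black lower case
        for piece, start_count in FULL_SET.items():
            if placement.count(side(piece)) != start_count:
                return False  # rule 2 (assumed reading): a piece is missing or extra
        if placement.count(side("p")) < 7:
            return False  # rule 1 (assumed reading): more than one pawn gone
    return True


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
# After 1.e4 d5 2.exd5 Qxd5: one pawn each has gone, all pieces remain.
scandinavian = "rnb1kbnr/ppp1pppp/8/3q4/8/8/PPPP1PPP/RNBQKBNR w KQkq - 0 3"
# A position where each side is a knight short.
knights_off = "r1bqkbnr/pppppppp/8/8/8/8/PPPPPPPP/R1BQKBNR w KQkq - 0 5"

for fen in (start, scandinavian, knights_off):
    print(fits_material_rules(fen), fen)
```

Rules 3 to 7 are matters of judgement and are not something a count like this can check.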

For the most part I am happy with it! :)

But, I admit that I could be a little biased!! :D

Also, I reiterate that my test set is for programmers and not for testers.
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through

Re: Question for the all the TD's out there

Post by opraus »

Hi Mike,

Thanks for your answer.

If an engine, say, did a little better in the first and last thirds of the positions in your suite, could any reasonable assumptions be made? Like 'it seems to like q-side play' or 'it hates closed positions'?
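
To make the question concrete, here is a rough sketch of the split, assuming the per-position points are kept in suite order. The function name and the sample numbers are invented for illustration.

```python
def score_by_thirds(points):
    """Split per-position points (in suite order) into three blocks and total each."""
    n = len(points)
    cuts = [0, n // 3, 2 * n // 3, n]
    return [
        (sum(points[a:b]), b - a)  # (points scored, positions in block)
        for a, b in zip(cuts, cuts[1:])
    ]


# Made-up results over 50 positions: strong start, middling middle, weak finish.
points = [1.0] * 16 + [0.5] * 17 + [0.0] * 17
print(score_by_thirds(points))
```

A split like this only means something if neighbouring positions in the suite share a theme, which is the real question here.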

Don't openings 'tend' to certain 'themes' which could hint at strengths and weaknesses of engines? Are the positions in your suite so ordered, i.e. by opening type and/or the 'themes' they tend to?

My next question would be, which 'themes' are associated with which openings. I would be satisfied with a general, even simplistic distinction between the openings.
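
One simplistic distinction that already exists is the ECO volume letter (A to E). The sketch below tallies results by that letter. It assumes each game or suite position carries an ECO code, which an EPD file may well not have; the function name and sample data are made up, and the group descriptions are deliberately coarse.

```python
from collections import defaultdict

# Broad ECO volumes; a deliberately simplistic grouping.
ECO_GROUP = {
    "A": "flank and irregular (English, Dutch, Benoni, ...)",
    "B": "semi-open 1.e4 (Sicilian, Caro-Kann, Pirc, ...)",
    "C": "1.e4 e5 and French",
    "D": "1.d4 d5 and Gruenfeld",
    "E": "Indian defences (Nimzo, Queen's Indian, King's Indian)",
}


def score_by_opening_group(results):
    """results: iterable of (eco_code, points). Returns {group: (points, games)}."""
    totals = defaultdict(lambda: [0.0, 0])
    for eco, pts in results:
        group = ECO_GROUP.get(eco[:1].upper(), "unknown")
        totals[group][0] += pts
        totals[group][1] += 1
    return {g: (p, n) for g, (p, n) in totals.items()}


# Made-up results for one engine.
results = [("B90", 1.0), ("B12", 0.5), ("E97", 0.0), ("C42", 0.5)]
for group, (pts, n) in score_by_opening_group(results).items():
    print(f"{pts}/{n}  {group}")
```

ECO letters group by move order rather than by theme, so this is only a first cut; a Sicilian and a Caro-Kann both land in B but rarely play alike.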

-David

Re: Question for the all the TD's out there

Post by Michael Sherwin »

Hi David,

Maybe something of the nature of what you describe can be done. However, the slightest change in the code to my program will cause a completely different 'look' to the outcome. RomiChess can easily win or lose any of the fifty positions and do better or worse in any section of the set. In the end it is the total number of points gained that matters to the programmer.
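
A rough way to see why one section says so little is to put an error bar on it. The sketch below uses the ordinary textbook standard error of the mean and treats the games as independent; it is not taken from RomiChess's own testing, and the results in it are invented.

```python
import math
import statistics


def mean_and_stderr(points):
    """Mean score per game and its standard error for a list of 0 / 0.5 / 1 results."""
    n = len(points)
    mean = statistics.mean(points)
    stderr = statistics.stdev(points) / math.sqrt(n)  # sample std dev / sqrt(n)
    return mean, stderr


third = [1.0, 0.5, 0.0, 0.5] * 4  # 16 made-up results, roughly a third of the set
whole = third * 3                 # the same mix of results over 48 games
for label, pts in (("one third", third), ("whole set", whole)):
    mean, err = mean_and_stderr(pts)
    print(f"{label}: {mean:.2f} +/- {err:.2f}")
```

The error bar shrinks only with the square root of the number of games, so a section of the set is much noisier than the total.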

What the tester has to realize is that for every version of a program that is released, many, many versions are tested privately by their authors, and that this is extremely time consuming. Many of us only have time for general testing, and not enough of it. That is why weaker versions of programs 'sneak through the cracks' to the public. Authors that release frequently suffer from this most, although authors that release only once a year or so can also put out a flop.