ChessUSA.com TalkChess.com
Hosted by Your Move Chess & Games
 

What your opinion about this testing methodology?
TalkChess.com Forum Index -> Computer Chess Club: Programming and Technical Discussions
Fermin Serrano



Joined: 08 Feb 2008
Posts: 509
Location: Madrid - Spain

Posted: Tue Apr 17, 2012 5:28 pm    Post subject: What your opinion about this testing methodology?

Would it work?

I have been thinking about ways to improve testing time results. People usually run tournaments from the starting position, tournaments with a very limited set of positions (e.g. 32), or tournaments with a lot of random positions. I assume everyone is doing this with a minimum of 1000 to 4000 games.

.... but ....

What about repeating the same tournament, with the same opponents and the same positions per opponent, assuming a very large set of positions?

example:
Game 1: against Crafty, as Black, position number 540 from the FEN file 'myfenpositions.epd'
Game 2: against Critter, as White, position number 3251 from the FEN file 'myfenpositions.epd'
....
etc

The idea is that the position numbers would always be the same rather than chosen randomly, with no FEN repeated, but with enough variety.
The tournament file in the tournament manager would always be the same, with no need to recreate the tournament. Every test would repeat exactly the same games.

Would results between tests be more accurate than when the starting position is chosen randomly?
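A minimal sketch of the proposed fixed schedule, assuming positions are numbered 1..N in the EPD file and that games simply cycle through the opponent list. The function name, the seed value and the colour-alternation rule are hypothetical illustrations, not the file format of any particular tournament manager. Because the generator is seeded, the "random" selection happens once and every rerun reproduces exactly the same games, with no position used twice:

```python
import random

def build_fixed_schedule(num_positions, opponents, games, seed=2012):
    """Return a reproducible list of (game, opponent, colour, position number).

    Hypothetical helper: positions are assumed to be numbered 1..num_positions
    in an EPD/FEN file such as 'myfenpositions.epd'.
    """
    if games > num_positions:
        raise ValueError("not enough positions to avoid repeating a FEN")
    rng = random.Random(seed)  # fixed seed: identical schedule on every run
    # sample() draws distinct numbers, so no position is used twice
    picks = rng.sample(range(1, num_positions + 1), games)
    schedule = []
    for i, pos in enumerate(picks):
        opponent = opponents[i % len(opponents)]  # cycle through the opponents
        # switch colours after each full pass over the opponent list
        colour = "black" if (i // len(opponents)) % 2 else "white"
        schedule.append((i + 1, opponent, colour, pos))
    return schedule

for row in build_fixed_schedule(5000, ["Crafty", "Critter"], 4):
    print(row)
```

Writing such a list into the tournament file once gives the "never recreate the tournament" property: only the engine under test changes between runs.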
_________________
Fermin Serrano
Author of 'Rodin' engine
http://sites.google.com/site/clonfsp/
http://clonfsp.wordpress.com
Ferdinand Mosca



Joined: 10 Aug 2008
Posts: 452
Location: Philippines

Posted: Wed Apr 18, 2012 6:45 am    Post subject: Re: What your opinion about this testing methodology?

Your methodology is favorable considering your goal. There is no point testing 1. b3 e5 positions if that is not in your engine's repertoire. Only use test positions where you want your engine to be. Of course there are drawbacks, but they can be overcome, as you say, by using a large number of selected positions.

Perhaps start from a smaller number of positions, and as your engine improves on them, add other positions to be considered part of its repertoire.

But I have a bad feeling about it: to me, the engine should be able to handle all kinds of positions, whether blocked, open, full of pinned pieces, etc.
Mincho Georgiev



Joined: 04 Apr 2009
Posts: 406
Location: Bulgaria

Posted: Wed Apr 18, 2012 7:52 am    Post subject: Re: What your opinion about this testing methodology?

I'm somewhere in between. I currently test with the same tournament, against the same 4-5 opponents, using a set of 320 positions: 5 opponents x 2 colors x 320 positions = 3200 games.
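A small sketch of the schedule described above, where every opponent plays every position once with each colour. The opponent names are placeholders; playing each position with both colours is the usual way to cancel out any advantage a position gives one side:

```python
from itertools import product

def full_schedule(opponents, num_positions):
    """Each opponent plays each position (1..num_positions) with both colours."""
    return [(opp, pos, colour)
            for opp, pos, colour in product(opponents,
                                            range(1, num_positions + 1),
                                            ("white", "black"))]

games = full_schedule(["Opp1", "Opp2", "Opp3", "Opp4", "Opp5"], 320)
print(len(games))  # 3200 = 5 opponents x 2 colours x 320 positions
```

With 4 opponents the same set gives 4 x 2 x 320 = 2560 games.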
Fermin Serrano



Joined: 08 Feb 2008
Posts: 509
Location: Madrid - Spain

Posted: Wed Apr 18, 2012 7:56 am    Post subject: Re: What your opinion about this testing methodology?

Ferdy wrote:
Your methodology is favorable considering your goal. There is no point testing 1. b3 e5 positions if that is not in your engine's repertoire. Only use test positions where you want your engine to be. Of course there are drawbacks, but they can be overcome, as you say, by using a large number of selected positions.

Perhaps start from a smaller number of positions, and as your engine improves on them, add other positions to be considered part of its repertoire.

But I have a bad feeling about it: to me, the engine should be able to handle all kinds of positions, whether blocked, open, full of pinned pieces, etc.


I think you misunderstand my idea. The goal is not to test only a limited set of opening positions, but a large and varied set of middlegame starting positions. The point is to always repeat the same games with the same positions, but with enough positions that the engine can be said to have played a varied set.
Sven Schüle



Joined: 15 May 2008
Posts: 2246
Location: Berlin, Germany

Posted: Wed Apr 18, 2012 9:40 am    Post subject: Re: What your opinion about this testing methodology?

Kempelen wrote:
I think you misunderstand my idea. The goal is not to test only a limited set of opening positions, but a large and varied set of middlegame starting positions. The point is to always repeat the same games with the same positions, but with enough positions that the engine can be said to have played a varied set.

The point is that the positions are selected randomly once, and then the same positions are always used for testing. That's exactly what Bob has been doing for a long while now, as have lots of other people, so it is not a new method but more of a "de facto standard". I recall there were long discussions about the details a few years ago. As far as I remember, doing it that way, instead of randomly choosing different positions each time, was found to result in lower error bars. I guess Bob and the other statistics experts can explain the exact reasons.

Sven
Ferdinand Mosca



Joined: 10 Aug 2008
Posts: 452
Location: Philippines

Posted: Wed Apr 18, 2012 10:12 am    Post subject: Re: What your opinion about this testing methodology?

Kempelen wrote:
I think you misunderstand my idea. The goal is not to test only a limited set of opening positions, but a large and varied set of middlegame starting positions. The point is to always repeat the same games with the same positions, but with enough positions that the engine can be said to have played a varied set.


Did I say a limited set of opening positions? Do not underestimate what I mean by starting from a smaller number of positions, just as I don't underestimate what you mean by large. Even "very large" has a limit: how many, exactly, is very large? Even from your first sentence, "improve testing time results", I understand that you also have limited resources.
I said there are drawbacks because your scheme looks like this:
vs Crafty use pos 1 to 100 or something
vs Critter use pos 101 to 200
...
Now if you always use this test, you will probably improve your score vs Crafty on positions 1 to 100, and likewise vs Critter on positions 101 to 200. But the question is: will the engine tuned to play vs Crafty on positions 1 to 100 be equally good when it plays another engine on the same opening test set?
Lucas Braesch



Joined: 31 May 2010
Posts: 1757

Posted: Wed Apr 18, 2012 11:41 am    Post subject: Re: What your opinion about this testing methodology?

Sven Schüle wrote:
The point is that the positions are selected randomly once, and then the same positions are always used for testing. That's exactly what Bob has been doing for a long while now, as have lots of other people, so it is not a new method but more of a "de facto standard". I recall there were long discussions about the details a few years ago. As far as I remember, doing it that way, instead of randomly choosing different positions each time, was found to result in lower error bars. I guess Bob and the other statistics experts can explain the exact reasons.

Sven

It seems pretty obvious that it lowers the error bar. In fact the whole estimation model implicitly assumes that you do this.

Let's say that the score of engine A vs B follows a probability law P(mu, sigma) with mean mu and standard deviation sigma. That means that, given equal chances from the starting position, the result should be distributed as P(mu, sigma). However, if a position is chosen that favors A or B, the distribution will be something like Q(position)P(mu, sigma), where Q(position) is centered around 1 and is larger or smaller depending on whether A or B is favored. The fact that E(Q) = 1 may still ensure an unbiased estimator, but one with a higher variance...

No need to be an expert in statistics to understand it, at least intuitively. You can write it cleanly too, and it isn't hard!

PS: please, no ball busting on the details; I purposely made the math notation oversimplified.
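The intuition above can be written out with the law of total variance. This is a sketch under stated assumptions: S is the score of one game for engine A and X its starting position (notation introduced here for illustration), games are independent given their positions, one game is played per position, and in method b) the positions are drawn i.i.d.:

```latex
\operatorname{Var}(S) = \mathbb{E}_X\!\left[\operatorname{Var}(S \mid X)\right] + \operatorname{Var}_X\!\left(\mathbb{E}[S \mid X]\right)

% a) fixed positions x_1,\dots,x_n; only the games are replayed
\operatorname{Var}\!\left(\bar S_n \mid x_1,\dots,x_n\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(S \mid X = x_i)
  \approx \frac{1}{n}\,\mathbb{E}_X\!\left[\operatorname{Var}(S \mid X)\right]

% b) a fresh random set of n positions for every test run
\operatorname{Var}\!\left(\bar S_n\right)
  = \frac{1}{n}\left(\mathbb{E}_X\!\left[\operatorname{Var}(S \mid X)\right]
  + \operatorname{Var}_X\!\left(\mathbb{E}[S \mid X]\right)\right)
```

The second term, roughly the spread of Q(position) around 1 (since E[S | X] is about Q(X)·mu in the notation above), drops out of the run-to-run variance with a fixed set. The price is that the fixed set's own average of E[S | x_i] stays a constant offset in every run, and the approximation holds only if the fixed set is representative.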
Mincho Georgiev



Joined: 04 Apr 2009
Posts: 406
Location: Bulgaria

Posted: Wed Apr 18, 2012 11:46 am    Post subject: Re: What your opinion about this testing methodology?

I forgot to mention that mine are not selected randomly. I have Axx, Bxx, Cxx, Dxx, Exx openings mixed exactly in that order, then again Axx, Bxx..., but none of them is duplicated anywhere in the set.
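This ordering can be sketched as a round-robin over opening families. Reading Axx..Exx as ECO code families is an assumption, and the position labels below are made-up placeholders; duplicates are rejected so that no position appears twice in the set:

```python
from itertools import zip_longest

def interleave_by_eco(groups):
    """Round-robin over opening families: A, B, C, D, E, A, B, ...

    `groups` maps an ECO letter to its list of positions (hypothetical data).
    Raises ValueError if any position appears more than once.
    """
    seen, order = set(), []
    columns = [groups[letter] for letter in sorted(groups)]
    for row in zip_longest(*columns):
        for pos in row:
            if pos is None:  # this family has run out of positions
                continue
            if pos in seen:
                raise ValueError(f"duplicate position: {pos}")
            seen.add(pos)
            order.append(pos)
    return order

demo = {"A": ["A1", "A2"], "B": ["B1", "B2"], "C": ["C1"],
        "D": ["D1", "D2"], "E": ["E1", "E2"]}
print(interleave_by_eco(demo))
# ['A1', 'B1', 'C1', 'D1', 'E1', 'A2', 'B2', 'D2', 'E2']
```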
Vincent Diepeveen



Joined: 09 Mar 2006
Posts: 1738
Location: The Netherlands

Posted: Wed Apr 18, 2012 11:51 am    Post subject: Re: What your opinion about this testing methodology?

Kempelen wrote:
I think you misunderstand my idea. The goal is not to test only a limited set of opening positions, but a large and varied set of middlegame starting positions. The point is to always repeat the same games with the same positions, but with enough positions that the engine can be said to have played a varied set.


Your engine always gets optimized for the positions you test with. If you just want to kick butt on, say, a few Noomen positions, then this is the way to test.

It isn't the holy grail, but if some guy grabs your engine and plays those positions against other engines, you'll beat them big time.

What you need is a mix of everything, and to innovate every few years.
Sven Schüle



Joined: 15 May 2008
Posts: 2246
Location: Berlin, Germany

Posted: Wed Apr 18, 2012 11:52 am    Post subject: Re: What your opinion about this testing methodology?

lucasart wrote:
It seems pretty obvious that it lowers the error bar. In fact the whole estimation model implicitly assumes that you do this.

Let's say that the score of engine A vs B follows a probability law P(mu, sigma) with mean mu and standard deviation sigma. That means that, given equal chances from the starting position, the result should be distributed as P(mu, sigma). However, if a position is chosen that favors A or B, the distribution will be something like Q(position)P(mu, sigma), where Q(position) is centered around 1 and is larger or smaller depending on whether A or B is favored. The fact that E(Q) = 1 may still ensure an unbiased estimator, but one with a higher variance...

No need to be an expert in statistics to understand it, at least intuitively. You can write it cleanly too, and it isn't hard!

PS: please, no ball busting on the details; I purposely made the math notation oversimplified.

I don't think this is about positions favoring either side, A or B; the selected positions have to be "balanced". Instead, it is all about
a) always using the same set of starting positions (for each single "test tournament"), or
b) repeating the step of choosing a set of starting positions for each "test tournament".

The claim was that method a) results in lower error bars. That is not my own claim, but what I recall someone else stating in the past.

Sven
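A toy simulation comparing methods a) and b) from the post above. Everything in it is an assumption for illustration: each position gets its own expected score p for engine A, drawn uniformly from [0.5 - spread, 0.5 + spread], and each game is a plain win or loss (draws ignored). It measures how much the match score varies from one test run to the next:

```python
import random
import statistics

def simulate(n_games=400, repeats=2000, spread=0.4, seed=1):
    """Run-to-run standard deviation of the match score for methods a) and b).

    Toy model (assumption): each starting position has its own expected score
    p for engine A, uniform in [0.5 - spread, 0.5 + spread]; each game is a
    plain win (1) or loss (0), draws are ignored.
    """
    if not 0 <= spread <= 0.5:
        raise ValueError("spread must keep p inside [0, 1]")
    rng = random.Random(seed)

    def draw_position():
        return 0.5 + rng.uniform(-spread, spread)

    def play_match(positions):
        # one game per position; engine A wins with probability p
        return sum(rng.random() < p for p in positions) / len(positions)

    fixed_set = [draw_position() for _ in range(n_games)]  # a) chosen once
    scores_a = [play_match(fixed_set) for _ in range(repeats)]
    scores_b = [play_match([draw_position() for _ in range(n_games)])  # b) new set
                for _ in range(repeats)]
    return statistics.stdev(scores_a), statistics.stdev(scores_b)

sd_a, sd_b = simulate()
print(f"a) fixed set: {sd_a:.4f}   b) re-drawn set: {sd_b:.4f}")
```

Analytically, for these parameters the expected values are about 0.022 for a) and 0.025 for b). With spread close to 0 (well-balanced positions) the two converge, which fits the remark that positions should be balanced. The fixed set's average expected score is not exactly 0.5, so method a) trades the extra variance for a constant offset shared by every run.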
All times are GMT