IPON will change ...

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: IPON will change ...

Post by IWB »

Don wrote:
1. If an author releases a new version, remove the old and substitute it with the new version. It's the author's decision which is his latest and greatest.


That is a view one could defend, but of what value is a list where some entries are for sure not the version that belongs there ... I fear the list would be less well accepted.
Don wrote:
2. If a NEW engine is released, test it against the top 19, not the top 20.

Idea 2 is very logical - because we basically ASSUME that it will be top 20 and the only time you have to resolve anything is if it tests in last place - then you have to determine if the previous position 20 engine is stronger or this new one is. But that is trivial to resolve since BOTH programs played the same 19 players. Just keep the one with the higher ELO.
Yep, logical, but running a tourney with 20 is exactly the same work as running one with 19 (except when the 20th is a CB native engine!). I can do a test with 20 and eliminate the last engine if needed ...

Thx again
Ingo
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: IPON will change ...

Post by Don »

IWB wrote:
Don wrote:
1. If an author releases a new version, remove the old and substitute it with the new version. It's the author's decision which is his latest and greatest.


That is a view one could defend, but of what value is a list where some entries are for sure not the version that belongs there ... I fear the list would be less well accepted.
Yes, it's a matter of taste and, like everything else, has its pros and cons. The pros are simplicity and letting the author determine which program represents him. The con of this idea is what you say: a program might be slightly misplaced in the rating list.
Don wrote:
2. If a NEW engine is released, test it against the top 19, not the top 20.

Idea 2 is very logical - because we basically ASSUME that it will be top 20 and the only time you have to resolve anything is if it tests in last place - then you have to determine if the previous position 20 engine is stronger or this new one is. But that is trivial to resolve since BOTH programs played the same 19 players. Just keep the one with the higher ELO.
Yep, logical, but running a tourney with 20 is exactly the same work as running one with 19 (except when the 20th is a CB native engine!).
There are a couple of fine points you may not have understood or appreciated.

When you add a new player, think of the test as a competition between the CURRENT 20th place player and the NEW player. They should have the EXACT same playing conditions, which means they BOTH play the SAME 19 opponents. You will get an unambiguous answer as to whether the new player deserves to be in the top 20. Even if you ignore the ratings you can simply keep the player with the higher total score, which might not necessarily be the higher rated but probably will be.
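
A rough sketch of that head-to-head rule in Python. The engine names and results are made up for illustration, the function names are hypothetical, and giving ties to the incumbent is an assumption that the rule above does not cover:

```python
def total_score(results):
    """Total points for one engine.

    results maps opponent name -> (wins, draws, losses);
    a win counts 1 point, a draw 0.5, a loss 0.
    """
    return sum(wins + 0.5 * draws for wins, draws, _losses in results.values())


def pick_twentieth(incumbent, incumbent_results, newcomer, newcomer_results):
    """Keep whichever engine scored more against the SAME opponents."""
    if set(incumbent_results) != set(newcomer_results):
        raise ValueError("both engines must play the same opponents")
    # Assumption: on an exact tie the incumbent keeps its place.
    if total_score(newcomer_results) > total_score(incumbent_results):
        return newcomer
    return incumbent


# Hypothetical results against two of the 19 shared opponents.
old_results = {"EngineA": (10, 20, 20), "EngineB": (15, 20, 15)}
new_results = {"EngineA": (12, 20, 18), "EngineB": (15, 22, 13)}
keeper = pick_twentieth("OldEngine", old_results, "NewEngine", new_results)
```

Because both engines face an identical field, the comparison does not depend on how the rest of the list is rated.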

If you play all 20, then drop the lowest player's results to get back to 20 players, you can actually have a mess on your hands, because it's possible that the new player will have a higher ELO at first, but after dropping the 21st player it can change enough that his rating will now be lower than that of the player you dropped.

Having said that, I don't think any reasonable thing you do is likely to change much, but it does illustrate that you need to set very precise rules and conditions and simply follow them to the letter. Whenever you remove a player, add a new one and re-rate, you are going to get a little jostling around, and there is not that much you can do about it.

There is one other very simple no-hassle idea you can use:

Never throw out games, just play new candidates against the top 20, rate the results and keep the top 20 unique programs. The programs that have been in the top 20 the longest will have the most games and the most accurate ratings - and there is nothing wrong with that.
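
The selection step of that idea could look roughly like this in Python. It assumes the ratings have already been computed from all games by whatever rating tool is in use, and it reads "unique programs" as "one entry per program, namely its best-rated version", which is an interpretation rather than something spelled out above. The function name and the sample data are hypothetical:

```python
def top_unique(ratings, n=20):
    """Return the n best-rated entries, at most one version per program.

    ratings is an iterable of (program, version, elo) tuples.
    """
    best = {}
    for program, version, elo in ratings:
        # Assumption: a program is represented by its highest-rated version.
        if program not in best or elo > best[program][2]:
            best[program] = (program, version, elo)
    return sorted(best.values(), key=lambda entry: entry[2], reverse=True)[:n]


# Made-up ratings, cut down to a "top 2" for brevity.
sample = [
    ("Alpha", "1", 3000),
    ("Alpha", "2", 3020),
    ("Beta", "1", 2990),
    ("Gamma", "1", 2900),
]
published = top_unique(sample, n=2)
```

No games are discarded here; older versions and dropped engines stay in the database and simply do not appear in the published list.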


I can do a test with 20 and eliminate the last engine if needed ...

Thx again
Ingo