Unified eval tournament?

Tord Romstad · Post by **Tord Romstad** » Thu Jan 22, 2009 12:49 pm

Stan Arts wrote: Such a tournament could be played online (I guess most authors have an ICC account)

I think ICC is not a good place to play it, for several reasons. The most obvious one is that uniform evaluation without uniform hardware doesn't make much sense to me. Furthermore, even if most authors have an ICC account (which I doubt, by the way), not everybody has one, and getting an account costs money (the free trial week was only for Windows users the last time I checked). It is also difficult to find a date and time that is acceptable to everyone.

Stan Arts wrote: ...or simply in one's own basement.

That would be much better, as long as it is in a neutral person's basement.

Tord

hgm · Post by **hgm** » Thu Jan 22, 2009 1:19 pm

Note that I have mastered the art of running an ICS (and that of hacking it), and that it would be very easy for me to provide all prospective participants with an account on it, and then put it online for the one or two days that a tourney is running. For tourneys like CCT, running them on a reputable ICS like ICC or FICS does not really draw on the strength of those places, which is that they have a large number of humans who visit them regularly. They can be played just as easily on a private ICS that has zero uninvited visitors.

I agree with Tord, btw, that for this tourney, running it on unequal hardware makes no sense. But it still seemed a good opportunity to promote the idea of other such tournaments. Who needs a place like ICC for organized, pre-announced comp-comp tourneys? Their accounts policy only makes it more difficult. We could have a CCT-like tournament every month, if we wanted.

How about a blitz tourney, 3+2 games in a giant round-robin, conducted automatically under the 'mamer' tournament manager (TM)? My upload link is only 25 KB/sec, but the ICS protocol is not very bandwidth-consuming at all: sending a board+move takes only 200 bytes. So if I calculate correctly, I could handle 125 moves per second (players + observers), so at 5 sec per move I would only run into trouble with more than 600 people logged in...
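A quick check of that arithmetic, as a sketch under the post's own figures plus two assumptions: 1 KB is taken as 1000 bytes, and every logged-in player or observer is assumed to receive one board+move update per move, i.e. once every 5 seconds.

```python
# Back-of-the-envelope ICS bandwidth estimate using the figures from the post.
# Assumption: 1 KB = 1000 bytes; each client gets one update per move.
UPLINK_BYTES_PER_SEC = 25_000   # 25 KB/sec upload link
BYTES_PER_UPDATE = 200          # one board + move message
SECONDS_PER_MOVE = 5            # typical move interval in a 3+2 blitz game


def updates_per_second(uplink: int, update_size: int) -> float:
    """How many board+move messages the uplink can push out per second."""
    return uplink / update_size


def max_clients(uplink: int, update_size: int, move_interval: float) -> float:
    """Clients (players + observers) served if each needs one update per move."""
    return updates_per_second(uplink, update_size) * move_interval


print(updates_per_second(UPLINK_BYTES_PER_SEC, BYTES_PER_UPDATE))  # 125.0
print(max_clients(UPLINK_BYTES_PER_SEC, BYTES_PER_UPDATE, SECONDS_PER_MOVE))  # 625.0
```

That gives about 625 clients, in line with the "more than 600 people" estimate.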

Stan Arts · Post by **Stan Arts** » Thu Jan 22, 2009 1:43 pm

Ah, you're right; a neutral basement (fish nor cheese) is fine then.
I must say unequal hardware doesn't bother me much; I'm fairly certain my engine will finish last and Glaurung first.

I've set up a few piece-square tables (not entirely trivial in my engine; at least, I won't make the 15-minute mark, though 30 minutes should well be within reach), and am now going to put the values in.

One thing I wonder, though: are we allowed to recognise insufficient material in evaluation? Recognisers in search would be allowed, I guess.
OK, in fact I wonder two things. I suppose knowledge to box in the opposing king in the endgame when up material is also not allowed? KQ-K mates will be tricky!

Stan

hgm · Post by **hgm** » Thu Jan 22, 2009 2:04 pm

I don't expect the given end-game King PST to cause any problem in KQK, even at low depth. And even in blitz games, the mate in KQK should always be within the horizon, even without positional guidance. It would be better if the edge squares were differentiated a bit more, though, dropping gradually as you approach the corners, rather than dropping stepwise only at the corner itself.
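One way to get that gradual drop along the edges is to derive the end-game King table from the Manhattan distance to the central four squares, so each step toward a corner costs the same amount. This is a hypothetical table, not the tourney's actual PST; the 10-centipawn step and the square numbering (a1 = 0, h1 = 7, a8 = 56, h8 = 63) are assumptions.

```python
# Hypothetical end-game King piece-square table: the value falls off with the
# Manhattan distance from the central 2x2 block, so edge squares drop gradually
# toward the corners instead of only at the corner itself.
# Squares are numbered 0..63 with a1 = 0, h1 = 7, a8 = 56, h8 = 63.

STEP = 10  # assumed centipawn penalty per unit of distance (illustrative)


def centre_distance(square: int) -> int:
    """Manhattan distance to the nearest of d4/e4/d5/e5: 0 in the centre, 6 in a corner."""
    file, rank = square % 8, square // 8
    # Doubled distances from the board's centre line (always odd: 1, 3, 5, 7),
    # which keeps the arithmetic in integers.
    df = abs(2 * file - 7)
    dr = abs(2 * rank - 7)
    return (df + dr) // 2 - 1


def king_endgame_pst() -> list[int]:
    """Table indexed by square; higher values mean a better square for the King."""
    return [-STEP * centre_distance(sq) for sq in range(64)]


pst = king_endgame_pst()
print([pst[8 * r] for r in range(4)])  # a1..a4: [-60, -50, -40, -30]
```

With this shape, a King driven along the a-file loses 10 centipawns per square as it nears a1, which gives the search a gradient to follow rather than a single cliff at the corner.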

Before anyone invests effort in this, I would really advise optimizing the piece-square tables at least a bit more.

If recognizers are allowed, I think every engine should use the same recognizers, or it would defeat the purpose of the test. I would propose using the "last Pawn counts double" trick in the material eval.
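A sketch of one possible reading of the "last Pawn counts double" trick: whenever a side has at least one Pawn, one extra Pawn value is added, so losing or trading the final Pawn costs two Pawns' worth of score and the engine is steered away from Pawnless endings in which extra material often cannot win. The piece values and the hypothetical `side_material` / `material_eval` helpers are illustrative assumptions.

```python
# Sketch of a material evaluation with the "last Pawn counts double" trick.
# Piece values are illustrative assumptions, in centipawns.
VALUES = {"P": 100, "N": 300, "B": 300, "R": 500, "Q": 900}


def side_material(counts: dict[str, int]) -> int:
    """Material for one side; the last remaining Pawn is effectively worth double."""
    total = sum(VALUES[p] * n for p, n in counts.items())
    if counts.get("P", 0) > 0:
        total += VALUES["P"]  # the bonus disappears only when the final Pawn goes
    return total


def material_eval(white: dict[str, int], black: dict[str, int]) -> int:
    """Material score from White's point of view."""
    return side_material(white) - side_material(black)


print(side_material({"P": 3}), side_material({"P": 1}), side_material({}))  # 400 200 0
```

Going from three Pawns to two costs 100, but going from one to none costs 200.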

Andres Valverde · Post by **Andres Valverde** » Thu Jan 22, 2009 3:40 pm

I like the idea very much. I would propose two parallel approaches:

- Upload the engines to some website where everyone can download them and run the tests on equal hardware

- Run the ICS tourneys using HGM's server. Yes, this will be unfair, but fun. (And yes, he has managed to make MAMER run.)

Anyway, count us in.

Stan Arts · Post by **Stan Arts** » Thu Jan 22, 2009 3:48 pm

I think I've got mine running.

Yes, KQ-K is easy, but I thought KR-K, for example, would be problematic. I just tried it, and that's no problem at all either.

Sure, the values could be better, but for their purpose they are probably fine.

As for Pawel's suggestion: lowering the scores probably has no effect on play, except in the rare cases where the high scores lead to giving up material for compensation, but IMHO it takes quite a poor/good position to add up to a Pawn's worth of compensation (and by that time, a Pawn is probably the least it needs to give).
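If "lowering the scores" means multiplying every piece-square entry by a fixed fraction (presumably what the 4/5 and 3/5 variants tested later in the thread do, though the thread does not say exactly how), it can be applied once at startup. A minimal sketch, assuming integer centipawn tables and round-half-away-from-zero rounding; `scale_table` is a hypothetical helper.

```python
# Hypothetical helper: scale a piece-square table by num/den (e.g. 4/5 or 3/5),
# rounding half away from zero so positive and negative entries scale symmetrically.
def scale_table(table: list[int], num: int, den: int) -> list[int]:
    out = []
    for v in table:
        q, r = divmod(abs(v) * num, den)  # work on the magnitude
        if 2 * r >= den:                  # round the remainder half away from zero
            q += 1
        out.append(q if v >= 0 else -q)   # restore the sign
    return out


print(scale_table([50, -50, 10, -7], 4, 5))  # [40, -40, 8, -6]
print(scale_table([50, -50, 10, -7], 3, 5))  # [30, -30, 6, -4]
```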

By "material draw recognisers allowed in search" I meant that I suppose many engines already have these in search, and as they are part of search they are likely allowed. Those in evaluation likely are not.
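For reference, the simplest material draw recognizer just flags positions where neither side can possibly mate. This sketch covers only the trivial cases (bare Kings, K+N vs K, K+B vs K). Which recognizers the tourney would allow is not settled in the thread, and the piece-count interface is an assumption.

```python
from typing import Optional


def minor_count(side: dict[str, int]) -> Optional[int]:
    """Number of minor pieces if the side has no Pawns, Rooks or Queens, else None."""
    if any(side.get(p, 0) for p in ("P", "R", "Q")):
        return None
    return side.get("N", 0) + side.get("B", 0)


def is_insufficient(white: dict[str, int], black: dict[str, int]) -> bool:
    """True for K vs K, K+N vs K and K+B vs K, where no mate is possible.

    Each side is a dict of piece counts; Kings are implied.
    """
    w, b = minor_count(white), minor_count(black)
    if w is None or b is None:
        return False
    return w + b <= 1
```

Cases such as K+N vs K+N are deliberately left out, because a (help)mate is still possible there.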

Stan

PK · Post by PK » Thu Jan 22, 2009 4:53 pm

If I remember correctly, the original specification allowed some additional endgame knowledge, because the organizer forgot to say "no bitbases, no Nalimov" and ended up saying "stuff replacing Nalimov and bitbases would be fine". But I think it's pretty academic anyway, since most games were decided in the middlegame.

regards,

pawel

cyberfish · Post by **cyberfish** » Thu Jan 22, 2009 6:38 pm

I think that it may also be interesting to do a simplified search tournament.
The idea is that everybody uses the same simple search function, but people are free to change the evaluation.

That certainly sounds interesting, but it would be a lot more work to switch to a simplified search than to a simplified eval, since search depends much more on the structure of the program. An easier way might be for everyone to start with the same engine?
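To make "the same simple search function" concrete: a shared search would presumably be a plain alpha-beta (negamax) routine along these lines, with only `evaluate` left to each author. Here the game tree is a toy nested list (leaves are static scores for the side to move at that leaf) purely for illustration. A real engine would generate and make moves and stop at a depth limit instead.

```python
import math
from typing import Union

Tree = Union[int, list]  # a leaf score, or a list of child subtrees


def evaluate(leaf: int) -> int:
    """Stand-in for each engine's own evaluation (score for the side to move)."""
    return leaf


def alphabeta(node: Tree, alpha: float = -math.inf, beta: float = math.inf) -> float:
    """Fail-hard negamax alpha-beta over a toy tree of fixed depth."""
    if isinstance(node, int):
        return evaluate(node)
    for child in node:
        score = -alphabeta(child, -beta, -alpha)  # child's score, seen from our side
        if score >= beta:
            return beta   # cutoff: the opponent will avoid this line
        if score > alpha:
            alpha = score  # new best move at this node
    return alpha


print(alphabeta([[3, 5], [2, 9]]))  # 3
```

In the example the second subtree is cut off after its first leaf, since the opponent can already hold the root player to 2 there.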

cyberfish · Post by **cyberfish** » Thu Jan 22, 2009 6:47 pm

No significant difference in my engine -

Rank Name                       Elo    +    - games score oppo. draws 
   1 Brainless 09-SIM1 64-bit     2    3    2 52737   50%    -1   20% 
   2 Brainless 09-SIM2 64-bit     2    3    3 52699   50%    -1   20% 
   3 Brainless 09-SIM0 64-bit    -4    4    4 52722   49%     2   20%

SIM0 = original
SIM1 = 4/5
SIM2 = 3/5
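As a rough sanity check on these numbers, the usual logistic Elo model converts a score fraction s into a rating difference of -400·log10(1/s - 1). The output above appears to come from BayesElo, which fits all ratings jointly rather than applying this formula directly (an assumption about the tool), so this is only an approximation.

```python
import math


def elo_diff(score: float) -> float:
    """Rating difference implied by a score fraction under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)
```

A 49% score corresponds to roughly 7 Elo below the opposition, which is roughly in line with SIM0's -4 against an average opponent of +2.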

cyberfish · Post by **cyberfish** » Thu Jan 22, 2009 6:51 pm

That's far less than I would have thought. What does your evaluation contain, apart from material and piece-square tables?

Not much right now; I just rewrote the eval. The only big differences between my current eval and the simplified eval are that I distinguish between endgame and middlegame piece-square tables, and that I removed the discontinuity between them (smooth scaling).
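Smooth scaling between middlegame and endgame tables is commonly done by interpolating on a game-phase value derived from the remaining non-Pawn material. This is a sketch of that common technique, not necessarily how this engine does it. The phase weights (Knight/Bishop 1, Rook 2, Queen 4, totalling 24 in the starting position) are a widely used convention assumed here.

```python
# Tapered (phase-interpolated) evaluation, sketched with assumed phase weights:
# minor = 1, rook = 2, queen = 4; 24 = full middlegame, 0 = bare endgame.
PHASE_WEIGHT = {"N": 1, "B": 1, "R": 2, "Q": 4}
MAX_PHASE = 24


def game_phase(piece_counts: dict[str, int]) -> int:
    """Phase from both sides' combined non-Pawn material, clamped to MAX_PHASE."""
    phase = sum(PHASE_WEIGHT.get(p, 0) * n for p, n in piece_counts.items())
    return min(phase, MAX_PHASE)


def tapered(mg_score: int, eg_score: int, phase: int) -> int:
    """Blend middlegame and endgame scores so no single capture causes a jump.

    Uses floor division, so negative blends round toward minus infinity.
    """
    return (mg_score * phase + eg_score * (MAX_PHASE - phase)) // MAX_PHASE


start = {"N": 4, "B": 4, "R": 4, "Q": 2, "P": 16}
print(game_phase(start), tapered(40, -20, 12))  # 24 10
```

Each capture of a piece moves the phase by only 1 to 4 steps, so the score shifts in small increments instead of flipping from one table to the other at a fixed threshold.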
