Unified eval tournament?

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Unified eval tournament?

Post by Tord Romstad »

Andres Valverde wrote:I like the idea very much. I would propose two parallel ways:

- Upload the engines to some website where everyone can download them and run the tests on equal hardware

- Run the ICS tours using HGM's server. Yes, this will be unfair but fun :-). (And yes, he has managed to make MAMER run :-) )
I agree -- Why not do both. :)

Here is the source code for Glaurung UFO 090123, which uses Tomasz's material+piece square tables evaluation function. "UFO", in case someone wonders, means "Ujednolicona Funkcja Oceny", which is Polish for "Uniform Evaluation Function". Perhaps "PST" or "UEF" would make more sense in English, but "UFO" looks a lot cooler, and I'll keep using it. I hope the rest of you will consider doing the same. :wink:

Tord
hgm
Posts: 28387
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Unified eval tournament?

Post by hgm »

That is a relief! I already feared you were into flying saucers now! :lol:
Allard Siemelink
Posts: 297
Joined: Fri Jun 30, 2006 9:30 pm
Location: Netherlands

Re: Unified eval tournament?

Post by Allard Siemelink »

Here are the results of the match at 40 moves/40 seconds repeating:
18.14% elo=-263. +38 -261 =51

Indeed the results of the simple eval version have gone down, as I expected.
Yet, it still scores better than Glaurung's 13.5%.
Tord, may I ask what time control you used?
Allard Siemelink wrote:Bright's numbers are a little less pronounced than Glaurung's, but the simple eval is still ~200 elo worse than its own.

Here are the results of a 3000 game match (4096 nodes/move) that just finished:
22.67% elo=-213, +562 -2202 =236

I think I'll run a match with longer time controls to see if that yields different results
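
For anyone who wants to check percentages and Elo figures like the ones above, here is a minimal sketch of the conversion. It assumes the usual logistic Elo model; the posts do not say which tool produced the numbers, and the function name `match_elo` is made up for illustration.

```python
import math

def match_elo(wins, losses, draws):
    """Return (score fraction, Elo difference) for a +W -L =D match result."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        raise ValueError("Elo difference is unbounded for a 0% or 100% score")
    # Assumed logistic model: expected score = 1 / (1 + 10 ** (-elo / 400))
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    return score, elo

# The two results quoted in this post, as (wins, losses, draws)
for result in [(38, 261, 51), (562, 2202, 236)]:
    score, elo = match_elo(*result)
    print(f"{score:.2%} elo={elo:+.0f}")
```

Under this assumption the 3000-game match comes out at the quoted -213, while the 350-game match gives about -262 rather than -263, so the original figure presumably came from a slightly different formula or rounding.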
cyberfish

Re: Unified eval tournament?

Post by cyberfish »

I am pretty sure the mamer on FICS has such a command at its disposal, too. That's how players can just "tell" it something to start a game. It's just not automatic on FICS (or maybe there's an option I'm not aware of).
cyberfish

Re: Unified eval tournament?

Post by cyberfish »

Should we allow EGTB/EGBB and opening books?

As pointed out earlier, EGTB/EGBB don't really matter (we can disable them to make it "fairer"), but what about opening books?

Disallowing opening books will also make it "fairer" (since we are comparing the search), but then how are we going to get our randomness?

For my engine, the opening book is the only source of randomness. It's a hash-indexed book of about 1,000,000 positions; for every move played from the book it picks a weighted-random move from the move list (weighted by how often each move has been played). That gives me enough randomness to run 40000 games with just ~1% duplicate games (counting both colours, including when the 2 sides are reversed, so it would be ~0.5% of exactly identical games).
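
To make the idea concrete, here is a rough sketch of weighted-random book selection. It is not my engine's actual code: the `BOOK` layout, the hash value and the play counts are all invented for illustration.

```python
import random

# Hypothetical book: position hash -> list of (move, times_played).
# The key and the counts below are placeholders, not real book data.
BOOK = {
    0x1234ABCD: [("e2e4", 600), ("d2d4", 350), ("c2c4", 50)],
}

def pick_book_move(position_hash, book=BOOK, rng=random):
    """Pick a book move with probability proportional to its play count."""
    entries = book.get(position_hash)
    if not entries:
        return None  # out of book: the caller falls back to normal search
    moves = [move for move, _ in entries]
    weights = [count for _, count in entries]
    # random.choices draws with replacement according to the given weights
    return rng.choices(moves, weights=weights, k=1)[0]

print(pick_book_move(0x1234ABCD))
```

Passing a seeded `random.Random` instance as `rng` makes a test run reproducible, which helps when hunting for duplicate games.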
hgm
Posts: 28387
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Unified eval tournament?

Post by hgm »

Perhaps they are using another version of Mamer, or they disabled this feature because it was too unreliable or troublesome in practice.

Mamer is a program separate from the ICS, and it can run on a completely different machine. It logs in to the ICS as a bot, on an account that has special permissions and can use commands that are not available to any other user (not even admins). Once it is logged in, it is controlled entirely through the ICS interface, by tell messages that contain commands. This is true both for ordinary users, who can send it commands to join a tourney and ask for games and results, and for the TDs creating and specifying the tourney. Mamer has its own list (on the computer where it runs) of which commands to accept from whom; you have to be on its managers list in order to create a tourney.
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Unified eval tournament?

Post by Michael Sherwin »

Tord Romstad wrote:
cyberfish wrote:Ah thanks!

We just need to get a few more people now...

I just implemented the simplified eval in my engine, and in ~2 seconds (limited depth) games, it's 52-72 elo points weaker.
That's far less than I would have thought. What does your evaluation contain, apart from material and piece square tables?

I just finished a quick Silver match between the normal version of my program and an otherwise identical version with the evaluation function replaced by Tomasz Michniewski's piece square table evaluation:

Code: Select all

Glaurung 090122: 86.5 (+81,=11,-8)
Glaurung UFO 090122: 13.5 (+8,=11,-81)
Tord
I have not read this thread, so sorry if this has already been asked.

About how many ply would you have to slow down the search to get an even result?
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Unified eval tournament?

Post by Tord Romstad »

Allard Siemelink wrote:Here are the results of the match at 40 moves/40 seconds repeating:
18.14% elo=-263. +38 -261 =51

Indeed the results of the simple eval version have gone down, as I expected.
Yet, it still scores better than Glaurung's 13.5%.
Tord, may I ask what time control you used?
I used 1 minute/game, with a 0.5 second increment. Perhaps I played too few games, or perhaps my evaluation function is better than I think.
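
To get a feeling for how much 100 games can tell us, here is a rough sketch that puts a 95% interval around a match score and converts the ends to Elo. It assumes a normal approximation on the per-game results and the logistic Elo model; the function names are invented for this example.

```python
import math

def score_interval(wins, draws, losses, z=1.96):
    """Return (low, mean, high) for the score fraction, normal approximation."""
    games = wins + draws + losses
    mean = (wins + 0.5 * draws) / games
    # Per-game results are 1, 0.5 or 0, so E[x^2] = (W + 0.25 * D) / N
    second_moment = (wins + 0.25 * draws) / games
    stderr = math.sqrt((second_moment - mean * mean) / games)
    return mean - z * stderr, mean, mean + z * stderr

def to_elo(score):
    """Elo difference for a score fraction under the assumed logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# Glaurung UFO's side of the 100-game match: +8 =11 -81
low, mean, high = score_interval(8, 11, 81)
print([round(to_elo(s)) for s in (low, mean, high)])
```

With these assumptions the interval is wide enough to include a -263 result, so the gap between the two matches may be mostly noise.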

Tord
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Unified eval tournament?

Post by Tord Romstad »

cyberfish wrote:As pointed out earlier, EGTB/EGBB don't really matter (we can disable them to make it "fairer"), but what about opening books?

Disallowing opening books will also make it "fairer" (since we are comparing the search), but then how are we going to get our randomness?
The most obvious thing to do would be to start from randomly selected Silver positions, or positions from any similar suite of opening positions.
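
Something along these lines would do; it is only a sketch, with placeholder strings standing in for the real opening positions and invented engine labels. Each selected position is played twice with colours reversed, so neither engine benefits from a lopsided opening.

```python
import random

def paired_schedule(openings, n_positions, seed=0):
    """Pick n_positions distinct openings at random; play each with both colours."""
    rng = random.Random(seed)  # seeded so a test run can be repeated
    chosen = rng.sample(openings, n_positions)
    games = []
    for position in chosen:
        games.append((position, "normal", "ufo"))  # (start, white, black)
        games.append((position, "ufo", "normal"))  # same start, colours swapped
    return games

# Placeholder names; in practice these would be FENs or PGN lines from the suite.
suite = [f"opening_{i}" for i in range(1, 11)]
for game in paired_schedule(suite, 3):
    print(game)
```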

Tord
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Unified eval tournament?

Post by Tord Romstad »

Michael Sherwin wrote:
Tord Romstad wrote:
cyberfish wrote:Ah thanks!

We just need to get a few more people now...

I just implemented the simplified eval in my engine, and in ~2 seconds (limited depth) games, it's 52-72 elo points weaker.
That's far less than I would have thought. What does your evaluation contain, apart from material and piece square tables?

I just finished a quick Silver match between the normal version of my program and an otherwise identical version with the evaluation function replaced by Tomasz Michniewski's piece square table evaluation:

Code: Select all

Glaurung 090122: 86.5 (+81,=11,-8)
Glaurung UFO 090122: 13.5 (+8,=11,-81)
Tord
I have not read this thread, so sorry if this has already been asked.

About how many ply would you have to slow down the search to get an even result?
I have no idea. 300 Elo points should normally correspond to about five plies in the middle game, but it's difficult to believe that the evaluation function can really be worth that much.

Tord