Unified eval tournament?

Tord Romstad · Post by **Tord Romstad** » Fri Jan 23, 2009 12:01 pm

Andres Valverde wrote: I like the idea very much. I would propose two parallel ways:

- Upload the engines to some website where everyone can download them and run the tests on equal hardware

- Run the ICS tourneys using HGM's server. Yes, this will be unfair but fun. (And yes, he has managed to make MAMER run.)

I agree. Why not do both?

Here is the source code for Glaurung UFO 090123, which uses Tomasz's material+piece square tables evaluation function. "UFO", in case anyone wonders, means "Ujednolicona Funkcja Oceny", which is Polish for "Uniform Evaluation Function". Perhaps "PST" or "UEF" would make more sense in English, but "UFO" looks a lot cooler, and I'll keep using it. I hope the rest of you will consider doing the same.

Tord

hgm · Post by **hgm** » Fri Jan 23, 2009 12:50 pm

That is a relief! I already feared you were into flying saucers now!

Allard Siemelink · Post by **Allard Siemelink** » Fri Jan 23, 2009 7:47 pm

Here are the results of the match at 40 moves/40 seconds repeating:
18.14% elo=-263, +38 -261 =51

Indeed the results of the simple eval version have gone down, as I expected.
Yet, it still scores better than Glaurung's 13.5%.
Tord, may I ask what time control you used?

Allard Siemelink wrote: Bright's numbers are a little less pronounced than Glaurung's, but the simple eval is still ~200 elo worse than its own.

Here are the results of a 3000 game match (4096 nodes/move) that just finished:
22.67% elo=-213, +562 -2202 =236

I think I'll run a match with longer time controls to see if that yields different results.
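
For anyone who wants to check figures like these, here is a minimal sketch of how a score percentage and an Elo difference can be derived from a win/loss/draw count. It assumes the standard logistic Elo model; the tool that produced the numbers above may round or compute slightly differently, and the function names are only illustrative.

```python
import math

def score_fraction(wins: int, losses: int, draws: int) -> float:
    """Fraction of points scored, counting a draw as half a point."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games

def elo_difference(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

# Results quoted above, as (wins, losses, draws) for the simple-eval version.
matches = {
    "40 moves/40 s": (38, 261, 51),
    "4096 nodes/move": (562, 2202, 236),
}
for label, wld in matches.items():
    s = score_fraction(*wld)
    print(f"{label}: {100 * s:.2f}%  elo={elo_difference(s):+.0f}")
```

Under this model the 3000-game match comes out at -213, matching the figure above, while the 350-game match comes out about a point away from the quoted -263, which is probably just a rounding or formula difference.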

cyberfish · Post by **cyberfish** » Sat Jan 24, 2009 12:38 am

I am pretty sure the mamer on FICS has such a command at its disposal, too. That's how players can just "tell" it something to start a game. It's just not automatic on FICS (or maybe there's an option I'm not aware of).

cyberfish · Post by **cyberfish** » Sat Jan 24, 2009 12:47 am

Should we allow EGTB/EGBB and opening books?

As pointed out earlier, EGTB/EGBB don't really matter (we can disable them to make it "fairer"), but what about opening books?

Disallowing opening books will also make it "fairer" (since we are comparing the search), but then how are we going to get our randomness?

For my engine, the opening book is the only source of randomness. It's a hash-indexed book of about 1,000,000 positions; for every position in the book, it picks a weighted-random move from the move list (weighted by how often each move has been played). That gives me enough randomness to run 40,000 games with just ~1% duplicate games (counting both colours, including games where the two sides are reversed, so it would be ~0.5% exactly identical games).
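
The weighted-random selection described here could look roughly like the sketch below. It is not the engine's actual code: the book layout (a dict from position hash to a list of moves with play counts), the hash value, and the function name are all assumptions made for illustration.

```python
import random

# Hypothetical book: position hash -> list of (move, times_played).
BOOK = {
    0x1A2B: [("e2e4", 600), ("d2d4", 350), ("c2c4", 50)],
}

def pick_book_move(book, position_hash, rng=random):
    """Return a weighted-random book move, or None if the position is not in the book."""
    entries = book.get(position_hash)
    if not entries:
        return None
    moves = [move for move, _ in entries]
    weights = [count for _, count in entries]
    # random.choices draws with probability proportional to the weights.
    return rng.choices(moves, weights=weights, k=1)[0]

print(pick_book_move(BOOK, 0x1A2B))
```

Passing a seeded `random.Random` instance as `rng` makes a test run reproducible, which helps when hunting for duplicate games.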

hgm · Post by **hgm** » Sat Jan 24, 2009 11:42 am

Perhaps they are using another version of Mamer, or they disabled this feature because it was too unreliable or troublesome in practice.

Mamer is a program separate from the ICS, and it can run on a completely different machine. It logs in to the ICS as a bot, on an account that has special permissions and can use commands that are not available to any other user (not even admins). Once it is logged in, it is controlled entirely through the ICS interface, through tell messages that contain commands. This is true both for ordinary users, who can send it commands to join a tourney and ask for games and results, and for the TDs creating and specifying the tourney. Mamer has its own list (on the computer where it runs) of which commands to accept from whom; you have to be on its managers list in order to create a tourney.
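
To make the permission model concrete, here is a rough sketch of a bot that receives tell lines and decides whether the sender may use a command. This is not Mamer's code: the command names, the managers list, and the assumed "handle tells you: ..." line format are placeholders chosen for illustration.

```python
import re

# Hypothetical permission table; these command names are placeholders,
# not Mamer's real command set.
MANAGER_ONLY = {"createtourney"}
ANYONE = {"join", "results"}
MANAGERS = {"hgm"}  # hypothetical managers list kept on the bot's own machine

# Assumed tell format: "handle tells you: command args", with optional
# title tags such as "(TD)" after the handle.
TELL_RE = re.compile(r"^(\w+)(?:\([^)]*\))* tells you: (\S+)\s*(.*)$")

def handle_tell(line):
    """Decide whether a tell line carries a command the sender may use."""
    m = TELL_RE.match(line)
    if m is None:
        return None  # not a tell at all
    sender, command, args = m.group(1), m.group(2).lower(), m.group(3)
    if command in ANYONE:
        return ("accept", sender, command, args)
    if command in MANAGER_ONLY and sender.lower() in MANAGERS:
        return ("accept", sender, command, args)
    return ("reject", sender, command, args)

print(handle_tell("cyberfish tells you: join 1"))
print(handle_tell("cyberfish tells you: createtourney"))
```

The point of the sketch is only that the access check lives in the bot, not in the ICS: the server delivers every tell, and the bot's own list decides what to act on.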

Michael Sherwin · Post by **Michael Sherwin** » Wed Jan 28, 2009 5:58 am

Tord Romstad wrote:
cyberfish wrote: Ah thanks!

We just need to get a few more people now...

I just implemented the simplified eval in my engine, and in ~2 seconds (limited depth) games, it's 52-72 elo points weaker.
That's far less than I would have thought. What does your evaluation contain, apart from material and piece square tables?

I just finished a quick Silver match between the normal version of my program and an otherwise identical version with the evaluation function replaced by Tomasz Michniewski's piece square table evaluation:
Glaurung 090122: 86.5 (+81,=11,-8)
Glaurung UFO 090122: 13.5 (+8,=11,-81)
Tord

I have not read this thread, so sorry if this has already been asked.

About how many ply would you have to slow down the search to get an even result?

Tord Romstad · Post by **Tord Romstad** » Wed Jan 28, 2009 9:39 am

Allard Siemelink wrote: Here are the results of the match at 40 moves/40 seconds repeating:
18.14% elo=-263, +38 -261 =51

Indeed the results of the simple eval version have gone down, as I expected.
Yet, it still scores better than Glaurung's 13.5%.
Tord, may I ask what time control you used?

I used 1 minute/game, with a 0.5 second increment. Perhaps I played too few games, or perhaps my evaluation function is better than I think.
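
As a rough check on the "too few games" possibility, the sketch below estimates a 95% interval for the score of the 100-game match (+8 =11 -81 for Glaurung UFO). It assumes independent games and a normal approximation, which is only approximate at such a lopsided score; the function name is illustrative.

```python
import math

def score_interval(wins, draws, losses, z=1.96):
    """Mean score and an approximate confidence interval (normal approximation)."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of an outcome that is 1, 0.5 or 0.
    second_moment = (wins + 0.25 * draws) / n
    variance = second_moment - mean * mean
    half_width = z * math.sqrt(variance / n)
    return mean, mean - half_width, mean + half_width

mean, low, high = score_interval(wins=8, draws=11, losses=81)
print(f"score {100 * mean:.1f}%, 95% interval {100 * low:.1f}% to {100 * high:.1f}%")
```

Under these assumptions the interval runs from roughly 7.6% to 19.4%, so an 18.14% result would still fall inside it.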

Tord

Tord Romstad · Post by **Tord Romstad** » Wed Jan 28, 2009 9:41 am

cyberfish wrote: As pointed out earlier, EGTB/EGBB don't really matter (we can disable them to make it "fairer"), but what about opening books?

Disallowing opening books will also make it "fairer" (since we are comparing the search), but then how are we going to get our randomness?

The most obvious thing to do would be to start from randomly selected Silver positions, or positions from any similar suite of opening positions.
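
A minimal sketch of that idea: draw start positions at random from a suite and play each one twice with colours swapped. The three FENs below are placeholders (positions after 1.e4, 1.d4 and 1.c4), not the Silver suite, and the engine names and function name are hypothetical.

```python
import random

# Placeholder suite; a real run would load the Silver positions instead.
POSITIONS = [
    "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1",
    "rnbqkbnr/pppppppp/8/8/3P4/8/PPP1PPPP/RNBQKBNR b KQkq d3 0 1",
    "rnbqkbnr/pppppppp/8/8/2P5/8/PP1PPPPP/RNBQKBNR b KQkq c3 0 1",
]

def build_schedule(positions, num_pairs, seed=None):
    """Pick random start positions and schedule each one twice with colours swapped."""
    rng = random.Random(seed)
    k = min(num_pairs, len(positions))
    chosen = rng.sample(positions, k)  # sampling without replacement avoids repeats
    schedule = []
    for fen in chosen:
        schedule.append((fen, "engine_a", "engine_b"))  # (start, white, black)
        schedule.append((fen, "engine_b", "engine_a"))
    return schedule

for game in build_schedule(POSITIONS, num_pairs=2, seed=1):
    print(game)
```

Playing both colours from every position keeps the comparison fair even when a start position favours one side.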

Tord

Tord Romstad · Post by **Tord Romstad** » Wed Jan 28, 2009 9:45 am

Michael Sherwin wrote:
Tord Romstad wrote:
cyberfish wrote: Ah thanks!

We just need to get a few more people now...

I just implemented the simplified eval in my engine, and in ~2 seconds (limited depth) games, it's 52-72 elo points weaker.
That's far less than I would have thought. What does your evaluation contain, apart from material and piece square tables?

I just finished a quick Silver match between the normal version of my program and an otherwise identical version with the evaluation function replaced by Tomasz Michniewski's piece square table evaluation:
Glaurung 090122: 86.5 (+81,=11,-8)
Glaurung UFO 090122: 13.5 (+8,=11,-81)
Tord
I have not read this thread, so sorry if this has already been asked.

About how many ply would you have to slow down the search to get an even result?

I have no idea. 300 Elo points should normally correspond to about five plies in the middle game, but it's difficult to believe that the evaluation function can really be worth that much.

Tord
