Unified eval tournament?

Edsel Apostol
Posts: 803
Joined: Mon Jul 17, 2006 5:53 am
Full name: Edsel Apostol

Re: Unified eval tournament?

Post by Edsel Apostol »

Thanks for the info. I'm already using 4096 nodes in my tests. The depth-1 results in my tests differ from the results at a fixed 4096 nodes, and the fixed-node results seem to correlate with blitz results, so I think I will have to trust fixed nodes.

You're right that it searches around depth 4 in the middle game and deeper in the endgame.

There seem to be only two free engines that support fixed-node search, Spike and Twisted Logic. Glaurung has support for it, but it isn't an exact number of nodes. The latest public version of Bright doesn't seem to support it.
Allard Siemelink wrote: Hi Edsel,

Arena is indeed too slow, so I rolled my own.
It is a command line thingy built into Bright itself.
Basically it starts some uci engine and then talks uci
to play games against the hosting bright exe.

I have not tried fixed depth=1 matches, I would think that it is prone to simple tactical traps and unreliable for endgames.
The 4096 nodes still reach depth=4 on average.
If playing fixed node matches, it will actually search deeper during the endgame, like in real games.

Edsel Apostol wrote: Hi Allard,

What GUI did you use to test with fixed nodes? I'm using Arena and it doesn't seem fast enough for me. A game can sometimes last 30 seconds even though I'm using fewer nodes than you are. My engine's NPS is slightly lower than yours and my hardware is old.

By the way, have you tried your experiment above with a fixed depth of, for example, 1? Which do you think is more reliable for testing eval changes, fixed depth or fixed nodes?
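
For anyone wanting to roll their own tester along the lines Allard describes, here is a minimal sketch in Python of the UCI traffic for one fixed-node search. It only builds the command strings and parses the reply; starting the engine process and the game loop are left out. The function names are made up for illustration, and this is not how Bright's built-in tester is actually written.

```python
def fixed_nodes_commands(moves, nodes=4096):
    """Return the UCI lines a match driver would send for one fixed-node search.

    `moves` is the list of moves played so far in UCI coordinate
    notation (e.g. "e2e4"). Hypothetical helper, for illustration only.
    """
    position = "position startpos"
    if moves:
        position += " moves " + " ".join(moves)
    # "go nodes N" asks the engine to stop after searching N nodes.
    return [position, f"go nodes {nodes}"]


def parse_bestmove(line):
    """Extract the move from a UCI 'bestmove' reply, or None for any other line."""
    parts = line.split()
    if len(parts) >= 2 and parts[0] == "bestmove":
        return parts[1]
    return None
```

A driver would write these lines to the engine's standard input, read lines until `parse_bestmove` returns a move, append that move to the list, and then hand the position to the other engine.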
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Unified eval tournament?

Post by Tord Romstad »

Edsel Apostol wrote: There seem to be only two free engines that support fixed-node search, Spike and Twisted Logic. Glaurung has support for it, but it isn't an exact number of nodes.
It is an exact number of nodes in Glaurung, as long as it runs with a single search thread.

Tord
Edsel Apostol
Posts: 803
Joined: Mon Jul 17, 2006 5:53 am
Full name: Edsel Apostol

Re: Unified eval tournament?

Post by Edsel Apostol »

:oops: I've just realized that the PIV machine I'm using for fixed-node testing has hyperthreading, and Glaurung might see it as two CPUs and run two search threads. Thanks for the correction, Tord. I can now use Glaurung as a sparring partner as well. I will just have to set the number of threads to 1 in the UCI parameters.
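
In case it helps anyone else, here is a rough Python sketch of forcing a single search thread over UCI. It scans the `option` lines an engine prints in response to `uci` for something that looks like a thread-count option, then builds the matching `setoption` command. I am assuming the option name contains the word "thread"; the sample option names in the usage note are illustrative, so check what your engine really reports.

```python
def find_thread_option(uci_output_lines):
    """Return the name of the first UCI option whose name mentions 'thread'.

    Assumes the engine advertises options as
    'option name <name> type <type> ...', as the UCI protocol describes.
    Returns None if no such option is found.
    """
    prefix = "option name "
    for line in uci_output_lines:
        if line.startswith(prefix) and " type " in line:
            name = line[len(prefix):line.index(" type ")]
            if "thread" in name.lower():
                return name
    return None


def single_thread_command(option_name):
    """Build the UCI command that sets the given option to 1."""
    return f"setoption name {option_name} value 1"
```

For example, if the engine reports an option named `Threads`, the driver would send `setoption name Threads value 1` before the first `go nodes` command.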
Tord Romstad wrote:
Edsel Apostol wrote: There seem to be only two free engines that support fixed-node search, Spike and Twisted Logic. Glaurung has support for it, but it isn't an exact number of nodes.
It is an exact number of nodes in Glaurung, as long as it runs with a single search thread.

Tord
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Unified eval tournament?

Post by Tord Romstad »

Edsel Apostol wrote: :oops: I've just realized that the PIV machine I'm using for fixed-node testing has hyperthreading, and Glaurung might see it as two CPUs and run two search threads.
Yes, it's annoying that there is no easy way to detect hyperthreading programmatically.
Edsel Apostol wrote: Thanks for the correction, Tord. I can now use Glaurung as a sparring partner as well. I will just have to set the number of threads to 1 in the UCI parameters.
You can, but please note that for very low node counts (like 4096), you will probably see occasional crashes or illegal moves: this can happen when Glaurung does not manage to finish at least two iterations. Perhaps I should release a 2.2.1 version with a fix for this bug.

Tord
hgm
Posts: 28395
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Unified eval tournament?

Post by hgm »

Something completely different in connection with the unified eval:

Wouldn't it be a good idea to have all engines participating in this read the PSTs they are using from the same file (e.g. pst.dat) at startup, so that we can be sure no bugs crept in when typing in the PSTs? (Plus, we could easily repeat the test with a different PST.)
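
To make the idea concrete, here is a small Python sketch of a loader for such a file. The layout is purely an assumption on my part, since no format for pst.dat has been agreed: a line holding a single piece letter (P, N, B, R, Q or K) starts a table, followed by 64 integers spread over any number of lines, with `#` starting a comment.

```python
def parse_psts(text):
    """Parse piece-square tables from text in a hypothetical pst.dat layout.

    Returns a dict mapping piece letter to a list of 64 integers.
    Raises ValueError if a table does not contain exactly 64 values.
    """
    tables = {}
    piece = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line in ("P", "N", "B", "R", "Q", "K"):
            piece = line
            tables[piece] = []
        elif piece is None:
            raise ValueError("values found before any piece header")
        else:
            tables[piece].extend(int(tok) for tok in line.split())
    for name, values in tables.items():
        if len(values) != 64:
            raise ValueError(f"table {name} has {len(values)} values, expected 64")
    return tables
```

An engine would call something like `parse_psts(open("pst.dat").read())` once at startup. The 64-value check is there to catch exactly the kind of typing mistake mentioned above.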

I also think we have to state more explicitly which criterion to apply for the end-game. The Wiki page with the original description leaves two possibilities open for this, and we should make sure every engine is using the same one.