The reason I think it is better to start with an evaluation competition is that a good evaluation can be relatively more independent of the search than the other way around.
People prune based on evaluation. Even null move pruning is pruning based on evaluation: the question of whether or not there is a threat is an evaluation-based question.
It is logical to think that the question of whether to use R=2, R=3 or R=4 in null move pruning may depend on the evaluation function,
so it may be better to start with an engine that is weak because of a bad search, try only to improve the evaluation function, and start working on the search only when you are happy with the evaluation.
You may have search ideas that work with a bad evaluation and do not work with a good one, so testing search ideas while the evaluation is still bad may be a waste of time: you may find that some ideas are productive only with a simple evaluation and have to remove them later.
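To make the point about null move pruning concrete, here is a minimal sketch on a made-up toy game (not a chess engine). Everything in it is an assumption for illustration: the names Pos, MOVES, Searcher and compare, the toy rules, and the common variant of only trying the null move when the static evaluation is already at or above beta. What it shows is that both the decision to try the null move and the verdict of the reduced search come from evaluate(), so changing the evaluation changes what gets pruned.

```python
from dataclasses import dataclass

MOVES = (3, 1, -2)  # hypothetical toy moves: points gained by the mover


@dataclass(frozen=True)
class Pos:
    total: int  # running score from the first player's point of view
    side: int   # +1 if the first player is to move, -1 otherwise


def evaluate(pos):
    """Static evaluation from the point of view of the side to move."""
    return pos.total * pos.side


def make_move(pos, gain):
    return Pos(pos.total + gain * pos.side, -pos.side)


def make_null_move(pos):
    """Pass: hand the move to the opponent without changing the score."""
    return Pos(pos.total, -pos.side)


class Searcher:
    def __init__(self, null_r=None):
        self.null_r = null_r  # reduction R; None disables null move pruning
        self.nodes = 0

    def search(self, pos, depth, alpha, beta, allow_null=True):
        """Fail-hard alpha-beta (negamax) with optional null move pruning."""
        self.nodes += 1
        if depth <= 0:
            return evaluate(pos)
        # The null move is only tried when the evaluation says we are
        # already doing well enough; this gate is evaluation-based.
        if self.null_r is not None and allow_null and evaluate(pos) >= beta:
            reduced = depth - 1 - self.null_r
            score = -self.search(make_null_move(pos), reduced,
                                 -beta, -beta + 1, allow_null=False)
            if score >= beta:
                return beta  # even after passing, the opponent shows no threat
        for gain in MOVES:
            score = -self.search(make_move(pos, gain), depth - 1, -beta, -alpha)
            if score >= beta:
                return beta
            alpha = max(alpha, score)
        return alpha


def compare(depth=6, null_r=2):
    """Search the same toy position with and without null move pruning."""
    root = Pos(total=10, side=1)
    plain, pruned = Searcher(), Searcher(null_r)
    plain_score = plain.search(root, depth, 4, 5)
    pruned_score = pruned.search(root, depth, 4, 5)
    return plain_score, plain.nodes, pruned_score, pruned.nodes
```

Calling compare() returns the two scores and node counts. In this toy game passing is never better than moving, so the pruned search reaches the same verdict with fewer nodes; with a different evaluate() or a different R that need not hold, which is exactly the dependence discussed above.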
- Upload the engines to some website where everyone can download them and run the tests on equal hardware
That I think is a good idea. We can later combine all the PGNs and have a large number of games very quickly. The problem is all engines would have to be open (or at least public), though. No problem for me at all. My program has always been and will always be open source. But is everyone okay with that? (at least the binary has to be public)
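Combining the PGNs could be as simple as the sketch below. It assumes every tester sends in ordinary PGN files with the standard Result tag; the function name combine_pgns and any file names you pass in are made up for illustration.

```python
import re
from collections import Counter
from pathlib import Path

# Matches the standard PGN tag pair, e.g. [Result "1-0"]
RESULT_TAG = re.compile(r'^\[Result\s+"([^"]+)"\]', re.MULTILINE)


def combine_pgns(paths, out_path):
    """Concatenate several PGN files into one and tally the Result tags."""
    tally = Counter()
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            text = Path(path).read_text(encoding="utf-8")
            tally.update(RESULT_TAG.findall(text))
            # Separate consecutive games/files with a blank line.
            out.write(text.rstrip("\n") + "\n\n")
    return tally
```

The returned tally maps each result string ("1-0", "0-1", "1/2-1/2") to the number of games, which is enough for a quick total before the combined file goes to a rating tool.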
- Run the ICS tours using HGM's server. Yes, this will be unfair but funny. (And yes, he has managed to make MAMER run)
I have not participated in any ICS computer tournament (but have tried human ones). At least on FICS, the operator needs to type a command to start each game? That will be quite inconvenient if we want a large number of games.
For a tournament, the ICS has a so-called Tournament Manager bot ("Mamer" on FICS, "Tomato" on ICC) which can start all games centrally, after the previous round finishes and it has made new pairings. The only thing the user has to do is subscribe for the tourney before it begins (by typing "mam join <nr>", where "mam" is a shortcut alias for "tell mamer"). After that, everything is automatic. And even joining the tournament can be done for him by the tournament director, so the only requirement is really that the user is logged in.
Bright's numbers are a little less pronounced than Glaurung's, but the simple eval is still ~200 Elo worse than its own.
Here are the results of a 3000 game match (4096 nodes/move) that just finished:
22.67% elo=-213, +562 -2202 =236
I think I'll run a match with longer time controls to see if that yields different results.
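For anyone who wants to check the numbers, the percentage and the Elo figure follow from the win/loss/draw counts with the usual logistic Elo formula. A minimal sketch (the function names match_score and elo_difference are just illustrative):

```python
import math


def match_score(wins, losses, draws):
    """Fraction of points scored: a win counts 1, a draw 0.5."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games


def elo_difference(score):
    """Elo difference implied by a score fraction (logistic model)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


score = match_score(wins=562, losses=2202, draws=236)
print(f"{100 * score:.2f}% elo={elo_difference(score):+.0f}")
```

With +562 -2202 =236 this gives the 22.67% and -213 quoted above.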
Tord Romstad wrote:
cyberfish wrote: Ah thanks!
I just implemented the simplified eval in my engine, and in ~2 seconds (limited depth) games, it's 52-72 Elo points weaker.
That's far less than I would have thought. What does your evaluation contain, apart from material and piece square tables?
I just finished a quick Silver match between the normal version of my program and an otherwise identical version with the evaluation function replaced by Tomasz Michniewski's piece square table evaluation:
For a tournament, the ICS has a so-called Tournament Manager bot ("Mamer" on FICS, "Tomato" on ICC) which can start all games centrally, after the previous round finishes and it has made new pairings. The only thing the user has to do is subscribe for the tourney before it begins (by typing "mam join <nr>", where "mam" is a shortcut alias for "tell mamer"). After that, everything is automatic. And even joining the tournament can be done for him by the tournament director, so the only requirement is really that the user is logged in.
Hmm. In the tournaments I have played on FICS, I had to type "td PlayGame #", where # is the tourney number, to start each game.
If all we have to do is have the engine logged in, I don't mind playing on your ICS either. It has the added bonus that all (or most) of our computing power can be used all the time (simultaneous games).
I have had administrators on ICC remotely control LearningLemming with some spoofing command. If the tournament directors were allowed to do that, tournaments really could be hands off.
Well, we tried it out a few days ago, and the version of "mamer" I have really starts the games automatically. The Lasker-2.2.3 ICS has a special command to match two other users, and allows people that are on the "Tournament-Director list" to use this command.
I don't know about the reliability of all this; probably some operator supervision is advisable, as engine bots might get stuck or connections might be interrupted.
hgm wrote: I agree with Tord, btw, that for this tourney running it on unequal hardware makes no sense. But it still seemed a good opportunity to promote the idea of other such tournaments. Who needs a place like ICC for organized, pre-announced comp-comp tourneys? Their accounts policy only makes it more difficult. We could have a CCT-like tournament every month, if we wanted.
That's a really great idea. Such monthly informal tournaments sound a lot more attractive than the CCT to me. I would probably join as often as I could, both in normal and UFO tournaments.
It's probably better to start a separate thread about this, though. Most people don't read this thread.