what happened to scorpio NN in TCEC?

syzygy
Posts: 4451
Joined: Tue Feb 28, 2012 10:56 pm

Re: what happened to scorpio NN in TCEC?

Post by syzygy » Sat Nov 17, 2018 7:32 pm

Branko Radovanovic wrote:
Sat Nov 17, 2018 7:19 pm
syzygy wrote:
Sat Nov 17, 2018 6:58 pm
The inclusion of games of a randomly crashing engine clearly hurts fairness.
How can one hurt fairness by including non-crashing games of a crashing engine?
I now agree that strictly speaking it does not hurt fairness. But it does hurt the tournament because it distorts the results. See my previous post (which I have edited).
On the contrary, it is their removal that hurts fairness. If it's unfair for an engine to receive a point just because its opponent crashed, it is then equally unfair to deduct a fully earned point against such engine from a game in which it didn't crash. If the choices are discard all or discard none, discarding all only seems fairer.
It is not unfair to deduct the point if you consider that there is no difference with a tournament in which the disqualified engine had not started to begin with.

syzygy
Posts: 4451
Joined: Tue Feb 28, 2012 10:56 pm

Re: what happened to scorpio NN in TCEC?

Post by syzygy » Sat Nov 17, 2018 7:39 pm

syzygy wrote:
Sat Nov 17, 2018 7:32 pm
Branko Radovanovic wrote:
Sat Nov 17, 2018 7:19 pm
syzygy wrote:
Sat Nov 17, 2018 6:58 pm
The inclusion of games of a randomly crashing engine clearly hurts fairness.
How can one hurt fairness by including non-crashing games of a crashing engine?
I now agree that strictly speaking it does not hurt fairness. But it does hurt the tournament because it distorts the results. See my previous post (which I have edited).
To add to my explanation why it distorts even though it does not hurt fairness:

Suppose TCEC had a rule that after the tournament is finished, each engine is awarded random bonus points. For all engines the bonus points are drawn from the same random distribution, so this is absolutely and completely fair. But clearly this would just add noise to the tournament results, distorting the tournament's outcome.

A randomly crashing engine is hardly different from such a rule. Compared to a hypothetical tournament in which the engine had not randomly crashed, the crashing engine randomly awards half and full points to some of the other engines.
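To make the bonus-points analogy concrete, here is a minimal sketch; it is not anything TCEC runs. It assumes two equally strong engines X and Y that each play a few games against the crashing engine, that a crash is scored as a full point for the opponent, and that crashes are independent of the position. The crash probability, outcome probabilities and game count are made-up illustration values, and the function names are hypothetical. Only the points gained relative to the same games played without crashes are counted.

```python
import random


def crash_bonus(rng, games, p_crash, outcome_probs):
    """Points an engine gains from opponent crashes, compared with the
    same games played to their natural result (illustrative model)."""
    win, draw, loss = outcome_probs
    bonus = 0.0
    for _ in range(games):
        # Result the engine would have scored without a crash.
        natural = rng.choices([1.0, 0.5, 0.0], weights=[win, draw, loss])[0]
        if rng.random() < p_crash:
            # Assumption: a crash is scored as a full point for the opponent.
            bonus += 1.0 - natural
    return bonus


def simulate(trials=5000, games=4, p_crash=0.15,
             outcome_probs=(0.3, 0.5, 0.2), seed=1):
    """Compare the crash bonus of two equally strong engines X and Y."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        x = crash_bonus(rng, games, p_crash, outcome_probs)
        y = crash_bonus(rng, games, p_crash, outcome_probs)
        diffs.append(x - y)
    mean_diff = sum(diffs) / trials
    share_unequal = sum(1 for d in diffs if d != 0) / trials
    return mean_diff, share_unequal


if __name__ == "__main__":
    mean_diff, share_unequal = simulate()
    print("average bonus difference X - Y:", mean_diff)
    print("share of tournaments with unequal bonus:", share_unequal)
```

Under these assumptions the average difference should sit close to zero (same distribution for both engines, hence "fair"), while in a sizeable share of individual tournaments one engine still receives more free points than the other, which is the noise described above.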

syzygy
Posts: 4451
Joined: Tue Feb 28, 2012 10:56 pm

Re: what happened to scorpio NN in TCEC?

Post by syzygy » Sat Nov 17, 2018 7:49 pm

I think another problem with a randomly crashing engine is that its performance relative to other engines contradicts the otherwise reasonable assumption that Elo differences are additive. So it's not just draw ratio that is the problem.

(So even if draws were impossible, as in tennis, it would be highly undesirable to have a randomly crashing player if we care about the reliability of the relative ranking of the non-crashing players at the end of the tournament.)
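One possible reading of the additivity point, sketched under explicit assumptions: the usual logistic Elo model, a crash scored as a loss, and a crash probability that does not depend on the opponent. The ratings and the crash rate below are invented for illustration, and the function names are hypothetical.

```python
import math


def expected_score(elo_diff):
    """Expected score under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def implied_elo_diff(score):
    """Elo difference that would produce the given expected score."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def observed_diff(true_diff, p_crash):
    """Elo difference implied by the results of an engine that forfeits
    a fraction p_crash of its games (assumed independent of the game)."""
    score = (1.0 - p_crash) * expected_score(true_diff)
    return implied_elo_diff(score)


def gap_between_opponents(p_crash):
    """Rating gap between opponents A and B as measured through the
    crashing engine C, where C is truly +300 vs A and +100 vs B."""
    return observed_diff(300.0, p_crash) - observed_diff(100.0, p_crash)


if __name__ == "__main__":
    print("gap A-B measured via C, no crashes:", gap_between_opponents(0.0))
    print("gap A-B measured via C, 10% crashes:", gap_between_opponents(0.1))
```

Without crashes the measured gap between A and B equals the true 200 Elo. With a 10% crash rate the two implied differences shrink by different amounts, so the gap measured through C comes out noticeably smaller than 200, even though neither A nor B changed.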

Branko Radovanovic
Posts: 59
Joined: Sat Sep 13, 2014 2:12 pm

Re: what happened to scorpio NN in TCEC?

Post by Branko Radovanovic » Sat Nov 17, 2018 8:01 pm

syzygy wrote:
Sat Nov 17, 2018 7:39 pm
Suppose TCEC had a rule that after the tournament is finished, each engine is awarded random bonus points. For all engines the bonus points are drawn from the same random distribution, so this is absolutely and completely fair. But clearly this would just add noise to the tournament results, distorting the tournament's outcome.

A randomly crashing engine is hardly different from such a rule. Compared to a hypothetical tournament in which the engine had not randomly crashed, the crashing engine randomly awards half and full points to some of the other engines.
Absolutely true: points from crashes are randomly won points (as you've noted, and I've discussed it in a recent thread about Leela, this is similar to random major blunders), and that by itself hurts fairness. However, on the other hand, points won fair-and-square contribute to fairness. This is particularly important if the probability of crashing for a given engine is fairly low (i.e. not close to say 0.5), which is usually the case.

There is actually a difference between a tournament with all engines and the same tournament without the crashing engine: the latter has fewer valid games, which essentially means more random chance and therefore less fairness - that is, if one adopts the definition I've given earlier (note I'm not saying it is the best or the only definition of "fairness" - the term is a bit hard to define).
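The "fewer games means more random chance" part can be quantified with a rough sketch. It assumes independent games with fixed win and draw probabilities, which real engine games only approximate; the probabilities and game counts are illustrative, not taken from the Div4 schedule.

```python
import math


def score_standard_error(games, p_win, p_draw):
    """Standard error of an engine's average score per game over
    `games` independent games with fixed outcome probabilities."""
    mean = p_win + 0.5 * p_draw
    second_moment = p_win + 0.25 * p_draw  # 1^2 * p_win + 0.5^2 * p_draw
    variance = second_moment - mean ** 2
    return math.sqrt(variance / games)


if __name__ == "__main__":
    for games in (24, 28, 32):
        print(games, "games:", score_standard_error(games, 0.3, 0.4))
```

The error shrinks only with the square root of the number of games, so dropping one opponent's games widens the margin modestly rather than dramatically.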

syzygy
Posts: 4451
Joined: Tue Feb 28, 2012 10:56 pm

Re: what happened to scorpio NN in TCEC?

Post by syzygy » Sat Nov 17, 2018 8:08 pm

Branko Radovanovic wrote:
Sat Nov 17, 2018 8:01 pm
However, on the other hand, points won fair-and-square contribute to fairness.
How do they contribute to fairness if some of the other engines get the point for free? You cannot separate the two. I simply don't see how a point inherently contributes to fairness just by having been won fair and square.
There is actually a difference between a tournament with all engines and the same tournament without the crashing engine: the latter has fewer valid games, which essentially means more random chance and therefore less fairness - that is, if one adopts the definition I've given earlier (note I'm not saying it is the best or the only definition of "fairness" - the term is a bit hard to define).
As I said, the more the better does not apply without qualification.

chrisw
Posts: 2008
Joined: Tue Apr 03, 2012 2:28 pm

Re: what happened to scorpio NN in TCEC?

Post by chrisw » Sat Nov 17, 2018 8:21 pm

Daniel Shawul wrote:
Sat Nov 17, 2018 5:41 pm
Daniel Shawul wrote:
Sat Nov 17, 2018 5:19 pm
From what I gather it is not the fault of Scorpio but cutechess-cli.

Apparently cutechess-cli only waits 10 seconds for an engine to load but loading Scorpio neural networks may take up to 30 seconds.
It is not a problem for winboard if an engine takes an hour to initialize because the winboard protocol says this
done (integer, no default)
If you set done=1 during the initial two-second timeout after xboard sends you the "xboard" command, the timeout will end and xboard will not look for any more feature commands before starting normal operation. If you set done=0, the initial timeout is increased to one hour; in this case, you must set done=1 before xboard will enter normal operation.
The xboard protocol provides a way to counter this by doing:
"feature done 0" ... then time-taking operation ... "feature done 1"
So the engine can take up to 1 hour initializing its stuff and there shouldn't be a problem.
I implemented that and expect it to work in every GUI but I guess cutechess-cli just resumes normal operation after waiting only 10 seconds....

I am fine with Scorpio getting out of the tournament due to its hangs, but it should not be implied that Scorpio is the cause of the tournament being restarted, especially when I did things the right way and their GUI (cutechess-cli) happens not to implement winboard correctly.

Daniel
I am not even sure this is the case at all. It played 80 blitz games without a problem, so if NN loading taking too long was a problem, it would have caused way too many hangs there...

Anyway one would assume cutechess-cli probably implemented the xboard protocol correctly.

Edit:
Indeed cutechess implements things correctly like I suspected. It waits for a "feature done 1" before initializing.

Code:

	else if (name == "done")
	{
		write("accepted done", Unbuffered);
		m_initTimer->stop();
		
		if (val == "1")
			initialize();
		return;
	}
The only explanation for me is that it is not strong enough for Div4, so let's blame it on its hangs and then say it was causing cutechess-cli to hang or whatever...
It’s tough doing something different, well, because different is difficult compared to path copying, but also because the world is set up for the paths the path copiers tread, so you have double the difficulties. Hang on in there, it always pays off.
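For readers unfamiliar with the handshake discussed above, here is a minimal engine-side sketch, assuming a Python engine talking over stdin/stdout. It is not Scorpio's code. On the wire the xboard protocol writes features as name=value pairs, so the "feature done 0" shorthand in the quote is sent as `feature done=0`. The `load_network` function is a hypothetical stand-in for whatever slow initialization the engine performs.

```python
import sys
import time


def send(line, out=sys.stdout):
    """Write one protocol line and flush so the GUI sees it immediately."""
    out.write(line + "\n")
    out.flush()


def load_network():
    """Hypothetical stand-in for slow initialization, such as loading
    neural network weights (up to 30 seconds according to the thread)."""
    time.sleep(0.01)


def handle_protover(out=sys.stdout, load=load_network):
    # Ask the GUI to extend its initial timeout before doing slow work.
    send("feature done=0", out)
    load()
    # Tell the GUI that feature negotiation is finished.
    send("feature done=1", out)


def main(inp=sys.stdin, out=sys.stdout):
    """Tiny command loop handling only the commands relevant here."""
    for line in inp:
        cmd = line.strip()
        if cmd.startswith("protover"):
            handle_protover(out)
        elif cmd == "quit":
            break
```

A real engine would call `main()` at startup and handle the rest of the protocol. The cutechess snippet quoted above stops its init timer on any "done" feature and initializes only on value "1", which matches this sequence.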

syzygy
Posts: 4451
Joined: Tue Feb 28, 2012 10:56 pm

Re: what happened to scorpio NN in TCEC?

Post by syzygy » Sat Nov 17, 2018 8:34 pm

Branko Radovanovic wrote:
Sat Nov 17, 2018 8:01 pm
(note I'm not saying it is the best or the only definition of "fairness" - the term is a bit hard to define)
I'd say fairness here means equal treatment. If an engine crashes on purpose when playing specific opponents, then that would clearly be unfair.

Not discarding the games of a randomly crashing engine may not be unfair (though it could certainly be argued to be unfair to engines with an Elo rating higher than the hypothetical engine with the crashes removed (which could be achieved by replaying the games with crashes -- fair as long as the crashes are indeed completely unrelated to how the game evolves)). But discarding all its games definitely is not unfair either: an engine that retroactively does not take part in a tournament cannot create unfairness (of course the disqualification should not be in any way dependent on game results).

So fairness isn't really the best criterion here. More important is the reliability of the final tournament results. I am quite sure that a randomly crashing engine increases the error margins.

lucasart
Posts: 3037
Joined: Mon May 31, 2010 11:29 am
Full name: lucasart

Re: what happened to scorpio NN in TCEC?

Post by lucasart » Sat Nov 17, 2018 11:23 pm

Daniel Shawul wrote:
Sat Nov 17, 2018 5:19 pm
From what I gather it is not the fault of Scorpio but cutechess-cli.

Apparently cutechess-cli only waits 10 seconds for an engine to load but loading Scorpio neural networks may take up to 30 seconds.
It is not a problem for winboard if an engine takes an hour to initialize because the winboard protocol says this
done (integer, no default)
If you set done=1 during the initial two-second timeout after xboard sends you the "xboard" command, the timeout will end and xboard will not look for any more feature commands before starting normal operation. If you set done=0, the initial timeout is increased to one hour; in this case, you must set done=1 before xboard will enter normal operation.
The xboard protocol provides a way to counter this by doing:
"feature done 0" ... then time-taking operation ... "feature done 1"
So the engine can take up to 1 hour initializing its stuff and there shouldn't be a problem.
I implemented that and expect it to work in every GUI but I guess cutechess-cli just resumes normal operation after waiting only 10 seconds....

I am fine with Scorpio getting out of the tournament due to its hangs, but it should not be implied that Scorpio is the cause of the tournament being restarted, especially when I did things the right way and their GUI (cutechess-cli) happens not to implement winboard correctly.

Daniel
There may have been some Scorpio crashes at the beginning of the game, explained by the time out you describe. But all of what I have seen were instances where Scorpio was getting mated, and crashed a few moves before TCEC adjudication could kick in.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.

lucasart
Posts: 3037
Joined: Mon May 31, 2010 11:29 am
Full name: lucasart

Re: what happened to scorpio NN in TCEC?

Post by lucasart » Sat Nov 17, 2018 11:40 pm

Branko Radovanovic wrote:
Sat Nov 17, 2018 4:09 pm
Meanwhile, the entire Div4 has been restarted, without Scorpio. The chat message is:

hi everyone i am sorry to inform you Entrance Division 4 had to be restarted due to too many inconsistencies in book orders, probably as a consequence of the many crashes among other things. this seemed to be the only option that guarantees fairness for all. so we restarted, with the one restriction: scorpio nn has proven to be too unstable and causing cutechess crashes, so it stays out. --kanchess (see !eventpgn for games)

I like TCEC but it can be very confusing and frustrating to follow. Scorpio was disqualified after 4 crashes, although the limit is supposed to be 3, and even that rule seems to be unofficial, as the Rules page says nothing about "3 strikes = DQ". Pirarucu on the other hand kept playing with 3 crashes because there was apparently some mixup with the logs so the first crash didn't count, adding further to the confusion. And now this - restarting from scratch, without any explanation, except for chat messages. Still, am I supposed to sit and read everything people say in the chat until explanation randomly comes along?

Confusion aside, the very idea of removing the engine with three crashes from competition and discarding all of its games - supposedly in the interest of fairness - is absolutely misguided, because crashes are essentially random events, just as normal wins, losses and draws are, and discarding valid games actually hurts fairness instead of improving it.
This tournament is becoming a bit of a mess:

They were supposed to restart without Scorpio, but they just restarted and forgot the "without Scorpio" part…

What about Pirarucu? It had 3 crashes, and they said 3 crashes were disqualifying…

Again this will stall the live broadcast every time Pirarucu or Scorpio crashes (annoying), and will pollute results (unfair). Some engines will get lucky and "win" against Pirarucu due to crashes, where they otherwise would have lost or drawn; some will not be gifted with opponent crashes.

As for Scorpio, it cannot pollute the results, because it would lose every single game regardless.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.

CMCanavessi
Posts: 835
Joined: Thu Dec 28, 2017 3:06 pm
Location: Argentina

Re: what happened to scorpio NN in TCEC?

Post by CMCanavessi » Sat Nov 17, 2018 11:47 pm

lucasart wrote:
Sat Nov 17, 2018 11:40 pm
This tournament is becoming a bit of a mess:

They were supposed to restart without Scorpio, but they just restarted and forgot the "without Scorpio" part…
They did, but they aborted it after 3 games and restarted it with Scorpio in again, apparently with the same version they used for the blitz test games (Scorpio played 80 games without crashing even once).

What about Pirarucu? It had 3 crashes, and they said 3 crashes were disqualifying…
From what I've read, it crashed 2 times; the 3rd was a problem with something else.

Again this will stall the live broadcast every time Pirarucu or Scorpio crashes (annoying), and will pollute results (unfair). Some engines will get lucky and "win" against Pirarucu due to crashes, where they otherwise would have lost or drawn; some will not be gifted with opponent crashes.

As for Scorpio, it cannot pollute the results, because it would lose every single game regardless.
Let's hope that, now that things have stabilized, no more crashes appear.
Follow my tournament and some Leela gauntlets live at http://twitch.tv/ccls
