TCEC stage 3, New Houdini starts with a bang

Raptor
Posts: 29
Joined: Mon Jan 28, 2013 10:18 am

Re: TCEC stage 3, New Houdini starts with a bang

Post by Raptor »

Frank Brenner wrote:
Imagine what would happen if Houdini had also won the second game and the Komodo team claimed "Oh, here is one more absolute-best setting 5 / 107" .... we could repeat this situation until Komodo wins the n-th repetition ...
This is actually not even possible, as settings are requested and accepted from programmers ONCE per stage, at the beginning of the stage. So the correct parameters that Mark keeps referring to are the ones he requested when he sent Komodo 10.1 for Stage 3, asking that Komodo play with those settings for the ENTIRE stage.

He could not possibly 'keep changing' the settings until he won. He only had one chance to ask for the settings, and he took it.

These settings were not applied. Hence the game was re-played. Same opening, same engines, CORRECT settings.

Knowing Mark, IF Houdini claimed to have started with incorrect settings and Komodo won, Mark would have NO ISSUE re-playing the game.

Just my thoughts.
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: TCEC stage 3, New Houdini starts with a bang

Post by Michel »

Raptor wrote:Knowing Mark, IF Houdini claimed to have started with incorrect settings and Komodo won, Mark would have NO ISSUE re-playing the game.
The issue is the part of the above statement in bold. This would make the decision biased (in favour of Houdini). So Mark would indeed have to agree to such an arrangement.

So did Houdini's author agree with the replay? It seems likely he wasn't even asked.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Raptor
Posts: 29
Joined: Mon Jan 28, 2013 10:18 am

Re: TCEC stage 3, New Houdini starts with a bang

Post by Raptor »

Michel wrote: The issue is the part of the above statement in bold. This would make the decision biased (in favour of Houdini). So Mark would indeed have to agree to such an arrangement.

So did Houdini's author agree with the replay? It seems likely he wasn't even asked.
The issue is not about whose favour it would be in (I am saying that even if TCEC had replayed the game without asking Mark, he would have been fine with it).

It's about playing with the settings and configurations that the programmers intended. This is not something Mark requested ad hoc; it wasn't a case of 'Oh, Komodo is losing, maybe Contempt 7 will do better, let me ask for it'.

The settings were requested before the stage began.

In this case, if I were the affected programmer, I would request a replay of the game, because I am entitled to the settings I asked for. The organizers allowed me those settings within the rules, and then failed to apply them for the game.

The point I am trying to make is that, irrespective of the result, the game is null and void the moment it starts with the wrong settings for either of the participants.

I would be equally vocal in support of a replay no matter which engines were involved; in fact, I would push for a replay if Raptor were the one affected.

So Team Komodo requesting the setting change and pointing out the organizers' mistake is well within their rights. And in response, I feel the decision taken by the TCEC team is fair and just; they would do it for anyone.

Just out of curiosity: what if TCEC had mistakenly set Stockfish to Skill Level=0? Would you want a replay, or would you say 'that's unfair to the opponent'? Think about whom it is most unfair to!
Uri Blass
Posts: 10267
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: TCEC stage 3, New Houdini starts with a bang

Post by Uri Blass »

Michel wrote:
I must disagree with the coin analogy. A proper analogy would be: an unbalanced coin was flipped and then replaced with a fair coin. The first flip was not correct, since the parameters of the coin were not correct.
The coin analogy is correct (since the bias was very small).

If the game would also have been replayed in the case of a draw or a loss for Houdini, then the decision was fair.

However, a more likely scenario in case of a win for K would have been that the game would not have been replayed, the reason being that it was K that was handicapped, not Houdini... (and it would have been hard to argue with this).
I do not think that it would have been hard to argue with this.

Even the claim that K was handicapped is not clear; it is possible that the wrong setting is better than the correct setting.

We need to use the correct setting not because it is better than a different setting, but because the programmer asked for it.

Even if the programmer discovers a better setting during the tournament, we are not allowed to replace the correct setting with the better setting during the stage.
Uri Blass
Posts: 10267
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: TCEC stage 3, New Houdini starts with a bang

Post by Uri Blass »

Laskos wrote:
Uri Blass wrote:
Laskos wrote:
Peter Berger wrote: You have to realize that these situations happen ALL THE TIME in engine tournaments. They often just go unnoticed.

I attended 3 WCCCs as an operator. Each year I observed unexpected situations that demanded a judgement by an official. Some I agreed with, some I didn't.

E.g. Crafty-Shredder, Ramat-Gan, 1st round, when Crafty crashed and wasn't set back up in time. It should have been a loss for Crafty in a perfect world, IMHO. The game ended as a draw mainly because SMK was very friendly and resisted the visible temptation to insist on a win by default. The scoresheet eventually just showed a pretty boring draw.

It is the same with ALL basement tourneys. I have run quite a few myself, and OFTEN something unexpected happened at some point, despite utmost care with the setup.

How do you judge the case when, at some point in the game, some monthly scheduled process you had completely forgotten about fires off and steals processor time from an engine?

How do you judge crashes when it can't be worked out who is to blame (engine, setup, machine)? Every judgement changes the probabilities of the final result.

You can try to do your best though.

In case the one to blame is the operator, because he set up the engine incorrectly, the only reasonable decision IMHO is to ALWAYS replay the game(s). No matter the result of the game.

So I agree with the TCEC decision.

Peter
In the absence of TCEC rules on this precise issue, I too would favor a practical solution for this accident which keeps the test as unbiased as possible. If the wrong UCI setting had been to use half of the cores, I would have had no problem with the decision. But if the UCI settings are very close to optimal and the probability is only 2% that they affected the game, I would be more careful, since replaying the game would give roughly an 85% probability that Houdini scores worse than in the first game. Or, if Komodo's Elo loss in this game is no larger than 5 Elo points, just go on to the next game with corrected settings, repeating nothing, rather than affect Houdini by a drawelo of 250 Elo points. That seems a clearly more scientifically sound approach to measuring strength, and the one I would have picked in my tests. I am not talking in legalistic terms, rather in practical terms of common sense.
I do not think that we have data about the probability that the UCI setting affected the game.

We do not have data about the Elo change for Komodo at long time control, and even if we could get data by testing, we could not learn from it the probability that the change affected the game.

As an extreme case, suppose that a program always wins with white and loses with black.
You can change the settings in a way that it always wins with black and loses with white.

The probability of changing the result of the game is 100%.
The Elo change is exactly 0 Elo, because it again scores 50%.

Of course in practice we do not have a 100% probability of changing the result of the game, but claiming that the probability it affected the result of the game is only 2% has no basis, and I do not say this only because we have no data showing that the Elo change is not larger than 5 Elo (maybe you have evidence that the Elo change is no more than 5 Elo at bullet time control with 1 core, but I am sure that you have no evidence for 20 cores at TCEC time control, and it is possible that a bigger contempt reduces Elo more at long time control).
I was talking about common sense, not Uri's proverbial otherworldly gedanken experiments. I think that in your example one could just as well change the settings so that the engine always wins with both white and black, and no such engine exists yet.

Basically, your argument reduces to: because we don't know how a 5 Elo loss for Komodo due to UCI settings in one game translates to TCEC conditions, we must strip Houdini of 250 Elo points in one game. Well, again, I was talking about common sense.
My example was an extreme one and of course not realistic, but the point is that we cannot know the probability of changing the result of a single game from the Elo difference alone.
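
To put that extreme example into a small calculation (toy numbers only, nothing measured from Komodo or from TCEC), here is a sketch in Python of two settings with exactly the same Elo between which every single game result changes:

Code:

def expected_score(p_win, p_draw):
    # Expected score from win/draw probabilities.
    return p_win + 0.5 * p_draw

# Setting A: the engine scores 100% with white and 0% with black.
a_white = (1.0, 0.0)  # (p_win, p_draw) with white
a_black = (0.0, 0.0)  # (p_win, p_draw) with black
# Setting B: exactly the same strength on average, but the colours are swapped.
b_white = (0.0, 0.0)
b_black = (1.0, 0.0)

score_a = 0.5 * (expected_score(*a_white) + expected_score(*a_black))
score_b = 0.5 * (expected_score(*b_white) + expected_score(*b_black))

print(score_a, score_b)  # both 0.5, so the two settings have identical Elo
# Yet the result of every single game is different between the two settings,
# while the Elo change is exactly 0.

So an Elo figure by itself cannot tell you the probability that the wrong settings changed this particular game.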

I also do not think that we know the Elo change, but even if we knew it, and knew that it is not more than 5 Elo, the fair decision would still be to replay the game.

My logic is simple.
If you discover that the settings are not the settings that the programmer asked for before the tournament, then it is better to replay the game (at least as long as the tournament is not already over and you have not already started the next stage).

If a stage is already finished and you have started the next one, then replaying the games of that stage, possibly changing which programs qualify, is a problem. Take an extreme case: suppose we find that a program X in some old TCEC from 2014 did not have the correct settings in stage 1. Replaying stage 1, and then replaying stage 2 with a program Y that now qualifies instead of a program Z because of the stage 1 replay, is a problem, because the question then is which version of program Y should participate.

The fair solution is probably to replay the relevant games of stage 1, replay stage 2 with the best version of program Y from that time instead of program Z, and maybe also replay stage 3 if different programs qualify.

The problem with that fair solution is that the programmer of program Y did not send his best program, because his program did not qualify for stage 2 at the time, so we do not have it.
Frank Brenner
Posts: 34
Joined: Sat Jul 02, 2016 1:47 pm

Re: TCEC stage 3, New Houdini starts with a bang

Post by Frank Brenner »

Ralf Müller wrote:
Imagine what would happen if Houdini had also won the second game and the Komodo team claimed "Oh, here is one more absolute-best setting 5 / 107" .... we could repeat this situation until Komodo wins the n-th repetition ...
The Komodo team gave their settings to the TCEC operator before the game, and he used the wrong ones. So your argument misses the point.


Okay, I did not know that the Komodo team gave their settings before the game.

Now everything is clear to me, and the decision to replay the first game is correct in my opinion.
Frank Brenner
Posts: 34
Joined: Sat Jul 02, 2016 1:47 pm

Re: TCEC stage 3, New Houdini starts with a bang

Post by Frank Brenner »

> The first game was played with incorrect settings, not something caused by the program, but something caused by a mistake by the operator



Oh yes, I did not know that the right settings had been given to the TCEC operators before game 1.

Now it's clear; I was wrong.

Given this new information, the fault was caused by the TCEC operator, and the decision to replay the game is correct in my opinion.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: TCEC stage 3, New Houdini starts with a bang

Post by syzygy »

bob wrote: There is one overriding question here. What is the purpose of playing the games? (a) to see if the human operators can set things up correctly? or (b) to see which program wins the game, using optimal settings? I've always gone for (b). And, in fact, ICGA tournament rules require this. If a game is started with wrong settings, it gets backed up to the last correct point and resumed. If the wrong move is entered, the game backs up to the last correct move and continues. The goal has always been to see which program plays better at that particular instant in time, rather than whether or not an operator makes a simple configuration error.
Are you sure that also applies to wrong settings at ICGA tournaments?

For example, an operator might have inadvertently configured the engine to use too little time. Halfway through the game he discovers his error. Is the game now restarted from the point where the mistake was made (i.e. from move 1)?

It seems to me that would allow for too much abuse. I guess there must be some restrictions on the type of mistake that can be corrected.

(Of course in TCEC there is no such operator involvement, so things are normally much simpler.)
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: TCEC stage 3, New Houdini starts with a bang

Post by Laskos »

Uri Blass wrote:
Laskos wrote:
Uri Blass wrote:
Laskos wrote:
Peter Berger wrote: You have to realize that these situations happen ALL THE TIME in engine tournaments. They often just go unnoticed.

I attended 3 WCCCs as an operator. Each year I observed unexpected situations that demanded a judgement by an official. Some I agreed with, some I didn't.

E.g. Crafty-Shredder, Ramat-Gan, 1st round, when Crafty crashed and wasn't set back up in time. It should have been a loss for Crafty in a perfect world, IMHO. The game ended as a draw mainly because SMK was very friendly and resisted the visible temptation to insist on a win by default. The scoresheet eventually just showed a pretty boring draw.

It is the same with ALL basement tourneys. I have run quite a few myself, and OFTEN something unexpected happened at some point, despite utmost care with the setup.

How do you judge the case when, at some point in the game, some monthly scheduled process you had completely forgotten about fires off and steals processor time from an engine?

How do you judge crashes when it can't be worked out who is to blame (engine, setup, machine)? Every judgement changes the probabilities of the final result.

You can try to do your best though.

In case the one to blame is the operator, because he set up the engine incorrectly, the only reasonable decision IMHO is to ALWAYS replay the game(s). No matter the result of the game.

So I agree with the TCEC decision.

Peter
In the absence of TCEC rules on this precise issue, I too would favor a practical solution for this accident which keeps the test as unbiased as possible. If the wrong UCI setting had been to use half of the cores, I would have had no problem with the decision. But if the UCI settings are very close to optimal and the probability is only 2% that they affected the game, I would be more careful, since replaying the game would give roughly an 85% probability that Houdini scores worse than in the first game. Or, if Komodo's Elo loss in this game is no larger than 5 Elo points, just go on to the next game with corrected settings, repeating nothing, rather than affect Houdini by a drawelo of 250 Elo points. That seems a clearly more scientifically sound approach to measuring strength, and the one I would have picked in my tests. I am not talking in legalistic terms, rather in practical terms of common sense.
I do not think that we have data about the probability that the UCI setting affected the game.

We do not have data about the Elo change for Komodo at long time control, and even if we could get data by testing, we could not learn from it the probability that the change affected the game.

As an extreme case, suppose that a program always wins with white and loses with black.
You can change the settings in a way that it always wins with black and loses with white.

The probability of changing the result of the game is 100%.
The Elo change is exactly 0 Elo, because it again scores 50%.

Of course in practice we do not have a 100% probability of changing the result of the game, but claiming that the probability it affected the result of the game is only 2% has no basis, and I do not say this only because we have no data showing that the Elo change is not larger than 5 Elo (maybe you have evidence that the Elo change is no more than 5 Elo at bullet time control with 1 core, but I am sure that you have no evidence for 20 cores at TCEC time control, and it is possible that a bigger contempt reduces Elo more at long time control).
I was talking about common sense, not Uri's proverbial otherworldly gedanken experiments. I think that in your example one could just as well change the settings so that the engine always wins with both white and black, and no such engine exists yet.

Basically, your argument reduces to: because we don't know how a 5 Elo loss for Komodo due to UCI settings in one game translates to TCEC conditions, we must strip Houdini of 250 Elo points in one game. Well, again, I was talking about common sense.
My example was an extreme one and of course not realistic, but the point is that we cannot know the probability of changing the result of a single game from the Elo difference alone.

I also do not think that we know the Elo change, but even if we knew it, and knew that it is not more than 5 Elo, the fair decision would still be to replay the game.

My logic is simple.
If you discover that the settings are not the settings that the programmer asked for before the tournament, then it is better to replay the game (at least as long as the tournament is not already over and you have not already started the next stage).

If a stage is already finished and you have started the next one, then replaying the games of that stage, possibly changing which programs qualify, is a problem. Take an extreme case: suppose we find that a program X in some old TCEC from 2014 did not have the correct settings in stage 1. Replaying stage 1, and then replaying stage 2 with a program Y that now qualifies instead of a program Z because of the stage 1 replay, is a problem, because the question then is which version of program Y should participate.

The fair solution is probably to replay the relevant games of stage 1, replay stage 2 with the best version of program Y from that time instead of program Z, and maybe also replay stage 3 if different programs qualify.

The problem with that fair solution is that the programmer of program Y did not send his best program, because his program did not qualify for stage 2 at the time, so we do not have it.
These are somewhat extreme cases; I am concerned more with the practical issue of this particular game. The settings were very close to optimal, and replaying the game had a very high likelihood of producing a huge Elo swing in favor of Komodo. We do not know what the decision would have been had Komodo won or drawn the problem game, so the decision might have depended on the outcome of the first game. The second game might depend on the outcome of the first, even if only weakly, and the huge Elo swing likely introduced by the second game, compared with the tiny flaw in the settings, does not warrant replaying if the games are not completely independent.
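
To put rough numbers on this (a toy sketch assuming the standard logistic Elo model and a purely hypothetical 5 Elo handicap, not anything measured at TCEC):

Code:

def expected_score(elo_diff):
    # Standard logistic Elo model: expected score of the side that is
    # elo_diff points stronger.
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

handicap = 5.0  # hypothetical Elo cost of the wrong UCI settings
shift = expected_score(handicap) - expected_score(0.0)
print(f"expected-score shift from a {handicap:.0f} Elo handicap: {shift:.4f}")
# roughly 0.007 of a point per game

# Voiding a decisive game and replaying it can move the recorded score
# by up to a full point, a far larger swing than the handicap itself.
print("maximum score swing from replacing a win with a loss: 1.0")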

In formal terms, the author of Houdini should probably have been asked whether he agreed to a replay. There would have been no arguments then.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: TCEC stage 3, New Houdini starts with a bang

Post by syzygy »

mjlef wrote:
syzygy wrote:
Ralf Müller wrote: It's not about replaying a game, it's about restarting the whole tournament. All engines start at 0 points and have the same possibilities.

It would also be fair to restart a whole coin-flip tournament (if there are enough rounds to go).
You are making a basic error.

If someone thinks he might have a patch that improves SF, he can run it on fishtest. If the patch passes all tests, it will normally be added to the official SF.

Now suppose you try this and the patch fails some of the tests.
You try again... you "restart the whole tournament". Now it passes.

What do you expect to happen? Will your patch be added to the official Stockfish?
No, your patch will not be added and for very good reasons.

Restarting the tournament for whatever reason (provided it is linked, however remotely, to the outcome of the earlier games) is inherently unfair. (And in any event, it is absolutely irrelevant that the game that was restarted was the first game of the series.)


A completely separate question: what if Komodo had won the game despite its disadvantage and the stage had continued without a replay? Would you have objected?
I disagree. The test run is not the same. A better analogy would be: you had a bug in the first run (you set some numbers wrong), so you fixed them and restarted the test. This has happened several times in testing Komodo. We always disregard the faulty data, clear it, and run again with proper settings.
Well, I was responding to the precise argument made: that because this was the first game of the stage and restarting the game effectively restarted the whole tournament, everything was fair. I explained why that argument does not fly. (Of course the other side (not you here) then changes its arguments again, as usual, but OK.)
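
To make the selection-bias point above concrete, here is a toy Monte Carlo (the 5% pass rate and the five retries are made-up numbers, and this is a gross simplification of how fishtest testing works): a patch that is actually neutral gets "accepted" far more often once you are allowed to rerun a failed test.

Code:

import random

random.seed(42)

def one_test(p_pass):
    # One honest test of a patch that is actually neutral: it passes
    # with probability p_pass purely by luck.
    return random.random() < p_pass

def with_restarts(p_pass, max_tries):
    # Keep "restarting the tournament" until a run happens to pass,
    # up to max_tries attempts.
    return any(one_test(p_pass) for _ in range(max_tries))

N = 100_000
p_pass = 0.05  # made-up single-run pass rate for a neutral patch
single = sum(one_test(p_pass) for _ in range(N)) / N
retried = sum(with_restarts(p_pass, 5) for _ in range(N)) / N

print(f"pass rate with one test:      {single:.3f}")   # about 0.05
print(f"pass rate with up to 5 tries: {retried:.3f}")  # about 0.23

That is why a result that only stands because earlier, unfavourable results were thrown away cannot be trusted.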