FGRL 60 min + 15 sec Rating list - Komodo 10.4


mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Re: FGRL 60 min + 15 sec Rating list - Komodo 10.4

Post by mjlef »

Milos wrote:
JJJ wrote:
Milos wrote:
lkaufman wrote:Scaling with more time and scaling with more cores/threads are two different issues, though the first will help the second. I think it is clearer that Komodo scales better than Stockfish with more cores than with more time.
Regarding more time.
SF8:K is constant 55% at all TCs, therefore there is no better scaling at all. Actually since the gap doesn't reduce due to diminishing return it is more probable that SF has better scaling with TC.
The change in Elo is just an artefact of K's contempt that SF doesn't have.
At very short TC contempt effect is non-existent since draw rate is really low and SF has a larger lead. At longer TC draw rate increases and K's contempt helps reducing the Elo gap. However, that is totally artificial.
So please, stop producing bogus claims.
Regarding more cores your claims are even less believable.
Good point.
This can also be clearly seen from results. At 60s+0.6s draw rates for SF, K and H are respectively 15.65%, 15.08% and 15.13%. At 60m+15s draw rates for SF, K and H are respectively 53.85%, 49.63% and 57.1%.
At 60m+15s contempt really kicks in.
The whole purpose of a positive Contempt is to reduce draw rates. This it definitely does, but against a similar strength opponent, a positive Contempt hurts elo. I know of no reason for Contempt to "kick in" at higher time controls. You never seem to quote the number of games, where you are getting results, or often even which version of each program you are referring to. Low game count could be a reason. But since so much data is missing, I do not know what to say.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: FGRL 60 min + 15 sec Rating list - Komodo 10.4

Post by Milos »

mjlef wrote:
This can also be clearly seen from results. At 60s+0.6s draw rates for SF, K and H are respectively 15.65%, 15.08% and 15.13%. At 60m+15s draw rates for SF, K and H are respectively 53.85%, 49.63% and 57.1%.
At 60m+15s contempt really kicks in.
The whole purpose of a positive Contempt is to reduce draw rates. This it definitely does, but against a similar strength opponent, a positive Contempt hurts elo. I know of no reason for Contempt to "kick in" at higher time controls. You never seem to quote the number of games, where you are getting results, or often even which version of each program you are referring to. Low game count could be a reason. But since so much data is missing, I do not know what to say.
How about you open a link of the original poster in the thread, the same link you've already quoted data and look for yourself?
mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Re: FGRL 60 min + 15 sec Rating list - Komodo 10.4

Post by mjlef »

Milos wrote:
mjlef wrote:
This can also be clearly seen from results. At 60s+0.6s draw rates for SF, K and H are respectively 15.65%, 15.08% and 15.13%. At 60m+15s draw rates for SF, K and H are respectively 53.85%, 49.63% and 57.1%.
At 60m+15s contempt really kicks in.
Larry said "I think it is clearer that Komodo scales better than Stockfish with more cores than with more time." This seems clear in the reporting groups' data against a range of opponents. Larry was not restricting his comment to merely Komodo-Stockfish games. You need thousands of games to determine that (and we indeed do play thousands of games every day, and occasionally against Stockfish).

The whole purpose of a positive Contempt is to reduce draw rates. This it definitely does, but against a similar strength opponent, a positive Contempt hurts elo. I know of no reason for Contempt to "kick in" at higher time controls. You never seem to quote the number of games, where you are getting results, or often even which version of each program you are referring to. Low game count could be a reason. But since so much data is missing, I do not know what to say.
How about you open a link of the original poster in the thread, the same link you've already quoted data and look for yourself?
Of course I have, but you are ignoring most of the data and concentrating on just SF vs K runs (some would say cherry picking). Only 150 to 300 games were played in those runs, which is simply not enough to determine a reliable elo difference. You fail to include data from other sources, and theorize about unproven "contempt scaling". Show some significant data. I have.

Stockfish playing against Stockfish with more cores has less of an elo gain than Komodo against Komodo with more cores. And those were 3000 game sets, not cherry picked sets of 150 to 300 games which have a huge error margin. Our own testing of Komodo versus Stockfish shows the same thing. 19926 games at a very fast TC, then 22830 games at an intermediate time control, then more games at a longer time control show a 16 elo improvement in direct K vs SF play. MP scaling shows similar changes.

Contempt is a red herring. Contempt hurts against roughly equal opponents, which means Komodo will do worse than it would with a Contempt of 0. Nearly all testing groups use default everything other than Threads and Hash. But even with this setting, which is going to hurt against SF, you can still see the clear trend in CCRL data. At 40/4 SF scores 60% against K 10.4 (59.9% against K 10.3). At 40/40 SF scored only 49.5% against K 10.3, and SF8 vs K 10.4 at 40/40 got 47.1% (just 52 games).
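
For readers who want to translate the CCRL percentages above into Elo, here is a minimal sketch using the standard logistic Elo model. The function name elo_from_score is my own invention for illustration, the inputs are just the score percentages quoted in this post, and this is not how CCRL computes its rating list.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by a match score (0 < score < 1),
    using the standard logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

# Stockfish's score against Komodo, as quoted from CCRL in the post.
quoted_results = [
    ("40/4,  SF vs K 10.4", 0.600),
    ("40/40, SF vs K 10.3", 0.495),
    ("40/40, SF8 vs K 10.4 (52 games)", 0.471),
]

for label, score in quoted_results:
    print(f"{label}: {elo_from_score(score):+.1f} Elo from SF's side")
```

On these figures SF's edge goes from roughly +70 Elo at 40/4 to slightly negative at 40/40, though the 52-game sample is far too small to lean on by itself.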

Larry believes we scale better than Stockfish. But the data with more cores is simply better since there are a lot more games. As the testing groups run more games, we can back this up, but the trend seems clear already. I suggest we wait a bit to get more than a few hundred games at these time controls. The error margins are huge with just a few hundred games.
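
To make "huge error margins" concrete, here is a rough sketch of how the 95% margin shrinks with the number of games. It assumes independent games, a normal approximation, a 55% score and a 50% draw rate (close to the 60m+15s draw rates quoted earlier in the thread). The function names are hypothetical, and rating tools such as BayesElo or Ordo use more careful models than this.

```python
import math

def elo_from_score(score: float) -> float:
    """Standard logistic conversion from match score to Elo difference."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_margin_95(score: float, draw_rate: float, games: int) -> float:
    """Approximate 95% half-width, in Elo, of a match result.

    Assumes independent games and a normal approximation.
    """
    wins = score - draw_rate / 2.0            # fraction of games won
    if wins < 0.0 or wins + draw_rate > 1.0:
        raise ValueError("score and draw_rate are inconsistent")
    # Per-game variance with win = 1, draw = 0.5, loss = 0.
    variance = wins + draw_rate / 4.0 - score ** 2
    half_width = 1.96 * math.sqrt(variance / games)
    low, high = score - half_width, score + half_width
    if low <= 0.0 or high >= 1.0:
        raise ValueError("too few games for this approximation")
    return (elo_from_score(high) - elo_from_score(low)) / 2.0

for games in (150, 300, 3000):
    margin = elo_margin_95(score=0.55, draw_rate=0.50, games=games)
    print(f"{games:5d} games: about +/- {margin:.0f} Elo")
```

Under these assumptions the margin is about 40 Elo at 150 games, about 28 at 300, and about 9 at 3000, which is why a 16 elo trend cannot be read off a run of a few hundred games.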
David Xu
Posts: 47
Joined: Mon Oct 31, 2016 9:45 pm

Re: FGRL 60 min + 15 sec Rating list - Komodo 10.4

Post by David Xu »

Looking at some of Stanisavljevic's other posts on Komodo threads makes it clear that he has an agenda to advance. He has presented very few facts or citations of his own, resorting instead to definitional nitpicking (LOS vs. CFS) which distracts from the real point. What little information he has presented is generally either (a) cherrypicked in order to diminish Komodo's apparent performance, (b) largely irrelevant to the discussion at hand, or (c) both. In at least one case, he has made a claim that was outright false. (http://talkchess.com/forum/viewtopic.ph ... ht=#709770)

It is one thing for a poster to prefer one engine over another. It is quite another for that poster to make a pretense at neutrality and present biased facts and statistics in order to make a particular engine/team of developers look bad. This isn't even getting into the personal attacks he has made on the integrity of various developers and testers (including, but not necessarily limited to, Mark Lefler, Larry Kaufman, and Ingo Bauer), which, apart from being rude and unnecessary, also closely resemble accusations made by conspiracy theorists.

To be clear: any arguments and/or information presented by Stanisavljevic--at least regarding new versions of Komodo, against which his bias could not be more obvious--should be regarded with the utmost suspicion. Arguing with such a character is unproductive at best, as it is impossible to change the view of someone who has already decided on what the "truth" is, and counterproductive at worst, since it gives apparent legitimacy to claims which, ideally, should receive no consideration whatsoever. (This is why public figures do not engage with conspiracy theorists.)
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: FGRL 60 min + 15 sec Rating list - Komodo 10.4

Post by Milos »

David Xu wrote:Looking at some of Stanisavljevic's other posts on Komodo threads makes it clear that he has an agenda to advance. He has presented very few facts or citations of his own, resorting instead to definitional nitpicking (LOS vs. CFS) which distracts from the real point. What little information he has presented is generally either (a) cherrypicked in order to diminish Komodo's apparent performance, (b) largely irrelevant to the discussion at hand, or (c) both. In at least one case, he has made a claim that was outright false. (http://talkchess.com/forum/viewtopic.ph ... ht=#709770)

It is one thing for a poster to prefer one engine over another. It is quite another for that poster to make a pretense at neutrality and present biased facts and statistics in order to make a particular engine/team of developers look bad. This isn't even getting into the personal attacks he has made on the integrity of various developers and testers (including, but not necessarily limited to, Mark Lefler, Larry Kaufman, and Ingo Bauer), which, apart from being rude and unnecessary, also closely resemble accusations made by conspiracy theorists.

To be clear: any arguments and/or information presented by Stanisavljevic--at least regarding new versions of Komodo, against which his bias could not be more obvious--should be regarded with the utmost suspicion. Arguing with such a character is unproductive at best, as it is impossible to change the view of someone who has already decided on what the "truth" is, and counterproductive at worst, since it gives apparent legitimacy to claims which, ideally, should receive no consideration whatsoever. (This is why public figures do not engage with conspiracy theorists.)
Piece off troll. Since you registered 6 months ago you have only 11 posts on this forum. 4 are rude and pointless oneliners, 2 are bashing Stockfish, 1 is outright ass licking of Komodo and 4 are personal attacks on me.
You are either troll created by someone from Komodo team, or someone with a serious personality disorder.
kranium
Posts: 2129
Joined: Thu May 29, 2008 10:43 am

Re: FGRL 60 min + 15 sec Rating list - Komodo 10.4

Post by kranium »

David Xu wrote:Looking at some of Stanisavljevic's other posts on Komodo threads makes it clear that he has an agenda to advance. He has presented very few facts or citations of his own, resorting instead to definitional nitpicking (LOS vs. CFS) which distracts from the real point. What little information he has presented is generally either (a) cherrypicked in order to diminish Komodo's apparent performance, (b) largely irrelevant to the discussion at hand, or (c) both. In at least one case, he has made a claim that was outright false. (http://talkchess.com/forum/viewtopic.ph ... ht=#709770)

It is one thing for a poster to prefer one engine over another. It is quite another for that poster to make a pretense at neutrality and present biased facts and statistics in order to make a particular engine/team of developers look bad. This isn't even getting into the personal attacks he has made on the integrity of various developers and testers (including, but not necessarily limited to, Mark Lefler, Larry Kaufman, and Ingo Bauer), which, apart from being rude and unnecessary, also closely resemble accusations made by conspiracy theorists.

To be clear: any arguments and/or information presented by Stanisavljevic--at least regarding new versions of Komodo, against which his bias could not be more obvious--should be regarded with the utmost suspicion. Arguing with such a character is unproductive at best, as it is impossible to change the view of someone who has already decided on what the "truth" is, and counterproductive at worst, since it gives apparent legitimacy to claims which, ideally, should receive no consideration whatsoever. (This is why public figures do not engage with conspiracy theorists.)

https://chessprogramming.wikispaces.com ... isavljevic

LOL
I think Milos knows what he's talking about...
who are you David Xu?
mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Re: FGRL 60 min + 15 sec Rating list - Komodo 10.4

Post by mjlef »

A little data from an outside tester. Komodo 1812.00 (a development version, well before Komodo 10.4), with thousands of games run at various time controls, one thread. Contempt in Komodo is set to 0 for these runs:

TC = 3' + 2" -40 Elo (5000 games)
TC = 10' + 6" -30 Elo (5000 games)
TC = 30' + 15" -21 Elo (2000 games)
TC = 90' + 30" -10 Elo ( 500 games)

Some contraction in elo difference is expected due to longer time controls increasing the draw rate. The longest time control has only 500 games, which is not enough to get the error margin down to prove the scaling improves, but the rest of the data is pretty convincing.
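
As a toy illustration of that contraction effect only: suppose the stronger engine always takes the same share of the decisive games, and only the draw rate changes with time control. Both the 60% share and the draw rates below are made-up inputs, and the fixed-share assumption itself is a simplification, so this shows the mechanism and says nothing about its real size.

```python
import math

def elo_from_score(score: float) -> float:
    """Standard logistic conversion from match score to Elo difference."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_with_draws(decisive_win_share: float, draw_rate: float) -> float:
    """Elo gap when a fixed share of decisive games goes to the stronger side.

    decisive_win_share: fraction of non-drawn games won by the stronger engine
                        (hypothetical, assumed constant across time controls).
    draw_rate:          fraction of all games that are drawn.
    """
    score = draw_rate / 2.0 + (1.0 - draw_rate) * decisive_win_share
    return elo_from_score(score)

for draw_rate in (0.15, 0.35, 0.55):
    gap = elo_with_draws(decisive_win_share=0.60, draw_rate=draw_rate)
    print(f"draw rate {draw_rate:.0%}: gap about {gap:.0f} Elo")
```

With these inputs the gap falls from about 60 Elo to about 31 Elo purely because more games are drawn. Whether real engines keep that decisive-game share constant is exactly what has to be measured with enough games.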

Now it is conceivable that something we changed in Komodo since this development version could have hurt scaling in K 10.4, but more recent tests still confirm the scaling Larry mentioned.

Mark
mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Re: FGRL 60 min + 15 sec Rating list - Komodo 10.4

Post by mjlef »

Milos wrote:
David Xu wrote:Looking at some of Stanisavljevic's other posts on Komodo threads makes it clear that he has an agenda to advance. He has presented very few facts or citations of his own, resorting instead to definitional nitpicking (LOS vs. CFS) which distracts from the real point. What little information he has presented is generally either (a) cherrypicked in order to diminish Komodo's apparent performance, (b) largely irrelevant to the discussion at hand, or (c) both. In at least one case, he has made a claim that was outright false. (http://talkchess.com/forum/viewtopic.ph ... ht=#709770)

It is one thing for a poster to prefer one engine over another. It is quite another for that poster to make a pretense at neutrality and present biased facts and statistics in order to make a particular engine/team of developers look bad. This isn't even getting into the personal attacks he has made on the integrity of various developers and testers (including, but not necessarily limited to, Mark Lefler, Larry Kaufman, and Ingo Bauer), which, apart from being rude and unnecessary, also closely resemble accusations made by conspiracy theorists.

To be clear: any arguments and/or information presented by Stanisavljevic--at least regarding new versions of Komodo, against which his bias could not be more obvious--should be regarded with the utmost suspicion. Arguing with such a character is unproductive at best, as it is impossible to change the view of someone who has already decided on what the "truth" is, and counterproductive at worst, since it gives apparent legitimacy to claims which, ideally, should receive no consideration whatsoever. (This is why public figures do not engage with conspiracy theorists.)
Piece off troll. Since you registered 6 months ago you have only 11 posts on this forum. 4 are rude and pointless oneliners, 2 are bashing Stockfish, 1 is outright ass licking of Komodo and 4 are personal attacks on me.
You are either troll created by someone from Komodo team, or someone with a serious personality disorder.
Please stop making things up. The Komodo team is me and Larry. We have not made any other accounts on this site and we did not post that message. When you run out of facts, you apparently start making them up. But the truth always comes out.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: FGRL 60 min + 15 sec Rating list - Komodo 10.4

Post by Milos »

mjlef wrote:A little data from an outside tester. Komodo 1812.00 (a development version, well before Komodo 10.4), with thousands of games run at various time controls, one thread. Contempt in Komodo is set to 0 for these runs:

TC = 3' + 2" -40 Elo (5000 games)
TC = 10' + 6" -30 Elo (5000 games)
TC = 30' + 15" -21 Elo (2000 games)
TC = 90' + 30" -10 Elo ( 500 games)

Some contraction in elo difference is expected due to longer time controls increasing the draw rate. The longest time control has only 500 games, which is not enough to get the error margin down to prove the scaling improves, but the rest of the data is pretty convincing.
This is a quite normal reduction of the Elo gap between 2 engines due to diminishing returns (more draws) since TCs are getting quite long. There might be some scaling advantage but it is certainly very small, a few Elo at best. Contempt has nothing to do with H2H matches so I really don't know why you are mentioning it in this case.

On the other hand, contempt helps Komodo a lot on rating lists. I know it, you know it, and pretending it doesn't just makes you a hypocrite.
The fact that Komodo consistently makes fewer draws than SF or H as TC increases is clear proof of it. Consistently worse results against SF and H than what the final Elo shows is another proof. Yes, contempt does hurt Komodo against SF and H, but these are 2 engines. And at the same time it helps it against a few dozen other engines, so the net result is a lot in K's favour.
mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Re: FGRL 60 min + 15 sec Rating list - Komodo 10.4

Post by mjlef »

Milos wrote:
mjlef wrote:A little data from an outside tester. Komodo 1812.00 (a development version, well before Komodo 10.4), with thousands of games run at various time controls, one thread. Contempt in Komodo is set to 0 for these runs:

TC = 3' + 2" -40 Elo (5000 games)
TC = 10' + 6" -30 Elo (5000 games)
TC = 30' + 15" -21 Elo (2000 games)
TC = 90' + 30" -10 Elo ( 500 games)

Some contraction in elo difference is expected due to longer time controls increasing the draw rate. The longest time control has only 500 games, which is not enough to get the error margin down to prove the scaling improves, but the rest of the data is pretty convincing.
This is a quite normal reduction of the Elo gap between 2 engines due to diminishing returns (more draws) since TCs are getting quite long. There might be some scaling advantage but it is certainly very small, a few Elo at best. Contempt has nothing to do with H2H matches so I really don't know why you are mentioning it in this case.

On the other hand, contempt helps Komodo a lot on rating lists. I know it, you know it, and pretending it doesn't just makes you a hypocrite.
The fact that Komodo consistently makes fewer draws than SF or H as TC increases is clear proof of it. Consistently worse results against SF and H than what the final Elo shows is another proof. Yes, contempt does hurt Komodo against SF and H, but these are 2 engines. And at the same time it helps it against a few dozen other engines, so the net result is a lot in K's favour.
Of course Contempt helps against weaker opponents. That is the whole point, and I have never denied it. I wrote it, for goodness' sake! But a positive Contempt does hurt against stronger/equal opponents. Since Stockfish's default contempt is zero, they benefit a bit in direct play against us. I merely mention the contempt setting in the match I quoted so people have the important settings.
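
For anyone unfamiliar with the idea, here is a deliberately simplified sketch of what a contempt setting does in general: it changes the score the engine assigns to a draw. This is not Komodo's or Stockfish's actual code, the function names are made up, and real engines apply the idea inside the search with many refinements.

```python
def draw_score(contempt_cp: int) -> int:
    """Value, in centipawns from the engine's point of view, given to a draw.

    With positive contempt a draw is scored below an equal position,
    so the engine is less willing to accept one.
    """
    return -contempt_cp

def prefers_draw(contempt_cp: int, best_alternative_cp: int) -> bool:
    """True if the engine would take an available draw (say, a repetition)
    rather than play on in a line evaluated at best_alternative_cp."""
    return draw_score(contempt_cp) >= best_alternative_cp

# The engine can repeat moves, or play on in a slightly worse position (-10 cp).
for contempt in (0, 20):
    choice = "takes the draw" if prefers_draw(contempt, -10) else "plays on"
    print(f"contempt {contempt:2d} cp: {choice}")
```

With contempt 0 the engine takes the draw, while with contempt 20 it plays on from the slightly worse position. That tends to pay off against a weaker opponent who is likely to go wrong later, and tends to cost points against an equal or stronger one.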

In any case you make a claim that elo contraction due to longer time controls explains away the time scaling. But you provide no data showing the magnitude of this. We study these things, so we know what the effect is. You cannot just make a claim and not supply data. So, what do you think the elo contraction between programs is at these time controls? Data please, not speculation. To save you time, take a look at CCRL data, say comparing Stockfish 8 versus Houdini 5.01. The difference in direct play is in the range of 13-19 elo, for both 40/4 and 40/40 games. Where is this contraction you speak of? At most a few elo. Perhaps you can find a set of direct play big enough to be meaningful. From bullet speeds to blitz the effect is pretty pronounced. But at longer time controls it is quite small for most programs.

But first, do you admit your claim that Stockfish wins 55% no matter the time control has been proven false? That it was an artifact of using too few games to jump to a conclusion? Admitting you made errors will help your credibility.