Scaling of engines from FGRL rating list

jhellis3
Posts: 485
Joined: Fri Aug 16, 2013 10:36 pm

Re: Scaling of engines from FGRL rating list.

Post by jhellis3 » Tue Apr 11, 2017 8:49 pm

Hi Isaac, rather than just explaining things completely from my perspective, it would probably be more beneficial to you (and anyone so interested) to do most of the work/thought and draw your own conclusions from there.

It will also spare me more pointless sniping of which I am weary.

I am, however, happy to point you in the right direction in thinking about these issues.

One of the best tricks in mathematics (at least for me) has always been to draw a picture. Get out a pencil and paper and draw a few XY plots; let the X axis be time and the Y axis be score differential, Elo differential, or whatever tickles your fancy. Then you can start thinking about the different ways engines might scale with time. Are they linear or curved? Do they more resemble e^x or ln(x)? Maybe some are even parabolic....
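For anyone who would rather tabulate than draw, here is a minimal Python sketch of the same exercise. It only prints a few candidate curve shapes and their step-to-step gains; the x values are arbitrary "time steps", the position of the parabola's peak is an arbitrary assumption, and none of the numbers are measured engine data.

```python
import math

# Candidate shapes for "Elo differential vs. time step".
# These are illustrative assumptions, not fits to any engine.
SHAPES = {
    "linear": lambda x: float(x),
    "ln(x)": lambda x: math.log(x),
    "e^x": lambda x: math.exp(x),
    "parabola": lambda x: -(x - 3.0) ** 2,  # peak at x = 3 chosen arbitrarily
}


def increments(f, xs):
    """Gain from each time step to the next for curve f."""
    ys = [f(x) for x in xs]
    return [b - a for a, b in zip(ys, ys[1:])]


xs = [1, 2, 3, 4, 5]
for name, f in SHAPES.items():
    values = " ".join(f"{f(x):8.2f}" for x in xs)
    gains = " ".join(f"{g:8.2f}" for g in increments(f, xs))
    print(f"{name:>8} values: {values}")
    print(f"{name:>8} gains : {gains}")
```

The gains row is the interesting one: constant for the linear shape, shrinking for ln(x), growing for e^x, and changing sign for the parabola.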

Then you might think about what factors affect the scaling rate and shape, for example time management, search (pruning), and eval.

For example, Stockfish scales very well to low time control. How do we know? Well, it is at the top of most lists. But if we look at how it scales with increasing time/depth vs some poorly scaling engine, it might look like SF scales poorly with lower time. This could be the case if the other engine either was so slow or had such an inferior eval that the gains SF achieves with more time outweigh any possible Elo compression.

Another place to look is at the data set Mark posted earlier in this thread: {-40, -30, -21, -10}, with each data point at roughly 3x the time of the previous one.

Now consider various ways that set might continue...
{-40, -30, -21, -10, 0, 10, 20, 30, 40, 50, ...}
or maybe {-40, -30, -21, -10, -7, -7, -5, -5, -5, -5, -4, -4, -4, -4, -4, -4, ...}

or many others....
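A small Python sketch of that thought experiment: the four measured values and the two continuations are copied from this post, while the function names are made up for illustration. It only checks that both continuations agree with the data we actually have, and then shows how far apart they drift.

```python
# Measured points from the thread; each step is roughly 3x the previous time.
measured = [-40, -30, -21, -10]

# Two hypothetical continuations from the post (not measurements).
keeps_climbing = [-40, -30, -21, -10, 0, 10, 20, 30, 40, 50]
levels_off = [-40, -30, -21, -10, -7, -7, -5, -5, -5, -5, -4, -4, -4, -4, -4, -4]


def agrees_with_data(candidate, data):
    """True if the candidate reproduces every measured point."""
    return candidate[:len(data)] == list(data)


def divergence(a, b):
    """Step-by-step gap between two continuations over their common length."""
    n = min(len(a), len(b))
    return [a[i] - b[i] for i in range(n)]


print(agrees_with_data(keeps_climbing, measured))
print(agrees_with_data(levels_off, measured))
for step, gap in enumerate(divergence(keeps_climbing, levels_off)):
    # 3 ** step is the approximate time multiplier relative to the first point.
    print(f"step {step} (~{3 ** step}x time): gap = {gap} Elo")
```

Both continuations fit the measured points perfectly, so the four points alone cannot tell them apart.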

If Komodo never surpasses SF as T approaches infinity, would you say it scales better than SF with time? I mean, technically one could do that, but I wouldn't say such a statement would be the most accurate representation of the truth.

Another question to ask is at what point in time engine A is supposed to outscale engine B, and by how much. For example, a claim that engine A will be 2 Elo better than engine B at a time control of 1440' + 480" on 64 cores is rather meaningless unless one has the 20,000+ games to back it up. Indeed, such a claim would be considered unfalsifiable for all practical purposes. What you choose to do with an unfalsifiable claim is none of my business though.
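For a rough feel of where a figure like 20,000+ games comes from, here is a back-of-the-envelope Python sketch. It uses the standard logistic Elo formula and assumes two nearly equal engines, independent games, and a draw ratio that is purely a guess (draw rates at very long time controls are not given in this thread). The function names and the 1.96 (about 95%) threshold are likewise illustrative choices.

```python
import math


def expected_score(elo_diff):
    """Expected score for a given Elo advantage (standard logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def games_needed(elo_diff, draw_ratio, z=1.96):
    """Rough game count so the z-sigma error on the mean score
    is no larger than the score edge implied by elo_diff.

    Assumes nearly equal engines: wins and losses each occur with
    probability (1 - draw_ratio) / 2, so per-game variance is
    (1 - draw_ratio) / 4. draw_ratio is an assumed input.
    """
    if elo_diff == 0:
        raise ValueError("elo_diff must be non-zero")
    edge = abs(expected_score(elo_diff) - 0.5)
    variance = (1.0 - draw_ratio) / 4.0
    return math.ceil((z * math.sqrt(variance) / edge) ** 2)


for draw_ratio in (0.6, 0.7, 0.8):  # assumed draw ratios, not measured
    print(f"draw ratio {draw_ratio:.0%}: ~{games_needed(2, draw_ratio)} games")
```

Under these assumptions a 2 Elo edge needs tens of thousands of games to resolve, and the count grows as the draw ratio falls.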

Anyway, hope this helps... If you still have some questions, send me a pm, and I will try to answer them more directly there.

Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 10:41 am

Re: Scaling of engines from FGRL rating list.

Post by Lyudmil Tsvetkov » Thu Apr 13, 2017 6:01 am

Isaac wrote:
jhellis3 wrote:Instead of saying Andscacs scales well with increasing time, we might say it actually just scales horribly with decreasing time. The problem is we only looked at 2 data points and have no way of knowing for sure, without broadening our scope. And it doesn't even have to be one or the other, but could potentially be a combination of both, where Andscacs does scale better with more time but not nearly as much as it first appears because it also scales relatively poorly with less time.
Hello Joseph, I would like to understand you here but I fail to see the difference between scaling better with increasing time and scaling badly with decreasing time. To me, it is exactly the same, just another way of describing the same effect.

For example I can't imagine a way to scale both well at increasing and decreasing time control. Will you (or any other) please help me to figure this particular case out? Thank you.

Scaling well at LTC is, of course, much more important than doing well at shorter TC, as with more time the quality of the game improves, and that is where you find the better moves. So we are all interested in longer TC.

I am pretty certain Komodo would scale better than SF with increasing time, in general, quite probably due to more reasonable eval and search parameter values instead of the SF-like over-formulistic approach (followed, btw., by almost all other existing engines). However, I also guess scaling will be very much dependent on the specific TC, so you will need 5 to 10 different TC data points to draw any definitive conclusions. I guess that, if one collects and analyses such a data set, Komodo will still scale better than SF, but the margins will decrease.

I also guess, as confirmed by some data Kai posted, that contempt works better at LTC, for the simple fact that chance events tend to decrease with LTC/better moves. To win positions close to 0.0, randomness would not help as it does at STC; you simply need to find good moves to move away from the drawish line.

Btw., contempt is more of an eval change, isn't it?

mjlef
Posts: 1454
Joined: Thu Mar 30, 2006 12:08 pm

Re: Scaling of engines from FGRL rating list.

Post by mjlef » Thu Apr 13, 2017 12:10 pm

Joseph,

You are right in that we do not know how the scaling of any program will go at time controls that we do not have data on. Scaling could be continuous, approach something asymptotically, or even have a hump where it starts getting worse with more time/cores. We only know what we know.

Mark

Ralph Stoesser
Posts: 408
Joined: Sat Mar 06, 2010 8:28 am

Re: Scaling of engines from FGRL rating list.

Post by Ralph Stoesser » Sat Apr 15, 2017 11:57 am

Isaac wrote: For example I can't imagine a way to scale both well at increasing and decreasing time control. Will you (or any other) please help me to figure this particular case out? Thank you.
This is what I understand. You have two time controls t1 (short tc) and t2 (long tc) and you have two corresponding Elo performances elo1 and elo2. Now you look at the difference d of the two Elo performances and call that value scaling. If you interpret d as a positive number (d=elo2-elo1) you have "scaling from low to high tc". If you interpret d as a negative number (d=elo1-elo2) you have "scaling from high tc to low tc". So, of course, it's the same number in both cases. I have absolutely no idea what others here are talking about, but that's how I understand it.
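The two-point definition above can be written as a few lines of Python. The function name and the Elo values are hypothetical, picked only to show that swapping the direction flips the sign and changes nothing else.

```python
def scaling(elo_from, elo_to):
    """Two-point 'scaling': Elo at the target tc minus Elo at the starting tc."""
    return elo_to - elo_from


# Hypothetical Elo performances at a short and a long time control.
elo1 = 3000  # short tc (assumed value)
elo2 = 3030  # long tc (assumed value)

low_to_high = scaling(elo1, elo2)
high_to_low = scaling(elo2, elo1)
print(low_to_high, high_to_low)
```

With only two data points there is a single number d, and the direction only decides its sign.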
