Car industry and the Elo race

Viz
Posts: 77
Joined: Tue Apr 09, 2024 6:24 am
Full name: Michael Chaly

Re: Car industry and the Elo race

Post by Viz »

chrisw wrote: Mon Apr 29, 2024 6:42 pm
Viz wrote: Mon Apr 29, 2024 4:30 pm Also, we have made really good progress in understanding which parameters are TC-sensitive and thus can be optimized for longer searches.
Namely, almost every parameter in singular extensions is extremely sensitive to average game length (depth of search, or whatever measurement you like).
SE is a good example (looks like loadsa Elo possible) of asking a developer a) why this works and b) what would be the scientific basis for the particular tweak adjustments to experiment on. Negative extensions was a neat idea, and more extension for larger singularity likewise. But we’ll reach the point where the logical basis for other ideas begins to shrink and you’re down to throwing guesses at it, possibly for diminishing returns. I guess, one day, with a gazillion resources, the process can be automated with a universal adjustable black-box function, i.e. no longer do you have to guess at an algorithm and then use learning on the parameters, but you also meta-guess at a range of algorithmic possibilities and apply learning on those.
Well, it is also more or less in line with the famous book "physicists are joking". With enough experience and understanding you can make sense of almost anything that is thrown in as a search tweak.
chrisw
Posts: 4346
Joined: Tue Apr 03, 2012 4:28 pm

Re: Car industry and the Elo race

Post by chrisw »

Viz wrote: Mon Apr 29, 2024 6:45 pm
chrisw wrote: Mon Apr 29, 2024 6:42 pm
Viz wrote: Mon Apr 29, 2024 4:30 pm Also, we have made really good progress in understanding which parameters are TC-sensitive and thus can be optimized for longer searches.
Namely, almost every parameter in singular extensions is extremely sensitive to average game length (depth of search, or whatever measurement you like).
SE is a good example (looks like loadsa Elo possible) of asking a developer a) why this works and b) what would be the scientific basis for the particular tweak adjustments to experiment on. Negative extensions was a neat idea, and more extension for larger singularity likewise. But we’ll reach the point where the logical basis for other ideas begins to shrink and you’re down to throwing guesses at it, possibly for diminishing returns. I guess, one day, with a gazillion resources, the process can be automated with a universal adjustable black-box function, i.e. no longer do you have to guess at an algorithm and then use learning on the parameters, but you also meta-guess at a range of algorithmic possibilities and apply learning on those.
Well, it is also more or less in line with the famous book "physicists are joking". With enough experience and understanding you can make sense of almost anything that is thrown in as a search tweak.
But the test is: can you propose a good reason for a tweak working before you try it? I.e. science-guided tweaking vs. 50 monkeys.
towforce
Posts: 11660
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK

Re: Car industry and the Elo race

Post by towforce »

When you tweak one parameter, the optimal value for the other parameters will change.

If you have 5 parameters, P1 to P5, then you could try doing a global optimisation over all 5 parameters. With 5 parameters, you would need 5 versions of the following expression (one for each measured set of parameter values):

Elo rating = fn(P1, P2, P3, P4, P5)

Of course, if the relationship between the parameters with respect to the Elo rating is not close to being linear, then there's no guarantee that doing a linear global optimisation like this will get you the global optimum.
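To illustrate the coupling described in this post, here is a minimal toy sketch. The function `toy_elo`, its two parameters and every number in it are made up for illustration (nothing here is measured from a real engine). The cross term makes the best value of one parameter depend on where the other one sits, which is exactly what a purely linear model cannot capture.

```python
from itertools import product

GRID = range(0, 7)  # candidate integer values for each hypothetical parameter


def toy_elo(p1, p2):
    """Made-up Elo response; the (p1 - 3) * (p2 - 2) cross term couples the parameters."""
    return 10.0 - (p1 - 3) ** 2 - (p2 - 2) ** 2 - (p1 - 3) * (p2 - 2)


def best_p1_given(p2):
    """Tune p1 alone while p2 is held fixed."""
    return max(GRID, key=lambda p1: toy_elo(p1, p2))


def joint_optimum():
    """Search both parameters together over the full grid."""
    return max(product(GRID, GRID), key=lambda p: toy_elo(*p))


for p2 in (0, 2, 4):
    print(f"p2={p2}: best p1 = {best_p1_given(p2)}")
print("joint optimum:", joint_optimum())
```

In this toy surface the best p1 moves from 4 to 3 to 2 as p2 goes from 0 to 2 to 4, so tuning p1 once and then freezing it only works if p2 never changes. A real tuner would of course have to deal with noisy game results rather than an exact function.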
Writing is the antidote to confusion.
It's not "how smart you are", it's "how are you smart".
Your brain doesn't work the way you want, so train it!
towforce
Posts: 11660
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK

Re: Car industry and the Elo race

Post by towforce »

chrisw wrote: Mon Apr 29, 2024 7:02 pm But the test is: can you propose a good reason for a tweak working before you try it? I.e. science-guided tweaking vs. 50 monkeys.

Nature didn't do that when evolving the species! :)

We know two things:

1. Auto-tuning is better than human tuning

2. NNs have better chess knowledge than HCEs (and we don't know what this extra knowledge is: the evidence suggests that it's actually a large number of simple/small patterns, and not a small number of large patterns with deep insight)
Antihelion
Posts: 29
Joined: Tue Mar 26, 2024 8:21 pm
Full name: Lyndon S. Sears

Re: Car industry and the Elo race

Post by Antihelion »

hgm wrote: Sun Apr 28, 2024 11:50 am I noticed an interesting parallel between chess-engine development and the car industry. Let me start with a story that will teach us a moral lesson:

As demand for cleaner cars arose, government agencies devised tests for measuring car emissions. Unfortunately these tests did not mimic the conditions of every-day use of a car very well. So it turned out to be possible to do very well in the tests, while being quite polluting in real-life usage. When it was discovered that car manufacturers like Volkswagen exploited this to the max, society did not look very kindly on these great technical achievements. It was considered cheating, the software doing this optimization was called a 'defeat device', legal claims were filed against the companies, and the involved board members were sued. If there had been a discord channel on which car engineers had exchanged ideas for how to make their cars perform better in the tests, I am pretty sure the participants there would have been charged with 'conspiracy to defraud'.

This is what can happen if testing is not realistic, and people start to see passing the test as the goal, rather than delivering a good-quality product.

In computer chess a similar situation seems to be developing. Elo is tested in engine-engine games, but this is merely a testing method, and cannot be called an application. It is true that it can be fun to play engine tourneys, but the amount of fun is hardly dependent on the strength of the participants, for which you nowadays don't have to look very far to push the level beyond human understanding. The real application for which Elo matters is analysis: human chess players who want the 'absolute truth' about positions they encounter in games, or about opening lines they are preparing.

It would therefore be a bad thing if the testing conditions were dissimilar to those used in analysis. But unfortunately it seems they are. Rating testers run their tests with generous hash allocation. Which is easy, because even at LTC the time per move is sub-minute. But in analysis the time can be measured in hours. To keep the same hash-size/search-time ratio as in the tests you quickly get to insane amounts of memory, which would never be useful for any other application, and would have to be purchased for no other reason than doing the analysis. Understandably most users won't be willing to do that; for those it is only important how the engine performs on the hardware they can afford.

Of course you could argue that this is just the user's fault. Just like it was the driver's fault that they were not limiting their car trips to roads where they could drive at constant speed, without braking and accelerating, on days without wind, just as the emission tests on the roller bench did. But that argument did not fly very well in court. An engine that doesn't perform at its advertised Elo under the conditions in which people would want to use it stinks as much as a car that passed the emission test by means of a 'defeat device'.

So what I want to advocate is that testing should really be done under conditions where the node count in the search tree of a typical move is a factor of 10 to 100 larger than what would fit in the hash table. (Assuming, say, each entry would take 16 bytes, as we don't want to punish designs that make more efficient use of memory.) Otherwise we will run the risk that engines will only be designed to pass meaningless tests, wrecking them for use in analysis if that is needed to crank up the Elo under artificial conditions.
The exact same applies to test positions.
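
For the hash sizing proposed in the quoted post, a rough back-of-the-envelope sketch may help. The function names, the search speed (1M nodes per second) and the time controls below are assumptions chosen only for illustration; the 16 bytes per entry and the factor of 10 to 100 come from the quoted post.

```python
def hash_bytes_for_overload(nodes_per_second, seconds_per_move,
                            overload_factor, entry_bytes=16):
    """Hash size in bytes such that a typical search visits
    overload_factor times more nodes than the table has entries."""
    nodes = nodes_per_second * seconds_per_move
    entries = nodes // overload_factor
    return entries * entry_bytes


def scaled_hash_bytes(test_hash_bytes, test_seconds, analysis_seconds):
    """Hash size needed to keep the test's hash-size/search-time ratio
    when the search runs for analysis_seconds instead of test_seconds."""
    return test_hash_bytes * analysis_seconds // test_seconds


# Assumed test setup: 1M nodes per second, 10 seconds per move.
for factor in (10, 100):
    size = hash_bytes_for_overload(1_000_000, 10, factor)
    print(f"overload x{factor}: {size / 1e6:.1f} MB of hash")

# Assumed rating test: 256 MiB hash at 30 s/move, versus a 2-hour analysis.
needed = scaled_hash_bytes(256 * 2**20, 30, 2 * 3600)
print(f"same ratio at 2 hours: {needed / 2**30:.0f} GiB")
```

Under these assumed numbers the overloaded test needs only 16 MB (factor 10) or 1.6 MB (factor 100) of hash, while keeping a 256 MiB / 30 s ratio for a two-hour analysis would take 60 GiB.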
connor_mcmonigle
Posts: 543
Joined: Sun Sep 06, 2020 4:40 am
Full name: Connor McMonigle

Re: Car industry and the Elo race

Post by connor_mcmonigle »

Antihelion wrote: Mon Apr 29, 2024 8:00 pm ...

The exact same applies to test positions.
Definitely. Furthermore, as a metric, performance on test positions is empirically more orthogonal to "analysis quality"/"Elo under LTC tournament conditions" than Elo under typical STC testing conditions is.