future of top engines: how much more elo?

Ovyron · Post by **Ovyron** » Wed Jul 24, 2019 9:57 am

But that doesn't matter, the algorithm just needs to use a strategy that gives it the worst performance against the pool of opponents, and we calibrate it to 0 elo. Ideally, the rating lists would include these algorithms in their pools (so far rating lists have been incredibly biased toward Alphabeta algorithms) and calibrate the bottom one to 0 elo.

Right now Stockfish has around 3450 elo on the CCRL, but "3450" is meaningless because it's calibrated arbitrarily. The question of what would be Stockfish's rating in a pool where the worst possible chess entity has 0 elo remains unanswered.

Guenther · Post by **Guenther** » Wed Jul 24, 2019 10:02 am

Dann Corbit wrote: ↑Tue Jul 23, 2019 2:51 am If a random moving engine played an infinite number of games against SF, some of the games would be wins for the random engine.
Obviously, a very tiny percentage.

But, just by chance, a win could happen early in the chain.

A random engine will not be zero Elo.
An engine that chooses the worst possible move after a careful search might be able to hit zero.

I want to clarify why I have always been against setting the worst player to zero instead of the random player, and why I think it is better to use a negative scale for players below the random mover, as a kind of antagonist.

Trying to lose deliberately is IMHO a different game from chess. The goal of chess is at least not trying to lose.
(Maybe there is even a chess rule for this? Sandbagging, but for no reason?)

Ovyron · Post by **Ovyron** » Wed Jul 24, 2019 10:08 am

Playing to win isn't a rule of chess. You win if you checkmate the opponent's king, but that's just if you want to win. Actually, a kid who is taught that losing games is favorable would be very happy tanking games throughout her chess career. A player who wants to lose, facing one who is trying to win, leads to a game where both players end up happy.

Guenther · Post by **Guenther** » Wed Jul 24, 2019 10:11 am

Ovyron wrote: ↑Wed Jul 24, 2019 10:08 am Playing to win isn't a rule of chess.

I did not write this... try to quote and you'll see - this is the last time I reply to you BTW (normally you are on ignore, since this board has this feature).

Hint:

The goal of chess is at least not trying to lose.

Uri Blass · Post by **Uri Blass** » Wed Jul 24, 2019 11:28 am

Ovyron wrote: ↑Wed Jul 24, 2019 9:57 am But that doesn't matter, the algorithm just needs to use a strategy that gives it the worst performance against the pool of opponents, and we calibrate it to 0 elo. Ideally, the rating lists would include these algorithms in their pools (so far rating lists have been incredibly biased toward Alphabeta algorithms) and calibrate the bottom one to 0 elo.

Right now Stockfish has around 3450 elo on the CCRL, but "3450" is meaningless because it's calibrated arbitrarily. The question of what would be Stockfish's rating in a pool where the worst possible chess entity has 0 elo remains unanswered.

The point is that the right algorithm is different with a different pool of players.

Rating has no meaning unless we agree about the participants.

The CCRL rating of the random player is above 0 when many opponents capture the random player's pieces only to allow repetition or stalemate later, but if you replace the weakest players it is easy to get the random player's rating below 0.

chrisw · Post by **chrisw** » Wed Jul 24, 2019 1:07 pm

Guenther wrote: ↑Wed Jul 24, 2019 10:02 am
Dann Corbit wrote: ↑Tue Jul 23, 2019 2:51 am If a random moving engine played an infinite number of games against SF, some of the games would be wins for the random engine.
Obviously, a very tiny percentage.

But, just by chance, a win could happen early in the chain.

A random engine will not be zero Elo.
An engine that chooses the worst possible move after a careful search might be able to hit zero.
I want to clarify why I have always been against setting the worst player to zero instead of the random player, and why I think it is better to use a negative scale for players below the random mover, as a kind of antagonist.

Trying to lose deliberately is IMHO a different game from chess. The goal of chess is at least not trying to lose.
(Maybe there is even a chess rule for this? Sandbagging, but for no reason?)

Yes, some good points there.
Normal chess = trying to win
Random chess = not trying to do anything
Losing chess = deliberate losing, which may include playing much better in early stages.

There’s an easy-to-build “worse than random” player that isn’t deliberately trying to lose: choose a random move from the non-capture list and, if there aren’t any non-captures, randomly choose a capture. This type of random mover will tend to end up with fewer pieces and will be more likely to lose to the fully random mover.
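For anyone who wants to try this, here is a minimal sketch of the selection rule only. The function name and the `is_capture` predicate are made up for illustration; a real engine would ask its own move generator which moves are captures.

```python
import random

def worse_than_random_move(moves, is_capture, rng=random):
    """Pick uniformly among non-captures; fall back to captures if none exist.

    `moves` is any sequence of legal moves. `is_capture` is a caller-supplied
    predicate (hypothetical here, not part of any real engine API).
    """
    if not moves:
        raise ValueError("no legal moves")
    quiet = [m for m in moves if not is_capture(m)]
    # Only when every legal move is a capture do we pick from the captures.
    pool = quiet if quiet else list(moves)
    return rng.choice(pool)

# Toy usage with moves written as SAN strings, where "x" marks a capture.
legal = ["e4", "Nf3", "Bxf7+", "Qxd8"]
print(worse_than_random_move(legal, lambda m: "x" in m))
```

The fallback branch matters: without it the mover would have no move in positions where only captures are legal.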

Calibrating fully random = 0 seems sensible, with worse than random giving negative Elos. If best-at-losing were to be calibrated at zero, somebody could always produce a better best-at-losing, which then requires a recalibration.

Laskos · Post by **Laskos** » Wed Jul 24, 2019 2:14 pm

todd wrote: ↑Tue Jul 23, 2019 8:56 pm A fun engine idea: Start out playing like full-strength Stockfish (in order to win some material) and then use near stalemate positions and self-smothering to force the opponent to checkmate it.

Yes, I think it's fairly easy to build a player which virtually always loses to the random player using such a scheme. Its loss rate would put it 2,000+ Elo points below the random player at some macroscopic time per move, just as full-strength SF is 3,000+ Elo points above the random player.
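To put numbers on gaps like these, here is a small sketch assuming the plain logistic Elo formula. Rating-list tools may use refinements such as draw models, so treat the output as a rough guide. The function names are just for illustration.

```python
import math

def expected_score(elo_diff):
    """Expected score for a player rated `elo_diff` points above the opponent
    (negative if rated below), under the plain logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_diff_from_score(score):
    """Inverse: rating gap implied by an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

for gap in (-2000, 0, 3000):
    print(gap, expected_score(gap))
```

Under this model a 2,000 point deficit means scoring roughly one point in 100,000 games, which is why such gaps can only be measured through a chain of intermediate players rather than head to head.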

I remember I played Andscacs Random, which has a setting for degree of randomness, in a pool of 90% random, 80% random, ... , 10% random, 0% random (full Andscacs at short TC), and it seems that, ranked this way, the random player would land in negative territory on the CCRL 40/4' rating list, some -300 CCRL Elo IIRC. But the rating will depend on the pool of players: for example, players which crash from time to time don't obey the Elo scheme of ratings, and will skew it for the random player too.
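A ladder like this can be sketched as follows. This is only a guess at the general idea, not how Andscacs Random actually implements its setting; `partly_random_move` and its parameters are hypothetical.

```python
import random

def partly_random_move(legal_moves, engine_move, randomness, rng=random):
    """With probability `randomness` play a uniformly random legal move,
    otherwise play the move the engine's search picked."""
    if not 0.0 <= randomness <= 1.0:
        raise ValueError("randomness must be in [0, 1]")
    if rng.random() < randomness:
        return rng.choice(legal_moves)
    return engine_move

# The pool described above: 100% random down to 0% random in 10% steps.
ladder = [k / 10 for k in range(10, -1, -1)]
print(ladder)
```

Adjacent rungs are close enough in strength to score measurably against each other, so the fully random end can be linked to the full-strength end through the chain.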

Robert Pope · Post by **Robert Pope** » Wed Jul 24, 2019 4:27 pm

Ovyron wrote: ↑Wed Jul 24, 2019 9:57 am But that doesn't matter, the algorithm just needs to use a strategy that gives it the worst performance against the pool of opponents, and we calibrate it to 0 elo. Ideally, the rating lists would include these algorithms in their pools (so far rating lists have been incredibly biased toward Alphabeta algorithms) and calibrate the bottom one to 0 elo.

Right now Stockfish has around 3450 elo on the CCRL, but "3450" is meaningless because it's calibrated arbitrarily. The question of what would be Stockfish's rating in a pool where the worst possible chess entity has 0 elo remains unanswered.

If we were to recalibrate ratings, I would calibrate based on a random mover at 0, not the worst mover.
1. A random mover is something that can be objectively specified and validated, and its strength is stable. Worst mover is not.
2. A worst mover is a moving target, same as best mover. So someone improves their worst mover engine, and Stockfish and everyone else gain 100 elo immediately. That seems like a terrible basis for a rating system.

The problem with using random as 0 elo is that the random engine doesn't improve with more time, which isn't a characteristic of normal chess engines. That means that the rating list scales for different time controls won't be comparable. The longer the time control, the more the scale will be stretched. Right now 2000 elo at 40/4 CCRL is in the same ballpark as 2000 elo at 40/40. The value 2000 has meaning even when talking about different time controls. That wouldn't hold anymore with random=0, and I think the problem gets even worse if worstmover=0.

I think a proper calibration system actually requires three components:
1. A random mover engine to set the baseline for 0 elo.
2. A baseline time control, e.g. 40/4.
3. A stable reference engine (and version)

You use the first two to develop a rating list for a given time control, with 0 elo pegged to the random mover. Then for all other time controls, you peg the reference engine's elo to the same value.
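The pegging described above amounts to choosing one additive offset per list. A rough sketch, with made-up ratings and hypothetical names (`peg_to_anchor`, `engine_a`) used purely for illustration:

```python
def peg_to_anchor(ratings, anchor, target):
    """Shift every rating by the same constant so that `anchor` lands on `target`.

    Elo differences inside the list are unchanged; only the offset moves.
    """
    shift = target - ratings[anchor]
    return {name: r + shift for name, r in ratings.items()}

# Made-up raw numbers, not taken from any real rating list.
baseline_raw = {"random": -310.0, "reference": 2450.0, "engine_a": 3100.0}
long_tc_raw = {"reference": 2500.0, "engine_a": 3180.0}

# Step 1: at the baseline time control, peg the random mover to 0.
baseline = peg_to_anchor(baseline_raw, "random", 0.0)
# Step 2: at any other time control, peg the reference engine to the
# value it received on the baseline list.
long_tc = peg_to_anchor(long_tc_raw, "reference", baseline["reference"])

print(baseline)
print(long_tc)
```

The random mover does not need to appear in the second list at all, which sidesteps the fact that it gains nothing from extra thinking time.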

Ovyron · Post by **Ovyron** » Wed Jul 24, 2019 6:32 pm

@Guenther: Please don't remove me from your ignore list, it was quite hard to get in there

Robert Pope wrote: ↑Wed Jul 24, 2019 4:27 pm That means that the rating list scales for different time controls won't be comparable.

That has an easy solution: allow engines of one time control to play against engines of another time control (say, 40/4 Vs. 40/40), and then the lists are directly comparable. Currently we have:

3431 elo Stockfish 9 64-bit 4CPU 40/40
3546 elo Stockfish 9 64-bit 4CPU 40/4

According to this, cutting thinking time to one tenth makes the engine play about 100 elo stronger??

Forget everything I've said; just do anything to make these ratings sensible. I don't really care where Random ends up...

carldaman · Post by **carldaman** » Wed Jul 24, 2019 10:41 pm

They are separate rating lists, meaning you can't compare those two ratings.

Unless you were joking, of course.
