which engine is stronger?

Poll: which engine is stronger?

Engine X1: 16 votes (64%)
Engine X2: 3 votes (12%)
No difference: 6 votes (24%)

Total votes: 25

Rebel
Posts: 7435
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

which engine is stronger?

Post by Rebel »

Imagine 2 matches between very strong engines, Engine X1 and X2 versus Engine Y.

Match conditions:
16 cores
Hash 4 GB
Time control 40 moves in 2 hours

Match-one, X1 vs Y, 1000 games, result 550 - 450, W=100 | D=900 | L=0
Match-two, X2 vs Y, 1000 games, result 550 - 450, W=200 | D=700 | L=100

Same result.

Nevertheless, which engine is stronger, X1 or X2?

Re-voting is enabled.
90% of coding is debugging, the other 10% is writing bugs.
towforce
Posts: 12709
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: which engine is stronger?

Post by towforce »

I voted X1. My reason: losing fewer games indicates fewer mistakes made, and, with the draw rate increasing at the top level of computer chess, it's looking as though mistakes are needed to lose games - so X1 would be closer to the highest possible standard of play in chess (which would be never making mistakes, and winning games in which an opponent made a mistake).

I would be interested to know what the actual elo calculation would be for the 2 different scenarios, if anyone knows.
Human chess is partly about tactics and strategy, but mostly about memory
jdart
Posts: 4418
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: which engine is stronger?

Post by jdart »

SPRT provides a way to quantify this, but it's better to use pentanomial SPRT, which assumes games are played in pairs, with each engine taking White and Black from the same position (https://www.chessprogramming.org/Match_Statistics#SPRT).
Lazy_Frank
Posts: 74
Joined: Mon Jul 23, 2018 10:56 pm
Location: Latvia
Full name: Raivis Baumanis

Re: which engine is stronger?

Post by Lazy_Frank »

towforce wrote: Sat Jul 30, 2022 9:35 am I would be interested to know what the actual elo calculation would be for the 2 different scenarios, if anyone knows.
+35, ±7
+35, ±12

Just different error bars.
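
For readers who want to check these figures, here is a minimal sketch. It assumes the plain logistic score-to-Elo conversion and a 95% normal-approximation interval on the per-game score; the function name `elo_with_error` is made up for this illustration, and the poster may have used a different tool.

```python
import math


def elo_with_error(wins, draws, losses, z=1.96):
    """Return (elo_difference, error_margin) for a W/D/L match result.

    Assumes the logistic Elo curve and a normal approximation for the
    mean score; z=1.96 corresponds to a 95% interval.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result, where a game yields 1, 0.5 or 0 points.
    variance = (wins * 1.0 + draws * 0.25) / n - score ** 2
    std_error = math.sqrt(variance / n)

    def to_elo(p):
        return 400 * math.log10(p / (1 - p))

    # Half the width of the interval, mapped onto the Elo scale.
    margin = (to_elo(score + z * std_error) - to_elo(score - z * std_error)) / 2
    return to_elo(score), margin


x1 = elo_with_error(100, 900, 0)    # match one: X1 vs Y
x2 = elo_with_error(200, 700, 100)  # match two: X2 vs Y
print(f"X1: {x1[0]:+.0f} Elo, +/-{x1[1]:.0f}")
print(f"X2: {x2[0]:+.0f} Elo, +/-{x2[1]:.0f}")
```

Under these assumptions both matches come out at about +35 Elo, and only the margin differs: the match with fewer decisive games has the smaller per-game variance.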
Scally
Posts: 232
Joined: Thu Sep 28, 2017 9:34 pm
Location: Bermondsey, London
Full name: Alan Cooper

Re: which engine is stronger?

Post by Scally »

Hi all,

I too am leaning towards X1 being the stronger, as it didn’t lose any games. However, in today’s age, where 3 points are given for a win and 1 for a draw, engine X2 would top a league table.

Al
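
The league-table arithmetic can be sketched as follows, assuming the football-style 3-1-0 scoring mentioned in the post; `league_points` is a name invented for this example.

```python
def league_points(wins, draws, losses, win_pts=3, draw_pts=1):
    """Total league points under 3-1-0 scoring (losses score nothing)."""
    return wins * win_pts + draws * draw_pts


x1_points = league_points(100, 900, 0)    # 100*3 + 900*1
x2_points = league_points(200, 700, 100)  # 200*3 + 700*1
print(x1_points, x2_points)
```

With the match results from the opening post, X1 collects 1200 points and X2 collects 1300, even though both scored 550 - 450 under normal chess scoring.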
towforce
Posts: 12709
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: which engine is stronger?

Post by towforce »

Lazy_Frank wrote: Sat Jul 30, 2022 10:00 am
towforce wrote: Sat Jul 30, 2022 9:35 am I would be interested to know what the actual elo calculation would be for the 2 different scenarios, if anyone knows.
+35, ±7
+35, ±12

Just different error bars.

Thank you - that was helpful.
hgm
Posts: 28428
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: which engine is stronger?

Post by hgm »

Well, whether an engine is stronger or not is not just decided on how it performs against a single opponent. But suppose Y is actually a representative pool of opponents.

The problem here is that the Elo model obviously fails. The rating difference deduced from the total score would predict a much larger loss rate for the stronger engine than is compatible with 0 out of 1000. You can ignore that, and derive an Elo from the 55% result as if nothing is amiss, but that number will have little predictive value, because any prediction made from Elo ratings assumes the underlying model is valid.

In a tournament X2 would have the better chance of finishing in first place.
Rebel
Posts: 7435
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: which engine is stronger?

Post by Rebel »

hgm wrote: Sat Jul 30, 2022 4:53 pm Well, whether an engine is stronger or not is not just decided on how it performs against a single opponent. But suppose Y is actually a representative pool of opponents.

The problem here is that the Elo model obviously fails. The rating difference deduced from the total score would predict a much larger loss rate for the stronger engine than is compatible with 0 out of 1000. You can ignore that, and derive an Elo from the 55% result as if nothing is amiss, but that number will have little predictive value, because any prediction made from Elo ratings assumes the underlying model is valid.

In a tournament X2 would have the better chance of finishing in first place.
Thank you.

The reason I created this poll is the growing number of draws among the top engines: the longer the time control, the more draws. This is already reflected in the two most popular rating lists, CCRL and CEGT. An example is the CCRL 40/15 list, which indicates no progress since Stockfish 13, and that is hard to believe. CEGT, by their own admission, have similar problems.

A picture:

[Image]

We see several examples of Stockfish 15 not losing a single game and yet getting negative rating points.

And I start to wonder whether the Elo models used (CEGT = Ordo, CCRL = Bayeselo), which are basically based on 1% = 7 Elo, are still working correctly; in any case they are not in my X1 example. IMO the X1 version is a lot stronger than the X2 version. My gut feeling (not very scientific, I know, but based on my understanding of (computer) chess) says: the stronger an engine plays, the harder it becomes to beat it, and the percentage of lost games should become part of the Elo formula.
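
The "1% = 7 Elo" rule of thumb can be checked with a short sketch. It assumes the plain logistic score-to-Elo conversion; Ordo and Bayeselo may add their own refinements, which are not modelled here, and the function names are invented for this example.

```python
import math


def elo_from_score(p):
    """Elo difference implied by an expected score p (0 < p < 1)."""
    return 400 * math.log10(p / (1 - p))


def elo_per_percent(p):
    """Slope of elo_from_score at p, expressed per one percentage point."""
    return 400 / (math.log(10) * p * (1 - p)) / 100


near_even = elo_per_percent(0.50)
one_percent_edge = elo_from_score(0.51)
print(f"{near_even:.2f} Elo per 1% near a 50% score")
print(f"{one_percent_edge:.2f} Elo for a 51% score")
```

Note that this conversion takes only the score percentage as input, so a 100/900/0 result and a 200/700/100 result map to the same number; the loss percentage plays no part in it.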
towforce
Posts: 12709
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: which engine is stronger?

Post by towforce »

I'll just reiterate a couple of points I recently made on the ProDeo forum (Ed's computer chess and general discussion forum):

1. In the 1700s, Thomas Bayes (the statistician behind Bayes' theorem) pointed out that you get a better answer if you take ALL of the available information into account. Arpad Elo either wasn't aware, or took no account, of the fact that the higher the player rating, the higher the draw rate between players of similar rating (assuming the players are at least strong enough to be able to get a win).

Armed with this information, we can say that, given the following results...

Player A v Player B: 100 draws

Player C v Player D: 50 wins each

a. Players A and B are about the same strength, and players C and D are also about the same strength

b. Players A and B are likely to be stronger than players C and D


2. Elo ratings might be approaching their upper limit

If chess is a draw (I think it's very likely that it is: computers have done massive amounts of searching (and book building) from the opening position, but no way has been found to either force the win of material or to get a decisive positional advantage), then avoiding a mistake will guarantee that you won't lose. For me, the increasing draw rate indicates that mistakes are becoming rare. When mistakes become "extremely rare", we will probably be close to the upper limit of the Elo rating system.

For me, this is likely to be a part of the problem that Ed is raising here: as we approach this upper limit, no amount of improvement in either software or hardware will confer large gains in Elo ratings.


3. Another possibility is that SF already does many things so well that giving it extra hardware will not bring about a significant improvement in how it does those things (hat tip to Chris W for that one, who wrote a detailed post in ProDeo on this idea).
hgm
Posts: 28428
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: which engine is stronger?

Post by hgm »

Well, the 100/900/0 result is compatible with the hypothesis that X1 plays perfect chess. The 200/700/100 result, on the other hand, shows that in at least 10% of the games X2 plays a losing move. My first thought was also that this made X1 stronger. But then I started to doubt that, because X2 seems to be an awful lot better either at exploiting unforced opponent errors, or at putting pressure on the opponents that makes them play losing moves.

I think it all boils down to a deeper question of a more philosophical nature: what do we actually mean by 'stronger chess'? Perhaps X1 plays very 'bland' chess, taking no initiative, keeping the position approximately even, just waiting for the opponent to blunder, while X2 plays very wild chess, often making sacrifices, some unsound but very difficult to refute. Is it really a weakness to take a 50-50 gamble on a win or loss, rather than preferring a certain draw? In a tournament it would be better to take the gamble, because if you are lucky (or twice lucky), it can mean the championship, while going for certainty will make you end up in the middle of the ranking. So even if the player knew better, it would be smart to choose the X2 strategy over the X1 strategy.
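
The tournament argument can be illustrated with a rough sketch. Everything specific here is an assumption made for illustration: a hypothetical 9-round event, independent games with the per-game W/D/L rates taken from the two matches, and 6 points out of 9 as the score needed to finish first.

```python
def score_distribution(p_win, p_draw, p_loss, games):
    """Exact distribution of the total score, keyed by half-points."""
    dist = {0: 1.0}
    for _ in range(games):
        nxt = {}
        for half_points, prob in dist.items():
            # A win adds 2 half-points, a draw 1, a loss 0.
            for gain, q in ((2, p_win), (1, p_draw), (0, p_loss)):
                if q > 0:
                    key = half_points + gain
                    nxt[key] = nxt.get(key, 0.0) + prob * q
        dist = nxt
    return dist


def prob_at_least(dist, points):
    """Probability of scoring at least `points` (in whole or half points)."""
    target = round(points * 2)
    return sum(p for hp, p in dist.items() if hp >= target)


ROUNDS = 9        # hypothetical event length
NEEDED = 6.0      # hypothetical score needed to take first place

x1_dist = score_distribution(0.10, 0.90, 0.00, ROUNDS)
x2_dist = score_distribution(0.20, 0.70, 0.10, ROUNDS)
x1_chance = prob_at_least(x1_dist, NEEDED)
x2_chance = prob_at_least(x2_dist, NEEDED)
print(f"X1 reaches {NEEDED}/{ROUNDS} with probability {x1_chance:.3f}")
print(f"X2 reaches {NEEDED}/{ROUNDS} with probability {x2_chance:.3f}")
```

Both engines have the same expected score, but the wider spread of X2's results gives it the larger chance of reaching a score well above that average, which is the point about gambling in a tournament.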

To get good predictive modelling of these high-quality games you would probably need to describe the player by two numbers. E.g. representing 'strength' and 'drawishness'. Such a system might be very good at predicting results of matches, but wouldn't tell you anymore which is 'better'. That concept can exist between points on a line, but not for points in a plane.