which engine is stronger?

which engine is stronger?

Engine X1: 16 votes (64%)
Engine X2: 3 votes (12%)
No difference: 6 votes (24%)
 
Total votes: 25

towforce
Posts: 12699
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: which engine is stronger?

Post by towforce »

hgm wrote: Sun Jul 31, 2022 10:46 am Is it really a weakness to take a 50-50 gamble on a win or loss, rather than preferring a certain draw?

Code:

if (probability(success) == 0.5)
	"not weakness";

else if (in_Unsound_Moves_Get_Killed_World)
	"weakness";

Increasingly, we are living in "Unsound Moves Get Killed World" - and not just in chess!
Human chess is partly about tactics and strategy, but mostly about memory
Lazy_Frank
Posts: 74
Joined: Mon Jul 23, 2018 10:56 pm
Location: Latvia
Full name: Raivis Baumanis

Re: which engine is stronger?

Post by Lazy_Frank »

hgm wrote: Sun Jul 31, 2022 10:46 am ... The 200/700/100 on the other hand shows that in at least 10% of the games X2 plays a losing move...
The given information does not say which openings (if any) the engines play. They could be "objectively busted", with both engines simply converting all of them.
hgm
Posts: 28426
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: which engine is stronger?

Post by hgm »

True, I assumed balanced openings.
jkominek
Posts: 98
Joined: Tue Sep 04, 2018 5:33 am
Full name: John Kominek

Re: which engine is stronger?

Post by jkominek »

We see several examples of Stockfish 15 not losing a single game, and yet it gets negative rating points.
I would characterize it differently. BayesElo and Ordo optimize whole-tournament rating assignments. Looking at the image you cut & pasted from CCRL showing Stockfish 15 matches, in all but two encounters (Shash 22, Dragon 3), every engine is an Elo donor. Some donate more than expected, some less. As I interpret it, the performance column is taken relative to the rating difference expectation. There will always be some on the negative side, and some on the positive side of the balance.

I confess I cannot duplicate the CCRL Perf column by hand. Example: the rating difference to Berserk 9 is 86 Elo. From the standard equation this predicts an expected score of 62.13%. Yet, feeling in form, Stockfish mopped the floor with Berserk and scored 73.21%. That corresponds to a rating delta of 174.7, and 175 - 86 = 89. But they show the performance as only +53. So, I dunno.
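
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the hand calculation. It assumes the usual logistic Elo formula on a 400-point scale; the function names are my own, and it says nothing about how CCRL actually derives its Perf column.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score (0..1) for the side that is elo_diff points stronger.

    Assumes the standard logistic Elo curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_diff_from_score(score: float) -> float:
    """Inverse of expected_score: rating difference implied by a score."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    rating_gap = 86.0        # Stockfish 15 vs Berserk 9, as quoted above
    actual_score = 0.7321    # 73.21% scored in the match

    predicted = expected_score(rating_gap)
    implied_gap = elo_diff_from_score(actual_score)

    print(f"expected score at +{rating_gap:.0f} Elo: {predicted:.2%}")
    print(f"rating gap implied by {actual_score:.2%}: {implied_gap:.1f}")
    print(f"excess over the rating gap: {implied_gap - rating_gap:.1f}")
```

This gives an expected score of about 62.1% and an implied gap of about 175 Elo, so roughly 89 above the rating difference, not the +53 shown in the table.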