ShashChess

pohl4711 · Post by **pohl4711** » Thu Apr 25, 2024 7:05 am

Graham Banks wrote: ↑Thu Apr 25, 2024 1:39 am
Uri Blass wrote: ↑Thu Apr 25, 2024 1:29 am UHO is a different type of game and in theory it is possible that the better engine with UHO is not the better engine without UHO.
Quite correct.
If either CCRL or CEGT were to use UHO openings, it would skew and invalidate their previous testing.
That is why Stefan kept two lists - his main list and his UHO list.

First of all: not correct.
Second: I did not keep two lists. My old SPCC rating list, which used balanced HERT openings, has been abandoned since I started my UHO-Top15 Ratinglist (October 2023). The old SPCC results are still available on my website, but only to keep the EAS scores of the weaker engines visible (Velvet 4.1.0 or Revenge 1.0, for example).

For me, it makes absolutely no sense to use balanced openings in top-level computer chess anymore. So I don't do it. The same goes for TCEC and Chess.com, and for the developers of Stockfish, Torch, and all other top engines. No one here uses balanced openings anymore. Not for tournaments, not for engine development...
Balanced openings are dead. UHO is the future (and I am very proud that I invented and created UHO).
I am curious to see when (finally) all computer chess fans will understand this (IMHO) simple fact.

The only truth in the above posting is this:
"If either CCRL or CEGT were to use UHO openings, it would skew and invalidate their previous testing."
That is indeed a huge problem. I definitely agree that it would be total statistical nonsense to combine results from balanced openings with results from UHO openings. So CEGT/CCRL are in serious trouble here.
The only solution I see is (perhaps) to split the rating lists and test runs: keep using balanced openings for the test runs of the weaker engines, but run a separate UHO rating list for the top level (like my UHO-Top15 Ratinglist). In other words, something like a two-league double list. Just an idea.

Graham Banks · Post by **Graham Banks** » Thu Apr 25, 2024 7:31 am

pohl4711 wrote: ↑Thu Apr 25, 2024 7:05 am
Graham Banks wrote: ↑Thu Apr 25, 2024 1:39 am
Uri Blass wrote: ↑Thu Apr 25, 2024 1:29 am UHO is a different type of game and in theory it is possible that the better engine with UHO is not the better engine without UHO.
Quite correct.
If either CCRL or CEGT were to use UHO openings, it would skew and invalidate their previous testing.
That is why Stefan kept two lists - his main list and his UHO list.
First of all: not correct.
Second: I did not keep two lists. My old SPCC rating list, which used balanced HERT openings, has been abandoned since I started my UHO-Top15 Ratinglist (October 2023). The old SPCC results are still available on my website, but only to keep the EAS scores of the weaker engines visible (Velvet 4.1.0 or Revenge 1.0, for example).

For me, it makes absolutely no sense to use balanced openings in top-level computer chess anymore. So I don't do it. The same goes for TCEC and Chess.com, and for the developers of Stockfish, Torch, and all other top engines. No one here uses balanced openings anymore. Not for tournaments, not for engine development...
Balanced openings are dead. UHO is the future (and I am very proud that I invented and created UHO).
I am curious to see when (finally) all computer chess fans will understand this (IMHO) simple fact.

LOL - you're full of yourself, little man.

Matter of opinion. As a chess purist, I'd rather watch engines play fair openings. I'm sure that they could contribute much to existing opening theory.
Without wishing to offend you, UHO openings usually give one side an unfair advantage from the start. Good for entertainment purposes only, but useless for a meaningful rating list.
Regardless of our approaches, we all do what we enjoy.

The only truth in the above posting is this:
"If either CCRL or CEGT were to use UHO openings, it would skew and invalidate their previous testing."
That is indeed a problem. I definitely agree that it would be total statistical nonsense to combine results from balanced openings with results from UHO openings. So, in the future, CEGT/CCRL will be in serious trouble here.

There is not a lot of rating difference between top chess players, so it makes perfect sense that this would be even more pronounced with engines.
I'll tell you what though. Stockfish still has the ability to win more often than other top engines when using balanced openings.

RubiChess · Post by **RubiChess** » Thu Apr 25, 2024 7:45 am

Graham Banks wrote: ↑Thu Apr 25, 2024 7:31 am Without wishing to offend you, UHO openings usually give one side an unfair advantage from the start. Good for entertainment purposes only, but useless for a meaningful rating list.

Just my 2ct:
UHO openings are perfectly fair when game pairs are played. Call it "the art to hold the bad side while converting the good side". Playing game pairs is something that your ranking list misses sometimes. This "ChessGUI replays unfair openings" feature when the reverse game already was played and no replay happened... I'm creating a database of CCRL games that will allow some investigation regarding openings and their evals but it will take some time.

Regards, Andreas

pohl4711 · Post by **pohl4711** » Thu Apr 25, 2024 8:06 am

RubiChess wrote: ↑Thu Apr 25, 2024 7:45 am
Graham Banks wrote: ↑Thu Apr 25, 2024 7:31 am Without wishing to offend you, UHO openings usually give one side an unfair advantage from the start. Good for entertainment purposes only, but useless for a meaningful rating list.
Just my 2ct:
UHO openings are perfectly fair when game pairs are played. Call it "the art to hold the bad side while converting the good side". Playing game pairs is something that your ranking list misses sometimes.

That's why my UHO-Top15 Ratinglist has two versions; one is recalculated with my Gamepairs Rescoring Tool. That is the real future of rating lists, IMHO:

   # PLAYER                   :  RATING  ERROR  PLAYED     W     D     L   (%)  CFS(%)
   1 Stockfish 240413 avx2    :    3852     14    7500  6196  1161   143  90.4     100
   2 Stockfish 16.1 240224    :    3833   ----    7500  6111  1183   206  89.4     100
   3 Torch 2 popavx2          :    3708     13    7500  5296  1661   543  81.7     100
   4 Berserk 13 avx2          :    3546     13    7500  4062  2159  1279  68.6     100
   5 KomodoDragon 3.3 avx2    :    3522     13    7500  3815  2310  1375  66.3     100
   6 Obsidian 12.0 avx2       :    3408     13    7500  2708  2742  2050  54.4     100
   7 Caissa 1.18 avx2         :    3373     12    7500  2427  2711  2362  50.4     100
   8 RubiChess 240112 avx2    :    3325     13    7500  1999  2744  2757  44.9      98
   9 Ethereal 14.25 nnue      :    3316     13    7500  1941  2702  2857  43.9     100
  10 PlentyChess 1.0 avx2     :    3264     12    7500  1452  2796  3252  38.0     100
  11 Alexandria 6.1.0 avx2    :    3221     13    7500  1162  2645  3693  33.1      98
  12 Seer 2.8.0 avx2          :    3212     13    7500  1094  2627  3779  32.1     100
  13 CSTal 2.0 avx2           :    3180     13    7500   893  2499  4108  28.6      63
  14 Rebel 16.3 avx2          :    3178     13    7500   876  2508  4116  28.4     100
  15 Clover 6.1 avx2          :    3152     13    7500   767  2322  4411  25.7     100
  16 Koivisto 9.2 avx2        :    3138     13    7500   689  2254  4557  24.2     ---

------------------------------------------------------------------- 
--- Number of all Gamepairs          : 60000 
--- Number of drawn Gamepairs overall: 18512 (= 30.85%) 
--- Number of 1:1 drawn Gamepairs    :  8542 (= 14.24%) 
--- Number of 2-draws drawn Gamepairs:  9970 (= 16.62%) 
-------------------------------------------------------------------

These statistics are close to perfect... The number of 2-draws drawn gamepairs and the number of 1:1 drawn gamepairs are nearly the same, and nearly all CFS values are 100%. What more can anybody expect from a rating list? (Better: ranking list. Superhuman computer Elos are pure fiction; it's all about the ranking, not the rating. And for this: the wider the Elo spread, the better! And here we have about 700 C-Elo difference from Stockfish 16.1 to Koivisto 9.2. How cool is that?)
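
For readers who want the gamepair idea in concrete form, here is a minimal sketch, assuming that a gamepair (same opening, colors reversed) is rescored as a single win, draw, or loss depending on the combined score of its two games. It is not the actual Gamepairs Rescoring Tool; the function names and the sample results are hypothetical.

```python
from collections import Counter

# Per-game results from one engine's point of view:
# 1.0 = win, 0.5 = draw, 0.0 = loss.


def classify_pair(first: float, second: float) -> str:
    """Classify a gamepair (same opening, colors reversed) as one result."""
    total = first + second
    if total > 1.0:
        return "pair_win"
    if total < 1.0:
        return "pair_loss"
    # total == 1.0: either two draws, or one win and one loss (1:1)
    return "drawn_2draws" if first == 0.5 else "drawn_1to1"


def summarize(pairs):
    """Count pair outcomes and return the share of drawn pairs in percent."""
    counts = Counter(classify_pair(a, b) for a, b in pairs)
    drawn = counts["drawn_2draws"] + counts["drawn_1to1"]
    return counts, round(100.0 * drawn / len(pairs), 2)


# Hypothetical results of four gamepairs (not taken from the list above).
sample = [(1.0, 0.5), (0.5, 0.5), (1.0, 0.0), (0.0, 0.5)]
counts, drawn_pct = summarize(sample)
print(dict(counts), drawn_pct)
```

The summary figures in the list are consistent with this bookkeeping: 8542 + 9970 = 18512 drawn gamepairs, and 18512 / 60000 = 30.85%.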

Graham Banks · Post by **Graham Banks** » Thu Apr 25, 2024 8:22 am

pohl4711 wrote: ↑Thu Apr 25, 2024 8:06 am
RubiChess wrote: ↑Thu Apr 25, 2024 7:45 am
Graham Banks wrote: ↑Thu Apr 25, 2024 7:31 am Without wishing to offend you, UHO openings usually give one side an unfair advantage from the start. Good for entertainment purposes only, but useless for a meaningful rating list.
Just my 2ct:
UHO openings are perfectly fair when game pairs are played. Call it "the art to hold the bad side while converting the good side". Playing game pairs is something that your ranking list misses sometimes.
That's why my UHO-Top15 Ratinglist has two versions; one is recalculated with my Gamepairs Rescoring Tool. That is the real future of rating lists, IMHO:
   # PLAYER                   :  RATING  ERROR  PLAYED     W     D     L   (%)  CFS(%)
   1 Stockfish 240413 avx2    :    3852     14    7500  6196  1161   143  90.4     100
   2 Stockfish 16.1 240224    :    3833   ----    7500  6111  1183   206  89.4     100
   3 Torch 2 popavx2          :    3708     13    7500  5296  1661   543  81.7     100
   4 Berserk 13 avx2          :    3546     13    7500  4062  2159  1279  68.6     100
   5 KomodoDragon 3.3 avx2    :    3522     13    7500  3815  2310  1375  66.3     100
   6 Obsidian 12.0 avx2       :    3408     13    7500  2708  2742  2050  54.4     100
   7 Caissa 1.18 avx2         :    3373     12    7500  2427  2711  2362  50.4     100
   8 RubiChess 240112 avx2    :    3325     13    7500  1999  2744  2757  44.9      98
   9 Ethereal 14.25 nnue      :    3316     13    7500  1941  2702  2857  43.9     100
  10 PlentyChess 1.0 avx2     :    3264     12    7500  1452  2796  3252  38.0     100
  11 Alexandria 6.1.0 avx2    :    3221     13    7500  1162  2645  3693  33.1      98
  12 Seer 2.8.0 avx2          :    3212     13    7500  1094  2627  3779  32.1     100
  13 CSTal 2.0 avx2           :    3180     13    7500   893  2499  4108  28.6      63
  14 Rebel 16.3 avx2          :    3178     13    7500   876  2508  4116  28.4     100
  15 Clover 6.1 avx2          :    3152     13    7500   767  2322  4411  25.7     100
  16 Koivisto 9.2 avx2        :    3138     13    7500   689  2254  4557  24.2     ---

------------------------------------------------------------------- 
--- Number of all Gamepairs          : 60000 
--- Number of drawn Gamepairs overall: 18512 (= 30.85%) 
--- Number of 1:1 drawn Gamepairs    :  8542 (= 14.24%) 
--- Number of 2-draws drawn Gamepairs:  9970 (= 16.62%) 
------------------------------------------------------------------- 
These statistics are close to perfect... The number of 2-draws drawn gamepairs and the number of 1:1 drawn gamepairs are nearly the same, and nearly all CFS values are 100%. What more can anybody expect from a rating list? (Better: ranking list. Superhuman computer Elos are pure fiction; it's all about the ranking, not the rating. And for this: the wider the Elo spread, the better! And here we have about 700 C-Elo difference from Stockfish 16.1 to Koivisto 9.2. How cool is that?)

Many engine enthusiasts appreciate your approach and your efforts.
I guess I'm just trying to point out that many others don't like UHO.
There is room for different approaches, but there is no perfect way.

Rebel · Post by **Rebel** » Thu Apr 25, 2024 9:03 am

UHO openings favor the stronger engines, which have the better search and thus can handle UHO better. This is why CCRL and CEGT don't want to move, and that is perfectly understandable. OTOH, both lists meanwhile show, IMO, unrealistic Elo differences in the top 15, especially in the top 3-5. I saw Dragon 3.3 ranked above Stockfish some time ago; that can't be right.

CCRL and CEGT could distinguish themselves, on top of what they already do, by starting an SSDF-style rating list that allows opening books and learning. That way lower-rated engines could defend themselves better against the stronger ones. And it is exactly how humans have competed since the dawn of chess. Why should we do things differently?

pohl4711 · Post by **pohl4711** » Thu Apr 25, 2024 9:58 am

Graham Banks wrote: ↑Thu Apr 25, 2024 8:22 am
Many engine enthusiasts appreciate your approach and your efforts.
I guess I'm just trying to point out that many others don't like UHO.

The point is: it is not about what we like or dislike, but about what works (and will work in the future) and what does not work or will no longer work in the future.
If balanced openings worked now and in the future, I would never have made UHO. Why should I? I used balanced openings myself. But I understood some years ago that balanced openings are a dead-end road.

Because I saw this:
http://www.fastgm.de/time-control4.html

http://www.fastgm.de/K93-Doubling-TC1.png

I can't say it any more clearly than this experiment does... With balanced openings, the Elo spread shrinks further and further the stronger the engines become or the better the hardware gets. And the draw ratio skyrockets towards 100%. So balanced openings are a statistical dead end. It is that simple. What else could anybody say?
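
The link between a rising draw ratio and a shrinking Elo spread can be illustrated with a small calculation. This is only a sketch under stated assumptions: it uses the standard logistic Elo formula and assumes, hypothetically, that the stronger engine keeps winning 75% of the decisive games while the draw ratio rises. It does not reproduce the fastgm experiment linked above.

```python
import math


def elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def score_with_draws(draw_ratio: float, decisive_share: float) -> float:
    """Overall score when draws count 0.5 and the stronger engine wins
    `decisive_share` of the non-drawn games."""
    return 0.5 * draw_ratio + (1.0 - draw_ratio) * decisive_share


# Assumption for illustration only: the stronger engine always wins
# 75% of the decisive games, whatever the draw ratio is.
DECISIVE_SHARE = 0.75

diffs = []
for draw_ratio in (0.0, 0.5, 0.8, 0.95):
    s = score_with_draws(draw_ratio, DECISIVE_SHARE)
    diffs.append(elo_diff(s))
    print(f"draws {draw_ratio:4.0%}  score {s:.4f}  Elo diff {diffs[-1]:6.1f}")
```

Under these assumptions the implied Elo difference falls steadily as the draw ratio grows, even though the engine's superiority in decisive games never changes.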

Graham Banks · Post by **Graham Banks** » Thu Apr 25, 2024 10:02 am

pohl4711 wrote: ↑Thu Apr 25, 2024 9:58 am
Graham Banks wrote: ↑Thu Apr 25, 2024 8:22 am
Many engine enthusiasts appreciate your approach and your efforts.
I guess I'm just trying to point out that many others don't like UHO.
The point is: it is not about what we like or dislike, but about what works (and will work in the future) and what does not work or will no longer work in the future.
If balanced openings worked now and in the future, I would never have made UHO. Why should I? I used balanced openings myself. But I understood some years ago that balanced openings are a dead-end road.

Because I saw this:
http://www.fastgm.de/time-control4.html

http://www.fastgm.de/K93-Doubling-TC1.png

Many agree with regards to engine testing, but many don't.
It's good to have different approaches.

Speaking as a chess purist: you're unlikely to see Carlsen and other top GMs playing UHO lines.
For myself, as a player with a close to 2400 ICCF rating (with no computer help), I prefer balanced lines in which one needs to fight to gain an advantage.

Modern Times · Post by **Modern Times** » Thu Apr 25, 2024 10:12 am

Rebel wrote: ↑Thu Apr 25, 2024 9:03 am UHO openings favor the stronger engines, which have the better search and thus can handle UHO better. This is why CCRL and CEGT don't want to move, and that is perfectly understandable. OTOH, both lists meanwhile show, IMO, unrealistic Elo differences in the top 15, especially in the top 3-5. I saw Dragon 3.3 ranked above Stockfish some time ago; that can't be right.

CCRL and CEGT could distinguish themselves, on top of what they already do, by starting an SSDF-style rating list that allows opening books and learning. That way lower-rated engines could defend themselves better against the stronger ones. And it is exactly how humans have competed since the dawn of chess. Why should we do things differently?

The existing CCRL lists are a lost cause in my view and don't present meaningful information anymore. With different testers using different hardware, different GUIs, different GUI adjudication settings, different books, different tablebases, sometimes different hash sizes, and different opponent selection methodologies, there is a big error margin before the first game is even played. Add that to the statistical error margins and the high draw rate with stronger engines due to balanced books, and increasing core count and time control, and the lists end up being pretty meaningless and not worth the effort. Just my 2 cents worth. And on top of that, your choice of ratings tool and parameters will give you different answers as well.
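
The purely statistical part of that error margin can be estimated from win/draw/loss counts. The sketch below assumes a normal approximation, treats games as independent, and uses hypothetical counts; it says nothing about the systematic differences (hardware, GUIs, books, tablebases) listed above, which come on top.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Approximate confidence interval (low, mid, high) for the Elo
    difference, using the normal approximation on the per-game score."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Variance of a single game result around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    margin = z * math.sqrt(var / n)
    return (elo_from_score(score - margin),
            elo_from_score(score),
            elo_from_score(score + margin))


# Hypothetical match result: 300 wins, 500 draws, 200 losses.
low, mid, high = elo_interval(300, 500, 200)
print(f"Elo diff {mid:.1f} (about {low:.1f} to {high:.1f})")
```

The interval is only valid while the score minus or plus the margin stays strictly between 0 and 1; very lopsided results would need a different treatment.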

Graham Banks · Post by **Graham Banks** » Thu Apr 25, 2024 10:15 am

Modern Times wrote: ↑Thu Apr 25, 2024 10:12 am
Rebel wrote: ↑Thu Apr 25, 2024 9:03 am UHO openings favor the stronger engines, which have the better search and thus can handle UHO better. This is why CCRL and CEGT don't want to move, and that is perfectly understandable. OTOH, both lists meanwhile show, IMO, unrealistic Elo differences in the top 15, especially in the top 3-5. I saw Dragon 3.3 ranked above Stockfish some time ago; that can't be right.

CCRL and CEGT could distinguish themselves, on top of what they already do, by starting an SSDF-style rating list that allows opening books and learning. That way lower-rated engines could defend themselves better against the stronger ones. And it is exactly how humans have competed since the dawn of chess. Why should we do things differently?
The existing CCRL lists are a lost cause in my view and don't present meaningful information anymore. With different testers using different hardware, different GUIs, different GUI adjudication settings, different books, different tablebases, sometimes different hash sizes, and different opponent selection methodologies, there is a big error margin before the first game is even played. Add that to the statistical error margins and the high draw rate with stronger engines due to balanced books, and increasing core count and time control, and the lists end up being pretty meaningless and not worth the effort. Just my 2 cents worth.

Everybody is entitled to an opinion.

CCRL is locked into set guidelines for testing, and the current testers are happy with that, which is the key.
For a different approach, you'd need to chuck everything out and start afresh.

It's only the elite engines which are very close in rating to each other and that is to be expected.
However, CCRL is more than just being about the elite engines.
