Fritz GUI ignores uci_limitstrength and uci_elo on purpose?

Post by **mvanthoor** » Mon Jun 08, 2020 10:26 pm

Bill Forster wrote: ↑Sun Jun 07, 2020 1:45 am I remember doing some experiments a while ago (I am away from my office and can't check) and I also found Rybka didn't really implement the Elo limiting convincingly. I think there's something in the Tarrasch FAQ about it. I was surprised and happy to beat it so easily at 1200. To be fair, simulating amateur chess convincingly is not a trivial problem I suppose. I seem to recall Shredder is praised for doing a good job in this area?

I remain pleased with my suggested solution. It's not guaranteed to work with every engine, but there's a very good chance. And it's a lot easier than setting up for recompilation etc.

Yes, Shredder is praised for having a good implementation of UCI_LimitStrength and UCI_Elo, but I don't have the program. Even if it is very good, I can't use the options in the Fritz GUI, and the Shredder GUI is less reliable (but workable) with the DGT board than the Fritz GUI.

Your solution is a good one for commercial programs, but I don't see the need to hack an exe if I have and can compile the source code.

Today I tested with Arasan 22.0 by Jon Dart. I changed UCI_LimitStrength and UCI_Elo to eng_LimitStrength and eng_Elo in the source, recompiled, and tested in Fritz 17 and Fritz 11. It works. To me this proves that the Fritz GUI filters and hides the UCI_* options on purpose, because I can't believe that this is an oversight or a bug that has been in there for 13+ years.
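To illustrate what the rename test suggests, here is a minimal Python sketch of a GUI that hides options by name prefix. The prefix rule is only an assumption inferred from the experiment above, not documented Fritz behaviour, and the option lines and their default/min/max values are made up for illustration. Only the `option name ... type ...` line format comes from the UCI protocol.

```python
from typing import List, Optional


def option_name(line: str) -> Optional[str]:
    """Extract the option name from a UCI 'option name <name> type ...' line."""
    tokens = line.split()
    if len(tokens) < 4 or tokens[0] != "option" or tokens[1] != "name":
        return None
    if "type" not in tokens:
        return None
    type_idx = tokens.index("type")
    name = " ".join(tokens[2:type_idx])
    return name or None


def visible_options(lines: List[str], hidden_prefix: str = "UCI_") -> List[str]:
    """Return the option names a GUI would show if it hid one name prefix.

    Assumption: the GUI filters on the prefix, case-insensitively.
    """
    prefix = hidden_prefix.lower()
    names = [option_name(line) for line in lines]
    return [n for n in names if n and not n.lower().startswith(prefix)]


# Illustrative engine output; the numeric values are invented.
engine_output = [
    "option name Hash type spin default 64 min 4 max 4096",
    "option name UCI_LimitStrength type check default false",
    "option name UCI_Elo type spin default 2000 min 1000 max 3000",
    "option name eng_LimitStrength type check default false",
    "option name eng_Elo type spin default 2000 min 1000 max 3000",
]

print(visible_options(engine_output))
```

Under that assumption, the renamed eng_* options pass through while the UCI_* ones are dropped, which matches what the recompiled Arasan showed.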

At Elo 1000, Arasan is *really* easy to defeat, but it doesn't 'give' you pieces outright like some engines do (for example, by making a losing capture on purpose). Instead, it 'misses' tactics, such as not seeing that it can lose a rook to a skewer.

My test has been very short, just a few games, but my preliminary impression is that Arasan is an engine with a good LimitStrength implementation.

Post by **mvanthoor** » Tue Jun 09, 2020 12:23 am

Even though the implementation of LimitStrength is very well done in Arasan, it needs a slight adjustment. Arasan should "waste" more time, so the player has the feeling that the engine is actually thinking. Right now it doesn't use any time; it just moves instantly. An idle loop with no purpose other than to use up time would fix this. This could even be a UCI option called "slow play" or something like that. In Fritz 11, it's just called "Use Clock" in the Handicap and Fun menu, and then the engine takes the clock time into account, both for itself and for the player. In one of the other modes (I don't know off the top of my head which one), there is an option "Slow play" which can be used when the engine is playing at a weaker level.

If this could be added to Arasan, it would make a great engine to play against. (Stockfish does this already, if I recall correctly; it plays slowly as if 'thinking', even on lower skill levels.)
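A minimal Python sketch of the "slow play" idea, assuming a hypothetical `search` callable that returns the chosen move. It sleeps for the remaining time instead of spinning in an idle loop, which has the same visible effect without burning CPU. None of the names below come from Arasan or any other engine.

```python
import time
from typing import Callable


def play_with_min_think_time(
    search: Callable[[], str],
    min_seconds: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run the search, then wait until at least min_seconds have passed."""
    start = clock()
    move = search()  # a weakened search may return almost instantly
    remaining = min_seconds - (clock() - start)
    if remaining > 0:
        sleep(remaining)  # pad the move so the engine appears to think
    return move


if __name__ == "__main__":
    # Hypothetical instant search that always answers e2e4.
    print(play_with_min_think_time(lambda: "e2e4", min_seconds=0.5))
```

A real engine would also have to respect the clock, for example by capping `min_seconds` at some fraction of its remaining time.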

Post by **MikeB** » Mon Jun 15, 2020 5:30 am

That's true, because Stockfish does not reduce thinking time to reduce Elo. The Elo reduction is all done through randomization, and it was done quite well. My only criticism is that they anchored their testing to the CCRL ratings, which is fine at higher levels, but CCRL ratings are not paired to human ratings below 2000, except at the very weakest levels. For example, you will need to be about a 2000-rated player to hold steady against the 1700 Elo setting on Stockfish.

Read more about the Stockfish Elo testing methodology here:
https://github.com/official-stockfish/S ... /pull/2225
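For reference, the two standard UCI options discussed in this thread are set with `setoption` commands. The small Python sketch below only builds the command strings a GUI would send. The helper name is made up, and the valid Elo range differs per engine and version, so no range check is attempted.

```python
from typing import List


def limit_strength_commands(elo: int) -> List[str]:
    """Build the UCI commands that switch on Elo limiting (hypothetical helper)."""
    return [
        "setoption name UCI_LimitStrength value true",
        f"setoption name UCI_Elo value {elo}",
    ]


for command in limit_strength_commands(1700):
    print(command)
```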

Post by **mvanthoor** » Mon Jun 15, 2020 9:25 pm

That explains it. If my rating from over 20 years ago, when I was a teenager who never studied chess beyond general principles, still means anything, then it would be around 1850 Elo. I ran a Skill Level tournament for Stockfish a year ago. I tried to run it according to CCRL specs.

http://www.talkchess.com/forum3/viewtopic.php?t=71288

The result:

Rank Name                    Elo    +    - games score oppo. draws 
   1 Stockfish 10 x64 1T20  3495  102  102   400   99%  2804    2% 
   2 Stockfish 10 x64 1T19  2855   32   32   400   45%  2965   26% 
   3 Stockfish 10 x64 1T18  2825   32   32   400   41%  2972   27% 
   4 Stockfish 10 x64 1T17  2793   32   32   400   36%  2980   27% 
   5 Stockfish 10 x64 1T16  2745   23   23   800   50%  2788   24% 
   6 Stockfish 10 x64 1T15  2677   30   30   400   60%  2602   26% 
   7 Stockfish 10 x64 1T14  2652   30   30   400   56%  2608   26% 
   8 Stockfish 10 x64 1T13  2552   31   31   400   39%  2633   21% 
   9 Stockfish 10 x64 1T12  2458   24   24   800   50%  2444   14% 
  10 Stockfish 10 x64 1T11  2370   33   33   400   65%  2253   14% 
  11 Stockfish 10 x64 1T10  2269   32   32   400   49%  2279   12% 
  12 Stockfish 10 x64 1T09  2181   33   33   400   36%  2301   13% 
  13 Stockfish 10 x64 1T08  2105   25   25   800   49%  2108    8% 
  14 Stockfish 10 x64 1T07  2050   35   35   400   66%  1910    7% 
  15 Stockfish 10 x64 1T06  1950   34   34   400   51%  1935    5% 
  16 Stockfish 10 x64 1T05  1862   34   34   400   39%  1957    4% 
  17 Stockfish 10 x64 1T04  1722   29   29   800   52%  1683    3% 
  18 Stockfish 10 x64 1T03  1589   37   37   400   68%  1408    2% 
  19 Stockfish 10 x64 1T02  1427   36   36   400   48%  1449    2% 
  20 Stockfish 10 x64 1T01  1302   38   38   400   32%  1480    1% 
  21 Stockfish 10 x64 1T00  1181   42   42   400   19%  1510    0%

My score against level 00 up to and including level 04 is OK, but lower than expected. My score against level 05 is around 30%, which is a lot less than expected. This means that either my rating has dropped to far below 1850, or Stockfish is stronger than 1850. Stockfish is scoring 70% against me, so according to:

https://images.chesscomfiles.com/upload ... uqs48.jpeg

... the rating should be about 150 higher than mine.
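The "about 150" figure can be checked against the standard logistic Elo formula. This is a small Python sketch of that formula, not the linked table itself, so the values may differ from the table by a few points.

```python
import math


def elo_difference(score: float) -> float:
    """Rating gap implied by an expected score (0 < score < 1), logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


def expected_score(diff: float) -> float:
    """Expected score for the side that is rated `diff` points higher."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


print(round(elo_difference(0.70)))    # about 147 points for a 70% score
print(round(expected_score(150), 3))  # roughly 0.70 for a 150-point gap
```

A handful of games gives a very noisy score, so the 70% itself carries a wide margin of error.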

So...
Either I kept my 1850 rating and Stockfish is playing at around FIDE 2000 at level 05.
Or Stockfish is playing at FIDE 1862 as per the tournament, and my rating dropped to around 1700.

Both seem possible.

Either way, I don't really mind. I'm looking into several engines that play 'slowly' when put at a lower level, such as Stockfish and Texel. I don't really care if they call their setting "elo" or "skill" or "strength"; the only thing I care about is that, over time, I can play against higher skill levels, as I'm now actually studying chess.

I'm doing both the Dutch "Steps Method" and the Yusupov books, which should go up to 2000-2100 and +/- 2200-2300 respectively. I fully expect to be able to finish the Steps Method and reach about 2100 Elo in due time. Somewhere after that, it's probably time to crash and burn, i.e. get stuck in one of the higher Yusupov books, as I don't have the time (nor the motivation, probably) that further improvement would likely require.

Let's see where this ends.

Post by **MikeB** » Sun Jun 28, 2020 6:16 am

Stockfish ratings at all levels were anchored to CCRL. CCRL ratings do not correspond to FIDE ratings at lower levels (at the higher levels, say around 2800 or so, they correlate better).
As an example, here is a rough formula that I have come up with to convert a FIDE rating into the Stockfish Elo to play against, say if you're 1850 FIDE:

(FIDE Elo * 10 / 7) - 1200
(1850 * 10 / 7) - 1200 = 1442 (play SF at this level)

Using FIDE 2800:
(2800 * 10 / 7) - 1200 = 2800

I actually use this formula in Honey. It was derived from comments that Kai has made here. Kai is the authoritative Elo guru here, imho.
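The conversion above can be written as a small Python function. Rounding down to a whole number is an assumption chosen to reproduce the 1442 in the example; how Honey rounds is not stated here, and the function name is made up.

```python
def fide_to_stockfish_elo(fide_elo: int) -> int:
    """Rough FIDE -> Stockfish Elo conversion: FIDE * 10 / 7 - 1200.

    Integer division (rounding down) is an assumption that matches
    the worked example for FIDE 1850.
    """
    return fide_elo * 10 // 7 - 1200


print(fide_to_stockfish_elo(1850))  # 1442
print(fide_to_stockfish_elo(2800))  # 2800, the point where both scales agree
```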

As an FYI, once you get below 1850, the ratings get really dicey to control, since SF is very strong even at 50 nps compared to weak humans. SF Elo is pretty good for CCRL comparisons, and all of its Elo-limited play is done through randomization of moves. It was all done on their fishcooking platform, playing hundreds of thousands of games.
