Stockfish skill levels

chesskobra
Posts: 343
Joined: Thu Jul 21, 2022 12:30 am
Full name: Chesskobra

Stockfish skill levels

Post by chesskobra »

Has anybody seriously tested Stockfish skill levels? I have been running some tests against Crafty 25.6 Elo levels. For now my plan is to run games between 4 Elo settings (2000, 2100, 2200, 2300) in Crafty 25.6 (option elo=...) and 4 skill levels (6, 7, 8, 9) in Stockfish. I am using the Stockfish development version because the skill levels were recently re-calibrated: https://github.com/official-stockfish/S ... it/a08b8d4. From my preliminary tests, the Crafty Elo range 2000-2300 and SF skill levels 6-9 look comparable.

What concerns me is that the endgame play of the SF levels is like a box of chocolates. For example, in KP vs K with a pawn on the 7th rank, the engine doesn't queen for many moves; it just moves the king randomly, often away from the pawn, and finally queens only when the opposing king threatens to capture. Other examples: announcing M9 on one move and M15 one or two moves later, not taking the opposition, or not going to the corner in king+rook pawn against king. I notice many such positions where a 1600-rated player would play in a straightforward manner.

I know skill levels are a secondary goal for the developers (of SF and many other engines). But skill levels 7-9 are mapped to roughly the 2500-2700 range at the above link, which makes no sense to me. As I read it, the engine is that strong at these levels only relative to other engines, in terms of WDL score. These levels would crush most of us long before we reach the endgame. But I would like to find an engine that plays decent chess in the 2000-2200 range in all phases of the game. I suspect that Crafty or the older HCE Stockfish may do better in the endgame. I would appreciate suggestions.
AndrewGrant
Posts: 1945
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: Stockfish skill levels

Post by AndrewGrant »

This is not a very satisfying answer, but my impression is that skill levels in alpha-beta engines are not that profound. They might indeed map to the requested Elo ranges, but they certainly don't "play" like a player in that Elo range. This is because, while there are a few mechanisms for reducing the strength of the engine, the primary one is to just randomly blunder.

I've always thought a better job could be done, although there is not a ton of interest. Maia is the gold standard for this, but that is GPU land.
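
To make "randomly blunder" concrete, here is a minimal sketch of one way such a scheme could work: take the candidate moves with their scores, add a random bonus to each, and play whichever comes out on top. The function name, the level scale of 0 to 20, and the noise size are all assumptions made up for illustration; this is not code from Stockfish or any other engine.

```python
import random

def pick_weakened_move(candidates, level, rng):
    """Pick a move from (move, score_cp) pairs, sometimes a worse one.

    level runs from 0 (weakest) to 20 (no noise). The noise scale below
    is a made-up number for illustration, not taken from any engine.
    """
    max_bonus_cp = (20 - level) * 15  # hypothetical: up to 300 cp at level 0
    best_move, best_noisy = None, float("-inf")
    for move, score_cp in candidates:
        # Each move gets its own random bonus, so a worse move can overtake.
        noisy = score_cp + rng.uniform(0, max_bonus_cp)
        if noisy > best_noisy:
            best_move, best_noisy = move, noisy
    return best_move

# Hypothetical candidate list: two good moves and one poor one (centipawns).
candidates = [("e4", 40), ("d4", 35), ("h4", -150)]
rng = random.Random(0)
picks = [pick_weakened_move(candidates, 0, rng) for _ in range(1000)]
for move, _ in candidates:
    print(move, picks.count(move))
```

Nothing in a scheme like this looks at the kind of position on the board. If several moves all keep the win, their scores are close and the noise alone decides, which would be consistent with a king wandering around in KP vs K instead of queening.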
chesskobra
Posts: 343
Joined: Thu Jul 21, 2022 12:30 am
Full name: Chesskobra

Re: Stockfish skill levels

Post by chesskobra »

I will add Maia to my test (although I don't have a GPU). I recently acquired maia2200 from somewhere (I don't remember where).
royb
Posts: 555
Joined: Thu Mar 09, 2006 12:53 am

Re: Stockfish skill levels

Post by royb »

I cannot recall where I found this, but here is the info I have stored on my laptop that may prove helpful for you:

  #  PLAYER    :  RATING  ERROR  POINTS  PLAYED  (%)
  1  skill-19  :  3191.1   40.4   940.0    1707   55
  2  skill-18  :  3170.3   39.3  1343.0    2519   53
  3  skill-17  :  3141.3   37.8  2282.0    4422   52
  4  skill-16  :  3111.2   37.1  2773.0    5423   51
  5  skill-15  :  3069.5   37.2  2728.5    5386   51
  6  skill-14  :  3024.8   36.1  2702.0    5339   51
  7  skill-13  :  2972.9   35.4  2645.5    5263   50
  8  skill-12  :  2923.1   35.0  2653.5    5165   51
  9  skill-11  :  2855.5   33.6  2524.0    5081   50
 10  skill-10  :  2788.3   32.0  2724.5    5511   49
 12  skill-9   :  2702.8   30.5  2670.0    5018   53
 13  skill-8   :  2596.2   28.5  2669.5    4975   54
 15  skill-7   :  2499.5   28.5  1934.0    4178   46
 18  skill-6   :  2363.2   26.4  2510.5    4379   57
 20  skill-5   :  2203.7   25.3  2859.5    5422   53
 25  skill-4   :  1922.9   25.9  2690.0    5399   50
 28  skill-3   :  1742.3   27.8  1909.5    4439   43
 29  skill-2   :  1608.4   29.4  2064.5    4389   47
 31  skill-1   :  1467.6   31.3  2015.5    4244   47
 33  skill-0   :  1320.1   32.9   651.5    2083   31

How accurate these numbers are is of course very open to debate.
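
For reading the table: assuming these ratings follow the usual logistic Elo model (the source of the list is unknown, so that is an assumption), the gap between two rows converts to an expected score as in this small sketch. The function name is just for illustration; the ratings are copied from the rows for skill 6 to 9.

```python
def expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Ratings copied from the table above.
ratings = {"skill-9": 2702.8, "skill-8": 2596.2, "skill-7": 2499.5, "skill-6": 2363.2}

for name, rating in ratings.items():
    score = expected_score(ratings["skill-9"], rating)
    print(f"skill-9 vs {name}: expected score {score:.3f}")
```

With the error column at roughly 25 to 40 points per row, differences of a few hundredths in expected score between neighbouring levels should not be read too precisely.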
chesskobra
Posts: 343
Joined: Thu Jul 21, 2022 12:30 am
Full name: Chesskobra

Re: Stockfish skill levels

Post by chesskobra »

This is the table posted at the link I gave.

It is possible that the blundering algorithm is quite bad. For example, I would define multi-PV as follows: to find the second-best PV at the root, with say White to move, find the regular minimax PV, prune the last White move in that PV, and then find the minimax PV again. With this definition, the second-best PV may still start with the best move at the root. This could be one issue (but I am sure most engine developers have thought about this). It would be great if this could be done efficiently, but even if it is inefficient, it would be the best thing to do. Then we can decide whether to choose a move on the basis of the second-best PV.
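
A toy sketch of this definition, on a hand-made tree of nested dicts with numeric leaf scores from White's point of view. The tree, the scores, and the helper names are all invented for illustration, and it assumes every White node keeps at least one move after pruning.

```python
def minimax(node, white_to_move=True):
    """Return (value, pv) for a tree of nested dicts with numeric leaves."""
    if not isinstance(node, dict):
        return node, []
    best = None
    for move, child in node.items():
        value, line = minimax(child, not white_to_move)
        better = best is None or (value > best[0] if white_to_move else value < best[0])
        if better:
            best = (value, [move] + line)
    return best

def without_move(node, pv, ply, depth=0):
    """Copy the tree, dropping the PV move at the given ply."""
    if depth == ply:
        return {m: c for m, c in node.items() if m != pv[depth]}
    copy = dict(node)
    copy[pv[depth]] = without_move(node[pv[depth]], pv, ply, depth + 1)
    return copy

def second_best_pv(tree):
    """Prune the last White move of the best PV, then search again."""
    _, pv = minimax(tree)
    # White moves sit at even plies (0, 2, ...) when White is to move at the root.
    last_white_ply = len(pv) - 1 if len(pv) % 2 == 1 else len(pv) - 2
    return minimax(without_move(tree, pv, last_white_ply))

# Hypothetical three-ply tree: White move, Black reply, White move, score.
tree = {
    "e4": {"e5": {"Nf3": 30, "Bc4": 25}, "c5": {"Nf3": 35, "d4": 20}},
    "d4": {"d5": {"c4": 20, "Nf3": 15}, "Nf6": {"c4": 22, "Nf3": 10}},
}

print(minimax(tree))         # best PV
print(second_best_pv(tree))  # second-best PV under the definition above
```

In this tree the second-best PV still begins with e4 and only deviates on White's second move, whereas a second line chosen by root move would have to start with d4.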

Another possible algorithm (I am not sure whether any engines use it) is to perturb the evaluation of the end positions slightly before doing minimax. The flaw is that a small perturbation of the end positions does not mean a small perturbation at the root. There would be discontinuities.
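
One way to see where the discontinuity shows up, in a toy tree: the minimax value at the root can move by at most the size of the perturbation, but the move chosen at the root can flip outright. The tree, the scores, and the noise size below are invented for illustration.

```python
import random

def minimax(node, white_to_move=True):
    """Return (value, pv) for a tree of nested dicts with numeric leaves."""
    if not isinstance(node, dict):
        return node, []
    best = None
    for move, child in node.items():
        value, line = minimax(child, not white_to_move)
        better = best is None or (value > best[0] if white_to_move else value < best[0])
        if better:
            best = (value, [move] + line)
    return best

def perturb(node, eps, rng):
    """Copy the tree, adding uniform noise in [-eps, eps] to every leaf."""
    if not isinstance(node, dict):
        return node + rng.uniform(-eps, eps)
    return {move: perturb(child, eps, rng) for move, child in node.items()}

# Hypothetical two-ply tree: move "a" is slightly better than "b" after best replies.
tree = {"a": {"x": 10, "y": 12}, "b": {"x": 9.5, "y": 30}}
eps = 1.0

results = [minimax(perturb(tree, eps, random.Random(seed))) for seed in range(200)]
root_moves = {pv[0] for _, pv in results}
max_shift = max(abs(value - minimax(tree)[0]) for value, _ in results)

print("root moves seen:", sorted(root_moves))
print("largest change in root value:", max_shift)
```

So the root score stays close to the unperturbed one, while the choice between "a" and "b" jumps from run to run; which of the two effects matters depends on what the perturbation is meant to achieve.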
Fritz 0
Posts: 149
Joined: Fri Mar 11, 2022 12:10 pm
Full name: Branislav Đošić

Re: Stockfish skill levels

Post by Fritz 0 »

chesskobra wrote: Mon Jun 17, 2024 12:34 am Has anybody seriously tested Stockfish skill levels? [...] But I would like to find an engine that plays decent chess at 2000-2200 range in all phases of the game. [...]
The eternal question - how to make an engine make "intelligent" mistakes. And the answer is, in my opinion, as little randomness as possible, and reducing the engine's strength almost exclusively by limiting its search depth or node count. It seems to me, for example, that Komodo 14 or Dragon 2 skill levels 20-22 play reasonable chess in the 2000-2200 Elo range you are looking for.
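
For anyone who wants to try the depth or node-count route by hand, the standard UCI "go" command accepts "depth" and "nodes" limits. A minimal sketch of building those commands; the helper name is made up, and the numbers are placeholders, not values calibrated to any Elo.

```python
def go_command(depth=None, nodes=None):
    """Build a UCI 'go' command limited by search depth and/or node count."""
    parts = ["go"]
    if depth is not None:
        parts += ["depth", str(depth)]
    if nodes is not None:
        parts += ["nodes", str(nodes)]
    if len(parts) == 1:
        raise ValueError("give at least one of depth or nodes")
    return " ".join(parts)

# Placeholder limits; the right values for a target strength need testing.
print(go_command(depth=6))
print(go_command(nodes=20000))
```

Which limit corresponds to which rating depends on the engine and hardware, so it would have to be measured in the same kind of match testing discussed in this thread.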
chesskobra
Posts: 343
Joined: Thu Jul 21, 2022 12:30 am
Full name: Chesskobra

Re: Stockfish skill levels

Post by chesskobra »

Thanks. I do have Komodo 14 and I am planning to test it also.