Stockfish and Tactics

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

Peter Berger
Posts: 751
Joined: Thu Mar 09, 2006 2:56 pm

Re: Stockfish and Tactics

Post by Peter Berger »

Ciekce wrote: Sun Dec 03, 2023 1:27 pm computer chess does not work by *getting* an advantage, but by *not losing it* - advantages come from bad moves, not good ones
..

I can with 100% certainty guarantee that neither of these engines' authors care about their move average

..
Very interesting statements. The latter isn't true; take e.g. Chris Whittington and Ed Schroeder. They don't care about move average, but about something called „aggressiveness“, measured by some EAS tool. That feels just as arbitrary to me as move average; some other author might just as well try to lower his/her move average.
The first statement is more interesting to me. It is clear that you can't test a thing if every other game ends in a draw; you need decisive results to measure progress in a reasonable time.
You can do this in two ways:
1. You make the opening positions more unbalanced to get more decisive results.
2. You start with even opening positions but play against weaker opponents.
Number 1 is the popular choice right now. As you say, you do this yourself, so you will be well aware of the pros and cons.
Number 2 would be a serious alternative IMHO. I think that an engine tested this way should be clearly better at „getting an advantage“.
The most seriously tested engine (and the strongest one) is Stockfish. It sucks at beating weaker opponents. It would lose a lot of rating being tested against weaker opponents. But don’t get me wrong – it is clearly number 1 in test scenario 1.
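To put rough numbers on the draw problem: a back-of-envelope sketch in Python (my own simplified model, not any tester's actual methodology), where a fraction of games is decisive and the stronger engine's edge only shows up in those.

import math

def games_needed(decisive_edge: float, draw_rate: float, sigmas: float = 2.0) -> int:
    """Games needed to resolve the edge at `sigmas` standard errors.

    Simplified model: a fraction (1 - draw_rate) of games is decisive,
    and the stronger engine wins a decisive game with probability
    0.5 + decisive_edge.  The observable score difference is then
    (1 - draw_rate) * decisive_edge, while the per-game score variance
    is roughly (1 - draw_rate) / 4.
    """
    delta = (1.0 - draw_rate) * decisive_edge
    variance = (1.0 - draw_rate) / 4.0
    return math.ceil(sigmas**2 * variance / delta**2)

# The same underlying edge needs ten times more games at a 95% draw
# rate than at 50%: roughly 800 games grow to 8,000 here.
for d in (0.50, 0.80, 0.95):
    print(f"draw rate {d:.0%}: ~{games_needed(0.05, d):,} games")

In this toy model the games needed scale like 1/(1 - draw rate), which is the whole argument for unbalanced openings in a nutshell.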
The question is: what do humans want from a chess engine other than an objective analysis result?
I think it is „creating an advantage“. Correspondence players would love an engine that does well in this respect, even if it were 50 Elo points weaker in other aspects of the game.
So, Frank is Frank, and it is easy to get annoyed with him, but I feel that his dissatisfaction with some recent developments in top-level engines has a simple reason, which I have tried to make a little clearer.
Peter
Frank Quisinsky
Posts: 7053
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: Stockfish and Tactics

Post by Frank Quisinsky »

Hi Peter,

First of all, thank you!

The move average is important for the A00-E99 statistics from FEOBOS lines (FEOBOS is a long-term opening project in different stages). I need engines with a wide range of playing styles for the games running on a separate system here. It seems that out of the 41,614 positions (which 10 engines helped identify by the end of 2016), 2,650 positions are great for a balanced test set with a really small draw quote. The problem is that most of the 2,650 positions come from only 270 of the 500 ECO codes; the other 230 ECO codes produce balanced positions with many draws, no matter which engines play them.

I need the game results for different statistics. If I used engines whose drawn games average 30 moves more than "normal", several of my stats would not work. After a break of one year, and with all the strong engines now available (many are new to me), I am currently putting together a new test field of engines I can use.

As a final result I will use 20 engines from the next tournament I will start (called FCP-Tourney-2024-MP ... MP for move average). All the tactically stronger engines (for the starting phase of a game) are in it.

It's a pity that most of the "newer" engines don't have the playing style I need/want. For a strong test set with balanced positions, I need engines that play aggressively in the first phase of the game. Most of the newer engines very rarely produce a fast win and have a very high move average for draws. I lose a lot of time using such engines.

This is the main topic I have been working on since about 2015.
With neural networks, all my work from before 2020 and all the stats I made are more or less outdated.
This is very annoying.

Example: I can produce a test set with 500 balanced positions on which current versions of Stockfish and Dragon produce a draw quote of only 78%. But if I use exactly the same 500 positions for an Arasan vs. Minic match, the draw quote is 88%. So I need more good engine test setups and have to find out which engines can help me a bit. And I will not publish the test set until I am sure that others will not waste their time testing with it.
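For what it's worth, this per-pairing draw-quote bookkeeping is easy to script. A minimal sketch, assuming python-chess and a hypothetical PGN file of the test games:

from collections import defaultdict
import chess.pgn

totals = defaultdict(lambda: [0, 0])             # pairing -> [draws, games]
with open("testset_games.pgn") as f:             # hypothetical file name
    while (game := chess.pgn.read_game(f)) is not None:
        # Colour-blind key, so A vs. B and B vs. A count as one pairing.
        pairing = frozenset((game.headers["White"], game.headers["Black"]))
        totals[pairing][1] += 1
        if game.headers["Result"] == "1/2-1/2":
            totals[pairing][0] += 1

for pairing, (draws, games) in sorted(totals.items(), key=lambda kv: -kv[1][0] / kv[1][1]):
    print(f"{draws / games:5.1%}  {' vs. '.join(sorted(pairing))}  ({games} games)")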

Training an engine is not only about the neural-network file!
Since the beginning, chess programmers have trained engines with games.
They look for errors and try to fix them.
Testing a network file against UHO positions again and again shows only one important effect.

John Stanback has been writing to me for days ... he is thinking about changes to his training method.

The style produced by Uralochka is really great when I check the 2,650 positions I have collected from FEOBOS.
Against Wasp 4.50, the average eval at move 30 (over the 2,650 positions) is +1.28 ... this is a new record!

Corrected: not 1.48 ... 1.18!
Second place is 1.01, for example.
The 1.48 is after move 55 ... speaking of Wasp 4.50 without a neural network.
But the 1.48 does not come from Uralochka.

No need to be angry with me. I am the first to give younger people room for new ideas (in my professional jobs). The future belongs to the younger generation. And it seems that, after all these years in computer chess, I am very outdated today.

Yes, I hold the opinion you described!
And I know what programmers do to reduce the move average. I am in contact with one of them and we often discuss this and that. He has been programming for over 40 years, and listening to him is an event ... every time.
And yes, in recent years I have often been in contact with correspondence chess players. For many of them, the middlegame and a more aggressive playing style are very interesting. That is the problem with Uralochka ... in the first phase of the middlegame it is at the highest level ... in my opinion the engine is number 1 in the world there. In the second phase of the middlegame ... it is around place 15-20 in the world. But it is exactly the first phase of the middlegame that matters most.

To be 100% clear ... for all fans of openings ...
Uralochka 3.40a is the engine version of the year for me!
It is very hard to top Revenge, Slow, the great Velvet or Wasp here ... but Uralochka tops all of the attackers!

Best
Frank
Frank Quisinsky
Posts: 7053
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: Stockfish and Tactics

Post by Frank Quisinsky »

Ah, I got an email with questions about my message:
For all the other readers...

The idea is simple and old!

You have, for example, 2,000 good starting positions!
Now you want to create a test set with the best 500 of those 2,000 positions.

- Balanced lines, for example ... which is what I do
- Lines that are good for quick wins, for example ... which is what I do
- Lines that do not produce a high draw quote, for example ... which is what I do

You can do the following:
Let one engine play the lines you have against several other engines.
Now look at its eval at moves 20 / 25 / 30 / 35.

Create an Excel sheet and record the eval of each of the 2,000 lines at moves 20 / 25 / 30 / 35 ...

It is important to use the same test engine, and that engine's eval, for the Excel sheet.
I use Wasp 4.50 (very strong in king safety; it could also produce fast wins back in HCE times).
For many years I used Spark 1.0 for such things.

Now:
Wasp 4.50 vs. Uralochka 3.40a
Wasp 4.50 vs. Dragon 3.3
Wasp 4.50 vs. Revenge 3.0
and so on ...

Put all the results in Excel and look at the eval at moves 25 / 30 / 35 ...
This way you can quickly find the best lines for a test set, provided you use engines that are aggressive at the beginning of the games.
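Purely as an illustration of this selection step (not Frank's actual Excel tooling, which is not public): a minimal Python sketch, assuming python-chess, eval comments of the form "[%eval <centipawns>,<depth>]" as in the game posted later in this thread, and a hypothetical PGN file name.

import re
import chess.pgn

EVAL_RE = re.compile(r"\[%eval (-?\d+),\d+\]")   # e.g. "[%eval 21,41]" -> 21 cp at depth 41
CHECKPOINTS = {20, 25, 30, 35}

def eval_at_checkpoints(game: chess.pgn.Game) -> dict[int, int]:
    """Centipawn eval from the annotating engine at the checkpoint moves."""
    evals = {}
    ply = 0
    for node in game.mainline():
        ply += 1
        move_no = (ply + 1) // 2
        m = EVAL_RE.search(node.comment)
        if m and move_no in CHECKPOINTS:
            evals[move_no] = int(m.group(1))
    return evals

scores: dict[str, list[int]] = {}
with open("fcp_tourney.pgn") as f:               # hypothetical file name
    while (game := chess.pgn.read_game(f)) is not None:
        # Group games by opening line; the ECO code is a crude fallback key.
        line_id = game.headers.get("Opening", game.headers.get("ECO", "?"))
        cp = eval_at_checkpoints(game)
        if 30 in cp:
            scores.setdefault(line_id, []).append(abs(cp[30]))

# Sharpest lines first: highest average |eval| at move 30.
for line_id, cps in sorted(scores.items(), key=lambda kv: -sum(kv[1]) / len(kv[1]))[:20]:
    print(f"{sum(cps) / len(cps):6.0f} cp  {line_id}")

The idea is just to rank lines by how large an advantage the reference engine has built by a fixed move number, then keep the top of the list.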

No great magic!
Often the best lines don't give the desired results for another group of engines, like Minic, Seer, Caissa, Clover ... very strong engines, but not attackers. The secret is a good mix of positions for each of the available engines. But normally, after all the work I do ... the best results come with aggressive engines. At the end of the day, the draw quote is good with a test set created by aggressive engines.

I mixed up the final results a bit:
Minic, Seer, Caissa and Clover are very strong in king safety but don't often play the most aggressive move. For good reason: opening the position with aggressive pawn moves can be bad for the endgame. So the test set must have a good mix later on ... the main problem, I think. So I am looking a bit deeper into the test games, in particular at what Seer, Caissa and Clover do here. And some stats don't work if, in clearly drawn positions, engines like to play 50 moves for nothing. So I have to sort out the group of engines a bit.

Second question:
Why the MP tournament?
There are several reasons. I am very sure that I can produce many more than the 2,560 positions I have, with the idea of reducing the opening book to 6 moves. There are very strong engines in the field, and maybe they can give me new and interesting lines. It's quite clear to me that for building a test set the strongest available engines don't help in every case. Choosing engines has nothing to do with what I like or dislike. It is very important that the different engines produce different styles; not all of them must be strong in attacking chess after the opening. But for a test set with balanced positions you need all possible engine power (aggressive engines) precisely in the first phase of the game.

Best
Frank

I hope this explains a bit of what I do ...
At the end of the day ... my dream is to be able to build the best test set for each engine pairing from a basic database with two or three clicks, if I know the engines' strengths exactly. But that is pure horror in NN times. As for the final results ... it is more fun to watch the games when I know beforehand that the test positions are very strong.
Frank Quisinsky
Posts: 7053
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: Stockfish and Tactics

Post by Frank Quisinsky »

Peter Berger wrote about Rebel / CSTal ...
I don't have much contact with the programmers (better to have no private contacts).
But they do exactly what I like, as do various others.
To be honest, I was not a big fan of the first Rebel NN versions, nor of Chris Whittington and his old CSTal.
But with the current versions of the engines ... both are really maestros!
So I changed my mind.
They do a great job; the engines produce a great style and a fantastic move average, and they are 100% good for the things I like to do.

All the engines I need are in the starting field for FCP-Tourney-2024-MP.

It's the same with Klaus Wlotzka and FEOBOS.
Engines created FEOBOS ... we only gave them the basics to do it.

I did not create the test set ... the engines did it as a team.
I am just the slave of the engines and have to find the right mix for all of them.

An example:
With FCP-Tourney-2024, which is still running, I got maybe 5% new material for my work.
With the next tournament, I am sure I will get 30 or 40% new material.

Enough; writing about this in English is not easy for me, and it gets too complicated.
carldaman
Posts: 2287
Joined: Sat Jun 02, 2012 2:13 am

Re: Stockfish and Tactics

Post by carldaman »

Frank Quisinsky wrote: Mon Dec 04, 2023 4:51 pm Hi Peter,
..
To be 100% clear ... for all fans of openings ...
Uralochka 3.40a is the engine version of the year for me!
It is very hard to top Revenge, Slow, the great Velvet or Wasp here ... but Uralochka tops all of the attackers!

Best
Frank
I happen to agree - the newest Uralochka plays with a ruthless attacking style, creating winning chances with both White and Black! :shock:
It can terrorize weaker opponents. :)
Frank Quisinsky
Posts: 7053
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: Stockfish and Tactics

Post by Frank Quisinsky »

Hi Carldaman,

yes, and now ...

Search for this in the *-sto Shredder file ...
All match pairings from the TOP-41 are inside ...
That's the most important point: the still-running tournament helps me a bit with my test set!!

Caissa 1.14.1 NN - Dragon 3.3 NN (Komodo)=============
RubiChess 20230918 NN - CSTal 2.00 NN=============
Fire 9.2 NN - SlowChess Blitz 2.9 NN=============

All the games are draws ...
And many other pairings with results like these are inside.
That is most interesting for me!!!

Two ways ...
Use the best test-set positions from Uralochka's results ... meaning the highest evals after moves 20, 25, 30, 35 ... in games vs. Wasp.
Then play the matches again with the positions you found.

Uralochka seems to be the best helper for all the others.
With its positions, a match never ends with all games drawn ...

If I play Caissa vs. Dragon ... from all 2,560 positions I have, I can produce a match with 60-70% draws (if optimized for this pairing), but with the final result that other pairings again end with ... all games drawn. It seems, though, that if the best attackers create the test set, the results for all the pairings are great. Not 60-70% as with pairing-specific tuning ... but 75-85%.

And so a test set is born, with the help of the most aggressive engines we have.
And all this without Stockfish. Stockfish produces good results and is in the TOP-10, but it is not the absolute number 1 here. Not for the first phase of the middlegame.

Uralochka is an all-rounder across the A, B, C, D, E ECO codes; OK, in the B codes two other engines are stronger.

Best
Frank
Frank Quisinsky
Posts: 7053
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: Stockfish and Tactics

Post by Frank Quisinsky »

The only problem with this kind of work is ...

If you see that different engines produce the same stats again and again, you lose a bit of your interest in them.
But it's easy to filter this group of engines out of what I like to do.
Hard, but a fact!!

Against all the draws we can do something, and it seems the high draw quote is about 30% homemade!
What I want to say is ... UHO is one way ... the brute-force way ... but the desired setup, where "normal balanced books or test sets" produce fewer draws, is possible, even for engines over 3400 Elo (ratings from my still-running tourney).

The bad side of this approach is ...
We lose around 200 ECO codes for eng-eng matches at the highest level!

:-(

Best
Frank
Ciekce
Posts: 197
Joined: Sun Oct 30, 2022 5:26 pm
Full name: Conor Anstey

Re: Stockfish and Tactics

Post by Ciekce »

Peter Berger wrote: Mon Dec 04, 2023 1:53 pm The latter isn’t true, take e.g. Chris Whittington and Ed Schroeder.
I know both engines' authors - that statement is true, and was not made as a generalisation of engine devs as a whole.
Peter Berger wrote: Mon Dec 04, 2023 1:53 pm The first statement is more interesting to me. It is clear that you can't test a thing if every other game ends in a draw; you need decisive results to measure progress in a reasonable time.
..
1. You make the opening positions more unbalanced to get more decisive results.
2. You start with even opening positions but play against weaker opponents.
..
The most seriously tested engine (and the strongest one) is Stockfish. It sucks at beating weaker opponents. It would lose a lot of rating being tested against weaker opponents. But don’t get me wrong – it is clearly number 1 in test scenario 1.
as I said, it is not possible in chess to *create* an advantage where none exists

in order to win a game you have to be put in a position where an advantage is possible, which requires either properly biased openings (played with reverses) or, as you said, weaker opponents - and testing against weaker opponents has not been shown to work. I can't even think of a neat way, off the top of my head, to make it work at all.
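For concreteness, biased openings are normally played in pairs, once from each side, and scored per pair; the resulting pentanomial counts are what SPRT-style testers feed into their statistics. A minimal sketch of that bookkeeping, with made-up results:

from collections import Counter

# One entry per opening: the tested engine's score with White and with
# Black from the same start position, each in {0, 0.5, 1}.  Made-up data.
pairs = [(1.0, 0.5), (0.5, 0.5), (1.0, 0.0), (0.5, 0.0), (1.0, 1.0)]

pentanomial = Counter(white + black for white, black in pairs)  # pair scores 0..2
score = sum(white + black for white, black in pairs) / (2 * len(pairs))
print(dict(pentanomial), f"-> overall score {score:.1%}")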

Also, I'd like a source for the claim that SF sucks at beating weaker opponents, or that it would be lower rated if it were tested that way. Tested where? On rating lists? Stockfish has been in my most recent two releases' CCRL 40/15 gauntlets, and SP is 200+ Elo weaker.
Peter Berger
Posts: 751
Joined: Thu Mar 09, 2006 2:56 pm

Re: Stockfish and Tactics

Post by Peter Berger »

Ciekce wrote: Tue Dec 05, 2023 1:08 pm Also, I'd like a source for the claim that SF sucks at beating weaker opponents, or that it would be lower rated if it were tested that way. Tested where? On rating lists? Stockfish has been in my most recent two releases' CCRL 40/15 gauntlets, and SP is 200+ Elo weaker.
By coincidence I have the perfect answer for your question ready. As you can see from the version number, this is a very recent game - Stockfish managed to beat Crafty by 1.5-0.5 on my computers. (And I have several of this kind of draw by Stockfish on my computers.)

Btw, it took me months to get even a single draw with Crafty against Rebel 16.2.

You say it yourself: you have to be put in a position where an advantage is possible. I'd say the classical opening position is one of the more interesting ones to choose.

But you also pointed out a weak point in my post. It is not quite clear whether you could use tests against weaker opponents to improve your engine without hurting it in other ways - this I don't know.



[Event "Lang 120min+10sek"]
[Site "Berlin"]
[Date "2023.12.05"]
[Round "?"]
[White "Crafty 25.6"]
[Black "Stockfish dev-20231204-7a8bcfc"]
[Result "1/2-1/2"]
[ECO "C67"]
[PlyCount "102"]
[TimeControl "7200+10"]

{4096MB, LAPTOP-NCDN8BTK} 1. e4 {[%emt 0:00:00]} e5 {[%eval 21,41] [%emt 0:04:
35]} 2. Nf3 {[%emt 0:00:08]} Nc6 {[%eval 18,40] [%emt 0:01:35]} 3. Bb5 {
[%emt 0:00:06]} Nf6 {[%eval 15,42] [%emt 0:02:27]} 4. O-O {[%emt 0:00:07]} Nxe4
{[%eval 15,44] [%emt 0:02:26]} 5. d4 {[%emt 0:00:06] (Te1)} Nd6 {[%eval 19,43]
[%emt 0:01:53]} 6. Bxc6 {[%emt 0:00:06]} dxc6 {[%eval 14,40] [%emt 0:01:53]} 7.
dxe5 {[%emt 0:00:06]} Nf5 {[%eval 9,40] [%emt 0:02:29]} 8. Qxd8+ {[%emt 0:00:
06]} Kxd8 {[%eval 12,30] [%emt 0:00:00]} 9. h3 {[%emt 0:00:06] (Sc3)} Be7 {
[%eval 8,41] [%emt 0:02:44]} 10. Nc3 {[%emt 0:03:22]} Nh4 {[%eval 8,43] [%emt
0:08:11]} 11. Rd1+ {[%emt 0:08:35] (Sxh4)} Ke8 {[%eval 7,46] [%emt 0:02:05]}
12. Nxh4 {[%emt 0:00:06]} Bxh4 {[%eval 4,47] [%emt 0:05:04]} 13. g4 {[%emt 0:
00:07]} h5 {[%eval 3,46] [%emt 0:01:43]} 14. f3 {[%emt 0:02:46]} f6 {[%eval 2,
45] [%emt 0:00:01]} 15. exf6 {[%emt 0:03:11]} gxf6 {[%eval 2,42] [%emt 0:00:04]
} 16. Kg2 {[%emt 0:07:23] (Td3)} Kf7 {[%eval 0,54] [%emt 0:01:26]} 17. Bf4 {
[%emt 0:01:58] (Ld2)} f5 {[%eval 0,60] [%emt 0:01:58]} 18. Bxc7 {[%emt 0:00:41]
} hxg4 {[%eval 0,56] [%emt 0:01:25]} 19. hxg4 {[%emt 0:00:46]} fxg4 {[%eval 0,
55] [%emt 0:01:15]} 20. Rd4 {[%emt 0:00:55]} gxf3+ {[%eval 0,61] [%emt 0:01:24]
} 21. Kxf3 {[%emt 0:00:45]} Bf5 {[%eval 0,57] [%emt 0:01:17]} 22. Rf4 {[%emt 0:
02:49]} Ke6 {[%eval 0,59] [%emt 0:00:01]} 23. Ne4 {[%emt 0:03:10] (Th1)} Raf8 {
[%eval 0,54] [%emt 0:02:50]} 24. Nc5+ {[%emt 0:02:16]} Kd5 {[%eval 0,57] [%emt
0:00:01]} 25. Rd1+ {[%emt 0:02:27] (Sxb7)} Kxc5 {[%eval 0,68] [%emt 0:01:53]}
26. a4 {[%emt 0:00:35]} Be4+ {[%eval 0,64] [%emt 0:01:34]} 27. Kxe4 {[%emt 0:
01:56]} Rxf4+ {[%eval 0,59] [%emt 0:00:17]} 28. Bxf4 {[%emt 0:01:29]} Re8+ {
[%eval 0,60] [%emt 0:00:25]} 29. Kf5 {[%emt 0:13:53]} Bf2 {[%eval 0,70] [%emt
0:00:01]} 30. Rd7 {[%emt 0:04:15]} b5 {[%eval 0,70] [%emt 0:00:01]} 31. axb5 {
[%emt 0:01:23]} Kxb5 {[%eval 0,72] [%emt 0:00:35]} 32. b3 {[%emt 0:00:34]} Bb6
{[%eval 0,67] [%emt 0:01:26]} 33. Bd2 {[%emt 0:01:24]} a5 {[%eval 0,68] [%emt
0:00:34]} 34. c4+ {[%emt 0:01:12]} Ka6 {[%eval 0,72] [%emt 0:00:46]} 35. Rd6 {
[%emt 0:02:46]} Rf8+ {[%eval 0,68] [%emt 0:00:01]} 36. Ke4 {[%emt 0:01:05]
(Kg4)} Kb7 {[%eval 0,74] [%emt 0:02:03]} 37. Rd7+ {[%emt 0:03:14] (Th6)} Kc8 {
[%eval 0,79] [%emt 0:02:25]} 38. Rg7 {[%emt 0:01:50] (Td3)} Rf2 {[%eval 0,77]
[%emt 0:01:56]} 39. Bc3 {[%emt 0:00:07]} Re2+ {[%eval 0,76] [%emt 0:02:05]} 40.
Kd3 {[%emt 0:00:06]} Re3+ {[%eval 0,73] [%emt 0:02:53]} 41. Kd2 {[%emt 0:00:19]
} Rf3 {[%eval 0,75] [%emt 0:01:37]} 42. Rg2 {[%emt 0:00:12] (Kc2)} Rf1 {
[%eval 0,74] [%emt 0:01:53]} 43. Kc2 {[%emt 0:00:54]} Rf4 {[%eval 0,74] [%emt
0:00:49]} 44. Bd2 {[%emt 0:00:53] (Tg8+)} Rf3 {[%eval 0,69] [%emt 0:01:45]} 45.
Rh2 {[%emt 0:00:06] (Tg8+)} Kb7 {[%eval 0,71] [%emt 0:01:55]} 46. Be1 {[%emt 0:
01:29] (Lc3)} Ka6 {[%eval 0,81] [%emt 0:01:39]} 47. Rg2 {[%emt 0:00:57] (Lc3)}
a4 {[%eval 0,66] [%emt 0:01:38]} 48. bxa4 {[%emt 0:01:00]} Ba5 {[%eval 0,77]
[%emt 0:02:13]} 49. Rg1 {[%emt 0:00:08] (Ld2)} Bxe1 {[%eval 0,73] [%emt 0:01:
33]} 50. Rxe1 {[%emt 0:00:07]} c5 {[%eval 0,76] [%emt 0:01:52]} 51. Rb1 {
[%emt 0:00:46] (Td1)} Rh3 {[%eval 0,79] [%emt 0:03:42]} 1/2-1/2
Guenther
Posts: 4718
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Stockfish and Tactics

Post by Guenther »

Peter Berger wrote: Tue Dec 05, 2023 1:57 pm
Ciekce wrote: Tue Dec 05, 2023 1:08 pm Also, I'd like a source for the claim that SF sucks at beating weaker opponents, or that it would be lower rated if it were tested that way. Tested where? On rating lists? Stockfish has been in my most recent two releases' CCRL 40/15 gauntlets, and SP is 200+ Elo weaker.
By coincidence I have the perfect answer for your question ready. As you can see from the version number, this is a very recent game - Stockfish managed to beat Crafty by 1.5-0.5 on my computers. (And I have several of this kind of draw by Stockfish on my computers.)

Btw, it took me months to get even a single draw with Crafty against Rebel 16.2.
..
Two games on your computer? Are you serious?
And in those games Crafty used a learn file too (probably also fed with months of learning), as it only really started calculating at move 10 in one of the most boring, drawish openings available...
(while SF had already burned over 20 minutes) How many games, BTW, did you get in 'months' at that TC vs. 'Rebel'? ;)

I think there is not much to say anymore about your understanding of statistics and how to conduct proper comparisons/engine matches.
https://rwbc-chess.de

[Trolls don't exist...]