stockfish's and dragon2.6.1's evaluation bugs

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

Uri Blass
Posts: 11147
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: stockfish's and dragon2.6.1's evaluation bugs

Post by Uri Blass »

Cornfed wrote: Sat Feb 12, 2022 5:13 pm Aren't you guys trying to use a tool (a chess engine) designed to best play a game of chess, based these days largely on NNUE, for what is really a non-chess-playing situation with these 'odds games'?

I mean, strictly speaking the evaluation may not be spot on... but 'who really cares', as the NNUE engine is not designed for those kinds of positions but to play the best possible game of normal chess. I'm just not sure it is fair to shout "evaluation bug" all the time. Then again, I suppose one could use anything to sit on and call it a 'chair'... but that kind of does a disservice to the good ol' traditional concept of a chair...

Just wondering, that's all...just woke up and my brain tends to the philosophical before it gets cluttered with normal day things.

*** LOL! As I was typing this, a similar comment just got posted. :)
It is not only about odds games.
It is basically about the fact that humans use engines for analysis and not only for engine-engine games.

You can claim that it is not a bug because it is what the programmer intended, but I think that is how only programmers who are not strong chess players think.
syzygy
Posts: 5829
Joined: Tue Feb 28, 2012 11:56 pm

Re: stockfish's and dragon2.6.1's evaluation bugs

Post by syzygy »

Cornfed wrote: Sat Feb 12, 2022 5:13 pm I'm just not sure it is fair to shout "evaluation bug" all the time. Then again, I suppose one could use anything to sit on and call it a 'chair'... but that kind of does a disservice to the good ol' traditional concept of a chair...

Just wondering, that's all...just woke up and my brain tends to the philosophical before it gets cluttered with normal day things.
It trains our networks to recognize quickly which threads to ignore.
Cornfed
Posts: 511
Joined: Sun Apr 26, 2020 11:40 pm
Full name: Brian D. Smith

Re: stockfish's and dragon2.6.1's evaluation bugs

Post by Cornfed »

Uri Blass wrote: Sat Feb 12, 2022 10:58 pm
Cornfed wrote: Sat Feb 12, 2022 5:13 pm Aren't you guys trying to use a tool (a chess engine) designed to best play a game of chess, based these days largely on NNUE, for what is really a non-chess-playing situation with these 'odds games'?

I mean, strictly speaking the evaluation may not be spot on... but 'who really cares', as the NNUE engine is not designed for those kinds of positions but to play the best possible game of normal chess. I'm just not sure it is fair to shout "evaluation bug" all the time. Then again, I suppose one could use anything to sit on and call it a 'chair'... but that kind of does a disservice to the good ol' traditional concept of a chair...

Just wondering, that's all...just woke up and my brain tends to the philosophical before it gets cluttered with normal day things.

*** LOL! As I was typing this, a similar comment just got posted. :)
It is not only about odds games.
It is basically about the fact that humans use engines for analysis and not only for engine-engine games.

You can claim that it is not a bug because it is what the programmer intended, but I think that is how only programmers who are not strong chess players think.
I believe the situations presented above are all 'odds games'... not at all true to life in normal chess, which is why I reference them.
Examples I've seen elsewhere that could stem from normal chess are truly rare indeed.

Team SF: just don't throw the baby out with the proverbial bathwater and all will be good.
Uri Blass
Posts: 11147
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: stockfish's and dragon2.6.1's evaluation bugs

Post by Uri Blass »

Cornfed wrote: Sun Feb 13, 2022 12:20 am
Uri Blass wrote: Sat Feb 12, 2022 10:58 pm
Cornfed wrote: Sat Feb 12, 2022 5:13 pm Aren't you guys trying to use a tool (a chess engine) designed to best play a game of chess, based these days largely on NNUE, for what is really a non-chess-playing situation with these 'odds games'?

I mean, strictly speaking the evaluation may not be spot on... but 'who really cares', as the NNUE engine is not designed for those kinds of positions but to play the best possible game of normal chess. I'm just not sure it is fair to shout "evaluation bug" all the time. Then again, I suppose one could use anything to sit on and call it a 'chair'... but that kind of does a disservice to the good ol' traditional concept of a chair...

Just wondering, that's all...just woke up and my brain tends to the philosophical before it gets cluttered with normal day things.

*** LOL! As I was typing this, a similar comment just got posted. :)
It is not only about odds games.
It is basically about the fact that humans use engines for analysis and not only for engine-engine games.

You can claim that it is not a bug because it is what the programmer intended, but I think that is how only programmers who are not strong chess players think.
I believe the situations presented above are all 'odds games'... not at all true to life in normal chess, which is why I reference them.
Examples I've seen elsewhere that could stem from normal chess are truly rare indeed.

Team SF: just don't throw the baby out with the proverbial bathwater and all will be good.
The point is that it is possible not to get weaker in normal chess and still improve in odds games.
Elo is not the only thing.

If a human player has a big material advantage and finally loses the game, then he wants the engine to show him what he did wrong, and what he did wrong is not only the losing mistake but also the fact that he had an opportunity to force a fast win and did not take it.

"Every move is winning, so it does not matter what you play" may be OK for the engine, but it is clearly a bad idea for the human.

You can say that a faster mate is also not always better for the human, because it is better to simplify with no risk,
but the average risk is smaller when you go for the faster mate, because there are fewer moves on which you can blunder.
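
To make that last point concrete, here is a toy calculation, not taken from any engine: assuming an independent per-move blunder probability p (the 2% figure and the move counts below are invented purely for illustration), the chance of converting the win drops as the number of remaining moves grows.

[code]
// Toy model of the "fewer moves, fewer chances to blunder" argument above.
// The per-move blunder probability and the move counts are assumptions
// chosen only for illustration.
#include <cmath>
#include <iostream>

// Probability of playing all remaining moves without a single blunder,
// assuming each move is blundered independently with probability pBlunder.
double conversion_chance(double pBlunder, int movesToWin) {
    return std::pow(1.0 - pBlunder, movesToWin);
}

int main() {
    const double p = 0.02;                          // assumed 2% blunder rate per move
    std::cout << conversion_chance(p, 20) << '\n';  // ~0.67 for a quick win
    std::cout << conversion_chance(p, 60) << '\n';  // ~0.30 for a slow conversion
}
[/code]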
dkappe
Posts: 1632
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: stockfish's and dragon2.6.1's evaluation bugs

Post by dkappe »

You can have a network that correctly evaluates arbitrary odds starting positions at very low ply, or you can have a network that is as strong as possible in standard chess at bullet and upwards. But you can’t have both.

Dragon has a very young net in reinforcement learning terms, so it’s possible that in a few hundred generations it might learn how to evaluate certain things more accurately. But I doubt that any of the NN engines (leela, sf, dragon, etc.) will be able to learn this type of stuff without losing strength.
Fat Titz by Stockfish, the engine with the bodaciously big net. Remember: size matters. If you want to learn more about this engine just google for "Fat Titz".
Uri Blass
Posts: 11147
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: stockfish's and dragon2.6.1's evaluation bugs

Post by Uri Blass »

dkappe wrote: Sun Feb 13, 2022 7:55 am You can have a network that correctly evaluates arbitrary odds starting positions at very low ply, or you can have a network that is as strong as possible in standard chess at bullet and upwards. But you can’t have both.

Dragon has a very young net in reinforcement learning terms, so it’s possible that in a few hundred generations it might learn how to evaluate certain things more accurately. But I doubt that any of the NN engines (leela, sf, dragon, etc.) will be able to learn this type of stuff without losing strength.
The problem with stockfish is clearly not the network but the classical evaluation, because it uses the classical evaluation, and not the network, when one side has a huge advantage.
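
For readers who have not looked at that part of the engine, the dispatch being described is, in spirit, something like the sketch below; the threshold, struct and helper names are illustrative assumptions, not Stockfish's actual source.

[code]
// Minimal sketch of the hybrid dispatch described above: fall back to the
// classical evaluation when the material imbalance is huge, and use the
// network otherwise. Everything here (threshold, struct, stubs) is an
// illustrative assumption, not Stockfish's real code.
#include <cstdlib>
#include <iostream>

struct Position {
    int imbalance_cp;  // material/PSQT imbalance in centipawns
};

int classical_eval(const Position& pos) { return pos.imbalance_cp; }      // stub
int nnue_eval(const Position& pos)      { return pos.imbalance_cp / 2; }  // stub

int hybrid_evaluate(const Position& pos) {
    const int kImbalanceThreshold = 800;      // illustrative cutoff, in centipawns
    if (std::abs(pos.imbalance_cp) > kImbalanceThreshold)
        return classical_eval(pos);           // huge advantage: classical path
    return nnue_eval(pos);                    // otherwise: network path
}

int main() {
    Position oddsGame{1500};   // e.g. heavy material odds
    Position normalGame{30};
    std::cout << hybrid_evaluate(oddsGame) << ' ' << hybrid_evaluate(normalGame) << '\n';
}
[/code]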

I also do not believe that NN engines are unable to learn this type of thing without losing strength.
I believe that if you train the network with a scoring system that does not give 1 / 0.5 / 0 scores but instead gives
1 minus (the number of moves divided by 5000) in case of a win,
so that a mate in 100 moves is worth 0.98 points for the winner and a mate in 200 moves is worth 0.96 points, then the result is going to be a stronger chess engine, and I wonder if somebody has tried this or whether everybody uses a scoring of 1 for a win, 0.5 for a draw and 0 for a loss in their testing.
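
A minimal sketch of that proposed target, written only to pin down the arithmetic (the function and enum names are invented; this is not any engine's training code):

[code]
// Alternative game-result target proposed above:
// the winner gets 1 - moves/5000, a draw stays at 0.5, and the loser gets
// the complement. Purely illustrative, not real training code.
#include <iostream>

enum class Result { WhiteWin, Draw, BlackWin };

// Training target from White's point of view.
double training_target(Result result, int movesToEnd) {
    switch (result) {
        case Result::WhiteWin: return 1.0 - movesToEnd / 5000.0;  // mate in 100 -> 0.98
        case Result::Draw:     return 0.5;
        case Result::BlackWin: return movesToEnd / 5000.0;        // loser gets the complement
    }
    return 0.5;  // unreachable
}

int main() {
    std::cout << training_target(Result::WhiteWin, 100) << '\n';  // 0.98
    std::cout << training_target(Result::WhiteWin, 200) << '\n';  // 0.96
}
[/code]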
dkappe
Posts: 1632
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: stockfish's and dragon2.6.1's evaluation bugs

Post by dkappe »

Uri Blass wrote: Sun Feb 13, 2022 8:11 am I also do not believe that NN engines are unable to learn this type of thing without losing strength.
Believe it.
I believe that if you train the network with a scoring system that does not give 1 / 0.5 / 0 scores but instead gives
1 minus (the number of moves divided by 5000) in case of a win,
so that a mate in 100 moves is worth 0.98 points for the winner and a mate in 200 moves is worth 0.96 points, then the result is going to be a stronger chess engine, and I wonder if somebody has tried this or whether everybody uses a scoring of 1 for a win, 0.5 for a draw and 0 for a loss in their testing.
Neural networks are often trained using a sigmoid function to map some arbitrarily large range, like centipawns, to the range [0, 1]. https://en.wikipedia.org/wiki/Sigmoid_function

So the eval produced by an HCE or NNUE search, run through a sigmoid, already gives you a continuous function.
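
For concreteness, a common centipawn-to-expected-score mapping of that kind looks like the sketch below; the 400-centipawn scale constant is a conventional choice used here only for illustration, not a value taken from any particular engine.

[code]
// Logistic (sigmoid) mapping from a centipawn evaluation to an expected
// score in [0, 1]. The 400 cp scale is a common convention, assumed here
// purely for illustration.
#include <cmath>
#include <iostream>

double cp_to_expected_score(double cp) {
    return 1.0 / (1.0 + std::pow(10.0, -cp / 400.0));
}

int main() {
    std::cout << cp_to_expected_score(0.0)    << '\n';  // 0.5: balanced position
    std::cout << cp_to_expected_score(100.0)  << '\n';  // ~0.64: about a pawn up
    std::cout << cp_to_expected_score(1000.0) << '\n';  // ~0.997: clearly winning
}
[/code]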
Fat Titz by Stockfish, the engine with the bodaciously big net. Remember: size matters. If you want to learn more about this engine just google for "Fat Titz".
Sopel
Posts: 392
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: stockfish's and dragon2.6.1's evaluation bugs

Post by Sopel »

Still waiting for a proof that e4 is better
dangi12012 wrote:No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.

Maybe you copied your stockfish commits from someone else too?
I will look into that.
Vernon Crawford
Posts: 73
Joined: Wed Sep 01, 2021 2:05 am
Location: London, England
Full name: Vernon Crawford

Re: stockfish's and dragon2.6.1's evaluation bugs

Post by Vernon Crawford »

Sopel wrote: Sun Feb 13, 2022 8:56 pm Still waiting for a proof that e4 is better
I have a database of engine games that includes almost 15 million games since 2007
(Playchess, Infinity, CCRL, CEGT, SPCC, FastGM, etc.)

It has been very carefully maintained
no duplicates
no quick draws
no games less than 30 moves

has
1. e3 (38721 games) @ 49.0 win%
and
1. e4 (749226 games) @ 56.0 win%

that's good enough for me
Guenther
Posts: 4718
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: stockfish's and dragon2.6.1's evaluation bugs

Post by Guenther »

Vernon Crawford wrote: Sun Feb 13, 2022 9:53 pm
Sopel wrote: Sun Feb 13, 2022 8:56 pm Still waiting for a proof that e4 is better
I have a database of engine games that includes almost 15 million games since 2007
(Playchess, Infinity, CCRL, CEGT, SPCC, FastGM, etc.)

It has been very carefully maintained
no duplicates
no quick draws
no games less than 30 moves

has
1. e3 (38721 games) @ 49.0 win%
and
1. e4 (749226 games) @ 56.0 win%

that's good enough for me
[fen]1nb1kbn1/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQ - 0 1 [/fen]

So how many games with the start position above does your 15 million game database contain?

What about reading a thread first before replying into the blue? Sigh...
https://rwbc-chess.de

[Trolls don't exist...]