stockfish's and dragon2.6.1's evaluation bugs

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

Uri Blass
Posts: 11147
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: stockfish's and dragon2.6.1's evaluation bugs

Post by Uri Blass »

Cornfed wrote: Sat Feb 12, 2022 5:13 pm Aren't you guys trying to use a tool (a chess engine) designed to best play a game of chess, based these days largely on NNUE, for what is really a non-chess-playing situation with these 'odds games'?

I mean, strictly speaking the evaluation may not be spot on... but 'who really cares', as the NNUE engine is not designed for those kinds of positions but to play the best possible game of normal chess. I'm just not sure it is fair to shout "evaluation bug" all the time. Then again, I suppose one could use anything to sit on and call it a 'chair'... but that kind of does a disservice to the good ol' traditional concept of a chair...

Just wondering, that's all...just woke up and my brain tends to the philosophical before it gets cluttered with normal day things.

*** LOL! As I was typing this, a similar comment just got posted. :)
It is not only about odds games.
It is basically about the fact that humans use engines for analysis and not only for engine-engine games.

You can claim that it is not a bug because it is what the programmer intended, but I think that is how only programmers who are not strong chess players think.
syzygy
Posts: 5829
Joined: Tue Feb 28, 2012 11:56 pm

Re: stockfish's and dragon2.6.1's evaluation bugs

Post by syzygy »

Cornfed wrote: Sat Feb 12, 2022 5:13 pm I'm just not sure it is fair to shout "evaluation bug" all the time. Then again, I suppose one could use anything to sit on and call it a 'chair'... but that kind of does a disservice to the good ol' traditional concept of a chair...

Just wondering, that's all...just woke up and my brain tends to the philosophical before it gets cluttered with normal day things.
It trains our networks to recognize quickly which threads to ignore.
Cornfed
Posts: 511
Joined: Sun Apr 26, 2020 11:40 pm
Full name: Brian D. Smith

Re: stockfish's and dragon2.6.1's evaluation bugs

Post by Cornfed »

Uri Blass wrote: Sat Feb 12, 2022 10:58 pm
Cornfed wrote: Sat Feb 12, 2022 5:13 pm Aren't you guys trying to use a tool (a chess engine) designed to best play a game of chess, based these days largely on NNUE, for what is really a non-chess-playing situation with these 'odds games'?

I mean, strictly speaking the evaluation may not be spot on... but 'who really cares', as the NNUE engine is not designed for those kinds of positions but to play the best possible game of normal chess. I'm just not sure it is fair to shout "evaluation bug" all the time. Then again, I suppose one could use anything to sit on and call it a 'chair'... but that kind of does a disservice to the good ol' traditional concept of a chair...

Just wondering, that's all...just woke up and my brain tends to the philosophical before it gets cluttered with normal day things.

*** LOL! As I was typing this, a similar comment just got posted. :)
It is not only about odds games.
It is basically about the fact that humans use engines for analysis and not only for engine-engine games.

You can claim that it is not a bug because it is what the programmer intended, but I think that is how only programmers who are not strong chess players think.
I believe the situations presented above are all 'odds games'... not at all true to life in normal chess, which is why I reference them.
Examples I've seen elsewhere that could stem from normal chess are truly rare indeed.

Team SF: just don't throw the baby out with the proverbial bathwater and all will be good.
Uri Blass
Posts: 11147
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: stockfish's and dragon2.6.1's evaluation bugs

Post by Uri Blass »

Cornfed wrote: Sun Feb 13, 2022 12:20 am
Uri Blass wrote: Sat Feb 12, 2022 10:58 pm
Cornfed wrote: Sat Feb 12, 2022 5:13 pm Aren't you guys trying to use a tool (a chess engine) designed to best play a game of chess, based these days largely on NNUE, for what is really a non-chess-playing situation with these 'odds games'?

I mean, strictly speaking the evaluation may not be spot on... but 'who really cares', as the NNUE engine is not designed for those kinds of positions but to play the best possible game of normal chess. I'm just not sure it is fair to shout "evaluation bug" all the time. Then again, I suppose one could use anything to sit on and call it a 'chair'... but that kind of does a disservice to the good ol' traditional concept of a chair...

Just wondering, that's all...just woke up and my brain tends to the philosophical before it gets cluttered with normal day things.

*** LOL! As I was typing this, a similar comment just got posted. :)
It is not only about odds games.
It is basically about the fact that humans use engines for analysis and not only for engine-engine games.

You can claim that it is not a bug because it is what the programmer intended, but I think that is how only programmers who are not strong chess players think.
I believe the situations presented above are all 'odds games'... not at all true to life in normal chess, which is why I reference them.
Examples I've seen elsewhere that could stem from normal chess are truly rare indeed.

Team SF: just don't throw the baby out with the proverbial bathwater and all will be good.
The point is that it is possible not to get weaker in normal chess and still improve in odds games.
Elo is not the only thing.

If a human player has a big material advantage and finally loses the game, then he wants the engine to show him what he did wrong, and what he did wrong is not only the losing mistake but also the fact that he had an opportunity to force a fast win and did not take it.

"Every move is winning, so it does not matter what you play" may be OK for the engine, but it is clearly a bad idea for the human.

You can say that a faster mate is also not always better for the human, because it is better to simplify with no risk,
but the average risk is smaller when you go for the faster mate, because there are fewer moves on which you can blunder.
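
To make that last point concrete, here is a toy calculation, not taken from any engine: assuming an independent per-move blunder probability p (the 2% figure and the move counts below are invented purely for illustration), the chance of converting the win drops as the number of remaining moves grows.

[code]
// Toy model of the "fewer moves, fewer chances to blunder" argument above.
// The per-move blunder probability and the move counts are assumptions
// chosen only for illustration.
#include <cmath>
#include <iostream>

// Probability of playing all remaining moves without a single blunder,
// assuming each move is blundered independently with probability pBlunder.
double conversion_chance(double pBlunder, int movesToWin) {
    return std::pow(1.0 - pBlunder, movesToWin);
}

int main() {
    const double p = 0.02;                          // assumed 2% blunder rate per move
    std::cout << conversion_chance(p, 20) << '\n';  // ~0.67 for a quick win
    std::cout << conversion_chance(p, 60) << '\n';  // ~0.30 for a slow conversion
}
[/code]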
dkappe
Posts: 1632
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: stockfish's and dragon2.6.1's evaluation bugs

Post by dkappe »

You can have a network that correctly evaluates arbitrary odds starting positions at very low ply, or you can have a network that is as strong as possible in standard chess at bullet and upwards. But you can’t have both.

Dragon has a very young net in reinforcement learning terms, so it’s possible that in a few hundred generations it might learn how to evaluate certain things more accurately. But I doubt that any of the NN engines (leela, sf, dragon, etc.) will be able to learn this type of stuff without losing strength.
Fat Titz by Stockfish, the engine with the bodaciously big net. Remember: size matters. If you want to learn more about this engine just google for "Fat Titz".
Uri Blass
Posts: 11147
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: stockfish's and dragon2.6.1's evaluation bugs

Post by Uri Blass »

dkappe wrote: Sun Feb 13, 2022 7:55 am You can have a network that correctly evaluates arbitrary odds starting positions at very low ply, or you can have a network that is as strong as possible in standard chess at bullet and upwards. But you can’t have both.

Dragon has a very young net in reinforcement learning terms, so it’s possible that in a few hundred generations it might learn how to evaluate certain things more accurately. But I doubt that any of the NN engines (leela, sf, dragon, etc.) will be able to learn this type of stuff without losing strength.
The problem with stockfish is clearly not the network but the classical evaluation, because it uses the classical evaluation, and not the network, when one side has a huge advantage.
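
For readers who have not looked at that part of the engine, the dispatch being described is, in spirit, something like the sketch below; the threshold, struct and helper names are illustrative assumptions, not Stockfish's actual source.

[code]
// Minimal sketch of the hybrid dispatch described above: fall back to the
// classical evaluation when the material imbalance is huge, and use the
// network otherwise. Everything here (threshold, struct, stubs) is an
// illustrative assumption, not Stockfish's real code.
#include <cstdlib>
#include <iostream>

struct Position {
    int imbalance_cp;  // material/PSQT imbalance in centipawns
};

int classical_eval(const Position& pos) { return pos.imbalance_cp; }      // stub
int nnue_eval(const Position& pos)      { return pos.imbalance_cp / 2; }  // stub

int hybrid_evaluate(const Position& pos) {
    const int kImbalanceThreshold = 800;      // illustrative cutoff, in centipawns
    if (std::abs(pos.imbalance_cp) > kImbalanceThreshold)
        return classical_eval(pos);           // huge advantage: classical path
    return nnue_eval(pos);                    // otherwise: network path
}

int main() {
    Position oddsGame{1500};   // e.g. heavy material odds
    Position normalGame{30};
    std::cout << hybrid_evaluate(oddsGame) << ' ' << hybrid_evaluate(normalGame) << '\n';
}
[/code]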

I also do not believe that NN engines are unable to learn this type of thing without losing strength.
I believe that if you train the network with a scoring system that does not give 1 / 0.5 / 0 scores but instead gives
1 minus (the number of moves divided by 5000) in case of a win,
so that a mate in 100 moves is worth 0.98 points for the winner and a mate in 200 moves is worth 0.96 points, then the result is going to be a stronger chess engine, and I wonder if somebody has tried this or whether everybody uses a scoring of 1 for a win, 0.5 for a draw and 0 for a loss in their testing.
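
A minimal sketch of that proposed target, written only to pin down the arithmetic (the function and enum names are invented; this is not any engine's training code):

[code]
// Alternative game-result target proposed above:
// the winner gets 1 - moves/5000, a draw stays at 0.5, and the loser gets
// the complement. Purely illustrative, not real training code.
#include <iostream>

enum class Result { WhiteWin, Draw, BlackWin };

// Training target from White's point of view.
double training_target(Result result, int movesToEnd) {
    switch (result) {
        case Result::WhiteWin: return 1.0 - movesToEnd / 5000.0;  // mate in 100 -> 0.98
        case Result::Draw:     return 0.5;
        case Result::BlackWin: return movesToEnd / 5000.0;        // loser gets the complement
    }
    return 0.5;  // unreachable
}

int main() {
    std::cout << training_target(Result::WhiteWin, 100) << '\n';  // 0.98
    std::cout << training_target(Result::WhiteWin, 200) << '\n';  // 0.96
}
[/code]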
dkappe
Posts: 1632
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: stockfish's and dragon2.6.1's evaluation bugs

Post by dkappe »

Uri Blass wrote: Sun Feb 13, 2022 8:11 am I also do not believe that NN engines are unable to learn this type of thing without losing strength.
Believe it.
I believe that if you train the network with a scoring system that does not give 1 / 0.5 / 0 scores but instead gives
1 minus (the number of moves divided by 5000) in case of a win,
so that a mate in 100 moves is worth 0.98 points for the winner and a mate in 200 moves is worth 0.96 points, then the result is going to be a stronger chess engine, and I wonder if somebody has tried this or whether everybody uses a scoring of 1 for a win, 0.5 for a draw and 0 for a loss in their testing.
Neural networks are often trained using a sigmoid function to map some arbitrarily large range, like centipawns, to the range [0, 1]. https://en.wikipedia.org/wiki/Sigmoid_function

So the eval produced by an HCE or NNUE search, run through a sigmoid, already gives you a continuous function.
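
For concreteness, a common centipawn-to-expected-score mapping of that kind looks like the sketch below; the 400-centipawn scale constant is a conventional choice used here only for illustration, not a value taken from any particular engine.

[code]
// Logistic (sigmoid) mapping from a centipawn evaluation to an expected
// score in [0, 1]. The 400 cp scale is a common convention, assumed here
// purely for illustration.
#include <cmath>
#include <iostream>

double cp_to_expected_score(double cp) {
    return 1.0 / (1.0 + std::pow(10.0, -cp / 400.0));
}

int main() {
    std::cout << cp_to_expected_score(0.0)    << '\n';  // 0.5: balanced position
    std::cout << cp_to_expected_score(100.0)  << '\n';  // ~0.64: about a pawn up
    std::cout << cp_to_expected_score(1000.0) << '\n';  // ~0.997: clearly winning
}
[/code]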
Fat Titz by Stockfish, the engine with the bodaciously big net. Remember: size matters. If you want to learn more about this engine just google for "Fat Titz".
Sopel
Posts: 392
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: stockfish's and dragon2.6.1's evaluation bugs

Post by Sopel »

Still waiting for a proof that e4 is better
dangi12012 wrote:No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.

Maybe you copied your stockfish commits from someone else too?
I will look into that.
Vernon Crawford
Posts: 73
Joined: Wed Sep 01, 2021 2:05 am
Location: London, England
Full name: Vernon Crawford

Re: stockfish's and dragon2.6.1's evaluation bugs

Post by Vernon Crawford »

Sopel wrote: Sun Feb 13, 2022 8:56 pm Still waiting for a proof that e4 is better
I have a database of engine games that includes almost 15 million games since 2007
(Playchess, Infinity, CCRL, CEGT, SPCC, FastGM, etc.)

It has been very carefully maintained
no duplicates
no quick draws
no games less than 30 moves

has
1. e3 (38721 games) @ 49.0 win%
and
1. e4 (749226 games) @ 56.0 win%

that's good enough for me
Guenther
Posts: 4718
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: stockfish's and dragon2.6.1's evaluation bugs

Post by Guenther »

Vernon Crawford wrote: Sun Feb 13, 2022 9:53 pm
Sopel wrote: Sun Feb 13, 2022 8:56 pm Still waiting for a proof that e4 is better
I have a database of engine games that includes almost 15 million games since 2007
(Playchess, Infinity, CCRL, CEGT, SPCC, FastGM, etc.)

It has been very carefully maintained
no duplicates
no quick draws
no games less than 30 moves

has
1. e3 (38721 games) @ 49.0 win%
and
1. e4 (749226 games) @ 56.0 win%

that's good enough for me
[fen]1nb1kbn1/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQ - 0 1 [/fen]

So how many games with the start position above does your 15 million game database contain?

What about reading a thread first before replying into the blue? Sigh...
https://rwbc-chess.de

[Trolls don't exist...]