Developer tests of Stockfish need Stockfish 8 instead of Stockfish 7

sovaz1997 · Post by **sovaz1997** » Tue Jun 12, 2018 2:04 pm

corres wrote: ↑Tue Jun 12, 2018 1:40 pm
AndrewGrant wrote: ↑Tue Jun 12, 2018 1:35 am
No comment...
Edgy closers don't replace statistics.
If you are interested in statistics, please compute those statistics yourself.
The data are public.
To me it is obvious that losing 110 Elo over 44 games is not a statistical issue.

Apply mathematical statistics and leave feelings out of it. I think the topic can be closed. Why should we read this nonsense?

And yes: SF 9 is used in the regression tests, not SF 8 or SF 7, lol.

corres · Post by **corres** » Tue Jun 12, 2018 2:55 pm

sovaz1997 wrote: ↑Tue Jun 12, 2018 2:04 pm
And yes: SF 9 is used in the regression tests, not SF 8 or SF 7, lol.

If the Stockfish developers used only regression tests, that would be a serious mistake.
If you are so clever, please explain to us where the 110 Elo loss comes from.

sovaz1997 · Post by **sovaz1997** » Tue Jun 12, 2018 3:30 pm

corres wrote: ↑Tue Jun 12, 2018 2:55 pm
sovaz1997 wrote: ↑Tue Jun 12, 2018 2:04 pm
And yes: SF 9 is used in the regression tests, not SF 8 or SF 7, lol.

If the Stockfish developers used only regression tests, that would be a serious mistake.
If you are so clever, please explain to us where the 110 Elo loss comes from.

A small number of games means a large measurement error.
I advise you to stop speculating and start calculating.

I do not think you are smarter than the Stockfish developers. Write to them on Fishcooking.

corres · Post by **corres** » Tue Jun 12, 2018 4:19 pm

sovaz1997 wrote: ↑Tue Jun 12, 2018 3:30 pm
A small number of games means a large measurement error.
I advise you to stop speculating and start calculating.

I do not think you are smarter than the Stockfish developers. Write to them on Fishcooking.

If you can do anything other than provoke people, demonstrate your knowledge with constructive actions.
I made my statements before you did, so if you disagree with me, refute them with calculated data instead of accusations.
I am very curious to see your work.

sovaz1997 · Post by **sovaz1997** » Tue Jun 12, 2018 4:48 pm

corres wrote: ↑Tue Jun 12, 2018 4:19 pm
sovaz1997 wrote: ↑Tue Jun 12, 2018 3:30 pm
A small number of games means a large measurement error.
I advise you to stop speculating and start calculating.

I do not think you are smarter than the Stockfish developers. Write to them on Fishcooking.

If you can do anything other than provoke people, demonstrate your knowledge with constructive actions.
I made my statements before you did, so if you disagree with me, refute them with calculated data instead of accusations.
I am very curious to see your work.

OK. Look at the error column: it is more than 100 Elo in both directions (plus and minus):

Code:

ordo-win64.exe -a 3400 -A "Komodo 12" -W -s1000 games.pgn

Code:

   # PLAYER              : RATING  ERROR   POINTS  PLAYED    (%)
   1 Komodo 12           : 3400.0   ----     29.5      46   64.1%
   2 Stockfish 160518    : 3394.4  103.7     29.5      46   64.1%
   3 Houdini 6.03        : 3376.1  104.3     27.5      45   61.1%
   4 Fire 7              : 3272.1  103.1     21.0      45   46.7%
   5 Andscacs 0.93070    : 3250.9  103.4     19.5      46   42.4%
   6 Ginkgo 2.014        : 3219.4  101.8     17.0      45   37.8%
   7 Jonny 8.1           : 3191.8  105.9     15.0      45   33.3%
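
As a rough sanity check on the size of that error column, the sketch below turns a score out of a number of games into an Elo difference with an approximate 95% interval. It is not how Ordo computes its errors (Ordo derives them from simulations, here requested with `-s1000`), so the figures will not match exactly. The function names are made up for illustration, and the variance bound `p * (1 - p)` is an assumption that ignores draws and therefore overstates the error somewhat.

```python
import math

def elo_from_score(p):
    """Elo difference implied by a score fraction p (0 < p < 1), logistic model."""
    return 400.0 * math.log10(p / (1.0 - p))

def elo_interval(points, games, z=1.96):
    """Return (elo, low, high) for `points` scored out of `games`.

    Assumptions: games are independent, the normal approximation holds,
    and the per-game variance is bounded by p * (1 - p) (no draw data).
    """
    p = points / games
    se = math.sqrt(p * (1.0 - p) / games)   # standard error of the score fraction
    lo = max(p - z * se, 1e-6)              # clamp to keep the logarithm defined
    hi = min(p + z * se, 1.0 - 1e-6)
    return elo_from_score(p), elo_from_score(lo), elo_from_score(hi)

# 29.5 points out of 46 games, as in the first two rows of the table
elo, low, high = elo_interval(29.5, 46)
print(f"{elo:+.0f} Elo, 95% interval roughly {low:+.0f} to {high:+.0f}")
```

With 46 games the interval comes out on the order of 100 Elo on either side of the point estimate, which is the same order of magnitude as the ERROR column above.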

corres · Post by **corres** » Tue Jun 12, 2018 5:55 pm

sovaz1997 wrote: ↑Tue Jun 12, 2018 4:48 pm
OK. Look at the error column: it is more than 100 Elo in both directions (plus and minus):

Code:

ordo-win64.exe -a 3400 -A "Komodo 12" -W -s1000 games.pgn

Code:

   # PLAYER              : RATING  ERROR   POINTS  PLAYED    (%)
   1 Komodo 12           : 3400.0   ----     29.5      46   64.1%
   2 Stockfish 160518    : 3394.4  103.7     29.5      46   64.1%
   3 Houdini 6.03        : 3376.1  104.3     27.5      45   61.1%
   4 Fire 7              : 3272.1  103.1     21.0      45   46.7%
   5 Andscacs 0.93070    : 3250.9  103.4     19.5      46   42.4%
   6 Ginkgo 2.014        : 3219.4  101.8     17.0      45   37.8%
   7 Jonny 8.1           : 3191.8  105.9     15.0      45   33.3%

After 46 games Stockfish stands at -111 Elo.
That is more than the error in your calculation.
If you had run tests yourself, you would know from practice that by the end of the Division P games Stockfish's Elo loss will be similar.
Only the error will decrease.
So?

sovaz1997 · Post by **sovaz1997** » Tue Jun 12, 2018 6:04 pm

If we play a lot of games, we will see the real ratings. Let me remind you: Stockfish got such a high rating because it played brilliantly in the final. And I ask you: do not judge an engine's playing strength by TCEC.

The table shows that the ratings have an error of about 100 Elo points. This means you cannot conclude whether the engine has become weaker or stronger. What I can say for sure is that it became stronger over a large number of test games, where the error is at most 5 Elo points.
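
To put rough numbers on how the error shrinks with more games, here is a minimal sketch. It assumes two evenly matched engines, independent games, a normal approximation, and a linearization of the Elo curve around a 50% score. The helper names and the `draw_ratio` parameter are hypothetical, chosen only for this illustration.

```python
import math

# Slope of the logistic Elo curve at a 50% score, in Elo per unit of score fraction
ELO_PER_SCORE = 400.0 / (math.log(10) * 0.25)

def elo_margin(games, draw_ratio=0.0, z=1.96):
    """Approximate 95% error margin in Elo after `games` games.

    At a 50% expected score the per-game variance is (1 - draw_ratio) / 4.
    """
    se_score = math.sqrt((1.0 - draw_ratio) / 4.0 / games)
    return z * se_score * ELO_PER_SCORE

def games_needed(target_elo, draw_ratio=0.0, z=1.96):
    """Games required for the margin to drop to `target_elo` (margin scales as 1/sqrt(n))."""
    return math.ceil((elo_margin(1, draw_ratio, z) / target_elo) ** 2)

for n in (46, 1000, 20000):
    print(f"{n:>6} games: about +/- {elo_margin(n):.1f} Elo")
print("games for +/- 5 Elo:", games_needed(5.0))
```

Under these assumptions 46 games give a margin of about 100 Elo, while a margin of 5 Elo needs on the order of tens of thousands of games without draws, and fewer when the draw ratio is high.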

corres · Post by **corres** » Tue Jun 12, 2018 6:44 pm

sovaz1997 wrote: ↑Tue Jun 12, 2018 6:04 pm
If we play a lot of games, we will see the real ratings. Let me remind you: Stockfish got such a high rating because it played brilliantly in the final. And I ask you: do not judge an engine's playing strength by TCEC.
The table shows that the ratings have an error of about 100 Elo points. This means you cannot conclude whether the engine has become weaker or stronger. What I can say for sure is that it became stronger over a large number of test games, where the error is at most 5 Elo points.

Please read my reply to Mr. Blass.
I wrote about relative weakening.
To calculate absolute Elo differences we do indeed need a lot of games.
But it is not for you to determine my opinions.
Learn some patience and politeness instead.

sovaz1997 · Post by **sovaz1997** » Tue Jun 12, 2018 6:54 pm

corres wrote: ↑Tue Jun 12, 2018 6:44 pm
sovaz1997 wrote: ↑Tue Jun 12, 2018 6:04 pm
If we play a lot of games, we will see the real ratings. Let me remind you: Stockfish got such a high rating because it played brilliantly in the final. And I ask you: do not judge an engine's playing strength by TCEC.
The table shows that the ratings have an error of about 100 Elo points. This means you cannot conclude whether the engine has become weaker or stronger. What I can say for sure is that it became stronger over a large number of test games, where the error is at most 5 Elo points.

Please read my reply to Mr. Blass.
I wrote about relative weakening.
To calculate absolute Elo differences we do indeed need a lot of games.
But it is not for you to determine my opinions.
Learn some patience and politeness instead.

I do not know English well enough to judge how polite my writing is.

You simply should not rely on the TCEC ratings: they are not accurate, and they are always computed under different conditions. There are very few games. Last time SF was just a little luckier than it is now, and that is the whole difference.

corres · Post by **corres** » Tue Jun 12, 2018 7:19 pm

sovaz1997 wrote: ↑Tue Jun 12, 2018 6:54 pm
I do not know English well enough to judge how polite my writing is.

Patience and politeness depend not on knowledge of English but on one's attitude.
