Developer tests of Stockfish need Stockfish 8 instead of Stockfish 7

sovaz1997
Posts: 261
Joined: Sun Nov 13, 2016 10:37 am

Re: Developer tests of Stockfish need Stockfish 8 instead of Stockfish 7

Post by sovaz1997 »

corres wrote: Tue Jun 12, 2018 1:40 pm
AndrewGrant wrote: Tue Jun 12, 2018 1:35 am
No comment...
Edgy closers don't replace statistics.
If you are interested in statistics, please compute those statistics yourself.
The data are public.
To me it is obvious that losing 110 Elo over 44 games is not a statistical issue.
Turn on mathematical statistics and turn off feelings. I think the topic can be closed. Why should we read this nonsense?

And yes: SF 9 is used in the regression tests, not SF 8/SF 7, lol
Zevra 2 is my chess engine. Binary, source and description here: https://github.com/sovaz1997/Zevra2
Zevra v2.5 is the latest version of Zevra: https://github.com/sovaz1997/Zevra2/releases
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Developer tests of Stockfish need Stockfish 8 instead of Stockfish 7

Post by corres »

sovaz1997 wrote: Tue Jun 12, 2018 2:04 pm
And yes: SF 9 is used in the regression tests, not SF 8/SF 7, lol
If the developers of Stockfish used only regression tests, that would be very wrong.
If you are so clever, please explain to us where the 110 Elo loss comes from.
sovaz1997
Posts: 261
Joined: Sun Nov 13, 2016 10:37 am

Re: Developer tests of Stockfish need Stockfish 8 instead of Stockfish 7

Post by sovaz1997 »

corres wrote: Tue Jun 12, 2018 2:55 pm
sovaz1997 wrote: Tue Jun 12, 2018 2:04 pm
And yes: SF 9 is used in the regression tests, not SF 8/SF 7, lol
If the developers of Stockfish used only regression tests, that would be very wrong.
If you are so clever, please explain to us where the 110 Elo loss comes from.
A small number of games. A large measurement error.
I advise you to stop thinking and start calculating.

I do not think you are smarter than the Stockfish developers. Write to them on Fishcooking.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Developer tests of Stockfish need Stockfish 8 instead of Stockfish 7

Post by corres »

sovaz1997 wrote: Tue Jun 12, 2018 3:30 pm
A small number of games. A large measurement error.
I advise you to stop thinking and start calculating.

I do not think you are smarter than the Stockfish developers. Write to them on Fishcooking.
If you know how to do anything other than provoke people, demonstrate your knowledge with constructive actions.
I made some statements before you did, so if you do not agree with me, refute my statements with calculated data instead of accusations.
I am very curious to see your work.
sovaz1997
Posts: 261
Joined: Sun Nov 13, 2016 10:37 am

Re: Developer tests of Stockfish need Stockfish 8 instead of Stockfish 7

Post by sovaz1997 »

corres wrote: Tue Jun 12, 2018 4:19 pm
sovaz1997 wrote: Tue Jun 12, 2018 3:30 pm
A small number of games. A large measurement error.
I advise you to stop thinking and start calculating.

I do not think you are smarter than the Stockfish developers. Write to them on Fishcooking.
If you know how to do anything other than provoke people, demonstrate your knowledge with constructive actions.
I made some statements before you did, so if you do not agree with me, refute my statements with calculated data instead of accusations.
I am very curious to see your work.
OK. Look at the error column. It is more than 100 in both the plus and minus directions:

ordo-win64.exe -a 3400 -A "Komodo 12" -W -s1000 games.pgn

   # PLAYER              : RATING  ERROR   POINTS  PLAYED    (%)
   1 Komodo 12           : 3400.0   ----     29.5      46   64.1%
   2 Stockfish 160518    : 3394.4  103.7     29.5      46   64.1%
   3 Houdini 6.03        : 3376.1  104.3     27.5      45   61.1%
   4 Fire 7              : 3272.1  103.1     21.0      45   46.7%
   5 Andscacs 0.93070    : 3250.9  103.4     19.5      46   42.4%
   6 Ginkgo 2.014        : 3219.4  101.8     17.0      45   37.8%
   7 Jonny 8.1           : 3191.8  105.9     15.0      45   33.3%
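
For a rough feel for why about 45 games cannot resolve differences of this size, here is a minimal Python sketch. It is not how Ordo arrives at its error column; it uses a plain normal approximation on the score fraction. The function names are made up for illustration, and `draw_rate` is a hypothetical parameter, because the table does not give the win/draw/loss split.

```python
import math


def elo_from_score(p):
    """Elo difference implied by a score fraction p, with 0 < p < 1."""
    return -400.0 * math.log10(1.0 / p - 1.0)


def elo_interval(points, games, draw_rate=0.0, z=1.96):
    """Approximate (low, mid, high) Elo interval for points scored in games.

    draw_rate is an assumed fraction of drawn games (hypothetical input);
    draw_rate=0 gives the widest interval for a given score.
    """
    p = points / games
    win_rate = p - draw_rate / 2.0
    loss_rate = 1.0 - win_rate - draw_rate
    if win_rate < 0.0 or loss_rate < 0.0:
        raise ValueError("draw_rate is inconsistent with the score")
    # Per-game variance for results 1, 0.5, 0: E[x^2] - (E[x])^2
    variance = win_rate + draw_rate / 4.0 - p * p
    std_err = math.sqrt(variance / games)
    # Clamp so the logistic formula stays defined at the extremes
    low_p = max(p - z * std_err, 1e-6)
    high_p = min(p + z * std_err, 1.0 - 1e-6)
    return elo_from_score(low_p), elo_from_score(p), elo_from_score(high_p)


if __name__ == "__main__":
    # Stockfish row from the table: 29.5 points out of 46 games
    low, mid, high = elo_interval(29.5, 46)
    print(f"{mid:+.0f} Elo vs. the field, 95% interval {low:+.0f} .. {high:+.0f}")
```

With no draws assumed, 29.5/46 comes out at roughly +100 Elo against the field with an interval of roughly 0 to +220, which is the same order of magnitude as the error column above. Assuming a realistic share of draws narrows the interval but still leaves it many tens of Elo wide.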
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Developer tests of Stockfish need Stockfish 8 instead of Stockfish 7

Post by corres »

sovaz1997 wrote: Tue Jun 12, 2018 4:48 pm
OK. Look at the error column. It is more than 100 in both the plus and minus directions:

ordo-win64.exe -a 3400 -A "Komodo 12" -W -s1000 games.pgn

   # PLAYER              : RATING  ERROR   POINTS  PLAYED    (%)
   1 Komodo 12           : 3400.0   ----     29.5      46   64.1%
   2 Stockfish 160518    : 3394.4  103.7     29.5      46   64.1%
   3 Houdini 6.03        : 3376.1  104.3     27.5      45   61.1%
   4 Fire 7              : 3272.1  103.1     21.0      45   46.7%
   5 Andscacs 0.93070    : 3250.9  103.4     19.5      46   42.4%
   6 Ginkgo 2.014        : 3219.4  101.8     17.0      45   37.8%
   7 Jonny 8.1           : 3191.8  105.9     15.0      45   33.3%

After 46 games Stockfish stands at -111 Elo.
This is beyond your calculated error.
If you had run tests, you would know from practice that at the end of the P division games Stockfish's Elo loss will be similar.
Only the error will decrease.
So?
sovaz1997
Posts: 261
Joined: Sun Nov 13, 2016 10:37 am

Re: Developer tests of Stockfish need Stockfish 8 instead of Stockfish 7

Post by sovaz1997 »

If we play a lot of games, then we will see the real ratings. I'll remind you: Stockfish received such a rating because it played brilliantly in the finals. And I ask you: do not judge an engine's playing strength by TCEC.

The table shows that the ratings have an error of about 100 Elo points. This means that you cannot conclude that the engine has become weaker or stronger. What I can say for sure: it became stronger over a large number of test games, with an error of up to 5 Elo points.
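
To put numbers on the gap between an error of 100 Elo and an error of 5 Elo, a small Python sketch can estimate how many games each margin needs. It assumes two roughly equal engines, a normal approximation, and a linearised Elo curve around a 50% score; `games_needed` and `draw_rate` are hypothetical names used only for illustration.

```python
import math


def games_needed(margin_elo, draw_rate=0.0, z=1.96):
    """Approximate games needed for a +/- margin_elo interval at confidence z.

    Assumes evenly matched engines (score near 50%) and an assumed
    fraction of draws, draw_rate, with 0 <= draw_rate < 1.
    """
    if not 0.0 <= draw_rate < 1.0:
        raise ValueError("draw_rate must be in [0, 1)")
    # Slope of the Elo curve at p = 0.5: dElo/dp = 400 / (ln(10) * p * (1 - p))
    slope = 400.0 / (math.log(10) * 0.25)
    # Per-game standard deviation of the result at p = 0.5 with draws
    sigma = math.sqrt(0.25 - draw_rate / 4.0)
    return math.ceil((slope * z * sigma / margin_elo) ** 2)


if __name__ == "__main__":
    for margin in (100, 20, 5):
        print(margin, games_needed(margin), games_needed(margin, draw_rate=0.6))
```

Under these assumptions a margin of 100 Elo needs fewer than 50 games, while a margin of 5 Elo needs on the order of 18,000 games without draws, and still several thousand with a high draw rate. The error shrinks only with the square root of the number of games.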
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Developer tests of Stockfish need Stockfish 8 instead of Stockfish 7

Post by corres »

sovaz1997 wrote: Tue Jun 12, 2018 6:04 pm
If we play a lot of games, then we will see the real ratings. I'll remind you: Stockfish received such a rating because it played brilliantly in the finals. And I ask you: do not judge an engine's playing strength by TCEC.
The table shows that the ratings have an error of about 100 Elo points. This means that you cannot conclude that the engine has become weaker or stronger. What I can say for sure: it became stronger over a large number of test games, with an error of up to 5 Elo points.
Please read my reply to Mr. Blass.
I wrote about relative weakening.
To calculate absolute Elo differences we really do need a lot of games.
But it is not for you to dictate my opinions.
Better to learn some patience and politeness.
sovaz1997
Posts: 261
Joined: Sun Nov 13, 2016 10:37 am

Re: Developer tests of Stockfish need Stockfish 8 instead of Stockfish 7

Post by sovaz1997 »

corres wrote: Tue Jun 12, 2018 6:44 pm
sovaz1997 wrote: Tue Jun 12, 2018 6:04 pm
If we play a lot of games, then we will see the real ratings. I'll remind you: Stockfish received such a rating because it played brilliantly in the finals. And I ask you: do not judge an engine's playing strength by TCEC.
The table shows that the ratings have an error of about 100 Elo points. This means that you cannot conclude that the engine has become weaker or stronger. What I can say for sure: it became stronger over a large number of test games, with an error of up to 5 Elo points.
Please read my reply to Mr. Blass.
I wrote about relative weakening.
To calculate absolute Elo differences we really do need a lot of games.
But it is not for you to dictate my opinions.
Better to learn some patience and politeness.
I do not know English well enough to judge how politely I am writing.

Just do not rely on the TCEC ratings: they are not accurate, and they are always calculated under different conditions. There are very few games. Last time SF was simply a little luckier than it is now, and that is the difference.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Developer tests of Stockfish need Stockfish 8 instead of Stockfish 7

Post by corres »

sovaz1997 wrote: Tue Jun 12, 2018 6:54 pm
I do not know English well enough to judge how politely I am writing.
Patience and politeness do not depend on knowledge of English but on one's attitude.