pohl4711 wrote: The test of Stockfish 130724 is running. I will finish that test whether it shows progress or not, because the last finished test (10000 games) was of Stockfish 130623 (a version one month older).
Result on Sunday, if all works correctly. Then we will see whether there is progress again, or a regression.
Stefan
Thanks Stefan! Looking forward to seeing the results. There should be a few more changes coming in today that will increase strength as well. The battle never ends.
I hope so!!
But I think one complete LS test run per month is really enough... If there is some unused PC power on my desk, I may do another Stockfish test in between... We will see.
Stay tuned.
Stefan
A run per month is definitely more than enough. It's great to see the self-testing results validated against other opponents, and it really helps to spot a possible regression as well!
Note that the latest Stockfish version (26.07) is the best so far, based on Stockfish-vs-Stockfish tests.
The test of Stockfish 130724 was already running when this new version came out.
The next test of a Stockfish development version for the LS rating list will come soon, but it is impossible for me to test every version...
Stefan
If you can, please test Stockfish 130724, Stefan! The Stockfish team would very much like to know whether there really was a 13 Elo regression somewhere. We did not find it in self-testing, but against Houdini we may have gotten worse somewhere, although nobody has found anything as large as 13 Elo. At least I am very curious about this result!
Thank you,
Eelco
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
I wrote that the test of Stockfish 130724 is running... and I will finish it. That's why I didn't want to restart with Stockfish 130726.
Final result of Stockfish 130724 on Sunday on my LS website, if all works correctly (it's very, very hot in Berlin at the moment; I hope my PCs don't crash...). What I can say now is that this version is not a regression but a (small) improvement over Stockfish 130623, which was the latest full test run of a Stockfish in the LS rating list... but there are still 2000 games to play...
And the 13 Elo regression that I found in Stockfish 130721 is not certain. I aborted that test run after 2600 games, so the error bar of that result (Stockfish 130721) is +/-10 Elo. So perhaps the regression is only -3 Elo?!?
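The +/-10 Elo figure follows from the number of games played. Below is a minimal sketch of how such an error bar can be estimated. It assumes the standard logistic Elo model and a normal approximation on the mean score, and it treats the gauntlet as if it were a single opponent. The tool actually used for the LS rating list is not stated and may compute its error bars differently. The win/draw/loss counts are hypothetical: 2600 games with an assumed 40% draw rate and an even score.

```python
import math

def elo_from_score(score: float) -> float:
    """Logistic Elo model: mean score per game -> rating difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high) using a normal approximation on the mean score."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(var / n)  # standard error of the mean score
    low = elo_from_score(score - z * stderr)
    high = elo_from_score(score + z * stderr)
    return elo_from_score(score), low, high

# Hypothetical counts: 2600 games, 40% draws, even score (not real LS data).
elo, low, high = elo_with_margin(780, 1040, 780)
print(f"{elo:+.1f} Elo, 95% interval [{low:+.1f}, {high:+.1f}]")
```

Under these assumed numbers the margin comes out at roughly +/-10.3 Elo, close to the figure quoted above. A lower draw rate would widen it, because decisive games contribute more variance per game.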
The result of Stockfish 130724 is now online. Next test: Stockfish 130727. Let's see whether the super-patch by Tom Vijlbrief is really worth a +10 Elo increase (against the top 10 engines of computer chess, not only against Stockfish 3)... Stay tuned!
Intermediate result of Stockfish 130727: 2100 games played, +12 Elo over Stockfish 130724(!). But after 2100 games Stockfish 130724 was +6 Elo over Stockfish 130623, and at the end (after 10000 games) it was less than +1 Elo... So I believe that Stockfish 130727 will finally be around 5-7 Elo stronger than Stockfish 130724, which would still be a great result for one patch. But we will see (on Wednesday, if all works correctly).
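A small simulation shows how an early lead can shrink as more games are played. This is a hypothetical sketch, not Stefan's data. It assumes a true edge of +6 Elo, a 40% draw rate and the logistic Elo model, and it treats the gauntlet as one opponent. It replays the same match many times and compares how widely the measured Elo scatters after 2100 games and after 10000 games.

```python
import math
import random
import statistics

def expected_score(elo_diff: float) -> float:
    """Logistic Elo model: rating difference -> expected score per game."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_from_score(score: float) -> float:
    """Inverse of expected_score: mean score -> rating difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def measured_elo(true_elo: float, games: int, draw_rate: float,
                 rng: random.Random) -> float:
    """Simulate `games` games and return the Elo difference they appear to show."""
    p = expected_score(true_elo)
    # Choose the win probability so that win + 0.5 * draw equals the expected score.
    p_win = p - draw_rate / 2.0
    total = 0.0
    for _ in range(games):
        r = rng.random()
        if r < p_win:
            total += 1.0      # win
        elif r < p_win + draw_rate:
            total += 0.5      # draw
        # otherwise a loss, which adds 0
    return elo_from_score(total / games)

# Hypothetical parameters, for illustration only.
TRUE_ELO = 6.0
DRAW_RATE = 0.4
TRIALS = 100
rng = random.Random(2013)

spread = {}
for games in (2100, 10000):
    estimates = [measured_elo(TRUE_ELO, games, DRAW_RATE, rng) for _ in range(TRIALS)]
    spread[games] = statistics.stdev(estimates)
    print(f"{games:>5} games: mean {statistics.mean(estimates):+.1f} Elo, "
          f"std dev {spread[games]:.1f} Elo")
```

Under these assumptions the scatter after 2100 games is roughly twice as large as after 10000 games, because it shrinks with the square root of the number of games. An intermediate reading several Elo above the final value is therefore not unusual.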
Great! Thanks Stefan! Some of the credit for this version should go to Ryan Takker, who rewrote some of the piece-square tables and reduced their impact while improving Elo. But it also shows that king safety can still be improved, and that not everybody is immune to king attacks yet. I hope your test computer room is not like a sauna!
Eelco