Stockfish testing at STC and LTC: one question

Jouni
Posts: 1791
Joined: Wed Mar 08, 2006 7:15 pm

Stockfish testing at STC and LTC: one question

Post by Jouni » Tue Sep 19, 2017 6:53 am

It's usual that an excellent patch gives +4 Elo at STC but only +2 at LTC. So does this indicate that at the 360+3.6 level we probably get NOTHING? Maybe even a regression at tournament level!
Jouni

cdani
Posts: 2047
Joined: Sat Jan 18, 2014 9:24 am
Location: Andorra

Re: Stockfish testing at STC and LTC: one question

Post by cdani » Tue Sep 19, 2017 3:39 pm

Who knows? Without more games it's impossible to tell. When I want to be more sure, I just extend the test. Flexibility in testing methods is..., well, necessary.

jhellis3
Posts: 341
Joined: Fri Aug 16, 2013 10:36 pm

Re: Stockfish testing at STC and LTC: one question

Post by jhellis3 » Tue Sep 19, 2017 3:40 pm

Yes, this is why Stockfish has lost hundreds of Elo over the last couple of years and has had miserable results at TCEC.

cdani
Posts: 2047
Joined: Sat Jan 18, 2014 9:24 am
Location: Andorra

Re: Stockfish testing at STC and LTC: one question

Post by cdani » Tue Sep 19, 2017 3:41 pm

jhellis3 wrote:Yes, this is why Stockfish has lost hundreds of Elo over the last couple of years and has had miserable results at TCEC.
Not at all, of course, but always something more can be done.

Houdini
Posts: 1471
Joined: Mon Mar 15, 2010 11:00 pm

Re: Stockfish testing at STC and LTC: one question

Post by Houdini » Tue Sep 19, 2017 4:17 pm

Jouni wrote:It's usual that an excellent patch gives +4 Elo at STC but only +2 at LTC. So does this indicate that at the 360+3.6 level we probably get NOTHING? Maybe even a regression at tournament level!
The amount of noise in engine testing is such that it's nearly impossible to extrapolate the results to longer TC.
The error margins are very big compared to the difference in results.
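To put numbers on those error margins, here is a minimal Python sketch. It assumes a simple normal approximation on the per-game score (1, 0.5, 0), not the BayesElo or trinomial models real testing frameworks use. The function names and the 20,000-game result are hypothetical, purely for illustration.

```python
import math

def elo_from_score(score: float) -> float:
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_ci(wins: int, draws: int, losses: int, z: float = 1.96):
    """Point estimate and approximate 95% confidence interval, in Elo.

    Normal approximation on the mean per-game score, with the score
    interval mapped through the logistic Elo curve.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score: E[x^2] - E[x]^2
    var = (wins + 0.25 * draws) / n - score ** 2
    se = math.sqrt(var / n)
    lo, hi = score - z * se, score + z * se
    return elo_from_score(score), elo_from_score(lo), elo_from_score(hi)

# Hypothetical result: 20,000 games with 70% draws
elo, lo, hi = elo_with_ci(3100, 14030, 2870)
print(f"{elo:+.1f} Elo, 95% CI [{lo:+.1f}, {hi:+.1f}]")
# prints: +4.0 Elo, 95% CI [+1.4, +6.6]
```

Even after 20,000 games, a "+4 Elo" result is compatible with anything from roughly +1.4 to +6.6, so a +4 vs +2 comparison between two separate tests says very little.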

Evert
Posts: 2898
Joined: Fri Jan 21, 2011 11:42 pm
Location: NL

Re: Stockfish testing at STC and LTC: one question

Post by Evert » Tue Sep 19, 2017 4:20 pm

How do you get those Elo estimates? Elo estimates based on the SPRT test runs are not reliable. All I'm seeing from the numbers you quote is an increased draw rate with longer time control, which I think is expected.

mjlef
Posts: 1329
Joined: Thu Mar 30, 2006 12:08 pm

Re: Stockfish testing at STC and LTC: one question

Post by mjlef » Thu Sep 21, 2017 1:19 am

Jouni wrote:It's usual that an excellent patch gives +4 Elo at STC but only +2 at LTC. So does this indicate that at the 360+3.6 level we probably get NOTHING? Maybe even a regression at tournament level!
Take a look at draw percentages. At longer time controls they increase a lot. For example, in CCRL a typical draw percentage for the stronger programs is 40% at 40/4, but at 40/40 it is around 60%. You just get more draws as programs search deeper and play better. So a contraction from 4 Elo to 2 Elo at a longer time control is quite normal.
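A rough way to see why more draws shrink the measured Elo: assume (a simplification) that the patch wins the same share of the decisive games at both time controls. Then the score edge over 50% is (1 - draw rate) × (win share - 0.5), so it falls as draws rise. The 55% win share below is hypothetical; the 40% and 60% draw rates are the CCRL figures quoted above.

```python
import math

def elo_from_score(score: float) -> float:
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_for_draw_rate(draw_rate: float, win_share: float) -> float:
    """Elo when a fixed share of the *decisive* games is won.

    Keeping win_share constant while the draw rate rises is an assumption,
    not a measured property of any engine.
    """
    decisive = 1.0 - draw_rate
    score = decisive * win_share + 0.5 * draw_rate
    return elo_from_score(score)

# Hypothetical patch that wins 55% of decisive games
for d in (0.40, 0.60):
    print(f"draws {d:.0%}: {elo_for_draw_rate(d, 0.55):+.1f} Elo")
```

Under these assumptions the edge drops from about +20.9 to about +13.9 Elo, roughly a third smaller, without the patch getting any worse.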

It is very hard to get enough data at very long time controls to prove whether a change is good or not. Anything more than a few seconds per move just takes too long. I really appreciate the in-between lists like IPON's 5'+3". They are a reasonable attempt to get enough games to say something with reasonable error margins. Perhaps as cloud computing gets cheaper we can test at much longer time controls.
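A back-of-the-envelope sketch of why long time controls are so expensive. Near equal strength the per-game score variance is (1 - draw rate) / 4, which fixes how many games a given error bar needs, and wall time per game scales with the time control. The 60 moves per side, full clock usage, and one core per game are rough assumptions, not measured figures from any testing framework; the function names are hypothetical.

```python
import math

# d(Elo)/d(score) of the logistic curve at a 50% score
ELO_PER_SCORE_AT_EVEN = 400.0 / (math.log(10) * 0.25)

def games_needed(margin_elo: float, draw_rate: float, z: float = 1.96) -> int:
    """Games needed for a +/- margin_elo error bar (z sigmas) near equal strength."""
    sd_score = math.sqrt((1.0 - draw_rate) / 4.0)  # per-game score std. dev.
    sd_elo = ELO_PER_SCORE_AT_EVEN * sd_score
    return math.ceil((z * sd_elo / margin_elo) ** 2)

def core_hours(games: int, base_s: float, inc_s: float, moves: int = 60) -> float:
    """Rough CPU cost, assuming one core per game and both sides using their full clock."""
    seconds_per_game = 2 * (base_s + moves * inc_s)
    return games * seconds_per_game / 3600.0

n = games_needed(margin_elo=2.0, draw_rate=0.60)
for base, inc in ((10, 0.1), (60, 0.6), (360, 3.6)):
    print(f"{base}+{inc}: {n} games, about {core_hours(n, base, inc):,.0f} core-hours")
```

For a ±2 Elo bar at 60% draws this comes to roughly 46,000 games, and the cost grows sixfold from 10+0.1 to 60+0.6 and sixfold again to 360+3.6.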

Uri Blass
Posts: 8037
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: Stockfish testing at STC and LTC: one question

Post by Uri Blass » Fri Sep 22, 2017 2:33 am

Jouni wrote:It's usual that an excellent patch gives +4 Elo at STC but only +2 at LTC. So does this indicate that at the 360+3.6 level we probably get NOTHING? Maybe even a regression at tournament level!
It is usual that the Stockfish team has no interest in how much Elo they get; otherwise they could do better by using a fixed number of games.

We know almost nothing about the Elo improvement of a patch from the results of SPRT.

A performance of +4 Elo when a patch passes SPRT at STC means nothing, because if you test the same patch many times it may fail SPRT in some of the runs and show only 0 or 1 Elo, so the average result is clearly less than 4 Elo.
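This selection effect is easy to reproduce in a toy simulation. The Python sketch below runs a simplified SPRT: the draw rate is assumed known and equal under both hypotheses, so only wins and losses move the log-likelihood ratio. This is not fishtest's actual implementation. The patch's true strength (+3 Elo), the bounds elo0 = 0 and elo1 = 10, and the 40% draw rate are all hypothetical, chosen to keep the simulation fast. It averages the Elo measured by the runs that happened to pass.

```python
import math
import random

def score_from_elo(elo: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def elo_from_score(score: float) -> float:
    return -400.0 * math.log10(1.0 / score - 1.0)

def win_loss_probs(elo: float, draw_rate: float):
    """Win/loss probabilities for a given Elo, assuming a fixed draw rate."""
    s = score_from_elo(elo)
    return s - draw_rate / 2.0, 1.0 - s - draw_rate / 2.0

def run_sprt(true_elo, elo0, elo1, draw_rate, rng,
             alpha=0.05, beta=0.05, max_games=200_000):
    """Simplified SPRT; returns (passed, wins, draws, losses).

    passed is True (H1 accepted), False (H0 accepted) or None (hit max_games).
    """
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    w0, l0 = win_loss_probs(elo0, draw_rate)
    w1, l1 = win_loss_probs(elo1, draw_rate)
    step_win, step_loss = math.log(w1 / w0), math.log(l1 / l0)
    pw, pl = win_loss_probs(true_elo, draw_rate)
    llr, w, d, l = 0.0, 0, 0, 0
    for _ in range(max_games):
        r = rng.random()
        if r < pw:
            w += 1
            llr += step_win
        elif r < pw + pl:
            l += 1
            llr += step_loss
        else:
            d += 1  # draws leave the LLR unchanged in this simplified model
        if llr >= upper:
            return True, w, d, l
        if llr <= lower:
            return False, w, d, l
    return None, w, d, l

rng = random.Random(2017)
TRUE_ELO = 3.0
passed = []  # Elo measured by each run that passed
for _ in range(200):
    result, w, d, l = run_sprt(TRUE_ELO, 0.0, 10.0, 0.40, rng)
    if result:  # only runs that accepted H1
        passed.append(elo_from_score((w + 0.5 * d) / (w + d + l)))

print(f"{len(passed)} of 200 runs passed")
if passed:
    print(f"mean Elo of passing runs: {sum(passed) / len(passed):+.1f} "
          f"(true: {TRUE_ELO:+.1f})")
```

A run can only cross the upper bound after a favourable streak, so the passing runs report noticeably more than the true +3 Elo on average.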

jdart
Posts: 3509
Joined: Fri Mar 10, 2006 4:23 am
Location: http://www.arasanchess.org

Re: Stockfish testing at STC and LTC: one question

Post by jdart » Sun Sep 24, 2017 6:42 pm

I have observed myself that STC results appear to be a lot noisier than LTC results. So a positive STC result is a bad predictor of what LTC or real tournament results will be. This is a bit surprising because for years, starting with Rybka, engines were using hyper-bullet games for testing. There is some validity to that method because many got a good Elo gain from it. But it is not the best or most reliable method. It is a way to shortcut testing at real time controls, which would require a huge number of processor cores to perform in a reasonable time period.

--Jon

Dann Corbit
Posts: 8670
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: Stockfish testing at STC and LTC: one question

Post by Dann Corbit » Mon Sep 25, 2017 6:41 pm

jdart wrote:I have observed myself that STC results appear to be a lot noisier than LTC results. So a positive STC result is a bad predictor of what LTC or real tournament results will be. This is a bit surprising because for years, starting with Rybka, engines were using hyper-bullet games for testing. There is some validity to that method because many got a good Elo gain from it. But it is not the best or most reliable method. It is a way to shortcut testing at real time controls, which would require a huge number of processor cores to perform in a reasonable time period.

--Jon
When you test with a certain set of conditions, the results are totally valid for exactly those conditions. The results may or may not translate to another set of conditions.

Generally speaking, things that work well at ultra high speed will work well at other speeds too. That is why the model tends to work and Stockfish is an extremely strong engine.

On the other hand, they are tuning SF for high speed blitz games so they will achieve that.
But I think every other engine is doing the same thing, so it really won't make any difference anyway.
Besides which, nobody has the resources to test at tournament time control.
