Stockfish testing at STC and LTC: one question
-
- Posts: 3283
- Joined: Wed Mar 08, 2006 8:15 pm
Stockfish testing at STC and LTC: one question
It's usual that an excellent patch gives +4 Elo at STC but only +2 at LTC. So does this indicate that at the 360+3.6 level we probably get NOTHING? And maybe even a regression at tournament level!
Jouni
-
- Posts: 2204
- Joined: Sat Jan 18, 2014 10:24 am
- Location: Andorra
Re: Stockfish testing at STC and LTC: one question
Who knows; without more games it is impossible to tell. When I want to be more sure, I just extend the test. Flexibility in testing methods is... well, necessary.
Last edited by cdani on Tue Sep 19, 2017 5:40 pm, edited 1 time in total.
Daniel José - http://www.andscacs.com
-
- Posts: 546
- Joined: Sat Aug 17, 2013 12:36 am
Re: Stockfish testing at STC and LTC: one question
Yes, this is why Stockfish has lost hundreds of Elo over the last couple of years and has had miserable results at TCEC.
-
- Posts: 2204
- Joined: Sat Jan 18, 2014 10:24 am
- Location: Andorra
Re: Stockfish testing at STC and LTC: one question
jhellis3 wrote: Yes, this is why Stockfish has lost hundreds of Elo over the last couple of years and has had miserable results at TCEC.
Not at all, of course, but something more can always be done.
Daniel José - http://www.andscacs.com
-
- Posts: 1471
- Joined: Tue Mar 16, 2010 12:00 am
Re: Stockfish testing at STC and LTC: one question
Jouni wrote: It's usual that an excellent patch gives +4 Elo at STC but only +2 at LTC. So does this indicate that at the 360+3.6 level we probably get NOTHING? And maybe even a regression at tournament level!
The amount of noise in engine testing is such that it's nearly impossible to extrapolate the results to a longer TC.
The error margins are very big compared to the difference in results.
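To put rough numbers on the size of those error margins, here is a minimal sketch. It assumes independent games between two nearly equal engines and a normal approximation; the 60% draw rate and the function names are only illustrative, not taken from any testing framework.

```python
import math

# Slope of the logistic Elo curve at a 50% score (Elo per unit of score).
ELO_PER_SCORE = 1600.0 / math.log(10.0)


def elo_error_margin(games, draw_rate, z=1.96):
    """Approximate +/- margin in Elo (95% for z=1.96) after `games` games.

    Assumes independent games and wins/losses splitting the decisive
    games evenly, so the per-game score variance is 0.25 * (1 - draw_rate).
    """
    score_std = 0.5 * math.sqrt(1.0 - draw_rate)
    return z * score_std / math.sqrt(games) * ELO_PER_SCORE


def games_needed(margin_elo, draw_rate, z=1.96):
    """Games required for the margin to shrink to `margin_elo` Elo."""
    score_std = 0.5 * math.sqrt(1.0 - draw_rate)
    return math.ceil((z * score_std * ELO_PER_SCORE / margin_elo) ** 2)


if __name__ == "__main__":
    print(round(elo_error_margin(10_000, 0.6), 2))
    print(games_needed(1.0, 0.6))
```

Under these assumptions, 10,000 games still leave a margin of roughly ±4.3 Elo, so separating a +2 patch from a +4 patch takes far more games than a typical test run contains.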
-
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Re: Stockfish testing at STC and LTC: one question
How do you get those Elo estimates? Elo estimates based on the SPRT test runs are not reliable. All I'm seeing from the numbers you quote is an increased draw rate with longer time control, which I think is expected.
-
- Posts: 1494
- Joined: Thu Mar 30, 2006 2:08 pm
Re: Stockfish testing at STC and LTC: one question
Jouni wrote: It's usual that an excellent patch gives +4 Elo at STC but only +2 at LTC. So does this indicate that at the 360+3.6 level we probably get NOTHING? And maybe even a regression at tournament level!
Take a look at draw percentages. At longer time controls they increase a lot. For example, in CCRL a typical draw percentage for the stronger programs is 40% at 40/4, but at 40/40 it is around 60%. You just get more draws as programs search deeper and play better. So a contraction from 4 Elo to 2 Elo at a longer time control is quite normal.
It is very hard to get enough data at very long time controls to prove whether a change is good or not. Anything more than just a few seconds per move simply takes too long. I really appreciate the in-between lists like IPON's 5'+3". It is a reasonable attempt to get enough games to say something with reasonable error margins. Perhaps as computers get cheaper on the cloud we can test at a much longer time control.
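The draw-rate effect can be illustrated with a small calculation. This is only a sketch under one simplifying assumption of mine: the patched engine keeps winning the same share of the decisive games (51% here, a made-up figure) while the draw rate rises from 40% to 60%.

```python
import math


def elo_from_score(score):
    """Convert an average score in (0, 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_at_draw_rate(decisive_win_share, draw_rate):
    """Elo difference when `decisive_win_share` of the non-drawn games are wins.

    `decisive_win_share` is a hypothetical parameter for illustration:
    it is held fixed while only the draw rate changes.
    """
    win_rate = (1.0 - draw_rate) * decisive_win_share
    score = win_rate + 0.5 * draw_rate
    return elo_from_score(score)


if __name__ == "__main__":
    for draw_rate in (0.4, 0.6):
        print(draw_rate, round(elo_at_draw_rate(0.51, draw_rate), 2))
```

With these inputs the same patch measures about 4.2 Elo at a 40% draw rate and about 2.8 Elo at 60%, so a good part of the shrinkage can come from draws alone, without the patch getting any worse.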
-
- Posts: 10280
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Stockfish testing at STC and LTC: one question
Jouni wrote: It's usual that an excellent patch gives +4 Elo at STC but only +2 at LTC. So does this indicate that at the 360+3.6 level we probably get NOTHING? And maybe even a regression at tournament level!
The Stockfish team usually has no interest in how much Elo they gain; otherwise they could do better by using a fixed number of games.
We know almost nothing about the Elo improvement of a patch from the results of SPRT.
A performance of +4 Elo when a patch passes SPRT at STC means nothing, because if you test the patch many times it may fail SPRT in some of the runs and show only 0 or 1 Elo, so the average result is clearly less than 4 Elo.
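A small simulation makes this selection effect visible. It is a sketch under stated assumptions, not the code of any real testing framework: the true gain of 1 Elo, the 60% draw rate, the SPRT bounds of 0 and 5 Elo, and the normal-approximation log-likelihood ratio are all choices made for illustration.

```python
import math
import random


def score_from_elo(elo):
    """Expected score for an Elo difference (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def elo_from_score(score):
    """Inverse of score_from_elo."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def run_sprt(rng, true_elo, draw_rate, elo0=0.0, elo1=5.0,
             alpha=0.05, beta=0.05, batch=200, max_games=400_000):
    """Simulate one SPRT run; return (passed, observed_elo, games)."""
    s0, s1 = score_from_elo(elo0), score_from_elo(elo1)
    upper = math.log((1.0 - beta) / alpha)   # accept H1: patch passes
    lower = math.log(beta / (1.0 - alpha))   # accept H0: patch fails
    win = score_from_elo(true_elo) - draw_rate / 2.0
    weights = [win, draw_rate, 1.0 - draw_rate - win]
    games = wins = draws = 0
    mean = 0.5
    while games < max_games:
        results = rng.choices((1.0, 0.5, 0.0), weights=weights, k=batch)
        wins += results.count(1.0)
        draws += results.count(0.5)
        games += batch
        mean = (wins + 0.5 * draws) / games
        var = (wins + 0.25 * draws) / games - mean * mean
        if var <= 0.0:
            continue
        # Normal approximation of the log-likelihood ratio of H1 vs H0.
        llr = games * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)
        if llr >= upper:
            return True, elo_from_score(mean), games
        if llr <= lower:
            return False, elo_from_score(mean), games
    return False, elo_from_score(mean), games


def simulate(runs=100, true_elo=1.0, draw_rate=0.6, seed=1):
    """Return the observed Elo of every passing run and the number of runs."""
    rng = random.Random(seed)
    outcomes = [run_sprt(rng, true_elo, draw_rate) for _ in range(runs)]
    passed = [elo for ok, elo, _ in outcomes if ok]
    return passed, len(outcomes)


if __name__ == "__main__":
    passed, total = simulate()
    print(f"{len(passed)} of {total} runs passed")
    if passed:
        print(f"mean observed Elo of passing runs: {sum(passed) / len(passed):.2f}")
```

In this model a run can only pass once its observed score is above the midpoint of the two bounds, so every passing run reports more than about 2.5 Elo even though the true gain is 1. The Elo shown by a passed test is therefore biased upward, which is the point made above.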
-
- Posts: 4366
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: Stockfish testing at STC and LTC: one question
I have observed myself that STC results appear to be a lot noisier than LTC results. So a positive STC result is a bad predictor of what LTC or real tournament results will be. This is a bit surprising because for years, starting with Rybka, engines were using hyper-bullet games for testing. There is some validity to that method because many got a good Elo gain from it. But it is not the best or most reliable method. It is a way to shortcut testing at real time controls, which would require a huge number of processor cores to perform in a reasonable time period.
--Jon
-
- Posts: 12538
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Stockfish testing at STC and LTC: one question
jdart wrote: I have observed myself that STC results appear to be a lot noisier than LTC results. So a positive STC result is a bad predictor of what LTC or real tournament results will be. This is a bit surprising because for years, starting with Rybka, engines were using hyper-bullet games for testing. There is some validity to that method because many got a good Elo gain from it. But it is not the best or most reliable method. It is a way to shortcut testing at real time controls, which would require a huge number of processor cores to perform in a reasonable time period.
--Jon
When you test with a certain set of conditions, the results are totally valid for exactly those conditions. The results may or may not translate to another set of conditions.
Generally speaking, things that work well at ultra-high speed will work well at other speeds too. That is why the model tends to work and Stockfish is an extremely strong engine.
On the other hand, they are tuning SF for high speed blitz games so they will achieve that.
But I think every other engine is doing the same thing, so it really won't make any difference anyway.
Besides which, nobody has the resources to test at tournament time control.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.