Stockfish plans ?
Moderator: Ras
-
Frank Quisinsky
- Location: Gutweiler, Germany
- Full name: Frank Quisinsky
Hi Ralph,
could you write a little bit about the bugs you found?
Best
Frank
-
Sven
- Location: Berlin, Germany
- Full name: Sven Schüle
Re: Stockfish plans ?
mcostalba wrote:
> Sven Schüle wrote:
>> Are you sure that the error bars of your tests are small enough to state that your current version is really about 10-15 ELO weaker than your reference version?
>> Sven
> You know it is an academic question, isn't it?
> We think that's the gap we are missing. Of course we are sure of nothing, in the sense that you state... but we have never been sure in the last 2 years either, and it seems we made some progress anyway...
It is not "academic" IMO.
Let's say SF version B plays 1000 games against version A and gets +10 ELO with error bars of +/- 20. Now version C plays 1000 games against version B and gets -10 ELO compared to B, also with error bars of +/- 20.
Then what do you conclude from it? B may be strongest, but how likely is that, and how likely is the opposite?
Sven
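Sven's question can be given a rough numeric answer. The sketch below assumes (none of this is stated in the thread) that the "+/- 20" error bars are 95% confidence intervals, that each Elo estimate is normally distributed around the measured value, that the two matches are independent, and that nothing is known about the versions beforehand (flat prior).

```python
from statistics import NormalDist

def sigma_from_error_bar(half_width, confidence=0.95):
    """Convert a +/- error bar into a standard deviation (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return half_width / z

sigma = sigma_from_error_bar(20.0)  # assumption: "+/- 20" is a 95% interval

# Belief about each true Elo difference: normal around the measured value.
b_minus_a = NormalDist(mu=10.0, sigma=sigma)   # match B vs A: +10 Elo
c_minus_b = NormalDist(mu=-10.0, sigma=sigma)  # match C vs B: -10 Elo
# C - A = (C - B) + (B - A); adding independent normals adds the variances.
c_minus_a = b_minus_a + c_minus_b

p_b_beats_a = 1.0 - b_minus_a.cdf(0.0)      # P(true B - A > 0)
p_b_beats_c = c_minus_b.cdf(0.0)            # P(true C - B < 0)
p_b_strongest = p_b_beats_a * p_b_beats_c   # both must hold; matches independent
p_c_beats_a = 1.0 - c_minus_a.cdf(0.0)

print(f"P(B > A)         = {p_b_beats_a:.2f}")
print(f"P(B > C)         = {p_b_beats_c:.2f}")
print(f"P(B strongest)   = {p_b_strongest:.2f}")
print(f"P(C > A)         = {p_c_beats_a:.2f}")
```

Under these assumptions B beats A and beats C with roughly 84% probability each, so B is the strongest of the three only about 70% of the time, and A versus C is a coin flip. A 30% chance of "the opposite" is far from negligible.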
-
Dr.Wael Deeb
- Location: Amman,Jordan
Re: Stockfish plans ?
Sven Schüle wrote:
> It is not "academic" IMO.
> Let's say SF version B plays 1000 games against version A and gets +10 ELO with error bars of +/- 20. Now version C plays 1000 games against version B and gets -10 ELO compared to B, also with error bars of +/- 20.
> Then what do you conclude from it? B may be strongest, but how likely is that, and how likely is the opposite?
> Sven
Totally agreed here, Sven... Unless there is a remarkable Elo increase, it's hard to determine which beta testing version is stronger than the other...
Dr.D
_No one can hit as hard as life.But it ain’t about how hard you can hit.It’s about how hard you can get hit and keep moving forward.How much you can take and keep moving forward….
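The point that small Elo differences are hard to separate can be made concrete with the usual normal approximation to a match result. This is a minimal sketch assuming the logistic Elo model, independent games, a score near 50% and a fixed draw ratio; `elo_error_bar` and `games_needed` are hypothetical helpers for illustration, not part of any testing tool mentioned in the thread.

```python
import math
from statistics import NormalDist

def elo_from_score(score):
    """Logistic Elo model: rating difference implied by an average score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def _z(confidence):
    """Two-sided normal quantile, e.g. about 1.96 for 95%."""
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)

def _slope(score):
    """Derivative dElo/dscore of elo_from_score at the given score."""
    return 400.0 / (math.log(10) * score * (1.0 - score))

def _var_per_game(score, draw_ratio):
    """Variance of one game result (1, 0.5 or 0) for a given mean score and draw ratio."""
    win = score - draw_ratio / 2.0
    return win + draw_ratio / 4.0 - score ** 2

def elo_error_bar(games, draw_ratio=0.0, score=0.5, confidence=0.95):
    """Half-width of the Elo confidence interval after `games` games."""
    se_score = math.sqrt(_var_per_game(score, draw_ratio) / games)
    return _z(confidence) * _slope(score) * se_score

def games_needed(half_width, draw_ratio=0.0, score=0.5, confidence=0.95):
    """Games required to shrink the Elo error bar to +/- half_width."""
    per_game = _z(confidence) * _slope(score) * math.sqrt(_var_per_game(score, draw_ratio))
    return math.ceil((per_game / half_width) ** 2)

print(f"1000 games: +/- {elo_error_bar(1000):.1f} Elo")
print(f"games for +/- 3 Elo: {games_needed(3.0)}")
```

With no draws, 1000 games give error bars of roughly +/- 21.5 Elo, close to the +/- 20 used in Sven's example, while resolving +/- 3 Elo takes on the order of 50,000 games. A realistic draw ratio lowers both numbers somewhat, because draws reduce the per-game variance.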
-
alpha123
- Location: Colorado, USA
Re: Stockfish plans ?
zamar wrote:
> alpha123 wrote:
>> @Joona: Please don't turn into Vas....
> It's not my intention. However, people who do not work with chess engines often do not realize how uncertain everything is. One brilliant idea can give a +30 Elo boost, but the downside is that you can spend two months trying different ideas without getting any practical improvement.
> So in my answer I just wanted to express three facts:
> 1) We are not ready, so there is no news.
> 2) We don't know when we will be ready. A week? A month? Hopefully not a year.
> 3) We will tell you when we are ready.
> I do not mean to be rude; it's just the typical Finnish way of communication: avoiding too much talking.
Okay, okay. I just figured you don't need to be ready to have news.
That's probably a good thing - it seems the American way of communication is to talk way too much.....
Peter
-
Dr.Wael Deeb
- Location: Amman,Jordan
Re: Stockfish plans ?
alpha123 wrote:
> Okay, okay. I just figured you don't need to be ready to have news.
> That's probably a good thing - it seems the American way of communication is to talk way too much.....
> Peter
Can I consider myself an American then?
-
Ralph Stoesser
Re: Stockfish plans ?
Frank Quisinsky wrote:
> Hi Ralph,
> could you write a little bit about the bugs you found?
> Best
> Frank
Two bugs in eval, one related to static threat evaluation, another one related to the open file bonus and pawn lever bonus in positions with opposite side castling. Fixing these bugs + manually adjusting a few eval params related to these bugs results in a solid ELO increase in fast self-play games for me.
I'm not sure what the SF team does, but I would first fix all known bugs, secondly auto-retune eval params, thirdly try other things. In this order.
-
mcostalba
Re: Stockfish plans ?
Ralph Stoesser wrote:
> Two bugs in eval, one related to static threat evaluation, another one related to the open file bonus and pawn lever bonus in positions with opposite side castling. Fixing these bugs + manually adjusting a few eval params related to these bugs results in a solid ELO increase in fast self-play games for me.
> I'm not sure what the SF team does, but I would first fix all known bugs, secondly auto-retune eval params, thirdly try other things.
Both have been fixed. The threat evaluation fix plus retuning gave a measurable increase; we were not able to measure an increase from the second one, but it has been committed anyway for documentation purposes (so that whoever reads the code doesn't get surprised).
But I have to clarify one point. It is not that the current development version is weaker than 1.7.1; actually it is stronger, but not by a large enough margin. I mean that, considering all the things we have added, including the 2 that you highlighted, we expected more.
So now we are checking when, and with which patches, we introduced the weakness. We are doing this not by starting from the current version, but by testing intermediate versions, trying to "bisect" (see the git documentation for the exact meaning of this technique) to the faulty patch/patches.
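The bisection mentioned here is a binary search over the ordered list of patches. A minimal sketch, assuming a hypothetical test `is_bad(commit)` that reliably says whether a build shows the regression, and assuming that once a patch introduces the weakness every later version keeps it. In practice each call is a long match with its own error bars; `git bisect` (with `git bisect good` / `git bisect bad`) does the same bookkeeping on a real repository.

```python
from typing import Callable, Sequence

def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is True.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    the property is monotonic (good ... good, bad ... bad).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]

# Usage with invented data: 8 patches after 1.7.1, regression starts at patch5.
commits = ["v1.7.1"] + [f"patch{i}" for i in range(1, 9)]
tested = []

def is_bad(commit: str) -> bool:
    """Hypothetical stand-in for running a test match against the reference."""
    tested.append(commit)
    return commits.index(commit) >= commits.index("patch5")

result = first_bad(commits, is_bad)
print(result, "found after", len(tested), "test matches")
```

In this example the search needs 3 test matches instead of 8. The monotonicity assumption is the weak spot: if the loss is spread over several small patches, one bisection run only finds the first point where the cumulative loss becomes detectable.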
-
alpha123
- Location: Colorado, USA
Re: Stockfish plans ?
Dr.Wael Deeb wrote:
> Can I consider myself an American then?
Naturally I assumed you were already an American.
Peter
-
Lion
- Location: Switzerland
Re: Stockfish plans ?
mcostalba wrote:
> Uri Blass wrote:
>> I guess that it is only a small regression of less than 30 elo (otherwise it is probably easier to discover the problem because you do not need many games)
> According to our calculation it should be a 10-15 ELO regression, but it is not clear whether it comes from a single bad patch (I doubt it, because such a big difference would have been detected in the first instance) or from a series of small patches, each one losing 2-3 ELO.
> The latter is most probable, but it is also the worst case, because it is more difficult to find and it also means reverting a lot of code (that of many patches instead of just one).
Let's say there are two changes named "A" and "B". Individually, neither of them has a bad effect, but when they are put together with a change "C" (C + B + A), the result is negative.
Is that type of case possible?
If yes, would the "Taguchi method" of experimental design make sense to apply?
regards
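Interactions like the one described here are what factorial experiments are designed to expose. The sketch below uses a full 2^3 factorial design (every on/off combination of three patches), which is the simplest relative of the orthogonal arrays used in Taguchi methods, not a Taguchi array itself. The Elo numbers are invented for illustration: each patch alone gains 2 Elo, pairs add up, but all three together lose 10 Elo.

```python
from itertools import combinations

FACTORS = ("A", "B", "C")

# Hypothetical Elo vs. baseline for each (A, B, C) on/off combination.
measured = {
    (0, 0, 0): 0.0,
    (1, 0, 0): 2.0,
    (0, 1, 0): 2.0,
    (0, 0, 1): 2.0,
    (1, 1, 0): 4.0,
    (1, 0, 1): 4.0,
    (0, 1, 1): 4.0,
    (1, 1, 1): -10.0,
}

def effects(results):
    """Main effects and interactions of a full 2^k factorial design.

    Each effect is the contrast sum (levels coded -1/+1) divided by 2^(k-1),
    i.e. mean response at the "+" level minus mean at the "-" level.
    """
    k = len(FACTORS)
    if len(results) != 2 ** k:
        raise ValueError("full factorial needs every on/off combination")
    out = {}
    for size in range(1, k + 1):
        for idx in combinations(range(k), size):
            total = 0.0
            for run, elo in results.items():
                sign = 1
                for i in idx:
                    sign *= 1 if run[i] else -1
                total += sign * elo
            out["".join(FACTORS[i] for i in idx)] = total / 2 ** (k - 1)
    return out

for name, value in effects(measured).items():
    print(f"{name:>3}: {value:+.1f} Elo")
```

Even though each patch helps on its own, the estimated main effects come out at -2 Elo and every interaction term at -4 Elo, which points at the combination rather than any single patch. The cost is 2^k test runs for k patches, which is why fractional designs become attractive as k grows.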
-
Lion
- Location: Switzerland
Re: Stockfish plans ?
mcostalba wrote:
> Both have been fixed. The threat evaluation fix plus retuning gave a measurable increase; we were not able to measure an increase from the second one, but it has been committed anyway for documentation purposes (so that whoever reads the code doesn't get surprised).
> But I have to clarify one point. It is not that the current development version is weaker than 1.7.1; actually it is stronger, but not by a large enough margin. I mean that, considering all the things we have added, including the 2 that you highlighted, we expected more.
> So now we are checking when, and with which patches, we introduced the weakness. We are doing this not by starting from the current version, but by testing intermediate versions, trying to "bisect" (see the git documentation for the exact meaning of this technique) to the faulty patch/patches.
Hi,
Why not release it and call it 1.7.2?
When the problem is found, fix it and call it 1.8 for release?
regards