Stockfish plans ?

Frank Quisinsky · Post by **Frank Quisinsky** » Wed Jun 09, 2010 4:27 pm

Hi Ralph,

could you write a little bit about the bugs you found?

Best
Frank

Sven · Post by **Sven** » Wed Jun 09, 2010 4:40 pm

mcostalba wrote:
Sven Schüle wrote:Are you sure that the error bars of your tests are small enough to state that your current version is really about 10-15 ELO weaker than your reference version?

Sven
You know it is an academic question, don't you?

We think that's the gap we are missing. Of course we are sure of nothing, in the sense that you state... but we never have been in the last 2 years, and it seems we made some progress anyway...

It is not "academic" IMO.

Let's say SF version B plays 1000 games against version A and gets +10 ELO with error bars of +/- 20. Now version C plays 1000 games against version B and gets -10 ELO compared to B, also with error bars of +/- 20.

Then what do you conclude from it? B may be strongest, but how likely is that, and how likely is the opposite?
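
To put rough numbers on this, here is a minimal sketch of how such error bars can be estimated from a match result. It assumes the usual logistic Elo model and a normal approximation for the mean score; the function names and the 95% level (z = 1.96) are illustrative choices, not what any particular testing tool does.

```python
import math

def elo_from_score(score):
    """Elo difference implied by an expected score under the logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_error(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for a match, using a normal approximation.

    Assumes the score stays strictly between 0 and 1 after adding the margin,
    so very lopsided or very short matches are out of scope for this sketch.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result, where a game is worth 1, 0.5 or 0.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    margin = z * math.sqrt(var / n)
    return (elo_from_score(score),
            elo_from_score(score - margin),
            elo_from_score(score + margin))

def prob_stronger(elo_diff, half_width, z=1.96):
    """Chance that the true difference is above zero, given a measured
    difference and the half width of its error bar (normal approximation)."""
    sigma = half_width / z
    return 0.5 * (1.0 + math.erf(elo_diff / (sigma * math.sqrt(2.0))))

if __name__ == "__main__":
    print(elo_with_error(300, 400, 300))
    print(prob_stronger(10, 20))
```

With 1000 games, 40% draws and an even score, this gives about +/- 17 Elo, which is in the same range as the +/- 20 of the example. Under the same approximation, a measured +10 with +/- 20 bars leaves roughly an 84% chance that B is really stronger than A, so the opposite is far from excluded.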

Sven

Dr.Wael Deeb · Post by **Dr.Wael Deeb** » Wed Jun 09, 2010 7:46 pm

Sven Schüle wrote:
mcostalba wrote:
Sven Schüle wrote:Are you sure that the error bars of your tests are small enough to state that your current version is really about 10-15 ELO weaker than your reference version?

Sven
You know it is an academic question, don't you?

We think that's the gap we are missing. Of course we are sure of nothing, in the sense that you state... but we never have been in the last 2 years, and it seems we made some progress anyway...
It is not "academic" IMO.

Let's say SF version B plays 1000 games against version A and gets +10 ELO with error bars of +/- 20. Now version C plays 1000 games against version B and gets -10 ELO compared to B, also with error bars of +/- 20.

Then what do you conclude from it? B may be strongest, but how likely is that, and how likely is the opposite?

Sven

Totally agreed here, Sven... Unless there is a remarkable Elo increase, it's hard to determine which beta version is stronger than the other.
Dr.D

alpha123 · Post by **alpha123** » Wed Jun 09, 2010 7:50 pm

zamar wrote:
alpha123 wrote: @Joona: Please don't turn into Vas....
It's not my intention. However people who do not work with chess engines often do not realize how uncertain everything is. One brilliant idea can give +30 elo boost, but the downside is that you can spend two months trying different ideas without getting any practical improvement.

So in my answer I just wanted to express three facts.

1) We are not ready, so there is no news.
2) We don't know when we will be ready. A week? A month? Hopefully not a year
3) We will tell you when we are ready.

I do not mean to be rude, it's just typical Finnish way of communication - avoiding too much talking.

Okay, okay. I just figured you don't need to be ready to have news.

That's probably a good thing - it seems the American way of communication is to talk way too much.....

Peter

Dr.Wael Deeb · Post by **Dr.Wael Deeb** » Wed Jun 09, 2010 7:54 pm

alpha123 wrote:
zamar wrote:
alpha123 wrote: @Joona: Please don't turn into Vas....
It's not my intention. However people who do not work with chess engines often do not realize how uncertain everything is. One brilliant idea can give +30 elo boost, but the downside is that you can spend two months trying different ideas without getting any practical improvement.

So in my answer I just wanted to express three facts.

1) We are not ready, so there is no news.
2) We don't know when we will be ready. A week? A month? Hopefully not a year
3) We will tell you when we are ready.

I do not mean to be rude, it's just typical Finnish way of communication - avoiding too much talking.
Okay, okay. I just figured you don't need to be ready to have news.

That's probably a good thing - it seems the American way of communication is to talk way too much.....

Peter

Then can I consider myself an American?

Ralph Stoesser · Post by **Ralph Stoesser** » Wed Jun 09, 2010 8:03 pm

Frank Quisinsky wrote:Hi Ralph,

could you write a little bit about the bugs you found?

Best
Frank

Two bugs in eval, one related to static threat evaluation, another one related to open file bonus and pawn lever bonus in positions with opposite side castling. Fixing these bugs + manually adjusting a few eval params related to these bugs results in a solid ELO increase in fast self play games for me.

I'm not sure what the SF team does, but I would first fix all known bugs, secondly auto-retune eval params, thirdly try other things. In this order.

mcostalba · Post by **mcostalba** » Wed Jun 09, 2010 8:12 pm

Ralph Stoesser wrote: Two bugs in eval, one related to static threat evaluation, another one related to open file bonus and pawn lever bonus in positions with opposite side castling. Fixing these bugs + manually adjusting a few eval params related to these bugs results in a solid ELO increase in fast self play games for me.

I'm not sure what the SF team does, but I would first fix all known bugs, secondly auto-retune eval params, thirdly try other things.

Both have been fixed: the threat evaluation fix and retuning gave a measurable increase, while we were not able to measure an increase from the second one, but it has been committed all the same for documentation purposes (so that whoever reads the code doesn't get surprised).

But I have to clarify one point. It is not that the current development version is weaker than 1.7.1; actually it is stronger, but not by a big enough margin. I mean, considering all the things that we have added, including the 2 that you highlighted, we expected more.

So now we are checking when and with which patches we introduced the weakness. We are doing this not starting from the current version, but by testing intermediate versions, trying to "bisect" (see the git documentation for the exact meaning of this technique) down to the faulty patch or patches.
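
For readers who do not know the term, the idea behind bisecting can be sketched in a few lines. This is only an illustration, not the team's actual tooling: `first_bad` and `is_bad` are made-up names, and `is_bad` stands for whatever test decides that a given intermediate version already shows the weakness (in practice a long and noisy match). It also assumes a single faulty patch, so that every version before it is good and every version from it onwards is bad.

```python
def first_bad(patches, is_bad):
    """Return the index of the first patch for which is_bad() is true.

    Assumes patches are in commit order, the last one is known to be bad,
    and once a patch is bad all later ones are bad too.
    """
    lo, hi = 0, len(patches) - 1  # the answer always lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(patches[mid]):
            hi = mid        # the culprit is at mid or earlier
        else:
            lo = mid + 1    # the culprit is strictly after mid
    return lo

if __name__ == "__main__":
    # Hypothetical history of 16 patches where patch number 11 is the culprit.
    history = list(range(16))
    print(first_bad(history, lambda patch: patch >= 11))
```

Each step halves the range, so 16 candidate patches need about 4 test matches instead of 16. If the loss is spread over several small patches, the "all later versions are bad" assumption no longer holds cleanly and the search becomes much less reliable.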

alpha123 · Post by **alpha123** » Wed Jun 09, 2010 8:12 pm

Dr.Wael Deeb wrote:
alpha123 wrote:
zamar wrote:
alpha123 wrote: @Joona: Please don't turn into Vas....
It's not my intention. However people who do not work with chess engines often do not realize how uncertain everything is. One brilliant idea can give +30 elo boost, but the downside is that you can spend two months trying different ideas without getting any practical improvement.

So in my answer I just wanted to express three facts.

1) We are not ready, so there is no news.
2) We don't know when we will be ready. A week? A month? Hopefully not a year
3) We will tell you when we are ready.

I do not mean to be rude, it's just typical Finnish way of communication - avoiding too much talking.
Okay, okay. I just figured you don't need to be ready to have news.

That's probably a good thing - it seems the American way of communication is to talk way too much.....

Peter
Then can I consider myself an American?

Naturally I assumed you were already an American.

Peter

Lion · Post by **Lion** » Wed Jun 09, 2010 10:38 pm

mcostalba wrote:
Uri Blass wrote: I guess that it is only a small regression of less than 30 elo (otherwise it is probably easier to discover the problem because you do not need many games)
According to our calculation it should be a 10-15 ELO regression, but it is not clear if it comes from a single bad patch (I doubt it, because such a big difference would have been detected in the first instance) or from a series of small patches, each one costing 2-3 ELO.

The latter is most probable but is also the worst case, because it is more difficult to find and it also means reverting a lot of code (that of many patches instead of just one).

Let's say there are 2 changes named "A" and "B".

Individually, neither of them has a bad effect, but when they are put together with a change "C" (so "A", "B" and "C" combined), the result is negative.

Is that type of case possible?

If yes, would it make sense to apply the "Taguchi method" of experiment design?
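
To make the question concrete, here is a small sketch of how such a combined effect would show up in numbers. It is not the Taguchi method itself: as far as I know Taguchi designs test only a fraction of all combinations (orthogonal arrays), which saves runs but can mix interactions up with main effects, so this sketch uses a full factorial over all on/off combinations instead. The Elo values come from `toy_elo`, a purely hypothetical model invented for illustration, not from any measurement.

```python
from itertools import product

def factorial_effects(response, k):
    """Main and interaction effects of k on/off changes (full 2^k design).

    response(run) returns the measured value for a tuple of 0/1 flags.
    The result maps a subset tuple such as (1, 1, 0), meaning "A and B",
    to its effect, computed as the usual contrast divided by 2^(k-1).
    """
    runs = list(product((0, 1), repeat=k))
    effects = {}
    for subset in product((0, 1), repeat=k):
        if not any(subset):
            continue  # skip the empty subset (that would be the grand mean)
        total = 0.0
        for run in runs:
            sign = 1
            for in_subset, level in zip(subset, run):
                if in_subset:
                    sign *= 1 if level else -1
            total += sign * response(run)
        effects[subset] = total / 2 ** (k - 1)
    return effects

def toy_elo(run):
    """Hypothetical model: each change gives +3 alone, all three together cost 15."""
    a, b, c = run
    return 3 * a + 3 * b + 3 * c - 15 * a * b * c

if __name__ == "__main__":
    for subset, effect in factorial_effects(toy_elo, 3).items():
        print(subset, effect)
```

In this toy model every change looks fine when tested alone, yet the three-way term comes out clearly negative, which is the kind of case described above. The price is 8 builds to test for 3 changes, and each one needs enough games for the differences to be larger than the error bars.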

regards

Lion · Post by **Lion** » Wed Jun 09, 2010 10:49 pm

mcostalba wrote:
Ralph Stoesser wrote: Two bugs in eval, one related to static threat evaluation, another one related to open file bonus and pawn lever bonus in positions with opposite side castling. Fixing these bugs + manually adjusting a few eval params related to these bugs results in a solid ELO increase in fast self play games for me.

I'm not sure what the SF team does, but I would first fix all known bugs, secondly auto-retune eval params, thirdly try other things.
Both have been fixed: the threat evaluation fix and retuning gave a measurable increase, while we were not able to measure an increase from the second one, but it has been committed all the same for documentation purposes (so that whoever reads the code doesn't get surprised).

But I have to clarify one point. It is not that the current development version is weaker than 1.7.1; actually it is stronger, but not by a big enough margin. I mean, considering all the things that we have added, including the 2 that you highlighted, we expected more.

So now we are checking when and with which patches we introduced the weakness. We are doing this not starting from the current version, but by testing intermediate versions, trying to "bisect" (see the git documentation for the exact meaning of this technique) down to the faulty patch or patches.

Hi,

Why not release it and call it 1.7.2?

When the problem is found, fix it and release that as 1.8?

regards
