Stockfish plans ?

Frank Quisinsky
Posts: 7229
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: Stockfish plans ?

Post by Frank Quisinsky »

Hi Ralph,

could you write a little bit about the bugs you found?

Best
Frank
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Stockfish plans ?

Post by Sven »

mcostalba wrote:
Sven Schüle wrote:Are you sure that the error bars of your tests are small enough to state that your current version is really about 10-15 ELO weaker than your reference version?

Sven
You know it is an academic question, isn't it ? :-)

We think that's the gap we are missing, of course we are sure of nothing, in the sense that you state....but we have never been in the last 2 years and it seems we got some progress anyway...
It is not "academic" IMO.

Let's say SF version B plays 1000 games against version A and gets +10 ELO with error bars of +/- 20. Now version C plays 1000 games against version B and gets -10 ELO compared to B, also with error bars of +/- 20.

Then what do you conclude from it? B may be strongest, but how likely is that, and how likely is the opposite?
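To put rough numbers on that question, here is a minimal sketch. It assumes the +/- 20 error bars are 95% intervals, that the measured ELO differences are approximately normally distributed, and that the two matches are independent. None of these assumptions is stated above, and the function names are purely illustrative.

```python
import math

def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_truly_stronger(elo_diff: float, error_bar: float, z_level: float = 1.96) -> float:
    """Probability that the true ELO difference is above zero.

    Assumes error_bar is the half-width of a 95% interval (z_level = 1.96)
    around a normally distributed estimate.
    """
    sigma = error_bar / z_level  # convert the interval half-width to one standard deviation
    return normal_cdf(elo_diff / sigma)

# B measured at +10 +/- 20 against A; C measured at -10 +/- 20 against B,
# which is the same as B at +10 +/- 20 against C.
p_b_over_a = prob_truly_stronger(10.0, 20.0)
p_b_over_c = prob_truly_stronger(10.0, 20.0)

# Assumption: the two matches are independent, so the probabilities multiply.
p_b_strongest = p_b_over_a * p_b_over_c

print(f"P(B stronger than A) = {p_b_over_a:.2f}")
print(f"P(B stronger than C) = {p_b_over_c:.2f}")
print(f"P(B strongest of the three) = {p_b_strongest:.2f}")
```

Under these assumptions each single comparison comes out at roughly 0.84, and "B is the strongest of the three" at roughly 0.70, so the opposite conclusion is far from excluded.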

Sven
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: Stockfish plans ?

Post by Dr.Wael Deeb »

Sven Schüle wrote:
mcostalba wrote:
Sven Schüle wrote:Are you sure that the error bars of your tests are small enough to state that your current version is really about 10-15 ELO weaker than your reference version?

Sven
You know it is an academic question, isn't it ? :-)

We think that's the gap we are missing, of course we are sure of nothing, in the sense that you state....but we have never been in the last 2 years and it seems we got some progress anyway...
It is not "academic" IMO.

Let's say SF version B plays 1000 games against version A and gets +10 ELO with error bars of +/- 20. Now version C plays 1000 games against version B and gets -10 ELO compared to B, also with error bars of +/- 20.

Then what do you conclude from it? B may be strongest, but how likely is that, and how likely is the opposite?

Sven
Totally agreed here, Sven... unless there is a remarkable Elo increase, it's hard to determine which beta testing version is stronger than the other.
Dr.D
No one can hit as hard as life. But it ain’t about how hard you can hit. It’s about how hard you can get hit and keep moving forward. How much you can take and keep moving forward….
alpha123
Posts: 660
Joined: Sat Dec 05, 2009 5:13 am
Location: Colorado, USA

Re: Stockfish plans ?

Post by alpha123 »

zamar wrote:
alpha123 wrote: @Joona: Please don't turn into Vas.... :lol:
It's not my intention. However people who do not work with chess engines often do not realize how uncertain everything is. One brilliant idea can give +30 elo boost, but the downside is that you can spend two months trying different ideas without getting any practical improvement.

So in my answer I just wanted to express three facts.

1) We are not ready, so there is no news.
2) We don't know when we will be ready. A week? A month? Hopefully not a year ;)
3) We will tell you when we are ready.

I do not mean to be rude, it's just typical Finnish way of communication - avoiding too much talking.
Okay, okay. I just figured you don't need to be ready to have news.

That's probably a good thing - it seems the American way of communication is to talk way too much..... :lol: :roll:

Peter
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: Stockfish plans ?

Post by Dr.Wael Deeb »

alpha123 wrote:
zamar wrote:
alpha123 wrote: @Joona: Please don't turn into Vas.... :lol:
It's not my intention. However people who do not work with chess engines often do not realize how uncertain everything is. One brilliant idea can give +30 elo boost, but the downside is that you can spend two months trying different ideas without getting any practical improvement.

So in my answer I just wanted to express three facts.

1) We are not ready, so there is no news.
2) We don't know when we will be ready. A week? A month? Hopefully not a year ;)
3) We will tell you when we are ready.

I do not mean to be rude, it's just typical Finnish way of communication - avoiding too much talking.
Okay, okay. I just figured you don't need to be ready to have news.

That's probably a good thing - it seems the American way of communication is to talk way too much..... :lol: :roll:

Peter
Then can I consider myself an American? :lol: :?:

:wink:
Ralph Stoesser
Posts: 408
Joined: Sat Mar 06, 2010 9:28 am

Re: Stockfish plans ?

Post by Ralph Stoesser »

Frank Quisinsky wrote:Hi Ralph,

could you write a little bit about the bugs you found?

Best
Frank
Two bugs in eval, one related to static threat evaluation, another one related to open file bonus and pawn lever bonus in positions with opposite side castling. Fixing these bugs + manually adjusting a few eval params related to these bugs results in a solid ELO increase in fast self play games for me.

I'm not sure what the SF team does, but I would first fix all known bugs, secondly auto-retune eval params, thirdly try other things. In this order.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Stockfish plans ?

Post by mcostalba »

Ralph Stoesser wrote: Two bugs in eval, one related to static threat evaluation, another one related to open file bonus and pawn lever bonus in positions with opposite side castling. Fixing these bugs + manually adjusting a few eval params related to these bugs results in a solid ELO increase in fast self play games for me.

I'm not sure what the SF team does, but I would first fix all known bugs, secondly auto-retune eval params, thirdly try other things.
Both have been fixed: the threat evaluation fix and retuning gave a measurable increase, while we were not able to measure an increase from the second one, but it has been committed all the same for documentation purposes (so that whoever reads the code doesn't get surprised).

But I have to clarify one point. It is not that the current development version is weaker than 1.7.1; actually it is stronger, but not by a large enough margin. I mean, considering all the things that we have added, including the 2 that you highlighted, we expected more.

So now we are checking when and with which patches we introduced weakness. We are doing this not starting from the current version, but by testing intermediate versions, trying to "bisect" (see the git documentation for the exact meaning of this technique) to the faulty patch/patches.
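For readers unfamiliar with the idea, the bisection step can be sketched as below. This is only an illustration, not the team's actual procedure: the version labels and the `is_bad` predicate are hypothetical, and the sketch assumes a single faulty patch, so that every version after it is also "bad". In real engine testing `is_bad` would be a noisy statistical verdict from many games, not a clean yes/no.

```python
from typing import Callable, Sequence

def bisect_first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad version in an ordered history.

    Assumes versions[0] is known good, versions[-1] is known bad, and that
    once a version is bad all later ones are bad too (a single culprit).
    """
    lo, hi = 0, len(versions) - 1  # lo: last known good, hi: first known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # the regression was introduced at mid or earlier
        else:
            lo = mid  # the regression was introduced after mid
    return versions[hi]

# Hypothetical history of eight intermediate versions; "v5" introduces the regression.
history = [f"v{i}" for i in range(8)]
first_bad = bisect_first_bad(history, lambda v: int(v[1:]) >= 5)
print(first_bad)
```

Each step halves the remaining range, so about log2(N) intermediate versions need testing. If the weakness comes from several small patches, the single-culprit assumption fails and a plain bisection can point at the wrong patch.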
alpha123
Posts: 660
Joined: Sat Dec 05, 2009 5:13 am
Location: Colorado, USA

Re: Stockfish plans ?

Post by alpha123 »

Dr.Wael Deeb wrote:
alpha123 wrote:
zamar wrote:
alpha123 wrote: @Joona: Please don't turn into Vas.... :lol:
It's not my intention. However people who do not work with chess engines often do not realize how uncertain everything is. One brilliant idea can give +30 elo boost, but the downside is that you can spend two months trying different ideas without getting any practical improvement.

So in my answer I just wanted to express three facts.

1) We are not ready, so there is no news.
2) We don't know when we will be ready. A week? A month? Hopefully not a year ;)
3) We will tell you when we are ready.

I do not mean to be rude, it's just typical Finnish way of communication - avoiding too much talking.
Okay, okay. I just figured you don't need to be ready to have news.

That's probably a good thing - it seems the American way of communication is to talk way too much..... :lol: :roll:

Peter
Then can I consider myself an American? :lol: :?:

:wink:
Naturally I assumed you were already an American. ;)

:lol:

Peter
Lion
Posts: 539
Joined: Fri Mar 31, 2006 1:26 pm
Location: Switzerland

Re: Stockfish plans ?

Post by Lion »

mcostalba wrote:
Uri Blass wrote: I guess that it is only a small regression of less than 30 elo (otherwise it is probably easier to discover the problem because you do not need many games)
According to our calculation it should be a 10-15 ELO regression, but it is not clear whether it comes from a single bad patch (I doubt it, because such a big difference would have been detected in the first instance) or from a series of small patches, each one costing 2-3 ELO.

The latter is most probable but is also the worst case, because it is more difficult to find and it also means reverting a lot of code (that of many patches instead of just one).
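A rough sketch of why 2-3 ELO steps are so hard to find: it estimates how many games are needed before the 95% error bar shrinks to a given size. It assumes two nearly equal engines, independent games, and a normal approximation. The `draw_ratio` parameter and the function name are assumptions made for this illustration only.

```python
import math

# Slope of the ELO curve at a 50% score: d(ELO)/d(score) = 400 / (ln(10) * 0.25)
ELO_PER_SCORE_UNIT = 400.0 / (math.log(10.0) * 0.25)

def games_needed(error_bar_elo: float, draw_ratio: float = 0.0, z_level: float = 1.96) -> int:
    """Approximate number of games for a 95% error bar of +/- error_bar_elo.

    Assumes equal win and loss probabilities, so the per-game score
    variance is (1 - draw_ratio) / 4.
    """
    per_game_sd = math.sqrt((1.0 - draw_ratio) / 4.0)
    root_n = z_level * ELO_PER_SCORE_UNIT * per_game_sd / error_bar_elo
    return math.ceil(root_n ** 2)

for bar in (20.0, 5.0, 2.5):
    print(f"+/- {bar} ELO needs about {games_needed(bar)} games with no draws")
```

With no draws this gives on the order of a thousand games for +/- 20 ELO and tens of thousands for +/- 2.5 ELO. Draws reduce the count somewhat, but the cost still grows with the inverse square of the error bar.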
Let's say there are 2 changes named "A" and "B".

Taken individually, neither of them has a bad effect, but when they are put together with a change "C" (that is, "C", "B" and "A" combined) the result is negative.

Is that type of case possible?

If yes, would it make sense to apply the "Taguchi method" of experiment design?
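Interactions of that kind are possible in principle. As a minimal sketch of how one could be measured, here is a full two-level factorial over two patches. Note that this is the simplest related design, not a Taguchi orthogonal array proper, and the ELO figures below are made-up placeholders, not measurements.

```python
from typing import Dict, Tuple

def factorial_effects(elo: Dict[Tuple[int, int], float]) -> Dict[str, float]:
    """Main effects and interaction from a 2x2 factorial experiment.

    Keys are (a, b) with 1 meaning the patch is applied and 0 meaning it is not;
    values are the measured ELO relative to the baseline (0, 0).
    """
    e00, e10, e01, e11 = elo[(0, 0)], elo[(1, 0)], elo[(0, 1)], elo[(1, 1)]
    main_a = (e10 + e11) / 2 - (e00 + e01) / 2   # average gain from applying A
    main_b = (e01 + e11) / 2 - (e00 + e10) / 2   # average gain from applying B
    interaction = (e11 - e10 - e01 + e00) / 2    # how much A's effect changes when B is present
    return {"A": main_a, "B": main_b, "AxB": interaction}

# Hypothetical figures: each patch helps alone, but the combination loses ELO.
measured = {(0, 0): 0.0, (1, 0): 3.0, (0, 1): 2.0, (1, 1): -4.0}
effects = factorial_effects(measured)
print(effects)
```

A clearly negative "AxB" term with harmless individual results would match the case described here. In practice each of the four cells needs enough games for its own error bar to be smaller than the effect being looked for.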

regards
Lion
Posts: 539
Joined: Fri Mar 31, 2006 1:26 pm
Location: Switzerland

Re: Stockfish plans ?

Post by Lion »

mcostalba wrote:
Ralph Stoesser wrote: Two bugs in eval, one related to static threat evaluation, another one related to open file bonus and pawn lever bonus in positions with opposite side castling. Fixing these bugs + manually adjusting a few eval params related to these bugs results in a solid ELO increase in fast self play games for me.

I'm not sure what the SF team does, but I would first fix all known bugs, secondly auto-retune eval params, thirdly try other things.
Both have been fixed: the threat evaluation fix and retuning gave a measurable increase, while we were not able to measure an increase from the second one, but it has been committed all the same for documentation purposes (so that whoever reads the code doesn't get surprised).

But I have to clarify one point. It is not that the current development version is weaker than 1.7.1; actually it is stronger, but not by a large enough margin. I mean, considering all the things that we have added, including the 2 that you highlighted, we expected more.

So now we are checking when and with which patches we introduced weakness. We are doing this not starting from the current version, but by testing intermediate versions, trying to "bisect" (see the git documentation for the exact meaning of this technique) to the faulty patch/patches.
Hi,

Why not release it and call it 1.7.2 ?

When the problem is found, fix it and call it 1.8 for release?

regards