Further weaknesses

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: What to do about this?

Post by zullil »

Lyudmil Tsvetkov wrote:
Uri Blass wrote:
Lyudmil Tsvetkov wrote:
zullil wrote:
zullil wrote:
zullil wrote:
Lyudmil Tsvetkov wrote: [d]8/1Q6/3ppk1p/6r1/3p4/P3b1PK/1P5P/8 w - - 0 49
Both engines see a white advantage of one to two full pawns here for the next 20 moves, when suddenly first SF, and then Houdini, understand that Black is winning.

In any case, it seems black has a decisive advantage in terms of eval on the above diagram.
After a long search, the latest Stockfish sure isn't seeing that Black has an advantage:

Code: Select all

info depth 48 seldepth 90 score cp 275 nodes 120428243868 nps 23691600 time 5083162 multipv 1 pv b7a6 g5d5 a6f1 d5f5 f1d3 f5c5 h3g2 d6d5 a3a4 h6h5 d3h7 c5c1 g2h3 c1a1 b2b3 f6e5 a4a5 a1a3 h7h5 e5d6 h5e8 a3b3 e8d8 d6c6 d8e7 c6b5 e7a7 b5c6 a5a6 b3b6 a7f7 b6a6 f7e6 c6b7 e6d5 b7c7 h3g4 a6d6 d5c5 c7d7 c5b5 d7e7 g4f3 e3h6 b5d3 h6g7 h2h4 d6e6 f3f4 e6f6 f4g4 f6e6 d3h7 e7f7 g4g5 e6d6

Code: Select all

info depth 49 seldepth 90 score cp 313 nodes 278416677366 nps 24538113 time 11346295 multipv 1 pv b7e4 g5e5 e4d3 e5c5 h3g2 h6h5 a3a4 c5a5 b2b4 a5a4 d3b1 a4a8 b4b5 a8b8 b5b6 d6d5 b1f1 f6e5 f1a6 d4d3 a6d3 e3b6 d3f3 e5d6 f3f4 e6e5 f4f6 d6c5 f6e5 b8b7 e5h5 b6c7 h5e2 c7d6 e2c2 c5d4 h2h4 b7b8 c2f2 d4e5 f2e2 e5f6 g3g4 d6e5 g4g5 f6f5 e2c2 f5e6 c2g6 e6e7 g6d3

Code: Select all

info depth 50 seldepth 102 score cp 275 nodes 652676948978 nps 25448132 time 25647342 multipv 1 pv b7e4 g5e5 e4c2 e5c5 c2e2 c5a5 e2d3 a5c5 d3e2
Thanks Louis, but I do not believe SF.
In the actual game, some 30+ plies on, it still showed a very significant white advantage.
This is how search without a good eval becomes meaningless: neither SF nor Houdini sees a forced winning line for Black, since at best there is only a long series of checks, but the win is there.

Something should be done about the SF eval in the above position, but I am still not certain what exactly, apart from further raising the passer bonus for the side with more pieces.
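
To make that suggestion a bit more concrete, here is a rough sketch, not actual Stockfish code, of what a passer bonus scaled by the piece count of the passer's side might look like; the function name and the one-eighth scaling step are invented for this illustration only:

Code: Select all

// Rough illustration only: scale a passed-pawn bonus by how many pieces the
// passer's side still has compared with the defender. Names and values are
// invented for this sketch, not Stockfish's.
#include <algorithm>

int scaledPasserBonus(int baseBonus, int ownPieces, int enemyPieces)
{
    // Each extra piece for the passer's side raises the bonus by 1/8 of its
    // base value; each extra enemy piece lowers it by the same amount.
    int scaled = baseBonus + baseBonus * (ownPieces - enemyPieces) / 8;

    // Never drop below half of the plain bonus.
    return std::max(scaled, baseBonus / 2);
}
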
I believe Stockfish because the score went up and earlier it was lower.
I guess that Stockfish saw some tactics that it did not see in the game (in the game it did not see a 2.75-pawn advantage).
With those scores I cannot trust SF.
I am almost convinced Black wins here, but someone has to analyse it.

Code: Select all

info depth 54 seldepth 102 score cp 350 nodes 1132292206154 nps 25781700 time 43918445 multipv 1 pv b7e4 g5e5 e4d3 e5c5 h3g2 h6h5 a3a4 c5a5 b2b4 a5a4 d3b1 a4a8 b4b5 a8b8 b5b6 d6d5 b1f1 f6e5 f1a6 d4d3 a6d3 e3b6 d3f3 e5d6 f3f4 e6e5 f4f6 d6c5 f6e5 b8b7 e5h5 b6c7 h5e2 d5d4 e2c2 c5d5 h2h4 c7e5 g3g4 b7c7 c2f5 c7c5 h4h5 d5c4 h5h6 d4d3 f5e4 c4c3 e4e1 c3c4 g4g5 e5d4 g2f3 c5f5 f3g4 f5f2 e1e6 c4c3 e6c6 c3b3 c6e4 b3c3
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Spacemask 2

Post by Lyudmil Tsvetkov »

Uri Blass wrote:
Lyudmil Tsvetkov wrote:Well, it is a pity no one is interested in this; I not only believe, but am quite certain, that this is a very good idea.
Why so? Because I saw similar SF behaviour in more than 10% of the games I have looked at, and I have looked at far more than a hundred games.

I am adamant, based on the games I have seen, that SF underestimates blocked enemy pawns on the 5th rank. Something seems to be going extremely wrong in STF at the moment: the queue is empty, and patches that have passed STC, such as the Trapped Bishop patch (first try), the NoMobilityQueen patch and the RNN vs R imbalance patch, are for some reason not being tested at LTC. (Btw, regarding that last patch, a very meaningful thing would be to add R vs BBN to the imbalance, if it is not already there, with twice the chance of success.) Instead, people test patches that have very good, almost obvious chances to fail, mostly linked to removing knowledge.

Believe it or not, even Joerg, for me definitely the most thoughtful, original and open-minded member of the team, has started removing knowledge and using CLOP values of late!! Joerg, you know removing knowledge almost never works, unless it is removing unessential things like queen on 7th. Concerning CLOP, I very much respect Remi Coulom, and The Crazy Bishop was one of my favourite sparring engines in the past, but CLOP simply does not work. Does anyone still think CLOP is working? At STF all CLOP-tuned patches failed, either easily or even more convincingly. All of them. A human assessment is far superior to that of an automatic tuning system.

I very much hope someone tries this idea for adding new features to Spacemask. SF simply needs it, just as it needs more imbalance eval, more closed-position eval and more pawn specifications. It needs this because the SF games say so. You cannot go around blindly fixing some imaginary problem; you first need some evidence of the problem before you start fixing it. And the evidence is in the games lost by SF. Please look at them more carefully.

The nasty thing is that SF needs at least some 17 elo more for a reasonable update. And if those 17 elo are not achieved within the next month at worst, this will be the first time in more than a year that STF fails. SF really needs some new ideas, based on sound chess knowledge.

Joerg, you are not going to let down the sensible approach, are you? It is more than obvious to any reasonably good chess player that removing the trapped rook condition is going to fail. It is vital in the eval; queen on 7th is not at all. You can remove only things that have no real chess value, not important ones. SF very much needs further imbalance testing: why not do a single patch for the Q vs 3 pieces imbalance, specifying within it the 3 respective imbalances with their respective values (probably after some testing of how they behave on their own)? Such a patch would have a much bigger chance of passing the test convincingly. I think this is the reasonable thing to do with rare eval elements like this.

Arjun, you liked space eval, so why not try the above suggestion? Just assign a bonus of 5-10 cp for any pawn on the 5th rank, on files c-f, that is blocked by an enemy pawn. Very simple.
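
If it helps, here is a self-contained sketch of that rule with plain bitboards (A1 = bit 0); the masks are straightforward, but the 8 cp value and the function name are only examples of mine, not a tested patch:

Code: Select all

#include <cstdint>

using Bitboard = std::uint64_t;

// Board mapping: A1 = bit 0, H8 = bit 63.
constexpr Bitboard Rank5   = 0x000000FF00000000ULL;   // a5-h5
constexpr Bitboard FilesCF = 0x3C3C3C3C3C3C3C3CULL;   // files c, d, e, f

// Bonus for every white pawn on c5-f5 that is blocked by a black pawn
// directly in front of it. 8 cp per pawn is just an example value.
int blockedFifthRankBonus(Bitboard whitePawns, Bitboard blackPawns)
{
    Bitboard candidates = whitePawns & Rank5 & FilesCF;
    Bitboard blocked    = candidates & (blackPawns >> 8);   // black pawn one rank above
    return 8 * __builtin_popcountll(blocked);               // GCC/Clang builtin popcount
}
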

Sorry guys, I do not want to interfere where no one asks me, but you need to add another 17 elo in less than a month's time in order not to let down end-user expectations, which are very high. Too many people, me included, are impatiently waiting for a new official SF 5 release. :D Please do something about it.
I do not plan to give new patches, because I have learned that people who give patches, and have a history of some productive patches, can later be blamed for wasting resources.

I think that this is a very bad idea for development, and it clearly discourages people from giving patches.

My opinion is that Stockfish could gain more elo if there were a rule that you never blame people who give patches for wasting resources.

My opinion is that you could get significantly more testers, and significantly more people who give patches, with better rules that allow testing everything.

For more testers:
Looking at the almost empty queue, it seems that Stockfish does not need more testers, and it certainly does not encourage more people to give computer time.

For more people who give patches:
I think that there is a psychological advantage in allowing people to give patches for tests that are not useful, because it can encourage the same people to also give patches that are useful.

There are 3 possible simple ways to treat people who also give patches that are not useful (or patches at a very long time control, like 5 minutes + 5 seconds, that I did not try to test).

1) Allowing it at low priority, with a rule that forbids complaining about it (if testers do not like to test it, they can wait for tests with higher priority and not waste computer time on these tests).
2) Not allowing it in the first place, and not complaining that the people who give these patches waste computer resources, because the patches are never tested in the framework.
3) Allowing it and later complaining that these people waste computer resources.

I think that 1 is the best option and 2 is better than 3,
but from experience the Stockfish team chose 3, which is the worst option.
Hi Uri.

I think an interesting discussion could be started in this thread, to the good of SF development.

My take on it:

1. I trust Marco unreservedly, both as a person and as an outstanding programmer. Thank you very much, Marco, for starting this really wonderful project back in late 2008. I am sure you did not imagine back then how far it would get.

2. I am convinced everything can be solved by peaceful, good-natured means. If someone holds a grudge about something, there is a peaceful way to settle it. When a war erupts, it is the end of everything.

3. I do not quite trust the judgement of some of the SF team when it comes to implementing new chess knowledge. Many on the team seem to think that new chess rules are damaging to SF, but the evidence testifies otherwise: most of the tests on STF that remove valuable knowledge have failed, while patches adding new knowledge succeed reasonably often. In this case I think it is useless to argue: new knowledge only helps, removing valuable knowledge hurts.

4. My impression of STF has been excellent up to this moment; I see all kinds of patches implemented, and could not even imagine that someone was censoring anything. Still, I ask myself:
- why are patches that have passed STC not automatically tested at LTC? Currently there are 3 patches that passed STC - NoMobilityQueen, Trapped Bishop (Joerg Oster's first take), and the BNN vs R imbalance - that should automatically be allowed to run at LTC. Instead, the queue is empty. Why so?
- at the same time we see reruns of patches that have already passed both TCs, with the reruns giving different results. Is there not some chance element involved here, for example where in the book the games start? What should the official rule be in this case?

5. I think SF is already a public code, a public engine, public property, a public concern. Gary Linscott made it very much so, to the benefit of all - thanks, Gary! So the governing board, if you could call it that, should also be public. Of course, Marco might have the final say on a variety of matters, but only when the rules are applied. It is obvious to me that Marco would never do anything bad for SF development.

6. Again, I conclude with what I started with: I absolutely and fully trust Marco; without him this wonderful project would never have started - remember how vilified he was at the beginning of his effort. At the same time, the testing of new approaches, as long as they obey the STF rules, should never be censored in any way or even looked down upon. Whenever a patch meets the STF rules, it should be allowed to run. If it is successful, simply incorporate it.

What do other people think? Why is STF currently empty?
Uri Blass
Posts: 10412
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Spacemask 2

Post by Uri Blass »

Lyudmil Tsvetkov wrote:
Uri Blass wrote:
Lyudmil Tsvetkov wrote:Well, it is a pity no one is interested in this, I not only believe, but am very much certain that is a very good idea.
Why so? Because I saw similar SF behaviour in more than 10% of the games I have looked at, and I have looked at much more than hundred games.

I am adamant, based on the games I have seen, that SF underestimates blocked enemy pawns on the 5th rank. Something seems to be going extremely wrong currently in STF, the queue is empty, patches that have passed STC are not tested at LTC for some reason, as the Trapped Bishop patch, first try, NoMobilityQueen patch, RNN vs R imbalance, the patch that passed STC. (btw. regarding this patch, a very meaningful thing would be to add R vs BBN to the imbalance, if it is not already added, with twice bigger success chances) Instead, people test patches that have very good almost obvious chances to fail, mostly linked to removing knowledge.

Believe it or not, even Joerg, definitely for me the most thoughtful, original and open-minded of the team, has started removing knowledge and using CLOP values as of late!! Joerg, you know removing knowledge almost never works, unless it is removing unessential things like queen on 7th. Concerning CLOP, I very much respect Remi Coulom, and The Crazy Bishop was one of my favourite sparring engines in the past, but CLOP simply does not work. Anyone still thinking CLOP is working? At STF all CLOP-tuned patches failed either easily, or even much more convincingly. All of them. A human assesssment is way superior than an automatic tuning system one.

I very much hope someone tries this idea for adding new features to Spacemask. SF simply needs it, as it needs more imbalance eval, as it needs more closed eval, and more pawn specifications. It needs this, because SF games say so. You can not go around blindly fixing some imaginary problem, you need to first have some evidence of the problem to start fixing it. And the evidence is in the games lost by SF. Please, look more carefully at them.

The nasty thing is that SF needs some 17 elo more at least for a reasonable update. And, if those 17 elo are not achieved within the next month at worst, this will be the first time STF fails in more than a year. SF really needs some new ideas, based on sound chess knowledge.

Joerg, you are not going to let down the sensible approach, are you? It is more than obvious for any relatively good chess player that removing trapped rook condition is going to fail. It is vital in the eval, queen on 7th is not at all. You can remove only things that have no real chess value, but not important ones. SF very much needs further imbalance testing, why not do a single patch of the Q vs 3 pieces imbalance, specifying within it the 3 respective imbalances with their respective values? (probably after some testing of how they behave on their own) Such a patch will have much much bigger chances to pass the test convincingly. I think this is the reasonable thing to do about rare eval elements as this.

Arjun, you liked space eval, why not try the above suggestion? Just assign a bonus of 5-10cps for any pawn on the 5th rank, blocked by an enemy pawn, on files c-f. Very simple.

Sorry guys, I do not want to say anything or interfere where no one asks me, but, you need to add another 17 elo in less than a month's time in order not to let end user expectations down, which are very high. Too many people, me included, are impatiently waiting for a new official SF 5 release. :D Please do something about it.
I do not plan to try to give new patches because I learned that people who give patches and have history of some productive patches can later be blamed of wasting resources.

I think that it is a very bad idea for development to do it and it clearly discourage people to give patches.

My opinion is that stockfish could earn more elo if there was a rule that you never blame people who give patches for wasting resources.

My opinion is that you could get significantly more testers and significantly more people who give patches with better rules that allow testing everything.

For more testers:
Looking at the almost empty queue it seems that stockfish does not need more testers and it certainly does not encourage more people to give computer time.

For more people who give patches
I think that there is a psychological advantage to allow people to give patches for tests that are not useful because it can encourage the same people to give also patches that are useful.

There are 3 possible simple ways to treat people who give also patches that are not useful(or patches at very long time control like 5 minutes+5 seconds time control that I did not try to test).

1)Allowing it at low priority with a rule that forbid complaining about it(if testers do not like to test it they can wait for tests with higher priority and not waste computer time on these tests)
2)Not allowing it in the first place and not complaining about people who give these patches about wasting computer resources because the patches are never tested in the framework
3)Allowing it and later complaining that the people waste computer resources.

I think that 1 is the best and 2 is better than 3
but from experience the stockfish team choose 3 that is the worst option.
Hi Uri.

I think an interesting discussion could be started in this thread, to the good of SF development.

My take on it:

1. I trust Marco unreservedly, I trust him both as a person, and as an outstanding programmer. Thank you very much Marco, for starting this really wonderful project in late 2008. I am sure you did not believe back then where it will get.

2. I am convinced everything could be solved with peaceful, good-naturedly means. If someone feels a grudge about something, there is a peaceful way to settle it. When a war erupts, it is the end of everything.

3. I do not quite trust the judgements of some of the SF team what concerns implementing new chess knowledge. Many on the team seem to think that new chess rules are damaging to SF, but evidence testifies otherwise; most of the tests on STF to remove valuable knowledge failed, while patches adding new knowledge reasonably succeed. In this case I think it is useless to argue: new knowledge only helps, removing valuable knowledge hurts.

4. My impression of STF is excellent until this moment, I see all kinds of patches implemented, could not even think that someone is censuring something. Still, I ask myself:
- why are patches that have passed STC not automatically tested at LTC? Currently, there are 3 patches that passed STC - nomobilityqueen and trapped bishop, 1st take of Joerg Oster, and BNN vs R imbalance - that should automatically be let to run at LTC. Instead, the queue is empty, why so?
- at the same time we see rerun of patches that have already passed both TC, with reruns giving different results. Is not some chance event involved here, for example where the book starts? Which should be the official rule in this case?

5. I think SF already is a public code, public engine, public property, public concern. Gary Linscott made it very much so to the benefit of all, thanks Gary! So that the governing board, if you could say so, should also be public. Of course, Marco might have the final say on a variety of matters, but only when the rules are applied. It is obvious for me that Marco will never do something bad for SF development.

6. Again, I conclude it with what I started with: I absolutely and fully trust Marco, without him this wonderful project would never had started, remember how vilified he was at the beginning of his effort. At the same time, testing of new approaches, as long as they obey the STF rules, should never be censured in any way or even looked down upon. Wherever a patch meets STF rules, it should be allowed to run. If it is successful, simply incorporate it.

What do other people think, why is STF currently empty?
1) For point 4, the reason is not to waste resources, and this is one of the reasons that I do not like to be part of the project.

The target should simply be to improve Stockfish, and the cost in terms of computer time should not be part of the equation.

People who give productive patches, and spend time writing them, do not complain about the time they spend, even though they could perhaps be earning money with that time.

For the same reason, I think that people who give computer time should not complain if part of the time they give is not useful for making Stockfish better.

I think that a good project is a project of volunteers who give patches and do not complain, or give computer time and do not complain (they can simply decide not to test patches that they are not interested in testing).

2) For the specific case of the patch that passed both the short time control and the long time control, and that Marco tested again at the long time control:

Lucas said

"The correct way to proceed is to schedule 3-4 attempts at STC, wait for all of them to complete, and schedule only the most promising one at LTC. What you did instead was run 9 STC and 3 STC. It increases the lucky run effect, which is why Marco decided to rerun the LTC one."

This is the reason that people do not hurry to test at long time control.

I do not think that it is correct, and the main problem is that it is hard to know which patch is the most promising one.

The patch that passes faster is often not the better patch, even if we talk only about the 15+0.05 time control, so I think that with Lucas's method there is a good chance of missing the best patch.

It is correct that when you run 3 tests at LTC and one of them passes, there is a bigger chance of a lucky run, but I think that when the same patch passes once and fails once (with a positive score), there is still a good chance that the patch is productive at the long time control, so maybe it is better to test it again with SPRT(0,4) rather than SPRT(0,6).

I expect Lucas may complain about a waste of resources in this case,
but I think that you can make progress this way, and the fact that the Stockfish team allows almost only SPRT(0,6) tests is one reason for the relatively slow progress.

I think that having more SPRT(0,4) tests might attract more testers, because then the queue would not be almost empty and testers would see that their time is needed - but maybe I do not understand the psychology of testers and my opinion is wrong.
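
To make the SPRT(0,4) versus SPRT(0,6) comparison concrete, here is a small self-contained sketch using the usual simplified log-likelihood-ratio approximation for engine testing; the exact formula and constants that fishtest uses may differ in detail, and the game counts below are purely hypothetical, so take it only as an illustration of how the chosen elo bounds change the LLR for a given result:

Code: Select all

// Sketch of the simplified SPRT log-likelihood ratio for a (wins, draws,
// losses) result, as commonly used for engine testing. Approximation for
// illustration only; fishtest's exact computation may differ.
#include <cmath>
#include <cstdio>

double eloToScore(double elo)
{
    return 1.0 / (1.0 + std::pow(10.0, -elo / 400.0));    // logistic elo model
}

double llr(double wins, double draws, double losses, double elo0, double elo1)
{
    double games = wins + draws + losses;
    double score = (wins + 0.5 * draws) / games;           // mean score per game
    double m2    = (wins + 0.25 * draws) / games;           // mean squared score
    double var   = (m2 - score * score) / games;            // variance of the mean
    double s0 = eloToScore(elo0), s1 = eloToScore(elo1);
    return (s1 - s0) * (2.0 * score - s0 - s1) / (2.0 * var);
}

int main()
{
    // With alpha = beta = 0.05 the stopping bounds are about -2.94 and +2.94.
    double lower = std::log(0.05 / 0.95), upper = std::log(0.95 / 0.05);

    // The same hypothetical 10600-game result, judged against both bound pairs.
    std::printf("SPRT(0,6): LLR = %.2f\n", llr(3250, 4300, 3050, 0.0, 6.0));
    std::printf("SPRT(0,4): LLR = %.2f\n", llr(3250, 4300, 3050, 0.0, 4.0));
    std::printf("accept if LLR > %.2f, reject if LLR < %.2f\n", upper, lower);
    return 0;
}
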
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Doubled and isolated

Post by Lyudmil Tsvetkov »

Just posting 2 more games showing this SF behaviour. Maybe Gary will push a patch on Friday after all. :D

[pgn][MLNrOfMoves "72"]
[MLFlags "000100"]
[Event "4 Minutes/Game"]
[Site "2 SF 4 min gauntlet, HP-PC"]
[Date "2014.04.09"]
[Round "47.1"]
[White "Houdini4Prox64"]
[Black "Stockfish1404061153x64"]
[Result "1-0"]

1. c4 e5 2. g3 Nf6 3. Bg2 d6 4. Nc3 Be7 5. e3 c6 6. d4 Be6 7. b3 exd4 8. Qxd4
O-O 9. Nge2 {+0.16/19 5s} 9... d5 {+0.17/21 6s} 10. cxd5 {+0.22/20 6s (Nf4)}
10... cxd5 {+0.25/20 4s} 11. O-O {+0.31/20 6s} 11... Nc6 {+0.38/20 9s} 12. Qd2
{+0.30/19 6s} 12... Qa5 {+0.37/21 2s} 13. Bb2 {+0.32/18 2s} 13... Rad8
{+0.35/20 10s (Rfd8)} 14. Rac1 {+0.30/16 4s (Rfd1)} 14... Rfe8
{+0.42/20 6s (Bc5)} 15. Nd4 {+0.41/18 3s (Rfd1)} 15... Nxd4 {+0.38/22 20s} 16.
Qxd4 {+0.37/19 5s} 16... b5 {+0.39/23 7s (Ba3)} 17. Qd3 {+0.37/18 8s (a3)} 17...
a6 {+0.39/20 3s} 18. Rc2 {+0.31/18 7s (Ne2)} 18... Rc8 {+0.39/21 3s} 19. Rfc1
{+0.32/19 2s (Ra1)} 19... Bf8 {+0.30/21 6s (d4)} 20. h3 {+0.22/17 6s (Bf3)}
20... Rc7 {+0.18/19 6s} 21. g4 {+0.26/18 3s} 21... h6 {+0.23/22 7s} 22. Nd1
{+0.32/17 5s} 22... Rxc2 {+0.28/23 11s} 23. Rxc2 {+0.33/15 0s} 23... Qe1+
{+0.28/24 1s (Ne4)} 24. Qf1 {+0.38/19 3s (Bf1)} 24... Qa5 {+0.26/21 2s} 25. Bxf6
{+0.37/18 1s (Qe2)} 25... gxf6 {0.00/23 3s} 26. Qd3 {+0.20/17 1s} 26... Qe1+
{0.00/23 3s} 27. Kh2 {+0.26/17 2s (Bf1)} 27... Bd6+ {0.00/21 6s (d4)} 28. f4
{+0.09/1 0s} 28... Kg7 {0.00/23 1s (Bf8)} 29. Bf3 {+0.33/18 8s (Qd4)} 29... Rd8
{0.00/21 2s} 30. Kg2 {+0.43/18 1s (Qd4)} 30... h5 {0.00/22 6s (d4)} 31. Nf2
{+0.60/17 2s} 31... hxg4 {0.00/23 1s} 32. hxg4 {+0.60/15 0s (Bxg4)} 32... Qb1
{+0.43/21 3s} 33. f5 {+0.78/18 2s} 33... Bd7 {+0.47/23 2s} 34. Rd2
{+0.68/19 3s (Re2)} 34... Qc1 {+0.96/20 2s (Qxd3)} 35. Rd1 {+0.83/18 1s} 35...
Qc7 {+0.93/24 6s} 36. Qxd5 {+0.74/18 1s} 36... Bc8 {+0.81/24 1s (Be8)} 37. Qc6
{+0.87/19 3s} 37... Qe7 {+0.95/23 3s} 38. Rd2 {+0.82/19 1s (Rd4)} 38... a5
{+0.71/23 2s (Kf8)} 39. Qxb5 {+0.98/16 2s (Qe4)} 39... Rh8 {+0.99/19 2s} 40.
Be4 {+1.13/18 15s (Qc6)} 40... a4 {+1.01/19 5s (Rh2+)} 41. bxa4 {+1.76/14 4s}
41... Bh2 {+1.32/20 3s} 42. Qa5 {+1.76/12 0s (Qb6)} 42... Bc7
{+1.16/19 1s (Bd6)} 43. Qc3 {+1.45/15 1s} 43... Be5 {+1.65/19 3s} 44. Qc6
{+1.53/15 1s} 44... Bb8 {+1.59/21 1s} 45. Rd1 {+1.51/16 3s (Kf1)} 45... Qe5
{+1.65/18 1s} 46. Rg1 {+1.51/14 0s} 46... Qh2+ {+1.57/21 2s} 47. Kf1
{+1.45/14 0s} 47... Bg3 {+1.39/23 1s} 48. Rg2 {+1.42/16 1s} 48... Qh4
{+1.37/24 1s} 49. Nd3 {+1.44/17 2s (Ke2)} 49... Qxg4 {+1.87/22 1s (Ba6)} 50. Bf3
{+3.01/15 0s} 50... Qg5 {+1.89/27 1s} 51. e4 {+3.04/14 0s} 51... Rh1+
{+1.85/27 0s} 52. Ke2 {+3.04/13 0s} 52... Ra1 {+1.87/23 1s} 53. Qxc8
{+3.04/12 0s} 53... Rxa2+ {+2.00/24 2s} 54. Kd1 {+2.99/11 0s} 54... Rxg2
{+2.00/25 0s} 55. Bxg2 {+2.90/10 0s} 55... Qe3 {+2.05/26 0s} 56. Bf1
{+2.78/9 0s} 56... Qxe4 {+2.05/24 0s} 57. Qc2 {+3.08/17 2s} 57... Qxf5
{+2.13/20 1s} 58. Qg2 {+3.09/15 0s} 58... Qg4+ {+2.13/20 0s} 59. Kd2
{+3.17/17 1s (Kc1)} 59... Kf8 {+2.43/21 3s} 60. Qc6 {+3.32/18 4s (Qh3)} 60...
Kg7 {+2.08/18 1s} 61. Be2 {+3.38/17 1s} 61... Qf5 {+2.27/19 1s} 62. Kc2
{+3.79/18 8s (Bf3)} 62... Qa5 {+2.29/19 1s} 63. Kb3 {+3.62/18 1s (Qc4)} 63...
Qd2 {+2.31/18 1s} 64. Qg2 {+3.62/16 0s} 64... Qe3 {+2.33/21 1s} 65. a5
{+3.99/16 2s (Kc2)} 65... Kf8 {+2.61/19 2s} 66. a6 {+3.99/14 0s} 66... Bc7
{+2.56/19 1s} 67. Qg4 {+4.19/16 1s (Kc2)} 67... Bb6 {+2.44/17 0s (Bd6)} 68. Bf1
{+4.45/16 1s} 68... Ba7 {+2.91/20 3s} 69. Qd7 {+4.45/14 0s (Qc8+)} 69... Qe7
{+3.08/18 2s (Kg7)} 70. Qxe7+ {+5.37/17 0s (Qc6)} 70... Kxe7 {+4.80/23 0s} 71.
Kc4 {+4.85/18 0s} 71... f5 {+5.25/25 0s (Kd6)} 72. Kb5 {+5.54/17 1s} 72... Kd6
{+5.65/25 0s} 73. Bh3 {+5.54/16 0s (Nb4)} 1-0

[MLNrOfMoves "81"]
[MLFlags "010100"]
[Event "4 Minutes/Game"]
[Site "2 SF 4 min gauntlet, HP-PC"]
[Date "2014.04.09"]
[Round "52.1"]
[White "Stockfish1404061153x64"]
[Black "Houdini4Prox64"]
[Result "0-1"]

1. e4 c5 2. Nf3 Nc6 3. Nc3 d6 4. d4 cxd4 5. Nxd4 Nf6 6. Be3 e5 7. Nf3 Be7 8. Bc4
O-O 9. O-O {0.00/19 3s} 9... Be6 {-0.04/18 6s (Na5)} 10. Bb3
{+0.04/19 2s (Qe2)} 10... Rc8 {0.00/17 5s} 11. Qd2 {+0.11/19 3s (Ng5)} 11... Na5
{-0.16/18 4s (a6)} 12. Rfd1 {0.00/21 4s} 12... Nc4 {-0.28/19 4s} 13. Bxc4
{-0.08/23 4s} 13... Rxc4 {-0.31/19 3s} 14. Ng5 {-0.16/23 7s (Qd3)} 14... Bg4
{-0.45/19 6s} 15. f3 {-0.18/23 10s (Qd3)} 15... Bd7 {-0.48/18 2s} 16. Nh3
{-0.13/23 2s (a3)} 16... Bxh3 {-0.52/18 2s (Qc7)} 17. gxh3 {-0.22/22 7s} 17...
Qd7 {-0.51/18 2s} 18. Qg2 {-0.16/21 2s} 18... Rfc8 {-0.48/18 3s} 19. Bg5
{-0.08/21 6s (a4)} 19... Kh8 {-0.42/18 8s} 20. Kh1 {-0.16/20 12s} 20... h6
{-0.45/18 4s} 21. Be3 {-0.24/22 4s} 21... b5 {-0.48/19 7s (d5)} 22. a3
{-0.12/22 4s} 22... a5 {-0.51/19 2s (Bf8)} 23. Rd2 {-0.20/22 4s} 23... Kh7
{-0.44/19 13s (Bf8)} 24. Rad1 {-0.13/20 6s (h4)} 24... b4 {-0.63/17 3s (g6)} 25.
axb4 {-0.44/21 10s} 25... Rxb4 {-0.62/18 1s} 26. Ra1 {-0.37/23 23s} 26... a4
{-0.65/17 5s} 27. Nd1 {-0.46/22 11s} 27... g6 {-0.73/17 6s} 28. Qf1
{-0.46/21 1s} 28... Nh5 {-0.78/16 3s (Kg7)} 29. c3 {-0.66/20 7s (Qg2)} 29... Rb3
{-0.94/16 1s} 30. Kg1 {-0.53/20 1s (h4)} 30... a3 {-0.86/19 3s (Bg5)} 31. bxa3
{-0.31/19 3s (Rxa3)} 31... Bg5 {-0.92/17 3s (Qa4)} 32. Bxg5 {-0.20/23 3s} 32...
hxg5 {-0.92/16 0s} 33. a4 {-0.17/24 1s} 33... Nf4 {-0.91/18 1s} 34. Kh1
{-0.31/23 5s (a5)} 34... Rcb8 {-1.45/13 6s} 35. a5 {-0.44/22 3s} 35... Rb1
{-1.45/11 0s} 36. Raa2 {-0.79/23 2s (Rda2)} 36... Kg7 {-1.64/17 2s} 37. a6
{-1.31/24 8s} 37... Qc7 {-1.64/16 0s (Ra8)} 38. h4 {-1.12/22 3s (Ra3)} 38...
gxh4 {-1.95/15 0s} 39. h3 {-0.99/23 1s (Qg1)} 39... Ra8 {-2.07/18 1s (Qxc3)}
40. Kh2 {-0.89/23 1s} 40... Qc6 {-2.07/19 3s (Ra7)} 41. Ra3 {-1.02/23 4s (Ra5)}
41... Ra7 {-2.02/18 3s} 42. Ra5 {-1.12/23 2s (Raa2)} 42... Qd7
{-2.01/19 2s (Rb6)} 43. Ra3 {-1.06/23 6s (Raa2)} 43... Rc1 {-1.96/19 1s (Qc8)}
44. Ra5 {-1.06/27 1s} 44... Qc7 {-1.99/19 1s} 45. Ra3 {-1.06/25 0s (Raa2)}
45... Qb6 {-1.98/18 1s (Qc5)} 46. Raa2 {-1.06/26 1s} 46... Qc6
{-2.02/19 2s (Rb1)} 47. Ra5 {-1.06/27 0s (Kh1)} 47... Qd7 {-1.95/19 2s (Rb1)}
48. Raa2 {-1.06/27 1s} 48... Qe6 {-1.95/19 1s (Qc8)} 49. Ra3 {-1.06/25 0s}
49... Ra8 {-1.86/19 2s (Qc8)} 50. Raa2 {-1.06/26 1s (Ra5)} 50... Rb1
{-1.88/19 3s (Qd7)} 51. Ra3 {-1.06/25 1s} 51... Qe7 {-1.88/19 2s (Qd7)} 52. Raa2
{-1.06/24 1s} 52... Qd7 {-1.93/19 1s} 53. Ra5 {-1.06/26 0s} 53... Qc8
{-1.87/19 1s (Qd8)} 54. Rda2 {-1.06/25 1s} 54... Rc1 {-1.90/19 2s} 55. Rd2
{-1.06/28 0s} 55... Qc7 {-1.91/19 1s (Ra7)} 56. Ra4 {-1.06/26 0s (Raa2)} 56...
Qc5 {-1.88/18 1s (Qd8)} 57. Raa2 {-1.06/25 0s} 57... Qc6 {-1.91/18 1s} 58. Ra3
{-1.06/24 0s (Ra5)} 58... d5 {-2.12/15 4s (Qd7)} 59. exd5 {-1.27/21 2s} 59...
Nxd5 {-2.13/13 0s (Qd6)} 60. Qd3 {-1.76/17 0s} 60... Nb6 {-2.39/17 0s} 61. Rb3
{-2.06/21 2s (Raa2)} 61... Nc4 {-2.90/16 1s} 62. Rg2 {-2.55/20 0s} 62... Rxa6
{-2.90/14 0s} 63. Rb4 {-2.63/20 0s} 63... Qd6 {-2.97/17 1s} 64. Qxd6
{-2.75/21 0s} 64... Nxd6 {-2.77/16 0s} 65. Nf2 {-2.82/21 1s (Rd2)} 65... Rxc3
{-2.75/17 0s} 66. f4 {-3.05/20 0s} 66... exf4 {-2.92/19 1s} 67. Rxf4
{-3.06/22 0s} 67... Ra2 {-2.99/20 0s} 68. Nd1 {-3.12/21 0s} 68... Rxg2+
{-2.83/19 1s (Rcc2)} 69. Kxg2 {-3.12/23 0s} 69... Rg3+ {-2.83/19 1s} 70. Kh2
{-3.13/24 0s} 70... g5 {-2.98/20 1s} 71. Rd4 {-3.13/22 0s} 71... Nf5
{-2.95/19 0s} 72. Rd7 {-3.20/21 0s (Rd2)} 72... Kf6 {-3.26/20 1s} 73. Nf2
{-3.20/21 0s (Rd5)} 73... Ke6 {-3.36/19 0s} 74. Rd8 {-3.24/20 0s} 74... Ne3
{-3.36/17 0s (Re3)} 75. Ne4 {-3.13/17 0s} 75... Rg2+ {-3.36/16 0s} 76. Kh1
{-3.13/1 0s} 76... Ke5 {-3.44/19 0s} 77. Re8+ {-3.92/18 1s} 77... Kf4
{-3.44/17 0s} 78. Nf6 {-4.22/19 0s} 78... Rd2 {-3.94/19 0s (Ra2)} 79. Ra8
{-4.78/17 0s (Rc8)} 79... Rd6 {-4.77/17 0s} 80. Ng4 {-5.60/20 0s} 80... Rd1+
{-5.32/16 0s} 81. Kh2 {-5.60/1 0s} 81... Nxg4+ {-5.32/15 0s} 0-1
[/pgn]

[d]4rbk1/5pp1/p3bn1p/qp1p4/6P1/1P2P2P/PBR2PB1/3N1QK1 w - - 0 25
Houdini now plays 25.Bxf6, with a 30 cp white edge; SF thinks the position is perfectly equal. The doubled and isolated black f-pawns are a very big liability, bigger than their individual penalties would suggest.

[d]3r4/5pk1/p2bbp1p/1p1p4/5PP1/1P1QPB1P/P1R4K/3Nq3 w - - 0 30
In this particular position the pair of bishops does not help.

[d]3r4/5pk1/p2bbp2/1p1p4/5PP1/1P1QPB2/P1R2NK1/4q3 b - - 0 32
Only now does SF's search tell it that it is losing. It is all because of the f7 and f6 pawns.

[d]3q1rk1/pp1bbppp/3p1n2/4p3/2r1P3/2N1BP1N/PPPQ2PP/R2R2K1 b - - 0 16
SF has just played 16.Nh3, allowing the enemy to double and isolate its h-pawns.

[d]1r6/3q1pk1/3p2p1/P3p1p1/4Pn2/2P2P1P/R2R3P/1r1N1Q1K w - - 0 37
The perfectly equal SF score turns into a disaster. The pawns on h2 and h3 look very ugly and leave a wide range of important squares around the king unprotected: h3, h4, f4.

As you can see, this is a pattern of behaviour. If someone, sometime in a year or so, decides to make SF correct this behaviour, my suggestion is simply to increase the penalty for any doubled pawn that is isolated at the same time. Simple, is it not?
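
Purely as an illustration of what I mean, here is a small bitboard sketch (A1 = bit 0) of such an extra term; the 15 cp figure and the names are invented, and the real value would of course have to be tested on the framework:

Code: Select all

#include <cstdint>

using Bitboard = std::uint64_t;

// Board mapping: A1 = bit 0. FileA covers a1-a8.
constexpr Bitboard FileA = 0x0101010101010101ULL;

inline Bitboard fileBB(int f) { return FileA << f; }

// Extra penalty for every file where our pawns are doubled and at the same
// time isolated (no friendly pawn on either adjacent file).
int doubledIsolatedPenalty(Bitboard ourPawns)
{
    int penalty = 0;
    for (int f = 0; f < 8; ++f)
    {
        Bitboard onFile = ourPawns & fileBB(f);
        if (__builtin_popcountll(onFile) < 2)      // not doubled on this file
            continue;

        Bitboard adjacent = (f > 0 ? fileBB(f - 1) : 0) | (f < 7 ? fileBB(f + 1) : 0);
        if (!(ourPawns & adjacent))                // isolated as well
            penalty += 15;                         // extra ~15 cp, on top of the usual terms
    }
    return penalty;
}
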
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Spacemask 2

Post by Lyudmil Tsvetkov »

Uri Blass wrote:
Lyudmil Tsvetkov wrote:
Uri Blass wrote:
Lyudmil Tsvetkov wrote:Well, it is a pity no one is interested in this, I not only believe, but am very much certain that is a very good idea.
Why so? Because I saw similar SF behaviour in more than 10% of the games I have looked at, and I have looked at much more than hundred games.

I am adamant, based on the games I have seen, that SF underestimates blocked enemy pawns on the 5th rank. Something seems to be going extremely wrong currently in STF, the queue is empty, patches that have passed STC are not tested at LTC for some reason, as the Trapped Bishop patch, first try, NoMobilityQueen patch, RNN vs R imbalance, the patch that passed STC. (btw. regarding this patch, a very meaningful thing would be to add R vs BBN to the imbalance, if it is not already added, with twice bigger success chances) Instead, people test patches that have very good almost obvious chances to fail, mostly linked to removing knowledge.

Believe it or not, even Joerg, definitely for me the most thoughtful, original and open-minded of the team, has started removing knowledge and using CLOP values as of late!! Joerg, you know removing knowledge almost never works, unless it is removing unessential things like queen on 7th. Concerning CLOP, I very much respect Remi Coulom, and The Crazy Bishop was one of my favourite sparring engines in the past, but CLOP simply does not work. Anyone still thinking CLOP is working? At STF all CLOP-tuned patches failed either easily, or even much more convincingly. All of them. A human assesssment is way superior than an automatic tuning system one.

I very much hope someone tries this idea for adding new features to Spacemask. SF simply needs it, as it needs more imbalance eval, as it needs more closed eval, and more pawn specifications. It needs this, because SF games say so. You can not go around blindly fixing some imaginary problem, you need to first have some evidence of the problem to start fixing it. And the evidence is in the games lost by SF. Please, look more carefully at them.

The nasty thing is that SF needs some 17 elo more at least for a reasonable update. And, if those 17 elo are not achieved within the next month at worst, this will be the first time STF fails in more than a year. SF really needs some new ideas, based on sound chess knowledge.

Joerg, you are not going to let down the sensible approach, are you? It is more than obvious for any relatively good chess player that removing trapped rook condition is going to fail. It is vital in the eval, queen on 7th is not at all. You can remove only things that have no real chess value, but not important ones. SF very much needs further imbalance testing, why not do a single patch of the Q vs 3 pieces imbalance, specifying within it the 3 respective imbalances with their respective values? (probably after some testing of how they behave on their own) Such a patch will have much much bigger chances to pass the test convincingly. I think this is the reasonable thing to do about rare eval elements as this.

Arjun, you liked space eval, why not try the above suggestion? Just assign a bonus of 5-10cps for any pawn on the 5th rank, blocked by an enemy pawn, on files c-f. Very simple.

Sorry guys, I do not want to say anything or interfere where no one asks me, but, you need to add another 17 elo in less than a month's time in order not to let end user expectations down, which are very high. Too many people, me included, are impatiently waiting for a new official SF 5 release. :D Please do something about it.
I do not plan to try to give new patches because I learned that people who give patches and have history of some productive patches can later be blamed of wasting resources.

I think that it is a very bad idea for development to do it and it clearly discourage people to give patches.

My opinion is that stockfish could earn more elo if there was a rule that you never blame people who give patches for wasting resources.

My opinion is that you could get significantly more testers and significantly more people who give patches with better rules that allow testing everything.

For more testers:
Looking at the almost empty queue it seems that stockfish does not need more testers and it certainly does not encourage more people to give computer time.

For more people who give patches
I think that there is a psychological advantage to allow people to give patches for tests that are not useful because it can encourage the same people to give also patches that are useful.

There are 3 possible simple ways to treat people who give also patches that are not useful(or patches at very long time control like 5 minutes+5 seconds time control that I did not try to test).

1)Allowing it at low priority with a rule that forbid complaining about it(if testers do not like to test it they can wait for tests with higher priority and not waste computer time on these tests)
2)Not allowing it in the first place and not complaining about people who give these patches about wasting computer resources because the patches are never tested in the framework
3)Allowing it and later complaining that the people waste computer resources.

I think that 1 is the best and 2 is better than 3
but from experience the stockfish team choose 3 that is the worst option.
Hi Uri.

I think an interesting discussion could be started in this thread, to the good of SF development.

My take on it:

1. I trust Marco unreservedly, I trust him both as a person, and as an outstanding programmer. Thank you very much Marco, for starting this really wonderful project in late 2008. I am sure you did not believe back then where it will get.

2. I am convinced everything could be solved with peaceful, good-naturedly means. If someone feels a grudge about something, there is a peaceful way to settle it. When a war erupts, it is the end of everything.

3. I do not quite trust the judgements of some of the SF team what concerns implementing new chess knowledge. Many on the team seem to think that new chess rules are damaging to SF, but evidence testifies otherwise; most of the tests on STF to remove valuable knowledge failed, while patches adding new knowledge reasonably succeed. In this case I think it is useless to argue: new knowledge only helps, removing valuable knowledge hurts.

4. My impression of STF is excellent until this moment, I see all kinds of patches implemented, could not even think that someone is censuring something. Still, I ask myself:
- why are patches that have passed STC not automatically tested at LTC? Currently, there are 3 patches that passed STC - nomobilityqueen and trapped bishop, 1st take of Joerg Oster, and BNN vs R imbalance - that should automatically be let to run at LTC. Instead, the queue is empty, why so?
- at the same time we see rerun of patches that have already passed both TC, with reruns giving different results. Is not some chance event involved here, for example where the book starts? Which should be the official rule in this case?

5. I think SF already is a public code, public engine, public property, public concern. Gary Linscott made it very much so to the benefit of all, thanks Gary! So that the governing board, if you could say so, should also be public. Of course, Marco might have the final say on a variety of matters, but only when the rules are applied. It is obvious for me that Marco will never do something bad for SF development.

6. Again, I conclude it with what I started with: I absolutely and fully trust Marco, without him this wonderful project would never had started, remember how vilified he was at the beginning of his effort. At the same time, testing of new approaches, as long as they obey the STF rules, should never be censured in any way or even looked down upon. Wherever a patch meets STF rules, it should be allowed to run. If it is successful, simply incorporate it.

What do other people think, why is STF currently empty?
1)for 4 the reason is not to waste resources and this is one of the reasons that I do not like to be part of the project.

The target should be simply to improve stockfish and cost in term of computer time should not be part of the equation.

People who give productive patches and spend time to write their patches do not complain about the time that they spend when maybe they could earn money in the same time.

For the same reason I think that people who give computer time should not complain if part of the time that they give is not useful to make stockfish better.

I think that a good project is a project of volunteers who give patches and do not complain or give computer time and do not complain(they can only decide not to test patches that they are not interested to test them).

2)For the specific case of the patch that passed both short time control and long time control and marco tested again at long time control:

Lucas said

"The correct way to proceed is to schedule 3-4 attempts at STC, wait for all of them to complete, and schedule only the most promising one at LTC. What you did instead was run 9 STC and 3 STC. It increases the lucky run effect, which is why Marco decided to rerun the LTC one."

This is the reason that people do not hurry to test at long time control.

I do not think that it is correct and the main problem is that it is hard to know which patch is the most promising one.

The patch that pass faster is often not the patch that is better even if we talk only about 15+0.05 time control so I think that by Lucas's method there is a good chance that you miss the best patch.

It is correct that when you run 3 tests at LTC and one of them pass there is a bigger chance for lucky run but I think that when the same patch pass once and fail once(with a positive score) there is still a good chance that the patch is productive at long time control so maybe it is better to test it again with SPRT(0.4) and not with SPRT(0,6)

I expect Lucas may complain about waste of resources in this case
but I think that you can get progress by this way and the fact that the stockfish team allow almost only SPRT(0,6) for testing is one reason for relatively slower progress.

I think that having more SPRT(0,4) tests may attract more testers because in this case the queue is not going to be almost empty so testers are going to see that their time is needed but maybe I do not understand the psychology of testers and my opinion is wrong.
So it is Lucas who spoils everything. :)

I think it is wrong to run a couple of patches at STC and then run only the most successful one at LTC, as there are so many patches that perform better at LTC. It is not infrequent that a patch that passes STC after 7000 games fails to pass LTC, while a patch that passes STC only after 40 000 games succeeds at LTC after only 10 000 games. Probably the rule should be that the first patch to pass STC is automatically run at LTC, and only if it fails there do you try to improve on it.

But the main problem is the lack of developers - you see, the queue is almost empty. I do not know why this is the case, but I wonder how SF is going to add another 20 elo this way, to become releasable with a 40 elo increase at blitz. At the current rate of development, if something miraculous does not happen, that will be somewhere around August of this year. And 100 000+ end users will be unhappy. It is the end users that matter, after all.

I think people who give productive patches should be encouraged to do more of the same. Actually, everybody who contributes gives productive patches, but currently interest seems low.

Do you have any indications from the kitchen as to why patches that passed STC are not being tested at LTC? They might be stupid and unnecessary, but once they have passed STC, give them an automatic chance at LTC.

As for your own participation in the network, Uri, I would happily see your 900-ply patch run at a super LTC. :D Seriously, why do you not push another patch, maybe a slightly more realistic one, at normal time controls? You contributed so many good, successful patches in the past! Please run another one.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: StormDanger

Post by Lyudmil Tsvetkov »

[d]6k1/5ppp/4P3/8/8/8/8/6K1 w - - 0 1
Do you think the e6 pawn does not influence the black king's shelter, and that it does not make sense to extend storm danger to more files?
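
Just to illustrate what extending it could mean, here is a rough sketch (A1 = bit 0, invented names and weights) that widens the storm files from the king file +/-1 to +/-2, so a pawn like the one on e6 above would be counted; it is not a patch, only a picture of the idea:

Code: Select all

#include <algorithm>
#include <cstdint>

using Bitboard = std::uint64_t;

constexpr Bitboard FileA = 0x0101010101010101ULL;   // A1 = bit 0

inline Bitboard fileBB(int f) { return FileA << f; }

// Files considered for the storm: king file +/- 2 instead of +/- 1.
Bitboard stormFiles(int kingFile)
{
    Bitboard files = 0;
    for (int f = std::max(0, kingFile - 2); f <= std::min(7, kingFile + 2); ++f)
        files |= fileBB(f);
    return files;
}

// Very crude danger count for white pawns storming a black king: the further
// advanced a pawn on one of the storm files, the more it contributes.
int stormDanger(Bitboard whitePawns, int blackKingFile)
{
    Bitboard storm = whitePawns & stormFiles(blackKingFile);
    int danger = 0;
    while (storm)
    {
        int sq = __builtin_ctzll(storm);   // GCC/Clang builtin: lowest set bit
        storm &= storm - 1;                // clear it
        danger += 2 * (sq / 8);            // 2 cp per rank of advancement (illustrative)
    }
    return danger;
}
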
Uri Blass
Posts: 10412
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Spacemask 2

Post by Uri Blass »

Lyudmil Tsvetkov wrote:
Uri Blass wrote:
Lyudmil Tsvetkov wrote:
Uri Blass wrote:
Lyudmil Tsvetkov wrote:Well, it is a pity no one is interested in this, I not only believe, but am very much certain that is a very good idea.
Why so? Because I saw similar SF behaviour in more than 10% of the games I have looked at, and I have looked at much more than hundred games.

I am adamant, based on the games I have seen, that SF underestimates blocked enemy pawns on the 5th rank. Something seems to be going extremely wrong currently in STF, the queue is empty, patches that have passed STC are not tested at LTC for some reason, as the Trapped Bishop patch, first try, NoMobilityQueen patch, RNN vs R imbalance, the patch that passed STC. (btw. regarding this patch, a very meaningful thing would be to add R vs BBN to the imbalance, if it is not already added, with twice bigger success chances) Instead, people test patches that have very good almost obvious chances to fail, mostly linked to removing knowledge.

Believe it or not, even Joerg, definitely for me the most thoughtful, original and open-minded of the team, has started removing knowledge and using CLOP values as of late!! Joerg, you know removing knowledge almost never works, unless it is removing unessential things like queen on 7th. Concerning CLOP, I very much respect Remi Coulom, and The Crazy Bishop was one of my favourite sparring engines in the past, but CLOP simply does not work. Anyone still thinking CLOP is working? At STF all CLOP-tuned patches failed either easily, or even much more convincingly. All of them. A human assesssment is way superior than an automatic tuning system one.

I very much hope someone tries this idea for adding new features to Spacemask. SF simply needs it, as it needs more imbalance eval, as it needs more closed eval, and more pawn specifications. It needs this, because SF games say so. You can not go around blindly fixing some imaginary problem, you need to first have some evidence of the problem to start fixing it. And the evidence is in the games lost by SF. Please, look more carefully at them.

The nasty thing is that SF needs some 17 elo more at least for a reasonable update. And, if those 17 elo are not achieved within the next month at worst, this will be the first time STF fails in more than a year. SF really needs some new ideas, based on sound chess knowledge.

Joerg, you are not going to let down the sensible approach, are you? It is more than obvious for any relatively good chess player that removing trapped rook condition is going to fail. It is vital in the eval, queen on 7th is not at all. You can remove only things that have no real chess value, but not important ones. SF very much needs further imbalance testing, why not do a single patch of the Q vs 3 pieces imbalance, specifying within it the 3 respective imbalances with their respective values? (probably after some testing of how they behave on their own) Such a patch will have much much bigger chances to pass the test convincingly. I think this is the reasonable thing to do about rare eval elements as this.

Arjun, you liked space eval, why not try the above suggestion? Just assign a bonus of 5-10cps for any pawn on the 5th rank, blocked by an enemy pawn, on files c-f. Very simple.

Sorry guys, I do not want to say anything or interfere where no one asks me, but, you need to add another 17 elo in less than a month's time in order not to let end user expectations down, which are very high. Too many people, me included, are impatiently waiting for a new official SF 5 release. :D Please do something about it.
I do not plan to try to give new patches because I learned that people who give patches and have history of some productive patches can later be blamed of wasting resources.

I think that it is a very bad idea for development to do it and it clearly discourage people to give patches.

My opinion is that stockfish could earn more elo if there was a rule that you never blame people who give patches for wasting resources.

My opinion is that you could get significantly more testers and significantly more people who give patches with better rules that allow testing everything.

For more testers:
Looking at the almost empty queue it seems that stockfish does not need more testers and it certainly does not encourage more people to give computer time.

For more people who give patches
I think that there is a psychological advantage to allow people to give patches for tests that are not useful because it can encourage the same people to give also patches that are useful.

There are 3 possible simple ways to treat people who give also patches that are not useful(or patches at very long time control like 5 minutes+5 seconds time control that I did not try to test).

1)Allowing it at low priority with a rule that forbid complaining about it(if testers do not like to test it they can wait for tests with higher priority and not waste computer time on these tests)
2)Not allowing it in the first place and not complaining about people who give these patches about wasting computer resources because the patches are never tested in the framework
3)Allowing it and later complaining that the people waste computer resources.

I think that 1 is the best and 2 is better than 3
but from experience the stockfish team choose 3 that is the worst option.
Hi Uri.

I think an interesting discussion could be started in this thread, to the good of SF development.

My take on it:

1. I trust Marco unreservedly, I trust him both as a person, and as an outstanding programmer. Thank you very much Marco, for starting this really wonderful project in late 2008. I am sure you did not believe back then where it will get.

2. I am convinced everything could be solved with peaceful, good-naturedly means. If someone feels a grudge about something, there is a peaceful way to settle it. When a war erupts, it is the end of everything.

3. I do not quite trust the judgements of some of the SF team what concerns implementing new chess knowledge. Many on the team seem to think that new chess rules are damaging to SF, but evidence testifies otherwise; most of the tests on STF to remove valuable knowledge failed, while patches adding new knowledge reasonably succeed. In this case I think it is useless to argue: new knowledge only helps, removing valuable knowledge hurts.

4. My impression of STF is excellent until this moment, I see all kinds of patches implemented, could not even think that someone is censuring something. Still, I ask myself:
- why are patches that have passed STC not automatically tested at LTC? Currently, there are 3 patches that passed STC - nomobilityqueen and trapped bishop, 1st take of Joerg Oster, and BNN vs R imbalance - that should automatically be let to run at LTC. Instead, the queue is empty, why so?
- at the same time we see rerun of patches that have already passed both TC, with reruns giving different results. Is not some chance event involved here, for example where the book starts? Which should be the official rule in this case?

5. I think SF already is a public code, public engine, public property, public concern. Gary Linscott made it very much so to the benefit of all, thanks Gary! So that the governing board, if you could say so, should also be public. Of course, Marco might have the final say on a variety of matters, but only when the rules are applied. It is obvious for me that Marco will never do something bad for SF development.

6. Again, I conclude it with what I started with: I absolutely and fully trust Marco, without him this wonderful project would never had started, remember how vilified he was at the beginning of his effort. At the same time, testing of new approaches, as long as they obey the STF rules, should never be censured in any way or even looked down upon. Wherever a patch meets STF rules, it should be allowed to run. If it is successful, simply incorporate it.

What do other people think, why is STF currently empty?
1)for 4 the reason is not to waste resources and this is one of the reasons that I do not like to be part of the project.

The target should be simply to improve stockfish and cost in term of computer time should not be part of the equation.

People who give productive patches and spend time to write their patches do not complain about the time that they spend when maybe they could earn money in the same time.

For the same reason I think that people who give computer time should not complain if part of the time that they give is not useful to make stockfish better.

I think that a good project is a project of volunteers who give patches and do not complain or give computer time and do not complain(they can only decide not to test patches that they are not interested to test them).

2)For the specific case of the patch that passed both short time control and long time control and marco tested again at long time control:

Lucas said

"The correct way to proceed is to schedule 3-4 attempts at STC, wait for all of them to complete, and schedule only the most promising one at LTC. What you did instead was run 9 STC and 3 STC. It increases the lucky run effect, which is why Marco decided to rerun the LTC one."

This is the reason that people do not hurry to test at long time control.

I do not think that it is correct and the main problem is that it is hard to know which patch is the most promising one.

The patch that pass faster is often not the patch that is better even if we talk only about 15+0.05 time control so I think that by Lucas's method there is a good chance that you miss the best patch.

It is correct that when you run 3 tests at LTC and one of them pass there is a bigger chance for lucky run but I think that when the same patch pass once and fail once(with a positive score) there is still a good chance that the patch is productive at long time control so maybe it is better to test it again with SPRT(0.4) and not with SPRT(0,6)

I expect Lucas may complain about waste of resources in this case
but I think that you can get progress by this way and the fact that the stockfish team allow almost only SPRT(0,6) for testing is one reason for relatively slower progress.

I think that having more SPRT(0,4) tests may attract more testers because in this case the queue is not going to be almost empty so testers are going to see that their time is needed but maybe I do not understand the psychology of testers and my opinion is wrong.
So it is Lucas that spoils everything. :)

I think it is wrong to run a couple of patches at STC, and the run at LTC only the most successful one, as there are so many patches that perform better at LTC. It is not infrequent that a patch that passes STC after 7000 games fails to pass LTC, while a patch that passes STC only after 40 000 games succeeds at LTC after only 10 000 games. Probably the rule should be that the first successful patch that passes STC is automatically run at LTC, and only if it fails, try to improve on it.

But the main problem is the lack of developers - you see, the queue is almost empty. I do not know why this is the case, but I wonder how in this way SF is going to add another 20 elo to become releasable for a 40 blitz elo increase? By the current rate of development, if something miraculous does not happen, this will be somewhere in August of this year. And 100 000+ end users will be unhappy. It is the end users that matter, after all.

I think people who give productive patches should be encouraged to do more so. Actually, it is everybody that gives productive patches, but currently interest seems low.

Do you have any indications from the kitchen why patches that passed STC are not tested at LTC? They might be stupid and unnecassary, but once they passed STC, give them an automatic chance at LTC.

What concerns your own participation into the network, Uri, I would happily see your 900-ply patch run at super LTC. :D Seriously, why do not you push another patch, maybe a bit more realistic one, at normal time controls? You contributed so many good successful patches in the past! Please run another one.
The bad treatment (mainly from Marco, but not only from him) is one of the reasons that I do not want to give more patches.

It is only one of the reasons, so I do not promise to give more patches even if I know that people will not get this bad treatment in the future.

Another reason is that I have other things to do with my time that I will probably prefer to do.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: StormDanger

Post by Lyudmil Tsvetkov »

[pgn][MLNrOfMoves "81"]
[MLFlags "000100"]
[Event "4 Minutes/Game"]
[Site "2 SF 4 min gauntlet, HP-PC"]
[Date "2014.04.10"]
[Round "63.3"]
[White "KomodoTCECrx64"]
[Black "Stockfish140407IPxx64"]
[Result "1-0"]

1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 a6 6. Be2 e5 7. Nb3 Be7 8. Be3
Be6 9. O-O {+0.22/18 5s} 9... Nbd7 {0.00/21 5s} 10. f4 {+0.11/19 7s} 10... Rc8
{-0.01/21 4s (exf4)} 11. f5 {+0.16/17 4s} 11... Bc4 {-0.02/23 2s} 12. Bxc4
{+0.22/19 3s} 12... Rxc4 {-0.02/22 2s} 13. Nd2 {+0.14/19 5s} 13... Rc6
{0.00/24 3s (Rc8)} 14. a4 {+0.15/17 3s} 14... Qa5 {0.00/26 2s} 15. Re1
{+0.17/17 7s} 15... O-O {0.00/24 2s} 16. Bf2 {+0.20/18 4s} 16... Rfc8
{-0.05/22 2s} 17. Nb3 {+0.09/17 3s} 17... Qc7 {-0.08/22 2s} 18. a5
{+0.09/19 4s} 18... Rc4 {-0.06/22 4s (h6)} 19. Re2 {-0.11/16 4s} 19... Rb4
{-0.07/23 2s} 20. h3 {-0.08/18 11s (Re1)} 20... Rd8 {-0.05/21 4s (Nc5)} 21. Qe1
{+0.12/19 6s (g4)} 21... Qc6 {-0.12/20 4s (Nc5)} 22. Rd1 {+0.08/19 5s (Qd2)}
22... Re8 {-0.11/23 6s} 23. Re3 {0.00/18 8s} 23... Bd8 {-0.08/22 10s} 24. Qd2
{-0.13/19 10s} 24... Bc7 {-0.11/22 2s} 25. Qe2 {-0.13/17 3s (Rde1)} 25... h6
{-0.07/21 9s (Rc8)} 26. Kh1 {-0.13/18 5s (Qf3)} 26... b6 {-0.21/21 5s (b5)} 27.
Nd5 {-0.13/17 5s (Be1)} 27... Nxd5 {-0.14/21 3s} 28. exd5 {-0.05/16 2s} 28...
Qb7 {-0.10/21 3s} 29. Be1 {+0.03/18 4s} 29... Ra4 {-0.20/23 7s} 30. Nd2
{+0.01/18 14s (c4)} 30... bxa5 {-0.25/22 3s} 31. Nc4 {-0.03/18 4s (Ne4)} 31...
Qb8 {-0.27/20 3s (Qc8)} 32. Rb3 {-0.03/18 8s} 32... Qd8 {-0.26/20 5s} 33. Rf3
{-0.01/17 4s} 33... Kh8 {-0.23/19 5s (Qc8)} 34. f6 {+0.60/16 7s (Rc3)} 34... g5
{+0.11/21 4s} 35. Bd2 {+0.57/15 1s} 35... Rg8 {+0.34/22 10s} 36. Rdf1
{+0.51/16 2s} 36... Nb6 {+0.36/23 6s} 37. b3 {+1.12/17 4s (Nxa5)} 37... Ra2
{+0.26/20 5s} 38. Ne3 {+1.00/16 1s} 38... Qf8 {+0.50/20 5s (a4)} 39. Qd1
{+1.61/18 3s (h4)} 39... a4 {+0.77/23 2s} 40. Qb1 {+1.54/17 0s} 40... axb3
{+0.81/25 4s} 41. Qxb3 {+1.63/17 1s} 41... Ra4 {+0.94/25 7s} 42. Nf5
{+1.61/17 0s} 42... a5 {+0.76/25 1s (Bd8)} 43. h4 {+1.69/16 1s (Rg3)} 43... Rf4
{+0.92/20 3s (Rxh4+)} 44. hxg5 {+1.86/16 3s} 44... a4 {+1.12/22 4s (Rxf5)} 45.
Qb5 {+2.17/17 1s} 45... Rxf5 {+1.29/23 6s} 46. Rxf5 {+2.17/15 0s} 46... Rg6
{+1.40/23 4s} 47. Be3 {+2.40/16 1s (R5f3)} 47... Qd8 {+1.47/21 1s} 48. Rb1
{+2.35/16 1s} 48... a3 {+1.54/22 1s} 49. Ra1 {+2.39/17 1s} 49... Qd7
{+1.51/23 1s} 50. Qxd7 {+2.42/16 1s} 50... Nxd7 {+2.01/24 4s} 51. Rxa3
{+2.54/17 1s} 51... hxg5 {+1.99/25 2s} 52. Ra7 {+2.52/18 1s} 52... Nxf6
{+2.06/26 2s} 53. Rxc7 {+2.55/19 1s} 53... Nxd5 {+2.14/26 1s} 54. Rc8+
{+2.53/19 1s} 54... Kg7 {+2.07/26 0s} 55. Bxg5 {+2.56/19 1s} 55... f6
{+2.14/26 1s} 56. Bd2 {+2.52/18 0s (c4)} 56... Kf7 {+2.16/24 1s} 57. Rf3
{+2.54/17 0s (c4)} 57... Rg4 {+1.96/20 1s (Ke6)} 58. Kg1 {+2.54/17 1s} 58...
Ke6 {+2.10/22 2s} 59. Rh3 {+2.55/17 2s (Re8+)} 59... Rd4 {+1.92/19 3s} 60. Rd3
{+2.56/17 1s} 60... Ra4 {+2.00/20 0s} 61. Re8+ {+2.50/17 1s} 61... Kd7
{+2.10/21 1s} 62. Rb8 {+2.64/17 2s} 62... Kc6 {+2.08/22 0s (Ke6)} 63. Bh6
{+2.66/12 0s (Rf8)} 63... f5 {+2.18/18 2s (Ra6)} 64. Rd8 {+2.80/16 1s} 64... Kc7
{+2.43/19 1s (Ra1+)} 65. Rf8 {+3.10/17 1s (Bg5)} 65... Kc6 {+2.47/18 0s} 66.
Rxf5 {+3.10/17 0s} 66... Rc4 {+2.51/19 1s} 67. Rf2 {+3.13/18 0s} 67... Nc3
{+2.68/18 0s} 68. Bf8 {+3.29/17 1s (Rf6)} 68... Ne4 {+2.79/17 0s} 69. Re2
{+3.14/17 0s} 69... Rb4 {+2.96/21 2s (Kd7)} 70. Rh3 {+3.39/16 0s (g4)} 70...
Kd5 {+2.68/18 1s (Rd4)} 71. Kh2 {+3.61/16 0s (Rh6)} 71... Ra4
{+2.73/19 0s (Rc4)} 72. Rh4 {+3.76/16 0s (Rh6)} 72... Ra8 {+2.82/20 0s (Rd4)}
73. c4+ {+4.10/17 0s (Rexe4)} 73... Kc6 {+3.05/17 0s} 74. Rhxe4 {+4.41/18 0s}
74... Rxf8 {+3.18/19 0s} 75. g4 {+4.47/18 0s} 75... Kd7 {+3.33/20 1s} 76. g5
{+4.62/19 0s (Kg3)} 76... Rg8 {+3.45/19 1s} 77. Rg4 {+4.68/18 0s} 77... Ke6
{+3.64/21 0s (Rg6)} 78. g6 {+5.01/16 0s (Kg3)} 78... Rg7 {+4.26/19 0s (Kf5)} 79.
Rd2 {+5.48/17 0s (Kg3)} 79... Ke7 {+4.61/18 0s} 80. Kg3 {+6.22/20 1s} 80... Ke6
{+5.33/20 0s} 81. Kh4 {+6.25/18 0s (Kf3)} 81... Rg8 {+5.36/19 0s (Ke7)} 82. Kg5
{+6.34/17 0s (g7)} 1-0
[/pgn]

[d]3qr2k/2bn1pp1/p2p1P1p/p2Pp3/r1N5/5R1P/1PP1Q1P1/3RB2K b - - 0 34
Stockfish has just missed 34.f6. Now, is this not a storming pawn, even though it is not on a file adjacent to the black king on h8?

[d]5qrk/2b2p2/pn1p1P1p/3Pp1p1/r7/1Q2NR1P/2PB2P1/5R1K w - - 0 42
Is not f6 a storming pawn?
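
To make the term concrete, here is a minimal sketch of a storm bonus that counts advanced pawns within two files of the enemy king, so that f6 against Kh8 would still qualify. This is not Stockfish's actual king-safety code; the bitboard helpers and the 40cp value are only illustrative assumptions.

Code: Select all

// Sketch of a "storming pawn" bonus: any friendly pawn on the 6th or 7th
// rank within two files of the enemy king earns a bonus.
#include <algorithm>
#include <cstdint>

using Bitboard = uint64_t;

constexpr Bitboard fileBB(int file) { return 0x0101010101010101ULL << file; }
constexpr Bitboard rankBB(int rank) { return 0xFFULL << (8 * rank); }

int stormBonusWhite(Bitboard whitePawns, int blackKingSq)   // squares 0..63, a1 = 0
{
    int kingFile = blackKingSq & 7;

    // King zone: the king's file plus up to two files on either side.
    Bitboard zone = 0;
    for (int f = std::max(0, kingFile - 2); f <= std::min(7, kingFile + 2); ++f)
        zone |= fileBB(f);

    // White pawns on ranks 6 and 7 inside that zone (e.g. a pawn on f6 vs Kh8).
    Bitboard stormers = whitePawns & zone & (rankBB(5) | rankBB(6));

    int bonus = 0;
    for (Bitboard b = stormers; b; b &= b - 1)
        bonus += 40;   // centipawns per stormer, value not tuned
    return bonus;
}

For the first diagram, stormBonusWhite would pick up the white pawn on f6 against the king on h8 (king file h, zone f-h), which is exactly the case argued above to be missing from the evaluation.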
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Doubled and isolated

Post by Lyudmil Tsvetkov »

Another interesting game on the theme:

[pgn][MLNrOfMoves "74"]
[MLFlags "000100"]
[Event "4 Minutes/Game"]
[Site "2 SF 4 min gauntlet, HP-PC"]
[Date "2014.04.10"]
[Round "61.1"]
[White "Stockfish140407IPxx64"]
[Black "Houdini4Prox64"]
[Result "0-1"]

1. e4 e6 2. d4 d5 3. e5 c5 4. c3 Nc6 5. Nf3 Qb6 6. Na3 cxd4 7. cxd4 Bb4+ 8. Bd2
Bd7 9. Nc2 {+0.01/23 12s} 9... Bxd2+ {-0.08/20 10s} 10. Qxd2 {+0.01/24 2s} 10...
Qxb2 {-0.09/20 3s (f6)} 11. Bd3 {0.00/23 3s} 11... Qb6 {-0.10/20 3s} 12. O-O
{+0.02/25 2s} 12... h6 {0.00/20 5s} 13. Rab1 {-0.03/23 4s (Rfb1)} 13... Qc7
{-0.11/18 0s} 14. Qf4 {-0.04/24 2s (Ne3)} 14... Nge7 {-0.22/19 2s} 15. Qg3
{-0.14/23 16s} 15... O-O {-0.36/18 3s} 16. Ne3 {-0.14/25 2s} 16... Rac8
{-0.26/19 3s} 17. Qh3 {-0.14/24 5s (Qh4)} 17... Kh8 {-0.41/17 7s (Ng6)} 18. Qh5
{-0.12/21 11s (Qh4)} 18... f5 {-0.78/17 2s (Ng8)} 19. exf6 {-0.13/23 3s} 19...
Rxf6 {-0.79/15 0s} 20. Ng4 {-0.13/24 4s (Rfc1)} 20... Rxf3 {-0.66/17 2s} 21.
gxf3 {-0.23/25 6s} 21... Be8 {-0.67/20 5s} 22. Qh3 {-0.20/25 3s} 22... Bg6
{-0.69/19 3s} 23. Bxg6 {-0.20/25 2s} 23... Nxg6 {-0.69/17 0s} 24. Rfd1
{-0.20/24 4s} 24... Nf4 {-0.65/19 18s} 25. Qf1 {-0.24/24 5s} 25... Qf7
{-0.70/19 5s (Rb8)} 26. Kh1 {-0.19/22 6s} 26... Ng6 {-0.73/18 6s (Rc7)} 27. Qh3
{-0.16/20 3s} 27... Rc7 {-0.53/18 4s (Rf8)} 28. Qg3 {-0.10/22 3s} 28... Nf4
{-0.55/19 1s} 29. Rd2 {-0.11/23 2s} 29... Nh5 {-0.51/18 3s (g5)} 30. Qg2
{-0.08/23 6s} 30... Kh7 {-0.55/19 4s (Qf4)} 31. Rbd1 {-0.15/22 11s (Rg1)} 31...
Qf4 {-0.63/18 7s (Nf4)} 32. Rd3 {-0.13/21 2s (Ne3)} 32... Rf7
{-0.66/18 5s (a6)} 33. Rb3 {-0.11/21 3s (Rg1)} 33... a5 {-0.64/18 2s} 34. Ra3
{-0.12/22 4s} 34... Rf5 {-0.84/19 3s (Qf5)} 35. Rad3 {-0.27/20 3s} 35... Qc7
{-0.79/18 1s} 36. Rb3 {-0.31/20 4s (Qf1)} 36... Nf4 {-1.15/18 1s} 37. Qf1
{-0.30/22 2s} 37... h5 {-1.36/17 1s (Ng6)} 38. Ne3 {-0.22/21 2s} 38... Rg5
{-1.49/19 3s} 39. Rd2 {-0.36/23 3s} 39... Qe7 {-1.45/19 2s (Qf7)} 40. a4
{-0.40/24 2s (Rb1)} 40... Rg6 {-1.38/18 5s (Qf6)} 41. Rb1 {-0.57/23 4s (Nc2)}
41... Qf6 {-2.02/20 2s (Qh4)} 42. Nc2 {-0.56/24 1s} 42... e5 {-1.94/17 1s} 43.
dxe5 {-0.78/24 2s} 43... Nxe5 {-1.95/15 0s} 44. Rb3 {-0.98/26 6s (Nd4)} 44...
Qg5 {-2.49/18 1s} 45. Nd4 {-1.27/27 3s} 45... Nh3 {-3.03/17 2s} 46. Rd1
{-1.11/28 1s} 46... Qh4 {-2.97/16 0s} 47. Rb2 {-1.22/27 1s} 47... Rg1+
{-2.94/17 1s} 48. Qxg1 {-1.22/1 0s} 48... Nxg1 {-2.85/16 0s} 49. Kxg1
{-1.31/26 2s} 49... Ng4 {-2.39/16 1s} 50. h3 {-1.31/26 1s (Rdd2)} 50... Qxh3
{-2.82/15 0s} 51. Re1 {-1.39/21 2s (Rdd2)} 51... Qh2+ {-3.47/14 0s (Qh4)} 52.
Kf1 {-0.88/1 0s} 52... Ne5 {-3.47/13 0s (Qh3+)} 53. Rb3 {-1.36/22 2s (Rd1)}
53... h4 {-4.03/12 0s} 54. Rbe3 {-1.62/22 1s (Rd1)} 54... Ng6 {-3.61/11 0s} 55.
Ne6 {-1.69/23 1s} 55... h3 {-4.06/16 1s} 56. Ke2 {-1.78/24 3s} 56... Qg2
{-4.06/15 0s} 57. f4 {-1.75/23 1s} 57... Kg8 {-4.50/15 1s (h2)} 58. Rg3
{-2.28/19 1s} 58... Qe4+ {-4.50/13 0s} 59. Re3 {-2.51/20 1s} 59... Qc4+
{-4.50/12 0s} 60. Kf3 {-2.55/20 0s} 60... Nh4+ {-4.50/11 0s} 61. Kg3
{-2.63/23 2s (Kg4)} 61... Ng2 {-5.87/17 2s} 62. Kxh3 {-2.69/23 1s} 62... Nxe1
{-5.88/15 0s} 63. Rxe1 {-2.75/23 1s} 63... d4 {-5.56/15 0s} 64. Re5
{-2.81/23 1s} 64... Qxa4 {-5.56/14 0s} 65. Kg3 {-3.25/23 6s} 65... d3
{-5.56/13 0s} 66. Rd5 {-3.86/20 1s} 66... Qc4 {-6.83/16 1s} 67. Rd6
{-3.94/20 0s} 67... a4 {-7.40/15 0s} 68. Ng5 {-4.23/21 1s} 68... Kf8
{-7.40/14 0s} 69. Nf3 {-4.26/21 0s} 69... Qb3 {-7.40/13 0s (a3)} 70. Nd2
{-3.62/17 0s (Nd4)} 70... Qc2 {-7.78/12 0s} 71. Nf1 {-3.93/17 0s} 71... a3
{-8.18/11 0s (Qe2)} 72. Ne3 {-4.40/18 0s} 72... Qb3 {-8.18/10 0s (Qc1)} 73. Kf3
{-5.91/19 1s} 73... a2 {-8.18/9 0s} 74. Rd8+ {-7.55/20 2s} 74... Ke7
{-8.18/9 0s (Kf7)} 0-1
[/pgn]

[d]2r4k/ppq3p1/2n1p1np/3p4/3P2N1/5P1Q/P4P1P/1R3RK1 w - - 0 24
SF thinks it is almost equal; Houdini sees some 70cp black edge. And Houdini is, of course, right. What is the imbalance: R vs N + 2 pawns, or R vs N + 1 pawn? How much are the f2 and f3 pawns worth together? I think in most cases much closer to one pawn than to two. That is why I keep repeating: when SF makes such mistakes in 10% of its games, please give a bigger penalty to a pawn that is doubled and isolated at the same time.

[d]8/1p4pk/6r1/p2pn2p/P2N3q/5P1n/1R3P1P/3R1Q1K b - - 0 47
When mate approaches, f2 and f3 are still there, standing in the way of their own pieces.
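
A minimal sketch of the extra penalty argued for here: a pawn that is doubled and isolated at the same time, like the f2/f3 pair, is taxed more than a plain doubled pawn. The helpers and the centipawn values are illustrative assumptions, not Stockfish's actual pawn table.

Code: Select all

// Sketch of a doubled-and-isolated pawn penalty. Values are illustrative.
#include <cstdint>

using Bitboard = uint64_t;

constexpr Bitboard fileBB(int file) { return 0x0101010101010101ULL << file; }

inline int popcount(Bitboard b) { int n = 0; for (; b; b &= b - 1) ++n; return n; }

// Penalty in centipawns for White's doubled pawns, charged per extra pawn on
// a file, with a bigger charge when there is no friendly pawn on either
// adjacent file (doubled and isolated, like the f2/f3 pair in the diagrams).
int doubledPawnPenaltyWhite(Bitboard whitePawns)
{
    int penalty = 0;
    for (int f = 0; f < 8; ++f)
    {
        int onFile = popcount(whitePawns & fileBB(f));
        if (onFile < 2)
            continue;

        Bitboard adjacent = (f > 0 ? fileBB(f - 1) : 0) | (f < 7 ? fileBB(f + 1) : 0);
        bool isolated = !(whitePawns & adjacent);

        penalty += (onFile - 1) * (isolated ? 55 : 25);   // illustrative values
    }
    return penalty;
}

In the diagrams above, the f2/f3 pair would then cost 55cp instead of 25cp; the exact values would of course have to be tuned through the testing framework.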
BeyondCritics
Posts: 406
Joined: Sat May 05, 2012 2:48 pm
Full name: Oliver Roese

Re: What to do about this?

Post by BeyondCritics »

Sorry, not even close to a win for Black. E.g. 49.Qa6! (Stockfish) Rd5 50.Qd3, and the black passer is hemmed in for now, whereas the white passers are not. White should have the better chances, in my opinion. But it is still a highly complex, tactical endgame, where even current engines may fail repeatedly. You could explore this for hours and still find surprises. Chess is deep and beautiful!