Passed Pawns (endgame)

Ralph Stoesser · Post by **Ralph Stoesser** » Mon May 24, 2010 6:25 pm

lech wrote:
Ralph Stoesser wrote:
lech wrote: BTW: Has anyone tried switching “search” between “score” and “increase of score”?
I think it could well work in endings. Only sacrifices (captures) seem to be a problem (separate tree?).
Do you mean something like
move 1
depth  1 score 3.00
depth  2 score 3.00
depth  3 score 3.00
...
depth 30 score 3.00

move 2
depth  1 score 0.10
depth  2 score 0.20
depth  3 score 0.30
...
depth 30 score 2.50
Then we should choose move 2 instead of move 1?
I've thought about it recently. It could be an interesting approach to try to solve the thread issue.
Thanks Ralph, so it means that such attempts either were not made or gave bad results. OK.

I don't know. Let's wait and see what others say.

zamar · Post by **zamar** » Mon May 24, 2010 6:48 pm

bob wrote: Far be it from me to try to convince you to change your testing approach, but at least don't try to justify 1,000 games as better solely because the games are at a slower time control.

It's not my goal to try to convince anybody about anything. I've just said that testing with 1000 games at 1+0 has taken us this far. One day, when we stop making progress, we might have to change to 30000 games at 2s+0s, but that day is not now.

bob · Post by **bob** » Mon May 24, 2010 9:24 pm

zamar wrote:
bob wrote: Far be it from me to try to convince you to change your testing approach, but at least don't try to justify 1,000 games as better solely because the games are at a slower time control.
It's not my goal to try to convince anybody about anything. I've just said that testing with 1000 games at 1+0 has taken us this far. One day, when we stop making progress, we might have to change to 30000 games at 2s+0s, but that day is not now.

I am pointing out, from a _ton_ of experience, that you could most likely have gotten much farther. I relied on the 1000-game approach for years. After 1000 games, you have a 20-point guesstimate for the rating. The true value could be anywhere within that range (and can obviously lie outside it with lower probability).

If you are happy with your results, and want to ignore the statistics, that's a choice you are free to make. But statistics don't lie. And there are _many_ that fall into that same trap, thinking that fewer slower games are better than more fast games. It simply isn't so. Certainly when talking about a 30x range in number of games...
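To put rough numbers on this, here is a minimal sketch of the error bar on an Elo estimate. It assumes a logistic Elo model and a normal approximation on the mean score; the 35/30/35 percent win/draw/loss split and the function names are made up for illustration and do not come from any engine's actual test data. With these inputs the 95% interval comes out at roughly ±18 Elo for 1000 games and roughly ±3 Elo for 30000.

```python
import math

def elo_from_score(score):
    """Convert a mean score in (0, 1) to an Elo difference (logistic model)."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_interval(wins, draws, losses, z=1.96):
    """Return (elo, low, high) using a normal approximation on the mean score."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(var / n)
    low = elo_from_score(score - z * stderr)
    high = elo_from_score(score + z * stderr)
    return elo_from_score(score), low, high

# Hypothetical results: the same win/draw/loss split at two sample sizes.
small = elo_interval(350, 300, 350)
large = elo_interval(10500, 9000, 10500)
print("1000 games:  %+.1f Elo, 95%% interval [%+.1f, %+.1f]" % small)
print("30000 games: %+.1f Elo, 95%% interval [%+.1f, %+.1f]" % large)
```

The interval shrinks with the square root of the number of games, so 30 times as many games narrows it by a factor of about 5.5.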

mcostalba · Post by **mcostalba** » Mon May 24, 2010 10:13 pm

zamar wrote: I'd be very interested in research in this area, but I'm not prepared to stall the development of Stockfish for weeks or months just to test this thing.

Joona, currently if after 1000 games a patch fails to show a clear advantage, say less than 10 Elo, we reject the patch (as you know) because we are well below the noise level and cannot trust the result, not even partially.

Perhaps we could improve the efficiency of this scheme so that we can start addressing patches that contribute 4-5 Elo.

Here is a possible scheme:

1) When a patch fails to show a clear advantage after 1000 games, instead of discarding it, save it in a "candidate low gain" set.

2) Pick the patches in the "candidate low gain" set and run a match of, say, 10000 games at a very short time control, for instance 2 sec/game. If the patch fails even there, discard it; otherwise collect it in a "low gain" set.

3) When the "low gain" set reaches 4-5 patches, retest at 1000 games, 1 minute per game, but with _all_ the patches applied in one go, so that the expected result should be detectable even with 1000 games at the slower time control.

4) If the patch set proves to be good, apply them all.

This scheme has the following advantages:

- Allows gaining Elo even from small tweaks

- Does the final verification with a standard 1000 games at 1 minute TC

- Limits the added burden of retesting at a very short time control, so that the time resources allocated to such patches are only doubled. The cost of the final test is shared among all the patches that are committed in one go if the result is positive.

Comments ?
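The four steps above can be sketched as a small bookkeeping class. This is only an illustration under stated assumptions: the class and method names are invented, the pass/fail decision of the fast test is left to the caller because the post does not define it, and dropping the whole batch when the combined retest fails is an assumption (the post only covers the successful case).

```python
from dataclasses import dataclass, field

CLEAR_GAIN_ELO = 10.0   # threshold taken from the post ("less than 10 Elo")
BATCH_SIZE = 4          # the post suggests 4-5 patches per combined retest

@dataclass
class Pipeline:
    candidate_low_gain: list = field(default_factory=list)
    low_gain: list = field(default_factory=list)
    committed: list = field(default_factory=list)

    def after_slow_test(self, patch, elo):
        """Step 1: after 1000 games at 1 minute, park unclear patches instead of discarding."""
        if elo >= CLEAR_GAIN_ELO:
            self.committed.append(patch)
        else:
            self.candidate_low_gain.append(patch)

    def after_fast_test(self, patch, passed):
        """Step 2: the fast test (about 10000 games at 2 sec/game) keeps or discards a parked patch."""
        self.candidate_low_gain.remove(patch)
        if passed:
            self.low_gain.append(patch)

    def ready_for_combined_test(self):
        """Step 3 trigger: enough low-gain patches to retest together."""
        return len(self.low_gain) >= BATCH_SIZE

    def after_combined_test(self, elo):
        """Step 4: commit the whole batch if the combined test is clearly positive."""
        if elo >= CLEAR_GAIN_ELO:
            self.committed.extend(self.low_gain)
        # Assumption: a failed batch is dropped as a whole.
        self.low_gain.clear()

# Hypothetical run: one clear winner and four small tweaks of about 4 Elo each.
p = Pipeline()
p.after_slow_test("patch_a", 14.0)
for name in ["patch_b", "patch_c", "patch_d", "patch_e"]:
    p.after_slow_test(name, 4.0)
    p.after_fast_test(name, passed=True)
if p.ready_for_combined_test():
    p.after_combined_test(17.0)
print(p.committed)
```

The point of batching is that four tweaks worth about 4 Elo each add up to something a 1000-game test has a realistic chance of detecting, which no single one of them does.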

Ralph Stoesser · Post by **Ralph Stoesser** » Mon May 24, 2010 10:27 pm

Maybe it's worth giving "sideline research" a try. Buy another cheap computer. Whenever you test something, do your 1+0 tests as always, but at the same time also run 30,000 games at 1 sec on the other computer.

If both results suggest the same thing, there is nothing to do. Write down the results and the type of patch.

If the results differ, try to find out which of the two results is more reliable. (If you don't want to spend extra time on this step, I'm sure there are ways to transfer the job to someone else here in the forum.)

After a few dozen patches you should get a rough estimate about which testing scheme is more reliable in general.

mcostalba · Post by **mcostalba** » Mon May 24, 2010 10:36 pm

Ralph Stoesser wrote: Maybe it's worth giving "sideline research" a try. Buy another cheap computer. Whenever you test something, do your 1+0 tests as always, but at the same time also run 30,000 games at 1 sec on the other computer.

This seems to me less efficient than first trying 1 minute and, if there is no clear evidence of success, falling back on the 2 sec/game test. The latter scheme is more efficient because you don't test at 2 sec/game patches that give a clear advantage at 1 minute TC.

If you have another PC you can test another patch in parallel at 1' TC.

Passing the burden to someone else does not increase the scheme's efficiency, because you are just adding testing resources; that is a different concept.

I am talking about the most efficient scheme in terms of the time involved; it doesn't matter whether this time is spent by one tester only or shared among many of them.

bob · Post by **bob** » Tue May 25, 2010 1:38 am

Ralph Stoesser wrote: Maybe it's worth giving "sideline research" a try. Buy another cheap computer. Whenever you test something, do your 1+0 tests as always, but at the same time also run 30,000 games at 1 sec on the other computer.

If both results suggest the same thing, there is nothing to do. Write down the results and the type of patch.

If the results differ, try to find out which of the two results is more reliable. (If you don't want to spend extra time on this step, I'm sure there are ways to transfer the job to someone else here in the forum.)

After a few dozen patches you should get a rough estimate about which testing scheme is more reliable in general.

I'd agree completely. We've already done this in fact. I had even thought about posting some sample data, say making a 30K game run, but also grabbing 1K random games and giving the Elo from them as well, just to show how far (and how often) the two values are _far_ apart.
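The experiment described here can be imitated with simulated games. The sketch below is not real engine data: the win/draw/loss probabilities (36/30/34 percent, about +7 Elo under a logistic model), the seed, and the number of subsets are all assumptions chosen for illustration. It generates one 30000-game run and then re-estimates the Elo from many random 1000-game subsets of that same run.

```python
import math
import random
import statistics

def elo_from_score(score):
    """Convert a mean score in (0, 1) to an Elo difference (logistic model)."""
    return 400.0 * math.log10(score / (1.0 - score))

def simulate_games(n, p_win, p_draw, rng):
    """Draw n game results (1, 0.5 or 0) from fixed, assumed probabilities."""
    results = []
    for _ in range(n):
        r = rng.random()
        results.append(1.0 if r < p_win else 0.5 if r < p_win + p_draw else 0.0)
    return results

rng = random.Random(2010)  # fixed seed so the run is repeatable
# Hypothetical patch: 36% wins, 30% draws, 34% losses.
games = simulate_games(30000, 0.36, 0.30, rng)
full_elo = elo_from_score(statistics.mean(games))

# Re-estimate the rating from many random 1000-game subsets of the same run.
subset_elos = [elo_from_score(statistics.mean(rng.sample(games, 1000)))
               for _ in range(200)]

print("Elo from all 30000 games: %+.1f" % full_elo)
print("Elo from 1000-game subsets: min %+.1f, max %+.1f, stdev %.1f"
      % (min(subset_elos), max(subset_elos), statistics.stdev(subset_elos)))
```

With a per-subset standard deviation on the order of 9 Elo, individual 1000-game samples of a patch worth about +7 Elo will regularly show it as a loss.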

Ralph Stoesser · Post by **Ralph Stoesser** » Tue May 25, 2010 9:43 pm

mcostalba wrote:
Ralph Stoesser wrote: Maybe it's worth giving "sideline research" a try. Buy another cheap computer. Whenever you test something, do your 1+0 tests as always, but at the same time also run 30,000 games at 1 sec on the other computer.
This seems to me less efficient than first trying 1 minute and, if there is no clear evidence of success, falling back on the 2 sec/game test. The latter scheme is more efficient because you don't test at 2 sec/game patches that give a clear advantage at 1 minute TC.

If you have another PC you can test another patch in parallel at 1' TC.

Passing the burden to someone else does not increase the scheme's efficiency, because you are just adding testing resources; that is a different concept.

I am talking about the most efficient scheme in terms of the time involved; it doesn't matter whether this time is spent by one tester only or shared among many of them.

Sorry, I should have quoted Joona in my reply. It was not meant as a new testing scheme. I meant: stay with your testing methodology (1000 games, 1+0), but additionally do some "sideline research" w.r.t. very fast games.

lech · Post by **lech** » Thu May 27, 2010 12:11 am

Why, in pawns.cpp (the evaluate_pawns() function), do pawns in endgames not get a bonus for a chain?
Example:
[d] 8/p2p4/8/PP2P1P1/8/8/3P4/8 w - - 0 1
bonuses:
Pg5 – 100 (only default "r"),
Pe5+d2 – 68,
Pa5+b5 – 68.
The order by real strength: Pa5+b5, then Pg5, then Pe5+d2.
Maybe I am missing something?
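For readers who want to check the example position, here is a small sketch that finds the connected white pawns in the FEN above. It is not Stockfish's code and does not reproduce its bonuses; "connected" is defined here, as an assumption, as having a friendly pawn on an adjacent file either on the same rank or one rank behind, and all function names are invented.

```python
def white_pawns(fen):
    """Return the set of (file, rank) squares holding white pawns; files 0-7, ranks 1-8."""
    pawns = set()
    board = fen.split()[0]
    for row_index, row in enumerate(board.split("/")):
        rank = 8 - row_index          # FEN lists rank 8 first
        file = 0
        for ch in row:
            if ch.isdigit():
                file += int(ch)       # run of empty squares
            else:
                if ch == "P":
                    pawns.add((file, rank))
                file += 1
    return pawns

def connected(pawns):
    """Pawns with a friendly pawn on an adjacent file, on the same rank or one rank behind."""
    result = set()
    for file, rank in pawns:
        for df in (-1, 1):
            if (file + df, rank) in pawns or (file + df, rank - 1) in pawns:
                result.add((file, rank))
    return result

def name(square):
    """Convert (file, rank) to algebraic notation, e.g. (0, 5) -> 'a5'."""
    file, rank = square
    return "abcdefgh"[file] + str(rank)

fen = "8/p2p4/8/PP2P1P1/8/8/3P4/8 w - - 0 1"
pawns = white_pawns(fen)
print(sorted(name(sq) for sq in connected(pawns)))  # prints ['a5', 'b5']
```

Under this definition only the a5/b5 pair counts as connected; e5 and d2 are on adjacent files but too far apart to support each other.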

Eelco de Groot · Post by **Eelco de Groot** » Thu May 27, 2010 1:44 am

lech wrote: Why, in pawns.cpp (the evaluate_pawns() function), do pawns in endgames not get a bonus for a chain?
Example:
[d] 8/p2p4/8/PP2P1P1/8/8/3P4/8 w - - 0 1
bonuses:
Pg5 – 100 (only default "r"),
Pe5+d2 – 68,
Pa5+b5 – 68.
The order by real strength: Pa5+b5, then Pg5, then Pe5+d2.
Maybe I am missing something?

Yes, I could not figure that out either. Maybe you saw that I changed it to a very small plus in Rainbow Serpent, but if this term is essentially tuned to zero, there must be some other term missing, I think. Why should being part of a pawn chain be disadvantageous? Sometimes only the foremost pawn should run, but that is more a concern for passed pawns. Passed pawns are not evaluated in pawns.cpp, and connected passers are undervalued in Stockfish anyway, I think; the bonus for moving a passed pawn forward, even if it means leaving a supporting pawn behind, is always large enough that it should not influence this pawn chain term in pawns.cpp.

Eelco
