Stuck


Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Interpretation request.

Post by Lyudmil Tsvetkov »

arjuntemurnikar wrote:
Lyudmil Tsvetkov wrote: Still, I support my point that simplifications scale worse than knowledge patches at LTC.
I don't think so. There were other functional changes in the period you describe too. Why do you only see the simplifications as candidates for bad scalability? All simplifications committed in recent times have passed both STC and LTC, so I see no reason why they should be less scalable than standard patches that also pass STC and LTC. Anyway, you are again not taking into account that as TC/depth increases, and as the gap in relative strength widens, the apparent elo difference curve flattens out naturally, so what you are seeing are ghosts. You may also cross-check this with regression tests before SF DD release. Each consecutive regression test against master seemed to gain less and less elo, especially in the later stages (~40-65 elo) and that is because of what I explained. It is only natural.
Lyudmil Tsvetkov wrote: Now that we are here, Arjun, and I depend entirely upon you for submitting tests, could you please schedule the more successful blocked pawn patch for standard SPRT? I guess this would be the 40cps bonus. It would be interesting to see the difference in performance at both TC.

Many thanks in advance, Arjun!
It doesn't look to me like the test did well. 60,60 did better than 40,40 (which failed quite clearly), but it was still neutral. Perhaps I might try increasing it further to 80,80 or 100,100 and see if there is improvement. If that doesn't work, I think it would be good to give the idea a rest for a while.

Cheers!
Cheers, Arjun.

I am speaking of 80% SPRT 0;6 patches in the first period and 50% SPRT 0;6 patches in the last period, so the 80% obviously did a great job back then; it is not those patches that scale badly. The same behaviour was observed in the second period. Do you really not see it? When a test scores negatively, with 5-6 more lost games after 100 000, it is a liability that will have consequences for performance, even though it passes SPRT. The same holds when it scores only 50 more won games out of 40 000: it passes SPRT, but it contributes very little to strength. +200 games after 40 000 is another thing; there you need not be afraid that the patch does not contribute. Flattening of the curve is a different matter: the point here is that the changes in the last month performed 3 times worse in terms of scalability than the changes in the first period (13 elo gained and 1.6 elo SMP loss vs 39 elo gained and 1.7 elo SMP loss). I think this is too obvious, too blatant.
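
To put the win surpluses above on an elo scale, here is a minimal Python sketch. It assumes the usual logistic elo model and counts a draw as half a point; the function name `elo_from_surplus` is made up for illustration and is not taken from any testing framework.

```python
import math

def elo_from_surplus(net_wins: int, games: int) -> float:
    """Elo difference implied by a net surplus of wins over losses.

    With draws worth half a point, the score fraction is
    0.5 + net_wins / (2 * games), whatever the draw rate.
    Assumes the standard logistic elo model.
    """
    score = 0.5 + net_wins / (2 * games)
    return -400 * math.log10(1 / score - 1)

# The three cases mentioned in the post: +50 and +200 over 40 000 games,
# and a patch that ends about 6 games down after 100 000.
for net, games in [(50, 40_000), (200, 40_000), (-6, 100_000)]:
    print(f"{net:+d} over {games} games: {elo_from_surplus(net, games):+.2f} elo")
```

Under these assumptions, +50 over 40 000 games is a bit under half an elo point and +200 is a bit under two, so the two passing tests differ by a factor of four in what they add.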

Do you really believe a game in 5 sec. says anything about how good an idea is? I bet I will win all the games with me having 5 min. per game vs SF just 5 sec. You do storm tuning with 5 sec., it works great, it still scales well with 15 sec., and then fails with 60 sec. Probably it is the same, but the other way round, with those blocked pawns, they might perform better at 15 or 60 sec., although failing at 5 sec. Some changes perform better at LTC, others at STC.

Please do not get me wrong, Arjun, I just care for SF. I would not like to see it suddenly start scaling worse at very long TC, which is the time control that matters.

Many thanks again for the blocked test. I see you have queued another test, many thanks indeed! So take 2 was the 60cps value? What is the 3rd try, 80 or 100cps? Please, Arjun, be so kind, just one more time: for the sake of curiosity at least, but also because blocked pawns on the 6th are a big problem for SF and fixable anyhow, schedule just one more test at standard 15 sec. SPRT with the version that performed best, although it might have scored negatively. Just to see the difference in performance at longer TC. I am sure it will do better there.

Re: Scaling with opposite-coloured bishops

Post by Lyudmil Tsvetkov »

mcostalba wrote:
Lyudmil Tsvetkov wrote: - scale for the entire game, and not just for the endgame (the Berlin is an excellent example that you should scale for the entire game, and not just the endgame; if I have time, I might post some game where SF is the black protagonist in the Berlin and, believe me, SF is notorious for mishandling the Berlin)
I have submitted a test to do it.
Marco, just to ask you, when you have more time, is it possible to push a new edition of the full scale test, with just one additional condition that is very important and will not alter bench negatively? The condition is of course, do not scale when queens are on the board, either in the middlegame, or the endgame. That is very important, because queens influence greatly king safety, especially with opposite-coloured bishops.

That is why, actually, I think, full scale failed at LTC: at LTC SF saw the irrelevance of scaling with queens present, as king safety kicked in, something it failed to see at STC.

For those interested in applying fully correct rules, one further scaling condition: exclude from scaling cases when one of the sides leads by more than a pawn in total non-pawn material. Here we have examples like R+B vs N+ opposite-coloured bishop.

[d]6k1/1p2qpp1/p1n1b2p/8/8/2BQP3/1P3PP1/R5K1 w - - 0 1
You should not scale here, as it is not only bishop vs opposite-coloured bishop, but partially also R vs B and opposite-coloured bishop vs N.

The most important thing for scaling to work correctly, however, is to exclude positions with queens from scaling; you simply cannot do without this condition.
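
The two exclusion rules above (no scaling with queens on, no scaling when one side leads by more than a pawn in non-pawn material) can be sketched as a small Python check on a FEN string. This is only an illustration, not SF code: the helper names are hypothetical, and the 1-3-5-9 piece values are an assumption.

```python
# Piece values on the 1-3-5-9 scale (assumed for this sketch).
VALUES = {"n": 3, "b": 3, "r": 5, "q": 9}

def pieces(fen: str):
    """Yield (piece, file, rank), 0-indexed, from the FEN placement field."""
    for row, rank_str in enumerate(fen.split()[0].split("/")):
        file = 0
        for ch in rank_str:
            if ch.isdigit():
                file += int(ch)          # a digit is a run of empty squares
            else:
                yield ch, file, 7 - row  # the first FEN row is rank 8
                file += 1

def non_pawn_material(fen: str):
    """Return (white, black) non-pawn material totals."""
    white = sum(VALUES[p.lower()] for p, _, _ in pieces(fen) if p in "NBRQ")
    black = sum(VALUES[p] for p, _, _ in pieces(fen) if p in "nbrq")
    return white, black

def should_scale_ocb(fen: str) -> bool:
    """True only for opposite-coloured bishops with no queens on the board
    and a non-pawn material gap of at most one pawn."""
    board = list(pieces(fen))
    if any(p in "qQ" for p, _, _ in board):
        return False
    white_b = [(f + r) % 2 for p, f, r in board if p == "B"]
    black_b = [(f + r) % 2 for p, f, r in board if p == "b"]
    if len(white_b) != 1 or len(black_b) != 1 or white_b[0] == black_b[0]:
        return False
    white, black = non_pawn_material(fen)
    return abs(white - black) <= 1

posted = "6k1/1p2qpp1/p1n1b2p/8/8/2BQP3/1P3PP1/R5K1 w - - 0 1"
print(non_pawn_material(posted), should_scale_ocb(posted))
```

For the diagram above the check refuses to scale on both counts: queens are still on, and White leads by two points of non-pawn material.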

Or maybe, if Marco does not have time, someone else could retest Marco's patch with just the above single additional condition.

Re: Stuck

Post by Lyudmil Tsvetkov »

Hi Arjun.

Just to ask you once again, as we agreed earlier, to reschedule the most successful blocked pawn on 6th patch at 15 sec. standard SPRT. I guess this will be take 2, 60cps bonus, that scored about neutral at 5 sec.

I see the queue is almost empty, and that will not be against the rules, as my intention was not to let the patch first suffer the hard preliminary 5 sec. test. :(

I am really puzzled why, when SF has an apparent difficulty with such pawns, an additional bonus does not work. It should be working, but who knows what the engine is thinking.

Re: Scaling with opposite-coloured bishops

Post by Lyudmil Tsvetkov »

Again, many thanks to Joerg for doing a rewrite of Marco's full scale.

Joerg, just one observation, if you allow me: I think that not scaling the score with queens on only in the middlegame, while still scaling in the endgame, is not the perfect decision and possibly creates discontinuities, as with queens off you scale both stages, but when queens are on you scale only one stage. For example, you have a middlegame with queens and you do not scale, then you come to the endgame, in many cases still with queens, and you suddenly start scaling.

Probably this creates discontinuities of some sort, I do not know if big or small. I think the right decision is not to scale with queens on both in the middlegame and the endgame.

Do you think it is possible to check also how this performs?

Possibly a smoother scaling transition and a value above which you do not scale could also be of help. I think 150cps on the standard 1-3-5-9 material scale would be the right value. And also, when you do not scale with queens on in both the middlegame and endgame, you should probably scale much more boldly, if it is done with a gradual increase.

Again, just to rephrase what I suggested in a nearby thread:

- scale only when score lower than 150cps
- and do not scale with queens on
- scale for both middlegame and endgame with gradually increasing factoring (meaning scale more with less material)
- scale 10-20% in early middlegame, 30-40% in late middlegame, 50-60% in early endgame, and 70-80% in late endgame
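
The list above can be sketched in Python as follows. This is only one reading of the proposal, not Marco's or Joerg's patch: it takes "scale N%" to mean "reduce the score by N%", uses the midpoint of each range, and interpolates linearly over a hypothetical `phase` value running from 0.0 (early middlegame) to 1.0 (late endgame).

```python
def ocb_scaled_score(score_cp: int, queens_on: bool, phase: float) -> int:
    """Scale down an opposite-coloured-bishop score, in centipawns.

    phase is a hypothetical game-phase measure: 0.0 = early middlegame,
    1.0 = late endgame. The reduction grows linearly from 15% to 75%,
    the midpoints of the 10-20% and 70-80% ranges in the list above.
    """
    if queens_on:
        return score_cp                   # never scale with queens on
    if abs(score_cp) >= 150:
        return score_cp                   # scale only below 150cps
    reduction_pct = 15 + 60 * phase       # 15% .. 75%
    return round(score_cp * (100 - reduction_pct) / 100)

# A +100cps edge shrinks as material comes off, unless queens are present.
for phase in (0.0, 0.33, 0.67, 1.0):
    print(phase, ocb_scaled_score(100, False, phase), ocb_scaled_score(100, True, phase))
```

Note that the hard 150cps cutoff in this sketch makes the score jump at the threshold, so a real patch would probably want a smoother transition there as well.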

Many thanks again, Joerg! And also to Marco for taking the initiative without being asked. I am certain you understand me: the only reason I insist is that SF badly needs this knowledge, both in terms of strength and positional elegance. I am absolutely certain that it should work, but apart from much tuning you also need a bit of luck.

Many thanks again.

Re: Scaling with opposite-coloured bishops

Post by Lyudmil Tsvetkov »

Hi Joerg.

I thought the second attempt would go like Schmitz Katze, but what to do?
Idea hui, implementation pfui. :) (or maybe vice versa)

I understand nothing anymore. Marco's patch with the wrong scaling with queens on gets a fine score at STC, while this patch with the right idea gets a quick red. Maybe it is about fine-tuning, maybe about specification, or maybe, which is quite probable, the engine has some features that make certain ideas difficult or impossible to implement with the current code, although the ideas might be right overall.

I do not suggest it at all, do not even think of it, but who knows: if one patch succeeds at STC and fails at LTC, maybe this one will fail at STC and succeed at LTC?

Anyway, I hope that with experience and changed code someone pushes a new patch on this in the future, no matter when, as SF simply needs it.

Many thanks again, Joerg! You are unbelievable. (but also to Marco who started it)