Stuck

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Interpretation request

Post by Lyudmil Tsvetkov »

arjuntemurnikar wrote:
Lyudmil Tsvetkov wrote:I do not know if this is the right thread, but I will ask here.

You know, I am very curious, and that is a bad thing.

Still, below are the data from the last 3 standard and SMP (3 threads) SF regression tests:

27.02.2014 standard 39.25 elo SMP 37.55 difference 1.70 elo

25.03.2014 standard 43.70 elo SMP 39.24 difference 4.46 elo

26.04.2014 standard 57.08 elo SMP 51.04 difference 6.04 elo

As you see, the difference between the standard single-core and SMP tests continuously rises, with some peculiar behaviour. What do I observe?

On 27.02, at 40 elo increase over SF DD, SF loses only 1.7 elo with SMP; on 26.04, at almost 60 elo increase over DD, 1/3 of the overall increase, SF already loses 6 elo with SMP; from 27.02 to 25.03 SF adds 4.5 elo and loses 2.76 elo at the same time with SMP, obviously the most macabre scaling period. From 25.03 to 26.04 SF adds 13.40 elo standard and loses 1.60 elo with SMP.

So, basically, the SMP performance on 27.02 is 3 times better than the SMP performance from 25.03 to 26.04. The SMP performance from 27.02 to 25.03 is absolutely disastrous for some reason.

Bearing in mind that SMP performance is very much tantamount to scalability, closely related to performance at longer TC, it is clear that the changes introduced from DD until 27.02 scale much better than the changes introduced in the last month, and even more so than the changes introduced from February till the end of March.

How do you explain this peculiar behaviour? What were the changes that scaled so well in the beginning, and what are the changes that scale considerably less well now? What were the changes from February till March that scaled so extremely badly? Could we learn a lesson from looking at the history page about which changes are good to make and which are not, so that the scalability of SF is optimal?

Any comments very much appreciated. I am new and weak in this area, so maybe people will enlighten me. At the end of the day, we want an engine that scales best and plays best at longer TC and with multiple threads, as under the TCEC conditions.
That is not correct.

When TC/depth increases, the draw ratio increases. Elo calculation takes the draw ratio into account, so naturally, as SMP games tend to reach deeper depths, the elo gain is lower because there are more draws. (If you look carefully, the win/loss ratio remains quite consistent.)

Also, as the elo difference between engines increases (e.g., SF DD vs master is now ~60 elo), the elo curve starts flattening out slightly. It flattens out even more for SMP. It is a normal thing. Nothing to worry about. :)

That is why the relative strength of engines at the TCEC time control (with 16 cores) is much closer than on rating lists at bullet/blitz TC on standard hardware. The draw ratio is orders of magnitude higher in TCEC than in, say, LSratinglist. That is why you have so many close encounters even between engines that are 100-200 elo apart on standard rating lists. As TC/depth increases, the elo difference decreases.
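
The draw-ratio effect can be seen numerically in a small sketch (invented game counts, standard logistic Elo model — not anyone's actual test data):

```python
import math

def elo_from_wdl(wins, draws, losses):
    """Elo difference implied by a match score under the logistic model."""
    score = (wins + 0.5 * draws) / (wins + draws + losses)
    return -400.0 * math.log10(1.0 / score - 1.0)

# Same +60 win-loss margin, very different draw ratios:
print(round(elo_from_wdl(100, 60, 40), 1))   # → 107.5
print(round(elo_from_wdl(100, 860, 40), 1))  # → 20.9 (more draws, lower elo)
```

The win-loss margin is identical in both matches; only the draw count differs, yet the implied elo difference shrinks by a factor of five, which is exactly the effect described above.
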
Hi Arjun.

I am waiting any day now for the latest 'Simplification - no mobility' patch. :D Or maybe 'Simplification - no piece values'.

Joking apart, I think you see the difference between a patch that scores -5 games after 110,000 games and a patch that scores +500 games after the same 110,000 games. The second patch is much more productive. No one can convince me that their contributions are equal.

When resources are scarce, simplification works; when there are ample computing resources, it is good to have knowledge.

But never mind, I am obviously a loner. It is just that I care about SF performance.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: An imbalance suggestion that might work in SF

Post by Lyudmil Tsvetkov »

Another game showing that the king safety weight should be reduced for the side with more non-pawn material, or maybe even that minor-piece king shelter should be considered.
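
For concreteness, the suggestion could look something like this sketch (purely illustrative numbers; nothing here is actual Stockfish code):

```python
def scaled_king_danger(raw_danger_cp, defending_minors_near_king):
    """Discount the king-attack score when the defending side has minor
    pieces sheltering its king; the 10% per minor is an invented value."""
    factor = max(0.0, 1.0 - 0.10 * defending_minors_near_king)
    return int(raw_danger_cp * factor)

# A 200cp attack against a king sheltered by two minors:
print(scaled_king_danger(200, 2))  # → 160
```
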

[pgn][PlyCount "142"]
[MLNrOfMoves "71"]
[MLFlags "000100"]
[Date "2014.04.21"]
[Round "451"]
[White "Gull"]
[Black "SF"]
[Result "1-0"]
[TimeControl "64+0.32"]

1. Nf3 {book} 1... c5 {book} 2. c4 {book} 2... Nf6 {book} 3. Nc3 {book} 3... Nc6
{book} 4. g3 {book} 4... d5 {book} 5. cxd5 {book} 5... Nxd5 {book} 6. Bg2 {book}
6... e6 {book} 7. d4 {book} 7... Be7 {book} 8. dxc5 {book} 8... Bxc5 {book} 9.
O-O {+0.19/17 4.1s} 9... O-O {-0.33/19 1.6s} 10. Bd2 {+0.27/16 0.27s} 10... Nxc3
{-0.17/22 3.4s} 11. Bxc3 {+0.33/16 0.83s} 11... f6 {-0.39/22 3.2s} 12. Qb3
{+0.53/17 7.4s} 12... Qb6 {-0.55/21 2.3s} 13. Qc4 {+0.63/15 0.000s} 13... Bd7
{-0.49/21 1.1s} 14. Rad1 {+0.57/16 1.3s} 14... Rad8 {-0.65/22 3.3s} 15. a3
{+0.70/16 1.1s} 15... Nb8 {-0.65/23 1.1s} 16. b4 {+0.64/16 1.6s} 16... Be7
{-0.61/22 1.5s} 17. Bh3 {+0.54/18 4.7s} 17... Qa6 {-0.66/24 4.5s} 18. Qe4
{+0.62/17 1.4s} 18... Kh8 {-0.65/22 2.0s} 19. Nd4 {+0.63/16 1.2s} 19... Rde8
{-0.61/23 4.1s} 20. b5 {+0.83/17 3.4s} 20... Bxb5 {-0.66/23 1.6s} 21. Nxb5
{+0.83/15 0.000s} 21... Qxb5 {-0.58/24 2.8s} 22. Rb1 {+0.83/15 0.000s} 22... Qc5
{-0.58/23 0.97s} 23. Rfc1 {+0.63/16 1.5s} 23... b6 {-0.74/23 2.1s} 24. Bxe6
{+0.65/17 4.7s} 24... Qg5 {-0.40/20 1.6s} 25. Bd4 {+0.63/17 1.3s} 25... f5
{-0.44/23 8.4s} 26. Qd3 {+0.58/16 5.4s} 26... Bd6 {-0.34/20 2.3s} 27. Ba2
{+0.58/15 0.000s} 27... Rd8 {-0.29/19 1.7s} 28. Qc4 {+0.47/13 1.1s} 28... Nd7
{-0.12/22 2.4s} 29. Rd1 {+0.24/15 2.0s} 29... Nc5 {-0.06/21 0.75s} 30. e3
{+0.26/14 2.7s} 30... Ne4 {-0.22/18 1.1s} 31. a4 {+0.30/16 1.4s} 31... Qh5
{0.00/20 2.1s} 32. Qc2 {+0.22/16 1.3s} 32... Ng5 {0.00/23 0.70s} 33. Bd5
{+0.69/12 0.41s} 33... f4 {-0.18/22 1.4s} 34. Qc3 {+0.36/14 1.8s} 34... Nh3+
{0.00/21 1.0s} 35. Kf1 {+0.67/14 1.1s} 35... Qg6 {-0.13/24 1.8s} 36. exf4
{+0.14/15 1.3s} 36... Bxf4 {-0.15/24 0.64s} 37. f3 {+0.22/15 0.047s} 37... Bh6
{0.00/24 0.94s} 38. Qb3 {+0.72/13 0.093s} 38... Qh5 {-0.35/21 1.8s} 39. Kg2
{+0.72/16 0.39s} 39... Rd6 {-0.53/24 1.2s} 40. Qc4 {+0.64/15 1.2s} 40... Nf4+
{0.00/21 1.1s} 41. gxf4 {+0.38/13 0.27s} 41... Bxf4 {0.00/23 0.56s} 42. Bg1
{+0.41/16 1.1s} 42... Rh6 {0.00/24 0.55s} 43. Qe4 {+0.30/16 2.5s} 43... Bd6
{0.00/22 0.88s} 44. Kf1 {+0.89/12 0.61s} 44... Rhf6 {0.00/22 1.1s} 45. Rb3
{+0.58/15 1.4s} 45... Bxh2 {0.00/19 1.1s} 46. Bf2 {+0.50/14 0.22s} 46... Bf4
{0.00/21 0.48s} 47. Rb4 {+0.46/14 0.31s} 47... h6 {0.00/19 1.6s} 48. Bd4
{+0.69/12 0.47s} 48... Rf5 {-0.75/20 1.7s} 49. Qe7 {0.00/13 0.67s} 49... R5f7
{-0.61/21 1.3s} 50. Qe6 {+0.64/17 0.86s} 50... Bg3 {-0.88/20 0.56s} 51. Ke2
{+0.55/14 0.36s} 51... Rc7 {-0.81/21 0.70s} 52. Kd3 {+0.46/15 0.80s} 52... Rxf3+
{-0.41/21 0.83s} 53. Be3 {+0.46/15 0.094s} 53... Rf8 {-0.98/21 1.7s} 54. Rh1
{+0.46/14 0.000s} 54... Qf5+ {-1.05/19 0.34s} 55. Qxf5 {+1.13/14 0.31s} 55...
Rxf5 {-0.96/20 0.58s} 56. Be4 {+1.13/13 0.000s} 56... Rf8 {-1.05/20 0.47s} 57.
Bd4 {+1.13/13 0.000s} 57... Bf4 {-0.91/20 0.67s} 58. Rh5 {+1.06/17 1.0s} 58...
Bg5 {-0.78/18 0.45s} 59. Rb5 {+1.00/18 1.0s} 59... Bf6 {-0.75/19 0.33s} 60. Rhf5
{+1.00/19 0.66s} 60... Rd8 {-0.60/21 0.25s} 61. Rbd5 {+1.00/20 0.75s} 61... Rxd5
{-0.60/23 0.20s} 62. Rxd5 {+1.00/20 0.20s} 62... Kg8 {-0.67/23 0.30s} 63. Bxf6
{+1.00/20 0.25s} 63... gxf6 {-0.67/23 0.22s} 64. Kd4 {+1.00/20 0.14s} 64... Re7
{-0.66/20 0.23s} 65. Bf3 {+1.00/20 1.1s} 65... Kh7 {-0.60/23 0.23s} 66. Kc4
{+1.14/19 0.52s} 66... Kg6 {-0.60/23 0.23s} 67. Bh5+ {+1.14/20 0.27s} 67... Kg7
{-0.60/24 0.25s} 68. Kb5 {+1.22/21 0.97s} 68... Rc7 {-0.96/21 0.70s} 69. Ka6
{+1.30/20 0.28s} 69... Kf8 {-1.30/22 0.55s} 70. Be2 {+1.78/20 1.8s} 70... Kf7
{-1.53/20 0.27s} 71. Bd3 {+1.71/16 0.000s} 71... Re7
{White wins by adjudication} 1-0
[/pgn]

[d]5r1k/p5pp/1p1r4/3B3q/P1QB1b2/5P2/6KP/1R1R4 w - - 0 42
A 0.00 draw score here from SF, while Gull thinks White is better by some half a pawn. Again, SF weighs the enemy king's safety as very bad, but the white king has one additional minor piece to defend it. That is very important.

[d]5r1k/p5pp/1p1b3r/3B3q/P3Q3/5P2/7P/1R1R1KB1 b - - 0 44
Still a 0.00 score from SF, and a +90cps white edge from Gull. Does SF not count spare pieces at all?

[d]5r1k/p5p1/1p3r1p/3B3q/PR2Qb2/5P2/5B2/3R1K2 w - - 0 48
The f2 bishop shelters the white king perfectly.

[d]5r1k/p1r3p1/1p2Q2p/3B3q/PR6/3KB1b1/8/3R4 w - - 0 54
The white king feels very comfortable, even in the centre of the board, with two of its own bishops close by.

Again, a pattern of SF weakness, but who knows when it will be fixed. You cannot expect to fix it with just one or two tries. You need a consistent approach and fine-tuning. But if the problem is there, it is of course real and fixable.

One other funny suggestion for solving various SF imbalance woes: apply a uniform 10-20cps bonus for the side with more pieces. Strange as it might seem, and illogical at first glance, I am sure this will work for SF, based on the games I have seen. SF seems to underevaluate 2 minors vs R, minor piece vs pawns, etc., so the bonus really makes sense. And of course, the main merit of this bonus is that it has nothing to do with the quadratic table.
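
Sketched in code, the suggested term is trivial (illustrative only; the 15cps default is just a point inside the 10-20cps range proposed above):

```python
def piece_count_bonus(white_non_pawn_pieces, black_non_pawn_pieces, bonus_cp=15):
    """Flat bonus, from White's point of view, for the side with more
    non-pawn pieces, as suggested above."""
    if white_non_pawn_pieces > black_non_pawn_pieces:
        return bonus_cp
    if black_non_pawn_pieces > white_non_pawn_pieces:
        return -bonus_cp
    return 0

# Two minors vs a rook: the two-minor side gets the bonus.
print(piece_count_bonus(2, 1))  # → 15
```
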
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Scaling with opposite-coloured bishops

Post by Lyudmil Tsvetkov »

It is funny how a single detail might decide whether an idea is implementable or not: you scale only in the endgame, and it does not work; you scale for both the endgame and middlegame, and it still does not work; you exclude queens from the scaling rule, and suddenly the idea starts working...

Of course, this should be tested, but one certain thing is that a good implementation of scaling with opposite-coloured bishops should work perfectly, with a reasonable strength contribution, especially in clueless modern engines. There is so much to gain from that fine positional rule!

One further testing idea: if queens are not excluded from scaling altogether, then scale significantly less with queens on, and much more boldly without queens.
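
As a sketch (invented factors, not tested values), the rule from this testing idea might look like:

```python
def ocb_scaled_eg(eg_score_cp, opposite_bishops, queens_on):
    """Shrink the endgame score in opposite-coloured-bishop positions,
    scaling significantly less when queens are still on the board."""
    if not opposite_bishops:
        return eg_score_cp
    factor = 0.75 if queens_on else 0.5  # illustrative factors only
    return int(eg_score_cp * factor)

print(ocb_scaled_eg(120, True, False))  # → 60
print(ocb_scaled_eg(120, True, True))   # → 90
```
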
User avatar
Ajedrecista
Posts: 1985
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Interpretation request.

Post by Ajedrecista »

Hello Lyudmil:
Lyudmil Tsvetkov wrote:
arjuntemurnikar wrote:
Lyudmil Tsvetkov wrote:I do not know if this is the right thread, but I will ask here.

You know, I am very curious, and that is a bad thing.

Still, below are the data from the last 3 standard and SMP (3 threads) SF regression tests:

27.02.2014 standard 39.25 elo SMP 37.55 difference 1.70 elo

25.03.2014 standard 43.70 elo SMP 39.24 difference 4.46 elo

26.04.2014 standard 57.08 elo SMP 51.04 difference 6.04 elo

As you see, the difference between the standard single-core and SMP tests continuously rises, with some peculiar behaviour. What do I observe?

On 27.02, at 40 elo increase over SF DD, SF loses only 1.7 elo with SMP; on 26.04, at almost 60 elo increase over DD, 1/3 of the overall increase, SF already loses 6 elo with SMP; from 27.02 to 25.03 SF adds 4.5 elo and loses 2.76 elo at the same time with SMP, obviously the most macabre scaling period. From 25.03 to 26.04 SF adds 13.40 elo standard and loses 1.60 elo with SMP.

So, basically, the SMP performance on 27.02 is 3 times better than the SMP performance from 25.03 to 26.04. The SMP performance from 27.02 to 25.03 is absolutely disastrous for some reason.

Bearing in mind that SMP performance is very much tantamount to scalability, closely related to performance at longer TC, it is clear that the changes introduced from DD until 27.02 scale much better than the changes introduced in the last month, and even more so than the changes introduced from February till the end of March.

How do you explain this peculiar behaviour? What were the changes that scaled so well in the beginning, and what are the changes that scale considerably less well now? What were the changes from February till March that scaled so extremely badly? Could we learn a lesson from looking at the history page about which changes are good to make and which are not, so that the scalability of SF is optimal?

Any comments very much appreciated. I am new and weak in this area, so maybe people will enlighten me. At the end of the day, we want an engine that scales best and plays best at longer TC and with multiple threads, as under the TCEC conditions.
That is not correct.

When TC/depth increases, the draw ratio increases. Elo calculation takes the draw ratio into account, so naturally, as SMP games tend to reach deeper depths, the elo gain is lower because there are more draws. (If you look carefully, the win/loss ratio remains quite consistent.)

Also, as the elo difference between engines increases (e.g., SF DD vs master is now ~60 elo), the elo curve starts flattening out slightly. It flattens out even more for SMP. It is a normal thing. Nothing to worry about. :)

That is why the relative strength of engines at the TCEC time control (with 16 cores) is much closer than on rating lists at bullet/blitz TC on standard hardware. The draw ratio is orders of magnitude higher in TCEC than in, say, LSratinglist. That is why you have so many close encounters even between engines that are 100-200 elo apart on standard rating lists. As TC/depth increases, the elo difference decreases.
Hi Arjun.

I am waiting any day now for the latest 'Simplification - no mobility' patch. :D Or maybe 'Simplification - no piece values'.

Joking apart, I think you see the difference between a patch that scores -5 games after 110,000 games and a patch that scores +500 games after the same 110,000 games. The second patch is much more productive. No one can convince me that their contributions are equal.

When resources are scarce, simplification works; when there are ample computing resources, it is good to have knowledge.

But never mind, I am obviously a loner. It is just that I care about SF performance.
Arjun cited the win/loss ratio. Here is a thread where it is discussed, and it looks like this ratio could be almost constant with long enough TC games.

IMHO, it is completely logical that differences become smaller with an increasing number of threads and with longer TC. It is a kind of law of diminishing returns, just as in matches at fixed depth, for example (in the end, the draw ratio rises). I am sure that you will find that Elo differences between engines are smaller at longer TC than at shorter TC... you only have to check it in CCRL 40/40 vs. 40/4, CEGT 40/120 vs. 40/20 vs. 40/4, the rating lists by fastgm, etc.

Simplifications are intended to remove useless code which slows down the search and/or the eval, not to remove valuable parts (what you call knowledge). Another question is whether every simplification really removes only useless code. SPRT(-3, 1) tries to minimize the risk of accepting big Elo losses (there are some numbers in the Fishcooking Google Group sticky thread called 'SPRT parameters guideline').

If I am not wrong, the SF Elo gain trend is undoubtedly positive, and that is the important thing. That said, SPRT(0, 6) tests may accept regressions, but only with a tiny probability. Everything is about trade-offs.
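
For readers unfamiliar with SPRT: the test accumulates a log-likelihood ratio (LLR) over games and stops when it crosses a bound. The sketch below uses the common normal approximation to the trinomial result distribution; it is in the spirit of what fishtest computes, not its exact code.

```python
import math

def sprt_llr(wins, draws, losses, elo0, elo1):
    """Approximate LLR for H1 (true elo = elo1) vs H0 (true elo = elo0)."""
    n = wins + draws + losses
    w, d = wins / n, draws / n
    score = w + d / 2                    # mean score per game
    var = (w + d / 4) - score ** 2       # per-game variance of the score
    s0 = 1 / (1 + 10 ** (-elo0 / 400))   # expected score under H0
    s1 = 1 / (1 + 10 ** (-elo1 / 400))   # expected score under H1
    return (s1 - s0) * (2 * score - s0 - s1) * n / (2 * var)

# Stop bounds for alpha = beta = 0.05:
upper = math.log((1 - 0.05) / 0.05)  # accept H1 above ~+2.94
lower = math.log(0.05 / (1 - 0.05))  # accept H0 below ~-2.94
print(round(upper, 2), round(lower, 2))  # → 2.94 -2.94
```

With bounds like SPRT(-3, 1), a patch passes as long as the data favour "not worse than about -3 elo", which is why mildly negative simplifications can be accepted by design.
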

Thank you very much for all your suggestions.

Regards from Spain.

Ajedrecista.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Interpretation request.

Post by Lyudmil Tsvetkov »

Ajedrecista wrote:Hello Lyudmil:
Lyudmil Tsvetkov wrote:
arjuntemurnikar wrote:
Lyudmil Tsvetkov wrote:I do not know if this is the right thread, but I will ask here.

You know, I am very curious, and that is a bad thing.

Still, below the data from the last 3 standard and SMP (3 threads) SF regression tests:

27.02.2014 standard 39.25 elo SMP 37.55 difference 1.70 elo

25.03.2014 standard 43.70 elo SMP 39.24 difference 4.46 elo

26.04.2014 standard 57.08 elo SMP 51.04 difference 6.04 elo

As you see, the difference between standard single core and SMP tests continuously rises, with some peculiar behaviour. What I observe?

On 27.02, at 40 elo increase over SF DD, SF loses only 1.7 elo with SMP; on 26.04, at almost 60 elo increase over DD, 1/3 of the overall increase, SF already loses 6 elo with SMP; from 27.02 to 25.03 SF adds 4.5 elo and loses 2.76 elo at the same time with SMP, obviously the most macabre scaling period. From 25.03 to 26.04 SF adds 13.40 elo standard and loses 1.60 elo with SMP.

So, basically, the SMP performance on 27.02 is 3 times better than the SMP performance from 25.03 to 26.04. The SMP performance from 27.02 to 25.03 is absolutely disastrous for some reason.

Bearing in mind that SMP is very much tantamount to scalability, closely related to performance at longer TC, it is clear that the changes introduced since DD until 27.02 scale much better than the changes introduced in the last month, and even more so than the changes introduced from February till end of March.

How do you explain this peculiar behaviour? What were the changes that scaled so well in the beginning, and what are the changes that scale considerably less so now? What were the changes from February till March that scaled so extremely bad? Could we learn a lesson at looking at the history page what change it is good to make and what not, so that the scalability of SF is optimal?

Any comments very much appreciated. I am new and weak in this branch so maybe people will elucidate me. At the end of the day, we want an engine that scales best and plays best at longer TC and multiple threads as the TCEC conditions.
That is not correct.

When TC/depth increases, draw-ratio increases. Elo calculation takes into account draw-ratio, so naturally as SMP games tend to have deeper depths, elo gain is lower because there are more draws. (If you see carefully, win-loss ratio remains quite consistent).

Also, as elo difference between engines increases (for eg. SF DD vs master is now ~60 elo), the elo curve starts flattening out slightly. It flattens out even more for SMP. It is a normal thing. Nothing to worry about. :)

That is why the relative strength of engines at TCEC time control (with 16-cores) is much closer than on rating lists at bullet/blitz TC on standard hardware. The draw-ratio is orders of multitude higher in TCEC than in say LSratinglist. That's why you have so many close encounters even between engines that are 100-200 elo apart on standard rating lists. As TC/depth increases, elo difference decreases.
Hi Arjun.

Waiting any time for the latest 'Simplification - no mobility' patch. :D Or maybe 'Simplification - no piece values'.

Joking apart, I think you see the difference between a patch that scores -5 games after 110 000 games, and a patch that scores +500 games after the same 110 000 games. The second patch is much more productive. No one can convince me that their contribution is equal.

When resources are scarce, simplification works, when there are ample computing resources, it is good to have knowledge.

But nevermind, I am obviously a loner. It is just that I care for SF performance.
Arjun cited the wins/loses ratio. Here is a thread where it is discussed and it looks like this ratio could be almost constant with long enough TC games.

IMHO, it is completely logical that differences become smaller with increasing number of threads and with longer TC. It is a kind of law of diminishing returns, just as matches at fixed depth, for example (at the end, the draw ratio raises). I am sure that you will find that Elo differences between engines are smaller at longer TC that in shorter TC... you only have to check it in CCRL 40/40 vs. 40/4, CEGT 40/120 vs. 40/20 vs. 40/4, rating lists by fastgm, etc.

Simplifications are intended to remove useless code which slows down the search and/or the eval, not to remove valuable (what you call knowledge) parts. Other thing is ask if every simplification is removing only useless code. SPRT(-3, 1) tries to minimize the risk of accept big Elo loses (there are some numbers in Fishcooking Google Group sticky thread called SPRT parameters guideline).

If I am not wrong, SF Elo gain trend is positive without doubts so it is the important thing. No in vain, SPRT(0, 6) tests may accept regressions but with a tiny probability. Everything is about trade-offs.

Thank you very much for all your suggestions.

Regards from Spain.

Ajedrecista.
Thanks for posting, Jesus. I appreciate it very much.
A discussion is necessary and always helps.

Yes, the most important thing is that SF progresses, I am very happy about this.

This is not about rating differences getting smaller with longer TC, but about relative performance at short and longer TC. Some engines are better at STC, like Houdini, which seems to have excellently tuned parameters but not too much to offer in terms of deep positional knowledge. Other engines, like SF, seem nowadays to scale almost equally at all TC. SF might not have much knowledge, but I think, at least in terms of eval, what scales very well for SF is its very good king attack, as it is known that king safety scales well.

Regarding Komodo, it scales better than the other top engines, and what explains this result is that it is the engine that currently has the most sophisticated positional knowledge.

So it is about relative scaling capabilities.

It is a fact that from DD to 27.02 SF achieved a 39 elo increase and lost only 1.7 elo in SMP. At the same time, from 25.03 to 26.04 SF achieved a 13 elo increase and lost 1.6 elo in SMP. So basically the scaling efficiency here differs by a factor of 3 to 1. I think this is too much to be accidental. 80% of the patches were SPRT(0, 6) in the first case, 50% in the second; this simply cannot be accidental.
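
The "factor of 3" can be checked in a couple of lines, using the round numbers quoted above (SMP elo lost per standard elo gained in each period):

```python
loss_rate_early = 1.7 / 39  # DD -> 27.02: 1.7 elo SMP loss per 39 elo gained
loss_rate_late = 1.6 / 13   # 25.03 -> 26.04: 1.6 elo SMP loss per 13 elo gained
print(round(loss_rate_late / loss_rate_early, 2))  # → 2.82, roughly 3x
```
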

Now, the problem for me is not simplification in itself, but that simplification is becoming a trend. I have never seen so many simplification patches in the whole framework history, and believe me, I have read all of it. Currently, simplification patches far exceed knowledge-adding patches in number, and throughout SF framework development that has never happened. So this seems like a trend.

I would very much like to see another 20 elo increase just from simplifications, and then another SMP test. I guess the SMP loss will rise significantly then, but who knows?
User avatar
Ajedrecista
Posts: 1985
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Interpretation request.

Post by Ajedrecista »

Hi again:
Lyudmil Tsvetkov wrote:Yes, the most important thing is that SF progresses, I am very happy about this.

This is not about rating differences getting smaller with longer TC, but about relative performance at short and longer TC. Some engines are better at STC, like Houdini, which seems to have excellently tuned parameters but not too much to offer in terms of deep positional knowledge. Other engines, like SF, seem nowadays to scale almost equally at all TC. SF might not have much knowledge, but I think, at least in terms of eval, what scales very well for SF is its very good king attack, as it is known that king safety scales well.

Regarding Komodo, it scales better than the other top engines, and what explains this result is that it is the engine that currently has the most sophisticated positional knowledge.

So it is about relative scaling capabilities.
I am not convinced that Komodo is the engine that scales best from STC to LTC... in fact, I thought it was SF!

Andreas (fastgm) ran a series of matches at different TC, and Peter Österlund (who is very smart) post-processed the data in a clever way. The result was that SF scales far better than any other engine in view of his numbers:

Re: Playing strength development - increasing time control

Komodo was the second in that list (referring only to top engines). One thing is sure: SF is a great engine! With all due respect to the rest of the engines.

Regards from Spain.

Ajedrecista.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Interpretation request.

Post by Lyudmil Tsvetkov »

Ajedrecista wrote:Hi again:
Lyudmil Tsvetkov wrote:Yes, the most important thing is that SF progresses, I am very happy about this.

This is not about rating differences getting smaller with longer TC, but about relative performance at short and longer TC. Some engines are better at STC, like Houdini, which seems to have excellently tuned parameters but not too much to offer in terms of deep positional knowledge. Other engines, like SF, seem nowadays to scale almost equally at all TC. SF might not have much knowledge, but I think, at least in terms of eval, what scales very well for SF is its very good king attack, as it is known that king safety scales well.

Regarding Komodo, it scales better than the other top engines, and what explains this result is that it is the engine that currently has the most sophisticated positional knowledge.

So it is about relative scaling capabilities.
I am not convinced that Komodo is the engine that scales best from STC to LTC... in fact, I thought it was SF!

Andreas (fastgm) ran a series of matches at different TC, and Peter Österlund (who is very smart) post-processed the data in a clever way. The result was that SF scales far better than any other engine in view of his numbers:

Re: Playing strength development - increasing time control

Komodo was the second in that list (referring only to top engines). One thing is sure: SF is a great engine! With all due respect to the rest of the engines.

Regards from Spain.

Ajedrecista.
Hi Jesus.

This is scaling from 3 seconds per game to 5 minutes per game. I meant really long TC, like the one used in TCEC, where Komodo undoubtedly gains much more than at blitz and rapid.

What kind of game is a 3-second game? I guess you might play without any knowledge at all. But for the TCEC time control, you need knowledge.

I am still not certain SF is better than Komodo at the TCEC time control, although it looks to be the case.

SF is great, but we want it even greater. :D
arjuntemurnikar
Posts: 204
Joined: Tue Oct 15, 2013 10:22 pm
Location: Singapore

Re: Interpretation request.

Post by arjuntemurnikar »

Lyudmil Tsvetkov wrote: Now, the problem for me is not simplification in itself, but that simplification is becoming a trend. I have never seen so many simplification patches in the whole framework history, and believe me, I have read all of it. Currently, simplification patches far exceed knowledge-adding patches in number, and throughout SF framework development that has never happened. So this seems like a trend.

I would very much like to see another 20 elo increase just from simplifications, and then another SMP test. I guess the SMP loss will rise significantly then, but who knows?
That is because the SPRT parameters for simplification tests weren't defined before. People did try many simplification tests earlier (mostly 40,000-game tests), but it was later shown that this is an inferior way to test simplifications, as the probability of a false positive passing is quite high. It became a concern, and so SPRT(-3, 1) was adopted to minimize the risk of regressions.

As explained by Jesus, all simplification patches are about removing redundant stuff that bogs down or gets in the way of other search and eval terms, so it is always good to have a simplified code base. Simplifications and generalizations are good in programming because they make the code easier to understand and debug, but it does not mean the SF team is ignoring adding features and knowledge.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Interpretation request.

Post by Lyudmil Tsvetkov »

arjuntemurnikar wrote:
Lyudmil Tsvetkov wrote: Now, the problem for me is not simplification in itself, but that simplification is becoming a trend. I have never seen so many simplification patches in the whole framework history, and believe me, I have read all of it. Currently, simplification patches far exceed knowledge-adding patches in number, and throughout SF framework development that has never happened. So this seems like a trend.

I would very much like to see another 20 elo increase just from simplifications, and then another SMP test. I guess the SMP loss will rise significantly then, but who knows?
That is because the SPRT parameters for simplification tests weren't defined before. People did try many simplification tests earlier (mostly 40,000-game tests), but it was later shown that this is an inferior way to test simplifications, as the probability of a false positive passing is quite high. It became a concern, and so SPRT(-3, 1) was adopted to minimize the risk of regressions.

As explained by Jesus, all simplification patches are about removing redundant stuff that bogs down or gets in the way of other search and eval terms, so it is always good to have a simplified code base. Simplifications and generalizations are good in programming because they make the code easier to understand and debug, but it does not mean the SF team is ignoring adding features and knowledge.
OK, I like this last one. :D

Still, I stand by my point that simplifications scale worse than knowledge patches at LTC.

Now that we are here, Arjun, and since I depend entirely upon you for submitting tests, could you please schedule the more successful blocked-pawn patch for a standard SPRT? I guess this would be the 40cps bonus. It would be interesting to see the difference in performance at the two TC.

Many thanks in advance, Arjun!
arjuntemurnikar
Posts: 204
Joined: Tue Oct 15, 2013 10:22 pm
Location: Singapore

Re: Interpretation request.

Post by arjuntemurnikar »

Lyudmil Tsvetkov wrote: Still, I stand by my point that simplifications scale worse than knowledge patches at LTC.
I don't think so. There were other functional changes in the period you describe too. Why do you single out the simplifications as candidates for bad scalability? All simplifications committed in recent times have passed both STC and LTC, so I see no reason why they should be less scalable than standard patches that also pass STC and LTC. Anyway, you are again not taking into account that as TC/depth increases, and as the gap in relative strength widens, the apparent elo difference curve flattens out naturally, so what you are seeing are ghosts. You may also cross-check this with the regression tests before the SF DD release. Each consecutive regression test against master seemed to gain less and less elo, especially in the later stages (~40-65 elo), and that is because of what I explained. It is only natural.
Lyudmil Tsvetkov wrote: Now that we are here, Arjun, and I depend entirely upon you for submitting tests, could you please schedule the more successful blocked pawn patch for standard SPRT? I guess this would be the 40cps bonus. It would be interesting to see the difference in performance at both TC.

Many thanks in advance, Arjun!
It doesn't look to me like the test did well. 60,60 did better than 40,40 (which failed quite clearly), but it was still neutral. Perhaps I might try increasing it further to 80,80 or 100,100 and see if there is improvement. After that, if it doesn't work, I think it would be good to give the idea a rest for a while.

Cheers!