OliThink 5.4.0 has been published with a big leap in strength for only 3 lines of code

OliverBr · Post by **OliverBr** » Sun Aug 30, 2020 9:40 pm

Kieren Pearson wrote (Sun Aug 30, 2020 4:03 pm): Do you use an opening book for your tournaments?

Of course. An opening book is a must to get variety.

I use 3moves books with thousands of openings. I ran some tests with other books and found that, with just 100 games, the result is mostly luck. That's why the error is so high.

Before I rented this 32-core computer, I never ran more than 100 games (it took too much time). Now I know that I need at least 2000 games, better 5000, to reach a safe conclusion.
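To put rough numbers on this, here is a minimal Python sketch using a normal approximation for two engines of about equal strength (expected score near 50%). The 35% draw rate is an assumption, picked to be close to the draw rates in the OliThink self-play matches in this thread, and `games_needed` / `elo_margin` are hypothetical helper names, not part of any tool mentioned here.

```python
import math

def games_needed(elo_margin: float, draw_rate: float, z: float = 1.96) -> int:
    """Approximate games for a +/- elo_margin confidence interval between
    two roughly equal engines (expected score near 0.5). Assumed model."""
    # Per-game variance of the result (1, 0.5, 0) at an expected score of 0.5:
    # E[x^2] - 0.25 = (0.5 - draw_rate / 4) - 0.25 = (1 - draw_rate) / 4
    sd = math.sqrt((1.0 - draw_rate) / 4.0)
    # Slope of the logistic Elo curve at 50%: d(score)/d(elo) = ln(10) / 1600
    score_margin = elo_margin * math.log(10) / 1600.0
    return math.ceil((z * sd / score_margin) ** 2)

def elo_margin(games: int, draw_rate: float, z: float = 1.96) -> float:
    """Inverse of games_needed: approximate +/- Elo margin after `games` games."""
    sd = math.sqrt((1.0 - draw_rate) / 4.0)
    return z * sd / math.sqrt(games) * 1600.0 / math.log(10)

# Margin shrinks only with the square root of the number of games
for n in (100, 1000, 2000, 5000):
    print(n, round(elo_margin(n, draw_rate=0.35), 1))
```

Under these assumptions, 100 games only pin a change down to roughly ±55 Elo, while resolving ±10 Elo takes about 3000 games, which fits the 2000 to 5000 games mentioned above.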

Dann Corbit · Post by **Dann Corbit** » Mon Aug 31, 2020 3:38 am

I have seen +100 Elo in the early part of a contest (200-300 games) for a change whose final net Elo change was +5.
I think when we make a change and see a big jump, we get very excited. That makes these starts stand out, and we remember them more than the ones that do not start that way. So it's partly a psychological factor.

When I first made smooth scaling null move pruning for Stockfish, after 100 games it was showing some incredible number like +150 Elo, but it settled down to a lot less than that.

Naturally, I jumped on the internet to tell everyone of this incredible advance after I saw the early results. It turned out to be a lot more ho-hum than I had hoped for.

So when I see a change that looks tremendous early, I always try to temper my expectations.

I tried an eval hash for Hakkapeliitta, which was even worse. It started at +50 Elo and ended at zero. That was disappointing.

OliverBr · Post by **OliverBr** » Mon Aug 31, 2020 6:51 am

Dann Corbit wrote (Mon Aug 31, 2020 3:38 am): I have seen +100 Elo in the early part of a contest (200-300 games) for a change with a net Elo change of +5.
I think when we make a change and see a big jump, we get very excited.
...
Naturally, I jumped on the internet to tell everyone of this incredible advance after I saw the early results. It turned out to be a lot more ho-hum than I had hoped for.

Very true. Look at this thread's title!
"OliThink 5.4.0... big leap... blabla". Now I know that the ELO actually went down by 9 with that version (see the table below).

I finished the intra-OliThink tournaments for version 5, and you can see that this happened quite often. But it has not happened again since 5.6.4, which is when I got the test server.

Version	Diff	ELO
5.0.0		1644
5.0.1	202	1846
5.0.2	2	1848
5.0.3	5	1853
5.0.4	-43	1810
5.0.5	8	1818
5.0.6	126	1944
5.0.7	17	1961
5.0.8	109	2070
5.0.9	21	2091
5.1.0	55	2146
5.1.1	52	2198
5.1.2	19	2217
5.1.3	-34	2183
5.1.4	41	2224
5.1.5	-12	2212
5.1.6	50	2262
5.1.7	-25	2237
5.1.8	-14	2223
5.1.9	-2	2221
5.2.0	0	2221
5.2.1	22	2243
5.2.2	-12	2231
5.2.3	19	2250
5.2.4	-20	2230
5.2.5	5	2235
5.2.6	60	2295
5.2.7	-12	2283
5.2.8	49	2332
5.2.9	24	2356
5.3.0	-18	2338
5.3.1	8	2346
5.3.2	26	2372
5.3.3	0	2372
5.3.4	13	2385
5.3.5	-6	2379
5.4.0	-9	2370
5.4.1	30	2400
5.4.2	12	2412
5.4.3	34	2446
5.4.4	-21	2425
5.4.5	15	2440
5.4.6	-12	2428
5.4.7	9	2437
5.4.8	24	2461
5.4.9	-33	2428
5.4.10	2	2430
5.4.11	9	2439
5.4.12	11	2450
5.4.13	-29	2421
5.5.0	42	2463
5.5.1	-1	2462
5.5.2	11	2473
5.5.3	-8	2465
5.5.4	-9	2456
5.5.5	20	2476
5.5.6	-2	2474
5.5.7	4	2478
5.5.8	4	2482
5.5.9	29	2511
5.6.0	4	2515
5.6.1	9	2524
5.6.2	-2	2522
5.6.3	-2	2520
5.6.4	-1	2519
5.6.5	29	2548
5.6.6	39	2587
5.6.7	14	2601
5.6.8	16	2617
5.6.9	22	2639

The ELO ratings are calibrated against version 5.3.2, which is set to 2372.
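Each ELO value in the table is the previous one plus the diff, so the column can be rebuilt from the diffs and the 5.3.2 anchor. A small Python sketch over an excerpt of the table (5.3.2 to 5.4.3); `ratings_from_diffs` is a hypothetical helper, not part of any OliThink tooling.

```python
from itertools import accumulate

# Excerpt of the table: each version and its diff to the previous version.
# The anchor row (5.3.2) gets a diff of 0 so it keeps its calibrated value.
versions = ["5.3.2", "5.3.3", "5.3.4", "5.3.5", "5.4.0", "5.4.1", "5.4.2", "5.4.3"]
diffs = [0, 0, 13, -6, -9, 30, 12, 34]
ANCHOR_ELO = 2372  # 5.3.2 is calibrated at 2372

def ratings_from_diffs(anchor, diffs):
    """Running sum of the diffs, shifted by the anchor rating."""
    return [anchor + total for total in accumulate(diffs)]

ratings = ratings_from_diffs(ANCHOR_ELO, diffs)
regressions = [v for v, d in zip(versions, diffs) if d < 0]

for v, d, r in zip(versions, diffs, ratings):
    print(f"{v:8}{d:+5d}{r:6d}")
print("Regressions:", regressions)
```

The rebuilt values match the table, and 5.4.0 shows up as a regression (-9).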

jonkr · Post by **jonkr** » Mon Aug 31, 2020 6:52 am

When I initially started testing changes in games for Elo gain, I was just playing around 1000 or 2000 games, sometimes another couple thousand to verify if the Elo difference was small, and that was enough to accumulate Elo-gaining changes and make good progress. Possibly a lot of the changes weren't actually winners, but there was more good than bad, so progress was quick.

Now, though, it's a different story. If I get +8 Elo on a change after 2000 games, then +4 Elo after 4000 games, it feels likely to be an even or negative change, and I would have to play even more games (and maybe try different time controls, like the standard LTC test, or a different opening set) to have confidence in it. I think a big part of the reason is that originally there was nothing but intuition or a couple of test positions behind why the code was the way it was. Now that there's in-game testing behind a lot of it, getting lucky is not so likely: the current code tested well the way it is at least once, maybe more, so something different is more likely to be luck and less likely to be actually good.

On the subject of opening books, I do find that even though there can be some randomness in the moves programs choose, going through the same section of the test book often gives similar results. Sometimes the first part of the test book starts to look the strongest, because while I'm working I don't always run full tests with the same number of games for every change. (Possibly because the engine is actually better at those openings, possibly just because values got arranged so that what amounts to random luck in move choice leads to better results.)

Graham Banks · Post by **Graham Banks** » Mon Aug 31, 2020 7:07 am

OliverBr wrote (Sun Aug 30, 2020 11:52 am): 5.6.9 has been released. It gains about 25 ELO over 5.6.8, so OliThink finally has more than 2600 ELO.

Gauntlet running in TLCV.

If you install TLCV (Tom's Live Chess Viewer) on your computer, you can watch the games live move by move.
download/file.php?id=42959
Host - GrahamCCRL.dyndns.org Port - 16091

Linux users can use Livius:
https://github.com/kmar/livius

There is also a Windows version of Livius.
It has live PV boards as a nice addition.
http://www.crabaware.com/livius/

OliverBr · Post by **OliverBr** » Tue Sep 01, 2020 2:27 pm

It's really amazing. Even 900 games are not enough, by a long shot.

After about 900 games (908), it's +36 ELO:

   # PLAYER            :  RATING  ERROR  POINTS  PLAYED   (%)    W    D    L  D(%)  CFS(%)
   1 OliThink 5.7.0    :      36     18   497.0     908  54.7  349  296  263  32.6     100
   2 OliThink 5.6.9    :       0   ----   411.0     908  45.3  263  296  349  32.6     ---

White advantage = 95.02 +/- 8.84
Draw rate (equal opponents) = 34.71 % +/- 1.60
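For a rough cross-check of these numbers, the score can be converted to an Elo difference with the logistic formula, with a 95% interval taken from the per-game score variance. This is a minimal Python sketch under a normal approximation; it ignores the white-advantage and draw modelling of the rating tool used here (the output format appears to be Ordo's), so it will not reproduce those columns exactly. `elo_with_ci` is a hypothetical name.

```python
import math

def score_to_elo(score: float) -> float:
    """Logistic Elo difference for a score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_ci(wins: int, draws: int, losses: int, z: float = 1.96):
    """Elo difference plus a normal-approximation confidence interval."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score
    mean_sq = (wins + 0.25 * draws) / n
    variance = mean_sq - score * score
    stderr = math.sqrt(variance / n)
    low = score_to_elo(score - z * stderr)
    high = score_to_elo(score + z * stderr)
    return score_to_elo(score), low, high

# OliThink 5.7.0 vs 5.6.9 (W, D, L from 5.7.0's point of view)
print(elo_with_ci(349, 296, 263))     # after 908 games
print(elo_with_ci(1261, 1367, 1182))  # after 3810 games
```

With these assumptions, the 908-game result comes out at about +33 Elo with an interval of roughly +14 to +52, while the 3810-game result is about +7 with an interval that still includes zero.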

Actually, it's only +8 ELO. It took over 3000 games to arrive at the correct rating:

   # PLAYER            :  RATING  ERROR  POINTS  PLAYED   (%)     W     D     L  D(%)  CFS(%)
   1 OliThink 5.7.0    :       8      9  1944.5    3810  51.0  1261  1367  1182  35.9      96
   2 OliThink 5.6.9    :       0   ----  1865.5    3810  49.0  1182  1367  1261  35.9     ---

White advantage = 77.93 +/- 4.66
Draw rate (equal opponents) = 37.33 % +/- 0.82
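The CFS(%) column is the rating tool's confidence that a player is stronger than the next one in the list. For a head-to-head match, a common back-of-the-envelope substitute is the likelihood of superiority (LOS), computed from wins and losses only under a normal approximation. It is not the tool's exact method (the output format appears to be Ordo's), so it lands near, not on, the CFS values. A minimal Python sketch with a hypothetical function name:

```python
import math

def likelihood_of_superiority(wins: int, losses: int) -> float:
    """LOS from a head-to-head match; draws do not enter this approximation."""
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

# OliThink 5.7.0 vs 5.6.9, from 5.7.0's point of view
print(f"908 games:  LOS = {likelihood_of_superiority(349, 263):.2%}")
print(f"3810 games: LOS = {likelihood_of_superiority(1261, 1182):.2%}")
```

For the 3810-game match this gives roughly 94.5%, close to the 96 in the CFS(%) column.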

8 is the correct value, as another tournament with Fruit 2.1 shows:

   # PLAYER            :  RATING  ERROR  POINTS  PLAYED   (%)     W     D     L  D(%)  CFS(%)
   1 Fruit 2.1         :      68     10  2566.5    4357  58.9  2165   803  1389  18.4     100
   2 OliThink 5.6.7    :       8     10  2022.0    4359  46.4  1425  1194  1740  27.4      94
   3 OliThink 5.6.9    :       0   ----  1948.5    4358  44.7  1338  1221  1799  28.0     ---

White advantage = 29.76 +/- 3.74
Draw rate (equal opponents) = 25.09 % +/- 0.55

OliverBr · Post by **OliverBr** » Thu Sep 03, 2020 1:13 am

Unfortunately, I had to remove 5.7.0 because of a mini-bug. It only affected underpromotions of pinned pawns, which are extremely rare.
Still, a bug is a bug, so I have released 5.7.1, together with some changes to time management.

OliverBr · Post by **OliverBr** » Sun Sep 06, 2020 2:49 pm

Dann Corbit wrote (Fri Sep 04, 2020 12:30 am):
OliverBr wrote (Thu Sep 03, 2020 9:53 pm): I included the header and released OliThink 5.7.2. Everything is smaller now: code size (45k, 1615 lines), executable size (90kb), memory/cache usage and time usage.
Here is a mini tourney against friends.
   # PLAYER                   :  RATING  ERROR  POINTS  PLAYED   (%)    W    D    L  D(%)  CFS(%)
   1 Fruit 2.1                :      37     29   276.0     500  55.2  230   92  178  18.4      57
   2 Scorpio_2.8.8 MCTS+NN    :      33     28   273.5     500  54.7  236   75  189  15.0      91
   3 K2 v.087                 :       4     29   252.5     500  50.5  205   95  200  19.0      60
   4 OliThink 5.7.2           :       0   ----  1008.0    2000  50.4  837  342  821  17.1     100
   5 Arasan 11.7              :     -86     28   190.0     500  38.0  150   80  270  16.0     ---
The first time ever that OliThink won more games than it lost in this round.
Looks like you are about to scale K2.

5.7.3 finally passed K2 (at least in this tournament):

   # PLAYER                   :  RATING  ERROR  POINTS  PLAYED   (%)    W    D    L  D(%)  CFS(%)
   1 Scorpio_2.8.8 MCTS+NN    :      37     28   276.5     500  55.3  239   75  186  15.0      87
   2 Fruit 2.1                :      14     29   260.0     500  52.0  215   90  195  18.0      83
   3 OliThink 5.7.3           :       0   ----  1040.5    2000  52.0  856  369  775  18.4      82
   4 K2 v.087                 :     -13     28   240.5     500  48.1  188  105  207  21.0     100
   5 Arasan 11.7              :     -97     28   182.5     500  36.5  133   99  268  19.8     ---

White advantage = 3.05 +/- 7.16
Draw rate (equal opponents) = 18.71 % +/- 0.87

Compared with the results of 5.7.2, the change against each opponent is:

Scorpio	-4
Fruit	+23
K2	+17
Arasan	+11
============
Average	+11.75

Head-to-head against 5.7.2, it's an ELO gain of +13.
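The per-opponent numbers are the change in each opponent's rating relative to OliThink, which is anchored at 0 in both tournaments (rating in the 5.7.2 table minus rating in the 5.7.3 table). A short Python sketch that reproduces them from the two tables:

```python
# Opponent ratings relative to OliThink (anchored at 0) in the two mini tourneys
vs_572 = {"Scorpio": 33, "Fruit": 37, "K2": 4, "Arasan": -86}
vs_573 = {"Scorpio": 37, "Fruit": 14, "K2": -13, "Arasan": -97}

# A positive value means OliThink gained ground on that opponent
gains = {name: vs_572[name] - vs_573[name] for name in vs_572}
average = sum(gains.values()) / len(gains)

for name, gain in gains.items():
    print(f"{name:8}{gain:+d}")
print(f"Average {average:+.2f}")
```

Each opponent played only 500 games, so the individual entries carry ERROR values of 28 to 29 Elo in the tables.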

Dann Corbit · Post by **Dann Corbit** » Sun Sep 06, 2020 7:27 pm

The mighty Fruit, former world champion, is next.

OliverBr · Post by **OliverBr** » Tue Sep 08, 2020 1:18 am

OliThink 5.7.4 has been published and it is a Leela killer.
Seriously, it actually beat Leela in this game:

[pgn]
[Event "Computer Chess Game"]
[Site "Olivers-MacBook-2.local"]
[Date "2020.09.08"]
[Round "-"]
[White "OliThink 5.7.4"]
[Black "Lc0 v0.25.1+git.unknown"]
[Result "1-0"]
[TimeControl "40/30"]
[Annotator "4. +0.27 1... -0.12"]

1. e4 e5 {-0.12/3 0.5} 2. Bc4 Nf6 {+0.00/3 0.5} 3. d3 c6 {+0.01/6 0.5} 4.
Qe2 {+0.27/11 0.4} d5 {+0.18/3 0.5} 5. exd5 {+0.33/13 0.5} cxd5
{+0.09/7 0.5} 6. Bb5+ {+0.44/12 0.1} Nc6 {+0.06/9 0.5} 7. Qxe5+
{+0.30/11 0.4} Be7 {+0.09/10 0.5} 8. Bxc6+ {+0.35/11 0.4} bxc6
{+0.10/8 0.6} 9. Nf3 {+0.26/13 0.4} O-O {+0.08/7 0.4} 10. O-O
{+0.35/13 0.2} Re8 {+0.10/8 0.7} 11. Bf4 {+0.45/13 0.5} Bg4 {+0.12/5 0.6}
12. Nbd2 {+0.29/14 0.2} Nh5 {+0.08/5 0.5} 13. Qc7 {+0.37/14 0.5} Nxf4
{-0.17/5 0.5} 14. Qxf4 Be6 {-0.09/5 0.5} 15. Rfe1 {+0.35/13 0.1} Bd6
{-0.14/4 0.7} 16. Qa4 {+0.34/13 0.2} Qb6 {-0.18/4 0.5} 17. Nb3
{+0.29/12 0.2} a5 {-0.11/4 0.5} 18. c3 {+0.41/13 1.1} h6 {-0.15/5 0.5} 19.
Nfd4 {+0.80/13 0.5} Bd7 {+0.09/3 0.5} 20. Rxe8+ {+0.25/14 0.7} Bxe8
{+0.01/6 0.4} 21. Re1 {+0.41/14 0.4} Bd7 {-0.01/5 0.5} 22. Nf3
{+0.25/15 0.5} Qc7 {-0.03/6 0.4} 23. c4 {+0.29/13} Bb4 {+0.25/5 1.1} 24.
Nbd2 {+0.22/13 0.4} dxc4 {-0.11/4 0.5} 25. dxc4 {+0.35/14 0.5} Rd8
{-0.09/4 0.5} 26. a3 {+0.43/16 0.9} Bf8 {-0.15/7 0.5} 27. Qc2
{+0.46/12 0.5} c5 {-0.14/4 0.5} 28. b3 {+0.69/13 0.5} a4 {-0.06/3 0.4} 29.
bxa4 {+0.45/14 0.4} Qa5 {-0.04/5 0.4} 30. Ne4 {+0.29/13 0.1} Bxa4
{-0.04/5 0.7} 31. Qc3 {+0.05/14 0.2} Qa6 {-0.09/6 0.4} 32. Ne5
{+0.42/12 0.7} Rd4 {-0.10/4 0.4} 33. Nd2 {+0.32/13 0.5} Qd6 {-0.10/5 0.4}
34. Ndf3 {+0.20/15 0.8} Rd1 {-0.12/7 0.4} 35. h3 {+0.18/14 0.6} Be7
{-0.20/5 0.4} 36. Qe3 {+0.25/14 0.7} Bd8 {-0.20/5 0.4} 37. Rxd1
{+0.16/15 0.9} Qxd1+ {-0.24/7 0.3} 38. Kh2 {+0.19/16 0.8} Qd6 {-0.28/6 0.5}
39. g3 {+0.14/15 0.4} Bc7 {-0.29/5 0.4} 40. Qe4 {+0.47/15 1.2} Qe6
{-0.32/6 0.4} 41. Qa8+ {+0.30/14 0.4} Be8 {-0.10/7 1.1} 42. Nd3
{+0.14/17 0.2} Kf8 {-0.15/7 1.2} 43. Nxc5 {+0.48/13 0.4} Qxc4 {+0.00/8 1.4}
44. Ne5 {+0.31/13 0.6} Qe2 {-0.16/5 2.0} 45. Ned3 {+0.45/14 0.5} Bd6
{-0.18/6 1.2} 46. Qd5 {+0.38/17 0.2} Be7 {-0.35/6 1.3} 47. a4
{+0.79/14 0.5} Bxc5 {-0.40/7 1.2} 48. Qxc5+ {+0.47/15 0.5} Kg8
{-0.37/10 0.7} 49. Qc8 {+0.77/17 0.6} f6 {-0.04/7 1.0} 50. a5
{+0.85/13 0.4} Kh7 {+0.00/8 0.8} 51. Qf5+ {+2.05/15 0.5} Bg6 {+0.13/7 1.1}
52. Qxg6+ {+6.25/14 0.4} Kxg6 {-71.22/5 1.1} 53. Nf4+ {+14.36/15 1.1} Kf5
{-83.80/5 0.7} 54. Nxe2 {+14.80/16 0.3} Kg6 {-77.89/4 0.7} 55. a6
{+15.01/12 1.4} f5 {-78.92/1 0.6} 56. a7 {+16.33/11 0.9} Kf7 {-89.93/4 0.8}
57. a8=Q {+16.52/10 1.2} h5 {-118.45/1 0.4} 58. Nf4 {+16.70/9 0.4} g5
{-108.12/2 0.5} 59. Qd5+ {+1000.06/11 0.1} Kg7 {-104.17/2 0.6} 60. Nxh5+
{+1000.05/9} Kh6 {-121.86/1 0.2} 61. Qxf5 {+1000.03/5} g4 {-25.27/3 0.3}
62. Qxg4 {+1000.02/3} Kh7 {+0.00/1 0.1} 63. Qg7# {+1000.01/1 0.1}
{Xboard adjudication: Checkmate} 1-0
[/pgn]

OK, it was a fast game, and Leela didn't have a GeForce 1080 Ti this time, but she still had a decent AMD GPU, an RX 560.
One position was her downfall, at move 51 with White to move:

FEN: 2Q1b3/6pk/5p1p/P7/8/3N2PP/4qP1K/8 w - - 1 51

In post-game analysis, Leela insists for a long time on White playing a6, while the clear (and very obvious) winner is Qf5+. She needs up to 1 minute of thinking time to see this.
Is this Leela trolling again? Give away the knight without any need?!

PS: 51...Bg6??? Oh, she is defending so badly.
