Re: OliThink 5.4.0 has been published with an big leap in strength for only 3 lines of code
Posted: Sun Aug 30, 2020 9:40 pm
by OliverBr
Kieren Pearson wrote: ↑Sun Aug 30, 2020 4:03 pm
Do you use an opening book for your tournaments?
Of course. It's a must to use an opening book to get variety.
I am using 3moves books with thousands of openings. I ran some tests with other books and found out that, with just 100 games, the result is mostly luck. That's why the error is so high.
Before I rented this 32-core computer, I never ran more than 100 games (it took too much time). Now I know that I need at least 2000 games, better 5000 games, to get a safe conclusion.
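To put rough numbers on why 100 games is mostly luck, here is a minimal sketch of the statistical error as a function of the number of games. It assumes two equally strong engines, independent games, and a draw rate of 33% (an assumed value); the function names are made up for this illustration and are not from any rating tool.

```python
import math


def elo_from_score(score):
    """Convert an expected score (0 < score < 1) into an Elo difference."""
    return 400.0 * math.log10(score / (1.0 - score))


def elo_margin(n_games, draw_rate=0.33, z=1.96):
    """Approximate 95% half-width (in Elo) for a match between equal engines.

    Assumptions: independent games, true score of 50%, given draw rate.
    """
    # Each game scores 1, 0.5 or 0. Around a mean of 0.5, wins and losses
    # deviate by 0.5 and draws by 0, so only decisive games add variance.
    variance = (1.0 - draw_rate) * 0.25
    std_error = math.sqrt(variance / n_games)
    # Translate "score is 50% plus z standard errors" into Elo.
    return elo_from_score(0.5 + z * std_error)


for n in (100, 900, 2000, 5000):
    print(f"{n:5d} games: about +/- {elo_margin(n):.0f} Elo")
```

The margin shrinks only with the square root of the number of games, so cutting the error in half takes about four times as many games.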
Re: OliThink 5.4.0 has been published with an big leap in strength for only 3 lines of code
Posted: Mon Aug 31, 2020 3:38 am
by Dann Corbit
I have seen +100 Elo in the early part of a contest (200-300 games) for a change with a net Elo change of +5.
I think when we make a change and see a big jump, we get very excited. That makes these starts stand out and we remember them more than the ones that do not start that way. So kind of a psychological factor.
When I first made smooth scaling null move pruning for Stockfish, after 100 games it was some incredible thing like +150 Elo, but it settled down to a lot less than that.
Naturally, I jumped on the internet to tell everyone of this incredible advance after I saw the early results. It turned out to be a lot more ho-hum than I had hoped for.
So when I see a change that looks tremendous early, I always try to temper my expectations.
I tried an eval hash for Hakkapeliitta which was even worse. It started at +50 Elo, and ended at zero. That was disappointing.
Re: OliThink 5.4.0 has been published with an big leap in strength for only 3 lines of code
Posted: Mon Aug 31, 2020 6:51 am
by OliverBr
Dann Corbit wrote: ↑Mon Aug 31, 2020 3:38 am
I have seen +100 Elo in the early part of a contest (200-300 games) for a change with a net Elo change of +5.
I think when we make a change and see a big jump, we get very excited.
...
Naturally, I jumped on the internet to tell everyone of this incredible advance after I saw the early results. It turned out to be a lot more ho-hum than I had hoped for.
Very true. Look at this thread!
"OliThink 5.4.0... big leap...blabla"... Actually now I know that the ELO went down by 9 with that version. (see table below)
I finished the Intra-OliThink-tourneys for Version 5 and you can see that this happened quite often. But not again since 5.6.4. That's when I got the test server.
Code: Select all
OliThink diff ELO
5.0.0 1644
5.0.1 202 1846
5.0.2 2 1848
5.0.3 5 1853
5.0.4 -43 1810
5.0.5 8 1818
5.0.6 126 1944
5.0.7 17 1961
5.0.8 109 2070
5.0.9 21 2091
5.1.0 55 2146
5.1.1 52 2198
5.1.2 19 2217
5.1.3 -34 2183
5.1.4 41 2224
5.1.5 -12 2212
5.1.6 50 2262
5.1.7 -25 2237
5.1.8 -14 2223
5.1.9 -2 2221
5.2.0 0 2221
5.2.1 22 2243
5.2.2 -12 2231
5.2.3 19 2250
5.2.4 -20 2230
5.2.5 5 2235
5.2.6 60 2295
5.2.7 -12 2283
5.2.8 49 2332
5.2.9 24 2356
5.3.0 -18 2338
5.3.1 8 2346
5.3.2 26 2372
5.3.3 0 2372
5.3.4 13 2385
5.3.5 -6 2379
5.4.0 -9 2370
5.4.1 30 2400
5.4.2 12 2412
5.4.3 34 2446
5.4.4 -21 2425
5.4.5 15 2440
5.4.6 -12 2428
5.4.7 9 2437
5.4.8 24 2461
5.4.9 -33 2428
5.4.10 2 2430
5.4.11 9 2439
5.4.12 11 2450
5.4.13 -29 2421
5.5.0 42 2463
5.5.1 -1 2462
5.5.2 11 2473
5.5.3 -8 2465
5.5.4 -9 2456
5.5.5 20 2476
5.5.6 -2 2474
5.5.7 4 2478
5.5.8 4 2482
5.5.9 29 2511
5.6.0 4 2515
5.6.1 9 2524
5.6.2 -2 2522
5.6.3 -2 2520
5.6.4 -1 2519
5.6.5 29 2548
5.6.6 39 2587
5.6.7 14 2601
5.6.8 16 2617
5.6.9 22 2639
The ELOs are calibrated against Version 5.3.2 with an ELO of 2372.
Re: OliThink 5.4.0 has been published with an big leap in strength for only 3 lines of code
Posted: Mon Aug 31, 2020 6:52 am
by jonkr
When I initially started testing changes in games for elo gain, I was just playing around 1000 or 2000 games, and if the elo difference was small sometimes another couple thousand to verify, and that was enough to accumulate elo gaining changes and make good progress. Possibly a lot of the changes weren't actually winners, but it was more good than bad, so progress was quick.
Now though it's a different story... if I get +8 elo on a change after 2000 games, then +4 elo after 4000 games, it feels like it's likely an even or negative change, and I would have to play even more games (and maybe try different time controls like the standard LTC test, or a different opening set) to have confidence in it. I think a big part of the reason for this is that originally there was nothing but intuition or a couple of test positions behind why the code was the way it was. Now that there's in-game testing behind a lot of it, getting lucky is not so likely... It tested well the way it is at least once, maybe more, so something different is more likely to be luck and less likely to be actually good.
On the subject of opening books, I do find that even though there can be some randomness in the moves programs choose, going through the same section of the test book often gives similar results. Sometimes the first parts of the test book can start becoming the strongest because I'm not always running full tests with the same amount of games for all changes while I'm working. (Possibly from being actually better at those openings, possibly just from arranging values so that what amounts to random luck in move choice is leading to better results.)
Re: OliThink 5.4.0 has been published with an big leap in strength for only 3 lines of code
Posted: Mon Aug 31, 2020 7:07 am
by Graham Banks
OliverBr wrote: ↑Sun Aug 30, 2020 11:52 am
5.6.9 has been released. It gains about 25 ELO to 5.6.8, thus OliThink finally has more than ELO 2600.
Gauntlet running in TLCV.
If you install TLCV (Tom's Live Chess Viewer) on your computer, you can watch the games live move by move.
Host - GrahamCCRL.dyndns.org Port - 16091
Linux users can use Livius:
https://github.com/kmar/livius
There is also a Livius windows version.
It has live pv boards as a nice addition.
http://www.crabaware.com/livius/
Re: OliThink 5.4.0 has been published with an big leap in strength for only 3 lines of code
Posted: Tue Sep 01, 2020 2:27 pm
by OliverBr
It's really amazing. Even 900 games are not enough by a long shot.
It's +36 ELO after 900:
Code: Select all
# PLAYER : RATING ERROR POINTS PLAYED (%) W D L D(%) CFS(%)
1 OliThink 5.7.0 : 36 18 497.0 908 54.7 349 296 263 32.6 100
2 OliThink 5.6.9 : 0 ---- 411.0 908 45.3 263 296 349 32.6 ---
White advantage = 95.02 +/- 8.84
Draw rate (equal opponents) = 34.71 % +/- 1.60
Actually, it's only +8 ELO. It needed over 3000 games to arrive at the correct rating:
Code: Select all
# PLAYER : RATING ERROR POINTS PLAYED (%) W D L D(%) CFS(%)
1 OliThink 5.7.0 : 8 9 1944.5 3810 51.0 1261 1367 1182 35.9 96
2 OliThink 5.6.9 : 0 ---- 1865.5 3810 49.0 1182 1367 1261 35.9 ---
White advantage = 77.93 +/- 4.66
Draw rate (equal opponents) = 37.33 % +/- 0.82
8 is the correct value, as another tournament with Fruit 2.1 is showing:
Code: Select all
# PLAYER : RATING ERROR POINTS PLAYED (%) W D L D(%) CFS(%)
1 Fruit 2.1 : 68 10 2566.5 4357 58.9 2165 803 1389 18.4 100
2 OliThink 5.6.7 : 8 10 2022.0 4359 46.4 1425 1194 1740 27.4 94
3 OliThink 5.6.9 : 0 ---- 1948.5 4358 44.7 1338 1221 1799 28.0 ---
White advantage = 29.76 +/- 3.74
Draw rate (equal opponents) = 25.09 % +/- 0.55
Re: OliThink 5.4.0 has been published with an big leap in strength for only 3 lines of code
Posted: Thu Sep 03, 2020 1:13 am
by OliverBr
Unfortunately I had to remove 5.7.0 because of a mini-bug. It only affected pinned-pawn-under-promotions, which are extremely rare.
Still, a bug is a bug, so I have released 5.7.1 together with some change in time management.
Re: OliThink 5.4.0 has been published with an big leap in strength for only 3 lines of code
Posted: Sun Sep 06, 2020 2:49 pm
by OliverBr
Dann Corbit wrote: ↑Fri Sep 04, 2020 12:30 am
OliverBr wrote: ↑Thu Sep 03, 2020 9:53 pm
I included the header and released OliThink 5.7.2. Everything is smaller now, code size (45k, 1615 lines), executable size (90kb), memory/cache usage, time usage.
Here is a mini tourney against friends.
Code: Select all
# PLAYER : RATING ERROR POINTS PLAYED (%) W D L D(%) CFS(%)
1 Fruit 2.1 : 37 29 276.0 500 55.2 230 92 178 18.4 57
2 Scorpio_2.8.8 MCTS+NN : 33 28 273.5 500 54.7 236 75 189 15.0 91
3 K2 v.087 : 4 29 252.5 500 50.5 205 95 200 19.0 60
4 OliThink 5.7.2 : 0 ---- 1008.0 2000 50.4 837 342 821 17.1 100
5 Arasan 11.7 : -86 28 190.0 500 38.0 150 80 270 16.0 ---
The first time ever OliThink won more games than it lost in this round.
Looks like you are about to scale K2
5.7.3 finally got K2 (at least in this tournament):
Code: Select all
# PLAYER : RATING ERROR POINTS PLAYED (%) W D L D(%) CFS(%)
1 Scorpio_2.8.8 MCTS+NN : 37 28 276.5 500 55.3 239 75 186 15.0 87
2 Fruit 2.1 : 14 29 260.0 500 52.0 215 90 195 18.0 83
3 OliThink 5.7.3 : 0 ---- 1040.5 2000 52.0 856 369 775 18.4 82
4 K2 v.087 : -13 28 240.5 500 48.1 188 105 207 21.0 100
5 Arasan 11.7 : -97 28 182.5 500 36.5 133 99 268 19.8 ---
White advantage = 3.05 +/- 7.16
Draw rate (equal opponents) = 18.71 % +/- 0.87
Comparing with the result of 5.7.2:
Code: Select all
Scorpio -4
Fruit +23
K2 +17
Arasan +11
============
Average +11.75
Playing h2h against 5.7.2 it's an ELO gain of +13.
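The per-opponent comparison above can be reproduced from the two gauntlet tables. Both tables fix OliThink at 0, so a drop in an opponent's rating between the 5.7.2 and 5.7.3 runs counts as a gain for OliThink. This is a small sketch of that arithmetic; the dictionary names are made up for the example.

```python
# Opponent ratings relative to OliThink (= 0), copied from the two tables.
vs_572 = {"Scorpio": 33, "Fruit": 37, "K2": 4, "Arasan": -86}
vs_573 = {"Scorpio": 37, "Fruit": 14, "K2": -13, "Arasan": -97}

# An opponent rated lower against 5.7.3 means OliThink gained against it.
gains = {name: vs_572[name] - vs_573[name] for name in vs_572}
average = sum(gains.values()) / len(gains)

for name, gain in gains.items():
    print(f"{name:8s} {gain:+d}")
print(f"Average  {average:+.2f}")
```

Each of the four ratings carries an error of about 28 to 29 Elo in the tables, so the individual differences are noisy even though their average agrees with the head-to-head result.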
Re: OliThink 5.4.0 has been published with an big leap in strength for only 3 lines of code
Posted: Sun Sep 06, 2020 7:27 pm
by Dann Corbit
The mighty Fruit, former world champion, is next.
Re: OliThink 5.4.0 has been published with an big leap in strength for only 3 lines of code
Posted: Tue Sep 08, 2020 1:18 am
by OliverBr
OliThink 5.7.4 has been published and it is a Leela killer.
Seriously, it actually beat Leela in this game:
[pgn]
[Event "Computer Chess Game"]
[Site "Olivers-MacBook-2.local"]
[Date "2020.09.08"]
[Round "-"]
[White "OliThink 5.7.4"]
[Black "Lc0 v0.25.1+git.unknown"]
[Result "1-0"]
[TimeControl "40/30"]
[Annotator "4. +0.27 1... -0.12"]
1. e4 e5 {-0.12/3 0.5} 2. Bc4 Nf6 {+0.00/3 0.5} 3. d3 c6 {+0.01/6 0.5} 4.
Qe2 {+0.27/11 0.4} d5 {+0.18/3 0.5} 5. exd5 {+0.33/13 0.5} cxd5
{+0.09/7 0.5} 6. Bb5+ {+0.44/12 0.1} Nc6 {+0.06/9 0.5} 7. Qxe5+
{+0.30/11 0.4} Be7 {+0.09/10 0.5} 8. Bxc6+ {+0.35/11 0.4} bxc6
{+0.10/8 0.6} 9. Nf3 {+0.26/13 0.4} O-O {+0.08/7 0.4} 10. O-O
{+0.35/13 0.2} Re8 {+0.10/8 0.7} 11. Bf4 {+0.45/13 0.5} Bg4 {+0.12/5 0.6}
12. Nbd2 {+0.29/14 0.2} Nh5 {+0.08/5 0.5} 13. Qc7 {+0.37/14 0.5} Nxf4
{-0.17/5 0.5} 14. Qxf4 Be6 {-0.09/5 0.5} 15. Rfe1 {+0.35/13 0.1} Bd6
{-0.14/4 0.7} 16. Qa4 {+0.34/13 0.2} Qb6 {-0.18/4 0.5} 17. Nb3
{+0.29/12 0.2} a5 {-0.11/4 0.5} 18. c3 {+0.41/13 1.1} h6 {-0.15/5 0.5} 19.
Nfd4 {+0.80/13 0.5} Bd7 {+0.09/3 0.5} 20. Rxe8+ {+0.25/14 0.7} Bxe8
{+0.01/6 0.4} 21. Re1 {+0.41/14 0.4} Bd7 {-0.01/5 0.5} 22. Nf3
{+0.25/15 0.5} Qc7 {-0.03/6 0.4} 23. c4 {+0.29/13} Bb4 {+0.25/5 1.1} 24.
Nbd2 {+0.22/13 0.4} dxc4 {-0.11/4 0.5} 25. dxc4 {+0.35/14 0.5} Rd8
{-0.09/4 0.5} 26. a3 {+0.43/16 0.9} Bf8 {-0.15/7 0.5} 27. Qc2
{+0.46/12 0.5} c5 {-0.14/4 0.5} 28. b3 {+0.69/13 0.5} a4 {-0.06/3 0.4} 29.
bxa4 {+0.45/14 0.4} Qa5 {-0.04/5 0.4} 30. Ne4 {+0.29/13 0.1} Bxa4
{-0.04/5 0.7} 31. Qc3 {+0.05/14 0.2} Qa6 {-0.09/6 0.4} 32. Ne5
{+0.42/12 0.7} Rd4 {-0.10/4 0.4} 33. Nd2 {+0.32/13 0.5} Qd6 {-0.10/5 0.4}
34. Ndf3 {+0.20/15 0.8} Rd1 {-0.12/7 0.4} 35. h3 {+0.18/14 0.6} Be7
{-0.20/5 0.4} 36. Qe3 {+0.25/14 0.7} Bd8 {-0.20/5 0.4} 37. Rxd1
{+0.16/15 0.9} Qxd1+ {-0.24/7 0.3} 38. Kh2 {+0.19/16 0.8} Qd6 {-0.28/6 0.5}
39. g3 {+0.14/15 0.4} Bc7 {-0.29/5 0.4} 40. Qe4 {+0.47/15 1.2} Qe6
{-0.32/6 0.4} 41. Qa8+ {+0.30/14 0.4} Be8 {-0.10/7 1.1} 42. Nd3
{+0.14/17 0.2} Kf8 {-0.15/7 1.2} 43. Nxc5 {+0.48/13 0.4} Qxc4 {+0.00/8 1.4}
44. Ne5 {+0.31/13 0.6} Qe2 {-0.16/5 2.0} 45. Ned3 {+0.45/14 0.5} Bd6
{-0.18/6 1.2} 46. Qd5 {+0.38/17 0.2} Be7 {-0.35/6 1.3} 47. a4
{+0.79/14 0.5} Bxc5 {-0.40/7 1.2} 48. Qxc5+ {+0.47/15 0.5} Kg8
{-0.37/10 0.7} 49. Qc8 {+0.77/17 0.6} f6 {-0.04/7 1.0} 50. a5
{+0.85/13 0.4} Kh7 {+0.00/8 0.8} 51. Qf5+ {+2.05/15 0.5} Bg6 {+0.13/7 1.1}
52. Qxg6+ {+6.25/14 0.4} Kxg6 {-71.22/5 1.1} 53. Nf4+ {+14.36/15 1.1} Kf5
{-83.80/5 0.7} 54. Nxe2 {+14.80/16 0.3} Kg6 {-77.89/4 0.7} 55. a6
{+15.01/12 1.4} f5 {-78.92/1 0.6} 56. a7 {+16.33/11 0.9} Kf7 {-89.93/4 0.8}
57. a8=Q {+16.52/10 1.2} h5 {-118.45/1 0.4} 58. Nf4 {+16.70/9 0.4} g5
{-108.12/2 0.5} 59. Qd5+ {+1000.06/11 0.1} Kg7 {-104.17/2 0.6} 60. Nxh5+
{+1000.05/9} Kh6 {-121.86/1 0.2} 61. Qxf5 {+1000.03/5} g4 {-25.27/3 0.3}
62. Qxg4 {+1000.02/3} Kh7 {+0.00/1 0.1} 63. Qg7# {+1000.01/1 0.1}
{Xboard adjudication: Checkmate} 1-0
[/pgn]
Ok, it was a fast game, and Leela didn't have a GeForce 1080Ti this time, but still a decent AMD GPU with RX 560.
One position was her downfall on move 51 for white:
[d]2Q1b3/6pk/5p1p/P7/8/3N2PP/4qP1K/8 w - - 1 51
In post-game analysis, Leela insists for a long time on white playing a6, while the clear (and very obvious) winner is Qf5+. She needs up to 1 minute thinking time to see this.
Is this Leela trolling again? Give away the knight without any need?!
PS: 51...Bg6??? Oh, she is defending too badly.
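For readers without a board viewer, the position quoted above (White to play move 51) can be printed as text. This is a minimal sketch that only expands the piece-placement field of the FEN string; it does not validate legality or generate moves, and the function name is made up for this example.

```python
def fen_to_rows(fen):
    """Expand the piece-placement field of a FEN into 8 rows of 8 characters."""
    placement = fen.split()[0]
    rows = []
    for rank in placement.split("/"):
        row = ""
        for ch in rank:
            # A digit means that many empty squares; a letter is a piece.
            row += "." * int(ch) if ch.isdigit() else ch
        if len(row) != 8:
            raise ValueError(f"bad rank in FEN: {rank!r}")
        rows.append(row)
    if len(rows) != 8:
        raise ValueError("FEN must describe exactly 8 ranks")
    return rows


FEN = "2Q1b3/6pk/5p1p/P7/8/3N2PP/4qP1K/8 w - - 1 51"

# FEN lists rank 8 first, so the output already has White at the bottom.
for rank_number, row in zip(range(8, 0, -1), fen_to_rows(FEN)):
    print(rank_number, " ".join(row))
print("  a b c d e f g h")
```

Uppercase letters are White's pieces and lowercase letters are Black's. The diagram shows the queen on c8 with a clear diagonal to f5, and the knight on d3 ready for the Nf4+ fork of king and queen that followed in the game after 51...Bg6 52. Qxg6+ Kxg6.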