Intra OliThink Tournaments

OliverBr · Post by **OliverBr** » Fri Aug 21, 2020 6:35 pm

1) Battle of the Generations

Time control 40/40, 500 rounds of 2 games each (-repeat)

Participants:

1) OliPow 2.2.3 (1997, 2.2e + minor protocol bugfix)
2) OliThink 3.0.7 (2001, 3.0.6 + 3-fold-repetition bugfix)
3) OliThink 4.1.3 (2004, + 64 bit compiler fix)
4) OliThink 5.3.3 (2012, first bug-free version, with getDir fix)
5) OliThink 5.6.5 (2020, last version)

Result:

   # PLAYER            :  RATING  ERROR  POINTS  PLAYED   (%)     W    D     L  D(%)  CFS(%)     (~ELO)
   1 OliThink 5.6.5    :     928     29  3555.0    4000  88.9  3363  384   253   9.6     100     (2525)
   2 OliThink 5.3.3    :     806     29  3126.0    4000  78.2  2911  430   659  10.8     100     (2389)
   3 OliThink 4.1.3    :     505     25  1971.0    4000  49.3  1783  376  1841   9.4     100     (2163)
   4 OliThink 3.0.7    :     274     21  1106.5    4000  27.7   971  271  2758   6.8     100     (2015)
   5 OliPow 2.2.3      :       0   ----   241.5    4000   6.0   163  157  3680   3.9     ---     (1681)

White advantage = 11.00 +/- 5.43
Draw rate (equal opponents) = 17.85 % +/- 0.60
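A quick sanity check of the schedule, as a minimal Python sketch: with five engines, each round pairs every engine with each of the other four. Assuming `-repeat` means each pairing plays the same opening twice with colors reversed (as in cutechess-cli), 500 rounds give 10 × 500 × 2 = 10000 games in total and 4 × 500 × 2 = 4000 per engine, matching the PLAYED column. Since every game hands out exactly one point, the POINTS column should also sum to 10000. The function name `round_robin_counts` is hypothetical.

```python
from itertools import combinations


def round_robin_counts(players, rounds, games_per_pairing):
    """Count games in a round robin where every pair meets once per round.

    Assumption: with -repeat, each pairing plays games_per_pairing = 2 games
    per round (same opening, colors reversed).
    """
    pairings = list(combinations(players, 2))
    total = len(pairings) * rounds * games_per_pairing
    per_player = {p: 0 for p in players}
    for a, b in pairings:
        # Both engines of a pairing take part in every game of that pairing.
        per_player[a] += rounds * games_per_pairing
        per_player[b] += rounds * games_per_pairing
    return total, per_player


players = ["OliThink 5.6.5", "OliThink 5.3.3", "OliThink 4.1.3",
           "OliThink 3.0.7", "OliPow 2.2.3"]
total, per_player = round_robin_counts(players, rounds=500, games_per_pairing=2)
points = [3555.0, 3126.0, 1971.0, 1106.5, 241.5]  # POINTS column of the table above
print(total, per_player["OliPow 2.2.3"], sum(points))  # 10000 4000 10000.0
```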

To play a tournament with thousands of games, the engines need to be stable. OliThink has been stable since 5.3.3. So was 4.1.3, but it needed a small code change to compile for 64 bit. 3.0.7 and 2.2.3 each needed one or two bug fixes, mainly in the protocol handling.
See source code at https://github.com/olithink/

Fun fact: OliPow 2.2.x was rated much higher in 1997, at about 1900.

Dann Corbit · Post by **Dann Corbit** » Fri Aug 21, 2020 6:44 pm

122 Elo increase from 5.3.3 to 5.6.5 is the same sort of jump as from SF to the SF-nnue turn of events.
Nothing short of incredible, because teeny-tiny 5.3.3 was already surprisingly strong.
What change caused the biggest Elo gain?

It seems odd that the rating column difference of 122 is not the same as the difference of 136 in the (~ELO) column.

OliverBr · Post by **OliverBr** » Fri Aug 21, 2020 9:06 pm

Dann Corbit wrote (Fri Aug 21, 2020 6:44 pm): 122 Elo increase from 5.3.3 to 5.6.5 is the same sort of jump as from SF to the SF-nnue turn of events.
Nothing short of incredible, because teeny-tiny 5.3.3 was already surprisingly strong.

Thank you very much for mentioning it in the same sentence as SF-nnue. Hands down.
PS: The 122 could just as well be 136 (see below).

What change caused the biggest Elo gain?

This is what I am trying to find out now. I didn't release anything after 5.6.5 because there were too many versions and I lost track.
So I rented an AMD EPYC 7502P 32-core processor, and now I can play many thousands of games within hours and get precise ratings for every version. More tournaments between subversions will follow here.

If you (or anybody else) need such a test for another engine, I would gladly do it. Just provide Linux 64-bit executables (or Unix-compatible source code)!

It seems odd that the rating column difference of 122 is not the same as the Elo difference of 136.

Actually, this is what cutechess-cli reports at the end of the tournament. It's a little different from the Ordo result. I am not sure which is correct, so I added it as an extra column:

Rank Name                          Elo     +/-   Games   Score   Draws
   1 OliThink 5.6.5                355      23    1591   88.5%   10.4%
   2 OliThink 5.3.3                219      19    1593   77.9%   12.1%
   3 OliThink 4.1.3                 -7      16    1593   49.0%    9.5%
   4 OliThink 3.0.6               -155      18    1590   29.0%    6.5%
   5 OliPow 2.2.2                 -489      34    1597    5.7%    3.4%
Finished match
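The Elo column in this cutechess-cli output is consistent with converting each engine's overall Score against the rest of the field through the logistic formula Elo = -400 · log10(1/p - 1). Ordo instead fits one rating per engine to all pairwise results at once, which is one plausible reason the two sets of numbers differ. A minimal Python sketch; the helper name `elo_from_score` is hypothetical:

```python
import math


def elo_from_score(p):
    """Elo difference implied by a score fraction p under the logistic model."""
    if not 0.0 < p < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / p - 1.0)


# Score column of the cutechess-cli table above (each engine vs. the whole field).
scores = {"OliThink 5.6.5": 0.885, "OliThink 5.3.3": 0.779,
          "OliThink 4.1.3": 0.490, "OliThink 3.0.6": 0.290,
          "OliPow 2.2.2": 0.057}

for name, p in scores.items():
    print(f"{name:16s} {elo_from_score(p):+7.1f}")
```

Run on the rounded Score column, this reproduces the Elo column to within about two points; the residual is what the 0.1% rounding of the scores would produce.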

OliverBr · Post by **OliverBr** » Fri Aug 21, 2020 11:47 pm

Oops, I made a mistake. The result below is from another run. Just ignore it...

OliverBr wrote (Fri Aug 21, 2020 9:06 pm): Actually, this is what cutechess-cli reports at the end of the tournament. It's a little different from the Ordo result. I am not sure which is correct, so I added it as an extra column:
Rank Name                          Elo     +/-   Games   Score   Draws
   1 OliThink 5.6.5                355      23    1591   88.5%   10.4%
   2 OliThink 5.3.3                219      19    1593   77.9%   12.1%
   3 OliThink 4.1.3                 -7      16    1593   49.0%    9.5%
   4 OliThink 3.0.6               -155      18    1590   29.0%    6.5%
   5 OliPow 2.2.2                 -489      34    1597    5.7%    3.4%
Finished match

OliverBr · Post by **OliverBr** » Fri Aug 28, 2020 2:29 am

It looks like I squeezed another ~15 ELO out of it with 5.6.7, mostly by simplifying the code. It should be at about 2570 now.

   # PLAYER            :  RATING  ERROR  POINTS  PLAYED   (%)    W    D    L  D(%)  CFS(%)
   1 OliThink 5.6.7    :      16     12  1044.5    2000  52.2  660  769  571  38.5      99
   2 OliThink 5.6.6    :       0   ----   955.5    2000  47.8  571  769  660  38.5     ---

White advantage = 10.71 +/- 6.18
Draw rate (equal opponents) = 38.54 % +/- 1.06

OliverBr · Post by **OliverBr** » Sat Aug 29, 2020 11:36 pm

There is another gain of ~25 ELO from 5.6.7 to 5.6.8 (~15 ELO on Windows).

   # PLAYER            :  RATING  ERROR  POINTS  PLAYED   (%)     W     D     L  D(%)  CFS(%)
   1 OliThink 5.6.8    :      27     10  1572.5    2920  53.9  1005  1135   780  38.9     100
   2 OliThink 5.6.7    :       0   ----  1347.5    2920  46.1   780  1135  1005  38.9     ---

White advantage = 10.27 +/- 5.22
Draw rate (equal opponents) = 39.09 % +/- 0.89
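For a two-engine match, the RATING, ERROR and CFS(%) columns can be approximated from W/D/L alone. The Python sketch below is only a normal approximation, not necessarily how Ordo computes them: it converts the score fraction to Elo with the logistic formula, propagates the per-game score variance through the slope of that curve for an approximately 95% interval (z = 1.96), and uses the common likelihood-of-superiority (LOS) formula based on wins and losses only. The function name `match_stats` is hypothetical.

```python
import math


def match_stats(wins, draws, losses, z=1.96):
    """Return (elo, error, los) for a head-to-head match.

    error: half-width of an approximate confidence interval (z=1.96 ~ 95%).
    los:   likelihood of superiority from wins and losses only (draws ignored).
    """
    n = wins + draws + losses
    mu = (wins + 0.5 * draws) / n
    elo = -400.0 * math.log10(1.0 / mu - 1.0)
    # Per-game variance of the score (each game scores 1, 0.5 or 0).
    var = (wins * (1.0 - mu) ** 2
           + draws * (0.5 - mu) ** 2
           + losses * (0.0 - mu) ** 2) / n
    se_mu = math.sqrt(var / n)
    # Slope of the Elo curve at mu: d/dp of -400*log10(1/p - 1).
    slope = 400.0 / (math.log(10) * mu * (1.0 - mu))
    error = z * se_mu * slope
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return elo, error, los


for name, w, d, l in [("5.6.7 vs 5.6.6", 660, 769, 571),
                      ("5.6.8 vs 5.6.7", 1005, 1135, 780)]:
    elo, err, los = match_stats(w, d, l)
    print(f"{name}: {elo:+.1f} +/- {err:.1f} Elo, LOS {100 * los:.1f}%")
```

For 5.6.7 vs 5.6.6 this gives roughly +15 ± 12 Elo with an LOS of about 99%, and for 5.6.8 vs 5.6.7 roughly +27 ± 10 Elo with an LOS above 99.9%, close to the Ordo tables above.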

The code still has the same size (<50k source code, ~100k executable).

OliverBr · Post by **OliverBr** » Sat Aug 29, 2020 11:48 pm

After playing over a hundred thousand games, here is an overview of which versions/changes since 5.3.3 changed the strength by more than 10 ELO:

OliThink 5.6.8    +25  Pawns reduce king attack, compiler optimization.
OliThink 5.6.7    +12  More bonus for attack on king's squares. No nullmove on pvnode. Quiesce buffer to 125. Simplify pawn evaluation.
OliThink 5.6.6    +30  Internal Iterative Reduction.
OliThink 5.6.5    +25  Hashtable version 2.
...
OliThink 5.5.9    +25  Redo "Opposite QR doesn't block". Don't clear hashtable(d) after each move.
OliThink 5.5.5    +20  RN/RB vs R is draw. Lazy evaluation after implementing delta piece counting.
OliThink 5.5.2    +11  Extend Queen Promotions, don't reduce any promotion. Win NBK vs k.
OliThink 5.5.0    +40  Rewrite Time Management.
OliThink 5.4.13   -25  Clear one hashtable after each move. Increase size.
OliThink 5.4.12   +11  Reduce search for captures with neg. SEE.
OliThink 5.4.3    +30  Aggressive pruning with history heuristics. Fix wrong insufficient material claim.
OliThink 5.4.2    +12  History heuristics increase with depth.
OliThink 5.4.1    +30  Move count pruning.

I didn't mention a couple of minor changes, nor those that were reverted in the next version. The mistake in 5.4.13 of clearing a hashtable after each move was only reverted much later, in 5.5.9, so both of those versions are in the list. A couple of versions lost more than 10 ELO (5.4.4, 5.4.6, 5.4.9, 5.4.13). The sum of this list does not represent the entire ELO gain from 5.3.3 to 5.6.8, which should be about 200.
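A plain tally of the entries in the list, as a small Python sketch, illustrates that last point: the listed deltas add up to +246, more than the roughly 200 ELO estimated for 5.3.3 to 5.6.8. The tally leaves out the elided versions ("..."), the minor changes, and the versions that lost more than 10 ELO but are not listed (5.4.4, 5.4.6, 5.4.9), and gains measured one version at a time need not add up exactly, so the sum of the list is not expected to match the overall gain.

```python
# ELO deltas exactly as listed in the post; the elided "..." versions are not included.
deltas = {
    "5.6.8": 25, "5.6.7": 12, "5.6.6": 30, "5.6.5": 25,
    "5.5.9": 25, "5.5.5": 20, "5.5.2": 11, "5.5.0": 40,
    "5.4.13": -25, "5.4.12": 11, "5.4.3": 30, "5.4.2": 12, "5.4.1": 30,
}

listed_sum = sum(deltas.values())
gains = sum(d for d in deltas.values() if d > 0)
losses = sum(d for d in deltas.values() if d < 0)
print(f"listed sum: {listed_sum:+d} ELO (gains {gains:+d}, losses {losses:+d})")
```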

Since 5.6.5 this shouldn't happen again, because I now have a test machine.

OliverBr · Post by **OliverBr** » Tue Sep 08, 2020 10:01 am

OliverBr wrote (Fri Aug 21, 2020 6:35 pm): Result:

   # PLAYER            :  RATING  ERROR  POINTS  PLAYED   (%)     W    D     L  D(%)  CFS(%)     (~ELO)
   1 OliThink 5.6.5    :     928     29  3555.0    4000  88.9  3363  384   253   9.6     100     (2525)
   2 OliThink 5.3.3    :     806     29  3126.0    4000  78.2  2911  430   659  10.8     100     (2389)
   3 OliThink 4.1.3    :     505     25  1971.0    4000  49.3  1783  376  1841   9.4     100     (2163)
   4 OliThink 3.0.7    :     274     21  1106.5    4000  27.7   971  271  2758   6.8     100     (2015)
   5 OliPow 2.2.3      :       0   ----   241.5    4000   6.0   163  157  3680   3.9     ---     (1681)

White advantage = 11.00 +/- 5.43
Draw rate (equal opponents) = 17.85 % +/- 0.60

Another tournament has finished. I replaced 5.6.5 with the newest version, 5.7.4, and used a faster time control (40/30 instead of 40/40).

   # PLAYER            :  RATING  ERROR  POINTS  PLAYED   (%)     W     D     L  D(%)  CFS(%)     (~ELO)
   1 OliThink 5.7.4    :    1052     33  3750.5    4000  93.8  3609  283   108   7.1     100     (2643)
   2 OliThink 5.3.3    :     798     29  2962.0    4000  74.0  2769  386   845   9.7     100     (2389)
   3 OliThink 4.1.3    :     506     25  1902.5    4000  47.6  1728  349  1923   8.7     100     (2097)
   4 OliThink 3.0.7    :     310     22  1174.0    4000  29.4  1051  246  2703   6.2     100     (1901)
   5 OliPow 2.2.3      :       0   ----   211.0    4000   5.3   142  138  3720   3.5     ---     (1591)

White advantage = -0.53 +/- 2.54
Draw rate (equal opponents) = 16.67 % +/- 0.62

Another tournament including 4.1.1 and 3.0.1 showed that they are about as strong as 4.1.3 and 3.0.7, respectively, which is quite a surprise, especially for 3.0.1.
Unfortunately, I don't have any other versions of 4, 3 or 2. If anybody knows of any, I would be grateful.

Alayan · Post by **Alayan** » Wed Sep 09, 2020 8:07 pm

It looks like switching to proper testing methods made a big difference for your engine's progress.

OliverBr · Post by **OliverBr** » Wed Sep 09, 2020 10:53 pm

Alayan wrote (Wed Sep 09, 2020 8:07 pm): It looks like switching to proper testing methods made a big difference for your engine's progress.

This is correct. There had been no progress since 2012 because I wasn't testing properly.

First step: In 2020 I restarted with more testing, about 100 matches, and I was able to make progress.

Second step: But this wasn't enough, so I rented a 32-core server, and this is the way to go: it can play thousands of games within a couple of hours and the results are reliable, hence +250 ELO.

It's getting better: 5.7.5 is released and looks very strong. The best part is that the code is getting smaller and simpler: 1603 lines now.

This is the prime directive of OliThink: how strong can an engine be with the simplest code, methods and evaluation?
I never thought it could reach the 2700s.
