Ras wrote: ↑Sun Jun 06, 2021 6:41 pm
mvanthoor wrote: ↑Sun Jun 06, 2021 6:26 pm
50K games would run for days and days on end...
I need about a day for that.
What monstrous quadcore is that? I only have an i7-6700K. After the current tests, which often take more than one night (10 hours or so), I think extending the tests to 20K or 50K games would take at least 2 days.
I do that on my desktop with a CPU cooler that is clearly oversized in relation to the CPU's power consumption, so even sustained full load stays inaudible, currently at 43°C CPU temperature.
Same here. I always use oversized hardware for what I want to do, so it stays cool and thus quiet.
Then you should see that at least in self-play. The problem with other engines is that if your engine has N serious weaknesses and you fix one of them, you still won't see much of a difference, because it will instead lose due to one of the other N-1 weaknesses. While self-play inflates the gains, it does allow you to test individual features.
I noticed. Some engines don't really mind if my engine becomes faster and gains even 2 ply on them; they just keep scoring the same. Other engines are _very_ sensitive to a speed gain in their opponent, though.
I think that engines which get their playing strength mainly from search depth are more sensitive to an opponent that is "catching up" in depth, whereas engines that get most of their strength from a good evaluation function are less sensitive to their opponents becoming faster. (At least, within reason.)
Now I'll probably change my testing protocol again:
1. Create a new feature.
2. Test it in self-play against the dev version, using SPRT (sketched below).
3. When the feature succeeds, merge it into dev.
4. Go back to step 1 for the next feature.
5. When all features are done, test dev against master.
6. Test dev against a small selection of engines.
Step 6 should get me at least in the ballpark with regard to the rating the engine is going to get in a CCRL test.
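For reference, here is a minimal sketch of the SPRT stopping rule behind step 2, using the common normal (GSPRT) approximation of the log-likelihood ratio over win/draw/loss counts. The test parameters (elo0 = 0, elo1 = 5, alpha = beta = 0.05) are just example values, and the code is illustrative rather than the exact implementation any particular testing tool uses:

```rust
// Sketch of an SPRT stopping rule for self-play testing.
// Hypotheses: H0 "the feature gains elo0" vs. H1 "the feature gains elo1".

/// Convert an Elo difference into an expected score (0..1).
fn elo_to_score(elo: f64) -> f64 {
    1.0 / (1.0 + 10f64.powf(-elo / 400.0))
}

/// Log-likelihood ratio of H1 against H0, using the common
/// normal approximation over the observed win/draw/loss counts.
fn llr(wins: u32, draws: u32, losses: u32, elo0: f64, elo1: f64) -> f64 {
    let n = (wins + draws + losses) as f64;
    if n == 0.0 {
        return 0.0;
    }
    let w = wins as f64;
    let d = draws as f64;
    // Per-game score: win = 1, draw = 0.5, loss = 0.
    let mean = (w + 0.5 * d) / n;
    let mean_sq = (w + 0.25 * d) / n;
    let var = (mean_sq - mean * mean).max(1e-9);

    let s0 = elo_to_score(elo0);
    let s1 = elo_to_score(elo1);
    n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)
}

#[derive(Debug, PartialEq)]
enum SprtResult {
    AcceptH1, // the feature is a gain: merge it into dev
    AcceptH0, // no measurable gain: discard or rework the feature
    Continue, // not enough games yet: keep playing
}

/// Check the running result against the SPRT stopping bounds.
fn sprt(wins: u32, draws: u32, losses: u32) -> SprtResult {
    // Example parameters: H0 = +0 Elo, H1 = +5 Elo, alpha = beta = 0.05.
    let (elo0, elo1, alpha, beta): (f64, f64, f64, f64) = (0.0, 5.0, 0.05, 0.05);
    let lower = (beta / (1.0 - alpha)).ln();
    let upper = ((1.0 - beta) / alpha).ln();

    let llr = llr(wins, draws, losses, elo0, elo1);
    if llr >= upper {
        SprtResult::AcceptH1
    } else if llr <= lower {
        SprtResult::AcceptH0
    } else {
        SprtResult::Continue
    }
}

fn main() {
    // Example: check the running result after 3,000 self-play games.
    println!("{:?}", sprt(900, 1400, 700));
}
```

The idea is that the test doesn't run for a fixed number of games: you keep playing and re-checking the LLR, and stop as soon as it crosses either bound. Clear gains (or clear failures) resolve after relatively few games, while only borderline features need the really long runs.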