Reference Engines?

mvanthoor · Post by **mvanthoor** » Sun Nov 21, 2021 10:49 pm

Madeleine Birchfield wrote: Sun Nov 21, 2021 1:35 pm Another one I would add is BBC 1.0, which like VICE is part of a video tutorial on YouTube, and is also named after a media company for some reason.

Well, even though it's a useful tutorial (BBC certainly is a tutorial engine), I wouldn't call it a reference engine yet, in the way I did with VICE, TSCP, and MicroMax/Fairy-Max, because people are not yet using this engine as their reference point. At least, not that I'm aware of.

mvanthoor · Post by **mvanthoor** » Sun Nov 21, 2021 10:52 pm

dangi12012 wrote: Sun Nov 21, 2021 1:18 pm As it stands now ELO is a comparison within a population. You can only say who is on top relative to each other - but what is really missing is a benchmark or reference point to compare across all populations.

This is how Elo works; it compares entities within the same population to one another.

You could create a reference entity such as "Engine X = 3000 Elo @ Y million NPS". Then you can also determine how strong other engines are compared to this reference engine when running at their own time control... but the time control is meaningless if the hardware is different. 2m + 1s on a Pentium 60 is something very different from 2m + 1s on a single core of a Ryzen 5950X.
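One way to read "Engine X = 3000 Elo @ Y million NPS" is that the reference is defined at a fixed search speed, so a tester on other hardware would scale the time control by the ratio of NPS. A minimal sketch, assuming the engine's NPS on both machines is known (for example from a fixed benchmark) and that strength depends only on the number of nodes searched per move; `TimeControl` and `scale_time_control` are hypothetical names, not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeControl:
    base_s: float       # base time per game, in seconds
    increment_s: float  # increment per move, in seconds


def scale_time_control(tc: TimeControl, reference_nps: float, local_nps: float) -> TimeControl:
    """Scale a time control so the engine searches roughly as many nodes per
    move on the local machine as it would at reference_nps.

    Assumption: nodes searched grow linearly with thinking time, and search
    speed is the only relevant hardware difference.
    """
    if reference_nps <= 0 or local_nps <= 0:
        raise ValueError("NPS values must be positive")
    factor = reference_nps / local_nps  # faster local machine -> less time
    return TimeControl(tc.base_s * factor, tc.increment_s * factor)
```

For example, a reference defined at 2m + 1s and 1 million NPS would be played at 30s + 0.25s on a machine that reaches 4 million NPS. NPS varies with the position and the number of threads, so this only approximates equal conditions.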

dangi12012 · Post by **dangi12012** » Sun Nov 21, 2021 11:05 pm

mvanthoor wrote: Sun Nov 21, 2021 10:52 pm
dangi12012 wrote: Sun Nov 21, 2021 1:18 pm As it stands now ELO is a comparison within a population. You can only say who is on top relative to each other - but what is really missing is a benchmark or reference point to compare across all populations.
This is how Elo works; it compares entities within the same population to one another.

You could create a reference entity such as "Engine X = 3000 Elo @ Y million NPS". Then you can also determine how strong other engines are compared to this reference engine when running at their own time control... but the time control is meaningless if the hardware is different. 2m + 1s on a Pentium 60 is something very different from 2m + 1s on a single core of a Ryzen 5950X.

Yes, that's exactly why you take NPS out of the equation. With a reference engine you can compare across population boundaries, since the reference is included in both populations.
The idea is sound. I need to set up a website and build specific Stockfish versions for that purpose.
That leaves the last point. Suppose these engines are created:

Ref 1000
Ref 1500
Ref 2000
Ref 2500
Ref 3000
Ref 3250
... and higher, in further 250-point increments where it makes sense

They need to be tuned so that in self-play they get exactly that rating. But they also need to fit into this list with the smallest possible standard error: https://ccrl.chessdom.com/ccrl/4040/rat ... t_all.html
That list is the best population there is.

What this would enable you to do:
Play against a Reference Engine on your own computer and know where your engine would land on the CCRL list, or on any other list that includes these reference points.
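Once a reference engine has a known rating on a list, a match result against it can be translated into an estimated rating on that list with the standard logistic Elo formula, E = 1 / (1 + 10^(-D/400)), where D is the rating difference. A minimal sketch, assuming the reference's list rating is exact and the match conditions are comparable to the list's; the function names and the "Ref 2000" numbers below are hypothetical.

```python
import math


def expected_score(rating_diff: float) -> float:
    """Expected score of a player who is rating_diff Elo stronger (logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def elo_diff_from_score(score: float) -> float:
    """Invert the Elo model: rating difference implied by a score fraction in (0, 1)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def estimate_list_rating(wins: int, draws: int, losses: int, reference_rating: float) -> float:
    """Estimated list rating of an engine from its W/D/L record against a reference
    engine whose rating on that list is assumed to be known exactly."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    return reference_rating + elo_diff_from_score(score)
```

For example, 60 wins, 20 draws and 20 losses (a 70% score) against a reference rated 2000 gives roughly 2000 + 147 = 2147. This ignores the statistical error of the match and any opponent-specific effects.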

R. Tomasi · Post by **R. Tomasi** » Sun Nov 21, 2021 11:23 pm

I think having such a reference "population" of engines across a reasonable Elo range is a compelling idea. I do, however, think that basing it only on variations of the same engine is problematic. An engine may perform very well against engine A while getting slaughtered by engine B, even if A and B have the same Elo rating. That makes the rating obtained by playing against these references a good guess at best, but certainly not something that would enable you to predict where an engine would end up on CCRL. Another issue to keep in mind is that intervals of 250 or even 500 Elo are way too large. Once one engine is clearly superior (around 100 Elo better or more), the rating difference obtained will be highly exaggerated.

So, hm... I'm not sure what a good course of action might be: building something like 30 versions of (let's say) 5 different engines seems like a lot of work to me and is probably not a realistic scenario.
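The concern about large rating gaps can be made concrete with a confidence interval: the same number of games gives a much wider and lopsided Elo interval when the score is far from 50%, because the logistic curve flattens toward its ends. A minimal sketch using a normal approximation to the mean per-game score (win = 1, draw = 0.5, loss = 0), mapped through the inverse Elo formula; this is an illustration, not the method of any particular rating tool, and the W/D/L figures in the usage note are made up.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(wins: int, draws: int, losses: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (low, estimate, high) Elo difference with an approximate confidence interval.

    Uses a normal approximation to the mean per-game score and maps the
    score bounds through the inverse of the Elo logistic curve.
    """
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Variance of a single game's score around the observed mean.
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    stderr = math.sqrt(var / n)
    # Clamp so the logarithm stays finite for extreme results.
    lo = max(mean - z * stderr, 1e-9)
    hi = min(mean + z * stderr, 1.0 - 1e-9)
    return elo_from_score(lo), elo_from_score(mean), elo_from_score(hi)
```

With 100 games, a 40/20/40 result gives roughly 0 ± 62 Elo, while 80/10/10 gives about 301 Elo with an interval of roughly 227 to 408, so the uncertainty grows as the gap between the engines grows.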

mvanthoor · Post by **mvanthoor** » Sun Nov 21, 2021 11:56 pm

And... to know where you're going to fall on the CCRL list, you just run a few 10s+0.1s matches against a few engines. You'll find one that is similar to yours in rating fast enough. Then run a 1m+1s gauntlet against 10 other engines (5 below your guesstimated rating, 5 above, in increments of about 50 points or so), with 100 games against each engine. The rating you obtain in that gauntlet will be pretty similar to where you're going to fall on the CCRL Blitz list.

I did it with the three Alpha versions of Rustic.

Alpha 1 I estimated at 1650 - 1700; result 1677 (after meeting TSCP: before that, the rating was 1695)
Alpha 2 I estimated at 1800 - 1850; result 1817 (at first the rating was 1760 after meeting an engine against which Rustic played very poorly, but in the second round-up there was an engine against which it played extremely well)
Alpha 3 I estimated at 1850 - 1890; result 1870

In my current gauntlets, Rustic 4 is performing at 2130 against some engines and 2210 against others, and on average the performance rating is 2160 Elo, so I'm going to give an estimate of 2130 - 2190 when this version is released. I would be very surprised if the rating ends up far below 2130 or above 2190. Under 2150 I'd find disappointing, though; over 2170 would be a good performance.
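A gauntlet like the one described above can be summarized as a single performance rating: the rating R at which the expected points against all opponents (with their list ratings held fixed) equal the points actually scored. A minimal sketch solving that by bisection; `performance_rating` and its input format are hypothetical, and dedicated tools such as Ordo or BayesElo fit the whole pool jointly and handle draws and uncertainty more carefully.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Expected score per game of `rating` against `opponent` (logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def performance_rating(results: list[tuple[float, float, int]],
                       lo: float = -1000.0, hi: float = 5000.0) -> float:
    """results: list of (opponent_rating, points_scored, games_played).

    Finds R such that the sum of expected points equals the actual points.
    Opponent ratings are treated as exact. Expected points increase
    monotonically with R, so bisection converges.
    """
    total_points = sum(points for _, points, _ in results)
    total_games = sum(games for _, _, games in results)
    if not 0 < total_points < total_games:
        raise ValueError("performance is undefined for a 0% or 100% total score")
    for _ in range(100):
        mid = (lo + hi) / 2.0
        expected = sum(games * expected_score(mid, opp) for opp, _, games in results)
        if expected < total_points:
            lo = mid  # rating guess too low
        else:
            hi = mid  # rating guess too high (or exact)
    return (lo + hi) / 2.0
```

For instance, scoring exactly 50% against two opponents rated 2100 and 2200 (100 games each) gives 2150, and 75% against a single 2000-rated opponent gives about 2191.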

The main differences between CCRL performance and my own tests would come from the opponents and opening books CCRL chooses.
