Feed bayeselo with pure game results without PGN

Sergei S. Markoff · Post by **Sergei S. Markoff** » Thu Mar 08, 2018 2:52 pm

Hello all!

Do you have any ideas on how to quickly feed bayeselo an array of game results? My tuning framework is going to play several million games per day, so storing them as PGN and then feeding them to elostat would be too slow. Of course, it's possible to tweak the elostat source to do this, but maybe someone has already done it?

Daniel Shawul · Post by **Daniel Shawul** » Thu Mar 08, 2018 5:12 pm

Why not feed it a minimal PGN file with just the players and the result? cutechess-cli should be able to produce minimal PGN.

Note that the PGN can contain multiple players, each playing either White or Black (bayeselo takes home advantage into account), with a W/D/L result, so it is not as straightforward as feeding it an array.

At one point I implemented such a thing in bayeselo, using std::map for the players and results to bypass PGN reading, but I don't have the source code now.

Daniel

Sergei S. Markoff · Post by **Sergei S. Markoff** » Thu Mar 08, 2018 5:40 pm

Daniel Shawul wrote:Why not feed it a minimal PGN file with just the players and the result? cutechess-cli should be able to produce minimal PGN.

Because parsing a PGN with 10 million games is too slow, even a minimal one, especially when you need to do it after every batch of games per CPU core.
Moreover, it's hard even to store such a PGN. It's better to feed just the number of wins/draws/losses (e.g. 3000/5000/2500) for each pair of engine versions instead of parsing 10500 minimal PGN game headers.

Daniel Shawul wrote:Note that the PGN can contain multiple players, each playing either White or Black (bayeselo takes home advantage into account), with a W/D/L result, so it is not as straightforward as feeding it an array.

It's easy to store W/D/L for white and black separately.

Daniel Shawul wrote:At one point I implemented such a thing in bayeselo, using std::map for the players and results to bypass PGN reading, but I don't have the source code now.

That's sad

Daniel Shawul · Post by **Daniel Shawul** » Thu Mar 08, 2018 7:02 pm

Sergei S. Markoff wrote:It's easy to store W/D/L for white and black separately.

Nope, you need to feed it the names of the players as strings. You are thinking only of a situation with two players (as in the case of tuning).

What you need to feed bayeselo for each game result, on the other hand, is:

"white player's name"
"black player's name"
"result"

You can feed in dummy player names, but bayeselo expects player names anyway.

I have the Bopo source code, derived from bayeselo with two additional draw models, which has the feature you need, if you are interested.

Daniel

P.S.: If you have only two players and their results, it may be better to just compute the Elo from the winning percentage using the logistic formula. Scorpio actually does this when tuning its evaluation.

Code you need

Code:

#include <cmath>

// Excerpt: ELO_MODEL, ELO_DRAW, ELO_DRAW_SLOPE_PHASE, MAX_MATERIAL and material
// are not defined here and must come from the surrounding source.

// Elo difference implied by an expected score p (0 < p < 1).
static inline double score_to_elo(double p) {
    return 400.0 * log10(p / (1 - p));
}
// Conversions between Elo and the multiplicative strength gamma = 10^(elo/400).
static inline double gamma_to_elo(double g) {
    return 400.0 * log10(g);
}
static inline double elo_to_gamma(double eloDelta) {
    return pow(10.0,eloDelta / 400.0);
}
// Note the sign: logistic(-d) is the expected score for an Elo advantage of d.
static inline double logistic(double eloDelta) {
    return 1 / (1 + pow(10.0,eloDelta / 400.0));
}
// erf-based counterpart of logistic(), with the same sign convention.
static inline double gaussian(double eloDelta) {
    return (1 + erf(-eloDelta / 400.0)) / 2;
}
// Win probability given the Elo lead eloDelta, home advantage eloH and draw parameter eloD.
static double win_prob(double eloDelta, int eloH, int eloD) {
    if(ELO_MODEL == 0) {
        return logistic(-eloDelta - eloH + eloD);
    } else if(ELO_MODEL == 1) {
        double thetaD = elo_to_gamma(eloD);
        double f = thetaD * sqrt(logistic(eloDelta + eloH) * logistic(-eloDelta - eloH));
        return logistic(-eloDelta - eloH) / (1 + f);
    } else {
        return gaussian(-eloDelta - eloH + eloD);
    }
}
// Loss probability, mirroring win_prob().
static double loss_prob(double eloDelta, int eloH, int eloD) {
    if(ELO_MODEL == 0) {
        return logistic(eloDelta + eloH + eloD);
    } else if(ELO_MODEL == 1) {
        double thetaD = elo_to_gamma(eloD);
        double f = thetaD * sqrt(logistic(eloDelta + eloH) * logistic(-eloDelta - eloH));
        return logistic(eloDelta + eloH) / (1 + f);
    } else {
        return gaussian(eloDelta + eloH + eloD);
    }
}
// Whatever probability is left after wins and losses.
static double draw_prob(double eloDelta, int eloH, int eloD) {
    return 1 - win_prob(eloDelta,eloH,eloD) - loss_prob(eloDelta,eloH,eloD);
}
// Scale factor computed from the slope (df) of the selected model's curve.
static double get_scale(double eloD, double eloH) {
    const double K = log(10)/400.0;
    double df;
    if(ELO_MODEL == 0) {
        double f = 1.0 / (1 + exp(-K*(eloD - eloH)));
        df = f * (1 - f) * K;
    } else if(ELO_MODEL == 1) {
        double dg = elo_to_gamma(eloD) - 1;
        double f = 1.0 / (1 + exp(-K*(eloD - eloH)));
        double dfx = f * (1 - f);
        double dx = dg * sqrt(dfx);
        double b = 1 + dx;
        double c = (dg * f * (1 - 2 * f)) / (2 * sqrt(dfx));
        df = ((b - c) / (b * b)) * dfx * K;
    } else if(ELO_MODEL == 2) {
        const double pi = 3.14159265359;
        double x = -(eloD - eloH)/400.0;
        df = exp(-x*x) / (400.0 * sqrt(pi));
    }
    return (4.0 / K) * df;
}
// Negative log-likelihood of a result (1 = win, -1 = loss, otherwise draw) for the score se.
double get_log_likelihood(int result, double se) {
    double factor_m = double(material) / MAX_MATERIAL;
    int eloH = 0; //we have stm bonus
    int eloD = ELO_DRAW + factor_m * ELO_DRAW_SLOPE_PHASE;
    double scale = get_scale(eloD,eloH);
    se = se / scale;
    if(result == 1)
        return -log(win_prob(se,eloH,eloD));
    else if(result == -1)
        return -log(loss_prob(se,eloH,eloD));
    else
        return -log(draw_prob(se,eloH,eloD));
}
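
For the two-player case mentioned in the P.S., the logistic conversion can be applied directly to aggregated W/D/L counts. The following is only a minimal sketch: `score_fraction` and `elo_from_wdl` are hypothetical helper names, `score_to_elo` repeats the formula from the code above, and draws are assumed to count as half a point.

```cpp
#include <cassert>
#include <cmath>

// Fraction of points scored: a win counts 1, a draw 1/2 (assumption).
double score_fraction(long wins, long draws, long losses) {
    const long games = wins + draws + losses;
    assert(games > 0);
    return (wins + 0.5 * draws) / games;
}

// Same formula as score_to_elo in the post: 400 * log10(p / (1 - p)).
// It is undefined for p == 0 or p == 1, so clean sweeps must be handled by the caller.
double score_to_elo(double p) {
    assert(p > 0.0 && p < 1.0);
    return 400.0 * std::log10(p / (1.0 - p));
}

// Hypothetical convenience wrapper: Elo difference implied by W/D/L counts.
double elo_from_wdl(long wins, long draws, long losses) {
    return score_to_elo(score_fraction(wins, draws, losses));
}
```

With the 3000/5000/2500 example from earlier in the thread, the score is 5500/10500, so p / (1 - p) = 1.1 and `elo_from_wdl(3000, 5000, 2500)` comes out at roughly 16.6 Elo. This ignores home advantage and gives no error bars, which is what a full bayeselo run would add.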

Guenther · Post by **Guenther** » Thu Mar 08, 2018 7:08 pm

Sergei S. Markoff wrote:
That's sad :)

If you used Ordo, you could use the additional ordoprep tool, which
strips all moves and most headers from the PGN file.
It leaves only the players' names and the result, and the output can then be
processed further with Ordo.
(AFAIK most people use Ordo for rating calculation nowadays anyway.
It is well documented and open source.)
Using ordoprep of course speeds up calculations a lot when working with huge PGN files.

https://github.com/michiguel/Ordo
https://sites.google.com/site/gaviotachessengine/ordo

Strangely, I cannot find a newer version of ordoprep than the one given
in the last link, yet I have a newer one on my HD. If you need it, I can send it.

Edit:

Reading Daniel's last post, which came in between, it seems you could even use ordoprep and then still process the resulting file with bayeselo.
The ordoprep output will look like this:

Code:

[White "Ace 01"]
[Black "MicroChess 1976"]
[Result "0-1"]
0-1
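
If aggregated counts are all that is stored, header-only records in this layout could be generated on demand instead of being kept on disk. The sketch below is an assumption-laden illustration, not part of ordoprep or bayeselo: the function names are hypothetical, records are separated by a blank line by assumption, and player names are written without any escaping.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Writes one header-only record in the layout of the sample above.
// `result` is expected to be "1-0", "1/2-1/2" or "0-1".
void write_record(std::ostream& out, const std::string& white,
                  const std::string& black, const std::string& result) {
    out << "[White \"" << white << "\"]\n"
        << "[Black \"" << black << "\"]\n"
        << "[Result \"" << result << "\"]\n"
        << result << "\n\n";  // blank line between records is an assumption
}

// Expands aggregated counts (from White's point of view) into records.
void write_counts(std::ostream& out, const std::string& white,
                  const std::string& black, long wins, long draws, long losses) {
    assert(wins >= 0 && draws >= 0 && losses >= 0);
    for (long i = 0; i < wins; ++i) write_record(out, white, black, "1-0");
    for (long i = 0; i < draws; ++i) write_record(out, white, black, "1/2-1/2");
    for (long i = 0; i < losses; ++i) write_record(out, white, black, "0-1");
}

// Convenience wrapper that returns the records as a string.
std::string counts_to_pgn(const std::string& white, const std::string& black,
                          long wins, long draws, long losses) {
    std::ostringstream out;
    write_counts(out, white, black, wins, draws, losses);
    return out.str();
}
```

Calling `counts_to_pgn("Ace 01", "MicroChess 1976", 0, 0, 1)` reproduces the single record shown above. Since the output still grows with the number of games, this only helps with storage, not with parsing time.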

Sergei S. Markoff · Post by **Sergei S. Markoff** » Thu Mar 08, 2018 9:47 pm

Thank you!

But I have multiple versions, so I need to fit their ratings.
My framework is based on a genetic approach. The previous version was based on playing against a base version, but that has two problems: 1) if you can use the games of siblings against each other, it saves you 50% of the time; 2) when you play only against the base version there is a risk of overfitting: some of your siblings can be successful against the base version but not against a broad range of opponents.

Sergei S. Markoff · Post by **Sergei S. Markoff** » Thu Mar 08, 2018 9:52 pm

Thank you!

I'm going to make a tool to feed the results in the most compact form, for example:

engine1: {engine2: {w: {100/100/101}, b: {95/100/105}}, engine3: {w: {101/100/100}, b: {96/…/104}}}
engine2: ...
etc.

So instead of multiple games or game headers you will have just the results. If you have more than 10000 engines and several million games, this seems to be the only way to store and process the data.
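
One way to hold such a table in memory is a `std::map` keyed by the (white, black) pair, in the spirit of the std::map approach mentioned earlier in the thread. This is only a sketch under assumptions: `Wdl`, `ResultTable`, `add_result`, `add_batch` and `total_games` are hypothetical names, and counts are stored from White's point of view.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Win/draw/loss counts from White's point of view.
struct Wdl {
    std::uint64_t wins, draws, losses;
};

// Key: (white engine name, black engine name), so colors stay separate.
using PairKey = std::pair<std::string, std::string>;
using ResultTable = std::map<PairKey, Wdl>;

// Records one game; result is +1 (White won), 0 (draw) or -1 (Black won).
void add_result(ResultTable& table, const std::string& white,
                const std::string& black, int result) {
    assert(result >= -1 && result <= 1);
    Wdl& r = table[std::make_pair(white, black)];  // zero-initialized on first use
    if (result == 1) ++r.wins;
    else if (result == -1) ++r.losses;
    else ++r.draws;
}

// Merges a whole batch of counts for one pairing.
void add_batch(ResultTable& table, const std::string& white,
               const std::string& black, const Wdl& batch) {
    Wdl& r = table[std::make_pair(white, black)];
    r.wins += batch.wins;
    r.draws += batch.draws;
    r.losses += batch.losses;
}

// Total number of games stored in the table.
std::uint64_t total_games(const ResultTable& table) {
    std::uint64_t n = 0;
    for (const auto& entry : table)
        n += entry.second.wins + entry.second.draws + entry.second.losses;
    return n;
}
```

The table grows with the number of distinct pairings rather than the number of games. With more than 10000 engines, replacing the string keys with integer ids would probably be worth considering.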
