Throwing out draws to calculate Elo

Dann Corbit · Post by **Dann Corbit** » Wed Jul 01, 2020 2:08 am

Nice discussion Ovyron, but I don't think anyone understands what I am saying (probably because I am not communicating very effectively). Lots of intelligent people do not understand what I am saying, which means I am not doing a good job explaining.

Earlier, we discovered that with 1000 wins and 0 losses, the Elo difference for the winning side is +infinity.
Earlier, we discovered that with 1000 wins, 0 losses and 1000000000000000000000000000 draws, the Elo difference is zero.
So the Elo calculation says A is infinitely stronger than B given the first set of outcomes.
The Elo calculation says that A is exactly the same strength as B, given the second set of outcomes.
Yet LOS says that A has exactly the same probability of being stronger in both scenarios.

So what is Elo? It is a predictor of strength.
It says the first set of numbers predicts A is definitely much, much stronger.
It says that the second set of numbers predicts A and B have exactly the same strength.
But LOS says the odds that A is stronger than B are identical with both sets of numbers.
Put another way, Elo predicts equality. LOS says it is dominatingly sure that A is stronger than B.
Perhaps even dominatingly sure would not be a problem. But giving the SAME prediction as for the first set simply does not make sense.
I think the Elo model has it right and the LOS model is not correct. Otherwise, they would agree about who is stronger.

If I have a googol games that all ended in a draw, the engines have the same strength. Even with 1000 wins, nothing changes over the 8. If we watch them play, all we will see is draws, and if one does win, I think it is random.

An event which happens one thousand times out of a googol trials is by its very definition somewhat random (unless there is a predicted pattern, which seems unlikely to me). And we do not have the apparatus with the given calculations to search for a pattern.

The notion that evidence of equality (which is what a draw is) has no bearing on determining who is stronger does not make sense.
Indeed, Elo agrees. The engine that won 1000 games out of one googol games has exactly the same strength as if they had all been draws.

hgm · Post by **hgm** » Wed Jul 01, 2020 8:54 am

Elo does NOT agree. They will not have exactly the same Elo; the Elo will differ by something of the order 1/10^100.

Evidence for equality is not proof of equality. Evidence for inequality rules out equality.

You just refuse to recognize the existence of small numbers different from zero. But they do exist.

We understand perfectly what you are saying. But we recognize almost every sentence you utter as a false statement, and have refuted them one by one. Yet you keep repeating them as if they would support your point of view. Are you really surprised that you cannot convince people by only throwing falsehoods at them?

Dann Corbit · Post by **Dann Corbit** » Wed Jul 01, 2020 12:39 pm

I think you just like to argue for argument's sake. I guess that you don't really believe the things you are saying because it is obvious that they are utterly wrong.

Alayan · Post by **Alayan** » Wed Jul 01, 2020 1:03 pm

1.000000000000000000000000000000000000000000000000000001 > 1

Dann Corbit · Post by **Dann Corbit** » Wed Jul 01, 2020 1:47 pm

Before we proceed further in this discussion, I think we should all put forth our opinions on an important question:

When two engines play against each other, is the outcome deterministic or is randomness involved?

Dann says there is randomness involved. Now, others, please weigh in.

BTW, the tails of the Elo curve decay exponentially.

Person: Randomness
Dann [Y]

Dann Corbit · Post by **Dann Corbit** » Wed Jul 01, 2020 2:31 pm

Code:

#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <random>
#include <vector>

// Tallies for one simulated match.
struct results {
    int lt_h = 0;  // samples below 0.5, reported as losses
    int gt_h = 0;  // samples above 0.5, reported as wins
    int ties = 0;  // samples exactly equal to 0.5 (practically never happens)
};

// http://talkchess.com/forum3/viewtopic.php?t=51003#p554193
// Likelihood of superiority. The draw count only feeds the debug output.
double los(int wins, int losses, int draws)
{
#ifdef _DEBUG
    double games = wins + losses + draws;
    std::printf("Number of games: %g\n", games);
    double winning_fraction = (wins + 0.5 * draws) / games;
    std::printf("Winning fraction: %g\n", winning_fraction);
    double elo_difference = -std::log(1.0 / winning_fraction - 1.0) * 400.0 / std::log(10.0);
    std::printf("Elo difference: %+g\n", elo_difference);
#else
    (void)draws;  // unused outside debug builds
#endif
    if (wins + losses == 0) return 0.5;  // no decisive games: avoid 0/0
    return 0.5 + 0.5 * std::erf((wins - losses) / std::sqrt(2.0 * (wins + losses)));
}

int main(int argc, char **argv)
{
    std::size_t experiment_size = 1000;
    if (argc > 1)
    {
        // Clamp before converting, so a negative argument cannot wrap around.
        long long requested = std::atoll(argv[1]);
        experiment_size = requested < 10 ? 10 : static_cast<std::size_t>(requested);
    }
    std::mt19937 generator(17);  // fixed seed, so runs are reproducible
    std::uniform_real_distribution<double> urd(0.0, 1.0);
    // experiment_size matches of experiment_size games each
    std::vector<results> contests(experiment_size);
    for (auto &contest : contests)
    {
        for (std::size_t game = 0; game < experiment_size; game++)
        {
            double value = urd(generator);
            if (value < 0.5) contest.lt_h++;
            else if (value > 0.5) contest.gt_h++;
            else contest.ties++;
        }
    }
    for (const auto &contest : contests)
    {
        std::cout << "losses: " << contest.lt_h << " wins: " << contest.gt_h
                  << " ties: " << contest.ties
                  << " LOS: " << los(contest.gt_h, contest.lt_h, contest.ties) << std::endl;
    }
    return 0;
}

Remember, these "engines" are perfectly equal.

Code:

Q:\cc\wlt\x64\Release>type wlt10.txt
losses: 3 wins: 7 ties: 0 LOS: 0.897048
losses: 4 wins: 6 ties: 0 LOS: 0.736455
losses: 5 wins: 5 ties: 0 LOS: 0.5
losses: 6 wins: 4 ties: 0 LOS: 0.263545
losses: 8 wins: 2 ties: 0 LOS: 0.0288898

Q:\cc\wlt\x64\Release>type wlt100.txt
losses: 39 wins: 61 ties: 0 LOS: 0.986097
losses: 40 wins: 60 ties: 0 LOS: 0.97725
losses: 41 wins: 59 ties: 0 LOS: 0.96407
losses: 42 wins: 58 ties: 0 LOS: 0.945201
losses: 43 wins: 57 ties: 0 LOS: 0.919243
losses: 44 wins: 56 ties: 0 LOS: 0.88493
losses: 45 wins: 55 ties: 0 LOS: 0.841345
losses: 46 wins: 54 ties: 0 LOS: 0.788145
losses: 47 wins: 53 ties: 0 LOS: 0.725747
losses: 48 wins: 52 ties: 0 LOS: 0.655422
losses: 49 wins: 51 ties: 0 LOS: 0.57926
losses: 50 wins: 50 ties: 0 LOS: 0.5
losses: 51 wins: 49 ties: 0 LOS: 0.42074
losses: 52 wins: 48 ties: 0 LOS: 0.344578
losses: 53 wins: 47 ties: 0 LOS: 0.274253
losses: 54 wins: 46 ties: 0 LOS: 0.211855
losses: 55 wins: 45 ties: 0 LOS: 0.158655
losses: 56 wins: 44 ties: 0 LOS: 0.11507
losses: 57 wins: 43 ties: 0 LOS: 0.0807567
losses: 58 wins: 42 ties: 0 LOS: 0.0547993
losses: 59 wins: 41 ties: 0 LOS: 0.0359303
losses: 61 wins: 39 ties: 0 LOS: 0.0139034
losses: 62 wins: 38 ties: 0 LOS: 0.00819754

Q:\cc\wlt\x64\Release>type wlt1k.txt
losses: 450 wins: 550 ties: 0 LOS: 0.999217
losses: 452 wins: 548 ties: 0 LOS: 0.9988
losses: 455 wins: 545 ties: 0 LOS: 0.997787
losses: 456 wins: 544 ties: 0 LOS: 0.997305
losses: 457 wins: 543 ties: 0 LOS: 0.996732
losses: 458 wins: 542 ties: 0 LOS: 0.99605
losses: 459 wins: 541 ties: 0 LOS: 0.995244
losses: 460 wins: 540 ties: 0 LOS: 0.994294
losses: 464 wins: 536 ties: 0 LOS: 0.988602
losses: 465 wins: 535 ties: 0 LOS: 0.986572
losses: 466 wins: 534 ties: 0 LOS: 0.984236
losses: 467 wins: 533 ties: 0 LOS: 0.981561
losses: 468 wins: 532 ties: 0 LOS: 0.978508
losses: 469 wins: 531 ties: 0 LOS: 0.975038
losses: 470 wins: 530 ties: 0 LOS: 0.97111
losses: 471 wins: 529 ties: 0 LOS: 0.966682
losses: 472 wins: 528 ties: 0 LOS: 0.961709
losses: 473 wins: 527 ties: 0 LOS: 0.956147
losses: 474 wins: 526 ties: 0 LOS: 0.949952
losses: 475 wins: 525 ties: 0 LOS: 0.943077
losses: 476 wins: 524 ties: 0 LOS: 0.935479
losses: 477 wins: 523 ties: 0 LOS: 0.927117
losses: 478 wins: 522 ties: 0 LOS: 0.917948
losses: 479 wins: 521 ties: 0 LOS: 0.907937
losses: 480 wins: 520 ties: 0 LOS: 0.897048
losses: 481 wins: 519 ties: 0 LOS: 0.885253
losses: 482 wins: 518 ties: 0 LOS: 0.872527
losses: 483 wins: 517 ties: 0 LOS: 0.858852
losses: 484 wins: 516 ties: 0 LOS: 0.844214
losses: 485 wins: 515 ties: 0 LOS: 0.828609
losses: 486 wins: 514 ties: 0 LOS: 0.81204
losses: 487 wins: 513 ties: 0 LOS: 0.794516
losses: 488 wins: 512 ties: 0 LOS: 0.776058
losses: 489 wins: 511 ties: 0 LOS: 0.756692
losses: 490 wins: 510 ties: 0 LOS: 0.736455
losses: 491 wins: 509 ties: 0 LOS: 0.715393
losses: 492 wins: 508 ties: 0 LOS: 0.693559
losses: 493 wins: 507 ties: 0 LOS: 0.671015
losses: 494 wins: 506 ties: 0 LOS: 0.647832
losses: 495 wins: 505 ties: 0 LOS: 0.624085
losses: 496 wins: 504 ties: 0 LOS: 0.599859
losses: 497 wins: 503 ties: 0 LOS: 0.575242
losses: 498 wins: 502 ties: 0 LOS: 0.550328
losses: 499 wins: 501 ties: 0 LOS: 0.525215
losses: 500 wins: 500 ties: 0 LOS: 0.5
losses: 501 wins: 499 ties: 0 LOS: 0.474785
losses: 502 wins: 498 ties: 0 LOS: 0.449672
losses: 503 wins: 497 ties: 0 LOS: 0.424758
losses: 504 wins: 496 ties: 0 LOS: 0.400141
losses: 505 wins: 495 ties: 0 LOS: 0.375915
losses: 506 wins: 494 ties: 0 LOS: 0.352168
losses: 507 wins: 493 ties: 0 LOS: 0.328985
losses: 508 wins: 492 ties: 0 LOS: 0.306441
losses: 509 wins: 491 ties: 0 LOS: 0.284607
losses: 510 wins: 490 ties: 0 LOS: 0.263545
losses: 511 wins: 489 ties: 0 LOS: 0.243308
losses: 512 wins: 488 ties: 0 LOS: 0.223942
losses: 513 wins: 487 ties: 0 LOS: 0.205484
losses: 514 wins: 486 ties: 0 LOS: 0.18796
losses: 515 wins: 485 ties: 0 LOS: 0.171391
losses: 516 wins: 484 ties: 0 LOS: 0.155786
losses: 517 wins: 483 ties: 0 LOS: 0.141148
losses: 518 wins: 482 ties: 0 LOS: 0.127473
losses: 519 wins: 481 ties: 0 LOS: 0.114747
losses: 520 wins: 480 ties: 0 LOS: 0.102952
losses: 521 wins: 479 ties: 0 LOS: 0.0920632
losses: 522 wins: 478 ties: 0 LOS: 0.0820518
losses: 523 wins: 477 ties: 0 LOS: 0.0728834
losses: 524 wins: 476 ties: 0 LOS: 0.0645207
losses: 525 wins: 475 ties: 0 LOS: 0.0569231
losses: 526 wins: 474 ties: 0 LOS: 0.0500484
losses: 527 wins: 473 ties: 0 LOS: 0.0438525
losses: 528 wins: 472 ties: 0 LOS: 0.0382907
losses: 529 wins: 471 ties: 0 LOS: 0.033318
losses: 530 wins: 470 ties: 0 LOS: 0.0288898
losses: 531 wins: 469 ties: 0 LOS: 0.0249621
losses: 532 wins: 468 ties: 0 LOS: 0.0214924
losses: 533 wins: 467 ties: 0 LOS: 0.0184394
losses: 534 wins: 466 ties: 0 LOS: 0.0157638
losses: 535 wins: 465 ties: 0 LOS: 0.0134283
losses: 536 wins: 464 ties: 0 LOS: 0.0113979
losses: 537 wins: 463 ties: 0 LOS: 0.00963966
losses: 538 wins: 462 ties: 0 LOS: 0.00812324
losses: 539 wins: 461 ties: 0 LOS: 0.00682058
losses: 540 wins: 460 ties: 0 LOS: 0.00570602
losses: 541 wins: 459 ties: 0 LOS: 0.0047562
losses: 545 wins: 455 ties: 0 LOS: 0.00221326
losses: 547 wins: 453 ties: 0 LOS: 0.00147673
losses: 548 wins: 452 ties: 0 LOS: 0.00119955
losses: 551 wins: 449 ties: 0 LOS: 0.000628713

Now, the 100K file is too large to post, but you can easily build it yourself like this:
Q:\cc\wlt\x64\Release>wlt 100000 |sort|uniq>wlt.txt
Q:\cc\wlt\x64\Release>ren wlt.txt wlt100k.txt

Here are the tails of that file:

Code:

losses: 49373 wins: 50627 ties: 0 LOS: 0.999963
losses: 49375 wins: 50625 ties: 0 LOS: 0.999961
losses: 49376 wins: 50624 ties: 0 LOS: 0.99996
losses: 49393 wins: 50607 ties: 0 LOS: 0.999938
losses: 49400 wins: 50600 ties: 0 LOS: 0.999926
losses: 49403 wins: 50597 ties: 0 LOS: 0.99992
losses: 49411 wins: 50589 ties: 0 LOS: 0.999902
losses: 49415 wins: 50585 ties: 0 LOS: 0.999892
losses: 49416 wins: 50584 ties: 0 LOS: 0.999889
losses: 49421 wins: 50579 ties: 0 LOS: 0.999875
losses: 49423 wins: 50577 ties: 0 LOS: 0.999869
losses: 49430 wins: 50570 ties: 0 LOS: 0.999844
losses: 49438 wins: 50562 ties: 0 LOS: 0.999811
losses: 49439 wins: 50561 ties: 0 LOS: 0.999806
losses: 49441 wins: 50559 ties: 0 LOS: 0.999796
losses: 49443 wins: 50557 ties: 0 LOS: 0.999786
losses: 49445 wins: 50555 ties: 0 LOS: 0.999776
losses: 49449 wins: 50551 ties: 0 LOS: 0.999754
losses: 49450 wins: 50550 ties: 0 LOS: 0.999748
losses: 49453 wins: 50547 ties: 0 LOS: 0.999729
losses: 49455 wins: 50545 ties: 0 LOS: 0.999716
losses: 49456 wins: 50544 ties: 0 LOS: 0.99971
losses: 49458 wins: 50542 ties: 0 LOS: 0.999696
losses: 49460 wins: 50540 ties: 0 LOS: 0.999681
losses: 49461 wins: 50539 ties: 0 LOS: 0.999674
losses: 49463 wins: 50537 ties: 0 LOS: 0.999658
losses: 49468 wins: 50532 ties: 0 LOS: 0.999617
losses: 49469 wins: 50531 ties: 0 LOS: 0.999608
losses: 49474 wins: 50526 ties: 0 LOS: 0.999561
losses: 49475 wins: 50525 ties: 0 LOS: 0.999551
losses: 49476 wins: 50524 ties: 0 LOS: 0.99954
losses: 49477 wins: 50523 ties: 0 LOS: 0.99953
losses: 49478 wins: 50522 ties: 0 LOS: 0.999519
losses: 49479 wins: 50521 ties: 0 LOS: 0.999508
losses: 49483 wins: 50517 ties: 0 LOS: 0.999462
losses: 49485 wins: 50515 ties: 0 LOS: 0.999437
losses: 49486 wins: 50514 ties: 0 LOS: 0.999425
losses: 49487 wins: 50513 ties: 0 LOS: 0.999412
losses: 49488 wins: 50512 ties: 0 LOS: 0.999399
losses: 49489 wins: 50511 ties: 0 LOS: 0.999385
losses: 49491 wins: 50509 ties: 0 LOS: 0.999357
losses: 49495 wins: 50505 ties: 0 LOS: 0.999298
losses: 49496 wins: 50504 ties: 0 LOS: 0.999283
losses: 49497 wins: 50503 ties: 0 LOS: 0.999267
losses: 49498 wins: 50502 ties: 0 LOS: 0.999251
losses: 49499 wins: 50501 ties: 0 LOS: 0.999234
losses: 49500 wins: 50500 ties: 0 LOS: 0.999217
losses: 49501 wins: 50499 ties: 0 LOS: 0.9992
losses: 49502 wins: 50498 ties: 0 LOS: 0.999183
losses: 49503 wins: 50497 ties: 0 LOS: 0.999165
losses: 49504 wins: 50496 ties: 0 LOS: 0.999147
losses: 49506 wins: 50494 ties: 0 LOS: 0.999109
losses: 49507 wins: 50493 ties: 0 LOS: 0.99909
losses: 49508 wins: 50492 ties: 0 LOS: 0.99907
losses: 49510 wins: 50490 ties: 0 LOS: 0.999029
losses: 49511 wins: 50489 ties: 0 LOS: 0.999008
losses: 49512 wins: 50488 ties: 0 LOS: 0.998987

...

losses: 50570 wins: 49430 ties: 0 LOS: 0.000156079
losses: 50576 wins: 49424 ties: 0 LOS: 0.000134769
losses: 50578 wins: 49422 ties: 0 LOS: 0.000128294
losses: 50584 wins: 49416 ties: 0 LOS: 0.000110577
losses: 50592 wins: 49408 ties: 0 LOS: 9.05075e-05
losses: 50595 wins: 49405 ties: 0 LOS: 8.39065e-05
losses: 50601 wins: 49399 ties: 0 LOS: 7.20399e-05
losses: 50607 wins: 49393 ties: 0 LOS: 6.17669e-05
losses: 50620 wins: 49380 ties: 0 LOS: 4.40501e-05
losses: 50630 wins: 49370 ties: 0 LOS: 3.38155e-05
losses: 50639 wins: 49361 ties: 0 LOS: 2.65676e-05
losses: 50665 wins: 49335 ties: 0 LOS: 1.30063e-05
losses: 50781 wins: 49219 ties: 0 LOS: 3.9166e-07

Remember, this is the output for two combatants that are exactly, perfectly equal.

So, now let's expand it to a googol games and wonder what the output looks like.

hgm · Post by **hgm** » Wed Jul 01, 2020 4:11 pm

One supposes that the tester enforced the randomness, by starting the games from randomly chosen opening positions. Engines can be completely deterministic, and playing 10^100 times the same game would be ultimately stupid.

So yes, randomness is involved, if not through the engine, then through the test setup.

That applies to the generated games. If you only look at the score... Well, if all games are draws, there isn't anything random in the scores. These then would be deterministically 0.5-0.5. If the results vary, one can again assume that the tester has taken care that the games do not affect each other (e.g. clear hash tables between games, do not run more games simultaneously on the same machine, etc.). So they must be independent random events, the randomness enforced by the fact they each started from a different independently randomly chosen position.

Dann Corbit wrote: ↑Wed Jul 01, 2020 2:31 pm So, now let's expand it to a googol games and wonder what the output looks like.

No need to wonder. If you do 100 matches (doesn't really matter how many games there were in each match; you seem to couple this), you should expect that the most extreme 'flukes' predict a LOS of ~99% or ~1%. That is exactly what LOS means: if we say the LOS = 99%, we mean that only in the remaining 1% of the cases the observed result would be the consequence of pure luck, and the engines equal. Your simulations nicely obey this; you do have equal engines, you play 100 matches between them, and, lo and behold the LOS of about 0.99 occurred once, i.e. in 1% of the cases. Do a thousand matches and the highest will be around 0.999. Do 100K, and it will be around 0.99999. Do 10^100 matches, and it will be close to 1-10^-100.

All absolutely normal. And of course not related in any way to the case where the engines nearly always draw; they almost never draw here, the results have as large a variance as they could possibly have, they are always 0 or 1.
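The variance remark can be made explicit. As a sketch, assume each game is an independent trial scoring 1, 1/2 or 0 with win, draw and loss probabilities $w$, $d$ and $l$ (symbols introduced here for illustration, not used elsewhere in the thread). For the per-game score $X$:

```latex
\mu = w + \tfrac{1}{2}d, \qquad
\operatorname{Var}(X) = \mathbb{E}[X^2] - \mu^2
                      = w + \tfrac{1}{4}d - \left(w + \tfrac{1}{2}d\right)^2 .
% Equal engines: w = l = (1-d)/2, so \mu = 1/2 and
\operatorname{Var}(X) = \tfrac{1}{2}(1-d) + \tfrac{1}{4}d - \tfrac{1}{4}
                      = \tfrac{1}{4}(1-d).
```

So with no draws ($d = 0$) the variance takes its largest possible value, 1/4, which is the coin-flip case simulated above, and it shrinks toward zero as the draw rate approaches 1.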

Dann Corbit · Post by **Dann Corbit** » Wed Jul 01, 2020 4:37 pm

My experiment has proven that LOS is not accurate even for short sequences of ten games and the inaccuracy grows unbounded as the game count increases until it produces pure noise. This does not even take into account that it also becomes less and less accurate because it throws out the draw data.

The Elo model, on the other hand, handles the data generated by the experiment with aplomb, producing accurate results, even for the tails, that grow more and more accurate as the game count increases, even for the most extreme of the tails.

You keep using that word...
I don't think it means what you think it means.

LOS is a train wreck of a statistic that gets worse and worse as more data is added, due to magnification of randomness to the point of absurdity.

If you want to use it to tune your engine after a googol trials, be my guest, but you would be better off using a random number generator to tune your engine.

Dann Corbit · Post by **Dann Corbit** » Wed Jul 01, 2020 4:46 pm

Let me add that in a perfect world where there is no randomness and every trial results in pure truth, LOS works perfectly.

hgm · Post by **hgm** » Wed Jul 01, 2020 5:06 pm

Dann Corbit wrote: ↑Wed Jul 01, 2020 4:37 pm My experiment has proven that LOS is not accurate even for short sequences of ten games and the inaccuracy grows unbounded as the game count increases until it produces pure noise.

In reality it is the exact opposite. Your results give a perfect demonstration of what the LOS means. And that you can cheat in stochastic tests by simply repeating the test until you get the result that you want, and then treat it like it was 'the' result of the test, rather than just your own fabrication.

BTW, even your conclusion that it is due to game count happens to be false. That doesn't follow from your examples, because you vary game count and number of matches together, so you cannot see if any observed effect is due to one or the other. If you would vary them independently you would see that it has absolutely nothing to do with game count, only with the number of matches you play.

Dann Corbit wrote: ↑Wed Jul 01, 2020 4:37 pm This does not even take into account that it also becomes less and less accurate because it throws out the draw data.

As said, it has absolutely nothing to do with the original issue, sheds no light on it at all, and that you think it would get less accurate by throwing away draws is yet another false claim sprouting purely from your imagination, without any evidence.

Dann Corbit wrote: ↑Wed Jul 01, 2020 4:37 pm The Elo model, on the other hand, handles the data generated by the experiment with aplomb, producing accurate results, even for the tails, that grow more and more accurate as the game count increases, even for the most extreme of the tails.

But it calculates something else.
