Throwing out draws to calculate Elo

Dann Corbit · Post by **Dann Corbit** » Thu Jul 02, 2020 9:00 pm

I have attached code.
Run it.
Start with 10 games, then 100, then 1000, then 10,000, then 100K.
You will see the behavior predicted.
Now, the outcome of a LOS calculation could be any one of the individual runs, and some of them are 0.5 (quite a few, actually, when you get to 100K trials). But if you randomly select any one run, it will say (most of the time) that A is stronger or B is stronger and the magnitude of "stronger" increases with game count.

You may not trust my code. So write your own simulation. It should take less than half an hour.

hgm · Post by **hgm** » Thu Jul 02, 2020 9:07 pm

I don't see any code here. Do you mean the code you posted before, which was already debunked?

Let me say it again then: The LOS did not go up with the number of games at all, with that code. You cheated by playing more matches, it was a purely fabricated result. If you do more matches, and select the one that produced the highest LOS, then of course it goes up. The more matches you have to select from, the more boldly you can cheat.

You couldn't do that when you played only a single match, increasing the number of games in it. As testers who calculate LOS do.

Dann Corbit · Post by **Dann Corbit** » Thu Jul 02, 2020 9:10 pm

In case people want to simply analyze the data using statistical queries, here is a Microsoft Access database (username is Admin; password is empty, not the word "blank"):

You should be able to do all sorts of interesting statistical queries with it.

Dann Corbit · Post by **Dann Corbit** » Thu Jul 02, 2020 9:12 pm

hgm wrote: ↑Thu Jul 02, 2020 9:07 pm I don't see any code here. Do you mean the code you posted before, which was already debunked?

Let me say it again then: The LOS did not go up with the number of games at all, with that code. You cheated by playing more matches, it was a purely fabricated result. If you do more matches, and select the one that produced the highest LOS, then of course it goes up. The more matches you have to select from, the more boldly you can cheat.

You couldn't do that when you played only a single match, increasing the number of games in it. As testers who calculate LOS do.

I certainly did not pick an individual item from the data. In fact, the data shows that more and more elements have larger and larger LOS (for either A or B) when you increase the game count.

You are welcome to produce correct code that shows LOS (on average from a single item picked at random) does not change as the game count increases.

hgm · Post by **hgm** » Thu Jul 02, 2020 9:19 pm

Dann Corbit wrote: ↑Thu Jul 02, 2020 9:12 pm I certainly did not pick an individual item from the data. In fact, the data shows that more and more elements have larger and larger LOS (for either A or B) when you increase the game count.

Like hell you didn't...

You tried 100k times, and then sorted the 100k outcomes to pick the single one (or 40, or hundred, in any case a negligible fraction) that had the highest LOS.

Dann Corbit wrote: ↑Thu Jul 02, 2020 9:12 pm You are welcome to produce correct code that shows LOS (on average from a single item picked at random) does not change as the game count increases.

That is easy enough. Just change the line

    for (int contest = 0; contest < experiment_size; contest++)

in your code to

    for (int contest = 0; contest < 1; contest++)

Dann Corbit · Post by **Dann Corbit** » Thu Jul 02, 2020 9:25 pm

hgm wrote: ↑Thu Jul 02, 2020 9:19 pm
Dann Corbit wrote: ↑Thu Jul 02, 2020 9:12 pm I certainly did not pick an individual item from the data. In fact, the data shows that more and more elements have larger and larger LOS (for either A or B) when you increase the game count.
Like hell you didn't...

You tried 100k times, and then sorted the 100k outcomes to pick the single one (or 40, or hundred, in any case a negligible fraction) that had the highest LOS.

You are welcome to produce correct code that shows LOS (on average from a single item picked at random) does not change as the game count increases.
That is easy enough. Just change the line
    for (int contest = 0; contest < experiment_size; contest++)
in your code to
    for (int contest = 0; contest < 1; contest++)

I most certainly did not do what you claim.
I showed the data at the tail.
Then I showed the data at 20,000 lines into the sorted set.
Then I showed the data at 40,000 lines into the sorted set (very close to the center).

Your claiming that I am untruthful is highly offensive.
I suggest you go back to my post and read it again, and don't just look at the batch for the tail, look at all three of them.

Pio · Post by **Pio** » Thu Jul 02, 2020 9:31 pm

hgm wrote: ↑Thu Jul 02, 2020 9:19 pm
Dann Corbit wrote: ↑Thu Jul 02, 2020 9:12 pm I certainly did not pick an individual item from the data. In fact, the data shows that more and more elements have larger and larger LOS (for either A or B) when you increase the game count.
Like hell you didn't...

You tried 100k times, and then sorted the 100k outcomes to pick the single one (or 40, or hundred, in any case a negligible fraction) that had the highest LOS.

You are welcome to produce correct code that shows LOS (on average from a single item picked at random) does not change as the game count increases.
That is easy enough. Just change the line
    for (int contest = 0; contest < experiment_size; contest++)
in your code to
    for (int contest = 0; contest < 1; contest++)

If there are 8 persons who can think and we put in 10^100 people who cannot think, then those 8 persons vanish into thin air.

If we take 100k runs of 10 coin flips or 100k runs of 10^100 coin flips, everything will change.

I know you at least understand that I am joking.

Dann Corbit · Post by **Dann Corbit** » Thu Jul 02, 2020 9:39 pm

Here are the first few lines of the raw data:
losses: 50099 wins: 49901 ties: 0 LOS: 0.265615 Elo diff: -0.687897
Look at that, A is not superior to B at all. But that means B has an LOS of 1 - 0.265615 = 0.734385.
So, would you like to choose that one? It tells us that B looks stronger than A. And we have 100K games to prove it.

losses: 49948 wins: 50052 ties: 0 LOS: 0.628876 Elo diff: 0.361319
This one is only .6, so maybe a candidate you would like.

losses: 50060 wins: 49940 ties: 0 LOS: 0.352168 Elo diff: -0.416907
Again, A is not looking very strong here, but B is: 1 - 0.352168 = 0.647832.

losses: 50040 wins: 49960 ties: 0 LOS: 0.400141 Elo diff: -0.277938
B is .6

losses: 49872 wins: 50128 ties: 0 LOS: 0.790899 Elo diff: 0.889403
Not a bad cherry pick here.

losses: 50180 wins: 49820 ties: 0 LOS: 0.127473 Elo diff: -1.25073
Now we are getting somewhere. You should have told me to pick the 6th one, not the first.

losses: 50010 wins: 49990 ties: 0 LOS: 0.474785 Elo diff: -0.0694844
My goodness, even better. Let's choose the 7th.

hgm · Post by **hgm** » Thu Jul 02, 2020 9:41 pm

Dann Corbit wrote: ↑Thu Jul 02, 2020 9:25 pm I most certainly did not do what you claim.
I showed the data at the tail.
Then I showed the data at 20,000 lines into the sorted set.
Then I showed the data at 40,000 lines into the sorted set (very close to the center).

Your claiming that I am untruthful is highly offensive.

I did not say you were doing it on purpose. But you are doing it nevertheless. It is up to you to decide whether you'd rather want it to be out of ignorance than out of deviousness.

Dann Corbit wrote: ↑Thu Jul 02, 2020 9:25 pm I suggest you go back to my post and read it again, and don't just look at the batch for the tail, look at all three of them.

Oh, I read it all right. And you even confess it here: "20,000 lines into the set, 40,000 lines into the set...".

The very fact that there is more than 1 line in that set is cheating!

Engine testers don't do 100K matches, to get some outlier LOS. They only do a single match, and that's it. As long as you are not doing the same, you have no basis whatsoever to criticise them.

Dann Corbit · Post by **Dann Corbit** » Thu Jul 02, 2020 9:45 pm

hgm wrote: ↑Thu Jul 02, 2020 9:41 pm
Dann Corbit wrote: ↑Thu Jul 02, 2020 9:25 pm I most certainly did not do what you claim.
I showed the data at the tail.
Then I showed the data at 20,000 lines into the sorted set.
Then I showed the data at 40,000 lines into the sorted set (very close to the center).

Your claiming that I am untruthful is highly offensive.
I did not say you were doing it on purpose. But you are doing it nevertheless. It is up to you to decide whether you'd rather want it to be out of ignorance than out of deviousness.

I suggest you go back to my post and read it again, and don't just look at the batch for the tail, look at all three of them.
Oh, I read it all right. And you even confess it here: "20,000 lines into the set, 40,000 lines into the set...".

The very fact that there is more than 1 line in that set is cheating!

Engine testers don't do 100K matches, to get some outlier LOS. They only do a single match, and that's it. As long as you are not doing the same, you have no basis whatsoever to criticise them.

But if I picked the first line, it would have proven my point.
The reason I produced lots of lines is to show that there is randomness in the LOS value with lots of tests.
I do not criticize anyone for doing LOS calculations.
I also suspect that there is a sweet spot where LOS works very well.
