I have attached code.
Run it.
Start with 10 games, then 100, then 1000, then 10,000, then 100K.
You will see the behavior predicted.
Now, the outcome of a LOS calculation could be any one of the individual runs, and some of them are 0.5 (quite a few, actually, when you get to 100K trials). But if you randomly select any one run, it will say (most of the time) that A is stronger or B is stronger and the magnitude of "stronger" increases with game count.
You may not trust my code. So write your own simulation. It should take less than half an hour.
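For anyone who wants to write their own simulation, here is a minimal sketch of one. It is not the attached code. It assumes two equal-strength players, no draws, and the usual normal-approximation LOS formula; the names `los`, `mean_los_deviation` and `report` are made up for illustration. For each game count it plays many independent matches and prints the average distance of LOS from 0.5, so you can check whether that figure grows with the game count.

```cpp
#include <cmath>
#include <cstdio>
#include <initializer_list>
#include <random>

// Likelihood of superiority from decisive games only.
// Assumed formula (normal approximation): 0.5 * (1 + erf((w - l) / sqrt(2 (w + l)))).
double los(long wins, long losses) {
    if (wins + losses == 0) return 0.5;  // no decisive games: no information
    return 0.5 * (1.0 + std::erf((wins - losses) / std::sqrt(2.0 * (wins + losses))));
}

// Play `matches` independent matches of `games` games each between two
// equal-strength players (win probability 0.5, no draws) and return the
// mean of |LOS - 0.5| over those matches.
double mean_los_deviation(long games, int matches, std::mt19937_64& rng) {
    // Drawing the win count from a binomial is equivalent to flipping
    // `games` fair coins, but much faster for 100K games.
    std::binomial_distribution<long> wins_dist(games, 0.5);
    double sum = 0.0;
    for (int m = 0; m < matches; ++m) {
        long wins = wins_dist(rng);
        sum += std::fabs(los(wins, games - wins) - 0.5);
    }
    return sum / matches;
}

// Print one summary row per game count, from 10 games up to 100K.
void report(int matches, unsigned seed) {
    std::mt19937_64 rng(seed);
    for (long games : {10L, 100L, 1000L, 10000L, 100000L}) {
        std::printf("%7ld games: mean |LOS - 0.5| = %.4f\n",
                    games, mean_los_deviation(games, matches, rng));
    }
}
```

Calling `report(100000, 1)` from `main` gives one row per game count, each averaged over 100K matches. Setting `matches` to 1 instead shows what a single randomly chosen match looks like.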
Throwing out draws to calculate Elo
Moderator: Ras
-
- Posts: 12778
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Throwing out draws to calculate Elo
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
- Posts: 28354
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Throwing out draws to calculate Elo
I don't see any code here. Do you mean the code you posted before, which was already debunked?
Let me say it again then: The LOS did not go up with the number of games at all, with that code. You cheated by playing more matches, it was a purely fabricated result. If you do more matches, and select the one that produced the highest LOS, then of course it goes up. The more matches you have to select from, the more boldly you can cheat.
You couldn't do that when you played only a single match, increasing the number of games in it. As testers who calculate LOS do.
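The selection effect described here can be checked with a short sketch. It assumes equal-strength players, no draws, and the normal-approximation LOS formula; `los` and `best_los` are hypothetical names, not taken from the posted code. It returns the highest LOS found among a given number of independent matches, so you can compare a single match against the best of many.

```cpp
#include <algorithm>
#include <cmath>
#include <random>

// Likelihood of superiority from decisive games only.
// Assumed formula (normal approximation), draws ignored.
double los(long wins, long losses) {
    if (wins + losses == 0) return 0.5;
    return 0.5 * (1.0 + std::erf((wins - losses) / std::sqrt(2.0 * (wins + losses))));
}

// Play `matches` independent matches of `games` games each between two
// equal-strength players and return the highest LOS among them.
// With matches == 1 this is simply the LOS of one match.
double best_los(long games, int matches, std::mt19937_64& rng) {
    std::binomial_distribution<long> wins_dist(games, 0.5);
    double best = 0.0;
    for (int m = 0; m < matches; ++m) {
        long wins = wins_dist(rng);
        best = std::max(best, los(wins, games - wins));
    }
    return best;
}
```

Keeping `games` fixed and raising `matches` from 1 to 10, 1000 and 100000 isolates the effect of having more matches to select from, separately from the effect of playing more games per match.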
-
- Posts: 12778
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Throwing out draws to calculate Elo
In case people want to simply analyze the data using statistical queries, here is a Microsoft Access database (username is Admin, password is empty, not the word "blank"):
You should be able to do all sorts of interesting statistical queries with it.
-
- Posts: 12778
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Throwing out draws to calculate Elo
hgm wrote: ↑Thu Jul 02, 2020 9:07 pm
  I don't see any code here. Do you mean the code you posted before, which was already debunked?
  Let me say it again then: The LOS did not go up with the number of games at all, with that code. You cheated by playing more matches, it was a purely fabricated result. If you do more matches, and select the one that produced the highest LOS, then of course it goes up. The more matches you have to select from, the more boldly you can cheat.
  You couldn't do that when you played only a single match, increasing the number of games in it. As testers who calculate LOS do.
I certainly did not pick an individual item from the data. In fact, the data shows that more and more elements have larger and larger LOS (for either A or B) when you increase the game count.
You are welcome to produce correct code that shows LOS (on average from a single item picked at random) does not change as the game count increases.
-
- Posts: 28354
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Throwing out draws to calculate Elo
Dann Corbit wrote: ↑Thu Jul 02, 2020 9:12 pm
  I certainly did not pick an individual item from the data. In fact, the data shows that more and more elements have larger and larger LOS (for either A or B) when you increase the game count.
Like hell you didn't...
You tried 100k times, and then sorted the 100k outcomes to pick the single one (or 40, or hundred, in any case a negligible fraction) that had the highest LOS.
Dann Corbit wrote:
  You are welcome to produce correct code that shows LOS (on average from a single item picked at random) does not change as the game count increases.
That is easy enough. Just change the line
Code:
  for (int contest = 0; contest < experiment_size; contest++)
in your code to
Code:
  for (int contest = 0; contest < 1; contest++)
-
- Posts: 12778
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Throwing out draws to calculate Elo
hgm wrote: ↑Thu Jul 02, 2020 9:19 pm
  Dann Corbit wrote: ↑Thu Jul 02, 2020 9:12 pm
    I certainly did not pick an individual item from the data. In fact, the data shows that more and more elements have larger and larger LOS (for either A or B) when you increase the game count.
  Like hell you didn't...
  You tried 100k times, and then sorted the 100k outcomes to pick the single one (or 40, or hundred, in any case a negligible fraction) that had the highest LOS.
  Dann Corbit wrote:
    You are welcome to produce correct code that shows LOS (on average from a single item picked at random) does not change as the game count increases.
  That is easy enough. Just change the line
  Code:
    for (int contest = 0; contest < experiment_size; contest++)
  in your code to
  Code:
    for (int contest = 0; contest < 1; contest++)
I most certainly did not do what you claim.
I showed the data at the tail.
Then I showed the data at 20,000 lines into the sorted set.
Then I showed the data at 40,000 lines into the sorted set (very close to the center).
Your claiming that I am untruthful is highly offensive.
I suggest you go back to my post and read it again, and don't just look at the batch for the tail, look at all three of them.
-
- Posts: 338
- Joined: Sat Feb 25, 2012 10:42 pm
- Location: Stockholm
Re: Throwing out draws to calculate Elo
hgm wrote: ↑Thu Jul 02, 2020 9:19 pm
  Dann Corbit wrote: ↑Thu Jul 02, 2020 9:12 pm
    I certainly did not pick an individual item from the data. In fact, the data shows that more and more elements have larger and larger LOS (for either A or B) when you increase the game count.
  Like hell you didn't...
  You tried 100k times, and then sorted the 100k outcomes to pick the single one (or 40, or hundred, in any case a negligible fraction) that had the highest LOS.
  Dann Corbit wrote:
    You are welcome to produce correct code that shows LOS (on average from a single item picked at random) does not change as the game count increases.
  That is easy enough. Just change the line
  Code:
    for (int contest = 0; contest < experiment_size; contest++)
  in your code to
  Code:
    for (int contest = 0; contest < 1; contest++)
If there are 8 persons who can think and we put in 10^100 people who cannot think then those 8 persons vanish in thin air.
If we take 100k runs of 10 coin flips or 100k runs of 10^100 coin flips, everything will change.
I know you at least understand that I am joking.
-
- Posts: 12778
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Throwing out draws to calculate Elo
Here are the first few lines of the raw data:
losses: 50099 wins: 49901 ties: 0 LOS: 0.265615 Elo diff: -0.687897
Look at that, A is not superior to B at all. But that means B has an LOS of 1-0.265615=0.734385
So, would you like to choose that one? It tells us that B looks stronger than A. And we have 100K games to prove it.
losses: 49948 wins: 50052 ties: 0 LOS: 0.628876 Elo diff: 0.361319
This one is only .6, so maybe a candidate you would like.
losses: 50060 wins: 49940 ties: 0 LOS: 0.352168 Elo diff: -0.416907
Again, A is not looking very strong here, but B is. 1-0.352168 =0.647832
losses: 50040 wins: 49960 ties: 0 LOS: 0.400141 Elo diff: -0.277938
B is .6
losses: 49872 wins: 50128 ties: 0 LOS: 0.790899 Elo diff: 0.889403
Not a bad cherry pick here.
losses: 50180 wins: 49820 ties: 0 LOS: 0.127473 Elo diff: -1.25073
Now we are getting somewhere. You should have told me to pick the 6th one, not the first.
losses: 50010 wins: 49990 ties: 0 LOS: 0.474785 Elo diff: -0.0694844
My goodness, even better. Let's choose the 7th.
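For readers who want to check rows like these by hand, here is a sketch that computes both figures from the win, loss and tie counts. The formulas are assumptions, not taken from the program that produced the data: the normal-approximation LOS on decisive games, and the standard logistic Elo formula with ties counted as half a point. `los` and `elo_diff` are illustrative names. With these formulas the first row (49901 wins, 50099 losses) comes out at an LOS of about 0.2656 and an Elo difference of about -0.688, close to the printed values.

```cpp
#include <cmath>

// Likelihood of superiority from decisive games only.
// Assumed formula (normal approximation), ties ignored.
double los(long wins, long losses) {
    if (wins + losses == 0) return 0.5;
    return 0.5 * (1.0 + std::erf((wins - losses) / std::sqrt(2.0 * (wins + losses))));
}

// Elo difference implied by the score fraction under the logistic model.
// Ties count as half a point. Returns +/- infinity for a 100% or 0% score.
double elo_diff(long wins, long losses, long ties) {
    double games = static_cast<double>(wins + losses + ties);
    double score = (wins + 0.5 * ties) / games;
    return -400.0 * std::log10(1.0 / score - 1.0);
}
```

Note that `1 - los(wins, losses)` equals `los(losses, wins)`, which is the "LOS for B" used in the comments on the rows.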
-
- Posts: 28354
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Throwing out draws to calculate Elo
Dann Corbit wrote: ↑Thu Jul 02, 2020 9:25 pm
  I most certainly did not do what you claim.
  I showed the data at the tail.
  Then I showed the data at 20,000 lines into the sorted set.
  Then I showed the data at 40,000 lines into the sorted set (very close to the center).
  Your claiming that I am untruthful is highly offensive.
I did not say you were doing it on purpose. But you are doing it nevertheless. It is up to you to decide whether you'd rather want it to be out of ignorance than out of deviousness.
Dann Corbit wrote:
  I suggest you go back to my post and read it again, and don't just look at the batch for the tail, look at all three of them.
Oh, I read it all right. And you even confess it here: "20,000 lines into the set, 40,000 lines into the set...".
The very fact that there is more than 1 line in that set is cheating!
Engine testers don't do 100K matches, to get some outlier LOS. They only do a single match, and that's it. As long as you are not doing the same, you have no basis whatsoever to criticise them.
-
- Posts: 12778
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Throwing out draws to calculate Elo
hgm wrote: ↑Thu Jul 02, 2020 9:41 pm
  Dann Corbit wrote: ↑Thu Jul 02, 2020 9:25 pm
    I most certainly did not do what you claim.
    I showed the data at the tail.
    Then I showed the data at 20,000 lines into the sorted set.
    Then I showed the data at 40,000 lines into the sorted set (very close to the center).
    Your claiming that I am untruthful is highly offensive.
  I did not say you were doing it on purpose. But you are doing it nevertheless. It is up to you to decide whether you'd rather want it to be out of ignorance than out of deviousness.
  Dann Corbit wrote:
    I suggest you go back to my post and read it again, and don't just look at the batch for the tail, look at all three of them.
  Oh, I read it all right. And you even confess it here: "20,000 lines into the set, 40,000 lines into the set...".
  The very fact that there is more than 1 line in that set is cheating!
  Engine testers don't do 100K matches, to get some outlier LOS. They only do a single match, and that's it. As long as you are not doing the same, you have no basis whatsoever to criticise them.
But if I picked the first line, it would have proven my point.
The reason I produced lots of lines is to show that there is randomness in the LOS value with lots of tests.
I do not criticize anyone for doing LOS calculations.
I also suspect that there is a sweet spot where LOS works very well.