best way to determine elos of a group
Posted: Tue Jul 30, 2019 4:53 am
I am sure you are all familiar with selfplay elo graphs of lc0 are considered not so reliable ( sometimes called its "ego" due to its inflated value ).
I think this is mostly because each network is being tested against the previous network -- maybe they do more but for this discussion lets stick with
testing against previous network only. So in essence elo is assumed to be incremental. The elo of neworks goes like +50, +40, -20, +30, +25, ... etc
I faced this sort of problem when trying to evaluate my networks. For my case roughly 60 times more nets are produced comapred to lc0's.
So the time I spent evaluating networks takes a significant portion of my resources. Using the lc0 approach for the first 12 networks calculated with bopo (my implementaton of bayeselo algorithms) can be seen here
http://scorpiozero.ddns.net/scorpiozero ... atings.txt
When i got to 160 networks in under 80k games, I decided that I should sample networks and match them to possible fit a curve for all nets without actually playing net ID x with x+1. This thought led me to some interesting questions on maximizing resources for estimation of elo of a group. My question simply put is: what is the best way to match the nets to produce a reliable elo estimate ? The incremental testing resulted in "ego" so there must be a better way to approach the problem of determining elos with few games as possible. I know this is not a very precise question for mathematicians but if you get the idea please frame the question the way you like it. I am thinking of @Michel here.
This problem also seems to be related to Upper confidence bound (UCB) formula -- specifically for perft estimation! Where should you spend your match games to minimize the standard errors of elo estimates of all networks. We had fun with "perft estimation" some years ago that showed that this problem
is different from standard UCB as used in tree search where the goal is to find the best arm. If the problem was to find the best network, the standard UCT approach will zoom in on the latest networks and minimize the errors in their elo esimates.
Should the goal even be to determine elos for all networks? Also determining the best network is probably easy without even doing matches so I think the goal should be somewhere in between. We want to detect when things are going bad so the goal shouldn't be to find only the best net which is probably trivial anyway.
regards,
Daniel
I think this is mostly because each network is being tested against the previous network -- maybe they do more but for this discussion lets stick with
testing against previous network only. So in essence elo is assumed to be incremental. The elo of neworks goes like +50, +40, -20, +30, +25, ... etc
I faced this sort of problem when trying to evaluate my networks. For my case roughly 60 times more nets are produced comapred to lc0's.
So the time I spent evaluating networks takes a significant portion of my resources. Using the lc0 approach for the first 12 networks calculated with bopo (my implementaton of bayeselo algorithms) can be seen here
http://scorpiozero.ddns.net/scorpiozero ... atings.txt
When i got to 160 networks in under 80k games, I decided that I should sample networks and match them to possible fit a curve for all nets without actually playing net ID x with x+1. This thought led me to some interesting questions on maximizing resources for estimation of elo of a group. My question simply put is: what is the best way to match the nets to produce a reliable elo estimate ? The incremental testing resulted in "ego" so there must be a better way to approach the problem of determining elos with few games as possible. I know this is not a very precise question for mathematicians but if you get the idea please frame the question the way you like it. I am thinking of @Michel here.
This problem also seems to be related to Upper confidence bound (UCB) formula -- specifically for perft estimation! Where should you spend your match games to minimize the standard errors of elo estimates of all networks. We had fun with "perft estimation" some years ago that showed that this problem
is different from standard UCB as used in tree search where the goal is to find the best arm. If the problem was to find the best network, the standard UCT approach will zoom in on the latest networks and minimize the errors in their elo esimates.
Should the goal even be to determine elos for all networks? Also determining the best network is probably easy without even doing matches so I think the goal should be somewhere in between. We want to detect when things are going bad so the goal shouldn't be to find only the best net which is probably trivial anyway.
regards,
Daniel