By playing the same position multiple times you are just creating more data points. The only difference is that you are creating multiple points for the same position, instead of for different (but similar) positions.

Evert wrote:
Maybe.

AlvaroBegue wrote:
This should not bother you at all. A probability is a continuous quantity that expresses our expectation of a binary result, and we use logistic regression to fit probabilities to discrete outcomes all the time. If you have time to play more games, you should probably play them from new positions so you get a more complete sample of situations on the board, instead of playing them from the same positions.
But how certain are you that the outcome is correct? If the result is 1-0, but a match over 10 games would result in 7-3, then we would be better off using 0.7 rather than 1.0 for that position. Worse, what if the result of a 10-game match is 1-9? Right now we treat positions that are won because you are a minor ahead as though they should give the same score as positions that are won because you are a queen ahead. They aren't, of course. Allowing for a difference there should reduce the noise in the fit. Of course adding more positions can also do that, but each individual position is then less important. On the other hand, reducing noise is not a goal per se.
The interesting positions here are not those where you are a piece ahead (or behind), of course, but those closer to the draw score, around the inflection point of the logistic function. I might try to make an estimate for how much time it would take to get a better estimate for some of those.
If a certain huge data set is fitted by a certain function, a random selection of 10% of the data points should still be fitted by the same function, only with somewhat larger error bars on the fitted parameters. Having only a single result for each position can be seen as a random selection from an N times larger set in which each position was played N times.
So playing positions multiple times isn't particularly better than any other way of creating more data points. One difference is that you would actually have to generate those games, while more data points from other positions can be obtained almost for free, by selecting from more games that were played anyway.
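To make the subsampling argument concrete, here is a rough Python sketch, not anybody's actual tuner. Everything in it is an assumption made up for the illustration: a one-parameter logistic model with scale `k`, a "true" scale `K_TRUE`, evaluations drawn uniformly from [-3, 3], no draws, and a plain grid search with log loss as the fitting criterion. It fits `k` three ways: from one result per position, from the averaged score of `N_GAMES` games per position, and from all individual games as separate data points.

```python
import math
import random

def win_prob(k, x):
    """Logistic win probability for evaluation x with scale k."""
    return 1.0 / (1.0 + math.exp(-k * x))

def log_loss(k, data):
    """Mean cross-entropy of targets y in [0, 1] against win_prob(k, x)."""
    total = 0.0
    for x, y in data:
        p = win_prob(k, x)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(data)

def fit_scale(data):
    """Grid search for the k in [0.50, 1.50] that minimises the log loss."""
    grid = [0.50 + 0.01 * i for i in range(101)]
    return min(grid, key=lambda k: log_loss(k, data))

rng = random.Random(1)
K_TRUE = 1.0        # assumed "true" scale, chosen only for this demo
N_POSITIONS = 2000  # hypothetical number of test positions
N_GAMES = 5         # hypothetical number of games played per position

# Synthetic evaluations and game results (1.0 = win, 0.0 = loss, no draws).
evals = [rng.uniform(-3.0, 3.0) for _ in range(N_POSITIONS)]
results = [
    [1.0 if rng.random() < win_prob(K_TRUE, x) else 0.0 for _ in range(N_GAMES)]
    for x in evals
]

# One result per position: a random 1-in-N_GAMES selection of all games.
single = [(x, r[0]) for x, r in zip(evals, results)]
# One data point per position, with the match score as a fractional target.
averaged = [(x, sum(r) / N_GAMES) for x, r in zip(evals, results)]
# Every game as its own data point.
expanded = [(x, y) for x, r in zip(evals, results) for y in r]

k_single = fit_scale(single)
k_averaged = fit_scale(averaged)
k_expanded = fit_scale(expanded)

print("single result per position:", k_single)
print("averaged match score      :", k_averaged)
print("all games as data points  :", k_expanded)
```

With this particular loss the averaged and expanded fits agree up to rounding, because the log loss is linear in the target: a fractional score such as 0.7 over ten games carries the same information as seven 1s and three 0s entered separately. The single-result fit aims at the same `k` but from a fifth of the data, so it scatters more from seed to seed.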