Why do the tuning methods minimize the mean squared error directly instead of maximizing the likelihood (or minimizing the negative log-likelihood)? Given r = result, the objective with maximum likelihood estimation would be
How do you handle draws? Or does your evaluation function return W/D/L probabilities?
Draws should be fine, I think. Unless the score is -inf or +inf, the logistic function returns values strictly between 0 and 1.
Edit:
But the real answer is that I don't need to penalize my evaluation function infinitely for getting one case wrong, which using logs would do.
Maximum likelihood is the 'standard' method for logistic regression. Also, I seem to get more stable iterations with it than when minimizing the MSE directly.
The log-likelihood function you posted is not bounded. If my evaluation function predicts that the probability of the actual result is 0, it gets an infinite penalty.
I don't know why you think that the convergence would be better than using mean squared error. I don't really know if it would be, but I am curious if you have a reason to believe that a priori.
The evaluation score of my engine lies in the range (-20000, 20000), and for a draw it would be score = 0, i.e. logistic(score) = 0.5. So draws are no problem as far as I can see.
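To make the score-to-probability mapping concrete, here is a minimal sketch of a logistic function applied to an evaluation score. The function name `expected_result` and the scaling constant `scale=400.0` are assumptions for illustration only; the thread does not say which scaling the engine uses.

```python
def expected_result(score, scale=400.0):
    """Map an evaluation score to an expected game result.

    `scale` is an assumed constant (not taken from this thread); it only
    controls how quickly the curve saturates.
    """
    return 1.0 / (1.0 + 10.0 ** (-score / scale))


# A score of 0 maps to the middle of the range, i.e. the value used for a draw.
print(expected_result(0))

# At the ends of the (-20000, 20000) range the mathematical value is still
# strictly inside (0, 1), but in floating point it sits extremely close to
# (or rounds onto) the boundary.
print(expected_result(-20000), expected_result(20000))
```

With this assumed scale, the extreme scores already push the output to the edge of what a double can distinguish from 0 or 1, which is where the choice of penalty function starts to matter.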
I raised two separate objections. One of them is that, although using log-likelihood is somewhat theoretically motivated, it is not at all clear how draws should be handled.
The other [more serious] objection is that the penalty imposed for getting one single sample wrong in the training set is unbounded in the case of the log-likelihood formula, while it is bounded if you use mean squared error.
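The bounded versus unbounded point can be checked numerically. The formula from the first post is not reproduced in the thread, so the sketch below assumes the usual binary cross-entropy form for the negative log-likelihood; `squared_error` and `log_loss` are illustrative names, not anyone's actual tuner code.

```python
import math


def squared_error(r, p):
    """Squared-error penalty for result r and predicted expectation p."""
    return (r - p) ** 2


def log_loss(r, p):
    """Negative log-likelihood in the assumed cross-entropy form.

    Requires 0 < p < 1; it diverges as p approaches the wrong extreme.
    """
    return -(r * math.log(p) + (1.0 - r) * math.log(1.0 - p))


# One training sample where the game was won (r = 1) but the evaluation
# predicts an ever smaller winning expectation p.
for p in (0.5, 0.1, 1e-3, 1e-9):
    print(f"p={p:g}  squared_error={squared_error(1.0, p):.6f}  "
          f"log_loss={log_loss(1.0, p):.3f}")
```

The squared error never exceeds 1 for a single sample, while the log penalty keeps growing as p goes to 0.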
I don't really get what you are missing here. Again, draws are not a problem as far as I can see.
Maximum-likelihood estimation can even be used to train multi-layer neural networks, not just the single-layer evaluation function we are talking about here. Here is an example paper (http://pubmedcentralcanada.ca/pmcc/arti ... 4-0306.pdf ) where backpropagation with a maximum likelihood objective (ML-BP) is shown to be better than with the least squares objective (LS-BP).
Daniel Shawul wrote: I don't really get what you are missing here. Again, draws are not a problem as far as I can see.
Log-likelihood is a very natural quantity to maximize if you have a probability model. So if we had some procedure that produced a probability for winning, a probability for drawing and a probability for losing, it would make sense to penalize by the -log of the probability of the outcome that really happened. But the particular penalty you are using for a draw is not well motivated. Nothing will blow up, but what you are doing is not exactly maximizing likelihood.
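A small sketch of the distinction drawn here, under stated assumptions: `wdl_penalty` is a hypothetical helper for a model that outputs explicit win/draw/loss probabilities, and `cross_entropy_penalty` assumes the usual cross-entropy form with r = 0.5 for a draw (the original formula is not shown in the thread).

```python
import math


def wdl_penalty(probs, outcome):
    """-log of the probability the model assigned to the actual outcome.

    `probs` is a dict with keys 'win', 'draw', 'loss' that sum to 1.
    """
    return -math.log(probs[outcome])


def cross_entropy_penalty(r, p):
    """Assumed single-number objective: r is 1, 0.5 or 0; p is logistic(score)."""
    return -(r * math.log(p) + (1.0 - r) * math.log(1.0 - p))


# A proper W/D/L model: penalties are -log of genuine probabilities.
model = {"win": 0.3, "draw": 0.4, "loss": 0.3}
print(wdl_penalty(model, "draw"))

# With a single expectation p, the draw penalty equals -log(sqrt(p * (1 - p))).
# Reading exp(-penalty) as an implied "probability" for each outcome:
p = 0.5
implied_win = math.exp(-cross_entropy_penalty(1.0, p))
implied_draw = math.exp(-cross_entropy_penalty(0.5, p))
implied_loss = math.exp(-cross_entropy_penalty(0.0, p))
print(implied_win + implied_draw + implied_loss)  # greater than 1
```

The implied win, draw and loss values do not sum to 1, which is one way to see that the single-number objective is not the likelihood of a W/D/L probability model, even though nothing blows up.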
Maximum-likelihood estimation can even be used to train multi-layer neural networks, not just the single-layer evaluation function we are talking about here. Here is an example paper (http://pubmedcentralcanada.ca/pmcc/arti ... 4-0306.pdf ) where backpropagation with a maximum likelihood objective (ML-BP) is shown to be better than with the least squares objective (LS-BP).
Yes, it's a very common thing to optimize if what you are maximizing over are probability models. But, as I said, that's not exactly true here.
Oh, and I wouldn't go around quoting neural-network papers from 1992.
Good to know! So far I have had better results with the ML objective function, even though both barely improved my engine. You seem to use a 1 draw = 2 wins + 2 losses approach unless I am mistaken; is that intentional? I am only aware of Elo models that use 1 draw = 1 win + 1 loss (Rao-Kupper) or 2 draws = 1 win + 1 loss (Davidson).
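For reference, the two draw models named above can be sketched as follows. This uses the standard textbook forms of the Rao-Kupper and Davidson extensions of Bradley-Terry; the parameter names `pi_i`, `pi_j`, `theta` and `nu` are chosen here for illustration, and the example values are arbitrary.

```python
import math


def rao_kupper(pi_i, pi_j, theta):
    """Rao-Kupper win/draw/loss probabilities; theta >= 1 controls draws."""
    win = pi_i / (pi_i + theta * pi_j)
    loss = pi_j / (theta * pi_i + pi_j)
    draw = 1.0 - win - loss
    return win, draw, loss


def davidson(pi_i, pi_j, nu):
    """Davidson win/draw/loss probabilities; nu >= 0 controls draws."""
    root = math.sqrt(pi_i * pi_j)
    denom = pi_i + pi_j + nu * root
    return pi_i / denom, nu * root / denom, pi_j / denom


w, d, l = rao_kupper(2.0, 1.0, 1.5)
# In Rao-Kupper the draw probability is proportional to win * loss,
# so one draw carries the weight of one win plus one loss.
print(d, (1.5 ** 2 - 1.0) * w * l)

w2, d2, l2 = davidson(2.0, 1.0, 1.0)
# In Davidson the draw term is proportional to sqrt(pi_i * pi_j),
# so it takes two draws to match one win plus one loss.
print(w2, d2, l2)
```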