tuning via maximizing likelihood
Re: tuning via maximizing likelihood
AlvaroBegue wrote: How do you handle draws? Or does your evaluation function return W/D/L probabilities?

Theoretically, "log(logistic(x))" and "log(1 - logistic(x))" are perfectly well behaved for any finite value x (since logistic maps [-Inf, +Inf] to [0, 1]). However, to avoid numerical overflow when using single-precision floats, one needs to be careful to scale the linear eval (the sum of weights * features) into roughly the range [-10, +10] (or [-20, +20] for double-precision floats).
But the real answer is that I don't need to penalize my evaluation function infinitely for getting one case wrong, which using logs would do.
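As to the overflow point above: it can also be sidestepped by evaluating log(logistic(x)) in a numerically stable form rather than by rescaling. A minimal Python sketch (the helper names are mine):
Code: Select all
import math

def log_logistic(x):
    # log(logistic(x)) = -log(1 + exp(-x)); branch so that the
    # exp() argument is never positive, hence never overflows
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def log_one_minus_logistic(x):
    # uses 1 - logistic(x) = logistic(-x)
    return log_logistic(-x)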
Re: tuning via maximizing likelihood
AlvaroBegue wrote: I don't know why you think that the convergence would be better than using mean squared error. I don't really know if it would be, but I am curious if you have a reason to believe that a priori.

If you take as loss function the mean squared error, for features x and weights w:
Code: Select all
Loss = 1/2 * (result - logistic(w * x))^2
then its gradient is
Code: Select all
Grad(w) = -(result - logistic(w * x)) * logistic(w * x) * (1 - logistic(w * x)) * x
whereas the log-likelihood to be maximized,
Code: Select all
LogL = (1 - result) * log(1 - logistic(w * x)) + result * log(logistic(w * x))
has gradient
Code: Select all
Grad(w) = (result - logistic(w * x)) * x
The MSE gradient carries the extra factor logistic(w * x) * (1 - logistic(w * x)), which vanishes as the prediction saturates towards 0 or 1, so gradient steps stall precisely on positions where the prediction is confidently wrong; the log-likelihood gradient has no such factor.
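To make the difference concrete, here is a small Python sketch (data and helper names made up) evaluating both gradients on a single position that the model gets confidently wrong:
Code: Select all
import math

def logistic(s):
    return 1.0 / (1.0 + math.exp(-s))

def mse_grad(w, x, result):
    # gradient of 1/2 * (result - logistic(w.x))^2 (to be descended)
    p = logistic(sum(wi * xi for wi, xi in zip(w, x)))
    return [-(result - p) * p * (1.0 - p) * xi for xi in x]

def loglik_grad(w, x, result):
    # gradient of the log-likelihood (to be ascended)
    p = logistic(sum(wi * xi for wi, xi in zip(w, x)))
    return [(result - p) * xi for xi in x]

# eval says "winning" (w.x = 10) but the game was lost (result = 0)
w, x, result = [5.0], [2.0], 0.0
print(mse_grad(w, x, result))     # ~[9.1e-05]: almost no learning signal
print(loglik_grad(w, x, result))  # ~[-2.0]: strong corrective signal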
Re: tuning via maximizing likelihood
AlvaroBegue wrote: I raised two separate objections. One of them is that, although using log-likelihood is somewhat theoretically motivated, it is not at all clear how draws should be handled.

I agree that plugging result = 1/2 into the log-likelihood for binomial logistic regression is ad hoc (even if it works numerically). The ordered logit model (aka cumulative link model, aka proportional odds model) is ideally suited for handling draws. The log-likelihood contribution of a position is then
Code: Select all
if (result == L) return log(logistic(theta_LD - w * x))
if (result == D) return log(logistic(theta_DW - w * x) - logistic(theta_LD - w *x))
if (result == W) return log(1 - logistic(theta_DW - w * x))
Since 1 - logistic(x) = logistic(-x), taking theta_DW = theta_LD = 0, you recover the log-likelihood for binomial regression.
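A direct transcription into Python might look like this (a sketch: the threshold values are arbitrary placeholders, to be fitted along with the weights, and the L/D/W encoding is mine):
Code: Select all
import math

def logistic(s):
    return 1.0 / (1.0 + math.exp(-s))

def ordered_logit_loglik(score, result, theta_LD=-0.5, theta_DW=0.5):
    # log-likelihood of one game result given the eval's linear
    # predictor score = w * x; requires theta_LD <= theta_DW
    if result == "L":
        return math.log(logistic(theta_LD - score))
    if result == "D":
        return math.log(logistic(theta_DW - score) - logistic(theta_LD - score))
    if result == "W":
        return math.log(1.0 - logistic(theta_DW - score))
    raise ValueError("result must be 'W', 'D' or 'L'")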
Re: tuning via maximizing likelihood
It seems one simply needs to choose an elo model (e.g. Bayes-Elo or Davidson)...
One may regard the evaluation function as a predictor for the elo of a position. Using an elo model one may convert the elo of a position into actual w/d/l ratios.
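For instance, under the Bayes-Elo model that conversion could look like the following Python sketch (treating the eval's output as an elo-like number; draw_elo is a free model parameter to be fitted, not a known constant):
Code: Select all
def wdl_from_elo(elo, draw_elo=100.0):
    # Bayes-Elo: win and loss probabilities both shrink with draw_elo,
    # and whatever is left over is the draw probability
    w = 1.0 / (1.0 + 10.0 ** ((draw_elo - elo) / 400.0))
    l = 1.0 / (1.0 + 10.0 ** ((draw_elo + elo) / 400.0))
    return w, 1.0 - w - l, l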
Re: tuning via maximizing likelihood
In Arasan this seems to be implemented in a rather non-standard way. If e is the prediction by the evaluation function then (w,d,l) are set to e,1-e,(e*(1-e))**2. These do not sum up to 1 and so they are not probabilities.
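A quick numeric check of that claim (sketch; e = 0.6 is an arbitrary example value):
Code: Select all
e = 0.6
w, d, l = e, 1.0 - e, (e * (1.0 - e)) ** 2
print(w + d + l)   # 1.0576, not 1.0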
My instinct would be to stay with a standard ML implementation (supported by a standard elo model). But of course it is impossible to know without testing.
Re: tuning via maximizing likelihood
Michel wrote: In Arasan this seems to be implemented in a rather non-standard way. If e is the prediction by the evaluation function then (w,d,l) are set to e,1-e,(e*(1-e))**2. These do not sum up to 1 and so they are not probabilities.
My instinct would be to stay with a standard ML implementation (supported by a standard elo model). But of course it is impossible to know without testing.

The ordered logit model is such an elo-model. See e.g. Knorr-Held, L., 2000. Dynamic rating of sports teams. Journal of the Royal Statistical Society: Series D (The Statistician), 49(2), pp.261-276.
The nice thing is that it uses a latent variable approach, so that in a search tree one can use just the linear predictor w * x (the log and logistic are monotonic transformations, so they leave the search outcome invariant).
Re: tuning via maximizing likelihood
Daniel Shawul wrote: Good to know! So far I have had better results with the ML objective function -- even though both barely improved my engine. You seem to use a 1 draw = 2 wins + 2 losses approach unless I am mistaken; is that intentional? I am only aware of elo models that use 1 draw = 1 win + 1 loss (Rao-Kupper), 2 draws = 1 win + 1 loss (Davidson).
Daniel

Plugging result = 1/2 into the binomial logistic likelihood forces 2 draws = 1 win + 1 loss.
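In other words, with result = 1/2 the log-likelihood contribution of one draw is exactly the average of a win's and a loss's, so two draws weigh the same as one win plus one loss. A quick check (sketch; the score value is arbitrary):
Code: Select all
import math

def logistic(s):
    return 1.0 / (1.0 + math.exp(-s))

def loglik(score, result):
    p = logistic(score)
    return result * math.log(p) + (1.0 - result) * math.log(1.0 - p)

s = 0.3
print(2.0 * loglik(s, 0.5))             # two draws
print(loglik(s, 1.0) + loglik(s, 0.0))  # one win + one loss: identical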
Re: tuning via maximizing likelihood
Quote:
The ordered logit model is such an elo-model. See e.g. Knorr-Held, L., 2000. Dynamic rating of sports teams. Journal of the Royal Statistical Society: Series D (The Statistician), 49(2), pp.261-276.

Interesting. For w,d,l outcomes this seems to translate into the Bayes-Elo model if we impose the additional (very reasonable) requirement that elo = 0 implies w = l (assuming of course that we take the logistic function as the response function).
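That requirement amounts to taking theta_DW = -theta_LD. A Python sketch of the correspondence (names mine, working on the natural logistic scale rather than in elo points):
Code: Select all
import math

def logistic(s):
    return 1.0 / (1.0 + math.exp(-s))

def ordered_logit_wdl(score, t):
    # theta_DW = t, theta_LD = -t, so score = 0 gives w = l
    l = logistic(-t - score)
    w = 1.0 - logistic(t - score)   # = logistic(score - t)
    return w, 1.0 - w - l, l

def bayeselo_wdl(score, t):
    # Bayes-Elo form with t as the draw parameter
    w = logistic(score - t)
    l = logistic(-score - t)
    return w, 1.0 - w - l, l

print(ordered_logit_wdl(0.7, 0.4))
print(bayeselo_wdl(0.7, 0.4))       # identical triples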
Re: tuning via maximizing likelihood
Quote:
The ordered logit model is such an elo-model. See e.g. Knorr-Held, L., 2000. Dynamic rating of sports teams. Journal of the Royal Statistical Society: Series D (The Statistician), 49(2), pp.261-276.

In fact the standard Bayes-Elo model with the "draw elo" and "white advantage" parameters corresponds exactly to the model in the paper.
In the case of evaluation tuning the white advantage parameter is not relevant as it will have been incorporated in the stm bonus in the evaluation function.