tuning via maximizing likelihood


Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

tuning via maximizing likelihood

Post by Daniel Shawul »

Why do the tuning methods minimize the mean-squared error directly instead of maximizing the likelihood (or minimizing the negative log-likelihood)? Given r = result, the objective with maximum-likelihood estimation would be

Code: Select all

   like = r * log( logistic(score) ) + (1 - r) * log( 1 - logistic(score) )
   objective = 1/N * Sum( -like )
This should converge faster than plain least-squares regression using the mean squared error:

Code: Select all

   se = (r - logistic(score)) ** 2
   objective = 1/N * Sum( se )
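For concreteness, here is a minimal runnable sketch of the two objectives in Python. The scaling constant K inside the logistic and the tiny (score, result) dataset are assumptions for illustration only, not something from my tuner.

Code: Select all

import math

K = 1.0 / 400.0   # assumed scaling of centipawn scores

def logistic(score):
    # maps an evaluation score to an expected result in (0, 1)
    return 1.0 / (1.0 + math.exp(-K * score))

def avg_neg_log_likelihood(data):
    # data: list of (score, result) pairs, result in {0, 0.5, 1}
    total = 0.0
    for score, r in data:
        p = logistic(score)
        total -= r * math.log(p) + (1.0 - r) * math.log(1.0 - p)
    return total / len(data)

def avg_squared_error(data):
    total = 0.0
    for score, r in data:
        total += (r - logistic(score)) ** 2
    return total / len(data)

# toy data: (engine score in centipawns, game result)
data = [(350, 1.0), (-120, 0.0), (10, 0.5)]
print(avg_neg_log_likelihood(data), avg_squared_error(data))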
Daniel
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: tuning via maximizing likelihood

Post by AlvaroBegue »

How do you handle draws? Or does your evaluation function return W/D/L probabilities?

But the real answer is that I don't need to penalize my evaluation function infinitely for getting one case wrong, which using logs would do.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: tuning via maximizing likelihood

Post by Daniel Shawul »

AlvaroBegue wrote:
How do you handle draws? Or does your evaluation function return W/D/L probabilities?
Draws should be fine, I think. Unless we have a score of -inf or +inf, the logistic function should return values between 0 and 1.

Edit:
AlvaroBegue wrote:
But the real answer is that I don't need to penalize my evaluation function infinitely for getting one case wrong, which using logs would do.
Maximum likelihood is the 'standard' method for logistic regression. Also, I seem to get more stable iterations with it than with minimizing the MSE directly.
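One way to see why the iterations might behave differently (a sketch only, ignoring any constant scaling inside the logistic): with p = logistic(score), the per-sample gradients with respect to the score are

Code: Select all

   d(-like)/dscore = p - r                        (ML / cross-entropy)
   d(se)/dscore    = 2 * (p - r) * p * (1 - p)    (squared error)

The extra p * (1 - p) factor in the squared-error gradient vanishes as p approaches 0 or 1, so positions the current weights get badly wrong contribute almost nothing to the update, whereas the ML gradient stays proportional to the error (p - r).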
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: tuning via maximizing likelihood

Post by AlvaroBegue »

Daniel Shawul wrote:
How do you handle draws? Or does your evaluation function return W/D/L probabilities?
Draws should be fine, I think. Unless we have a score of -inf or +inf, the logistic function should return values between 0 and 1.
The log-likelihood function you posted is not bounded. If my evaluation function predicts the probability of the result is 0, it gets an infinite penalty.

I don't know why you think convergence would be better than with mean squared error. I don't really know whether it would be, but I am curious if you have a reason to believe that a priori.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: tuning via maximizing likelihood

Post by Daniel Shawul »

AlvaroBegue wrote:
Daniel Shawul wrote:
How do you handle draws? Or does your evaluation function return W/D/L probabilities?
Draws should be fine, I think. Unless we have a score of -inf or +inf, the logistic function should return values between 0 and 1.
The log-likelihood function you posted is not bounded. If my evaluation function predicts the probability of the result is 0, it gets an infinite penalty.
The evaluation score of my engine is between (-20000, 20000), and for a draw it would be score = 0, i.e. logistic(score) = 0.5. So draws are no problem as far as I can see.
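To spell that out, assuming a draw is encoded as r = 0.5 in the formula above, the per-position term is

Code: Select all

   -like = -( 0.5 * log(p) + 0.5 * log(1 - p) )      with p = logistic(score)

This is minimized at p = 0.5, i.e. score = 0, where it equals log(2), and it stays finite for every finite score since 0 < p < 1.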
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: tuning via maximizing likelihood

Post by AlvaroBegue »

Daniel Shawul wrote:
AlvaroBegue wrote:
Daniel Shawul wrote:
How do you handle draws? Or does your evaluation function return W/D/L probabilities?
Draws should be fine, I think. Unless we have a score of -inf or +inf, the logistic function should return values between 0 and 1.
The log-likelihood function you posted is not bounded. If my evaluation function predicts the probability of the result is 0, it gets an infinite penalty.
The evaluation score of my engine is between (-20000, 20000), and for a draw it would be score = 0, i.e. logistic(score) = 0.5. So draws are no problem as far as I can see.
I raised two separate objections. One of them is that, although using log-likelihood is somewhat theoretically motivated, it is not at all clear how draws should be handled.

The other [more serious] objection is that the penalty imposed for getting one single sample wrong in the training set is unbounded in the case of the log-likelihood formula, while it is bounded if you use mean squared error.
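A small numeric illustration of the second point (the numbers are made up): suppose a position that was actually won (r = 1) gets a hugely negative score, so p = logistic(score) is tiny.

Code: Select all

   p = 1e-6
   -like = -log(p)        is about 13.8 and grows without bound as p -> 0
   se    = (1 - p) ** 2   is about 1.0  and can never exceed 1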
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: tuning via maximizing likelihood

Post by Daniel Shawul »

I don't really get what you are missing here -- again, draws are not a problem as far as I can see.

Maximum-likelihood estimation can even be used to train neural networks (multiple layers), not just the single-layer evaluation function we are talking about here. Here is an example paper (http://pubmedcentralcanada.ca/pmcc/arti ... 4-0306.pdf ) where backpropagation with a maximum-likelihood objective (ML-BP) is shown to be better than with a least-squares objective (LS-BP).

Daniel
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: tuning via maximizing likelihood

Post by jdart »

My tuner actually has an option to do this, and in addition can do Ordinal Logistic Regression. See Objective enum in:

https://github.com/jdart1/arasan-chess/ ... /tuner.cpp

I have done very limited experimentation but generally have not found these options better than mean-squared error.
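For readers unfamiliar with the term, here is a generic sketch of an ordinal-logistic (cumulative-logit) objective for W/D/L outcomes. This is only an illustration of the idea, not code taken from tuner.cpp; the threshold values a < b and the assumption that the score is already on the logit scale are mine.

Code: Select all

import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def wdl_probs(score, a, b):
    # cumulative-logit (ordinal) model with thresholds a < b;
    # 'score' assumed to already be on the logit scale
    p_loss = logistic(a - score)           # P(result <= loss)
    p_draw = logistic(b - score) - p_loss  # P(result <= draw) - P(result <= loss)
    p_win  = 1.0 - logistic(b - score)
    return p_win, p_draw, p_loss

def avg_neg_log_likelihood(data, a=-0.5, b=0.5):
    # data: list of (score, result) pairs with result in {'w', 'd', 'l'}
    total = 0.0
    for score, r in data:
        p_win, p_draw, p_loss = wdl_probs(score, a, b)
        p = {'w': p_win, 'd': p_draw, 'l': p_loss}[r]
        total -= math.log(p)
    return total / len(data)

print(avg_neg_log_likelihood([(1.2, 'w'), (-0.3, 'd'), (-2.0, 'l')]))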

--Jon
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: tuning via maximizing likelihood

Post by AlvaroBegue »

Daniel Shawul wrote:
I don't really get what you are missing here -- again, draws are not a problem as far as I can see.
Log-likelihood is a very natural quantity to maximize if you have a probability model. So if we had some procedure that produced a probability for winning, a probability for drawing and a probability for losing, it would make sense to penalize by the -log of the probability of the outcome that really happened. But the particular penalty you are using for a draw is not well motivated. Nothing will blow up, but what you are doing is not exactly maximizing likelihood.
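As one concrete example of such a procedure (a sketch only; the Davidson-style draw parameter nu and the 10^(s/400) scaling are assumptions, not something anyone in this thread is using): given a single score s, set g = 10^(s/400) and

Code: Select all

   denom   = g + 1 + nu * sqrt(g)        (nu > 0)
   p_win   = g / denom
   p_draw  = nu * sqrt(g) / denom
   p_loss  = 1 / denom
   penalty = -log( probability of the outcome that actually happened )

The three probabilities sum to 1, so the -log penalty is well defined for wins, draws and losses alike.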
Maximum-likelihood estimation can even be used to train neural networks (multiple layers), not just the single-layer evaluation function we are talking about here. Here is an example paper (http://pubmedcentralcanada.ca/pmcc/arti ... 4-0306.pdf ) where backpropagation with a maximum-likelihood objective (ML-BP) is shown to be better than with a least-squares objective (LS-BP).
Yes, it's a very common thing to optimize if what you are maximizing over are probability models. But, as I said, that's not exactly true here.

Oh, and I wouldn't go around quoting neural-network papers from 1992. :)
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: tuning via maximizing likelihood

Post by Daniel Shawul »

Good to know! So far I have had better results with the ML objective function -- even though both barely improved my engine. You seem to use a 1 draw = 2 wins + 2 losses approach unless I am mistaken; is that intentional? I am only aware of Elo models that use 1 draw = 1 win + 1 loss (Rao-Kupper) and 2 draws = 1 win + 1 loss (Davidson).
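For reference, a sketch of the Rao-Kupper W/D/L probabilities in the same style, with g = 10^(s/400) and a draw parameter theta >= 1 (both just assumptions for illustration, not what any of our tuners actually implement):

Code: Select all

   p_win  = g / (g + theta)
   p_loss = 1 / (1 + theta * g)
   p_draw = g * (theta**2 - 1) / ( (g + theta) * (1 + theta * g) )

These sum to 1, and theta = 1 recovers the plain draw-free logistic model.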

Daniel