Why does the EM algorithm work?
Update: it has been suggested that this question is a duplicate of "Why should one use EM vs. …". It is indeed highly related, but the answer there says that EM is better without explaining why. It also does not address the need to use the logarithm; it simply states an inference principle.

This is a mixture of two Gaussian distributions with unknown means. The likelihood surface is smooth, but from an optimisation perspective it is not regular enough: it offers saddle points, plateaus, and multiple modes. This means that an off-the-shelf method such as gradient ascent or the Newton-Raphson algorithm cannot be relied upon to find the global maximum without a sufficiently fine partition of the parameter space.
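To make the multimodality concrete, here is a small sketch (my own illustration, not from the original post) that evaluates the log-likelihood of a 50/50 two-Gaussian mixture with unknown means mu1 and mu2 on a grid; the equal weights and unit variances are simplifying assumptions. The surface has (at least) two symmetric modes, obtained by swapping the component labels.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
# Simulated data: equal-weight mixture of N(-2, 1) and N(2, 1).
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(2, 1, 200)])

def loglik(mu1, mu2, x):
    # Log-likelihood of a 50/50 two-Gaussian mixture with unit variances.
    dens = 0.5 * norm.pdf(x, mu1, 1) + 0.5 * norm.pdf(x, mu2, 1)
    return np.log(dens).sum()

grid = np.linspace(-4, 4, 81)
surface = np.array([[loglik(m1, m2, x) for m2 in grid] for m1 in grid])

# The maximum appears twice: near (-2, 2) and, by label swapping, near (2, -2).
i, j = np.unravel_index(surface.argmax(), surface.shape)
print(f"one mode near mu1={grid[i]:.1f}, mu2={grid[j]:.1f}")
```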

Optimising this non-convex function is a difficult problem that is not easily tackled as a purely mathematical maximisation exercise. As for the question about the log, i.e. why one maximises the log-likelihood rather than the likelihood itself: the logarithm turns the product over observations into a sum, which is analytically and numerically much easier to handle, and it does not change the location of the maximum.

In statistical terminology, an SF (stochastic frontier) model is a linear mixed-effects model, which makes it natural to consider the EM algorithm for maximum likelihood (ML) estimation.

The regression part of SF models represents the production frontier of the i-th firm: the response y is (possibly some transformation of) measured output, and x is a vector of (possibly transformed) inputs.
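For concreteness, the standard SF specification (stated here as an assumption, since the displayed equations were lost from this copy) writes the frontier as a regression with a composed error:

\[
y_i = x_i^\top \beta + v_i - u_i, \qquad v_i \sim N(0,\sigma_v^2), \qquad u_i \ge 0,
\]

where v_i is symmetric noise and u_i is the one-sided inefficiency term; the half normal, exponential, truncated normal, and gamma models correspond to different choices for the distribution of u_i.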

Strictly speaking, the ML problem of SF analysis is a constrained optimization problem, since some parameters are restricted to the positive axis.

With luck and good starting values this may have no practical effect in a given application, but the usual practice with Newton-like methods is to reparameterize the model and then retrieve standard errors by means of the delta method.
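As an illustration (not from the paper), here is a minimal sketch of that practice for a single positive scale parameter sigma: we optimize over eta = log(sigma), which is unconstrained, and then map the standard error of eta back to sigma with the delta method, se(sigma) ≈ sigma · se(eta). The normal log-likelihood used here is a stand-in for the actual SF likelihood.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
y = rng.normal(0.0, 2.0, 500)  # toy data; the true scale is 2

def negloglik(eta, y):
    # Unconstrained parameterization: sigma = exp(eta) > 0 automatically.
    sigma = np.exp(eta[0])
    return 0.5 * np.sum(np.log(2 * np.pi * sigma**2) + (y / sigma) ** 2)

res = minimize(negloglik, x0=[0.0], args=(y,), method="BFGS")
sigma_hat = np.exp(res.x[0])

# BFGS returns an approximate inverse Hessian; its diagonal estimates var(eta_hat).
se_eta = np.sqrt(res.hess_inv[0, 0])
se_sigma = sigma_hat * se_eta  # delta method: d sigma / d eta = sigma
print(f"sigma_hat = {sigma_hat:.3f}, se = {se_sigma:.3f}")
```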

EM updates keep the iterates for the positive parameters positive, and in our experience with SF estimation EM works with more liberal starting values.

The use of EM in SF estimation is not new. An early scheme appears in "Maximum likelihood estimation of stochastic frontier production models" (Journal of Econometrics, 18(2)) and in Lee's "On maximum likelihood estimation of stochastic frontier production models" (Journal of Econometrics, 23(2)). The new scheme actually solved the likelihood equations and preserved the positivity of the variance estimates, but Lee did not mention that his scheme was in fact the EM algorithm for the half normal SF model. A related scheme, due to Huang, appeared in the Southern Economic Journal, 50(3).

In Section 3 we develop EM for the half normal model; in later sections we develop the EM algorithm for the other three standard models. EM does not directly yield standard errors of the estimates: Lee was not explicit about their calculation, and Huang used least-squares errors for the regression coefficients.

We will use more general methods (Section 3). Two illustrations are given in Section 4: estimation of a production frontier with the half normal model using data from the Brazilian Census of Agriculture, and estimation of a cost function with exponential and gamma inefficiencies. Concluding remarks and some practical considerations are given in Section 5.

The resulting joint density of the observed response and the unobserved inefficiency is the product of the two component densities. In order to maximize the likelihood, instead of directly using the observed-data likelihood (2), the EM algorithm uses the complete-data likelihood (Dempster et al., 1977).
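As a concrete instance (an assumption on my part, since the displayed equations were lost): under the half normal model, with v_i ~ N(0, σ_v²) independent of the half normal u_i, the joint density of (y_i, u_i) and the complete-data log-likelihood are

\[
f(y_i, u_i) = \frac{2}{2\pi\,\sigma_v\,\sigma_u}\,
\exp\!\left(-\frac{(y_i - x_i^\top\beta + u_i)^2}{2\sigma_v^2}
            -\frac{u_i^2}{2\sigma_u^2}\right), \qquad u_i \ge 0,
\]

\[
\ell_c(\beta,\sigma_v^2,\sigma_u^2)
= \mathrm{const} - n\log\sigma_v - n\log\sigma_u
- \sum_{i=1}^n \frac{(y_i - x_i^\top\beta + u_i)^2}{2\sigma_v^2}
- \sum_{i=1}^n \frac{u_i^2}{2\sigma_u^2}.
\]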

Then, the E-step is the evaluation of the Q function, i.e. the conditional expectation of the complete-data log-likelihood given the observed data and the current parameter values, Q(θ; θ_t) = E[ℓ_c(θ) | y, θ_t]. The above definitions will be used in the expressions for the Q function and its derivatives in the following sections. We close this section by noting that the integrals in the E-step can be carried out explicitly in every case except the gamma model (Section 3).
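For the half normal model, the explicit E-step follows from a standard result (assumed here; it matches the well-known Jondrow et al. decomposition): conditionally on y_i, the inefficiency u_i is a normal random variable truncated to the positive axis,

\[
u_i \mid y_i \;\sim\; N^+(\mu_i^*,\,\sigma_*^2), \qquad
\mu_i^* = -\,\varepsilon_i\,\frac{\sigma_u^2}{\sigma_u^2+\sigma_v^2}, \qquad
\sigma_*^2 = \frac{\sigma_u^2\sigma_v^2}{\sigma_u^2+\sigma_v^2}, \qquad
\varepsilon_i = y_i - x_i^\top\beta,
\]

so that, writing \( \lambda(z) = \phi(z)/\Phi(z) \) for the ratio of the standard normal density to its distribution function,

\[
\mathrm{E}[u_i \mid y_i] = \mu_i^* + \sigma_*\,\lambda(\mu_i^*/\sigma_*), \qquad
\mathrm{E}[u_i^2 \mid y_i] = \mu_i^{*2} + \sigma_*^2 + \mu_i^*\,\sigma_*\,\lambda(\mu_i^*/\sigma_*).
\]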

The original models used in SF analysis were the half normal and exponential models, the former being the default option in most software for SF analysis. EM is developed for these two models in Section 3. The econometric literature on SF models has considered generalizations of those two models that allow a nonzero mode for u, leading to the truncated normal model (Section 3).

Another direction of generalization is to make some or all of the parameters of the noise distribution depend on covariates. In any case, the resulting EM algorithm will not have explicit updates the way the simple half normal and exponential specifications do. For the half normal model, the Q function, after some simplifications using expressions (6)–(9), becomes (constant term omitted) an expression that can be maximized in closed form, as sketched below.
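Under the half normal specification above, maximizing the Q function gives closed-form updates (a reconstruction in my assumed notation, writing ũ_i = E[u_i | y_i] and ũ²_i = E[u_i² | y_i] for the E-step moments, computed at the current parameter values):

\[
\beta^{(t+1)} = (X^\top X)^{-1}X^\top\big(y + \tilde u\big), \qquad
\sigma_u^{2\,(t+1)} = \frac{1}{n}\sum_{i=1}^n \widetilde{u_i^2},
\]

\[
\sigma_v^{2\,(t+1)} = \frac{1}{n}\sum_{i=1}^n
\Big[(y_i - x_i^\top\beta^{(t+1)})^2
+ 2\,(y_i - x_i^\top\beta^{(t+1)})\,\tilde u_i + \widetilde{u_i^2}\Big].
\]

The β update is simply least squares of the "completed" response y + ũ on X, which is what makes the scheme so transparent.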

We finally obtain the iterative scheme in Algorithm A below, with w as given earlier. Some remarks are due. Note that Algorithm A has remarkably simple and intuitive updates: the E-step fills in the unobserved inefficiencies by their conditional expectations, and the M-step reduces to least squares and sample moments.
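A minimal runnable sketch of such a scheme, assuming the half normal model and the moment formulas above (my reconstruction, not the paper's verbatim Algorithm A):

```python
import numpy as np
from scipy.stats import norm

def em_halfnormal_sf(y, X, n_iter=200):
    """EM for the half normal stochastic frontier y = X @ beta + v - u."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS starting values
    s2v = s2u = np.var(y - X @ beta) / 2         # split the residual variance
    for _ in range(n_iter):
        # E-step: moments of u_i | y_i (normal truncated to u >= 0).
        eps = y - X @ beta
        s2 = s2u + s2v
        mu_star = -eps * s2u / s2
        sig_star = np.sqrt(s2u * s2v / s2)
        lam = norm.pdf(mu_star / sig_star) / norm.cdf(mu_star / sig_star)
        u1 = mu_star + sig_star * lam                              # E[u | y]
        u2 = mu_star**2 + sig_star**2 + mu_star * sig_star * lam   # E[u^2 | y]
        # M-step: closed-form updates (least squares and sample moments).
        beta = np.linalg.lstsq(X, y + u1, rcond=None)[0]
        eps = y - X @ beta
        s2v = np.mean(eps**2 + 2 * eps * u1 + u2)
        s2u = np.mean(u2)
    return beta, s2v, s2u

# Toy usage with simulated data.
rng = np.random.default_rng(2)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
u = np.abs(rng.normal(0, 1.0, n))
y = X @ [1.0, 0.5] + rng.normal(0, 0.5, n) - u
print(em_halfnormal_sf(y, X))
```

Note that the σ_u² and σ_v² iterates stay positive by construction, which is the positivity property emphasized above.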

We now extend the EM algorithm to the other models, starting with the exponential SF model. The notation developed in Section 2 and used for the half normal case carries over, and remarks similar to those for the half normal model can be made.

In real-world applications of machine learning, it is very common that many relevant features are available for learning, but only a subset of them is observable. For a variable that is sometimes observed and sometimes not, we can use the instances in which it is observed for learning, and then predict its value in the instances in which it is not observable.

The Expectation-Maximization algorithm, on the other hand, can also be used for latent variables (variables that are never directly observed and are instead inferred from the values of other observed variables) in order to predict their values, on the condition that the general form of the probability distribution governing those latent variables is known to us. This algorithm is at the base of many unsupervised clustering algorithms in machine learning.

It was explained, proposed, and given its name in a 1977 paper by Arthur Dempster, Nan Laird, and Donald Rubin. It is used to find local maximum-likelihood estimates of the parameters of a statistical model in cases where latent variables are involved and the data is missing or incomplete.

Algorithm:

1. Given a set of incomplete data, start with a set of initial parameter values.
2. Expectation step (E-step): using the observed, available data and the current parameter estimates, estimate (guess) the values of the missing data.
3. Maximization step (M-step): using the data completed in the E-step, update the parameter estimates.
4. Repeat steps 2 and 3 until the parameter estimates converge.
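To make the steps concrete, here is a small self-contained sketch (my own illustration, not from the original article) applying them to the two-Gaussian-mixture setting from the beginning of this page, treating the unknown component labels as the missing data:

```python
import numpy as np
from scipy.stats import norm

def em_two_gaussians(x, n_iter=100):
    """EM for a two-component Gaussian mixture; labels are the latent data."""
    # Step 1: initial guesses for weights, means, and standard deviations.
    w, mu, sd = np.array([0.5, 0.5]), np.array([x.min(), x.max()]), np.array([1.0, 1.0])
    for _ in range(n_iter):
        # Step 2 (E-step): posterior probability that each point belongs to
        # each component under the current parameters ("responsibilities").
        dens = w * norm.pdf(x[:, None], mu, sd)      # shape (n, 2)
        r = dens / dens.sum(axis=1, keepdims=True)
        # Step 3 (M-step): re-estimate parameters from the weighted data.
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    # Step 4 is the loop itself; a production version would test convergence.
    return w, mu, sd

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(2, 1, 300)])
print(em_two_gaussians(x))
```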


