Maximum Likelihood Estimation (MLE) is the process of estimating the parameters of a model from sample data by finding the parameter values that maximise the likelihood function. Consider a sample of independent and identically distributed random variables \( X = (X_1, X_2, \ldots , X_n) \); by independence, their joint density function factorises as:

$$f(X_1, X_2, \ldots , X_n) = f(X_1) \cdot f(X_2) \cdots f(X_n).$$

The likelihood function is defined as the joint density of the observed data \( (x_1, x_2, \ldots , x_n) \), viewed as a function of the parameter \( \theta \):

$$L(\theta; x_1, x_2, \ldots , x_n) = f(x_1, x_2, \ldots , x_n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta).$$
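As a concrete illustration, the sketch below evaluates this product for a hypothetical exponential model \( f(x \mid \theta) = \theta e^{-\theta x} \); the model choice, the function names, and the data are assumptions made for the example, not part of the definition above.

```python
import numpy as np

def exponential_density(x, theta):
    """Density f(x | theta) = theta * exp(-theta * x) of an assumed exponential model."""
    return theta * np.exp(-theta * x)

def likelihood(theta, data):
    """Likelihood L(theta; x_1, ..., x_n) as the product of the individual densities."""
    return np.prod(exponential_density(data, theta))

# A small hypothetical sample
data = np.array([0.8, 1.3, 0.4, 2.1, 0.9])
print(likelihood(1.0, data))   # L(theta = 1.0)
print(likelihood(0.5, data))   # L(theta = 0.5)
```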

The maximum likelihood estimator is then given by:

$$ \hat{\theta} = \arg\max_{\theta} L(\theta; x_1, x_2, \ldots , x_n). $$
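A naive way to approximate this arg max is to evaluate the likelihood on a grid of candidate parameter values and pick the largest one. The sketch below does this for the same assumed exponential model and hypothetical data as above; the grid range and resolution are arbitrary choices for the example.

```python
import numpy as np

def likelihood(theta, data):
    """Likelihood of the assumed exponential model, as in the sketch above."""
    return np.prod(theta * np.exp(-theta * data))

data = np.array([0.8, 1.3, 0.4, 2.1, 0.9])

# Naive grid search for the arg max of L(theta) over a range of candidate values.
thetas = np.linspace(0.01, 5.0, 1000)
theta_hat = thetas[np.argmax([likelihood(t, data) for t in thetas])]
print("grid-search MLE:", theta_hat)   # close to 1 / mean(data), about 0.909
```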

Finding the maximum of a product of terms is tedious in practice, so we instead work with the logarithm of \( L \), which turns the product into a sum:

$$\log L(\theta; x_1, x_2, \ldots , x_n) = \sum_{i=1}^{n} \log f(x_i \mid \theta).$$
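For the hypothetical exponential model used in the sketches above, \( f(x \mid \theta) = \theta e^{-\theta x} \), the log-likelihood takes a particularly simple form:

$$\log L(\theta; x_1, \ldots , x_n) = \sum_{i=1}^{n} \log\!\left(\theta e^{-\theta x_i}\right) = n \log \theta - \theta \sum_{i=1}^{n} x_i.$$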

Since the logarithm is a strictly increasing function, maximising \( L \) is equivalent to maximising \( \log L \), which is a much simpler problem.
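Continuing the exponential example, setting \( \frac{d}{d\theta}\log L = n/\theta - \sum_{i} x_i = 0 \) gives the closed-form estimator \( \hat{\theta} = n / \sum_{i} x_i = 1/\bar{x} \). The sketch below checks this by maximising the log-likelihood numerically (equivalently, minimising its negative); the sample and the optimiser bounds are arbitrary choices for the example.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def negative_log_likelihood(theta, data):
    """Negative log-likelihood of the assumed exponential model."""
    return -(len(data) * np.log(theta) - theta * np.sum(data))

data = np.array([0.8, 1.3, 0.4, 2.1, 0.9])

# Numerical maximisation of log L (minimisation of -log L) over a bounded interval.
result = minimize_scalar(negative_log_likelihood, args=(data,),
                         bounds=(1e-6, 100.0), method="bounded")

print("numerical MLE:", result.x)                   # about 0.909
print("closed form 1/mean:", 1.0 / np.mean(data))   # n / sum(x_i)
```

The two printed values should agree to optimiser precision, which is a convenient sanity check that the numerical maximisation of \( \log L \) recovers the analytic maximum likelihood estimate.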