Bayesian machine learning uses Bayes' theorem to estimate the posterior distribution of a model given the observed data.

### Parameter Estimation

Parameter estimation is a common application of Bayesian machine learning. Given a model with unknown parameters \theta , we use Bayes' theorem to estimate the probability distribution p(\theta|x) of these parameters given the observed data x.
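Written out, Bayes' theorem gives the posterior as the likelihood times the prior, normalized by the marginal likelihood (the integral over all parameter values):

```latex
p(\theta \mid x)
  = \frac{p(x \mid \theta)\, p(\theta)}{p(x)}
  = \frac{p(x \mid \theta)\, p(\theta)}{\int p(x \mid \theta')\, p(\theta')\, \mathrm{d}\theta'}
```

The denominator p(x) is what often makes exact posterior computation hard, which motivates the techniques below.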

To make the posterior easy to calculate, we often choose the prior to be a **conjugate prior** of the likelihood, so that the posterior belongs to the same family of distributions as the prior.
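As a minimal sketch of conjugacy, consider the classic Beta-Bernoulli pair: a Beta(a, b) prior on a coin's bias combined with Bernoulli observations yields a Beta posterior, and the update is just pseudo-count addition. (The prior parameters and flip counts below are made up for illustration.)

```python
# Beta prior is conjugate to the Bernoulli likelihood:
# Beta(a, b) prior + (heads, tails) observations -> Beta(a + heads, b + tails).

def beta_bernoulli_update(a, b, heads, tails):
    """Return the posterior Beta parameters after observing coin flips."""
    return a + heads, b + tails

# Start from a uniform prior Beta(1, 1) and observe 7 heads, 3 tails.
a_post, b_post = beta_bernoulli_update(1.0, 1.0, heads=7, tails=3)
posterior_mean = a_post / (a_post + b_post)  # 8 / 12, about 0.667
```

Because the posterior is again a Beta distribution, subsequent data can be absorbed by repeating the same update, with no integration required.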

CMU lecture video:

https://scs.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=49482c13-0a60-4b02-8220-cf70f20ecf3a

### Maximum a Posteriori (MAP) Estimation

Calculating the posterior distribution can involve complex integrals that are not analytically tractable. MAP estimation is a method that sidesteps this calculation. Instead of computing the whole posterior distribution, we estimate \theta as the point with maximum posterior probability, which is equivalent to finding the mode of the distribution. The problem then becomes an optimization problem: find the value of \theta that maximizes the posterior probability of the model.
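A minimal sketch of MAP for the coin-flip setting above: since the posterior is Beta(a + heads, b + tails), its mode has the closed form (a + heads - 1) / (a + b + heads + tails - 2), and a generic optimizer over the log-posterior should recover the same point. The prior parameters and data are made up for illustration; a grid search stands in for a real optimizer.

```python
import math

a, b = 2.0, 2.0       # prior pseudo-counts (assumed for illustration)
heads, tails = 7, 3   # observed flips (assumed for illustration)

def log_posterior(theta):
    """Unnormalized log posterior: log-likelihood plus log-prior.
    The normalizing constant p(x) does not depend on theta, so it
    can be dropped from the optimization."""
    return ((heads + a - 1) * math.log(theta)
            + (tails + b - 1) * math.log(1 - theta))

# Maximize over a fine grid on (0, 1).
grid = [i / 10000 for i in range(1, 10000)]
theta_map = max(grid, key=log_posterior)

# Closed-form mode of Beta(a + heads, b + tails).
closed_form = (a + heads - 1) / (a + b + heads + tails - 2)
```

Note that because the normalizer p(x) is constant in \theta , MAP never needs the intractable integral; this is exactly what makes it cheaper than computing the full posterior.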

Although MAP is easier to implement, it also has some problems.

Problem 1: the mode can be an atypical point of the distribution, one that carries very little of the total probability mass. The following figure is an example of this.
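A concrete instance of this problem, with made-up numbers: mix a very narrow spike at 0 carrying only 1% of the mass with a broad Gaussian at 5 carrying the other 99%. The density peaks on the spike, so the MAP point is 0, even though almost all draws land near 5.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def density(x):
    # 1% of the mass in a spike of width 0.001 at 0; 99% in a broad bump at 5.
    return 0.01 * normal_pdf(x, 0.0, 0.001) + 0.99 * normal_pdf(x, 5.0, 1.0)

# The global mode sits on the spike, far from the bulk of the mass.
grid = [i / 1000 for i in range(-2000, 10000)]
mode = max(grid, key=density)

# Riemann-sum estimate of the probability mass within 2 units of 5.
mass_near_5 = sum(density(x) * 0.001 for x in grid if 3.0 <= x <= 7.0)
```

Here the posterior mean (near 4.95) would summarize the distribution far better than the mode at 0.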

Problem 2: the MAP estimate is not invariant to reparameterization. That is, given a transformation y=f(x), the mode of the distribution of y does not necessarily equal f(x_{mode}), because the change of variables introduces a Jacobian factor into the density.
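A standard illustration of this non-invariance uses x ~ Normal(mu, sigma^2) and y = exp(x). The mode of x is mu, so f(x_{mode}) = exp(mu); but y is lognormal, and the 1/y Jacobian shifts its density peak to exp(mu - sigma^2), a different point whenever sigma > 0. A sketch with mu = 0, sigma = 1:

```python
import math

mu, sigma = 0.0, 1.0

def lognormal_pdf(y):
    """Density of y = exp(x) for x ~ Normal(mu, sigma^2); note the 1/y Jacobian."""
    return (math.exp(-0.5 * ((math.log(y) - mu) / sigma) ** 2)
            / (y * sigma * math.sqrt(2 * math.pi)))

# Locate the mode of y numerically on a fine grid.
grid = [i / 10000 for i in range(1, 100000)]
mode_of_y = max(grid, key=lognormal_pdf)

f_of_x_mode = math.exp(mu)            # = 1.0, the transformed mode of x
true_mode = math.exp(mu - sigma**2)   # about 0.368, the actual mode of y
```

So transforming the MAP estimate of x gives 1.0, while the MAP estimate computed directly in y-space is about 0.368; the two disagree by a wide margin.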

Reference:

https://metacademy.org/roadmaps/rgrosse/bayesian_machine_learning