## Bayesian Machine Learning

Bayesian machine learning uses Bayes' theorem to estimate the posterior distribution of a model given the observed data.

### Parameter Estimation

Parameter estimation is a common application of Bayesian machine learning. Given a model with unknown parameters $\theta$, we use Bayes' theorem to estimate the posterior distribution $p(\theta \mid x)$ of these parameters given the observed data $x$.
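For reference, Bayes' theorem applied to the parameters can be written as follows. The names "likelihood", "prior" and "evidence" for the three terms are standard terminology rather than something stated in these notes.

$$
p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)}, \qquad p(x) = \int p(x \mid \theta)\, p(\theta)\, d\theta
$$

Here $p(x \mid \theta)$ is the likelihood, $p(\theta)$ is the prior, and $p(x)$ is the evidence, which normalizes the posterior.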

To make the posterior distribution easy to calculate, we often choose a conjugate prior: a prior for which the posterior belongs to the same family of distributions as the prior.
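A minimal sketch of a conjugate update, assuming a Bernoulli likelihood with a Beta prior (a standard conjugate pair chosen here for illustration; it is not taken from the lecture). The function names and the data are hypothetical.

```python
def beta_bernoulli_update(alpha, beta, observations):
    """Return the posterior Beta parameters after observing 0/1 outcomes.

    With a Beta(alpha, beta) prior and a Bernoulli likelihood, the posterior
    is Beta(alpha + successes, beta + failures), so no integral is needed.
    """
    successes = sum(observations)
    failures = len(observations) - successes
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical prior and data, for illustration only.
prior = (2.0, 2.0)
data = [1, 0, 1, 1, 0, 1, 1, 1]  # 6 successes, 2 failures

posterior = beta_bernoulli_update(*prior, data)
print("posterior parameters:", posterior)
print("posterior mean:", beta_mean(*posterior))
```

Because the posterior stays in the Beta family, the update reduces to adding counts to the prior parameters.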

CMU lecture video:

https://scs.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=49482c13-0a60-4b02-8220-cf70f20ecf3a

### Maximum a Posteriori (MAP) Estimation

Calculating the posterior distribution can involve complex integrals that are not analytically tractable. MAP estimation simplifies this calculation. Instead of computing the whole posterior distribution, we estimate $\theta$ as the point with maximum posterior probability, which is the mode of the posterior distribution. The problem then becomes an optimization problem: find the value of $\theta$ that maximizes the posterior probability.
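The optimization view can be sketched as follows, assuming a Bernoulli likelihood with a Beta prior (an illustrative choice, not part of the original notes). The sketch maximizes the unnormalized log posterior on a grid and compares the result with the known closed-form mode of a Beta distribution. All function names and numbers are hypothetical.

```python
import math


def log_posterior(theta, successes, failures, alpha, beta):
    """Unnormalized log posterior of a Bernoulli parameter under a Beta prior.

    The evidence p(x) is a constant in theta, so it can be dropped when
    searching for the maximum.
    """
    return ((alpha + successes - 1) * math.log(theta)
            + (beta + failures - 1) * math.log(1 - theta))


def map_grid(successes, failures, alpha, beta, steps=10000):
    """Approximate the MAP estimate by brute-force search over (0, 1)."""
    grid = [(i + 0.5) / steps for i in range(steps)]
    return max(grid, key=lambda t: log_posterior(t, successes, failures, alpha, beta))


def map_closed_form(successes, failures, alpha, beta):
    """Mode of the Beta posterior; valid when both posterior parameters exceed 1."""
    a = alpha + successes
    b = beta + failures
    return (a - 1) / (a + b - 2)


# Hypothetical data: 6 successes, 2 failures, Beta(2, 2) prior.
theta_grid = map_grid(6, 2, 2.0, 2.0)
theta_exact = map_closed_form(6, 2, 2.0, 2.0)
print("grid search MAP:", theta_grid)
print("closed-form MAP:", theta_exact)
```

In realistic models the grid search would be replaced by a gradient-based optimizer, but the objective has the same form: log likelihood plus log prior.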

Although MAP is easier to implement, it also has some problems.

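Problem 1: the mode can be an atypical point of the distribution, far from where most of the probability mass lies. The following figure shows an example.

Problem 2: the MAP estimate is not invariant to reparameterization. That is, given a transformation $y = f(x)$, the mode of $y$ does not necessarily equal $f(x_{\text{mode}})$, because a change of variables multiplies the density by a Jacobian factor that can shift the location of the maximum.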
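A small numerical sketch of the reparameterization problem, assuming $x$ follows a standard normal distribution and $y = \exp(x)$ (an illustrative choice, not from the original notes). The density of $y$ follows from the change-of-variables formula, and its mode is located by grid search.

```python
import math

MU, SIGMA = 0.0, 1.0  # assumed parameters of the distribution of x


def normal_pdf(x):
    """Density of x ~ Normal(MU, SIGMA)."""
    z = (x - MU) / SIGMA
    return math.exp(-0.5 * z * z) / (SIGMA * math.sqrt(2 * math.pi))


def transformed_pdf(y):
    """Density of y = exp(x), via p_y(y) = p_x(ln y) * |d(ln y)/dy| = p_x(ln y) / y."""
    return normal_pdf(math.log(y)) / y


def argmax_on_grid(pdf, low, high, steps=100000):
    """Return the grid midpoint in (low, high) where pdf is largest."""
    grid = [low + (high - low) * (i + 0.5) / steps for i in range(steps)]
    return max(grid, key=pdf)


x_mode = argmax_on_grid(normal_pdf, -5.0, 5.0)
y_mode = argmax_on_grid(transformed_pdf, 0.0, 5.0)

print("mode of x:        ", x_mode)
print("f(mode of x):     ", math.exp(x_mode))
print("mode of y = f(x): ", y_mode)
```

The mode of $x$ is at 0, so $f(x_{\text{mode}})$ is about 1, while the mode of $y$ lies near $\exp(\mu - \sigma^2) = \exp(-1)$. The $1/y$ Jacobian factor is what moves the maximum.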


## Point Estimation

Point estimation aims to estimate the parameters of the underlying distribution from the observations.

Key Definitions:

Moment Estimator

Maximum Likelihood Estimator

Expectation-Maximization

Loss function
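As a small illustration of the first two terms, the sketch below compares a moment estimator and a maximum likelihood estimator for the upper bound of a Uniform(0, theta) distribution. The choice of distribution, the function names and the sample values are assumptions made for this example only.

```python
def moment_estimate_uniform(samples):
    """Moment estimator: the mean of Uniform(0, theta) is theta / 2,
    so matching the sample mean gives theta = 2 * mean."""
    return 2.0 * sum(samples) / len(samples)


def mle_uniform(samples):
    """Maximum likelihood estimator: the likelihood (1 / theta)^n is largest
    for the smallest theta that still covers every sample, i.e. the maximum."""
    return max(samples)


# Hypothetical observations, for illustration only.
samples = [0.8, 2.1, 3.9, 1.4, 2.8]

print("moment estimate:", moment_estimate_uniform(samples))
print("maximum likelihood estimate:", mle_uniform(samples))
```

The two estimators generally disagree on the same data, which is one reason a loss function is needed to compare them.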

## Envisage

verb (used with object), envisaged, envisaging.

1. To contemplate; visualize: "He envisages an era of great scientific discoveries."
2. (Archaic) To look in the face of; face.

Synonyms: confront, consider, image, picture, regard, visualize

For example, in GENI it is envisaged that a researcher will be allocated a slice of resources across the whole network, consisting of a portion of network links, packet processing elements (e.g. routers) and end-hosts; researchers program their slices to behave as they wish.