- "naive" conditional independence assumptions: each feature $$x_i$$ is conditionally independent of every other feature $$x_j$$ given the class.
- Based on the maximum a posteriori (MAP) decision rule, the Bayes classifier is:
$$ \widehat{y} = \underset{k \in \{1, \dots, K\}}{\operatorname{argmax}} \; p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k) $$
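The MAP rule above can be sketched directly: pick the class that maximizes the prior times the product of per-feature likelihoods. A minimal example with made-up numbers (the class names, priors, and likelihood values are illustrative, not from the notes); the product is computed in log space to avoid underflow when there are many features:

```python
import math

# Hypothetical two-class problem with priors p(C_k) and
# per-feature likelihoods p(x_i | C_k) for an observed x.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": [0.8, 0.3],  # p(x_1|spam), p(x_2|spam)
    "ham":  [0.1, 0.7],  # p(x_1|ham),  p(x_2|ham)
}

def map_class(priors, likelihoods):
    # argmax_k log p(C_k) + sum_i log p(x_i | C_k)
    scores = {
        k: math.log(prior) + sum(math.log(p) for p in likelihoods[k])
        for k, prior in priors.items()
    }
    return max(scores, key=scores.get)

print(map_class(priors, likelihoods))  # → spam (0.4*0.8*0.3 > 0.6*0.1*0.7)
```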
- Gaussian naive Bayes: the continuous values associated with each class are assumed to be distributed according to a Gaussian distribution
- training: calculate the mean $$\mu_k$$ and standard deviation $$\sigma_k$$ of the feature values in each class from the data
- prediction: plug the observed features $$v_i$$ into the Gaussian density for each class, calculate the probability of each class given those features, then pick the class with the highest probability
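The two steps above can be sketched end to end. This is a minimal Gaussian naive Bayes in plain Python (the data, class labels, and the small variance epsilon are assumptions for the sketch, not part of the notes):

```python
import math
from collections import defaultdict

def fit(X, y):
    """Training: estimate class priors and per-feature (mu_k, var_k)."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    params, n = {}, len(X)
    for k, rows in by_class.items():
        cols = list(zip(*rows))
        mu = [sum(c) / len(c) for c in cols]
        # add a tiny epsilon so the variance is never exactly zero
        var = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
               for c, m in zip(cols, mu)]
        params[k] = (len(rows) / n, mu, var)
    return params

def log_gaussian(v, mu, var):
    # log of the Gaussian density N(v; mu, var)
    return -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)

def predict(params, x):
    """Prediction: argmax_k log p(C_k) + sum_i log p(x_i | C_k)."""
    best, best_score = None, -math.inf
    for k, (prior, mu, var) in params.items():
        score = math.log(prior) + sum(
            log_gaussian(v, m, s) for v, m, s in zip(x, mu, var))
        if score > best_score:
            best, best_score = k, score
    return best

# Tiny synthetic dataset (hypothetical): one feature, two clusters.
X = [[1.0], [1.2], [0.9], [5.0], [5.2], [4.8]]
y = ["low", "low", "low", "high", "high", "high"]
params = fit(X, y)
print(predict(params, [1.1]))  # → low (nearest the "low" cluster)
```

Working in log space keeps the product of many small likelihoods from underflowing to zero, which is the standard trick in naive Bayes implementations.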