A Bernoulli Naive Bayes classifier. Naive Bayes classifiers assume that feature probabilities are independent. To classify an instance, the classifier calculates the probability that the instance was generated by each known class; that is, for each class C, it calculates the following probability:

p(C|F1, F2, F3... Fn)

where F1, F2... Fn are the features of the instance. Using Bayes' theorem, this can be written as:

\frac{p(C) \times p(F1, F2... Fn|C)}{p(F1, F2... Fn)}
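The denominator p(F1, F2... Fn) is the same for every class, so it does not change which class scores highest. A sketch of the resulting decision rule follows; the symbol \hat{C} for the predicted class is an assumption introduced here, not notation from the original text:

```latex
\hat{C} = \arg\max_{C} \; p(C) \times p(F_1, F_2, \ldots, F_n \mid C)
```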

In Bernoulli Naive Bayes, features are treated as independent booleans. This means that the likelihood of an instance given a class C is:

p(F1, F2... Fn|C) = \prod_{i=1}^{n}{[F_i \times p(f_i|C) + (1-F_i)(1 - p(f_i|C))]}


- F_i equals 1 if feature i is present in the instance's vector and 0 otherwise
- p(f_i|C) is the probability of class C generating feature i
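
The likelihood and the classification step can be sketched in a few lines of Python. This is a minimal illustration, not the implementation behind BernoulliNBClassifier: the function names, the class labels, and all probability values are hypothetical. It sums logarithms instead of multiplying probabilities, which is mathematically equivalent for ranking classes and avoids underflow with many features. It assumes every p(f_i|C) lies strictly between 0 and 1, so no smoothing is shown.

```python
import math

def bernoulli_log_likelihood(features, feature_probs):
    """Return log p(F1..Fn | C) for 0/1 features, assuming independence."""
    total = 0.0
    for present, p in zip(features, feature_probs):
        # One factor of the product: F_i * p(f_i|C) + (1 - F_i) * (1 - p(f_i|C))
        term = present * p + (1 - present) * (1.0 - p)
        total += math.log(term)
    return total

def classify(features, priors, feature_probs_by_class):
    """Pick the class that maximizes log p(C) + log p(F1..Fn | C)."""
    scores = {
        c: math.log(priors[c])
        + bernoulli_log_likelihood(features, feature_probs_by_class[c])
        for c in priors
    }
    return max(scores, key=scores.get)

# Hypothetical model: two classes, three boolean features.
priors = {"spam": 0.5, "ham": 0.5}
feature_probs_by_class = {
    "spam": [0.8, 0.1, 0.6],  # p(f_i | spam), made-up values
    "ham": [0.2, 0.7, 0.3],   # p(f_i | ham), made-up values
}
instance = [1, 0, 1]  # features 1 and 3 present, feature 2 absent

predicted = classify(instance, priors, feature_probs_by_class)
print(predicted)
```

With these made-up numbers, the unnormalized score for "spam" is 0.5 × 0.8 × 0.9 × 0.6 = 0.216 and for "ham" is 0.5 × 0.2 × 0.3 × 0.3 = 0.009, so the sketch selects "spam". Note that an absent feature still contributes a factor of 1 - p(f_i|C), which is what distinguishes the Bernoulli model from one that only counts the features that are present.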

For more information:

C.D. Manning, P. Raghavan and H. Schuetze (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 234-265.
