If I had a dollar for every time someone asked me this question, I would have enough money to buy Trevor Hastie's The Elements of Statistical Learning, Second Edition :). Anyway, here is a good explanation from Algorithms of the Intelligent Web on what is so naïve about naïve Bayes:

*"This is the calculation of the conditional probabilities p(Y|X). The term naïve has its origin in this method. Note that we’re seeking the probability of occurrence for a particular instance, given a particular concept. But each instance is uniquely determined by the unique values of its attributes. The conditional probability of the instance is, in essence, the joint probability of all the attribute value conditional probabilities. Each attribute value conditional probability is given by the term (aV.getCount()/conceptPriors.get(c)). In the preceding implementation, it’s assumed that all these attribute values are statistically independent, so the joint probability is simply the product of the individual probabilities for each attribute value. That’s the “naïve” part. In general, without the statistical independence of the attributes, the joint probability wouldn’t be equal to that product."*
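The counting scheme the quote describes can be sketched in a few lines of Python. This is a minimal illustration with invented toy data, not the book's actual implementation; the names `training`, `concept_priors`, and `attr_counts` are my own stand-ins for the book's `conceptPriors` and attribute-count structures.

```python
from collections import defaultdict

# Toy training data: (attribute dict, concept) pairs -- invented for illustration.
training = [
    ({"color": "red", "shape": "round"}, "apple"),
    ({"color": "red", "shape": "round"}, "apple"),
    ({"color": "yellow", "shape": "long"}, "banana"),
]

concept_priors = defaultdict(int)  # count of training instances per concept
attr_counts = defaultdict(int)     # count of each (attribute, value, concept) triple
for attrs, concept in training:
    concept_priors[concept] += 1
    for attr, value in attrs.items():
        attr_counts[(attr, value, concept)] += 1

def p_instance_given_concept(attrs, concept):
    # The "naïve" step: multiply the per-attribute conditional probabilities,
    # each estimated as count(attr=value, concept) / count(concept).
    # This product equals the true joint probability only if the
    # attributes are statistically independent given the concept.
    p = 1.0
    for attr, value in attrs.items():
        p *= attr_counts[(attr, value, concept)] / concept_priors[concept]
    return p

print(p_instance_given_concept({"color": "red", "shape": "round"}, "apple"))  # 1.0
```

Dropping the independence assumption would require estimating the full joint distribution over all attribute combinations, which needs far more data than counting each attribute on its own.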

And the interesting part is:

*"We
use quotes around the word naïve because it turns out that the naïve Bayes algorithm
is very robust and widely applicable, even in problems where the attribute independence
assumption is clearly violated. In fact, it can be shown that the naïve Bayes
algorithm is optimal in the exact opposite case—when there’s a completely deterministic
dependency among the attributes."*

BTW, Algorithms of the Intelligent Web is an excellent book by Haralambos Marmanis and Dmitry Babenko; recommended reading.

In simple terms, a naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even though these features depend on each other, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

Suppose your data consist of fruits, described by their color and shape. Bayesian classifiers operate by saying "If you see a fruit that is red and round, which type of fruit is it most likely to be, based on the observed data sample? In the future, classify red and round fruit as that type of fruit."
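That "most likely type" decision is just an argmax over posterior scores, prior × likelihood. Here is a toy sketch with a made-up observed sample; the fruit counts are invented purely to make the arithmetic visible.

```python
from collections import Counter

# Hypothetical observed sample: (color, shape, fruit type) triples.
samples = [
    ("red", "round", "apple"),
    ("red", "round", "apple"),
    ("green", "round", "apple"),
    ("red", "round", "cherry"),
    ("yellow", "long", "banana"),
    ("yellow", "long", "banana"),
]

label_counts = Counter(label for _, _, label in samples)

def likelihood(color, shape, label):
    # Naive assumption: P(color, shape | label) = P(color | label) * P(shape | label),
    # with each factor estimated by simple counting.
    n = label_counts[label]
    p_color = sum(1 for c, _, l in samples if l == label and c == color) / n
    p_shape = sum(1 for _, s, l in samples if l == label and s == shape) / n
    return p_color * p_shape

def classify(color, shape):
    # Pick the label maximizing prior * likelihood (proportional to the posterior).
    total = len(samples)
    return max(label_counts,
               key=lambda lab: (label_counts[lab] / total) * likelihood(color, shape, lab))

print(classify("red", "round"))  # apple
```

With these counts, "apple" wins for a red, round fruit because its higher prior outweighs the fact that every observed cherry was also red and round.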

Depending on the precise nature of the probability model, naive Bayes classifiers can be trained very efficiently in a supervised learning setting. In many practical applications, parameter estimation for naive Bayes models uses the method of maximum likelihood; in other words, one can work with the naive Bayes model without believing in Bayesian probability or using any Bayesian methods.
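The "no Bayesian methods needed" point is concrete: for the standard discrete naive Bayes model, the maximum-likelihood estimate of each parameter is just a relative frequency. A tiny sketch with invented observations:

```python
from collections import Counter

# Invented class observations -- for illustration only.
observations = ["apple", "apple", "apple", "banana"]

counts = Counter(observations)
n = len(observations)

# MLE of the class prior P(c) is simply count(c) / n: no prior distribution
# over parameters, no posterior inference -- plain frequency counting.
mle_priors = {c: counts[c] / n for c in counts}
print(mle_priors)  # {'apple': 0.75, 'banana': 0.25}
```

The per-attribute conditional probabilities are estimated the same way, as counts within each class, which is why training is a single fast pass over the data.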