If I had a dollar for every time someone asked me this question, I would have enough money to buy Trevor Hastie's The Elements of Statistical Learning, Second Edition :). Anyways, here is a good explanation from Algorithms of the Intelligent Web of what is so naïve about naïve Bayes:
"… is the calculation of the conditional probabilities p(X|Y). The term naïve has its origin in this method. Note that we’re seeking the probability of occurrence for an instance given a particular concept. But each instance is uniquely determined by the unique values of its attributes. The conditional probability of the instance is, in essence, the joint probability of all the attribute value conditional probabilities. Each attribute value conditional probability is given by the term (aV.getCount()/conceptPriors.get(c)). In the preceding implementation, it’s assumed that all these attribute values are statistically independent, so the joint probability is simply the product of the individual probabilities for each attribute value. That’s the “naïve” part. In general, without the statistical independence of the attributes, the joint probability wouldn’t be equal to that product."
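
To make the naïve step concrete, here is a minimal sketch in Java of that product of per-attribute conditional probabilities. The class name, method, and toy attribute strings are all invented for illustration; this is my paraphrase of the idea, not the book’s actual implementation.

import java.util.Map;

// Minimal sketch: under the independence assumption, the class-conditional
// probability of an instance is the product of per-attribute conditional
// probabilities, each computed as count(value, concept) / count(concept).
public class NaiveProduct {

    // p(x|c) ~= product over i of p(x_i|c); each factor mirrors the book's
    // aV.getCount() / conceptPriors.get(c) term. All names here are hypothetical.
    static double conditionalProbability(String[] attributeValues,
                                         Map<String, Integer> valueCountsForConcept,
                                         int conceptCount) {
        double joint = 1.0;
        for (String value : attributeValues) {
            int count = valueCountsForConcept.getOrDefault(value, 0);
            joint *= (double) count / conceptCount;
        }
        return joint;
    }

    public static void main(String[] args) {
        // Toy counts for a concept c observed 10 times in training (invented numbers).
        Map<String, Integer> counts = Map.of("outlook=sunny", 6, "wind=weak", 8);
        String[] instance = {"outlook=sunny", "wind=weak"};
        // Prints the naïve joint probability, 0.6 * 0.8 ≈ 0.48.
        System.out.println(conditionalProbability(instance, counts, 10));
    }
}

One practical note: a product of many small factors quickly underflows a double, which is why real implementations usually sum log-probabilities instead of multiplying raw ones.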
And the interesting part is that the authors "use quotes around the word naïve because it turns out that the naïve Bayes algorithm is very robust and widely applicable, even in problems where the attribute independence assumption is clearly violated. In fact, it can be shown that the naïve Bayes algorithm is optimal in the exact opposite case: when there’s a completely deterministic dependency among the attributes"
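
One way to build intuition for that last claim: duplicating an attribute is the most extreme deterministic dependency, and under the naïve product it simply squares each class’s likelihood. Squaring is monotonic, so with equal priors the winning class never changes, even though the probability estimates themselves become badly miscalibrated. A toy sketch with invented numbers, not a proof of the optimality result:

// Toy illustration (invented numbers): copying an attribute, a fully
// deterministic dependency, squares each class's naïve likelihood.
// Squaring preserves ordering, so with equal priors the argmax class
// does not change even though the probabilities themselves shrink.
public class DeterministicDependency {
    public static void main(String[] args) {
        double likelihoodA = 0.6 * 0.8;  // product of p(x_i|A) for two attributes
        double likelihoodB = 0.3 * 0.9;  // product of p(x_i|B)

        // Duplicate both attributes: each naïve product is squared.
        double duplicatedA = likelihoodA * likelihoodA;
        double duplicatedB = likelihoodB * likelihoodB;

        System.out.printf("original:   A=%.4f B=%.4f -> picks %s%n",
                likelihoodA, likelihoodB, likelihoodA > likelihoodB ? "A" : "B");
        System.out.printf("duplicated: A=%.4f B=%.4f -> picks %s%n",
                duplicatedA, duplicatedB, duplicatedA > duplicatedB ? "A" : "B");
    }
}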