Bayesian Filters

A common problem with traditional filters is that they are
a one-size-fits-all solution to SPAM. The rules are fixed
and change only when the anti-spam service issues an
update.

SPAM changes too quickly to make that method effective.
Additionally, what is SPAM to you may not be to someone else.
That is where Bayesian filters come in.

They are very effective at eliminating SPAM and have
very low false-positive rates for their users.

Bayesian filters are based on Bayesian probability, a branch
of statistics named for Thomas Bayes, an eighteenth-century
mathematician.

This type of reasoning supports decision making by
estimating the probability of a certain event based on the
history of past events.

Using this as a model seemed a logical step for SPAM
filtering. If you can predict what SPAM will look like now
based on what it has looked like in the past, you are halfway to
the solution.
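
As a rough illustration of that idea, the short Python sketch
below applies Bayes' theorem to a single word. The function name
and the percentages are made up for the example; a real filter
would measure them from your own mail.

```python
def spam_probability_given_word(p_word_in_spam, p_word_in_good, p_spam=0.5):
    """Return P(spam | word) using Bayes' theorem.

    p_word_in_spam: fraction of past SPAM that contained the word
    p_word_in_good: fraction of past good mail that contained the word
    p_spam:         assumed share of SPAM before looking at the word
    """
    p_good = 1.0 - p_spam
    numerator = p_word_in_spam * p_spam
    evidence = numerator + p_word_in_good * p_good
    return numerator / evidence


# Hypothetical history: "free" appeared in 40% of past SPAM
# and in 5% of past good mail.
p = spam_probability_given_word(0.40, 0.05)
print(round(p, 3))
```

With these assumed numbers, an email containing the word is far
more likely to be SPAM than not, purely because of how the word
was used in the past.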

To finish solving the problem, Bayesian filters were
developed to be dynamic and continue to be effective as the SPAM
changes.

Bayesian filters are content based. They look for
characteristics in each email that you receive and calculate the
probability of it actually being SPAM.

These characteristics are generally words in the content
and in the header information that each email contains. They
can also include HTML code common in SPAM, word pairs, phrases,
and the location of a phrase in the body of the email.

Typical words in SPAM would be "Free" and "Win", while
"humility" would probably not appear. The filter begins with a
neutral score of 50% for the email, and then adds points for
SPAM characteristics.

Likewise, deductions are made for any non-SPAM characteristics
present. The total score is calculated and then action is taken
based on the email's likelihood of being SPAM.
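
The scoring described above can be sketched in a few lines of
Python. This is only one simple way to combine the evidence
(adding log-odds, as a naive Bayes filter does), not a description
of any particular product. The word table and its values are
invented for the example.

```python
import math
import re

# Hypothetical per-word SPAM probabilities learned from past mail.
WORD_SPAM_PROB = {"free": 0.90, "win": 0.85, "humility": 0.05, "meeting": 0.10}


def score_email(text, word_probs=WORD_SPAM_PROB, prior=0.5):
    """Return the estimated probability that the email is SPAM."""
    # Start from the neutral score: a prior of 0.5 is log-odds 0.
    log_odds = math.log(prior / (1.0 - prior))
    for word in set(re.findall(r"[a-z']+", text.lower())):
        p = word_probs.get(word)
        if p is None:
            continue  # unknown words leave the score unchanged
        # SPAM-like words (p > 0.5) add to the score,
        # good words (p < 0.5) subtract from it.
        log_odds += math.log(p / (1.0 - p))
    return 1.0 / (1.0 + math.exp(-log_odds))


spammy = score_email("Win a FREE cruise!")
harmless = score_email("Notes on humility from the meeting")
neutral = score_email("")
```

An email with no known words stays at the neutral 50%, while the
other two move toward SPAM or toward good mail. What the filter
does with the score (deliver, flag, or delete) depends on the
thresholds you choose.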

The filter does not assume that all arriving email is
bad; rather, it assumes that all email is neutral and should be
evaluated equally.

Bayesian filters are better than traditional
content-scoring filters in that they are trained by you to
recognize your email.

A doctor, for example, might receive many legitimate emails
that use the word "Viagra". A traditional content-scoring
filter would probably send those emails to the SPAM
folder, or delete them.

This would result in a high false-positive rate for the
doctor, even though the same rule might suit someone who does
not want Viagra emails. A Bayesian filter instead builds its
list from the doctor's own email and from the doctor's
corrections to incorrectly marked messages.

The initial training period may be a little time-consuming,
but once complete it offers a solution to SPAM control that is
tailored to each user.
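
To make the idea of per-user training concrete, here is a minimal
Python sketch. The class name, its methods, and the sample
messages are all hypothetical, and the add-one smoothing is just
one common way to avoid dividing by zero.

```python
import re
from collections import Counter


class PersonalFilter:
    """A tiny per-user word counter; a sketch, not a full filter."""

    def __init__(self):
        self.spam_words = Counter()
        self.good_words = Counter()
        self.spam_count = 0
        self.good_count = 0

    @staticmethod
    def _words(text):
        return set(re.findall(r"[a-z']+", text.lower()))

    def train(self, text, is_spam):
        """Record one email the user has marked as SPAM or as good."""
        if is_spam:
            self.spam_words.update(self._words(text))
            self.spam_count += 1
        else:
            self.good_words.update(self._words(text))
            self.good_count += 1

    def word_spam_probability(self, word):
        # Add-one smoothing keeps unseen words near neutral.
        spam_rate = (self.spam_words[word] + 1) / (self.spam_count + 2)
        good_rate = (self.good_words[word] + 1) / (self.good_count + 2)
        return spam_rate / (spam_rate + good_rate)


doctor = PersonalFilter()
doctor.train("Patient asked about Viagra dosage", is_spam=False)
doctor.train("Viagra prescription renewal", is_spam=False)
doctor.train("Viagra interaction with nitrates", is_spam=False)
doctor.train("Buy cheap Viagra now", is_spam=True)
doctor.train("Win a free cruise", is_spam=True)

other_user = PersonalFilter()
other_user.train("Buy cheap Viagra now", is_spam=True)
other_user.train("Viagra discount free", is_spam=True)
other_user.train("Lunch on Friday", is_spam=False)
other_user.train("Meeting notes attached", is_spam=False)

doctor_view = doctor.word_spam_probability("viagra")
other_view = other_user.word_spam_probability("viagra")
```

The same word ends up below the neutral mark for the doctor and
above it for the other user, because each filter has only ever
seen its own owner's mail.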

In addition to protecting the good email, a Bayesian filter
is difficult for Spammers to trick, because every user's filter
is trained differently.

That being said, Spammers do have a few weapons in their
arsenal to attempt to circumvent Bayesian filters. The easiest
would be to create SPAM that looks like an everyday letter.

This would remove their ability to use typical marketing
techniques and so is not as likely with normal commercial email.
For the purveyors of fraud, however, this would be easier.

Spammers could also load a message with common good
words, or distort the bad ones, so heavily that it is scored as
neutral or lower and gets through.

Once you correctly mark such a message as SPAM, though, the
filter will adjust and is far less likely to be fooled the same
way again. This automation, and the ability of the software to
grow as you and SPAM change over time, is what makes these
types of filters so significant.
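
The following Python sketch shows that adjustment for a distorted
word. The helper names, the sample words, and the smoothing are
assumptions made for the example, not the workings of any
specific filter.

```python
from collections import Counter

spam_counts, good_counts = Counter(), Counter()
totals = {"spam": 0, "good": 0}


def learn(words, is_spam):
    """Record the words of one email the user has classified."""
    (spam_counts if is_spam else good_counts).update(set(words))
    totals["spam" if is_spam else "good"] += 1


def word_spam_probability(word):
    # Add-one smoothing: a word never seen before scores as neutral.
    spam_rate = (spam_counts[word] + 1) / (totals["spam"] + 2)
    good_rate = (good_counts[word] + 1) / (totals["good"] + 2)
    return spam_rate / (spam_rate + good_rate)


# What the filter knew before the distorted message arrived.
learn(["free", "offer"], is_spam=True)
learn(["lunch", "friday"], is_spam=False)

before = word_spam_probability("fr3e")  # unseen, so neutral

# The message slipped through and the user marks it as SPAM.
learn(["fr3e", "offer"], is_spam=True)

after = word_spam_probability("fr3e")
```

Before the correction the distorted word carries no weight at
all; afterwards it counts against any message that uses it, and
each further correction pushes it higher.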

Widespread use of good Bayesian filters would not only
eliminate SPAM on your end, but could also reduce the practice
of Spamming altogether. If Spammers cannot get their mail
through, they are just wasting their time.

About the Author

Debbie Hamstead is the webmaster of http://www.StompingOutSPAM.com,
which offers a comprehensive Quick Start Guide to keeping SPAM out
of your inbox. She also manages http://www.nichesites4profit.com