Bayesian Database

To get to this page, go to Settings → Spam Filter → General → Bayesian Database

The Bayesian Database is managed by VPOP3 and contains statistics about previously received messages: their contents, and whether they were detected as spam or not spam.

This database is used by the Bayesian Filter component of the spam filter.

Over time the database can accumulate details of a great many messages (e.g. the above screenshot shows that this installation contains details of over 120 million messages). This is not usually a problem.

VPOP3 periodically maintains the database contents so that it does not grow uncontrollably, e.g. by removing terms which have only rarely been seen, because they are unlikely to be useful.

If you wish, you can explicitly clear database entries using the Clear Bayesian Statistics section. You can select entries based on how many times they have been seen, and clear them. For instance, 'with count=1' selects terms (words) which have been seen in only one message so far.
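The effect of clearing entries by count can be illustrated with a minimal sketch. This is not VPOP3's implementation: the in-memory dictionary, the function name clear_terms_with_count, and the choice to treat "count" as the total of ham and spam sightings are all assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for the database: term -> (ham_count, spam_count).
TermStats = Dict[str, Tuple[int, int]]


def clear_terms_with_count(stats: TermStats, count: int) -> TermStats:
    """Return a copy of stats without terms seen exactly `count` times in total."""
    return {
        term: (ham, spam)
        for term, (ham, spam) in stats.items()
        if ham + spam != count  # assumption: count = ham + spam sightings
    }


stats = {
    "viagra": (0, 41),
    "invoice": (120, 9),
    "xyzzy42": (1, 0),  # seen in only one message so far
}
pruned = clear_terms_with_count(stats, count=1)
print(sorted(pruned))  # ['invoice', 'viagra']
```

Terms seen only once carry very little statistical weight, which is why removing them rarely changes the filter's results.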

Bayesian Analysis

The Bayesian Analysis section lets you perform a manual Bayesian analysis of an email message using VPOP3's Bayesian filter/database. Copy and paste the message to analyse into the first box, then press the Analyse button. Make sure you include the full message headers, because those are also used for the analysis.

After pressing the Analyse button, VPOP3 will show the Final Result value - this ranges from 0 for "definitely not spam" to 100 for "definitely spam". Below that it shows the 'terms' detected in the message, and details of the analysis of those.

The most 'interesting' terms are marked in bold - these are the only terms which are used to calculate the final result. Note that terms include message header fields, because these are just as indicative of a message's spamminess as the message content is - for instance, messages from a certain location may have a very low likelihood of being spam. Header field terms are shown as <header name> ":" <header data word>.
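To illustrate how header fields can become terms alongside body words, here is a minimal sketch using Python's standard email module. It is not VPOP3's tokenizer: the function name extract_terms, the word pattern, the lower-casing of words, and the exact spacing around the colon are assumptions.

```python
import re
from email import message_from_string
from typing import List

# Assumption: a "word" is a run of letters or digits.
WORD = re.compile(r"[A-Za-z0-9]+")


def extract_terms(raw_message: str) -> List[str]:
    """Split a raw message into header terms ("Name:word") and body terms."""
    msg = message_from_string(raw_message)
    terms: List[str] = []
    # Each word of each header value becomes "<header name>:<word>".
    for name, value in msg.items():
        for word in WORD.findall(value):
            terms.append(f"{name}:{word.lower()}")
    # Body words are plain terms (single-part text messages only in this sketch).
    body = msg.get_payload()
    if isinstance(body, str):
        terms.extend(word.lower() for word in WORD.findall(body))
    return terms


raw = (
    "From: alice@example.com\n"
    "Subject: Quarterly invoice\n"
    "\n"
    "Please find the invoice attached.\n"
)
terms = extract_terms(raw)
print(terms)
```

Because the header name is part of the term, the word "invoice" in the Subject header is tracked separately from "invoice" in the body, so each can have its own ham and spam statistics.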

Then, the columns displayed are:

•# Ham - the number of times this term has been seen in 'ham' (not spam) messages.

•# Spam - the number of times this term has been seen in 'spam' messages.

•% Ham - the percentage of times this term has been seen in 'ham' messages.

•% Spam - the percentage of times this term has been seen in 'spam' messages.

•% Calc - the calculation of how 'spammy' this term is. Low numbers are less spammy: a term which has only ever been seen in 'ham' messages will have a value of 0, and a term which has only ever been seen in 'spam' messages will have a value of 100.
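The relationship between the columns and the Final Result can be sketched as follows. This is not VPOP3's actual formula, which is not given here. It is a common naive Bayes style scheme that reproduces the stated end points (0 for ham-only terms, 100 for spam-only terms). The function names, the limit of 15 interesting terms, the clamping to 0.01..0.99, and the use of total ham and spam message counts as denominators are all assumptions.

```python
from math import prod
from typing import List


def term_calc(ham: int, spam: int, total_ham: int, total_spam: int) -> float:
    """Return a 0..100 spamminess score for one term (the '% Calc' idea)."""
    pct_ham = 100.0 * ham / total_ham if total_ham else 0.0
    pct_spam = 100.0 * spam / total_spam if total_spam else 0.0
    if pct_ham + pct_spam == 0:
        return 50.0  # assumption: a never-seen term is neutral
    return 100.0 * pct_spam / (pct_ham + pct_spam)


def most_interesting(scores: List[float], limit: int = 15) -> List[float]:
    """Keep the scores furthest from the neutral value of 50."""
    return sorted(scores, key=lambda s: abs(s - 50.0), reverse=True)[:limit]


def final_result(scores: List[float]) -> float:
    """Combine term scores into one 0..100 value."""
    # Clamp so that a single 0 or 100 cannot force the result on its own.
    probs = [min(max(s / 100.0, 0.01), 0.99) for s in scores]
    spam_like = prod(probs)
    ham_like = prod(1.0 - p for p in probs)
    return 100.0 * spam_like / (spam_like + ham_like)


# Hypothetical totals: 1000 ham and 1000 spam messages seen so far.
scores = [
    term_calc(40, 0, 1000, 1000),   # ham-only term  -> 0.0
    term_calc(0, 25, 1000, 1000),   # spam-only term -> 100.0
    term_calc(30, 10, 1000, 1000),  # mostly ham     -> 25.0
]
print(scores)
print(final_result(most_interesting(scores)))
```

Selecting only the terms furthest from 50 means that common, uninformative words have no influence on the result, which matches the description of only the bold terms being used.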

See the Bayesian Filter section for more details on how the Bayesian analysis works.
