Physics Computing Services



Spam Scoring and Filtering with SpamAssassin

Introduction

In May 2004 PCS implemented a spam scoring system using SpamAssassin for email delivered to mail.physics.ucsb.edu mailboxes. The system scans incoming email for spam before it is delivered to your Inbox. When it finds a message that matches known spam patterns, it adds a keyword and score to the Subject line indicating how certain it is that the message is spam, then delivers the message to your mailbox as usual. This lets you decide for yourself how to deal with spam. Our recommendation? Take a look at the "How do I filter marked spam out of my Inbox?" section to learn how to configure your email account or client so that it moves marked spam out of your Inbox.

The spam scorer scans all incoming email sent to mail.physics.ucsb.edu mailboxes, with one exception: messages larger than 250K are not scanned, since studies have shown that most spam messages are very small (plain text or HTML).


How does the system work?
How will this affect my email?
Is the system foolproof? What can I do about mistakes?
How do I filter marked spam out of my Inbox?
How do I change the way SpamAssassin behaves for my mail?


How does the system work? 

SpamAssassin runs each message through a series of tests, each of which contributes a score. The final score assigned to a message is the sum of the scores from each test.
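
The scoring idea can be illustrated with a minimal sketch. It assumes that a message is marked once its total reaches the threshold (5 by default, per the Spam Score setting described later on this page). The test names and per-test scores below are made up for illustration and are not actual SpamAssassin tests.

```python
# Minimal sketch of sum-of-tests scoring. The test names and scores are
# hypothetical placeholders, not real SpamAssassin rules.

def total_score(test_scores):
    """Return the final score: the sum of the scores from each test."""
    return sum(test_scores.values())


def is_scored_as_spam(test_scores, threshold=5.0):
    """Assume a message is marked as spam when its total reaches the threshold."""
    return total_score(test_scores) >= threshold


example_tests = {
    "EXAMPLE_HEADER_TEST": 2.5,   # hypothetical test on the headers
    "EXAMPLE_BODY_TEST": 1.5,     # hypothetical test on the body text
    "EXAMPLE_CONTENT_TEST": 2.0,  # hypothetical content-based test
}

print(total_score(example_tests))
print(is_scored_as_spam(example_tests))
```

With these made-up numbers the total is 6.0, which is above the default threshold of 5, so the message would be marked.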

How will this affect my email? 

Is the system foolproof? What can I do about mistakes?

SpamAssassin is not foolproof. Email is, by nature, so varied that the anti-spam system will occasionally make mistakes. Since SpamAssassin's threshold for marking a message as spam is configurable, you have some control over the balance between false positives (legitimate messages marked as spam) and false negatives (spam that is not marked).

You can attempt to prevent future false negatives by adding the sender to your spam Blacklist.

You can also help reduce the frequency of false positives and negatives by forwarding the mistakes to PCS. If the mistakes are related to the score given by the Bayesian (content) test, the examples you send will be used to train the Bayesian filter so it doesn't repeat its error. Please make sure you forward the message with full headers.


How do I filter marked spam out of my Inbox?


1) You need to create a mailbox on the IMAP mail server to store marked spam.  Be sure to subscribe to this folder in your IMAP email client. You should look through this "trapped spam" mailbox at least once a week, to delete spam messages and catch any false positives.

2) You have two options for setting up the filtering: client-side filtering, or server-side filtering with Procmail.
a) Filter your email with your email client software: create a rule that looks for "X-Spam-Status: Yes" in the message headers, or for "[PCS-SPAM" in the Subject line, and moves matching messages to your Spam mail folder.
b) Filter your email at the server, using a Procmail recipe which tosses any marked spam messages into a special mail folder on the server.  You can configure this using Option 14 in the Menu shell on mail.physics.ucsb.edu.
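
For readers who script their own mail handling, the client-side rule in option (a) can be sketched roughly as follows. This is only an illustration of the matching logic, not how any particular email client implements its filters. The function names, the folder names, and the sample messages (including the exact layout of the Subject tag) are assumptions.

```python
# Rough sketch of the client-side rule: a message counts as marked spam if
# its X-Spam-Status header starts with "Yes" or its Subject contains the
# "[PCS-SPAM" tag. Function and folder names are hypothetical.
from email import message_from_string


def has_spam_mark(raw_message):
    """Return True if the raw message text carries either spam mark."""
    msg = message_from_string(raw_message)
    status = str(msg.get("X-Spam-Status", "")).strip().lower()
    subject = str(msg.get("Subject", ""))
    return status.startswith("yes") or "[PCS-SPAM" in subject


def target_folder(raw_message, spam_folder="Spam"):
    """Pick the folder a filter rule would move the message to."""
    return spam_folder if has_spam_mark(raw_message) else "INBOX"


# Sample messages; the header values are assumed, not copied from real mail.
marked = (
    "Subject: [PCS-SPAM 7.3] Cheap offers\n"
    "X-Spam-Status: Yes, hits=7.3 required=5.0\n"
    "\n"
    "body text\n"
)
clean = "Subject: Seminar on Friday\nX-Spam-Status: No, hits=0.4 required=5.0\n\nbody text\n"

print(target_folder(marked))
print(target_folder(clean))
```

Checking both marks is deliberate: either one alone is enough to identify a marked message, so the rule still works if one of the two is missing.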

How do I change the way SpamAssassin behaves for my mail?

In addition to sending spam "mistakes" to pcs_spam - at - physics.ucsb.edu, you may wish to change the way SpamAssassin behaves for your mail, for instance if you get email that is marked as spam but is legitimate, and you want to continue receiving it.  To change your personal spam scoring settings, log in to Webmail (https://webmail.physics.ucsb.edu), click on Options, then Spam Filter Configuration.

Here is a list of settings you may customize:

Whitelist From
If a message arrives from an address matching this string or pattern, assume it is not spam.
Whitelist To
If a message arrives to this address (for instance, a list that you receive mail on as a member), assume it is not spam.
Spam Score
This is the minimum score required to mark a message as spam. Scores are reported in a header called X-Spam-Status:, which you should be able to see by looking at the full headers or "source" of a message. Raising this number will reduce "false positives" (messages incorrectly marked as spam) at the cost of increasing "false negatives" (spam which is not tagged as such); lowering it will do the converse.  The default setting is 5.

You can use this option to opt out of spam scoring by setting the value to 10000.
Short Report
If this is unchecked, a brief description of each test which caused SpamAssassin to mark a message as spam is added to the headers or body of the message. If this setting is checked, only symbolic test names are provided.  The default is off (long report).
Acceptable Languages
Select the languages in which you expect to receive legitimate email.  This will override SpamAssassin's default setting, which gives a higher score to messages written in languages other than English.
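
To see how the Spam Score setting interacts with the score reported in X-Spam-Status:, here is a small sketch that reads the score out of a header value and compares it against different thresholds. The header layout is an assumption: the sketch expects something like "Yes, hits=7.3 required=5.0 ..." and also accepts "score=" in place of "hits=", since the label may differ between SpamAssassin versions. The function names are hypothetical.

```python
# Sketch: extract the score from an X-Spam-Status value and test it against
# a chosen threshold. The header layout handled here is an assumption.
import re

_STATUS_RE = re.compile(
    r"^\s*(yes|no)\b.*?\b(?:score|hits)=(-?\d+(?:\.\d+)?)"
    r"\s+required=(-?\d+(?:\.\d+)?)",
    re.IGNORECASE | re.DOTALL,
)


def parse_spam_status(value):
    """Return (marked, score, required), or None if the value does not match."""
    match = _STATUS_RE.search(value)
    if match is None:
        return None
    marked = match.group(1).lower() == "yes"
    return marked, float(match.group(2)), float(match.group(3))


def would_be_marked(score, threshold):
    """Assume a message is marked when its score reaches the threshold."""
    return score >= threshold


parsed = parse_spam_status("Yes, hits=7.3 required=5.0 tests=EXAMPLE_TEST")
marked, score, required = parsed

print(would_be_marked(score, 5))      # default threshold
print(would_be_marked(score, 8))      # raised threshold
print(would_be_marked(score, 10000))  # the opt-out value
```

A message scoring 7.3 is marked at the default threshold of 5 but not at 8, which is the trade-off described under Spam Score: a higher threshold means fewer false positives and more false negatives.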

 
What if I need help?

If you have problems with or questions about spam scoring and filtering, please contact pcs@physics.ucsb.edu.


Physics Computing Services, pcs@physics.ucsb.edu
Last updated May 25, 2004 by Jennifer L. Mehl