Last week, I read an interesting e-mail from the Windows and .NET Magazine Network's Exchange and Outlook update discussing Spam URL Realtime Block Lists (SURBLs), which examine message contents to block spam. This week's e-mail highlights a free content filter for Exchange Server: the RegEx Filter.
The basic idea behind the RegEx Filter is that it can filter e-mail on any arbitrary text pattern. It is implemented as an event sink that hooks into the Exchange SMTP engine (by default, the filter works only with the first SMTP virtual server instance, but this can easily be changed) and applies regular expression tests to the message sender, recipients, or contents.
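To make the idea concrete, here is a minimal Python sketch of regular expression tests applied to a message's sender, recipients and body. It is not the RegEx Filter's code (which runs as an Exchange SMTP event sink); the `Message` class, the `FILTERS` list and the sample patterns are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    recipients: list[str]
    body: str


# Hypothetical filters: (field to test, compiled pattern).
FILTERS = [
    ("sender", re.compile(r"@spam-example\.invalid$", re.IGNORECASE)),
    ("body", re.compile(r"\bcheap\s+meds\b", re.IGNORECASE)),
]


def is_spam(msg: Message) -> bool:
    """Return True if any filter pattern matches the field it targets."""
    for field, pattern in FILTERS:
        if field == "sender":
            values = [msg.sender]
        elif field == "recipient":
            values = msg.recipients
        else:  # "body"
            values = [msg.body]
        if any(pattern.search(value) for value in values):
            return True
    return False
```

Because the patterns are plain data, adding a filter means adding a tuple rather than changing code, which is the same flexibility the RegEx Filter gets from its configuration files.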
It is possible to specify any number of individual filters to run against incoming messages, and the filter also includes:
- A large filter file that tests for common patterns found in adult-oriented spam.
- A whitelist of expressions to be allowed; by customizing this list, it is easy to whitelist specific addresses or senders.
- A list of blocked recipients; the filter drops blocked recipients as soon as it sees the SMTP RCPT TO verb, instead of waiting until the mail has been accepted for delivery.
- A list of expressions commonly found in "Nigerian scam" messages.
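The difference between rejecting at RCPT TO and filtering after acceptance can be sketched in Python as two hooks: one that answers each RCPT TO command and one that judges a fully received message. Everything here is hypothetical (the list contents, the function names and the rule that the whitelist overrides block patterns); the RegEx Filter's actual precedence rules may differ. The 250 and 550 reply codes are standard SMTP.

```python
import re

# Hypothetical lists; the real filter keeps these in its own configuration files.
WHITELIST = [re.compile(r"@partner\.example\.com$", re.IGNORECASE)]
BLOCKED_RECIPIENTS = [re.compile(r"^(sales|info)@example\.com$", re.IGNORECASE)]
BLOCK_PATTERNS = [re.compile(r"transfer of \$?\d+(\.\d+)? million", re.IGNORECASE)]


def on_rcpt_to(address: str) -> str:
    """Answer a RCPT TO command; blocked recipients are refused immediately."""
    if any(p.search(address) for p in BLOCKED_RECIPIENTS):
        return "550 Recipient rejected"
    return "250 OK"


def on_message(sender: str, body: str) -> str:
    """Judge a fully received message (assumption: the whitelist wins)."""
    if any(p.search(sender) for p in WHITELIST):
        return "accept"
    if any(p.search(body) for p in BLOCK_PATTERNS):
        return "drop"
    return "accept"
```

Refusing at RCPT TO means the server never receives the message body for a blocked recipient, which saves bandwidth and processing compared with dropping the message later.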
Any of these lists can be extended with additional expressions by editing the corresponding XML files; all that is needed is a regular expression that matches the messages to be accepted or dropped. The XML schema for the filtering language includes the ability to specify IP address ranges, perform DNS lookups and filter according to the results, slow down the sending mail server by imposing a timeout (to punish repeat spammers), and a host of other features.
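The following Python sketch shows how an XML rule file might combine regular expressions, IP address ranges and a delay action. The element and attribute names (`rule`, `action`, `field`, `pattern`, `range`, `seconds`) are invented for illustration and are **not** the RegEx Filter's real schema. The sketch also only reports a delay instead of actually stalling the connection, and it leaves out DNS lookups.

```python
import ipaddress
import re
import xml.etree.ElementTree as ET

# Hypothetical rule file: NOT the RegEx Filter's actual schema.
RULES_XML = """
<rules>
  <rule action="drop" field="body" pattern="(?i)urgent business proposal"/>
  <rule action="drop" field="ip" range="192.0.2.0/24"/>
  <rule action="delay" field="ip" range="198.51.100.0/24" seconds="30"/>
</rules>
"""


def load_rules(xml_text):
    """Parse <rule> elements, pre-compiling patterns and IP networks."""
    rules = []
    for el in ET.fromstring(xml_text.strip()).iter("rule"):
        rule = dict(el.attrib)
        if "pattern" in rule:
            rule["pattern"] = re.compile(rule["pattern"])
        if "range" in rule:
            rule["range"] = ipaddress.ip_network(rule["range"])
        rules.append(rule)
    return rules


def evaluate(rules, client_ip, body):
    """Return (action, delay_seconds) for the first matching rule."""
    ip = ipaddress.ip_address(client_ip)
    for rule in rules:
        if rule["field"] == "ip" and ip in rule["range"]:
            return rule["action"], int(rule.get("seconds", 0))
        if rule["field"] == "body" and rule["pattern"].search(body):
            return rule["action"], 0
    return "accept", 0
```

For example, `evaluate(load_rules(RULES_XML), "198.51.100.7", "hello")` yields a 30-second delay. A real server would hold the SMTP session open for that long before responding.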
I haven't tested the filter yet (I need to move my e-mail service over to Exchange first). However, Paul Robichaux's original article, which is the source of the information in this post, suggests that the filter doesn't add any significant performance overhead. The filter also includes a set of Performance Monitor counters that can be registered to help assess any performance impact of filtering.
Robichaux also highlighted that the RegEx Filter isn't perfect: its documentation is pretty opaque, and there's no real step-by-step guide to installing it on an Exchange server. More worryingly, the default configuration logs all accepted messages to disk, leaving every valid, accepted message exposed in plain text. Apart from the obvious security implications, these logs also consume a large amount of disk space. Fortunately, the logging can be turned off.
To me, this looks like a useful (free) tool in the battle against spam.