Regular expressions and Solaris 8

David Wolfskill david at catwhisker.org
Wed Jul 21 15:37:49 PDT 2004


As (some of) you may recall, in my role as postmaster at baylisa.org I make
use of a couple of different approaches to try to squelch spam at
BayLISA's MTA.

One of those approaches is a content filter that uses regular
expressions.  The bulk of the specification I use for it is intended to
look for certain "spamvertized" domains.  (The census of these is now at
about 3975.)

Thus, a typical regex deployed for this use looked like

	`([^-0-9a-z]|([=%]2[ef]))2LD(=2E|\.)TLD`ie

where:
* the ` are the delimiters -- I didn't use / because sometimes I specify
  more of a URL, and they often have / characters in them.

* "2LD" is the second-level domain

* "TLD" is the top-level domain

* "ie" (after the closing delimiter) denotes case-insensitive matching
  and extended regular expression syntax.


Well, this morning, I received a spam that mentioned a known
spamvertized domain.  On looking at the spam a bit more closely, I saw
that the domain name in question was left-anchored on the line; thus,
the above regex would not match (because it's looking for some sort of
delimiter to the left of the domain name).
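
To illustrate the failure, here is a minimal sketch in Python.  It is
only an approximation: Python's "re" module is not the regex library
the milter uses, and "example" and "com" are hypothetical stand-ins
for the 2LD and TLD placeholders.  The point is just that the leading
group must consume at least one character, so a domain at the very
start of a line cannot match.

```python
import re

# Hypothetical stand-ins for the 2LD and TLD placeholders.
SLD, TLD = "example", "com"

# The original pattern: a delimiter character, or an encoded ".", must
# precede the domain.  re.IGNORECASE plays the role of the "i" flag.
ORIGINAL = re.compile(
    r"([^-0-9a-z]|([=%]2[ef]))" + re.escape(SLD) + r"(=2E|\.)" + re.escape(TLD),
    re.IGNORECASE,
)


def is_hit(line):
    """Return True if the pattern matches anywhere in the line."""
    return ORIGINAL.search(line) is not None


samples = [
    "see http://www.example.com/ today",  # "." precedes the domain
    "see http://www%2Eexample=2Ecom/",    # encoded dots on both sides
    "example.com has what you need",      # domain starts the line
]

for line in samples:
    print(is_hit(line), line)
```

The first two sample lines are reported as hits; the third, where the
domain starts the line, is not.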

So I poked around in Jeffrey Friedl's _Mastering Regular Expressions_
and found that the construct "\<" may be used to serve as a "left
word-anchor" ... in some regular expression implementations.

I then tried using "egrep" on one of my FreeBSD boxen (running the same
flavor of FreeBSD as my home firewall/MTA) and found that a regex of the
form

	`\<2LD(=2E|\.)TLD`ie

fed to egrep appeared to work.

Then I got a little more adventurous:  some spammers like to use encoding
constructs for the URLs; I tried

	`(\<|([=%]2[ef]))2LD(=2E|\.)TLD`ie

and that appeared to work very nicely.

(The next step, assuming all works OK, is to use

	`(\<|([=%]2[ef]))2LD(=2E|\.)TLD\>`ie

though that's not really foolproof.)
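
A rough sketch of how the word-anchored form behaves, again in Python
and again only as an approximation.  Python's "re" has no "\<" or
"\>"; it offers "\b", which matches at either edge of a word.  Because
the 2LD starts with a word character and the TLD ends with one, "\b"
in those two positions acts like "\<" and "\>" respectively.  The
names "example" and "com" are hypothetical placeholders, and the
milter's regex library may well behave differently.

```python
import re

# Hypothetical stand-ins for the 2LD and TLD placeholders.
SLD, TLD = "example", "com"

# \b before a word character approximates \< ; \b after one approximates \>.
ANCHORED = re.compile(
    r"(\b|[=%]2[ef])" + re.escape(SLD) + r"(=2E|\.)" + re.escape(TLD) + r"\b",
    re.IGNORECASE,
)


def is_hit(line):
    """Return True if the pattern matches anywhere in the line."""
    return ANCHORED.search(line) is not None


samples = [
    "example.com has what you need",    # domain starts the line
    "see http://www%2Eexample=2Ecom/",  # needs the [=%]2[ef] alternative
    "see http://myexample.com/",        # different 2LD, no word boundary
    "see http://example.community/",    # TLD runs on, no right boundary
    "see http://example.com.invalid/",  # longer name containing the domain
]

for line in samples:
    print(is_hit(line), line)
```

The encoded-dot alternative is still needed because in "%2Eexample"
the "E" is itself a word character, so there is no word boundary in
front of the 2LD.  The last sample still matches even with the
trailing anchor.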


However, when I tried the same egrep test on the BayLISA machine, it
failed to find the lines in question -- so I thought that maybe Solaris
8 didn't have support for \< and \> in its regex library.

But the regexp(5) man page seems to indicate that the construct is
recognized.

Anyone have any clue whether this ought to work or not?  (Note that the
application is a "milter," not egrep per se.)

Thanks,
david
-- 
David H. Wolfskill				david at catwhisker.org
I do not "unsubscribe" from email "services" to which I have not explicitly
subscribed.  Rather, I block spammers' access to SMTP servers I control,
and encourage others who are in a position to do so to do likewise.


