Usenet Spam Filter Policy (DRAFT)


There are a number of different definitions of spam. This document describes the Usenet spam filter policy on agate (the campus NNTP server); it does not attempt to define spam. For different definitions of spam, see:
spam.abuse.net/faq.html
www.vix.com/spam/faq.html
www.uiuc.edu/ph/www/tskirvin/faqs/spam.html

agate.berkeley.edu provides news service for the UC Berkeley campus. Agate carries the major hierarchies and a number of foreign, regional, and academic groups, and is the source of the ucb.* hierarchy.

The development of a spam policy is closely related to the type of service provided to users and news peers. The intention is to provide access to news resources with a minimal amount of spam, and to provide a newsfeed to peers that offers as little spam as possible. Less spam in the feed reduces the load on both the sending server and the remote server, as well as the network bandwidth required for news peering.

Spam has grown steadily worse and now accounts for the majority of Usenet traffic (with cancel messages included). Cancel messages can no longer cope with the high volume of spam, and they are too resource intensive: the server writes and stores the article to disk, writes the cancel message to disk, and then removes the cancelled article.

Because of this, a spam filtering policy has been established for agate. It filters on the following criteria:

  • Excessive Cross Posting (ECP). Cross-posting to more than a given number of groups (currently 8).
  • Excessive Multi-Posting (EMP). Posting of an identical article numerous times.
  • Articles with an invalid "From:" header. Addresses in the "From:" header which do not match a potentially valid user name and FQDN (Fully Qualified Domain Name). The Perl regular expression which the "From:" header must match is:
    (.+?)\@([-\w\d]+\.)*([-\w\d]+)\.([-\w\d]{2,})
  • Make Money Fast (MMF) posts. MMF posts are typically identified by certain header elements, such as the appearance of the phrase "Make Money Fast" in the "Subject:" header. MMF and similar "subject-matter based" filtering is not currently implemented on agate.
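
As a rough illustration of the ECP and "From:" criteria above, the checks might be sketched as follows in Python. The regular expression is the one quoted in the list, and the limit of 8 is the current value stated there. The function names, the unanchored search, and the comma-splitting of the "Newsgroups:" header are assumptions made for this sketch, not a description of agate's actual filter.

```python
import re

# Pattern quoted from the policy; it is Perl syntax that Python's re module also accepts.
FROM_PATTERN = re.compile(r"(.+?)\@([-\w\d]+\.)*([-\w\d]+)\.([-\w\d]{2,})")

# Current cross-posting limit stated in the policy.
MAX_CROSSPOST = 8


def from_header_ok(from_header: str) -> bool:
    """Return True if the header contains something shaped like user@host.domain."""
    # Assumption: an unanchored search, as a bare Perl match would behave.
    return FROM_PATTERN.search(from_header) is not None


def is_excessive_crosspost(newsgroups_header: str, limit: int = MAX_CROSSPOST) -> bool:
    """Return True if the Newsgroups: header names more than `limit` groups."""
    # Assumption: groups are comma-separated; empty entries are ignored.
    groups = [g.strip() for g in newsgroups_header.split(",") if g.strip()]
    return len(groups) > limit
```

Under these assumptions, an address with no dot in the host part (for example "nobody@localhost") or with a one-character final label fails the "From:" check, and an article cross-posted to exactly 8 groups is still accepted.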

    A number of methods for dealing with spam have been developed which focus on refusing spam articles before they are processed by the news host. They typically check the headers and body for characteristics which identify the article as spam, and refuse it if any are found. The resources required for basic header and body checking are typically much smaller than the disk I/O that would result from processing the articles.
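
    A minimal sketch of this kind of pre-acceptance check, here applied to EMP detection, is shown below. It hashes each article body and refuses copies past a threshold before anything is written to disk. The class name, the default threshold of 5, and the use of SHA-256 are hypothetical choices for illustration; this policy does not state how agate detects identical articles.

```python
import hashlib
from collections import Counter


class EmpDetector:
    """Counts byte-identical article bodies and flags copies past a threshold."""

    def __init__(self, max_copies: int = 5):  # hypothetical threshold
        self.max_copies = max_copies
        self.seen = Counter()  # body digest -> number of copies offered so far

    def should_refuse(self, body: str) -> bool:
        """Record one more copy of `body`; return True once the limit is exceeded."""
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        self.seen[digest] += 1
        return self.seen[digest] > self.max_copies


# Usage: consult the detector before the article is stored.
detector = EmpDetector(max_copies=2)
decisions = [detector.should_refuse("Buy now!") for _ in range(3)]
```

    Only a fixed-size digest and a counter are kept per distinct body, which is the sense in which such a check is cheaper than storing each article and cancelling it afterwards.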

    The filtering policy employed on agate is often referred to in Usenet circles as "non-aggressive" filtering. No attempt is made to filter on the subject matter or content of articles. Filtering is based instead on characteristic spam features, such as repetitive posting or cross-posting, and on common indications of spam attempts, such as a "From:" address (required according to RFC 1036) which is not in "Internet Syntax".

    Updated 9/30/98, by Chris van den Berg <chrisvdb@ack.berkeley.edu.berkeley.edu>