Abstract
These are definitions for some of the words and terms that are used throughout this document.
A filter that assigns a probability of spam based on the recurrence of words (or, more recently, word constellations/phrases) between messages.
You initially train the filter by feeding it known junk mail (spam) and known legitimate mail (ham). A Bayesian score is then assigned to each word (or phrase) in each message, indicating whether this particular word or phrase occurs most commonly in ham or in spam. The word, along with its score, is stored in a Bayesian index.
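As a rough illustration of the training step described above, the following minimal sketch builds a word index from a handful of spam and ham messages and then combines the per-word scores into a message score. The function names, the whitespace tokenizer, and the add-one smoothing are assumptions made for this example; real filters use more elaborate tokenizing and weighting.

```python
import math
from collections import Counter


def count_words(messages):
    """Count in how many messages each (lowercased) word occurs."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def build_index(spam_messages, ham_messages):
    """Return {word: score}, where a score above 0.5 leans towards spam."""
    spam_counts = count_words(spam_messages)
    ham_counts = count_words(ham_messages)
    index = {}
    for word in set(spam_counts) | set(ham_counts):
        # Add-one smoothing (an assumption of this sketch) keeps every
        # score strictly between 0 and 1.
        p_in_spam = (spam_counts[word] + 1) / (len(spam_messages) + 2)
        p_in_ham = (ham_counts[word] + 1) / (len(ham_messages) + 2)
        index[word] = p_in_spam / (p_in_spam + p_in_ham)
    return index


def spam_probability(index, text):
    """Combine the scores of all known words in a message."""
    log_spam = 0.0
    log_ham = 0.0
    for word in set(text.lower().split()):
        score = index.get(word)
        if score is None:
            continue  # words never seen in training carry no evidence
        log_spam += math.log(score)
        log_ham += math.log(1.0 - score)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))
```

A message made up only of words that were never seen in training scores exactly 0.5 here, i.e. the filter has no opinion about it.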
Such filters may catch indicators that may be missed by human programmers trying to manually create keyword-based filters. At the very least, they automate this task.
Bayesian word indexes are most certainly specific to the language in which they received training. Moreover, they are specific to individual users. Thus, they are perhaps more suitable for individual content filters (e.g. in Mail User Agents) than they are for system-wide, SMTP-time filtering.
Moreover, spammers have developed techniques to defeat simple Bayesian filters, by including random dictionary words and/or short stories in their messages. This decreases the spam probability assigned by a Bayesian filter, and in the long run, degrades the quality of the Bayesian index.
See also: http://www.everything2.com/index.pl?node=Bayesian.
Blocking of a legitimate sender host due to an entry in a DNS blocklist.
Some blocklists (like SPEWS) routinely list the entire IP address space of an ISP if they feel the ISP is not responsive to abuse complaints, thereby affecting all its customers.
See also: False Positive
Automated messages sent in response to an original message (mostly spam or malware) where the sender address is forged. Typical examples of collateral spam include virus scan reports (“You have a virus”) and other Delivery Status Notifications.
(abbrev: DNS) The de-facto standard for obtaining information about internet domain names. Examples of such information include the IP addresses of a domain's servers (so-called A records), the designation of incoming mail exchangers (MX records), generic server information (SRV records), and miscellaneous text information (TXT records).
DNS is a hierarchical, distributed system; each domain name is associated with a set of one or more DNS servers that provide information about that domain - including delegation of name service for its subdomains.
For instance, the top-level domain “org” is operated by The Public Interest Registry; its DNS servers delegate queries for the domain name “tldp.org” to specific name servers for The Linux Documentation Project. In turn, TLDP's name server (actually operated by UNC) may or may not delegate queries for third-level names, such as “www.tldp.org”.
DNS lookups are usually performed by forwarding name servers, such as those provided by an Internet Service Provider (e.g. via DHCP).
(abbrev: DSN) A message automatically created by an MTA or MDA, to inform the sender of an original message (usually included in the DSN) about its status. For instance, DSNs may inform the sender of the original message that it could not be delivered due to a temporary or permanent problem, and/or whether or not and for how long delivery attempts will continue.
Delivery Status Notifications are sent with an empty Envelope Sender address.
The e-mail address given as sender of a message during the SMTP transaction, using the MAIL FROM: command. This may be different from the address provided in the “From:” header of the message itself.
One special case is a Delivery Status Notification (bounced message, return receipt, vacation message, etc.). For such mails, the Envelope Sender is empty. This is to prevent Mail Loops, and generally to make it possible to distinguish these from “regular” mails.
See also: The SMTP Transaction
The e-mail address(es) to which the message is sent. These are provided during the SMTP transaction, using the RCPT TO command. These may be different from the addresses provided in the “To:” and “Cc:” headers of the message itself.
See also: The SMTP Transaction
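To make the distinction between the envelope and the message headers concrete, here is a small sketch that builds the envelope commands a client would issue during the SMTP transaction. The helper names and the example addresses are hypothetical; the sketch only formats command strings and does not talk to a server.

```python
def envelope_commands(envelope_sender, envelope_recipients):
    """Return the SMTP commands that carry the envelope of one message.

    The envelope is independent of the "From:", "To:" and "Cc:" headers,
    which travel later as part of the message data.
    """
    commands = [f"MAIL FROM:<{envelope_sender}>"]
    commands += [f"RCPT TO:<{rcpt}>" for rcpt in envelope_recipients]
    commands.append("DATA")
    return commands


def has_null_sender(envelope_sender):
    """Delivery Status Notifications use an empty Envelope Sender."""
    return envelope_sender == ""


# A bounce: the Envelope Sender is empty, so the command is "MAIL FROM:<>".
for line in envelope_commands("", ["user@example.org"]):
    print(line)
```

Because the empty sender is written as `<>`, a receiving host can recognize a Delivery Status Notification from the envelope alone, before any message data has been sent.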
Junk mail (spam, virus, malware) that is misclassified as legitimate mail (and consequently, not filtered out).
Legitimate mail that is misclassified as junk (and consequently, blocked).
See also: Collateral Damage.
(a.k.a. “FQDN”). A full, globally unique, internet name, including DNS domain. For instance: “www.yahoo.com”.
A FQDN does not always point to a single host. For instance, common service names such as “www” often point to many IP addresses, in order to provide some load balancing on the servers. However, the primary host name of a given machine should always be unique to that machine; for instance: “p16.www.scd.yahoo.com”.
A FQDN always contains a period ("."). The part before the first period is the unqualified name, and is not globally unique.
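The rule above can be sketched in a few lines. The function name is hypothetical, and the handling of a trailing root dot is an assumption of this example.

```python
def split_fqdn(name):
    """Split a FQDN into (unqualified name, DNS domain) at the first period."""
    name = name.rstrip(".")  # tolerate the trailing root dot, e.g. "www.tldp.org."
    if "." not in name:
        raise ValueError(f"{name!r} is not fully qualified")
    unqualified, domain = name.split(".", 1)
    return unqualified, domain


print(split_fqdn("www.tldp.org"))
```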
A spam designed to look like it came from someone else's valid address, often in a malicious attempt at generating complaints from third parties and/or causing other damage to the owner of that address.
See also: http://www.everything2.com/index.pl?node=Joe%20Job
(abbrev: MDA) Software that runs on the machine where a user's mailbox is located, to deliver mail into that mailbox. Often, that delivery is performed directly by the Mail Transport Agent (MTA), which then serves a secondary role as an MDA. Examples of separate Mail Delivery Agents include: Deliver, Procmail, Cyrmaster and/or Cyrdeliver (from the Cyrus IMAP suite).
A situation where one automated message triggers another, which directly or indirectly triggers the first message over again, and so on.
Imagine a mailing list where one of the subscribers is the address of the list itself. This situation is often dealt with by the list server adding an “X-Loop:” line in the message header, and not processing mails that already have one.
Another equivalent term is Ringing.
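The “X-Loop:” safeguard described above might look roughly as follows, using the Python standard library's email package. The list address and the function names are hypothetical, and the sketch assumes that the header value is simply the list's own address.

```python
from email.message import EmailMessage

LIST_ADDRESS = "list@example.org"  # hypothetical address of the list itself


def should_process(msg):
    """Refuse messages that this list has already handled once."""
    seen = [value.strip() for value in msg.get_all("X-Loop", [])]
    return LIST_ADDRESS not in seen


def mark_processed(msg):
    """Tag the message before it is redistributed to the subscribers."""
    msg["X-Loop"] = LIST_ADDRESS


msg = EmailMessage()
msg["To"] = LIST_ADDRESS
msg.set_content("Hello, list!")

if should_process(msg):
    mark_processed(msg)      # first pass: tag and redistribute
print(should_process(msg))   # the same message coming back is not processed again
```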
(abbrev: MTA) Software that runs on a mail server, such as the mail exchanger(s) of an internet domain, to send mail to and receive mail from other hosts. Popular MTAs include: Sendmail, Postfix, Exim, Smail.
(abbrev: MUA; a.k.a. Mail Reader) User software to access, download, read, and send mail. Examples include Microsoft Outlook/Outlook Express, Apple Mail.app, Mozilla Thunderbird, Ximian Evolution.
(abbrev: MX) A machine dedicated to (sending and/or) receiving mail for an internet domain.
The DNS zone information for an internet domain normally contains a list of Fully Qualified Domain Names that act as incoming mail exchangers for that domain. Each such listing is called an “MX record”, and it also contains a number indicating its “priority” among several “MX records”. The listing with the lowest number has the first priority, and is considered the “primary mail exchanger” for that domain.
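The priority rule can be illustrated with a short sketch. The records are written here as hypothetical (priority, host name) pairs rather than fetched from DNS, and the host names are made up.

```python
def order_mail_exchangers(mx_records):
    """Sort (priority, fqdn) pairs so that the lowest number comes first."""
    return [host for _priority, host in sorted(mx_records)]


# Hypothetical MX records for "example.org".
records = [
    (20, "backup.example.org"),
    (10, "mail.example.org"),
]

ordered = order_mail_exchangers(records)
print("primary mail exchanger:", ordered[0])
```

A sending host would try the exchangers in this order, falling back to higher-numbered entries only when the earlier ones cannot be reached.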
(a.k.a. sender pay schemes). The sender of a message expends some machine resources to create a virtual postage stamp for each recipient of a message - usually by solving a mathematical challenge that requires a large number of memory read/write operations, but is relatively insensitive to CPU speed. This stamp is then added to the headers of the message, and the recipient would validate the stamp through a much simpler decoding operation.
The idea is that because the message requires a postage stamp for every recipient address, spamming hundreds or thousands of users at once would be prohibitively "expensive".
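The asymmetry between minting and validating a stamp can be sketched as follows. Note that this example swaps in a simple CPU-bound hash search in the style of hashcash, not the memory-bound challenge described above, and that the stamp format, the function names, and the difficulty setting are all assumptions made for illustration.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest):
    """Number of zero bits at the start of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def mint_stamp(recipient, bits=12):
    """Expensive for the sender: search for a nonce that yields a 'rare' hash."""
    for nonce in count():
        stamp = f"{recipient}:{nonce}"
        if leading_zero_bits(hashlib.sha256(stamp.encode()).digest()) >= bits:
            return stamp


def verify_stamp(stamp, recipient, bits=12):
    """Cheap for the recipient: one hash, plus a check of who the stamp is for."""
    stamped_recipient, _, nonce = stamp.rpartition(":")
    if stamped_recipient != recipient or not nonce.isdigit():
        return False
    return leading_zero_bits(hashlib.sha256(stamp.encode()).digest()) >= bits


stamp = mint_stamp("user@example.org")
print(stamp, verify_stamp(stamp, "user@example.org"))
```

Because the recipient address is part of the stamp, a stamp minted for one recipient does not validate for another, which is what makes each additional recipient cost the sender additional work.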
Two such systems are:
A proxy which openly accepts TCP/IP connections from anywhere, and forwards them anywhere.
These are typically exploited by spammers and viruses, which use them to conceal their own IP address, and/or to more effectively distribute transmission loads across several hosts and networks.
See also: Zombie Host
A Relay which openly accepts mail from anywhere, and forwards it to anywhere.
In the 1980s, virtually every public SMTP server was an Open Relay. Messages would often travel between multiple third-party machines before they reached the intended recipient. Now, legitimate mail is almost exclusively sent directly from an outgoing Mail Transport Agent on the sender's end to the incoming Mail Exchanger(s) for the recipient's domain.
Conversely, Open Relay servers that still exist on the internet are almost exclusively exploited by spammers to hide their own identity, and to perform some load balancing on the task of sending out millions of messages, presumably before DNS blocklists have a chance to get all of these machines listed.
See also the discussion on Open Relay Prevention.
A machine that acts on behalf of someone else. It may forward e.g. HTTP requests or TCP/IP connections, usually to or from the internet. For instance, companies - or sometimes entire countries - often use “Web Proxy Servers” to filter outgoing HTTP requests from their internal network. This may or may not be transparent to the end user.
See also: Open Proxy, Relay.
Mass-mailing viruses and e-mail software used by spammers, specifically designed to deliver large amounts of mail in a very short time.
Most ratware implementations incorporate only as much SMTP client code as strictly necessary to deliver mail in the best-case scenario. They provide false or inaccurate information in the SMTP dialogue with the receiving host. They do not wait for responses from the receiver before issuing commands, and disconnect if no response has been received in a few seconds. They do not follow normal retry mechanisms in case of temporary failures.
A machine that forwards e-mail, usually to or from the internet. One example of a relay is the “smarthost” that an ISP provides to its customers for sending outgoing mail.
See also: Open Relay, Proxy
(abbrev: RFC) From http://www.rfc-editor.org/: “ The Request for Comments (RFC) document series is a set of technical and organizational notes about the internet [...]. Memos in the RFC series discuss many aspects of computer networking, including protocols, procedures, programs, and concepts, as well as meeting notes, opinions, and sometimes humor. ”
These documents make up the “rules” of internet conduct, including descriptions of protocols and data formats. Of particular interest for mail deliveries are:
An e-mail address that is seeded to address-harvesting robots via public locations, then used to feed collaborative tools such as DNS Blocklists and Junk Mail Signature Repositories.
Mails sent to these addresses are normally spam or malware. However, some of it will be collateral spam - i.e. Delivery Status Notifications to faked sender addresses. Thus, unless the spam trap has safeguards in place to disregard such messages, the resulting tool may not be completely reliable.
A machine with an internet connection that is infected by a mass-mailing virus or worm. Such machines invariably run a flavor of the Microsoft® Windows® operating system, and are almost always in “residential” IP address blocks. Their owners either do not know or do not care that the machines are infected, and often, their ISP will not take any actions to shut them down.
Fortunately, there are various DNS blocklists, such as “dul.dnsbl.sorbs.net”, that incorporate such "residential" address blocks. You should be able to use these blocklists to reject incoming mail. Legitimate mail from residential users should normally go through their ISP's “smarthost”.
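As a sketch of how such a blocklist is commonly consulted: the octets of the connecting host's IPv4 address are reversed and prepended to the blocklist zone, and the resulting name is looked up in DNS. An answer conventionally means that the address is listed, while a failed lookup means that it is not. The function names below are hypothetical, and 192.0.2.1 is a documentation-only address.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone="dul.dnsbl.sorbs.net"):
    """Build the DNS name to look up for an IPv4 address in a blocklist zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone="dul.dnsbl.sorbs.net"):
    """Return True if the blocklist answers for this address (needs network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


print(dnsbl_query_name("192.0.2.1"))
```

In practice the MTA performs this lookup itself when configured with a list of blocklist zones; the sketch only shows what happens behind that configuration.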