Spam filtering involves analyzing many pieces of information. The email itself is of course the richest source. The SMTP command parameters and DNS also contribute their share. Today we look at just one piece of the puzzle: the language used to author an email, or more precisely its character set.
Before moving further, I will cover some basics. In simple terms, a character set is a collection of characters that lets us express ourselves in one or more languages. A single character set can often cover a number of languages with small variations across them. For example, Windows-1252 caters for English and various Western European languages.
More about character sets on Wikipedia.
Character Sets in SMTP Emails
SMTP, as defined in RFC 2821, only allows the use of 7-bit ASCII characters. This is a very small set that cannot go much beyond the English language. It was therefore necessary to give SMTP emails some way to convey text in other languages. The MIME standard provided a solution by defining methods for encoding non-ASCII text.
The basic idea is that of encoding character sequences from other sets using exclusively the 7-bit ASCII repertoire. MIME provides two solutions, one for email bodies and the other for headers such as the Subject, From and To.
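As a rough illustration of the header side, here is a minimal Python sketch using the standard library's email.header module. The sample subject text and the choice of koi8-r are my own assumptions, picked only for the example.

```python
from email.header import Header, decode_header

# Sample subject in Russian (illustrative text, not from a real email).
subject = "Привет"

# Encode the subject as a MIME encoded-word using the koi8-r charset.
encoded = Header(subject, "koi8-r").encode()

# The result contains only 7-bit ASCII characters, so it is safe for SMTP.
print(encoded)
print(encoded.isascii())

# The receiving end reverses the process: decode_header returns the raw
# bytes together with the charset name needed to turn them back into text.
raw_bytes, charset = decode_header(encoded)[0]
decoded = raw_bytes.decode(charset)
print(charset, decoded)
```

The encoded form carries the charset name inside the header itself, which is what makes it usable as a filtering signal.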
MIME breaks emails into parts, packaging together blocks of content and headers. An email body is contained within a MIME part whose headers identify the character set and the encoding type. In this manner an email exposes the character set used when authoring the content and specifies the encoding used to package it in 7-bit ASCII. This is enough for the receiving end to retrieve the body as originally intended.
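To make this concrete, the following Python sketch builds a small single-part message by hand and then recovers the body the way a receiver could. The addresses, the body text, and the use of koi8-r with base64 are assumptions made up for the example.

```python
import base64
from email import message_from_bytes

# Illustrative body text, authored in the koi8-r character set.
body_text = "Привет, мир"

# A hand-built message: the part headers name the charset and the
# transfer encoding, and the body itself is plain 7-bit ASCII (base64).
raw = (
    b"From: sender@example.com\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="koi8-r"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    + base64.b64encode(body_text.encode("koi8-r"))
    + b"\r\n"
)

msg = message_from_bytes(raw)

# Step 1: read the character set declared in the Content-Type header.
charset = msg.get_content_charset()

# Step 2: undo the transfer encoding (base64 here) to get the raw bytes.
payload = msg.get_payload(decode=True)

# Step 3: interpret the bytes using the declared character set.
text = payload.decode(charset)
print(charset, text)
```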
Here is a snippet showing a body in the KOI8-R character set, widely used for Russian and other languages written in the Cyrillic script.
Adding koi8-r to your Microsoft Exchange Content Filter, for example, will flag or rate all emails using this charset as junk / spam.
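The general idea behind such a rule can be sketched in a few lines of Python. This is only an illustration of charset-based flagging, not how Exchange implements it; the names BLOCKED_CHARSETS, charsets_used and is_blocked are hypothetical, and the sample message is made up.

```python
from email import message_from_string
from email.header import decode_header

# Hypothetical block list; koi8-r is the charset mentioned above.
BLOCKED_CHARSETS = {"koi8-r"}

def charsets_used(msg):
    """Collect the charsets declared in body parts and in common headers."""
    found = set()
    # Body side: every MIME part may declare a charset in Content-Type.
    for part in msg.walk():
        cs = part.get_content_charset()
        if cs:
            found.add(cs.lower())
    # Header side: encoded-words carry their own charset name.
    for name in ("Subject", "From", "To"):
        value = msg.get(name)
        if value:
            for _, cs in decode_header(value):
                if cs:
                    found.add(cs.lower())
    return found

def is_blocked(msg):
    """Return True if the message uses any charset on the block list."""
    return bool(charsets_used(msg) & BLOCKED_CHARSETS)

# Made-up sample: ASCII body, but a koi8-r encoded Subject header.
sample = message_from_string(
    "Subject: =?koi8-r?B?8NLJ18XU?=\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "\n"
    "hello\n"
)
print(sorted(charsets_used(sample)), is_blocked(sample))
```

Checking the headers as well as the body matters here, because a message can declare a harmless body charset while still using the blocked one in its Subject line.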
To do that, simply use the Microsoft Exchange Management Shell.
The following table gives you an overview of the different code pages, including a column from which you can copy and paste the corresponding code page as an Exchange Management Shell cmdlet.
[table id=2 /]
Feel free to comment!