Microsoft Exchange 2007: using Content Filter to fight Spam / Junk using Codepages
Spam filtering involves analyzing various pieces of information. The email itself is of course the richest source, and the SMTP command parameters and DNS also contribute their share. Today we look at just one piece of the puzzle: the language used to author an email or, more precisely, its character set.
Before moving further, I will cover some basics. In simple terms, a character set is a collection of characters allowing us to express ourselves in one or more languages. A single character set is often able to cover a number of languages that differ only slightly from one another. For example, Windows-1252 caters for English and various Western European languages.
See the Wikipedia article on character sets for more background.
Character Sets in SMTP Emails
SMTP, as defined in RFC 2821, only allows the use of 7-bit ASCII characters. This is a very small set that is unable to go much beyond the English language. Thus it was necessary to enable SMTP emails to somehow convey text from other languages. The MIME standard provided a solution, defining methods for encoding non-ASCII text.
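To make the 7-bit limit concrete, here is a minimal Python sketch (Python is only my choice for illustration; it has nothing to do with how Exchange works internally). It checks whether a string fits into ASCII and shows that the same Russian text, once encoded as KOI8-R, consists almost entirely of bytes with the high bit set, which plain 7-bit SMTP cannot carry as-is. The sample strings are assumptions chosen for the example.

```python
# Sample strings (assumed for illustration only).
english = "Search for orders"
russian = "Поиск заказов"

# str.isascii() is True only if every character is in the 7-bit ASCII range.
english_is_ascii = english.isascii()
russian_is_ascii = russian.isascii()

# Encode the Russian text with the KOI8-R character set.
koi8_bytes = russian.encode("koi8-r")

# Bytes >= 0x80 have the high (8th) bit set and fall outside 7-bit ASCII.
high_bit_bytes = [b for b in koi8_bytes if b >= 0x80]

print(english_is_ascii, russian_is_ascii)
print(len(koi8_bytes), len(high_bit_bytes))
```

Only the space between the two words stays within the ASCII range; every Cyrillic letter needs the eighth bit.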
The basic idea is that of encoding character sequences from other sets using exclusively the 7-bit ASCII repertoire. MIME provides two solutions, one for email bodies and the other for headers such as the Subject, From and To.
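For headers, the encoding takes the form of so-called encoded words (RFC 2047), which name the character set right inside the header value. The following sketch uses Python's standard `email.header` module to take such a value apart; the sample value is the Subject header from the message quoted in the comments below. Treat it as an illustration of the format, not as a description of what the Exchange Content Filter does internally.

```python
from email.header import decode_header

# Subject header as it travels over SMTP: pure 7-bit ASCII.
# Layout: =?<charset>?<B for Base64 or Q for quoted-printable>?<data>?=
raw_subject = "=?koi8-r?B?8M/J08sg2sHLwdrP1w==?="

# decode_header returns a list of (payload, charset) pairs.
parts = decode_header(raw_subject)
payload, charset = parts[0]

# The payload is raw bytes in the named character set; decode it to text.
subject = payload.decode(charset)

print(charset)
print(subject)
```

The character set label (`koi8-r`) is visible in plain text in the raw header, which is what makes it usable as a filter phrase.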
MIME breaks emails into parts, packaging together blocks of content and headers. An email body is contained within a MIME part whose headers identify the character set and encoding type. In this manner an email exposes the character set used on authoring the content and specifies the encoding used to package it in 7-bit ASCII. This is enough for the receiving end to retrieve the body as originally intended.
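As a rough sketch of what the receiving end does with a body part, the Python example below builds a tiny single-part message and then reads back the character set, the transfer encoding and the original text. The message is constructed for the example (the greeting text and the choice of Base64 are assumptions); real messages carry many more headers.

```python
import base64
from email import message_from_bytes

# Text as the author wrote it (sample text, assumed for illustration).
original = "Здравствуйте!"

# Sender side: encode in KOI8-R, then wrap in Base64 to stay within 7-bit ASCII.
encoded_body = base64.b64encode(original.encode("koi8-r"))

raw_message = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="koi8-r"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n" + encoded_body + b"\r\n"
)

# Receiver side: the part headers say how to unpack the body.
msg = message_from_bytes(raw_message)
charset = msg.get_content_charset()               # character set of the text
transfer_encoding = msg["Content-Transfer-Encoding"]
body = msg.get_payload(decode=True).decode(charset)

print(charset, transfer_encoding)
print(body)
```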
Here is a snippet showing a body in the KOI8-R character set, widely used for Russian text written in the Cyrillic script.
[Image: SPAM_Junk_Mail_KOI8-R]
Adding koi8-r to your Microsoft Exchange Content Filter, for example, will flag or rate all emails using this charset as junk / spam.
To do that, simply use the Microsoft Exchange Management Shell.
[Illustration: Add_ContentFilterPhrase]
The following table gives an overview of the different code pages, including a column with the corresponding Exchange Management Shell cmdlet, ready to copy and paste.
| Character Set Label | Win32 Code page | Character Set Name | Exchange Management Shell cmdlet |
|---|---|---|---|
| ansi_x3.4-1968 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "ansi_x3.4-1968" |
| ansi_x3.4-1986 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "ansi_x3.4-1986" |
| ascii | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "ascii" |
| big5 | 950 | Traditional Chinese (BIG5) | Add-ContentFilterPhrase -Influence BadWord -Phrase "big5" |
| chinese | 936 | Chinese Simplified | Add-ContentFilterPhrase -Influence BadWord -Phrase "chinese" |
| cp367 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "cp367" |
| cp819 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "cp819" |
| csascii | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "csascii" |
| csbig5 | 950 | Traditional Chinese (BIG5) | Add-ContentFilterPhrase -Influence BadWord -Phrase "csbig5" |
| cseuckr | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "cseuckr" |
| cseucpkdfmtjapanese | CODE_JPN_EUC | Japanese (EUC) | Add-ContentFilterPhrase -Influence BadWord -Phrase "cseucpkdfmtjapanese" |
| csgb2312 | 936 | Chinese Simplified (GB2312) | Add-ContentFilterPhrase -Influence BadWord -Phrase "csgb2312" |
| csiso2022jp | CODE_JPN_JIS | Japanese (JIS-Allow 1 byte Kana) | Add-ContentFilterPhrase -Influence BadWord -Phrase "csiso2022jp" |
| csiso2022kr | 50225 | Korean (ISO) | Add-ContentFilterPhrase -Influence BadWord -Phrase "csiso2022kr" |
| csiso58gb231280 | 936 | Chinese Simplified (GB2312) | Add-ContentFilterPhrase -Influence BadWord -Phrase "csiso58gb231280" |
| csisolatin2 | 28592 | Central European (ISO) | Add-ContentFilterPhrase -Influence BadWord -Phrase "csisolatin2" |
| csisolatinhebrew | 1255 | Hebrew (ISO-Visual) | Add-ContentFilterPhrase -Influence BadWord -Phrase "csisolatinhebrew" |
| cskoi8r | 20866 | Cyrillic (KOI8-R) | Add-ContentFilterPhrase -Influence BadWord -Phrase "cskoi8r" |
| csksc56011987 | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "csksc56011987" |
| csshiftjis | 932 | Shift-JIS | Add-ContentFilterPhrase -Influence BadWord -Phrase "csshiftjis" |
| euc-kr | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "euc-kr" |
| extended_unix_code_packed_format_for_japanese | CODE_JPN_EUC | Japanese (EUC) | Add-ContentFilterPhrase -Influence BadWord -Phrase "extended_unix_code_packed_format_for_japanese" |
| gb2312 | 936 | Chinese Simplified (GB2312) | Add-ContentFilterPhrase -Influence BadWord -Phrase "gb2312" |
| gb_2312-80 | 936 | Chinese Simplified (GB2312) | Add-ContentFilterPhrase -Influence BadWord -Phrase "gb_2312-80" |
| hebrew | 1255 | Hebrew | Add-ContentFilterPhrase -Influence BadWord -Phrase "hebrew" |
| hz-gb-2312 | 936 | Chinese Simplified (HZ) | Add-ContentFilterPhrase -Influence BadWord -Phrase "hz-gb-2312" |
| ibm367 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "ibm367" |
| ibm819 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "ibm819" |
| ibm852 | 852 | Central European (DOS) | Add-ContentFilterPhrase -Influence BadWord -Phrase "ibm852" |
| ibm866 | 866 | Cyrillic (DOS) | Add-ContentFilterPhrase -Influence BadWord -Phrase "ibm866" |
| iso-2022-jp | CODE_JPN_JIS | Japanese (JIS) | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso-2022-jp" |
| iso-2022-kr | 50225 | Korean (ISO) | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso-2022-kr" |
| iso-8859-1 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso-8859-1" |
| iso-8859-2 | 28592 | Central European (ISO) | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso-8859-2" |
| iso-8859-8 | 1255 | Hebrew (ISO-Visual) | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso-8859-8" |
| iso-ir-100 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso-ir-100" |
| iso-ir-101 | 28592 | Central European (ISO) | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso-ir-101" |
| iso-ir-138 | 1255 | Hebrew (ISO-Visual) | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso-ir-138" |
| iso-ir-149 | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso-ir-149" |
| iso-ir-58 | 936 | Chinese Simplified (GB2312) | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso-ir-58" |
| iso-ir-6 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso-ir-6" |
| iso646-us | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso646-us" |
| iso8859-1 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso8859-1" |
| iso8859-2 | 28592 | Central European (ISO) | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso8859-2" |
| iso_646.irv:1991 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso_646.irv:1991" |
| iso_8859-1 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso_8859-1" |
| iso_8859-1:1987 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso_8859-1:1987" |
| iso_8859-2 | 28592 | Central European (ISO) | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso_8859-2" |
| iso_8859-2:1987 | 28592 | Central European (ISO) | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso_8859-2:1987" |
| iso_8859-8 | 1255 | Hebrew (ISO-Visual) | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso_8859-8" |
| iso_8859-8:1988 | 1255 | Hebrew (ISO-Visual) | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso_8859-8:1988" |
| koi8-r | 20866 | Cyrillic (KOI8-R) | Add-ContentFilterPhrase -Influence BadWord -Phrase "koi8-r" |
| korean | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "korean" |
| ks-c-5601 | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "ks-c-5601" |
| ks-c-5601-1987 | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "ks-c-5601-1987" |
| ks_c_5601 | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "ks_c_5601" |
| ks_c_5601-1987 | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "ks_c_5601-1987" |
| ks_c_5601-1989 | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "ks_c_5601-1989" |
| ksc-5601 | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "ksc-5601" |
| ksc5601 | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "ksc5601" |
| ksc_5601 | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "ksc_5601" |
| l2 | 28592 | Central European (ISO) | Add-ContentFilterPhrase -Influence BadWord -Phrase "l2" |
| latin1 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "latin1" |
| latin2 | 28592 | Central European (ISO) | Add-ContentFilterPhrase -Influence BadWord -Phrase "latin2" |
| ms_kanji | 932 | Shift-JIS | Add-ContentFilterPhrase -Influence BadWord -Phrase "ms_kanji" |
| shift-jis | 932 | Shift-JIS | Add-ContentFilterPhrase -Influence BadWord -Phrase "shift-jis" |
| shift_jis | 932 | Shift-JIS | Add-ContentFilterPhrase -Influence BadWord -Phrase "shift_jis" |
| us | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "us" |
| us-ascii | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "us-ascii" |
| windows-1250 | 1250 | Central European (Windows) | Add-ContentFilterPhrase -Influence BadWord -Phrase "windows-1250" |
| windows-1251 | 1251 | Cyrillic (Windows) | Add-ContentFilterPhrase -Influence BadWord -Phrase "windows-1251" |
| windows-1252 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "windows-1252" |
| windows-1253 | 1253 | Greek (Windows) | Add-ContentFilterPhrase -Influence BadWord -Phrase "windows-1253" |
| windows-1254 | 1254 | Turkish (Windows) | Add-ContentFilterPhrase -Influence BadWord -Phrase "windows-1254" |
| windows-1255 | 1255 | Hebrew | Add-ContentFilterPhrase -Influence BadWord -Phrase "windows-1255" |
| windows-1256 | 1256 | Arabic | Add-ContentFilterPhrase -Influence BadWord -Phrase "windows-1256" |
| windows-1257 | 1257 | Baltic (Windows) | Add-ContentFilterPhrase -Influence BadWord -Phrase "windows-1257" |
| windows-1258 | 1258 | Vietnamese | Add-ContentFilterPhrase -Influence BadWord -Phrase "windows-1258" |
| windows-874 | 874 | Thai | Add-ContentFilterPhrase -Influence BadWord -Phrase "windows-874" |
| x-cp1250 | 1250 | Central European (Windows) | Add-ContentFilterPhrase -Influence BadWord -Phrase "x-cp1250" |
| x-cp1251 | 1251 | Cyrillic (Windows) | Add-ContentFilterPhrase -Influence BadWord -Phrase "x-cp1251" |
| x-euc | CODE_JPN_EUC | Japanese (EUC) | Add-ContentFilterPhrase -Influence BadWord -Phrase "x-euc" |
| x-euc-jp | CODE_JPN_EUC | Japanese (EUC) | Add-ContentFilterPhrase -Influence BadWord -Phrase "x-euc-jp" |
| x-sjis | 932 | Shift-JIS | Add-ContentFilterPhrase -Influence BadWord -Phrase "x-sjis" |
| x-x-big5 | 950 | Traditional Chinese (BIG5) | Add-ContentFilterPhrase -Influence BadWord -Phrase "x-x-big5" |
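Pasting every row would also flag labels such as `us-ascii` and `iso-8859-1`, so you will presumably want only the rows for code pages you never expect to receive. The sketch below shows one way to generate just those cmdlets; it only produces text to paste into the Exchange Management Shell and does not talk to Exchange. The dictionary is a small excerpt of the table above, and the helper name `build_cmdlets` is made up for this example.

```python
# Excerpt of the table above: character set label -> Win32 code page.
CHARSET_CODE_PAGES = {
    "cskoi8r": "20866",
    "koi8-r": "20866",
    "windows-1251": "1251",
    "x-cp1251": "1251",
    "ibm866": "866",
    "us-ascii": "1252",
    "iso-8859-1": "1252",
}


def build_cmdlets(unwanted_code_pages):
    """Return one Add-ContentFilterPhrase line per label on an unwanted code page."""
    return [
        f'Add-ContentFilterPhrase -Influence BadWord -Phrase "{label}"'
        for label, code_page in sorted(CHARSET_CODE_PAGES.items())
        if code_page in unwanted_code_pages
    ]


# Example: block KOI8-R and Windows Cyrillic, leave Western mail alone.
cmdlets = build_cmdlets({"20866", "1251"})
for line in cmdlets:
    print(line)
```

Review the generated lines before running them; any legitimate correspondent who writes in one of the selected character sets would be affected as well.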
Feel free to comment!
We are using Exchange 2007 with a Forefront server and are getting lots of spam mail in KOI8. I have given the following example; please help me out.
Subject : Поиск заказов
Message Body : Здравствуйте!
Меня зовут Влад, я предлагаю сотрудничество.
У нашей компании есть возможность поиска заказчиков для Вашего бизнеса.
Если Вам это может быть интересно, пожалуйста свяжитесь со мной, предложу различные схемы:
Телефон: ( Ч 9 5)5 8960 3 4
ICQ: 6 9 9099
Internet header :
Received: from dsldevice.lan (59.101.1.137) by copuex01.coreobjects.com
(192.168.11.18) with Microsoft SMTP Server id 8.1.240.5; Fri, 14 Aug 2009
16:04:40 +0530
Received: from 59.101.1.137 by mxs.mail.ru; Fri, 14 Aug 2009 20:34:25 +1000
From: Amalia Jaramillo
To:
Subject: =?koi8-r?B?8M/J08sg2sHLwdrP1w==?=
Date: Fri, 14 Aug 2009 20:34:25 +1000
Message-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="koi8-r"
Content-Transfer-Encoding: 8bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0)
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.3790.2663
Importance: Normal
Return-Path: intrusivesx76@list.ru
X-MS-Exchange-Organization-PRD: list.ru
X-MS-Exchange-Organization-SenderIdResult: SoftFail
Received-SPF: SoftFail (copuex01.coreobjects.com: domain of transitioning
intrusivesx76@list.ru discourages use of 59.101.1.137 as permitted sender)
X-MS-Exchange-Organization-SCL: 4
X-MS-Exchange-Organization-PCL: 2
X-MS-Exchange-Organization-Antispam-Report: DV:3.3.5705.600;SID:SenderIDStatus SoftFail;OrigIP:59.101.1.137
X-Auto-Response-Suppress: DR, OOF, AutoReply
Did you try adding the phrase charset="koi8-r" to the content filter?
Using the Exchange Management Shell as an administrator, you can add the phrase koi8-r to your Exchange Content Filter as a bad phrase with this cmdlet: Add-ContentFilterPhrase -Influence BadWord -Phrase "koi8-r"
Or use the Exchange Management Console: navigate to Edge Transport, double-click the Content Filtering feature, switch to Custom Words and add koi8-r to the bad word list to block all mail that uses the koi8-r code page / charset.
Hope that helps; otherwise let me know.
Christian