Microsoft Exchange 2007: using Content Filter to fight Spam / Junk using Codepages


Spam filtering involves analyzing many pieces of information. The message itself is, of course, the main source, while the SMTP command parameters and DNS lookups also contribute their share. Today we look at just one piece of the puzzle: the language an email is written in or, more precisely, its character set.

Before moving further, I will cover some basics. In simple terms, a character set is a collection of characters that lets us express ourselves in one or more languages. A single character set can often cover several languages that differ only slightly from one another. For example, Windows-1252 caters for English and various Western European languages.
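To see what "caters for" means in practice, here is a small Python sketch that uses only the standard codecs. Windows-1252 (Python codec name `cp1252`) can encode accented Western European text, but it has no slots for Cyrillic letters, which need a charset such as KOI8-R. The helper name `can_encode` is just an illustration.

```python
def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of `text` exists in `charset`."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

samples = {
    "English": "Hello",
    "French": "Déjà vu",
    "German": "Grüße",
    "Russian": "Здравствуйте",
}

# Compare a Western charset with a Cyrillic one for each sample text.
for language, text in samples.items():
    print(f"{language:8} cp1252={can_encode(text, 'cp1252')} "
          f"koi8-r={can_encode(text, 'koi8-r')}")
```

Both charsets include plain ASCII, so only the non-English samples tell them apart.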

More about character sets on Wikipedia.

Character Sets in SMTP Emails

SMTP, as defined in RFC 2821, only allows the use of 7-bit ASCII characters. This is a very small set that cannot go much beyond the English language, so emails needed some way to convey text in other languages. The MIME standard provides the solution by defining methods for encoding non-ASCII text.
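A quick way to see the 7-bit limit: text fits into plain SMTP only if every character has a code point below 128. Here is a minimal Python check. The function name `is_7bit_ascii` is our own, and the KOI8-R line shows that every Cyrillic byte needs the 8th bit.

```python
def is_7bit_ascii(text: str) -> bool:
    """True if every character fits in 7-bit US-ASCII (code points 0-127)."""
    return all(ord(ch) < 128 for ch in text)

print(is_7bit_ascii("Meeting at 10:00"))   # True
print(is_7bit_ascii("Déjà vu"))            # False
print(is_7bit_ascii("Поиск заказов"))      # False

# In KOI8-R, Cyrillic letters live in the upper half (0x80-0xFF) of the byte range.
raw = "Поиск".encode("koi8-r")
print(all(b >= 0x80 for b in raw))         # True
```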

The basic idea is to encode character sequences from other sets using only the 7-bit ASCII repertoire. MIME provides two mechanisms: one for email bodies (a Content-Transfer-Encoding such as quoted-printable or base64) and another for headers such as Subject, From and To (the "encoded words" of RFC 2047).
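An encoded word has the form =?charset?encoding?text?=, where the encoding is B (base64) or Q (a quoted-printable variant). The Subject line quoted in the first comment below, =?koi8-r?B?8M/J08sg2sHLwdrP1w==?=, is one such word.

The following sketch uses Python's standard `email.header` module to decode that Subject. It exposes the declared charset, which is the part a charset-based filter would key on.

```python
from email.header import decode_header, make_header

raw_subject = "=?koi8-r?B?8M/J08sg2sHLwdrP1w==?="

# decode_header() splits the value into (bytes, charset) chunks
# without converting the bytes to text yet.
for payload, charset in decode_header(raw_subject):
    print(charset, payload)

# make_header() reassembles the chunks into readable text.
print(str(make_header(decode_header(raw_subject))))  # Поиск заказов
```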

MIME breaks emails into parts, each packaging a block of content together with its own headers. An email body is contained within a MIME part whose headers identify the character set and the encoding type. In this way an email exposes the character set used to author the content and specifies the encoding used to package it in 7-bit ASCII, which is enough for the receiving end to recover the body as originally intended.
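The body side can be sketched the same way with Python's `email.mime.text.MIMEText`, which builds a single text part. Given the koi8-r charset, the standard library picks base64 as the transfer encoding, which is its built-in default for that charset. The part headers then declare both the charset and the transfer encoding.

```python
from email.mime.text import MIMEText

body = "Здравствуйте!"
part = MIMEText(body, "plain", "koi8-r")

# The part headers expose the authoring charset and the 7-bit packaging.
print(part["Content-Type"])               # text/plain; charset="koi8-r"
print(part["Content-Transfer-Encoding"])  # base64

# The receiving end reverses the transfer encoding, then the charset.
raw_bytes = part.get_payload(decode=True)
print(raw_bytes.decode(part.get_content_charset()))  # Здравствуйте!
```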


Here is a snippet showing a body in the KOI8-R character set, widely used for Russian (Cyrillic script).

[Screenshot: SPAM_Junk_Mail_KOI8-R]

Adding koi8-r to your Microsoft Exchange Content Filter, for example, will flag or rate all emails using this charset as junk / spam.

To do that, simply use the Microsoft Exchange Management Shell, as in the illustration below:

[Screenshot: Add_ContentFilterPhrase]
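The rule above keys on the declared charset rather than on actual words. To show where that label lives in a message, here is a hypothetical Python helper. The names `declared_charsets`, `suspicious_charsets` and `BLOCKED` are our own and are not part of Exchange.

The helper collects every charset a message declares, both in its MIME part headers and in encoded-word headers such as Subject. It then reports the ones on a block list. It does not model how the Exchange Content Filter agent itself scans messages.

```python
from email import message_from_string
from email.header import decode_header

# Hypothetical block list; pick labels from the code page table.
BLOCKED = {"koi8-r", "cskoi8r"}

def declared_charsets(msg) -> set:
    """Collect charsets from MIME part headers and encoded-word headers."""
    found = {cs.lower() for cs in msg.get_charsets() if cs}
    for name in ("Subject", "From", "To"):
        value = msg.get(name)
        if value:
            for _, cs in decode_header(value):
                if cs:
                    found.add(cs.lower())
    return found

def suspicious_charsets(raw: str) -> set:
    """Return the declared charsets that appear on the block list."""
    return declared_charsets(message_from_string(raw)) & BLOCKED

sample = (
    "Subject: =?koi8-r?B?8M/J08sg2sHLwdrP1w==?=\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="koi8-r"\n'
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "...\n"
)
print(suspicious_charsets(sample))  # {'koi8-r'}
```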

The following table gives an overview of the different code pages, including a column with the corresponding Exchange Management Shell cmdlet, ready to copy and paste.

Character Set Label | Win32 Code Page | Character Set Name | Exchange Management Shell cmdlet
ansi_x3.4-1968 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "ansi_x3.4-1968"
ansi_x3.4-1986 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "ansi_x3.4-1986"
ascii | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "ascii"
big5 | 950 | Traditional Chinese (BIG5) | Add-ContentFilterPhrase -Influence BadWord -Phrase "big5"
chinese | 936 | Chinese Simplified | Add-ContentFilterPhrase -Influence BadWord -Phrase "chinese"
cp367 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "cp367"
cp819 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "cp819"
csascii | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "csascii"
csbig5 | 950 | Traditional Chinese (BIG5) | Add-ContentFilterPhrase -Influence BadWord -Phrase "csbig5"
cseuckr | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "cseuckr"
cseucpkdfmtjapanese | CODE_JPN_EUC | Japanese (EUC) | Add-ContentFilterPhrase -Influence BadWord -Phrase "cseucpkdfmtjapanese"
csgb2312 | 936 | Chinese Simplified (GB2312) | Add-ContentFilterPhrase -Influence BadWord -Phrase "csgb2312"
csiso2022jp | CODE_JPN_JIS | Japanese (JIS-Allow 1 byte Kana) | Add-ContentFilterPhrase -Influence BadWord -Phrase "csiso2022jp"
csiso2022kr | 50225 | Korean (ISO) | Add-ContentFilterPhrase -Influence BadWord -Phrase "csiso2022kr"
csiso58gb231280 | 936 | Chinese Simplified (GB2312) | Add-ContentFilterPhrase -Influence BadWord -Phrase "csiso58gb231280"
csisolatin2 | 28592 | Central European (ISO) | Add-ContentFilterPhrase -Influence BadWord -Phrase "csisolatin2"
csisolatinhebrew | 1255 | Hebrew (ISO-Visual) | Add-ContentFilterPhrase -Influence BadWord -Phrase "csisolatinhebrew"
cskoi8r | 20866 | Cyrillic (KOI8-R) | Add-ContentFilterPhrase -Influence BadWord -Phrase "cskoi8r"
csksc56011987 | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "csksc56011987"
csshiftjis | 932 | Shift-JIS | Add-ContentFilterPhrase -Influence BadWord -Phrase "csshiftjis"
euc-kr | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "euc-kr"
extended_unix_code_packed_format_for_japanese | CODE_JPN_EUC | Japanese (EUC) | Add-ContentFilterPhrase -Influence BadWord -Phrase "extended_unix_code_packed_format_for_japanese"
gb2312 | 936 | Chinese Simplified (GB2312) | Add-ContentFilterPhrase -Influence BadWord -Phrase "gb2312"
gb_2312-80 | 936 | Chinese Simplified (GB2312) | Add-ContentFilterPhrase -Influence BadWord -Phrase "gb_2312-80"
hebrew | 1255 | Hebrew | Add-ContentFilterPhrase -Influence BadWord -Phrase "hebrew"
hz-gb-2312 | 936 | Chinese Simplified (HZ) | Add-ContentFilterPhrase -Influence BadWord -Phrase "hz-gb-2312"
ibm367 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "ibm367"
ibm819 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "ibm819"
ibm852 | 852 | Central European (DOS) | Add-ContentFilterPhrase -Influence BadWord -Phrase "ibm852"
ibm866 | 866 | Cyrillic (DOS) | Add-ContentFilterPhrase -Influence BadWord -Phrase "ibm866"
iso-2022-jp | CODE_JPN_JIS | Japanese (JIS) | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso-2022-jp"
iso-2022-kr | 50225 | Korean (ISO) | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso-2022-kr"
iso-8859-1 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso-8859-1"
iso-8859-2 | 28592 | Central European (ISO) | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso-8859-2"
iso-8859-8 | 1255 | Hebrew (ISO-Visual) | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso-8859-8"
iso-ir-100 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso-ir-100"
iso-ir-101 | 28592 | Central European (ISO) | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso-ir-101"
iso-ir-138 | 1255 | Hebrew (ISO-Visual) | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso-ir-138"
iso-ir-149 | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso-ir-149"
iso-ir-58 | 936 | Chinese Simplified (GB2312) | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso-ir-58"
iso-ir-6 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso-ir-6"
iso646-us | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso646-us"
iso8859-1 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso8859-1"
iso8859-2 | 28592 | Central European (ISO) | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso8859-2"
iso_646.irv:1991 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso_646.irv:1991"
iso_8859-1 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso_8859-1"
iso_8859-1:1987 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso_8859-1:1987"
iso_8859-2 | 28592 | Central European (ISO) | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso_8859-2"
iso_8859-2:1987 | 28592 | Central European (ISO) | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso_8859-2:1987"
iso_8859-8 | 1255 | Hebrew (ISO-Visual) | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso_8859-8"
iso_8859-8:1988 | 1255 | Hebrew (ISO-Visual) | Add-ContentFilterPhrase -Influence BadWord -Phrase "iso_8859-8:1988"
koi8-r | 20866 | Cyrillic (KOI8-R) | Add-ContentFilterPhrase -Influence BadWord -Phrase "koi8-r"
korean | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "korean"
ks-c-5601 | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "ks-c-5601"
ks-c-5601-1987 | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "ks-c-5601-1987"
ks_c_5601 | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "ks_c_5601"
ks_c_5601-1987 | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "ks_c_5601-1987"
ks_c_5601-1989 | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "ks_c_5601-1989"
ksc-5601 | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "ksc-5601"
ksc5601 | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "ksc5601"
ksc_5601 | 949 | Korean | Add-ContentFilterPhrase -Influence BadWord -Phrase "ksc_5601"
l2 | 28592 | Central European (ISO) | Add-ContentFilterPhrase -Influence BadWord -Phrase "l2"
latin1 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "latin1"
latin2 | 28592 | Central European (ISO) | Add-ContentFilterPhrase -Influence BadWord -Phrase "latin2"
ms_kanji | 932 | Shift-JIS | Add-ContentFilterPhrase -Influence BadWord -Phrase "ms_kanji"
shift-jis | 932 | Shift-JIS | Add-ContentFilterPhrase -Influence BadWord -Phrase "shift-jis"
shift_jis | 932 | Shift-JIS | Add-ContentFilterPhrase -Influence BadWord -Phrase "shift_jis"
us | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "us"
us-ascii | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "us-ascii"
windows-1250 | 1250 | Central European (Windows) | Add-ContentFilterPhrase -Influence BadWord -Phrase "windows-1250"
windows-1251 | 1251 | Cyrillic (Windows) | Add-ContentFilterPhrase -Influence BadWord -Phrase "windows-1251"
windows-1252 | 1252 | Western | Add-ContentFilterPhrase -Influence BadWord -Phrase "windows-1252"
windows-1253 | 1253 | Greek (Windows) | Add-ContentFilterPhrase -Influence BadWord -Phrase "windows-1253"
windows-1254 | 1254 | Turkish (Windows) | Add-ContentFilterPhrase -Influence BadWord -Phrase "windows-1254"
windows-1255 | 1255 | Hebrew | Add-ContentFilterPhrase -Influence BadWord -Phrase "windows-1255"
windows-1256 | 1256 | Arabic | Add-ContentFilterPhrase -Influence BadWord -Phrase "windows-1256"
windows-1257 | 1257 | Baltic (Windows) | Add-ContentFilterPhrase -Influence BadWord -Phrase "windows-1257"
windows-1258 | 1258 | Vietnamese | Add-ContentFilterPhrase -Influence BadWord -Phrase "windows-1258"
windows-874 | 874 | Thai | Add-ContentFilterPhrase -Influence BadWord -Phrase "windows-874"
x-cp1250 | 1250 | Central European (Windows) | Add-ContentFilterPhrase -Influence BadWord -Phrase "x-cp1250"
x-cp1251 | 1251 | Cyrillic (Windows) | Add-ContentFilterPhrase -Influence BadWord -Phrase "x-cp1251"
x-euc | CODE_JPN_EUC | Japanese (EUC) | Add-ContentFilterPhrase -Influence BadWord -Phrase "x-euc"
x-euc-jp | CODE_JPN_EUC | Japanese (EUC) | Add-ContentFilterPhrase -Influence BadWord -Phrase "x-euc-jp"
x-sjis | 932 | Shift-JIS | Add-ContentFilterPhrase -Influence BadWord -Phrase "x-sjis"
x-x-big5 | 950 | Traditional Chinese (BIG5) | Add-ContentFilterPhrase -Influence BadWord -Phrase "x-x-big5"
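Blocking one language usually means blocking every label that maps to the same code page. In the table, code page 949 (Korean) alone has 13 labels. The Python sketch below groups a few rows copied from the table by Win32 code page and prints the matching cmdlets for the Cyrillic code pages.

The `ROWS` excerpt and the function names are illustrative. The output is text to paste into the Exchange Management Shell, not something the script runs. Note that the Western (1252) labels, such as us-ascii and iso-8859-1, are what most English-language mail declares.

```python
from collections import defaultdict

# Illustrative excerpt of the table above: (label, Win32 code page, name).
ROWS = [
    ("cskoi8r", "20866", "Cyrillic (KOI8-R)"),
    ("koi8-r", "20866", "Cyrillic (KOI8-R)"),
    ("ibm866", "866", "Cyrillic (DOS)"),
    ("windows-1251", "1251", "Cyrillic (Windows)"),
    ("x-cp1251", "1251", "Cyrillic (Windows)"),
    ("big5", "950", "Traditional Chinese (BIG5)"),
]

def labels_by_code_page(rows):
    """Group charset labels by their Win32 code page."""
    groups = defaultdict(list)
    for label, code_page, _name in rows:
        groups[code_page].append(label)
    return groups

def cmdlets_for(code_pages, rows=ROWS):
    """Build one Add-ContentFilterPhrase line per label of the given code pages."""
    groups = labels_by_code_page(rows)
    return [
        f'Add-ContentFilterPhrase -Influence BadWord -Phrase "{label}"'
        for cp in code_pages
        for label in groups.get(cp, [])
    ]

for line in cmdlets_for(["20866", "1251", "866"]):
    print(line)
```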

Feel free to comment!

6 thoughts on “Microsoft Exchange 2007: using Content Filter to fight Spam / Junk using Codepages”

  1. We are using Exchange 2007 with Forefront Server, and we are getting lots of spam mail in KOI8. I have given the following examples; please help me out.

    Subject: Finding orders
    Message Body: Hello!

    My name is Vlad, and I am offering cooperation.
    Our company can find customers for your business.

    If this could be of interest to you, please contact me; I will propose various schemes:
    Phone: ( Ч 9 5)5 8960 3 4
    ICQ: 6 9 9099

    Internet headers:

    Received: from dsldevice.lan (59.101.1.137) by copuex01.coreobjects.com
    (192.168.11.18) with Microsoft SMTP Server id 8.1.240.5; Fri, 14 Aug 2009
    16:04:40 +0530
    Received: from 59.101.1.137 by mxs.mail.ru; Fri, 14 Aug 2009 20:34:25 +1000
    From: Amalia Jaramillo
    To:
    Subject: =?koi8-r?B?8M/J08sg2sHLwdrP1w==?=
    Date: Fri, 14 Aug 2009 20:34:25 +1000
    Message-ID:
    MIME-Version: 1.0
    Content-Type: text/plain; charset="koi8-r"
    Content-Transfer-Encoding: 8bit
    X-Priority: 3 (Normal)
    X-MSMail-Priority: Normal
    X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0)
    X-MimeOLE: Produced By Microsoft MimeOLE V6.00.3790.2663
    Importance: Normal
    Return-Path: intrusivesx76@list.ru
    X-MS-Exchange-Organization-PRD: list.ru
    X-MS-Exchange-Organization-SenderIdResult: SoftFail
    Received-SPF: SoftFail (copuex01.coreobjects.com: domain of transitioning
    intrusivesx76@list.ru discourages use of 59.101.1.137 as permitted sender)
    X-MS-Exchange-Organization-SCL: 4
    X-MS-Exchange-Organization-PCL: 2
    X-MS-Exchange-Organization-Antispam-Report: DV:3.3.5705.600;SID:SenderIDStatus SoftFail;OrigIP:59.101.1.137
    X-Auto-Response-Suppress: DR, OOF, AutoReply

  2. Did you try adding the phrase
    charset="koi8-r"
    to the content filter?
    Using the Exchange Management Shell, as an administrator you can use this cmdlet to add the string/phrase koi8-r to your Exchange Content Filter as a bad phrase: Add-ContentFilterPhrase -Influence BadWord -Phrase "koi8-r"
    Alternatively, use the Exchange Management Console: navigate to Edge Transport, double-click the Content Filtering feature, switch to Custom Words and add koi8-r to the bad word list to block all mails with the koi8-r codepage / charset.
    Hope that helps; otherwise let me know.
    Christian

  3. This does not work. Adding koi8-r to the custom bad word list does absolutely nothing. That filter must only look in the subject and body, but not in the header.

  4. I agree. There is also a discrepancy in your procedure. You state in the text to add “koi8-r”, but in the table it says to add “cskoi8-r”. Is there additional configuration required and which of these is correct?

