public_html/docs/english/spamx.html
author Dirk Haun <dirk@haun-online.de>
Sun, 15 Nov 2009 11:10:47 +0100
branch HEAD
changeset 7474 d560b8c577b6
parent 6958 0774a19f037c
child 7690 e48c1d426d72
permissions -rw-r--r--
Added lang attribute
     1 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
     2 <html lang="en">
     3 <head>
     4   <title>Geeklog Documentation - Geeklog Spam-X Plugin</title>
     5   <link rel="stylesheet" type="text/css" href="../docstyle.css" title="Dev Stylesheet">
     6   <meta name="robots" content="noindex">
     7 </head>
     8 
     9 <body>
    10 <p><a href="http://www.geeklog.net/" style="background:transparent"><img src="../images/newlogo.gif" alt="Geeklog" width="243" height="90"></a></p>
    11 <div class="menu"><a href="index.html">Geeklog Documentation</a> - Geeklog Spam-X Plugin</div>
    12 
    13 <h1>Geeklog Spam-X Plugin</h1>
    14 
    15 <p><small>(If you came here looking for Hendrickson Software Components' email spam filter of the same name, please <a href="http://www.hendricom.com/spamcontrol.htm" rel="nofollow">click here</a>.)</small></p>
    16 
    17 <h2>Introduction</h2>
    18 
    19 <p>The Geeklog Spam-X plugin was created to fight comment spam on Geeklog
    20 sites. If you are unfamiliar with comment spam, you may want to read the
    21 <a href="http://kalsey.com/2003/11/comment_spam_manifesto/">Comment Spam
    22 Manifesto</a>.</p>
    23 
    24 <p>Spam protection in Geeklog is mostly based on the Spam-X plugin, originally
    25 developed by Tom Willett. It has a modular architecture that allows it to be
    26 extended with new modules to fight the spammers' latest tricks, should the need
    27 arise.</p>
    28 
    29 <h2><a name="checked">What is being checked for spam?</a></h2>
    30 
    31 <p>Geeklog and the Spam-X plugin will check the following for spam:</p>
    32 
    33 <ul>
    34 <li>Story submissions</li>
    35 <li>Comments</li>
    36 <li>Trackbacks and Pingbacks</li>
    37 <li>Event submissions</li>
    38 <li>Link submissions</li>
    39 <li>The text sent with the "Email story to a friend" option</li>
    40 <li>Emails sent to users via the "send email" form from their profile page</li>
    41 <li>A user's profile</li>
    42 </ul>
    43 
    44 <h2><a name="modules">Module Types</a></h2>
    45 
    46 <p>The Spam-X plugin was built to be expandable so that it can easily adapt
    47 to changes the comment spammers might make. There are three types of modules: <a
    48 href="#examine">Examine</a>, <a href="#action">Action</a>, and <a
    49 href="#admin">Admin</a>. Each module is contained in a single file that can
    50 simply be dropped into the Spam-X directory to add it to the plugin.</p>
    51 
    52 <h2><a name="examine">Examine Modules</a></h2>
    53 
    54 <p>Geeklog ships with the following examine modules:</p>
    55 
    56 <ul>
    57 <li><a href="#slv">Spam Link Verification (SLV)</a></li>
    58 <li><a href="#personal">Personal Blacklist</a></li>
    59 <li><a href="#ip">IP Filter</a></li>
    60 <li><a href="#ipofurl">IP of URL Filter</a></li>
    61 <li><a href="#header">HTTP Header Filter</a></li>
    62 <!-- <li><a href="#honeypot">Project Honeypot Filter</a></li> -->
    63 </ul>
    64 
    65 <h3><a name="slv">Spam Link Verification (SLV)</a></h3>
    66 
    67 <p>SLV is a centralized, server-based service that examines posts made on
    68 websites and detects when certain links show up in unusually high numbers. In
    69 other words, when a spammer starts spamming a lot of sites with the same URLs
    70 and those sites all report to SLV, the system will recognize this as a spam
    71 wave and will flag posts containing these URLs as spam.</p>
    72 
    73 <p>Put differently, it's a dynamic blacklist that updates itself automatically
    74 when a spammer starts promoting their site. And the more sites use it, the
    75 better it gets (in terms of accuracy and reaction speed).</p>
    76 
    77 <p>SLV is a free service run by Russ Jones at <a
    78 href="http://www.linksleeve.org/">www.linksleeve.org</a>.</p>
    79 
    80 <p><strong><a name="slvprivacy">Privacy Notice:</a></strong>
    81 It should be stressed that using SLV means that information from your site
    82 is being sent to a third party's site. In some jurisdictions you may have to
    83 inform your users about this fact - please check your local privacy
    84 laws.</p>
    85 
    86 <p>Sending information to an external site may also be undesirable on some
    87 setups, e.g. on a company intranet. You can disable SLV support by removing the
    88 four files <tt>SLV.Examine.class.php</tt>, <tt>SLVbase.class.php</tt>,
    89 <tt>SLVreport.Action.class.php</tt>, and <tt>SLVwhitelist.Admin.class.php</tt>
    90  from your Spam-X directory (<tt>/path/to/geeklog/plugins/spamx</tt>). Or you
    91 can simply disable the Spam-X plugin entirely (or uninstall it).</p>
    92 
    93 <p>The SLV Examine and Action modules will extract all URLs from a post and
    94 only send those to SLV (i.e. the rest of the post's content is not being sent).
    95 They also remove any links that contain your Geeklog site's URL. In case a post
    96 does not contain any external links, the modules simply do not contact SLV at
    97 all.</p>
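
<p>The filtering described above can be sketched as follows. This is only an
illustration in Python, not the plugin's own code: the function name
<tt>urls_to_report</tt>, the URL pattern, and the example site URL are all
assumptions made for this sketch.</p>

```python
import re

# Hypothetical pattern: http(s) URLs up to whitespace, quotes or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

def urls_to_report(post_text, site_url):
    """Return the external URLs found in a post, in order, without duplicates."""
    external = []
    for url in URL_PATTERN.findall(post_text):
        if site_url.lower() in url.lower():
            continue  # links containing our own site's URL are dropped
        if url not in external:
            external.append(url)
    return external

post = "See http://example.com/pills and http://www.mysite.example/article.php?story=1"
links = urls_to_report(post, "http://www.mysite.example")
if links:
    print("would contact SLV with:", links)
else:
    print("no external links, SLV is not contacted")
```

<p>Only the list of external URLs would leave the site in this sketch; a post
without external links results in an empty list and no request at all.</p>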
    98 
    99 
   100 <h3><a name="personal">Personal Blacklist</a></h3>
   101 
   102 <p>The Personal Blacklist module lets you add keywords and URLs that typically
   103 appear in spam posts. When you're being hit by spam, make sure to add the URLs
   104 found in those spam posts to your Personal Blacklist so that they can be
   105 filtered out automatically, should the spammer try to post them again.</p>
   106 
   107 <p>This will also help you get rid of spam that made it through, as you can
   108 then use the Mass Delete Comments and Mass Delete Trackbacks modules to easily
   109 remove large numbers of spam posts from your database.</p>
   110 
   111 <p>The Personal Blacklist also has an option to import the Geeklog <a
   112 href="config.html#desc_censorlist">censor list</a> and ban all comments which
   113 contain one of those words. This or an expanded list might be useful for a
   114 website that caters to children. Then no comments with offensive language could
   115 be posted.</p>
   116 
   117 <h3><a name="ip">IP Filter</a></h3>
   118 
   119 <p>Sometimes you will encounter spam that is coming from one or only a few IP
   120 addresses. By simply adding those IP addresses to the IP Filter module, any
   121 posts from these IPs will be blocked automatically.</p>
   122 
   123 <p>In addition to single IP addresses, you can also add IP address ranges,
   124 either in <a href="http://en.wikipedia.org/wiki/CIDR" title="Classless Inter-Domain Routing">CIDR</a> notation or as simple <i>from</i>-<i>to</i> ranges.</p>
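
<p>To illustrate the three kinds of entries, here is a minimal sketch in Python
using the standard <tt>ipaddress</tt> module. It is not the plugin's
implementation; the function name <tt>ip_is_blocked</tt> and the exact entry
syntax (a hyphen between <i>from</i> and <i>to</i>) are assumptions made for
this example.</p>

```python
import ipaddress

def ip_is_blocked(ip, entries):
    """Check an address against single IPs, CIDR blocks and from-to ranges."""
    addr = ipaddress.ip_address(ip)
    for entry in entries:
        if "/" in entry:
            # CIDR notation, e.g. 192.0.2.0/24
            if addr in ipaddress.ip_network(entry, strict=False):
                return True
        elif "-" in entry:
            # from-to range, e.g. 198.51.100.10-198.51.100.20 (assumed syntax)
            first, last = (ipaddress.ip_address(part.strip())
                           for part in entry.split("-", 1))
            if first <= addr <= last:
                return True
        elif addr == ipaddress.ip_address(entry):
            # a single address
            return True
    return False

blocklist = ["203.0.113.7", "192.0.2.0/24", "198.51.100.10-198.51.100.20"]
print(ip_is_blocked("192.0.2.200", blocklist))
```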
   125 
   126 <p>Please note that IP addresses aren't really a good filter criterion. While
   127 some ISPs and hosting services are known to host spammers, it won't help much
   128 to block an IP address belonging to one of the well-known ISPs. Often, the
   129 spammer will get a new IP address the next time they connect to the internet,
   130 while the blocked IP address will be reused and may be used by some innocent user.</p>
   131 
   132 <h3><a name="ipofurl">IP of URL Filter</a></h3>
   133 
   134 <p>This module is only useful in a few special cases. Here you enter the IP
   135 address of a webserver that is used to host domains for which you may see spam.
   136 Some spammers have a lot of their sites on only a few webservers, so instead of
   137 adding lots of domains to your blacklist, you only add the IP addresses of
   138 those webservers. The Spam-X module will then check all the URLs in a post to
   139 see if any of them is hosted on one of those blacklisted webservers.</p>
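
<p>The check described above amounts to resolving each URL's host name and
comparing the result with the blacklisted addresses. The following Python
sketch shows the idea only; the function name, the <tt>resolve</tt> parameter,
and the stand-in lookup table are hypothetical and exist so that the example
runs without DNS access.</p>

```python
import socket
from urllib.parse import urlparse

def hosted_on_blacklisted_server(urls, blacklisted_ips, resolve=socket.gethostbyname):
    """Return True if any URL's host resolves to a blacklisted IP address."""
    for url in urls:
        host = urlparse(url).hostname
        if not host:
            continue
        try:
            ip = resolve(host)
        except OSError:
            continue  # host could not be resolved, nothing to compare
        if ip in blacklisted_ips:
            return True
    return False

# Stand-in for DNS so the example is self-contained (made-up data).
fake_dns = {"spam-one.example": "203.0.113.5", "harmless.example": "198.51.100.9"}

def lookup(host):
    if host not in fake_dns:
        raise OSError("unknown host")
    return fake_dns[host]

print(hosted_on_blacklisted_server(["http://spam-one.example/buy"],
                                   {"203.0.113.5"}, resolve=lookup))
```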
   140 
   141 <h3><a name="header">HTTP Header Filter</a></h3>
   142 
   143 <p>This module lets you filter for certain HTTP headers. Every HTTP request
   144 sent to your site is accompanied by a series of headers identifying, for
   145 example, the browser that your visitor uses, their preferred language, and
   146 other information.</p>
   147 
   148 <p>With the HTTP Header Filter module, you can block HTTP requests with certain
   149 headers. For example, some spammers are using Perl scripts to send their spam
   150 posts. The user agent (browser identification) sent by Perl scripts is usually
   151 something like "libwww-perl/5.805" (the version number may vary). So to block
   152 posts made by this user agent, you would enter:</p>
   153 
   154 <table border="0" style="width:15em">
   155 <tr><td><b>Header:</b></td><td align="left"><kbd>User-Agent</kbd></td></tr>
   156 <tr><td><b>Content:</b></td><td align="left"><kbd>^libwww-perl</kbd></td></tr>
   157 </table>
   158 <p>This would block all posts from user agents beginning with "libwww-perl".</p>
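
<p>The <kbd>^</kbd> in the example anchors the pattern to the beginning of the
header value. A minimal Python sketch of such a match follows. The function
name <tt>header_matches</tt> is hypothetical, and matching the content
case-insensitively is an assumption of this sketch, not a documented property
of the module.</p>

```python
import re

def header_matches(headers, name, pattern):
    """Return True if the named request header matches the regular expression."""
    # HTTP header names are case-insensitive, so compare them in lower case.
    wanted = name.lower()
    for key, value in headers.items():
        # Assumption: the content pattern is applied case-insensitively.
        if key.lower() == wanted and re.search(pattern, value, re.IGNORECASE):
            return True
    return False

request_headers = {"User-Agent": "libwww-perl/5.805", "Accept-Language": "en"}
print(header_matches(request_headers, "User-Agent", "^libwww-perl"))
```

<p>A user agent that merely mentions "libwww-perl" somewhere after its first
character would not match this pattern, because of the leading <kbd>^</kbd>.</p>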
   159 
   160 <!-- Currently not shipped with Geeklog
   161 
   162 <h3><a name="honeypot">Project Honeypot http:BL Filter</a></h3>
   163 
   164 <p><a href="http://www.projecthoneypot.org" title="visit the project honey pot site">ProjectHoneypot.org</a>
   165     is a new service providing a way of trapping malicious web users with
   166     <a href="http://en.wikipedia.org/wiki/Honeypot_%28computing%29" title="view the wikipedia definition of a Honeypot">honeypots</a>.
   167     Essentially this provides traps for email address harvesting bots, spammers,
   168     and people trying to exploit web sites. Using the honeypots, the project
   169     gathers and maintains an active blacklist of IP addresses categorised by
   170     threat type, level and activity.</p>
   171 
   172 <p>With the ProjectHoneyPot filter module, you can block posts from known bad
   173     ip addresses as identified by the <a href="http://www.projecthoneypot.org/httpbl_configure.php">http:BL</a>
   174     blacklist.
   175     </p>
   176     <p>In order to do so, you must first <a href="http://www.projecthoneypot.org/create_account.php">Register with projectHoneyPot</a>,
   177         <a href="http://www.projecthoneypot.org/manage_honey_pots.php">install a honeypot</a> or
   178         <a href="http://www.projecthoneypot.org/manage_quicklink.php">quick link</a> and
   179         <a href="http://www.projecthoneypot.org/httpbl_configure.php">get an access key</a>
   180         for the http:BL.</p>
   181     <p>Once you have done this, and inserted appropriate details into the Spam-X
   182         config.php file, http:BL blocking will be used for all filtered content
   183     automatically.</p>
   184 
   185 -->
   186 
   187 
   188 <h2><a name="action">Action Modules</a></h2>
   189 
   190 <p>Once one of the <a href="#examine">examine modules</a> detects a spam post,
   191 the action modules will decide what to do with the spam. Most of the time, you
   192 will simply want to delete the post, which is what the <b>Delete
   193 Action</b> module does.</p>
   194 
   195 <p>As the name implies, the <b>Mail Admin Action</b> module sends an email to
   196 the site admin when a spam post is encountered. Since this can cause quite a
   197 lot of emails to be sent, it is disabled by default.</p>
   198 
   199 <p>Action modules have to be enabled explicitly before they are used (examine
   200 modules, on the other hand, are activated by simply dropping them into the
   201 Spam-X directory). Every action module has a unique number. To enable several
   202 action modules, add up their numbers and enter the sum as the value for the
   203 <a href="config.html#desc_spamx">spamx config
   204 variable</a> in Geeklog's main configuration.</p>
   205 
   206 <h3>Example</h3>
   207 
   208 <p>The Delete Action module has the value 128, while the Mail Admin Action
   209 module has the value 8. So to activate both modules, add 128 + 8 = 136 and
   210 enter that in the Configuration admin panel.</p>
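
<p>The arithmetic can be sketched in a few lines of Python. Only the two values
128 and 8 come from the text above. The helper names are hypothetical, and the
bitwise test rests on the assumption that every action module's number is a
distinct power of two, so that the sum can be taken apart again.</p>

```python
# Values stated in the text above; other action modules are not listed there.
DELETE_ACTION = 128
MAIL_ADMIN_ACTION = 8

ACTIONS = {"Mail Admin Action": MAIL_ADMIN_ACTION, "Delete Action": DELETE_ACTION}

def combined_value(names):
    """Add up the numbers of the action modules to enable."""
    return sum(ACTIONS[name] for name in names)

def enabled_actions(value):
    """List the modules whose number is contained in the configured value.

    Assumption: each number is a distinct power of two, so a bitwise AND
    can tell the modules apart again.
    """
    return [name for name, number in ACTIONS.items()
            if (value & number) == number]

value = combined_value(["Delete Action", "Mail Admin Action"])
print(value, enabled_actions(value))
```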
   211 
   212 <p>The SLV Examine module is complemented by a <strong>SLV Action</strong>
   213 module that ensures that SLV is notified of spam posts caught by other examine
   214 modules. It "piggybacks" on the Delete Action module, i.e. when you activate
   215 the Delete Action module, you'll also enable the SLV Action module.</p>
   216 
   217 
   218 <h2><a name="admin">Admin Modules</a></h2>
   219 
   220 <p>The Admin modules for the <a href="#personal">Personal Blacklist</a>, <a
   221 href="#ip">IP Filter</a>, <a href="#ipofurl">IP of URL Filter</a>, and <a
   222 href="#header">HTTP Header Filter</a> modules provide you with a form to add
   223 new entries. To delete an existing entry, simply click on it.</p>
   224 
   225 <p>With the <strong>SLV Whitelist</strong> admin module you can add URLs that
   226 you don't want to be reported to SLV. This is useful when posts on your site
   227 happen to contain certain URLs quite often but you don't want those to be
   228 considered spam by SLV.<br>Note that your site's URL (i.e. <a
   229 href="config.html#desc_site_url">$_CONF['site_url']</a>) is automatically
   230 whitelisted, so you don't need to add it here.</p>
   231 
   232 <p>The <strong>Log View</strong> module lets you inspect and clear the Spam-X
   233 logfile. The logfile contains additional information about the spam posts, e.g.
   234 which IP address they came from, the user id (if posted by a logged-in user),
   235 and which of the examine modules caught the spam post.</p>
   236 
   237 <p>In case a large number of spam posts made it through without being caught,
   238 the <strong>Mass Delete Comments</strong> and <strong>Mass Delete
   239 Trackbacks</strong> modules will help you get rid of them easily. Before you
   240 use these modules, make sure to add the URLs or keywords from those spam posts
   241 to your Personal Blacklist.</p>
   242 
   243 <h2><a name="mt-blacklist">Note about MT-Blacklist</a></h2>
   244 
   245 <p>MT-Blacklist was a blacklist, i.e. a listing of URLs that were used in spam
   246 posts, originally developed for Movable Type (hence the name) and maintained by
   247 Jay Allen.</p>
   248 
   249 <p>Maintaining a blacklist is a lot of work, and you're continually playing
   250 catch-up with the spammers. Therefore, Jay Allen eventually <a
   251 href="http://www.geeklog.net/article.php/mt-blacklist-discontinued">discontinued
   252 MT-Blacklist</a> on the assumption that new and better methods to detect spam
   253 are now available.</p>
   254 
   255 <p>Starting with Geeklog 1.4.1, Geeklog no longer uses MT-Blacklist. All
   256 MT-Blacklist entries are removed from the database when you upgrade to
   257 Geeklog 1.4.1 and the MT-Blacklist examine and admin modules are no longer
   258 included.</p>
   259 
   260 <h2><a name="trackback">Trackback Spam</a></h2>
   261 
   262 <p><a href="trackback.html">Trackbacks</a> are also run through Spam-X before
   263 they will be accepted by Geeklog. There are also some additional checks that
   264 can be performed on trackbacks: Geeklog can be configured to check if the site
   265 that supposedly sent the trackback actually contains a link back to your site.
   266 In addition, Geeklog can also check if the IP address of the site in the
   267 trackback URL matches the IP address that sent the trackback. Trackbacks that
   268 fail any of these tests are usually spam. Please refer to the <a
   269 href="config.html#desc_check_trackback_link">documentation for the
   270 configuration</a> for more information.</p>
   271 
   272 <h2><a name="config.php">Configuration</a></h2>
   273 
   274 <p>The Spam-X plugin's configuration can be changed from the Configuration admin
   275 panel:</p>
   276 
   277 <h3><a name="main">Spam-X Main Settings</a></h3>
   278 
   279 <table>
   280 <tr><th style="width:25%">Variable</th>
   281     <th style="width:25%">Default Value</th>
   282     <th style="width:50%">Description</th>
   283 </tr>
   284 <tr>
   285   <td><a name="desc_logging">logging</a></td>
   286   <td><code>true</code></td>
   287   <td>Whether to log recognized spam posts in the <tt>spamx.log</tt> logfile
   288     (if set to <code>true</code>) or not (<code>false</code>).</td>
   289 </tr>
   290 <tr class="r2">
   291   <td><a name="desc_admin_override">admin_override</a></td>
   292   <td><code>false</code></td>
   293   <td>The Spam-X plugin will filter posts by any user - even site admins. This
   294     can be a problem sometimes, e.g. when you want to post a note about spam
   295     that itself contains "spammy" URLs or keywords. When this option is set to
   296     <code>true</code> then posts made by users in the 'spamx Admin' group are
   297     not checked for spam.</td>
   298 </tr>
   299 <tr>
   300   <td><a name="desc_timeout">timeout</a></td>
   301   <td>5</td>
   302   <td>Timeout (in seconds) for contacting external services such as SLV.</td>
   303 </tr>
   304 <tr class="r2">
   305   <td><a name="desc_notification_email">notification_email</a></td>
   306   <td><code>$_CONF['site_mail']</code></td>
   307   <td>Email address to which spam notifications are sent when the Mail Admin
   308     <a href="#action">action module</a> is enabled.</td>
   309 </tr>
   310 <tr>
   311   <td><a name="desc_action">action</a></td>
   312   <td>128</td>
   313   <td>This only exists as a fallback in case <a
   314     href="config.html#desc_spamx">$_CONF['spamx']</a> in Geeklog's main
   315     configuration is not set. In other words, <code>$_CONF['spamx']</code>
   316     takes precedence.</td>
   317 </tr>
   318 </table>
   319 
   320 <h2><a name="more">More Information</a></h2>
   321 
   322 <p>Further information as well as a support forum for the Spam-X plugin can be
   323 found on the <a href="http://www.pigstye.net/gplugs/staticpages/index.php/spamx" rel="nofollow">Spam-X Plugin's Homepage</a> and in the <a
   324 href="http://wiki.geeklog.net/wiki/index.php/Dealing_with_Spam">Geeklog
   325 Wiki</a>.</p>
   326 
   327 <div class="footer">
   328     <a href="http://wiki.geeklog.net/">The Geeklog Documentation Project</a><br>
   329     All trademarks and copyrights on this page are owned by their respective owners. Geeklog is copyleft.
   330 </div>
   331 </body>
   332 </html>