1 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
4 <title>Geeklog Documentation - Geeklog Spam-X Plugin</title>
5 <link rel="stylesheet" type="text/css" href="../docstyle.css" title="Dev Stylesheet">
6 <meta name="robots" content="noindex">
10 <p><a href="http://www.geeklog.net/" style="background:transparent"><img src="../images/newlogo.gif" alt="Geeklog" width="243" height="90"></a></p>
11 <div class="menu"><a href="index.html">Geeklog Documentation</a> - Geeklog Spam-X Plugin</div>
13 <h1>Geeklog Spam-X Plugin</h1>
15 <p><small>(If you came here looking for Hendrickson Software Components' email spam filter of the same name, please <a href="http://www.hendricom.com/spamcontrol.htm" rel="nofollow">click here</a>.)</small></p>
19 <p>The Geeklog Spam-X plugin was created to fight the problem of comment spam
20 for Geeklog systems. If you are unfamiliar with comment spam you might see the
21 <a href="http://kalsey.com/2003/11/comment_spam_manifesto/">Comment Spam
24 <p>Spam protection in Geeklog is mostly based on the Spam-X plugin, originally
25 developed by Tom Willet. It has a modular architecture that allows it to be
26 extended with new modules to fight the spammer's latest tricks, should the need
29 <h2><a name="checked">What is being checked for spam?</a></h2>
31 <p>Geeklog and the Spam-X plugin will check the following for spam:</p>
34 <li>Story submissions</li>
36 <li>Trackbacks and Pingbacks</li>
37 <li>Event submissions</li>
38 <li>Link submissions</li>
39 <li>The text sent with the "Email story to a friend" option</li>
40 <li>Emails sent to users via the "send email" form from their profile page</li>
41 <li>A user's profile</li>
44 <h2><a name="modules">Module Types</a></h2>
<p>The Spam-X plugin was built to be extensible so that it can adapt easily to
changes the comment spammers might make. There are three types of modules: <a
href="#examine">Examine</a>, <a href="#action">Action</a>, and <a
href="#admin">Admin</a>. Each module is contained in a single file that can
simply be dropped into the Spam-X directory to add it to the plugin.</p>
52 <h2><a name="examine">Examine Modules</a></h2>
54 <p>Geeklog ships with the following examine modules:</p>
57 <li><a href="#slv">Spam Link Verification (SLV)</a></li>
58 <li><a href="#personal">Personal Blacklist</a></li>
59 <li><a href="#ip">IP Filter</a></li>
60 <li><a href="#ipofurl">IP of URL Filter</a></li>
61 <li><a href="#header">HTTP Header Filter</a></li>
62 <!-- <li><a href="#honeypot">Project Honeypot Filter</a></li> -->
65 <h3><a name="slv">Spam Link Verification (SLV)</a></h3>
67 <p>SLV is a centralized, server-based service that examines posts made on
68 websites and detects when certain links show up in unusually high numbers. In
69 other words, when a spammer starts spamming a lot of sites with the same URLs
70 and those sites all report to SLV, the system will recognize this as a spam
71 wave and will flag posts containing these URLs as spam.</p>
<p>Put differently, SLV is a dynamic blacklist that updates itself
automatically when a spammer starts promoting a site. Its accuracy and
reaction speed improve as more sites use it.</p>
77 <p>SLV is a free service run by Russ Jones at <a
78 href="http://www.linksleeve.org/">www.linksleeve.org</a>.
80 <p><strong><a name="slvprivacy">Privacy Notice:</a></strong>
81 It should be stressed that using SLV means that information from your site
is being sent to a third party's site. In some jurisdictions you may have to
83 inform your users about this fact - please check with your local privacy
86 <p>Sending information to an external site may also be undesirable on some
87 setups, e.g. on a company intranet. You can disable SLV support by removing the
88 four files <tt>SLV.Examine.class.php</tt>, <tt>SLVbase.class.php</tt>,
89 <tt>SLVreport.Action.class.php</tt>, and <tt>SLVwhitelist.Admin.class.php</tt>
90 from your Spam-X directory (<tt>/path/to/geeklog/plugins/spamx</tt>). Or you
91 can simply disable the Spam-X plugin entirely (or uninstall it).</p>
93 <p>The SLV Examine and Action modules will extract all URLs from a post and
94 only send those to SLV (i.e. the rest of the post's content is not being sent).
95 They also remove any links that contain your Geeklog site's URL. In case a post
96 does not contain any external links, the modules simply do not contact SLV at
100 <h3><a name="personal">Personal Blacklist</a></h3>
102 <p>The Personal Blacklist module lets you add keywords and URLs that typically
103 exist in spam posts. When you're being hit by spam, make sure to add the URLs
104 of those spam posts to your Personal Blacklist so that they can be filtered out
105 automatically, should the spammer try to post them again.</p>
107 <p>This will also help you get rid of spam that made it through, as you can
108 then use the Mass Delete Comments and Mass Delete Trackbacks modules to easily
109 remove large numbers of spam posts from your database.</p>
111 <p>The Personal Blacklist also has an option to import the Geeklog <a
112 href="config.html#desc_censorlist">censor list</a> and ban all comments which
113 contain one of those words. This or an expanded list might be useful for a
114 website that caters to children. Then no comments with offensive language could
117 <h3><a name="ip">IP Filter</a></h3>
119 <p>Sometimes you will encounter spam that is coming from one or only a few IP
120 addresses. By simply adding those IP addresses to the IP Filter module, any
121 posts from these IPs will be blocked automatically.</p>
123 <p>In addition to single IP addresses, you can also add IP address ranges,
124 either in <a href="http://en.wikipedia.org/wiki/CIDR" title="Classless Inter-Domain Routing">CIDR</a> notation or as simple <i>from</i>-<i>to</i> ranges.</p>
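<p>The three entry formats mentioned above (single address, CIDR block, and
<i>from</i>-<i>to</i> range) can be illustrated with a small Python sketch. This
is not the plugin's own code (Spam-X is written in PHP). The function names
<code>parse_entry</code> and <code>ip_is_blocked</code>, the hyphen as range
separator, and the sample addresses are assumptions made for illustration
only.</p>

```python
import ipaddress


def parse_entry(entry):
    """Turn one filter entry into a list of ip_network objects.

    Hypothetical helper: accepts a single address, a CIDR block,
    or a "from-to" range written with a hyphen.
    """
    entry = entry.strip()
    if "-" in entry:
        first, last = (ipaddress.ip_address(part.strip())
                       for part in entry.split("-", 1))
        # A from-to range is expanded into the smallest set of CIDR blocks.
        return list(ipaddress.summarize_address_range(first, last))
    # strict=False also accepts entries such as 198.51.100.7/24.
    return [ipaddress.ip_network(entry, strict=False)]


def ip_is_blocked(ip, entries):
    """Return True if ip falls inside any of the filter entries."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for entry in entries for net in parse_entry(entry))


# Example entries (documentation-only address ranges).
FILTER = ["203.0.113.7", "198.51.100.0/24", "192.0.2.10-192.0.2.20"]
```

<p>With these entries, <code>ip_is_blocked("192.0.2.15", FILTER)</code> is
true, while an address just outside the range, such as
<code>192.0.2.21</code>, is not blocked.</p>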
<p>Please note that IP addresses aren't really a good filter criterion. While
some ISPs and hosting services are known to host spammers, it won't help much
to block an IP address belonging to one of the well-known ISPs. Often, the
spammer will get a new IP address the next time they connect to the internet,
while the blocked IP address will be reused and may end up with an innocent
user.</p>
132 <h3><a name="ipofurl">IP of URL Filter</a></h3>
<p>This module is only useful in a few special cases. Here you enter the IP
address of a webserver that hosts domains for which you may see spam. Some
spammers keep a lot of their sites on only a few webservers, so instead of
adding lots of domains to your blacklist, you only add the IP addresses of
those webservers. The module will then check all the URLs in a post to see
whether any of them are hosted on one of the blacklisted webservers.</p>
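<p>The check described above can be sketched roughly as follows, in Python
rather than the plugin's PHP. The URL pattern, the function names, and the
injectable <code>resolver</code> argument are assumptions made for
illustration; the actual module may extract and resolve URLs differently.</p>

```python
import re
import socket
from urllib.parse import urlparse

# Simplified pattern: an http(s) URL up to whitespace, quote or angle bracket.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def hosts_in_post(text):
    """Collect the (lowercased) host names of all URLs found in a post."""
    hosts = set()
    for url in URL_RE.findall(text):
        try:
            host = urlparse(url).hostname
        except ValueError:
            continue  # malformed URL, ignore it
        if host:
            hosts.add(host)
    return hosts


def resolve(host):
    """Default resolver: look the host up via DNS, empty set on failure."""
    try:
        return {info[4][0] for info in socket.getaddrinfo(host, None)}
    except socket.gaierror:
        return set()


def post_hits_blacklist(text, blacklisted_ips, resolver=resolve):
    """True if any URL in the post is hosted on a blacklisted IP address."""
    blacklisted = set(blacklisted_ips)
    return any(resolver(host) & blacklisted for host in hosts_in_post(text))
```

<p>Passing the resolver as a parameter keeps the sketch testable without
network access; a real filter would also want to cache lookups, since every
URL in a post costs a DNS query.</p>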
141 <h3><a name="header">HTTP Header Filter</a></h3>
<p>This module lets you filter for certain HTTP headers. Every HTTP request
sent to your site is accompanied by a series of headers identifying, for
example, the browser that your visitor uses, their preferred language, and
other information.</p>
148 <p>With the Header filter module, you can block HTTP requests with certain
149 headers. For example, some spammers are using Perl scripts to send their spam
150 posts. The user agent (browser identification) sent by Perl scripts is usually
151 something like "libwww-perl/5.805" (the version number may vary). So to block
152 posts made by this user agent, you would enter:</p>
<table border="0" style="width:15em">
<tr><td><b>Header:</b></td><td align="left"><kbd>User-Agent</kbd></td></tr>
<tr><td><b>Content:</b></td><td align="left"><kbd>^libwww-perl</kbd></td></tr>
</table>
158 <p>This would block all posts from user agents beginning with "libwww-perl".</p>
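<p>How such a rule might be evaluated can be sketched in a few lines of Python.
The sketch assumes that the <b>Content</b> field is treated as a regular
expression and matched case-insensitively against the value of the named
header; both points, as well as the names <code>request_is_blocked</code> and
<code>RULES</code>, are assumptions and not taken from the plugin's code.</p>

```python
import re


def request_is_blocked(headers, rules):
    """Return True if any (header name, pattern) rule matches the request.

    headers: dict of HTTP header names to values
    rules:   list of (header_name, regex_pattern) tuples
    """
    # Header names are case-insensitive in HTTP, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    for header_name, pattern in rules:
        value = normalized.get(header_name.lower())
        if value is not None and re.search(pattern, value, re.IGNORECASE):
            return True
    return False


# The rule from the example above: user agents beginning with "libwww-perl".
RULES = [("User-Agent", r"^libwww-perl")]
```

<p>Because of the leading <code>^</code>, a user agent that merely contains
"libwww-perl" somewhere in the middle would not be blocked by this rule.</p>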
160 <!-- Currently not shipped with Geeklog
162 <h3><a name="honeypot">Project Honeypot http:BL Filter</a></h3>
164 <p><a href="http://www.projecthoneypot.org" title="visit the project honey pot site">ProjectHoneypot.org</a>
165 is a new service providing a way of trapping malicious web users with
166 <a href="http://en.wikipedia.org/wiki/Honeypot_%28computing%29" title="view the wikipedia definition of a Honeypot">honeypots</a>.
167 Essentially this provides traps for email address harvesting bots, spammers,
168 and people trying to exploit web sites. Using the honeypots, the project
169 gathers and maintains an active blacklist of IP addresses categorised by
170 threat type, level and activity.</p>
172 <p>With the ProjectHoneyPot filter module, you can block posts from known bad
173 ip addresses as identified by the <a href="http://www.projecthoneypot.org/httpbl_configure.php">http:BL</a>
176 <p>In order to do so, you must first <a href="http://www.projecthoneypot.org/create_account.php">Register with projectHoneyPot</a>,
177 <a href="http://www.projecthoneypot.org/manage_honey_pots.php">install a honeypot</a> or
178 <a href="http://www.projecthoneypot.org/manage_quicklink.php">quick link</a> and
179 <a href="http://www.projecthoneypot.org/httpbl_configure.php">get an access key</a>
181 <p>Once you have done this, and inserted appropriate details into the Spam-X
config.php file, http:BL blocking will be used for all filtered content
-->
188 <h2><a name="action">Action Modules</a></h2>
<p>Once one of the <a href="#examine">examine modules</a> detects a spam post,
the action modules decide what to do with it. Most of the time, you will
simply want to delete the post, and this is what the <b>Delete Action</b>
module does.</p>
<p>As the name implies, the <b>Mail Admin Action</b> module sends an email to
the site admin when a spam post is encountered. Since this can cause quite a
lot of emails to be sent, it is disabled by default.</p>
<p>Action modules have to be enabled explicitly before they are used (examine
modules, on the other hand, are activated by simply dropping them into the
Spam-X directory). Every action module has a unique number. To enable a set of
action modules, add up their numbers and enter the sum as the value of the <a
href="config.html#desc_spamx">spamx config variable</a> in Geeklog's main
configuration.</p>
208 <p>The Delete Action module has the value 128, while the Mail Admin Action
209 module has the value 8. So to activate both modules, add 128 + 8 = 136 and
210 enter that in the Configuration admin panel.</p>
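<p>The arithmetic above works because the two values do not share any bits, so
their sum can be taken apart again unambiguously. The Python sketch below
assumes that the configured value is tested with a bitwise AND; this is an
assumption about how such a sum is typically decoded, not a statement about the
plugin's code. The constant and function names are hypothetical.</p>

```python
# Values as given in the text above.
DELETE_ACTION = 128
MAIL_ADMIN_ACTION = 8


def enabled_actions(config_value):
    """List the action modules switched on by a given spamx config value."""
    known = {
        "Delete Action": DELETE_ACTION,
        "Mail Admin Action": MAIL_ADMIN_ACTION,
    }
    # A module counts as enabled when all bits of its number are set.
    return [name for name, code in known.items()
            if (config_value & code) == code]


# 128 + 8 = 136 enables both modules.
both = DELETE_ACTION + MAIL_ADMIN_ACTION
```

<p>Under that assumption, a value of 136 enables both modules, 128 only the
Delete Action module, and 8 only the Mail Admin Action module.</p>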
<p>The SLV Examine module is complemented by an <strong>SLV Action</strong>
213 module that ensures that SLV is notified of spam posts caught by other examine
214 modules. It "piggybacks" on the Delete Action module, i.e. when you activate
215 the Delete Action module, you'll also enable the SLV Action module.</p>
218 <h2><a name="admin">Admin Modules</a></h2>
220 <p>The Admin modules for the <a href="#personal">Personal Blacklist</a>, <a
221 href="#ip">IP Filter</a>, <a href="#ipofurl">IP of URL Filter</a>, and <a
222 href="#header">HTTP Header Filter</a> modules provide you with a form to add
223 new entries. To delete an existing entry, simply click on it.</p>
225 <p>With the <strong>SLV Whitelist</strong> admin module you can add URLs that
226 you don't want to be reported to SLV. This is useful when posts on your site
227 happen to contain certain URLs quite often but you don't want those to be
228 considered spam by SLV.<br>Note that your site's URL (i.e. <a
229 href="config.html#desc_site_url">$_CONF['site_url']</a>) is automatically
230 whitelisted, so you don't need to add it here.</p>
232 <p>The <strong>Log View</strong> module lets you inspect and clear the Spam-X
233 logfile. The logfile contains additional information about the spam posts, e.g.
234 which IP address they came from, the user id (if posted by a logged-in user),
235 and which of the examine modules caught the spam post.</p>
<p>In case a large number of spam posts made it through without being caught,
the <strong>Mass Delete Comments</strong> and <strong>Mass Delete
Trackbacks</strong> modules will help you get rid of them easily. Before you
use these modules, make sure to add the URLs or keywords from those spam posts
to your Personal Blacklist.</p>
243 <h2><a name="mt-blacklist">Note about MT-Blacklist</a></h2>
245 <p>MT-Blacklist was a blacklist, i.e. a listing of URLs that were used in spam
246 posts, originally developed for Movable Type (hence the name) and maintained by
249 <p>Maintaining a blacklist is a lot of work, and you're continually playing
250 catch-up with the spammers. Therefore, Jay Allen eventually <a
251 href="http://www.geeklog.net/article.php/mt-blacklist-discontinued">discontinued
252 MT-Blacklist</a> on the assumption that new and better methods to detect spam
253 are now available.</p>
255 <p>Starting with Geeklog 1.4.1, Geeklog no longer uses MT-Blacklist. All
256 MT-Blacklist entries are removed from the database when you upgrade to
257 Geeklog 1.4.1 and the MT-Blacklist examine and admin modules are no longer
260 <h2><a name="trackback">Trackback Spam</a></h2>
262 <p><a href="trackback.html">Trackbacks</a> are also run through Spam-X before
263 they will be accepted by Geeklog. There are also some additional checks that
264 can be performed on trackbacks: Geeklog can be configured to check if the site
265 that supposedly sent the trackback actually contains a link back to your site.
266 In addition, Geeklog can also check if the IP address of the site in the
267 trackback URL matches the IP address that sent the trackback. Trackbacks that
268 fail any of these tests are usually spam. Please refer to the <a
269 href="config.html#desc_check_trackback_link">documentation for the
270 configuration</a> for more information.</p>
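<p>The two additional trackback checks can be sketched as follows. This is an
illustration in Python under simplifying assumptions, not Geeklog's
implementation: the helper names are hypothetical, the link-back test is a
plain prefix comparison on <code>href</code> values, and the DNS lookup is
passed in as a <code>resolver</code> function.</p>

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_back(page_html, site_url):
    """Check 1: does the sending page contain a link to our site?"""
    collector = LinkCollector()
    collector.feed(page_html)
    # Simplification: a prefix match; a stricter check would compare hosts.
    return any(link.startswith(site_url) for link in collector.links)


def sender_matches(trackback_url, sender_ip, resolver):
    """Check 2: does the host in the trackback URL resolve to the sender's IP?

    resolver is a function mapping a host name to a set of IP strings.
    """
    host = urlparse(trackback_url).hostname
    return host is not None and sender_ip in resolver(host)
```

<p>A trackback that fails either check would then be treated as probable spam,
in line with the configuration options referenced above.</p>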
272 <h2><a name="config.php">Configuration</a></h2>
274 <p>The Spam-X plugin's configuration can be changed from the Configuration admin
277 <h3><a name="main">Spam-X Main Settings</a></h3>
280 <tr><th style="width:25%">Variable</th>
281 <th style="width:25%">Default Value</th>
282 <th style="width:50%">Description</th>
285 <td><a name="desc_logging">logging</a></td>
286 <td><code>true</code></td>
287 <td>Whether to log recognized spam posts in the <tt>spamx.log</tt> logfile
288 (if set to <code>true</code>) or not (<code>false</code>).</td>
291 <td><a name="desc_admin_override">admin_override</a></td>
293 <td>The Spam-X plugin will filter posts by any user - even site admins. This
294 can be a problem sometimes, e.g. when you want to post a note about spam
295 that itself contains "spammy" URLs or keywords. When this option is set to
296 <code>true</code> then posts made by users in the 'spamx Admin' group are
297 not checked for spam.</td>
300 <td><a name="desc_timeout">timeout</a></td>
302 <td>Timeout (in seconds) for contacting external services such as SLV.</td>
305 <td><a name="desc_notification_email">notification_email</a></td>
306 <td><code>$_CONF['site_mail']</code></td>
307 <td>Email address to which spam notifications are sent when the Mail Admin
308 <a href="#action">action module</a> is enabled.</td>
311 <td><a name="desc_action">action</a></td>
313 <td>This only exists as a fallback in case <a
314 href="config.html#desc_spamx">$_CONF['spamx']</a> in Geeklog's main
315 configuration is not set. I.e. <code>$_CONF['spamx']</code> takes
320 <h2><a name="more">More Information</a></h2>
322 <p>Further information as well as a support forum for the Spam-X plugin can be
323 found on the <a href="http://www.pigstye.net/gplugs/staticpages/index.php/spamx" rel="nofollow">Spam-X Plugin's Homepage</a> and in the <a
324 href="http://wiki.geeklog.net/wiki/index.php/Dealing_with_Spam">Geeklog
328 <a href="http://wiki.geeklog.net/">The Geeklog Documentation Project</a><br>
329 All trademarks and copyrights on this page are owned by their respective owners. Geeklog is copyleft.