Blog Spam and how to stop it
It was quite by accident that I noticed a couple of posts on this blog had over 50 comments.
“Wow!” I thought to myself, “I must be getting really popular!” Sadly not: they were all posted by “robots” sent out to submit dubious entries to blogs automatically.
Robo What?
A small program is sent off to scour the Internet for forms it can fill in. It isn’t limited to blogs: it will target any form that follows roughly the same format, such as guestbooks, forums and wikis. Links are left in the comments sections of these sites, pointing users to various “spam-related” websites (usually online pharmacies; curse Viagra!).
Why are they doing it?
Unfortunately, a small percentage of people do click on these links and either buy something from the sites they land on or, worse still, click on adverts within those sites (such as Google AdSense adverts), which earns the site owner money. If the spammers put out 100,000 links and only 1% of them are clicked, that is still 1,000 visits, so they still stand to make a lot of money. Not only that, more visible links raise the profile of the spammers’ commercial site(s) on the net, increasing their page ranking within Google so they move higher up the search results, which in turn brings in more visitors and an even higher page ranking (and more revenue).
How can we stop it?
Fortunately, there are several methods for stopping, or at least slowing down, the spammers. Because most of the spam is automated, the robots can be fooled quite easily.
JavaScript can be used to write a link into a page, which means that a spambot can’t read it; however, neither can Google or anyone with JavaScript disabled. Handy!
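As a rough illustration of the JavaScript trick, here is a minimal Python sketch that generates a `<script>` snippet which assembles the link only when the page loads. The helper name `js_link` and the 4-character chunk size are assumptions made up for this example; it also assumes the URL and link text are values you chose yourself, not visitor input.

```python
import json


def js_link(url: str, text: str, chunk: int = 4) -> str:
    """Return a <script> snippet that writes an <a> link when the page loads.

    Hypothetical helper: the URL is cut into short fragments so the full
    address never appears as a single string in the HTML source.
    Assumes url and text are trusted values, not visitor input.
    """
    pieces = [url[i:i + chunk] for i in range(0, len(url), chunk)]
    # json.dumps produces properly quoted JavaScript string literals
    joined = " + ".join(json.dumps(p) for p in pieces)
    return (
        "<script>\n"
        f"var u = {joined};\n"
        f"document.write('<a href=\"' + u + '\">' + {json.dumps(text)} + '</a>');\n"
        "</script>"
    )
```

Calling `js_link("http://www.example.com/comments.asp", "Leave a comment")` produces a script where the address only exists as fragments like `"ww.e" + "xamp"`, so a robot scanning the raw HTML for a complete link won't find one. The same applies to anything else that doesn't run JavaScript, which is exactly the drawback described above.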
Google has recently introduced its own (non-standard) value for the HTML rel attribute, which tells it not to follow a link on a page:
<a href="http://www.i-am-a-spammer.org" rel="nofollow">Link</a>
Google will ignore the above link, so it will have no effect on page ranking. Of course, this means you have to tell Google to ignore the links your comments script outputs, and the plain fact is you may want your comments to be indexed (they’ll increase your own page ranking, after all). It also still lets spambots through to your comments script to post whatever they want.
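If your comments script builds the links itself, adding the attribute can be automated. The sketch below is a hypothetical Python helper (`comment_link` is an invented name, not part of any blog package) that turns a commenter's URL into a link carrying `rel="nofollow"`, escaping both parts with the standard library's `html.escape`. Only accepting `http://` and `https://` addresses is an extra assumption on my part.

```python
from html import escape


def comment_link(url: str, text: str) -> str:
    """Render a commenter-supplied link with rel="nofollow" (hypothetical helper)."""
    # Only http(s) addresses become links; anything else is shown as plain text.
    if not url.lower().startswith(("http://", "https://")):
        return escape(text)
    # Escape both parts so a commenter can't inject extra markup.
    return f'<a href="{escape(url)}" rel="nofollow">{escape(text)}</a>'
```

For example, `comment_link("http://www.i-am-a-spammer.org", "Link")` returns the same tag as the example above. As noted, this only removes the ranking benefit; the spam still gets posted.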
So what’s the solution then?
Many of the free and commercially available blogging platforms now employ various methods for dealing with spam, but the most common one, used by TypePad, Movable Type, Blogger etc., is “Captcha”.
Huh? What’s that?
Captcha stands for:
“Completely Automated Public Turing test to tell Computers and Humans Apart”
You’re bound to have seen one; in fact, this blog now uses it in the comments section.
Alan Turing, often called the father of the modern computer, predicted that in time computers would reach a level of artificial intelligence at which you wouldn’t be able to tell humans and machines apart. He proposed the Turing Test as a benchmark for when machines had reached that level.
The idea behind a Captcha test is that a machine can’t connect the question it’s being asked with the thing the question refers to. For example, my Captcha might ask “What’s the colour of the 3rd letter?”: a robot may be able to tell you the third letter is “M”, but not that it’s blue, so it fails the test.
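To make the colour question concrete, here is a minimal Python sketch of how such a challenge could be generated and checked. It is not the code this blog uses; the function names, the colour list and the five-letter length are all assumptions for illustration. A real Captcha would draw the letters into an image so the colours never appear in the page source, which this sketch leaves out.

```python
import random
import string

COLOURS = ["red", "green", "blue", "orange", "purple"]
ORDINALS = ["1st", "2nd", "3rd", "4th", "5th"]


def make_challenge(rng: random.Random):
    """Build a 'what colour is the Nth letter?' challenge.

    Returns (letters, question, answer), where letters is a list of
    (letter, colour) pairs. Hypothetical sketch: rendering the letters
    as an image is assumed to happen elsewhere.
    """
    chosen = rng.sample(string.ascii_uppercase, len(ORDINALS))
    letters = [(ch, rng.choice(COLOURS)) for ch in chosen]
    n = rng.randrange(len(letters))  # zero-based position we ask about
    question = f"What's the colour of the {ORDINALS[n]} letter?"
    return letters, question, letters[n][1]


def check_answer(given: str, expected: str) -> bool:
    # Ignore capitalisation and stray spaces in the visitor's reply.
    return given.strip().lower() == expected
```

The point of the design is that reading the letters is not enough: the robot has to understand which letter “3rd” refers to and what colour it was drawn in.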
Whilst looking for a solution I came across plug-ins for the various mainstream blogs and a lot of Captcha tests written in PHP, but nothing that would fit in with the “Classic” ASP of my blog. The code I use, written for ASP 3.0, is based on the ASP 3.0 Captcha implementation listed in the related links below. The only problem was that the script needed quite a bit of tweaking to get it working, as it prints out two Captchas if you follow its instructions to the letter.
It’s also worth remembering that simply changing the name of your comments script could cut down the number of spam comments you receive, since many robots look for the default script names used by popular blog software.
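Whatever language the Captcha is written in, the important part on the server is that the expected answer stays on the server and each challenge can only be answered once. The sketch below shows that idea in Python rather than ASP; the class name and the plain dict standing in for a session (in Classic ASP you would likely use the `Session` object) are my assumptions, not part of the ASP 3.0 script mentioned above.

```python
class CaptchaSessionStore:
    """Keeps the expected Captcha answer on the server, keyed by session id.

    Hypothetical stand-in for a server-side session; a plain dict is used
    here for illustration only.
    """

    def __init__(self) -> None:
        self._expected = {}  # session id -> expected answer (lower case)

    def issue(self, session_id: str, answer: str) -> None:
        # A new challenge replaces any older one, so only the Captcha
        # most recently shown to this visitor can be answered.
        self._expected[session_id] = answer.lower()

    def verify(self, session_id: str, given: str) -> bool:
        # pop() makes every challenge single-use: a solved answer can't be replayed.
        expected = self._expected.pop(session_id, None)
        return expected is not None and given.strip().lower() == expected
```

Usage: call `issue()` when the comment form is rendered and `verify()` when it is submitted. Because the answer never travels in the form itself, a robot can't simply read it out of a hidden field.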
Related links:
- ASP 3.0 Captcha
- Movable Type Guide to combatting comment spam
- Movable Type Anti-Spam Plugins
- James Seng Anti-Spam Plugins (Movable Type)
- Blogger – Word Verification
- Blogger – Keeping Comments Clean
- Blogger – About Spam Blogs
- Wikipedia Entry – Blog Spam
- Wikipedia Entry – Captcha
- Wikipedia Entry – Alan Turing
Please feel free to add further resources to the comments.
Hopefully we can make it as difficult as possible for the annoying little blighters!
“CAPTCHAs, to me, are most certainly not the answer here. CAPTCHAs are annoying, and impossible for visually impaired and text-based browser users.
Why do you shift your problem (spammers) to your visitors? They shouldn’t have to care about it, you should.
I tend to like systems such as surbl (www.surbl.org) or bsb (bsb.empty.us) better.”
“I fully agree that Captchas aren’t the best solution: they can still be semi-hacked by more advanced robots and they also present a problem for visually impaired people. However, for most people running blogs who can’t install software on servers or run certain scripts, they can be the only solution.”