The (or at least a better) Solution to Spam

There is no easy way to distinguish between a human and a spambot. It’s an arms race in which we’ll always be behind. I’m talking here about spam more generally: not only in e-mail but also on, for instance, Wikipedia or on blog posts. Even if we had a perfect way to test whether there is a human behind something, we would still have to combat cheap labour in India: real people spamming “by hand”.

I think the solution is a Web of Trust similar to that of PGP. An identity (not necessarily a person) publishes whom she trusts (not to be spammy/phishy) and whom she does not trust. Ideally everyone would be in one big web. Only someone whom my blog trusts, directly or via-via, may post.
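A minimal sketch of the via-via check, assuming trust and distrust are published as plain sets per identity. The names `is_trusted` and `max_hops`, and the rule that the blog's own distrust list blocks a whole branch, are my assumptions for illustration, not part of PGP or any existing system.

```python
from collections import deque

def is_trusted(trust, distrust, root, candidate, max_hops=3):
    """Return True if `candidate` is reachable from `root` along trust
    edges within `max_hops`, never passing through (or ending at) an
    identity that `root` explicitly distrusts."""
    blocked = distrust.get(root, set())
    if candidate in blocked:
        return False
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        identity, hops = queue.popleft()
        if identity == candidate:
            return True
        if hops == max_hops:
            continue  # do not follow trust any further from here
        for nxt in trust.get(identity, set()):
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return False

# Hypothetical web: the blog trusts alice, alice trusts bob, bob trusts carol.
trust = {"blog": {"alice"}, "alice": {"bob"}, "bob": {"carol"}, "mallory": {"blog"}}
distrust = {"blog": {"carol"}}

print(is_trusted(trust, distrust, "blog", "bob"))      # True: blog -> alice -> bob
print(is_trusted(trust, distrust, "blog", "carol"))    # False: explicitly distrusted
print(is_trusted(trust, distrust, "blog", "mallory"))  # False: nobody trusts mallory
```

Trusting the blog does not make mallory trusted: the edges only count in the direction they were published.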

Obviously one may still gather people’s trust over time, enter the web of trust and then start spamming with that identity. However, that identity will then be marked untrusted by people, and the people who initially marked the identity as trusted will become less trusted too. Also, there are far more sophisticated measures of establishment in the web of trust to conceive than just being trusted via-via by one identity.

There is no way to prevent spam perfectly, but the amount of work that has to go into making an identity trusted and established in the web is several orders of magnitude greater than what any other protection we have demands of a spammer. The big problem: we don’t have such a ubiquitous web of trust yet. (Yes, it’ll be in SINP if I get around to working on it.)

Keep your email secure (and what just doesn’t work)

The best way to keep your e-mail address secure from evil spam bots is some kind of JavaScript obfuscation, which unfortunately isn’t always available. There are plenty of alternatives though.

People usually replace the ‘@’ with some short replacement like ‘{at}’ or ‘bij’ (Dutch for ‘at’). This just doesn’t help.

Any programmer with a bit of knowledge of regex can create a program that scans for domain names and interprets every small bit of text in front of it as an @ sign.
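To show how little effort that takes, here is a rough sketch of such a scanner. The pattern and the name `harvest` are my own illustration, not taken from any real harvester, and the list of fillers (`@`, `at`, `bij`) is an assumption.

```python
import re

# Hypothetical pattern: a local part, a short filler such as "{at}" or " bij ",
# then something that looks like a domain name.
OBFUSCATED = re.compile(
    r"([A-Za-z0-9._-]+)"                                  # local part
    r"\s*[\[({<]?\s*(?:@|at|bij)\s*[\])}>]?\s*"           # the '@' replacement
    r"([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,})"  # domain name
)

def harvest(text):
    """Return every address the pattern can reconstruct from `text`."""
    return [f"{local}@{domain}" for local, domain in OBFUSCATED.findall(text)]

print(harvest("contact john.doe{at}example.com today"))  # ['john.doe@example.com']
print(harvest("jane bij example.nl"))                    # ['jane@example.nl']
```

A real harvester would need to weed out false positives, but with spam that hardly matters: a few wrong addresses cost the spammer nothing.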

Some smarter people also replace the dot. This works, unless your e-mail host uses an easily recognizable TLD (.com) or domain name (gmail.com).

Also, putting ‘SPAM’ in your e-mail address, as in some.personREMOVETHISFORSPAM@foo.bar, is easily filtered out.

The best thing is to use something outside the box.

For instance, my email address is X@Y, where:
X = bas.westerbaan
Y = gmail.com
Also I’ve got an email-address on w-nz.com, namely bas.westerbaan.

Or even maybe w-nz.com@bas.

Anonymous comments disabled re-enabled

I’ve disabled anonymous comments for my blog, because it is being hit with ~100 spam comments a day; it seems they’ve worked around Hashcash 3.0. I’ll look into this a bit more when I’ve got time. Sorry for the inconvenience.

Update: It seems it was nasty trackbacks instead of comments. So I just disabled trackbacks. You can comment again.

Spam, spam and more spam

I noticed I had an enormous amount of spam in my moderation queue.

The plugin I used to protect myself from spam, wp-hashcash, seems to have been mastered by spammers.

A download of the newest version did the trick.

If anyone experiences problems with posting comments, please mail me.

Update I: It seems some spam got through even with this version. I’d better get to making my own custom changes to wp-hashcash.

Update II: I changed the secret codes in the plugin. And I broke it for a while. Either one of those could have caused the fortunate (and hopefully not temporary) stop of spam.

Update III: According to Elliot Back, the creator of hashcash, the spammers brute-force the secret value. Changing it is usually effective enough to keep them at bay for a while. He’s working on a newer version which features bigger, and thus harder to brute-force, values. I just hope they won’t suck up my bandwidth too much.

Update IV: Unfortunately there seems to be a lot of computing power or a hack behind the breaking of the hashcash security -_-, I keep getting spam :-/

rel=”nofollow”

Quite some time ago Google started honouring the rel="nofollow" attribute-value pair in HTML a tags, which should prevent spammers from gaining a high PageRank by spamming blogs with comments.
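For illustration, this is roughly what the blog software has to do to every link in a comment. It is a crude regex sketch under my own assumptions (the name `add_nofollow` is hypothetical); real blog software would use a proper HTML parser, and this version simply leaves a tag alone if it already carries any rel attribute.

```python
import re

# Matches an opening <a ...> tag; \b keeps it from matching e.g. <abbr>.
A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)

def add_nofollow(html):
    """Add rel="nofollow" to every <a> tag that has no rel attribute yet."""
    def fix(match):
        attrs = match.group(1)
        if re.search(r"\brel\s*=", attrs, re.IGNORECASE):
            return match.group(0)  # already has a rel attribute, keep as is
        return f'<a rel="nofollow"{attrs}>'
    return A_TAG.sub(fix, html)

print(add_nofollow('<a href="http://spam.example">cheap pills</a>'))
# <a rel="nofollow" href="http://spam.example">cheap pills</a>
```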

It is useless.

Spamming costs almost nothing; if there is the slightest gain in it for the spammers, they will keep doing it. Of all those hundreds of thousands of people visiting blogs and seeing comment spam, a few will still follow the link, and those few are enough for the spammers.

One could argue that the websites the comment spam points to no longer get a really high PageRank. This is only partially true, for only a select number of people install the code that adds the rel="nofollow" attribute to links that can be spammed. Even though the highest-ranking blogs have already installed it, the tons of small blogs are still enough to raise the PageRank enormously.

In my opinion search engines should just ignore links which are considered spam.

Bye Bye Spam

I just installed Hash Cash, which is an anti spam plugin for WordPress.

Hash Cash protects this blog from spam by requiring the client to execute javascript which calculates a checksum of the content from a seed which is very hard to extract.
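The idea can be sketched from the server’s point of view as follows. This is not the plugin’s actual algorithm: SHA-256, the seed format and the names `checksum` and `accept_comment` are all assumptions of mine, only meant to show the shape of the check.

```python
import hashlib
import hmac

def checksum(seed: str, content: str) -> str:
    """Checksum that both the browser script and the server can compute."""
    return hashlib.sha256(f"{seed}\n{content}".encode("utf-8")).hexdigest()

def accept_comment(seed: str, content: str, submitted: str) -> bool:
    """Accept the comment only if the submitted checksum matches ours."""
    return hmac.compare_digest(checksum(seed, content), submitted)

seed = "hypothetical-per-page-seed"  # hidden somewhere in the page's script
comment = "Nice post!"
good = checksum(seed, comment)       # what an honest browser would send

print(accept_comment(seed, comment, good))      # True
print(accept_comment(seed, comment, "0" * 64))  # False
```

A bot that posts the form without running the script has no valid checksum to send, so everything hinges on how hard the seed is to dig out of the page.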

Since I installed it I haven’t got any spam comments :-).

The downside is that it prevents anyone without a JavaScript-enabled browser from posting a comment.

Now I still need to find a good way to combat trackback spam. Just putting trackbacks under moderation isn’t good enough, for they keep coming.

Beating Spam

Most mail clients now include spam filters, which are learning and improving themselves.

The problem, though, is that when spam keeps getting smarter your program has to as well, which still means reporting half of your spam as spam and checking your spam folder for regular mail. Also, when you start again after a reinstall and the spam filter has lost its experience, you have to start all over.

Now, I guess it would be great to create a centralized, independent organization specifically to recognize spam (and, once it becomes successful, to track down the spammers for prosecution).

The problem is how to organize such a centralized system, for when someone receives an email, the mail client has to check with the centralized server whether it is spam. The amount of spam is huge, and doubling the enormous bandwidth and CPU time that spam has already cost isn’t a pleasant prospect.

It would be feasible, however, to create a spam recognition service that runs on the client’s computer and updates itself with the latest definitions once in a while. This would undoubtedly be far more efficient.

The only problem we are left with is how to get such a system integrated with existing applications; if no one uses it, it would be rather useless.

If you have some ideas on this, please share them.

Strange spam

For the last few days this blog has been under heavy attack from comment spam. Although the excellent WordPress filters have put all of it in the moderation queue, it is still quite some work to sift out any comments that actually come from a real person.

The odd thing I noticed, out of curiosity, is that the links don’t even seem to work in more than half of the spam comments. They are basically flooding you with comments they hope are tempting, and if someone has finally been tempted enough to click one, it doesn’t work!

Chain emails suck & Asia

I have received 5 chain emails in my mailbox today claiming that forwarding them to a dozen other people, with my name added, would help the victims of the tsunami in Asia.
How? How can millions of emails help people most of whom haven’t got computers (anymore, or never had them) to receive email?

Usually people forward one email to at least 10 others; the count on most emails I received was 400. So let’s assume that everyone forwards the email 10 times, and that this continues for 400 rounds:

10 ** 400 == 1e400

That is actually far more emails than there are people in the world; usually people get the same mail back from someone else later in the chain.

Let’s assume that 500 million people receive a certain chain mail at 100 KB of bandwidth for the sender and receiver combined.

That makes 500 million times 100 KB, which is 50 TB.

1 GB usually costs a provider, let’s say, 5 cents: 5 cents times 50 TB (50,000 GB) is $2500.
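The back-of-the-envelope numbers can be checked in a few lines of Python. The figures are the assumptions made above (500 million recipients, 100 KB per mail, 5 cents per GB), using decimal units (1 GB = 1,000,000 KB).

```python
# How big is 10 ** 400? Count its digits.
forwards = 10 ** 400
print(len(str(forwards)))          # 401 digits

recipients = 500_000_000           # assumed number of people reached
kb_per_mail = 100                  # assumed KB for sender and receiver combined
cents_per_gb = 5                   # assumed provider cost

total_kb = recipients * kb_per_mail
total_gb = total_kb // 1_000_000   # decimal units: 1 GB = 1,000,000 KB
total_tb = total_gb // 1_000
cost_dollars = total_gb * cents_per_gb / 100

print(total_tb)                    # 50
print(cost_dollars)                # 2500.0
```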

If we would just not forward chain mails but all send a simple postcard to Asia, we would show them we care and we would save $2500 for the mail providers, who will lower prices, which will result in more money for users, which eventually results in more money for the world economy, including Asia!

Never forward a chain mail

I would like to use this moment to say that I am shocked by the tsunami in Asia and that I do care for the millions over there. I hope this single post will convince at least one person to stop forwarding chain mails; that would save an average of about $100 over some time, which is my donation to them.

Update on the anti-email-harvester mailto links

In the previous post I described a simple yet effective method to get rid of the ever cleverer email-harvesting spam bots.

I’ve made a little update to the algorithm: it now uses only 1 number for each character and uses a cascading incremental xor transform.

Python code for the algorithm itself:

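def alphaicx(s):
    # XOR every character with a cascade value that is derived from the
    # previous *output* character.
    ret = ""
    cascvalue = 0
    for i in range(len(s)):
        ret = ret + chr(ord(s[i]) ^ cascvalue)
        cascvalue = (ord(ret[i]) + 1) % 255
    return ret


def betaicx(s):
    # Same XOR, but the cascade value is derived from the previous *input*
    # character: ret[i] ^ cascvalue undoes the XOR and yields ord(s[i]).
    ret = ""
    cascvalue = 0
    for i in range(len(s)):
        ret = ret + chr(ord(s[i]) ^ cascvalue)
        cascvalue = ((ord(ret[i]) ^ cascvalue) + 1) % 255
    return ret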

I designed the algorithm in Python. Python is great for that kind of stuff.

As you can see there are 2 functions: when you encode something with alphaicx you can decode it with betaicx, and vice versa. betaicx creates tougher code though. This encryption is pretty lousy, but hard enough to stop spam bots.
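A quick round-trip check, with the two functions restated compactly so that the snippet runs on its own. The sample address is just a placeholder.

```python
def alphaicx(s):
    ret, casc = "", 0
    for ch in s:
        out = chr(ord(ch) ^ casc)
        ret += out
        casc = (ord(out) + 1) % 255            # cascade on the output character
    return ret

def betaicx(s):
    ret, casc = "", 0
    for ch in s:
        out = chr(ord(ch) ^ casc)
        ret += out
        casc = ((ord(out) ^ casc) + 1) % 255   # cascade on the input character
    return ret

address = "some.person@foo.bar"

# Each function undoes the other, in either order.
assert betaicx(alphaicx(address)) == address
assert alphaicx(betaicx(address)) == address

# The encoded form as a list of numbers, one per character.
print([ord(c) for c in betaicx(address)])
```

My reading of why betaicx is the tougher of the two: its cascade value comes from the original text, which the bot cannot see, whereas alphaicx derives it from the encoded text that sits right there in the page.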

I’ve ported betaicx to PHP, and alphaicx to JavaScript. The running example (very useful though) has been updated.

The PHP/JavaScript code for the function:

function JSBotProtect($text){
	$cxred = "0";
	$cascval = 0;
	for($i = 0; $i < strlen($text); $i++){
		$value = (ord($text[$i]) ^ $cascval);
		$cxred .= "," . $value;
		$cascval = (($value ^ $cascval) + 1) % 255;
	}
	return <<<EOF
<script type="text/javascript">var cxred=String.fromCharCode({$cxred});
var uncxred=""; var cascval=0;for(var i=1;i<cxred.length;i++)
{uncxred+=String.fromCharCode(cxred.charCodeAt(i)^cascval);
cascval=((uncxred.charCodeAt(i-1))+1)%255;}document.write(uncxred);</script>
EOF;
}

I’ll make the storage of the encoded string more compact. Probably just normal hex or, when I can get it working, BASE64.

Protecting your email address against spam bots

Spam bots are getting smarter these days at harvesting email addresses. They usually use a regex which searches for ‘.. dot .. tld’, which isn’t that resource-intensive. When that matches, a more advanced regex is applied to extract the email address, somehow removing stuff like ‘spam’.

Using normal JavaScript encoding doesn’t work anymore, for it isn’t that hard for a spider to recognize encoded strings and decode them, whether they are in JavaScript code or normal HTML escapes.

Therefore we need to get more inventive:

function JSBotProtect($text){
	$xorred = "0";
	$layer = "0";
	for($i = 0; $i < strlen($text); $i++){
		$layerbit = mt_rand(0, 255);
		$xorred .= "," . (string)(ord($text[$i]) ^ $layerbit);
		$layer .= "," . (string)$layerbit;
	}
	return <<<EOF
	<script type="text/javascript">
		var xorred = String.fromCharCode({$xorred});
		var layer = String.fromCharCode({$layer});
		var unxorred = "";
		for(i = 1; i < xorred.length; i++){
			unxorred += String.fromCharCode(
				xorred.charCodeAt(i)^layer.charCodeAt(i));
		}
		document.write(unxorred);
	</script>
EOF;
}

This PHP function returns a block of JavaScript code which stores a sensitive string, like an email address, in 2 parts which, when xorred with each other, result in the original email address.
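The same two-part split, sketched in Python to show the idea without the PHP and JavaScript plumbing. The names `split` and `join` are mine, and I use the `secrets` module for the random layer where the PHP version uses mt_rand.

```python
import secrets

def split(text):
    """Split `text` into two byte strings that look random on their own."""
    data = text.encode("utf-8")
    layer = secrets.token_bytes(len(data))           # one random byte per input byte
    xorred = bytes(b ^ k for b, k in zip(data, layer))
    return xorred, layer

def join(xorred, layer):
    """XOR the two parts with each other to recover the original text."""
    return bytes(a ^ b for a, b in zip(xorred, layer)).decode("utf-8")

xorred, layer = split("some.person@foo.bar")
print(list(xorred))          # different on every run
print(join(xorred, layer))   # some.person@foo.bar
```

Neither part contains anything that resembles an address, so a harvester has to actually run the script, or understand it, to get at the result.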

An implementation to get a mailto: link