The (or at least a better) Solution to Spam

There is no easy way to distinguish between a human and a spambot. It’s an arms race in which we’ll always be behind. I’m talking here about spam in general: not only in e-mail, but also on, for instance, Wikipedia or in blog comments. And even if we had a perfect test for whether there is a human behind something, we would still have to combat cheap labour in India: real people spamming “by hand”.

I think the solution is a Web of Trust similar to that of PGP. An identity (not necessarily a person) publishes whom she trusts (not to be spammy/phishy) and whom she doesn’t. Ideally everyone would be in one big web. Only someone whom my blog trusts via-via may post.
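
A minimal sketch of that via-via check, assuming a toy graph format and an arbitrary hop cut-off (neither is part of any real spec): someone may post if they are reachable through trust edges from my own identity.

```python
from collections import deque

# A hypothetical web of trust: each identity lists whom it trusts.
trusts = {
    "my-blog": {"alice", "bob"},
    "alice":   {"carol"},
    "bob":     set(),
    "carol":   {"dave"},
}

def may_post(me, visitor, max_hops=3):
    """Breadth-first search: is `visitor` trusted via-via,
    within `max_hops` hops from `me`?"""
    queue = deque([(me, 0)])
    seen = {me}
    while queue:
        identity, hops = queue.popleft()
        if identity == visitor:
            return True
        if hops < max_hops:
            for trustee in trusts.get(identity, ()):
                if trustee not in seen:
                    seen.add(trustee)
                    queue.append((trustee, hops + 1))
    return False

print(may_post("my-blog", "dave"))     # True: my-blog -> alice -> carol -> dave
print(may_post("my-blog", "mallory"))  # False: not in the web at all
```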

Obviously, one may still gather people’s trust over time, enter the web of trust, and only then start spamming with that identity. However, that identity will then be marked untrusted by others, and the people who initially marked it as trusted will themselves be trusted less. Also, there are far more sophisticated measures of establishment in the web of trust conceivable than just being trusted via-via by a single identity.

There is no way to prevent spam perfectly, but the amount of work that has to go into making an identity trusted and established in the web is several orders of magnitude greater than with any other protection we have. The big problem: we don’t have such a ubiquitous web of trust yet. (Yes, it’ll be in SINP, if I ever get around to working on it.)

10 thoughts on “The (or at least a better) Solution to Spam”

  1. There are enough networks on which to build this. SINP, yes, but also OpenID, for instance. Or just plain e-mail addresses.
    On top of this network you’ll have to build a ‘karma’ system which allows a basic but effective indication of the level of trust. There is one problem to solve, though: how to make it decentralised.
    For example, if a new person tries to reply on your weblog, how does your weblog know which server to ask whether the new guy can be trusted? I guess it could broadcast a request to all trusted entities it knows and wait for a reply. But what if the via-via network extends over several hops?
    Interesting thought, though. 🙂

  2. Karma is no good, for it is someone else’s unweighted addition.

    For instance, if someone whom I trust very well trusts someone else, then I trust that person more than if he were trusted by some stranger. With karma there would have been no difference.

    To reiterate: it’s not about how many people trust you or how much they trust you, for you shouldn’t trust just anyone. It’s about how many of the people you trust (directly or indirectly) trust someone.

  3. I understand. Perhaps karma is not the right word to describe it. But it should count for something if a certain person you don’t directly trust is trusted by more than one of your direct ring of ‘trustees.’

    By the way, this reminds me of something like the Erdős number, except it has nothing to do with writing articles. Let’s call it a Relative Trust ID (or Number). The person himself has an RTI/N of 0. Every person he/she trusts receives an RTI/N of 1. People you don’t directly trust, but who are trusted by the people with an RTI/N of 1 (i.e. the people you trust), receive an RTI/N of 2. This goes on and on, and the number keeps incrementing for every ‘hop.’ (See the sketch below.)
    The number is relative, which is important: someone with an RTI/N of 1 for you can have an RTI/N of 3 for someone else.

    This is a simple system which can be built on top of almost any (reliable) identity system (e.g. e-mail, SINP or OpenID).

    Maybe something for a CodeYard project? 🙂
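
    A quick sketch of how such a Relative Trust Number could be computed. The graph format and the names are made up for illustration; the function simply applies the hop-counting rule described above:

    ```python
    from collections import deque

    def relative_trust_numbers(trusts, me):
        """RTI/N of every reachable identity: 0 for yourself, 1 for
        the people you trust, and +1 for every further hop."""
        rtin = {me: 0}
        queue = deque([me])
        while queue:
            identity = queue.popleft()
            for trustee in trusts.get(identity, ()):
                if trustee not in rtin:  # shortest path wins
                    rtin[trustee] = rtin[identity] + 1
                    queue.append(trustee)
        return rtin

    trusts = {"you": {"alice"}, "alice": {"bob"}, "bob": {"you"}}
    print(relative_trust_numbers(trusts, "you"))
    # {'you': 0, 'alice': 1, 'bob': 2} -- relative: computed anew per person
    ```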

  4. What if I create 100 ‘fake’ identities and let them all increase the karma of one identity? Has that identity become any more trustworthy? You’ve got no way to know whether those 100 identities are genuine or were just made to raise someone else’s karma.

    You get the idea of a web of trust. But there are a few more factors to take into account. It isn’t just “I trust him” or “I don’t”; there should be lots of gray values, such as “I trust him marginally.” Also take into account that someone who is trusted indirectly via two different paths is thereby more trustworthy.

    An easy way to implement this in SINP would be to associate a PGP key with your SINP identity (abusing the e-mail field of PGP for that). Add a node to the public id doc with which you claim your SINP address by including a PGP signature. Let’s call this node the ‘claimnode’. Then include, for each of the people you trust, their claimnode and how much you trust them. (See the sketch at the end of this comment.)

    I don’t see an easy way to do this with e-mail or with the existing keyserver infrastructure. Keyservers aren’t designed to be trustworthy. Although they can’t manipulate any data, for it is all secured by PGP, they can delay the publication of keys. If you published your trustees, a malicious keyserver could refuse to publish them.

    Oh, and OpenID sucks so bad I wouldn’t even consider it.

    The ideal solution, though, would be a dedicated network in which there is no need to trust any server (as you still have to with SINP, for instance). See The Future of the Internet and Kademlia if you are interested.

    Yeah, it certainly could be a CodeYard project. First I’ve got to find some spare time to work on this or on any of my other ideas (e.g. Filesystem, Graph compiler, etc.).
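
    As a sketch, here is what such a public id doc with a claimnode and trust entries might look like. This format is purely hypothetical; every field name is invented for illustration and nothing like it is defined by SINP:

    ```python
    # Hypothetical shape of a SINP public id doc with a 'claimnode'.
    # All field names are invented for illustration.
    id_doc = {
        "address": "alice@sinp.example",
        "claimnode": {
            # the key is identified by its own hash (fingerprint)
            "pgp_fingerprint": "...99BA289B",
            # detached PGP signature over the SINP address above,
            # proving the key holder claims this identity
            "signature": "-----BEGIN PGP SIGNATURE-----...",
        },
        "trust": [
            # claimnode (fingerprint) of each trustee plus a trust
            # level; here a gray value between 0 and 1
            {"claimnode": "<fingerprint of Bob's key>", "level": 0.9},
            {"claimnode": "<fingerprint of Carol's key>", "level": 0.3},
        ],
    }
    ```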

  5. Karma wasn’t what I meant. Forget about it already. 😉

    The relative trust idea can’t be abused that way. Even if you have 100 identities that somehow try to boost trust levels, there is no way you will be harmed, because they would first have to earn a trust level relative to you (which means they have to get into your trust network as well).

    I’ve seen the talk by Van Jacobson, which was really interesting. I agree with you that this trust system must work decentralised; actually, that’s the only effective way to build truly relative trust. I am not very familiar with decentralised protocols, but it sounds like a good solution.

    By “building on existing identity systems” I did not mean that this trust system should be integrated into any of them. I meant that there must be something that can identify people on the internet fairly reliably. An IP address is not good enough, because it is subject to change that is usually controlled by the ISP rather than the user. E-mail addresses are a decent identifier: people rarely change their address, and if they do, it is their own decision. The drawback of such a change is that the trust network has to be built all over again (for that particular person).

  6. OK, then your relative trust idea sounds pretty damn close to my indirect trust :).

    Real decentralised protocols aren’t that difficult. Read the Wikipedia article about Kademlia and you’ll see it’s simple.

    There is a little issue with e-mail (and with SINP, for that matter): you can’t trust DNS. The way to avoid DNS spoofing is to use SSL certificates. Certificates issued by a CA, a central authority. Not really distributed, and therefore not really safe.

    As far as I know there is only one way to have unspoofable identifiers: to use hashes. Of course, it is only safe if the hash covers all the data you request.

    PGP works this way. 99BA289B is the last part of the hash of my PGP key. It identifies my key: no one can hand you a different key when you request 99BA289B and make you believe it is mine, because you can hash the key you receive and check whether that hash also ends in 99BA289B. (See the sketch at the end of this comment.)

    Everyone should get PGP or we should hope that the Future of the Internet is near.
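
    A simplified sketch of that check, assuming SHA-256 over the raw key bytes rather than the real OpenPGP fingerprint algorithm, and with a hypothetical lookup function:

    ```python
    import hashlib

    def fetch_key(key_id):
        """Hypothetical lookup: ask an untrusted server for key material."""
        raise NotImplementedError

    def key_matches(key_material, key_id):
        # The identifier is a suffix of the hash of the key itself, so
        # an untrusted server cannot substitute a different key without
        # the mismatch being detected here.
        digest = hashlib.sha256(key_material).hexdigest().upper()
        return digest.endswith(key_id.upper())

    # key = fetch_key("99BA289B")
    # assert key_matches(key, "99BA289B"), "server returned a forged key"
    ```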

  7. My idea of relative trust is different from yours because it only cares about ‘yes’ or ‘no.’

    “Do you trust that person, yes or no? Do you trust this website, yes or no?”

    The difference in weight is created by the ‘hops.’

    “I don’t know this guy, but maybe someone else trusts him. I won’t trust him as much then, though.”

    In my opinion, having options like “I trust that person a little bit” makes the concept too complicated. The number of hops and the number of possible ‘links’ to a person should give enough information to compute such a level.

    Using something like PGP for extra security seems like a good idea, but it might hurt the adoption of such a system (unless it really isn’t too complicated). The power of a system like this is, of course, that it will be widely adopted.

  8. I think a system with just a yes or a no will not work that well: it’s too hard to draw the line between when to trust someone and when not to. More importantly, you’ll have people who are too naive or simply too paranoid.

    Working with a value between 0 and 1 (0 = barely trusted, 1 = ultimately trusted; of course an end user will be shown a few options instead of being asked to enter the number manually) does make it a bit more complex, but it is still very much manageable.

    I think the main issue lies in how to balance the formula. How much should the weight of a trust statement diminish over a few hops? And how much weight should be added by having multiple (separate) paths?

    I think the solution is to let the user pick. Give him a slider to set some of these variables. Obviously users won’t be asked for the real values, but will see something like a scale from “barely trust far-away, indirectly trusted identities” to “fully trust far-away, indirectly trusted identities”. (See the sketch below.)

    Hm, that makes me itch to actually write a specification for it.
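
    A sketch of that hop-based decay with a user-tunable parameter. The graph format and the combination rule are assumptions of mine, not a spec; `decay` is what the slider would control:

    ```python
    # decay = 1.0: distant indirect trust counts fully;
    # decay = 0.0: only direct trust counts.

    def path_trust(path_levels, decay=0.5):
        """Trust along one path: the product of the edge trust levels,
        discounted by `decay` for every hop after the first."""
        trust = path_levels[0]
        for level in path_levels[1:]:
            trust *= level * decay
        return trust

    print(path_trust([0.9]))            # direct trust:          0.9
    print(path_trust([0.9, 0.8]))       # two hops, default:     0.36
    print(path_trust([0.9, 0.8], 1.0))  # slider at the maximum: 0.72
    ```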

  9. You’ll end up throwing in a bunch of calculations anyhow. By the way, what’s the significance of a multi-level trust indication?
    At least as far as spam goes, the binary solution should be effective enough. It’s also a lot more agile and convenient, in my opinion.

    What if you have contradicting sources? One says the person can be trusted, the other says no way. This is complicated enough to handle with two options already, but how would you do this when there are multiple intermediate options?

    By the way, if you were to write a specification, I think you could draw half of the information from the comments here. This is starting to look more like a discussion forum. 😛

  10. Although I don’t know whether to make 0 neutral or untrusted, it really doesn’t make that much of a difference for the calculations. When several persons say something about a certain individual, the resulting trust is the average of the trust they give, weighted by the trust I give them.

    Of course, if one wanted, one could use one’s own algorithm; the system itself doesn’t care which algorithm I use. For instance, if a lot of people don’t like the behaviour described above, they’ll create a version that uses the weighted minimum of the trusts instead of the weighted average.

    Or, even better: not only output someone’s trust, but also let the user know about the amount of indirectness and the consensus behind it. (Both rules are sketched below.)

    I think that the two solutions described above for contradicting sources are both cleaner than anything you’d have to invent for “binary” trust.

    Discussions, luckily, are everywhere :).
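
    A sketch of the two combination rules mentioned above for contradicting sources. Each opinion is a pair of (my trust in the source, the source’s trust in the subject); both rules and the data format are illustrative assumptions:

    ```python
    def weighted_average(opinions):
        """The default rule: average the sources' verdicts, weighted
        by how much I trust each source."""
        total_weight = sum(weight for weight, _ in opinions)
        if total_weight == 0:
            return 0.0  # no trusted sources, so stay at the bottom
        return sum(w * t for w, t in opinions) / total_weight

    def weighted_minimum(opinions):
        """A stricter alternative: the subject is only as trusted as
        the most damning weighted opinion about him."""
        return min(w * t for w, t in opinions)

    opinions = [(0.9, 1.0), (0.5, 0.0)]  # one source vouches, one objects
    print(weighted_average(opinions))    # ~0.64
    print(weighted_minimum(opinions))    # 0.0
    ```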
