Unique followers on Twitter

As pointed out by Jorg Kennis, Twitter‘s new lists feature make it hard to determine the amount of unique followers. I’ve written a simple script, using a slightly modified TweePy, to determine the amount of unique followers.

Example usage:

bas@w-nz ~/twitter-unique-followers $ python twitter-unique-followers.py JorgK -ubwesterb
Password:
rate_limit_status: remaining_hits: 112
Jorg Kennis
 followed directly by 182
 in lists
  RPtje/vriendenbekenden subscribed to by 0
  JorgK/TechNL subscribed to by 0
  sentfanwyaerda/nijmegen1 subscribed to by 9
  JorgK/Friends subscribed to by 0
  sjorsjes/Community subscribed to by 0
  robinspeijer/iPhone subscribed to by 1
  nielsschooneman/iPhone subscribed to by 0
  JeanPaulH/iPhoneclub subscribed to by 5
number of unique followers: 198

You can download it here. I could write a simple webpage with the same functionality, if anyone would mind.

Normul

Normul normalizes URLs. It expands shortened URLs:

>>> from normul import normul
>>> normul('http://bit.ly/1I4VQ')
{'type': 'other', 'normalized': 'http://www.shinguz.ch/MySQL/mysql_mv.html', 'original': 'http://bit.ly/1I4VQ'}

And shows useful links for hosted-images:


>>> normul('http://yfrog.com/6c5krj')
{'image': {'full': 'http://img228.imageshack.us/img228/1079/5kr.jpg', 'thumbnail': 'http://img228.imageshack.us/img228/1079/5kr.th.jpg'}, 'type': 'image', 'original': 'http://yfrog.com/6c5krj', 'normalized': 'http://yfrog.com/6c5krj'}

You can find the simple but convenient sourcecode here.

Cantor never bores (1)

Given a set of countable sets K, such that K is totally ordered by inclusion, videlicet for every A,B\in K either A\subseteq B or A\supseteq B. Intuitively, for at every step in this chain one element at least must be added, one expects the set K to be countable as well.

Suppose K is countable. Then the union, \bigcup K is a countable union of countable sets, hence countable. (Suppose k: \mathbb N \to K is an enumeration of K and f_i: \mathbb N \to k(i) enumerations of the elements of the chain. Then f_0(0), f_1(0), f_0(1), f_2(0), f_1(1), f_0(2), \ldots enumerates \bigcup K.)

Thus \bigcup K is an upper bound of K. In the poset of countable subsets of some set U, of which \bigcup K is a subset, every non-empty chain has an upper bound. Hence, using Zorn’s lemma there is a maximal element, say M.

Suppose U is uncountable, then there exists a \star \in U \backslash M. M \cup \{\star\} is most definitely also countable and M \subset M \cup \{\star\} which contradicts M’s maximality. We are forced to conclude that there exists an uncountable chain of countable sets.

Cantor’s set theory keeps surprising.

Update: an example of such a chain is the set of the countable ordinals.

Another update: a “more concrete” example are the downsets in \mathbb Q without the empty set and \mathbb Q itself. These downsets correspond to real numbers, see Dedekind Cuts.

PijsMarietje

At the faculty for sciences there are canteens for students. In each of these, there’s sound equipment connected to linux boxes. On each of those linux boxes, we run a music-request-server called Marietje. I just finished writing a front-end in Javascript. It wasn’t a pain. As instead, the use of jQuery was a bliss.

The frontend for one of those boxes and the source code (see the ajax folder).

Damned DOM (1)

When I wanted to react to any changes to a input textbox immediately, my first instrinct was to use onChange. onChange, however, is called when the input loses focus. onKeyPress then? Isn’t called on backspaces. onKeyDown, maybe? It does get called, but the effect of the keystroke isn’t yet applied, for the return value determines whether it that is done in the first place. (Same story for onKeyPress by the way.) onKeyUp does work a bit, except if someone is holding down a single key, for a while.

The solution: hook onKeyUp and use setTimeout with a timeout of 0. Yugh. I hate DOM.

Big Fat Disclaimer: I actually tested this only on one browser.

Unicode to ASCII (1)

When I want to generate usernames from real names, which can contain non-ascii characters, you can’t simply ignore the unicode characters. For instance, danielle@blaat.org is the right e-mail address for DaniŽlle, danille@blaat.org isn’t.

There’s trick. Unicode has got a single code for Ž itself, but it has also got a code which (simplified) adds ® on top of the previous character. The unicode standard defines a normal form in which (at least) all such characters, which can be, are represented using such modifiers. If you then simply ignore the non-ascii representable codes, you’ll get the desired result.

In python: unicodedata.normalize('NFKD', txt).encode('ASCII', 'ignore').

However, this isn’t the right solution. For instance, in german, one prefers ue as a replacement of Ł over u.

Django annoyances: no reverse select_related

Consider

for page in Page.objects.all():
  print page.title
  for comment in page.comments.all():
    print comment
.

There will be a single query to fetch all pages, but there will be for every page another query to fetch its comments. Luckily, Django has got a nice trick up its sleave: select_related. Would I use instead of Page.objects.all(), Page.objects.select_related('comments').all() then Django will use a single joined query to prefetch comments for each page.

However, Django’s select_related only supports forward one-to-many references. No many-to-many; certainly no reverce many-to-many; no reverse one-to-many and no, not even reverse one-to-one (yet). A developer claims it’s impossible (which is bullshit), another asks for patches, which means he doesn’t care doing it himself.

It’s quite easy to manually code around the missing reverse select_related, but it takes too many ugly lines compared to the single word it could’ve been.

GStreamer: accurate duration

When decoding, for instance, a variable-bitrate MP3, gstreamer reported durations are, to say the least, estimates. I’ve tried to get a better result in a few ways. First off, some files yield a duration tag, but even if you’re lucky and it is there, there are no guaranties about precision. After that I tried seeking to the end (GST_SEEK_END) of the stream and querying the position, which gstreamer didn’t like. Finally, routing the audio into a fakesink, waiting for the end of stream and then querying for the position gives the right result. It’s not the prettiest method, but it works.

This is a Python script that prints the duration of a media to stdout.

Aperitif for order

Assign 1 to True and 0 to False. Now the minimum corresponds to “and” and maximum to “or”. If you give it a bit more though, less or equal to corresponds to implication. This is a lot more general than this specific case. Add .5 for a third value (eg. NULL) and it still yields natural results.

We can recognize the behaviour of minima and maxima in a lot of other things. Take for instance set inclusion as order with intersection as minimum and union as maximum. Actually, the link between general order and set inclusion is frequently made to then propose that “intersection of two set of cases” and “logical and” do look a lot alike.

This is just the tip of the huge iceberg. Order appears everywhere! Everywhere in Math. Everywhere in CS. And its just recognizing simple order I’ve demonstrated. Other useful concepts in order theory that I didn’t even touch are Galois Connections and formal concept analysis.

Oh, another example of an order are integers with bitwise or and bitwise and. It is left as an exercise to the reader when one integer is greater than another.

Interested? Buy an Introduction to Lattices and Order.

On-demand singleton for Python

Some singletons eat slightly more resources, than you want to give them for free. For instance, if you have a home-brew threadpool singleton, you don’t want it to create its threads if you are not going to use it. The solution: a simple function that creates a stub which proxies attribute access to an ad-hoc created instance.

Usage: create_ondemand_singleton('mylibrary.Threadpool', MyThreadPoolClass).

The Cuckoo Hashtable

A Hashtable algorithm is a specific algorithm to implement key-value pair datastructure with efficient by-key look-ups using hashing of the keys. A hashtable contains a list of buckets. In a simple implementation, the i-th bucket, contains the key-value pairs in a list of which the key has a hash that is i modulo the amount of buckets. The hash-table would increase the amount of buckets if any bucket contains more than a fixed amount of pairs. This results in a constant-time look-up, but an insertion might invoke a expensive rebuild.

There are a few variations on Hash Tables. I’d like to share a really smart one: the Cuckoo Hashtable.

The Cuckoo hashtable expects two different hash-functions for the keys. Instead of storing a list of pairs in each bucket, the Cuckoo Hash table stores a single pair in each bucket. When inserting a pair, it is inserted in one of the two possible locations. If it happens to be occupied, the old pair is replaced. Then the replaced pair is inserted at its other possible location, potentially kicking out another pair. This step is repeated, until there is no displaced pair or a loop is detected. When a loop is detected, the hash-functions can be changed, if the buckets are for the most part empty -or- when the table is almost full, the amount of buckets can be increased. It can be shown that an insertion has a amortized constant time. (The buckets can be resized in-place, and each entry is repositioned as if it was displaced by another.)