Similarity Metrics and Clustering and Merging of Positions

klondenberg kai.londenberg at googlemail.com
Sat Nov 28 10:41:21 EST 2009


Hi ...

I'm new on the mailing list, but I'm going to spare my introduction
for later and dive right in :)

What I'm currently missing here (maybe I just didn't see it) is any
discussion of similarity metrics and clustering algorithms applied to
the problem of reducing a vast number of possible positions to a few
that can be voted on.

A position can probably be described by a text.

Common metrics describing the similarity of texts are:

- The number of differing lines.
- A Pearson similarity score of the word vectors of the text
documents.

But these metrics are no good for, say, lawmaking, where a single very
common word like "not" can make all the difference. Only a human can
really judge how similar two positions are.
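To make the two text metrics above and the "not" problem concrete, here is a minimal Python sketch. The function names `differing_lines` and `pearson_word_similarity` are hypothetical. Words are split on whitespace without stemming. The Pearson score is computed over the union of both texts' vocabularies; a real system might tokenize differently. The example inserts a single "not" that reverses the meaning of a rule. Only one line changes, and the word vectors stay strongly correlated.

```python
import difflib
import math
from collections import Counter


def differing_lines(a: str, b: str) -> int:
    """Count lines removed from `a` plus lines added in `b`, per a line diff."""
    diff = difflib.ndiff(a.splitlines(), b.splitlines())
    # ndiff prefixes removed lines with "- " and added lines with "+ ";
    # unchanged lines ("  ") and intraline hints ("? ") are ignored.
    return sum(1 for line in diff if line.startswith(("- ", "+ ")))


def pearson_word_similarity(a: str, b: str) -> float:
    """Pearson correlation of the word-count vectors of two texts."""
    counts_a = Counter(a.lower().split())
    counts_b = Counter(b.lower().split())
    vocab = sorted(set(counts_a) | set(counts_b))
    if not vocab:
        return 1.0  # two empty texts are trivially identical
    x = [counts_a[w] for w in vocab]
    y = [counts_b[w] for w in vocab]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for constant vectors; treat identical
        # vectors as fully similar and anything else as unrelated.
        return 1.0 if x == y else 0.0
    return cov / math.sqrt(var_x * var_y)


# One inserted "not" reverses the meaning of the first rule ...
old = "the city shall raise the tax\nthe city shall publish the tax rate"
new = "the city shall not raise the tax\nthe city shall publish the tax rate"

# ... yet only one line differs (counted once as removed, once as added)
# and the word vectors remain strongly correlated.
print(differing_lines(old, new))                    # prints 2
print(round(pearson_word_similarity(old, new), 2))  # prints 0.96
```

Both numbers suggest the two texts are nearly the same, although they say opposite things.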

Now, of course, it's impractical to let everyone vote on the
similarity of every pair among 1000 different positions. But it might
be practical to show every delegate a selection of 10 or so randomly
picked positions and let him rate how similar each one is to his own.
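Here is a minimal sketch of the sampling step. It assumes each delegate currently holds exactly one position and that the ratings themselves are collected elsewhere. The names `assign_rating_tasks` and `own_position`, and the delegate names, are hypothetical.

```python
import random


def assign_rating_tasks(own_position, positions, k=10, seed=None):
    """Pick up to k random positions, other than their own, for each delegate.

    own_position maps each delegate to the position they currently hold.
    """
    rng = random.Random(seed)
    tasks = {}
    for delegate, own in own_position.items():
        others = [p for p in positions if p != own]
        # sample() draws without replacement, so no position is shown twice.
        tasks[delegate] = rng.sample(others, min(k, len(others)))
    return tasks


positions = [f"position-{i}" for i in range(1000)]
own_position = {"alice": "position-0", "bob": "position-1", "carol": "position-2"}
tasks = assign_rating_tasks(own_position, positions, k=10, seed=42)
for delegate, sample in tasks.items():
    print(delegate, sample[:3])
```

Uniform sampling is the simplest choice. A real system might instead favour positions that have received few ratings so far.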

Maybe this step even needs a second iteration for some positions, but
I guess it would be possible to create a reasonably complete metric
just from a few samples.
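The post does not say how to complete the metric. One possible way, assumed here, has two steps. First, average repeated ratings of the same pair. Second, estimate each unrated pair from the best two-step path through a third position, taking the weaker of the two links (a max-min composition). Ratings are assumed to be numbers in [0, 1], and `build_similarity` is a hypothetical name. Pairs with no connecting path get 0.0, meaning "no evidence yet"; those are natural candidates for a second rating round.

```python
from collections import defaultdict
from itertools import combinations


def build_similarity(positions, ratings):
    """Return {frozenset({a, b}): similarity} for every pair of positions.

    ratings is an iterable of (a, b, score) tuples with score in [0, 1].
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for a, b, score in ratings:
        if a == b:
            continue  # self-ratings carry no information
        pair = frozenset((a, b))  # unordered: (a, b) and (b, a) are one pair
        sums[pair] += score
        counts[pair] += 1
    rated = {pair: sums[pair] / counts[pair] for pair in sums}

    sim = dict(rated)
    for a, b in combinations(positions, 2):
        pair = frozenset((a, b))
        if pair in rated:
            continue
        # Estimate from directly rated links only: a path a-c-b is only as
        # strong as its weaker link, and we keep the strongest such path.
        best = 0.0
        for c in positions:
            if c == a or c == b:
                continue
            ac = rated.get(frozenset((a, c)))
            cb = rated.get(frozenset((c, b)))
            if ac is not None and cb is not None:
                best = max(best, min(ac, cb))
        sim[pair] = best  # 0.0 means "no evidence yet"
    return sim


ratings = [("A", "B", 0.8), ("B", "A", 1.0), ("B", "C", 0.6), ("A", "D", 0.1)]
sim = build_similarity(["A", "B", "C", "D"], ratings)
for pair, score in sorted(sim.items(), key=lambda item: sorted(item[0])):
    print(sorted(pair), round(score, 2))
```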

Using this metric, it would be possible to apply clustering algorithms
to form working groups of delegates, who would then try to reduce the
number of positions within their group by merging them.
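The post does not name a specific clustering algorithm. One simple choice, given a complete pairwise similarity table, is average-linkage agglomerative clustering. It repeatedly merges the two groups whose members are, on average, most similar, and stops once no pair of groups is similar enough. The table format (frozenset keys, scores in [0, 1]), the name `cluster_positions` and the `threshold` of 0.5 are assumptions for illustration. Each resulting cluster would become one working group.

```python
from itertools import combinations


def cluster_positions(positions, sim, threshold=0.5):
    """Average-linkage agglomerative clustering on a similarity table.

    sim maps frozenset({a, b}) to a similarity in [0, 1] for every pair.
    Returns a list of clusters, each a list of positions.
    """
    clusters = [[p] for p in positions]

    def linkage(c1, c2):
        # Mean similarity over all cross-cluster pairs.
        scores = [sim[frozenset((a, b))] for a in c1 for b in c2]
        return sum(scores) / len(scores)

    while len(clusters) > 1:
        (i, j), best = max(
            (((p, q), linkage(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break  # no two groups are similar enough to share a working group
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # combinations() yields i < j, so index i stays valid
    return clusters


pairs = {("A", "B"): 0.9, ("C", "D"): 0.8, ("A", "C"): 0.1,
         ("A", "D"): 0.2, ("B", "C"): 0.1, ("B", "D"): 0.0}
sim = {frozenset(k): v for k, v in pairs.items()}
print(cluster_positions(["A", "B", "C", "D"], sim))  # prints [['A', 'B'], ['C', 'D']]
```

The threshold controls how similar positions must be before their holders are asked to merge. A lower value produces fewer, larger working groups.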

Repeat these steps (similarity metric, cluster, merge) as often as
necessary and a few distinct but strong positions will emerge.
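The repeat-until-done loop can be sketched as a small driver that takes the three steps as functions. The signatures, the `target` number of positions and the round limit are all assumptions. In practice the rating and merging would be done by people, so the stand-ins below are purely hypothetical. The driver also stops early when a round groups nothing together, to avoid looping without progress.

```python
def consolidate(positions, rate, cluster, merge, target=3, max_rounds=10):
    """Repeat rate -> cluster -> merge until few enough positions remain.

    rate(positions) returns a similarity table, cluster(positions, sim)
    returns a list of groups, and merge(group) returns one combined position.
    """
    for _ in range(max_rounds):
        if len(positions) <= target:
            break
        sim = rate(positions)
        groups = cluster(positions, sim)
        if len(groups) == len(positions):
            break  # nothing was grouped this round; stop instead of looping
        positions = [merge(group) for group in groups]
    return positions


# Toy stand-ins (hypothetical): pair up neighbours and keep the first of each pair.
result = consolidate(
    list(range(8)),
    rate=lambda ps: None,
    cluster=lambda ps, sim: [ps[i:i + 2] for i in range(0, len(ps), 2)],
    merge=lambda group: group[0],
    target=2,
)
print(result)  # prints [0, 4]
```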

Any opinions on this?

Kai Londenberg







More information about the Votorola mailing list