Similarity Metrics and Clustering and Merging of Positions

Thu Dec 3 13:07:25 EST 2009

Hi ..

2009/12/3 Thomas von der Elbe <ThomasvonderElbe at gmx.de>:
> Hello Kai,
>
> welcome to the list!
>
>> What I'm currently missing here (maybe I just didn't see it) is any
>> discussion about the application of similarity metrics and clustering
>> algorithms when it comes to the problem of reducing a vast amount of
>> possible positions to a few that can be voted upon.
>>
>
> What an interesting idea! Even though I don't see the need to reduce the
> amount of different positions in order to vote upon them.
>
> I picture Votorola working like this: Everybody can have his own
> position, but because they are structured in trees, the vast amount of
> them will be arranged according to affinity. This will be done by the
> users themselves: they look for the best place to vote by going from the
> trunk of the tree further out to the leaves. This way they would only
> have to choose a few times (at each bifurcation) between a low number of
> alternatives. A difference-engine will help here by showing only the
> differences.

This is very similar to what I proposed, but it assumes that a single
hierarchy can be imposed onto the list of opinions. This is hardly a
valid assumption.

Both techniques rely on user feedback and analysis, so there is no
difference when it comes to that. Both take their intelligence from
the users, not from some kind of AI.

Assume a position is composed of many sub-positions. Let's denote
these with capital letters from A to Z.

Let's say there are five people who hold the following positions:

1.) ADFE
2.) FEDC
3.) ADKZ
4.) ZKAD
5.) DEKJ

Now, if these sub-positions are normalized (i.e. it is known that the
sub-position D of person 1 equals the sub-position D of person 2),
which is in itself already a bold assumption, then it might be
possible to create a tree from this by sorting the sub-positions. The
following list of positions would emerge:

1.) ADEF
2.) CDEF
3.) ADKZ
4.) ADKZ
5.) DEJK
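
A minimal sketch of that sorting step in Python, assuming a position is
simply a string of single-letter sub-positions (the names `normalize`
and `raw_positions` are made up for illustration, and the hard part,
deciding that two people's D really is the same D, is taken as given):

```python
# Raw positions as written down by each of the five people.
raw_positions = ["ADFE", "FEDC", "ADKZ", "ZKAD", "DEKJ"]


def normalize(position: str) -> str:
    """Sort the sub-positions so that the order they were written in no longer matters."""
    return "".join(sorted(set(position)))


normalized = [normalize(p) for p in raw_positions]
for number, position in enumerate(normalized, start=1):
    print(f"{number}.) {position}")
```

After sorting, persons 3 and 4 turn out to hold the same position.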

From this I could create the following trees:

Tree 1:

-A
-->D
----->EF
----->KZ
-C
-->DEF
-DEJK

Or the following tree (which is better in some ways but not others):

Tree 2:

-DEF
-->A
-->C
-ADKZ
-DEJK

Which tree do I choose? There are algorithms to create good trees
(look up decision tree classifiers).
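
To make the first kind of tree concrete: Tree 1 is essentially a prefix
tree over the sorted positions. Here is a rough sketch, assuming
positions are strings of sorted single-letter sub-positions that all
have the same length. `build_trie` and `render` are hypothetical
helpers, and this is only one of many ways to build a tree, not a
decision tree classifier:

```python
def build_trie(positions):
    """Insert each sorted position letter by letter into a nested dict."""
    root = {}
    for position in positions:
        node = root
        for sub_position in position:
            node = node.setdefault(sub_position, {})
    return root


def render(node, depth=0):
    """Return indented lines, merging chains of nodes that never branch."""
    lines = []
    for label in sorted(node):
        child = node[label]
        # Follow single-child chains so C -> D -> E -> F becomes "CDEF".
        while len(child) == 1:
            (next_label, next_child), = child.items()
            label += next_label
            child = next_child
        lines.append("  " * depth + "-" + label)
        lines.extend(render(child, depth + 1))
    return lines


positions = ["ADEF", "CDEF", "ADKZ", "ADKZ", "DEJK"]
tree = build_trie(positions)
print("\n".join(render(tree)))
```

This prints a slightly more compact variant of Tree 1: AD with EF and
KZ beneath it, followed by CDEF and DEJK. The result depends entirely
on the sort order, so a different ordering of the sub-positions gives a
different tree.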

The risk here is that the first positions entered steer the direction
of the debate far too much.

Now, look at the alternative using metrics and clustering:

I let people rank the similarity between documents. Assuming they are
good at this, they will find something along the lines of this metric
(strictly speaking a distance: 0 means equality, and 4 is complete
independence):

similarity(ADEF, CDEF) = 1
similarity(ADEF, ADKZ) = 2
similarity(ADEF, DEJK) = 2
similarity(CDEF, ADKZ) = 3
similarity(CDEF, DEJK) = 2
similarity(ADKZ, DEJK) = 2
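
For illustration only, here is a mechanical stand-in for those human
rankings: count the sub-positions that two positions do not share. This
is an assumption on my part about what a good ranking would roughly
look like, and the function name `distance` is made up. Real user
rankings need not agree with these counts:

```python
from itertools import combinations


def distance(a: str, b: str) -> int:
    """Count the sub-positions not shared: 0 means equal, 4 means nothing in common."""
    only_a = set(a) - set(b)
    only_b = set(b) - set(a)
    # Take the larger side so the result stays symmetric even if sizes differ.
    return max(len(only_a), len(only_b))


positions = ["ADEF", "CDEF", "ADKZ", "DEJK"]
for a, b in combinations(positions, 2):
    print(f"similarity({a}, {b}) = {distance(a, b)}")
```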

Now, using techniques like multidimensional scaling, I can try to find
virtual coordinates for each position, placing the positions in such a
way that their distances in that coordinate space approximate the
distances given by the user-generated similarity metric. With such a
picture, it's easier to see who stands where, to make out very similar
or very dissimilar positions, and to find where compromises might be
made. Using clustering techniques, I can even create clusters if 2
dimensions are not enough to visualize this.
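
A toy sketch of the scaling idea, under loud assumptions: the distances
are the count of unshared sub-positions (a stand-in for real user
rankings), and the layout is found by plain gradient descent on the
squared error, which is just the simplest thing that works here and not
necessarily what a statistics package would do. All names (`unshared`,
`stress`, `mds`) are hypothetical:

```python
import math
import random
from itertools import combinations


def unshared(a, b):
    """Stand-in distance: number of sub-positions the two do not share."""
    return max(len(set(a) - set(b)), len(set(b) - set(a)))


def stress(coords, dist):
    """Sum of squared gaps between layout distances and target distances."""
    total = 0.0
    for (i, j), target in dist.items():
        dx = coords[i][0] - coords[j][0]
        dy = coords[i][1] - coords[j][1]
        total += (math.hypot(dx, dy) - target) ** 2
    return total


def mds(names, dist, steps=2000, rate=0.01, seed=0):
    """Place each name in 2-D by nudging pairs toward their target distance."""
    rng = random.Random(seed)
    coords = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in names}
    for _ in range(steps):
        for (i, j), target in dist.items():
            dx = coords[i][0] - coords[j][0]
            dy = coords[i][1] - coords[j][1]
            current = math.hypot(dx, dy) or 1e-9
            # Pairs that are too far apart move together, too close move apart.
            pull = rate * (current - target) / current
            coords[i][0] -= pull * dx
            coords[i][1] -= pull * dy
            coords[j][0] += pull * dx
            coords[j][1] += pull * dy
    return coords


names = ["ADEF", "CDEF", "ADKZ", "DEJK"]
dist = {(a, b): unshared(a, b) for a, b in combinations(names, 2)}
layout = mds(names, dist)
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
print(f"remaining stress: {stress(layout, dist):.3f}")
```

The remaining stress will usually not reach zero, because these four
positions cannot be placed in a plane with all six distances exact. The
picture is an approximation, which is where clustering can help.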

>
> Another possible way to find your place to vote, would be through a
> classification-system, which shows clusters of positions suitable for
> you through tagging for example.

Tags might help in the normalization of sub-positions, but won't help
if a position can consist of multiple combined sub-positions, each
worthy of a tag.

> But the application you describe here, could maybe be used for the same
> purpose: as a search-engine for the most similar position to yours in
> the whole tree/forest (or even across different voting-engines).

bye,

Kai Londeberg