Functional, generic and scalable vote-counting implementation for delegational trees

Sun Jun 30 21:58:27 EDT 2013

Hi,

for some time I have been working with Clojure on a simple, yet powerful 
counting prototype for a flexible vote-engine. Finally I have enough 
experience with Clojure and could simplify my design (read: complected 
confusions) as thoroughly as possible to show that a very simple and highly 
adjustable algorithm for delegational vote trees is doable. Furthermore it 
proves the bottle-neck is memory-consumption of the voter-registry (and access 
to it), as already estimated. (1) I have reserved 3 GiB of heap memory for the 
JVM process. This scenario is only a first demo, if you have different 
requirements, let me know, and I will probably implement it.
It runs in the browser, too (just not as fast).

Legend:
; lines after a semi-colon are comments 
(Lisp expression is in parentheses)
=> result of previous lisp expression

Example scenario for a single poll (2)
==================================
; This is our transactional voter registry, starting with an empty map.
(def test-registry (atom {})) => Atom...

; Load more than 5 million votes equally distributed by 10 voters per
; candidate. Each letter in upper and lower case + one digit marks an 
; individual end-candidate and its voters. Each end-candidate holds
; 11111 votes.
; It takes ~45 secs here.
(time (swap! test-registry (fn [old] (reduce #(scenario1 %1 0 10000 %2) old
                                              "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"))))
=> {"a0" {:voter "a0" :cand nil :voters [{ ... } ...]} ... }

; Total > 5 million voters, this means persistent hash-map entries !
(count @test-registry)
=> 5201040

; Calculate end-candidate count, complexity in O(log n). Practically instant.
(count-votes @test-registry "c2") 
=> 11111
; Pick one of the leaf-voters.
(@test-registry "c20000") 
=> {:voter "c20000", :candidate "c2000", :voters #{}}

; create cycle
(swap! test-registry #(vote % "c2" "c20000"))
=> { ... {:voter "c20000", :candidate "c2000", :voters #{"c2"}} ... }
; count again, still the same ...
(count-votes @test-registry "c2") => 11111

Conclusions:

1. Clojure's STM and especially the JVM are ridiculously fast. (This code is 
totally unoptimized and bangs on immutable data structures with millions of 
nested entries. (I could easily optimize it for this scenario with transients 
for instance.))
2. We can count in memory for now until we reach millions of users per poll 
(then some distributed storage is necessary anyway). Still votes are a well-
reflected user input which will not happen in millions at once (read in less 
than a minute constantly). Even if it does, memory will be a constraint 
before. But any optimization here is premature, because scaling will hit other 
bottlenecks before. At any point we can snapshot the hash-map atomically, e.g. 
to store it in the database/json/*. 
3. Historical counting is easy and always doable if the votes are timestamped. 
Any calculated count/registry state can be shared by all following counts.
4. Further investigations should rather leverage the recent P2P features of 
browsers (3) and distributed data types to allow a completely distributed 
browser app (having agnostic code which can run on servers as well). This 
comes from the insight that convergence of the counts has not to happen 
quickly and that benefits of a central vote-counting are low performance-wise 
(because the memory limit hurts and suggests distributed servers). (4)
5. It does not reflect the complete output count-table of Votorola yet, still 
the other columns of the output table have little to do with counting imo. The 
registry can always be extended to additional attributes, since it is a hash-
map, but one can implement that separately. 

conseo

(1) http://zelea.com/w/User:Conseo-Polyc0l0rNet/Trees_of_Transactions
(2) https://github.com/ghubber/ptcount/blob/master/src/ptcount/core.clj
(3) https://github.com/openpeer/opjs
(4) e.g. mentioned here for Riak: 
http://christophermeiklejohn.com/coq/2013/06/11/distributed-data-structures.html