Proposed rationalization of vote verification architecture

Thu Apr 4 05:32:23 EDT 2013

Hi all,

This is an effort towards (for now informal) standardization of the mirroring
architecture, and towards a modular (as in swappable) count-engine for
Votorola (both involve making the vote-data flow tool-agnostic and
standardized). Feedback wanted!

Michael Allan wrote:
> Conseo had some ideas about how vote mirroring and the verification
> architecture might marry.  We discussed these and sketched a rough
> design together, which I've appended to the old verification document:
> http://zelea.com/project/votorola/a/count/verification.xht#rationalization
> 
> Please review C, correct if needed.

We have had some discussion on Mumble about how to break the counting process
apart so that it can be freely determined: first by developers of alternate
versions, later by voters themselves, who could express programmatically what
their vote means and potentially how the votes of their own voters flow
through their node. By pinning down the data involved, we have also covered
the use case of verifying the resulting counts: it is the same process except
for the publication of the vote-counts, and it can be considered a first step
towards running your own vote-server. See Mike's comment and documentation
above.

Requirements for a vote mirroring network
=========================================

To allow different algorithms and code-bases to run on the mirrored network of
vote-servers, we propose the following counting pipeline, from processing of
raw user data to the counted result:

mirroring network                  individual vote-server

                                   translation
peers on internet <--> p2p cache -------------> local cache ---> count-engine
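To make the right-hand (local) side of the pipeline concrete, here is a
minimal Python sketch under loudly stated assumptions: a raw datum is a plain
dict with `voter`, `candidate` and `timestamp` keys, and the names
`translate`, `count_transitive` and `run_pipeline` are hypothetical. The
count-engine shown is a toy that carries each vote along a chain of votes; it
is not Votorola's actual algorithm, only an illustration that any function
with the same signature could be swapped in.

```python
from collections import Counter
from typing import Any, Callable, Dict, Iterable

# Hypothetical raw datum from the p2p cache, e.g.
# {"voter": "a", "candidate": "b", "timestamp": 1}
RawDatum = Dict[str, Any]

# A count-engine is any function from the local cache to a tally, so
# engines can be swapped without touching the other pipeline stages.
CountEngine = Callable[[Dict[str, str]], Counter]


def translate(raw: Iterable[RawDatum]) -> Dict[str, str]:
    """Translation step: build a tool-specific local cache (voter -> candidate).

    Data are applied in timestamp order, so a newer datum from the same voter
    supersedes an older one in this local view only; the raw network data
    themselves are never modified.
    """
    cache: Dict[str, str] = {}
    for datum in sorted(raw, key=lambda d: d["timestamp"]):
        cache[datum["voter"]] = datum["candidate"]
    return cache


def count_transitive(votes: Dict[str, str]) -> Counter:
    """Toy count-engine: each vote adds 1 to every node along its chain.

    A chain ends at a candidate who casts no vote, or just before it would
    revisit a node (cycle guard).
    """
    tally: Counter = Counter()
    for voter in votes:
        seen = {voter}
        node = votes[voter]
        while node not in seen:
            tally[node] += 1
            seen.add(node)
            if node not in votes:
                break
            node = votes[node]
    return tally


def run_pipeline(raw: Iterable[RawDatum], engine: CountEngine) -> Counter:
    """Local side of the pipeline: translation, then the chosen count-engine."""
    return engine(translate(raw))
```

For example, with raw data a→b, b→c, and d→b later changed to d→c,
`run_pipeline(raw, count_transitive)` credits c with 3 votes (from a via b,
from b, and from d) and b with 1, while swapping in a direct-only engine such
as `lambda votes: Counter(votes.values())` credits c with 2 and b with 1.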

(The mirroring network can be considered, if you like, as a BitTorrent swarm
that mutually shares its voters' raw data and pulls a feed of new data
torrents from its individual mirror selection. This guarantees immutability
and cryptographic integrity of the data for free, and avoids dependencies on
high-reliability cloud storage. Different (p2p) techniques can be combined to
distribute data, although BitTorrent seems to be a good starting point. All we
need to do is organize copying for read access.)
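The integrity guarantee rests on content addressing: a datum is named by a
cryptographic hash of its bytes, so a copy fetched from any peer can be
checked locally without trusting that peer. A minimal sketch, assuming
SHA-256 over a canonical JSON encoding (BitTorrent has its own piece-hashing
scheme; the encoding and the function names here are hypothetical):

```python
import hashlib
import json
from typing import Any, Dict


def canonical_bytes(datum: Dict[str, Any]) -> bytes:
    """Serialize a datum deterministically (sorted keys, no extra whitespace)."""
    return json.dumps(datum, sort_keys=True, separators=(",", ":")).encode("utf-8")


def datum_id(datum: Dict[str, Any]) -> str:
    """Content address: the hex SHA-256 digest of the canonical encoding."""
    return hashlib.sha256(canonical_bytes(datum)).hexdigest()


def verify(expected_id: str, fetched: Dict[str, Any]) -> bool:
    """Accept a copy fetched from an untrusted peer only if its hash matches."""
    return datum_id(fetched) == expected_id
```

Because the encoding is canonical, two peers holding the same datum compute
the same identifier regardless of key order, and any altered copy is rejected.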

Design consequences
===================

* It is enough to make data accessible to the network; p2p support can
actually be handled by an intermediate proxy or a friendly mirror. So the
minimum requirement to participate in the network is to give one peer
(a super-seeder) in the p2p network sufficient bandwidth and free access to
the data. (This can be done from home for small to medium servers.)

* All data distributed by the network is historical (timestamped), in
particular each distinct datum created by a user. Each datum should contain
at least the user or group identity and the direction towards which assent
has been expressed.

* For the translation process to be implementable as adequately as possible
by each tool, each datum should contain as much information as possible, to
preserve users' freedom in the network. No tool should strip user-generated
specifics (fields) from the data. This cannot be enforced, but stripping
fields is considered bad practice, since it works against the interest of the
users and hence of the network.

* No data is ever deleted or updated (as in overwritten). Each datum is an
immutable value and, as such, remains available in the network as part of
public history. Instead, a new timestamped datum is offered which represents
the value valid from that timestamp on, and so forth. Tools have to supply
updates to the network in this way, as new data.

* The design of the local cache and count-engine is entirely up to the
vote-server designer. From the translation step onward (further right in the
pipeline, up to counting), each tool is free to represent the data in any
form. Votorola will allow different count-engines, implemented in any
language running on the JVM, e.g. Python, Ruby, Scala, JavaScript, Groovy ...
(and Clojure for my personal first implementation). For now it will still
ship with the current default Java engine; the first steps are towards
defining proper data representations between implementations. The
translation step can be shared by all Votorola implementations.

* Each user-generated input datum should be made accessible to the network
within less than a day, so that popular peers do not gain privileged access
to data.

* All data of assent is supposed to be in the public domain, as it already is
under US law. Actually, we should pin that down internationally. Does anybody
have ideas on how to ensure openness of the data globally? The ODbL comes to
mind (1). State legislation will probably render licenses invalid, e.g. in
the States, but it would make sense to have some safety net internationally,
imo.

* None of these points can be enforced, but they have to be followed by at
least two participants for the network to be functional. Free-riders can
always be called out on these grounds in front of the users, and their data
can simply be scraped (it is in the public domain).
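The data rules in the list above (timestamped data carrying at least an
identity and a direction, no user fields dropped, no overwriting, publication
within a day) can be sketched as an append-only history. This is a minimal
Python sketch under stated assumptions: the `Datum` fields, integer
epoch-second timestamps and the `History` class are hypothetical, not an
agreed format. A datum is never modified; the value in force at a given time
is simply the newest datum at or before that time.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# "Less than a day", expressed in seconds (assumption: integer epoch seconds).
MAX_PUBLICATION_DELAY = 24 * 60 * 60


@dataclass(frozen=True)
class Datum:
    """One immutable, timestamped expression of assent (hypothetical shape).

    `identity` and `direction` are the minimum fields; `extra` keeps any
    further user-generated fields so that no tool has to drop them.
    """
    identity: str
    direction: str
    timestamp: int
    extra: Tuple[Tuple[str, str], ...] = ()


class History:
    """Append-only store: data are added, never overwritten or deleted."""

    def __init__(self) -> None:
        self._stamps: Dict[str, List[int]] = {}
        self._entries: Dict[str, List[Datum]] = {}

    def append(self, datum: Datum) -> None:
        """Insert in timestamp order, so data may arrive from peers out of order."""
        stamps = self._stamps.setdefault(datum.identity, [])
        entries = self._entries.setdefault(datum.identity, [])
        i = bisect_right(stamps, datum.timestamp)
        stamps.insert(i, datum.timestamp)
        entries.insert(i, datum)

    def valid_at(self, identity: str, when: int) -> Optional[Datum]:
        """Return the newest datum of `identity` with timestamp <= `when`."""
        stamps = self._stamps.get(identity, [])
        i = bisect_right(stamps, when)
        return self._entries[identity][i - 1] if i else None

    def all_of(self, identity: str) -> List[Datum]:
        """Full public history of one identity, oldest first (a copy)."""
        return list(self._entries.get(identity, []))


def published_in_time(datum: Datum, published_at: int) -> bool:
    """Was the datum offered to the network less than a day after creation?"""
    return 0 <= published_at - datum.timestamp < MAX_PUBLICATION_DELAY
```

An "update" is just another `append`; earlier data stay queryable through
`all_of`, which is what keeps the whole history public and verifiable.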

Can you see where this network design is exclusive? Can you find 
discriminatory constraints? 

Votorola-specific counting standardization
==========================================

As is already the case in snapshotted form, the result should be made
available in machine-readable form for verification, following the process
outlined by Mike.
For Votorola, the count-engine has to expose the result in a consistent
format. Maps (as in SQL rows) with standardized fields have to be defined;
optional, non-exclusive extra fields can always be added. Both input and
output data of Votorola should thus be representable as an SQL table (in
terms of general relational algebra, a set of these maps/tuples).
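A minimal sketch of such a result format, assuming Python and a hypothetical
set of standardized fields (`poll`, `candidate`, `votes`; the actual field
names are still to be defined): each result row is a map that must contain
the standardized fields and may carry optional extras, and the rows load
directly into an SQL table. Keeping the extras in a JSON column is one
possible design choice, not a settled one.

```python
import json
import sqlite3
from typing import Any, Dict, Iterable

# Hypothetical standardized field names; the real set is still to be defined.
REQUIRED_FIELDS = ("poll", "candidate", "votes")

Row = Dict[str, Any]


def validate(row: Row) -> Row:
    """A result row must carry every standardized field; extras are allowed."""
    missing = [f for f in REQUIRED_FIELDS if f not in row]
    if missing:
        raise ValueError(f"missing standardized fields: {missing}")
    return row


def to_sqlite(rows: Iterable[Row]) -> sqlite3.Connection:
    """Load rows into an in-memory SQL table: one column per standardized
    field, plus a JSON column so optional extra fields are not lost."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE result (poll TEXT, candidate TEXT, votes INTEGER, extra TEXT)"
    )
    for row in map(validate, rows):
        extra = {k: v for k, v in row.items() if k not in REQUIRED_FIELDS}
        conn.execute(
            "INSERT INTO result VALUES (?, ?, ?, ?)",
            (row["poll"], row["candidate"], row["votes"], json.dumps(extra, sort_keys=True)),
        )
    return conn
```

A verifier can then compare two count-engines' outputs with plain SQL, while
tool-specific fields still travel along instead of being dropped.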

conseo

(1) http://opendatacommons.org/licenses/odbl/

P.S.: For those interested in some computational thoughts on the scalability
of the counting side, a brain dump of mine is here:
http://zelea.com/w/User:4consensus_WebDe/Trees_of_Transactions
I still have to think about how to include older (historical) data later.