Proposed rationalization of vote verification architecture
conseo
conseo at polyc0l0r.net
Thu Apr 4 05:32:23 EDT 2013
Hi all,
this is an effort towards a (for now informal) standardization of the
mirroring architecture, and towards a modular (as in swappable) count-engine
for Votorola. Both involve making the vote-data flow tool-agnostic and
standardized. Feedback wanted!
Michael Allan wrote:
> Conseo had some ideas about how vote mirroring and the verification
> architecture might marry. We discussed these and sketched a rough
> design together, which I've appended to the old verification document:
> http://zelea.com/project/votorola/a/count/verification.xht#rationalization
>
> Please review C, correct if needed.
We have had some mumble discussion on how to break the counting process apart
to allow free determination of it, first by developers of alternate versions,
later by allowing voters themselves to express programmatically what their
vote means and potentially how the votes of their voters flow through their
node. By pinning down the data involved, we have also solved the verification
use case for the resulting counts: it is the same except for publication of
the vote-counts, and can be considered a first step towards running your own
vote-server. See Mike's comment and documentation above.
Requirements for a vote mirroring network
=========================================
To allow different algorithms and code-bases to run on the mirrored network of
vote-servers, we propose the following counting pipeline, from processing of
raw user data to the counted result:
        mirroring network                          individual vote-server

                                  translation
peers on internet <--> p2p cache -------------> local cache ---> count-engine
(The mirroring network can be considered, if you like, as a bittorrent swarm
mutually sharing its voters' raw data, with each server pulling a feed of new
data torrents from its individual selection of mirrors. This guarantees
immutability and cryptographic integrity of data for free and avoids
dependencies on high-reliability cloud storage. Different (p2p-)techniques can
be combined to distribute data, although bittorrent seems to be a good
starting point. All we need to do is organize copying for read-access.)
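
To make the right-hand side of the pipeline more concrete, here is a minimal
Python sketch of translation, local cache and a swappable count-engine. The
field names (voter, candidate, timestamp) and all function names are
assumptions for illustration only, not Votorola's actual format or API. The
example engine merely tallies direct votes; it is not Votorola's counting
method.

```python
def translate(raw_data):
    """Translation step: raw network data -> local cache representation.

    Here the local cache is just a list of (voter, candidate, timestamp)
    tuples; each tool is free to choose something else.
    """
    return [(d["voter"], d["candidate"], d["timestamp"]) for d in raw_data]


def latest_votes(local_cache):
    """Reduce the history to each voter's most recent vote."""
    latest = {}
    for voter, candidate, _ts in sorted(local_cache, key=lambda v: v[2]):
        latest[voter] = candidate  # later timestamps overwrite earlier ones
    return latest


def direct_count(local_cache):
    """One possible count-engine: number of direct votes per candidate."""
    counts = {}
    for candidate in latest_votes(local_cache).values():
        counts[candidate] = counts.get(candidate, 0) + 1
    return counts


def run_pipeline(raw_data, count_engine):
    """p2p cache -> translation -> local cache -> count-engine."""
    local_cache = translate(raw_data)
    return count_engine(local_cache)


# Hypothetical raw data as it might be read from the p2p cache.
raw = [
    {"voter": "alice", "candidate": "carol", "timestamp": 1},
    {"voter": "bob", "candidate": "carol", "timestamp": 2},
    {"voter": "alice", "candidate": "dave", "timestamp": 3},
]
result = run_pipeline(raw, direct_count)
```

Any other callable that accepts the local cache could be passed in place of
direct_count, which is the sense in which the count-engine is swappable.
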
Design consequences
===================
* It is enough to make data accessible to the network; p2p support can
actually be handled by an intermediate proxy or a friendly mirror. So the
minimum requirement to participate in the network is to grant one peer (a
super-seeder) in the p2p-network sufficient bandwidth and free access to the
data. (This can be done from home for small to medium servers.)
* All data distributed by the network is historical (timestamped), especially
each distinct datum created by a user. Each datum should at least contain the
user/group-identity and a direction towards which assent has been expressed.
* For the translation process to be implementable as adequately as possible by
each tool, the datum should contain as much information as possible, to
facilitate the user's freedom in the network. No tool should withhold
user-generated specifics (fields) from the data. This cannot be enforced, but
withholding is considered bad practice, against the interest of the user and
hence of the network.
* No data is ever deleted or updated (as in overwritten). Each datum is an
immutable data-value and as such available in the network and part of public
history. Instead, a new timestamped datum is offered which represents the
value valid from this timestamp on, and so forth. Tools have to supply updates
to the network in this way, as new data.
* The design of the local cache and count-engine is totally up to the
vote-server designer. Starting with the translation step and going further
right in the pipeline to counting, each tool is free to represent the data in
any form. Votorola will allow different count-engines, implemented in any
language running on the JVM, e.g. Python, Ruby, Scala, JavaScript, Groovy ...
(and Clojure for my personal first implementation). For now it will still ship
with the current default Java one; the first steps are towards defining proper
data representations between implementations. The translation step can be
shared by all Votorola implementations.
* Each user generated input datum should be made accessible to the network in
less than a day to avoid privileged access to data from popular peers.
* All data of assent is supposed to be in the public domain, as is already
ruled under US law. Actually we should pin that down internationally. Does
anybody have ideas on how to ensure openness of the data globally? ODbL comes
to my mind. (1) State legislation will probably render licenses invalid, e.g.
in the States, but it would make sense to have some safety net internationally
imo.
* None of these points can be enforced, but they have to be followed by at
least two participants for the network to function. Free-riders can always be
attacked on these grounds in front of the users and by scraping their data
(which is in the public domain).
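
The "never delete, never overwrite" rule from the list above can be sketched
as an append-only history. This is only a rough Python illustration: the
class and field names are hypothetical, and using None as the direction of a
withdrawn vote is my assumption, not something the design prescribes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen: a datum is an immutable value
class Datum:
    identity: str             # user or group identity
    direction: Optional[str]  # towards whom assent is expressed (None: withdrawn, an assumption)
    timestamp: int            # e.g. seconds since the epoch


class History:
    """Append-only store: data are offered, never deleted or overwritten."""

    def __init__(self):
        self._data = []

    def offer(self, datum):
        """Publish a new datum; older data stay part of the history."""
        self._data.append(datum)

    def all(self):
        """Return the complete public history as an immutable tuple."""
        return tuple(self._data)

    def valid_at(self, identity, when):
        """Return the datum of `identity` that is valid at time `when`,
        i.e. the latest one not newer than `when`, or None if there is none."""
        candidates = [
            d for d in self._data
            if d.identity == identity and d.timestamp <= when
        ]
        return max(candidates, key=lambda d: d.timestamp, default=None)


history = History()
history.offer(Datum("alice", "bob", 10))
history.offer(Datum("alice", "carol", 20))  # an "update" is just a newer datum
```

Asking for the value valid at time 15 yields the vote for bob, at time 25 the
vote for carol, and both data remain in the history.
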
Can you see where this network design is exclusive? Can you find
discriminatory constraints?
Votorola specific counting standardization
==========================================
As is already the case in snapshotted form, the result should be made
available in machine-readable form for verification as in the process outlined
by Mike.
For Votorola, the count-engine has to expose a consistent format of the
result. Maps (as in SQL rows) with standardized fields have to be defined;
these fields are non-exclusive, so optional ones can always be added. Both
input and output data of Votorola are supposed to be representable in a SQL
table that way (in general relational algebra, as a set of these
maps/tuples).
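
As a rough sketch of what "maps with standardized fields, representable in a
SQL table" could look like, here is a small Python example using the standard
sqlite3 module. The required fields (voter, candidate, timestamp), the table
layout and the choice to keep optional fields as JSON in an extra column are
all assumptions of mine, not a proposed standard.

```python
import json
import sqlite3

# Hypothetical standardized fields; the real set still has to be defined.
REQUIRED = ("voter", "candidate", "timestamp")


def open_store():
    """Create an in-memory table holding one row per map."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # allows access to columns by name
    conn.execute(
        "CREATE TABLE vote ("
        "voter TEXT NOT NULL, candidate TEXT NOT NULL, "
        "timestamp INTEGER NOT NULL, extra TEXT NOT NULL)"
    )
    return conn


def insert_map(conn, m):
    """Store a map; standardized fields are mandatory, others optional."""
    missing = [f for f in REQUIRED if f not in m]
    if missing:
        raise ValueError("missing standardized fields: %s" % missing)
    extra = {k: v for k, v in m.items() if k not in REQUIRED}
    conn.execute(
        "INSERT INTO vote VALUES (?, ?, ?, ?)",
        (m["voter"], m["candidate"], m["timestamp"],
         json.dumps(extra, sort_keys=True)),
    )


def read_maps(conn):
    """Read all rows back as maps, ordered by timestamp."""
    maps = []
    for row in conn.execute("SELECT * FROM vote ORDER BY timestamp"):
        m = {f: row[f] for f in REQUIRED}
        m.update(json.loads(row["extra"]))  # re-attach optional fields
        maps.append(m)
    return maps


conn = open_store()
insert_map(conn, {"voter": "alice", "candidate": "bob", "timestamp": 10,
                  "comment": "optional field"})
insert_map(conn, {"voter": "carol", "candidate": "bob", "timestamp": 5})
maps = read_maps(conn)
```

A map lacking one of the standardized fields is rejected with a ValueError,
while additional fields pass through unchanged.
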
conseo
(1) http://opendatacommons.org/licenses/odbl/
P.S.: For those interested in some computational thoughts on the scalability
of the counting side, a brain dump of mine is here:
http://zelea.com/w/User:4consensus_WebDe/Trees_of_Transactions
I have to think about how to include older data later (e.g. from the past).