Table public.in_vote format
Michael Allan
mike at zelea.com
Thu May 23 16:10:10 EDT 2013
You're right C, we should fix this. We're losing information.
> Change the table format from its current form:
>
> servicename | voteremail | xml
>
> where the xml contains at least a timestamp and the candidate-email (can be
> empty), as well as a dart-sector.
>
> to:
>
> timestamp | servicename | voteremail | candidateemail | ds
I agree we need a timestamp column. But I also like the flexibility
of the XML column, especially for prototyping. So maybe like this:
timestamp | servicename | voteremail | xml
> I propose a primary key over the tuple (timestamp, servicename,
> voteremail) for now, as these should be unique (you shouldn't cast
> different votes for one voter on the same poll at once). ...
Yes.
> ... We also should have a B-Tree index on timestamp and any index
> (probably hash-map) on servicename and voteremail to make querying
> fast ...
I think we automatically get fast indeces for the primary keys.
> ... You can still get the same query behaviour, by just selecting
> the newest vote. ...
I guess the most common query (vocount's) will be for the latest votes
of a given servicename.
> What do you think? I can help to adjust the Java code, if you don't
> like to put time into the effort.
Okay. If we agree on the columns, then please go ahead.
Mike
conseo said:
> Hi Mike,
>
> Problems:
> 1) The voting table does not store all vote events, but rather updates them in
> place. Hence we have no complete history of vote events, but only compiled
> snapshots of certain intervals.
> 2) The current xml string makes it fairly difficult to compile a historical
> vote count, because I need to query by timestamp as well.
>
> Proposal:
>
> Change the table format from its current form:
>
> servicename | voteremail | xml
>
> where the xml contains at least a timestamp and the candidate-email (can be
> empty), as well as a dart-sector.
>
> to:
>
> timestamp | servicename | voteremail | candidateemail | ds
>
> which basically destructures the xml in plain SQL to make it queryable. If you
> would like to have a collection in a field to flexibly save attribtues like ds
> (dartsector) without schema changes in the future, I'd propose to use the new
> JSON support in Postgres. This would still allow to get some of SQL's query
> functionality for data in this field (e.g. filtering by JSON attributes) as
> well as validation. (1)
>
> I propose a primary key over the tuple (timestamp, servicename, voteremail)
> for now, as these should be unique (you shouldn't cast different votes for one
> voter on the same poll at once). We also should have a B-Tree index on
> timestamp and any index (probably hash-map) on servicename and voteremail to
> make querying fast, but this is not necessary to pin down the data format,
> just taking the benefits of SQL at runtime.
>
> Votes are not updated in place then, but added as new events (rows) with a
> newer timestamp. You can still get the same query behaviour, by just selecting
> the newest vote. You can also trivially compose the current xml form from the
> SQL query, so I can't see any big breaks by this change. We need to recreate
> the table from current data once, I can do so for you.
>
> What do you think? I can help to adjust the Java code, if you don't like to
> put time into the effort.
>
> conseo
>
> (1) http://www.postgresql.org/docs/current/static/functions-json.html
> There is also a xml type:
> http://www.postgresql.org/docs/current/static/functions-xml.html
> but since we use JSON for most communication, I think it is the best fit.
> Accessing JSON:
> http://people.planetpostgresql.org/andrew/index.php?/archives/249-Using-PLV8-to-index-JSON.html
> >From my side we can also keep the field as XML-String as long as we pull
> timestamp and candidate out of it.
More information about the Votorola
mailing list