Harvest scheduling and job management

Michael Allan mike at zelea.com
Mon Mar 26 01:40:21 EDT 2012


> conseo said:
> > 2. We then harvest forward and not backward, which gives us no
> > guarantee that we can meet the <10s live criterion, or we have to
> > burst forward for any number of new posts (this means we burst
> > that way on every Kick!). Picture 100 posts sent since the last
> > update, which we cannot rule out imo.
  
I said:
> You could modify (c) to update backwards from the end bound to the
> last message cached, but it would complicate the design.  I think a
> forward update is cleaner and fast enough (faster than 10s). ...

I was wrong.  It would only be fast enough if the harvester received a
kick for every message, not just for the difference (diff) messages.
It would then fetch and re-parse the archive's index for every message
that was posted, and thus keep the marker near the end of the archive,
close to the next diff message whenever it comes.  But the extra
fetching and parsing needed to keep it there would nearly double the
load on the servers, both the local harvester and the remote archive.

It's better to update backwards and to employ a complex of markers to
keep track of the resulting fragments.  This way a single fetch/parse
of the index can be reused to crawl back over the non-diff messages,
which might be numerous.  This was C's original design.
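
To make the idea concrete, here is a rough sketch in Python.  It is
not Votorola code, and every name in it (update_backward, fragments,
budget, to_fragments) is hypothetical.  It assumes the index has
already been fetched and parsed once, that messages are addressed by
their position in that index, and that each kick may harvest at most
"budget" messages.  The markers are modelled as half-open (start, end)
ranges of positions already cached.

```python
def to_fragments(positions):
    """Collapse a set of index positions into sorted, half-open
    (start, end) ranges.  These ranges stand in for the markers."""
    frags = []
    for p in sorted(positions):
        if frags and frags[-1][1] == p:
            # p directly extends the last fragment
            frags[-1] = (frags[-1][0], p + 1)
        else:
            frags.append((p, p + 1))
    return frags


def update_backward(index_len, fragments, budget):
    """Crawl back from the end of the (already parsed) index,
    harvesting positions not yet cached, up to `budget` messages.

    Returns the positions harvested on this pass, newest first, and
    the updated list of fragments.
    """
    covered = set()
    for start, end in fragments:
        covered.update(range(start, end))

    fetched = []
    pos = index_len - 1
    while pos >= 0 and len(fetched) < budget:
        if pos not in covered:
            fetched.append(pos)   # stand-in for fetching the message
            covered.add(pos)
        pos -= 1                  # already-cached positions are skipped

    return fetched, to_fragments(covered)
```

For example, with positions 0 to 2 cached, an index of 10 messages and
a budget of 4, one pass harvests positions 9, 8, 7 and 6 and leaves a
gap at 3 to 5.  A later pass with a larger budget picks up the new
messages at the end first, then crawls back to fill that gap, and the
fragments merge into one.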

-- 
Michael Allan

Toronto, +1 416-699-9528
http://zelea.com/


