Harvester Roadmap

Michael Allan mike at zelea.com
Fri May 11 09:05:53 EDT 2012


> > So to cover a new forum, we just give it a page in the pollwiki...
> > Then its messages will automatically appear in the harvest? ...
>
> Yes, it does that at runtime and it keeps the cache up-to-date for those 
> forums already (update every 3 hours).

Excellent, that's a big step forward.

> > Whatever the order, we might not want to deploy a talk track (2) on
> > stage unless it can sync with the forums. ...
>
> What do you mean with "sync"? Atm. we have <3 hours update interval
> which makes a track 1) already quite usable imo and 2) has already
> everything we need to write a talk track. But if we come over issues
> in the track design we can fix them before we target the detector
> handling or further back-end work.

I do not think we should poll the archives, not even as a fallback.
It's maybe okay for initializing the forums, especially the inactive
ones.  But for day-to-day use, the front will be too slow (3 hours),
and the back won't scale (too many forums in the world).

> > I have one doubt about that.  I vaguely recollect discussing an
> > email-based subscription detector for Mailman. (?)  Why did we
> > discuss that?  Do you recall?  Hopefully it's not needed, because
> > a bridge detector is much easier.
> 
> There are two problems with it: 1) The kick event does not have to
> occur, so messages might get lost if the author of the message does
> not trigger the difference event after sending to the forum. ...

I guess you're right, but that's only because the difference bridge is
not the only place to view a difference.  So maybe we should raise a
kick on every request to the difference *cache*.  That would cover the
bridge itself, the bridge footings in the draft, and anything else we
add in future.  If no drafter cares to look at the posted difference
in *some* manner, then it's not important and there's no need to
trigger a kick or to harvest anything.

Again, this approach is simple.  But more than ever, it places a
burden on efficiency.  There will be many redundant kicks.

> ... 2) We don't know from where the event comes (because it is
> likely that it is clicked in the mail (or other native client) and
> we have outruled referer-id for that reason (because we cannot burst
> on all forums all the time). ...

You mean the difference bridge (or cache, or whatever) won't know what
forum the difference was posted in?  True, but the difference will
resolve to the drafters, the drafters to the candidate, and the
candidate to the forum!  (Ref the use cases linked in my last).  All
these expensive resolutions will have to be skipped ofc for redundant
kicks.  That probably means the determination of redundancy must be a
function of the difference key itself.

  boolean isHarvested( DiffKey diff )

If the kicker called that function and aborted redundant kicks, then
it would be very fast indeed.
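
Roughly, as a sketch only.  Everything in it besides the isHarvested
signature is an assumption for illustration: the two fields of
DiffKey, the Kicker class, the in-memory set of harvested keys, and
the stubbed resolveAndHarvest step are hypothetical, not existing
code.

```java
import java.util.Objects;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the difference key: a pair of revision numbers. */
final class DiffKey {
    final int aRevision;
    final int bRevision;

    DiffKey(int aRevision, int bRevision) {
        this.aRevision = aRevision;
        this.bRevision = bRevision;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof DiffKey)) return false;
        DiffKey other = (DiffKey) o;
        return aRevision == other.aRevision && bRevision == other.bRevision;
    }

    @Override public int hashCode() { return Objects.hash(aRevision, bRevision); }
}

/** Sketch of a kicker that aborts redundant kicks before any expensive work. */
final class Kicker {
    // Keys already kicked.  Assumed in-memory here; a real one might persist them.
    private final Set<DiffKey> harvested = ConcurrentHashMap.newKeySet();

    // Counts how often the expensive path actually ran (illustration only).
    private final AtomicInteger resolutions = new AtomicInteger();

    /** Redundancy is a function of the difference key alone. */
    boolean isHarvested(DiffKey diff) {
        return harvested.contains(diff);
    }

    /** Returns true if the kick went through, false if aborted as redundant. */
    boolean kick(DiffKey diff) {
        // add() is an atomic check-and-mark, so concurrent kicks for the
        // same key cannot both pass.  Note the key is marked before the
        // harvest completes; a failed harvest would need to unmark it.
        if (!harvested.add(diff)) return false;
        resolveAndHarvest(diff);
        return true;
    }

    int resolutionCount() {
        return resolutions.get();
    }

    // Stub for the expensive chain: difference -> drafters -> candidate -> forum.
    private void resolveAndHarvest(DiffKey diff) {
        resolutions.incrementAndGet();
    }
}
```

With something like this, a second kick for the same difference costs
one set lookup and never reaches the resolution chain.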

> ... I am afraid, but I think we need a Maildir/Mailman detector to
> get our <10s goal reliably.  Whether we simply write a detector for
> Maildir which can handle all kind of forum updates (often you can
> get notified for new messages by mail, which might work fairly well
> for many forums) or we separate it for each forum type is yet open.

Administering all those subscriptions will be complicated and will only
work for mail-based forums.  We gotta try for a more elegant solution.

-- 
Michael Allan

Toronto, +1 416-699-9528
http://zelea.com/


