Harvester Design

Michael Allan mike at zelea.com
Tue Mar 6 19:04:27 EST 2012


> > So CacheWAP is nothing but a dumb web wrapper and the wrapped
> > Cache must expose *at least* as much functionality in terms of
> > read methods in its own API.  For example, see CountWAP and
> > CountTable.PollView.
 
conseo said:
> I have added a getter for the DiffMessageTable, yet I pictured the
> HarvestWAP to construct its own DiffMessageTable as it was until
> now. Fixed?

I think DiffMessageTable == Cache.  You can keep the same class name
and correct the diagram if you like, but I think it's helpful to have
"cache" in the name instead of "table" because it really *is* a cache
and not an original store.  Maybe like this:
 

                Administrator           Reader              Reader
                     |                    |                   |
                     | harvest            | read              | read
                     |                    |                   |
           read      V       store        V         read      V
  Archive <------ Harvester -------> HarvestCache <------ HarvestWAP
    .                |
    .                | register
    .                |
    .                V
    .              Kicker          (note the correction of
    .                ^              method names on Kicker)
    .                |
    .                | raise
    .     listen     |
  Forum <-------- Detector


So you would: hg mv DiffMessageTable.java HarvestCache.java
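
A rough Java sketch of how the pieces in the diagram could fit
together.  The class names follow the diagram, but every method name
and signature below is an assumption made for illustration, not the
actual Votorola API, and the archive is only a stand-in list:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: class names follow the diagram, but every method
// name and signature here is an assumption, not the actual Votorola API.

/** Harvesters register here; a detector raises a kick to wake them. */
final class Kicker {
    private final List<Runnable> registered = new ArrayList<>();

    void register(Runnable harvester) { registered.add(harvester); }

    void raise() {
        for (Runnable harvester : registered) harvester.run();
    }
}

/** A cache of harvested messages, not an original store. */
final class HarvestCache {
    private final List<String> messages = new ArrayList<>();

    void store(String message) { messages.add(message); }

    List<String> read() { return new ArrayList<>(messages); } // defensive copy
}

/** Reads new messages from the archive and stores them in the cache. */
final class Harvester implements Runnable {
    private final List<String> archive; // stand-in for a real forum archive
    private final HarvestCache cache;
    private int next = 0; // index of the first message not yet harvested

    Harvester(List<String> archive, HarvestCache cache) {
        this.archive = archive;
        this.cache = cache;
    }

    @Override
    public void run() {
        while (next < archive.size()) cache.store(archive.get(next++));
    }
}

/** Dumb web wrapper: it offers nothing the cache's own read API lacks. */
final class HarvestWAP {
    private final HarvestCache cache;

    HarvestWAP(HarvestCache cache) { this.cache = cache; }

    List<String> respond() { return cache.read(); }
}
```

In this sketch the wrapper only delegates to HarvestCache.read(), so
anything a web client can get, an in-process reader can get too.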

> > ... (Pipermail, Google Groups, Sympa, Freenode IRC Logger, etc.).
> > The various [archive] formats can be defined by wiki pages, so the
> > actual parameter above will probably just name the page.  We'll
> > have to doc all of this in step 3.
> > http://mail.zelea.com/list/votorola/2012-February/001311.html
> 
> I would configure Forums in the Wiki (like Mailman,
> Freenode+Channel) and then add archives to it (Pipermail,
> Irssi-Archive). The question is to what users relate. They are
> present on a forum, but they might also prefer to be linked to a
> certain Archive. We have to talk about this, because I need to
> understand how to define the Wiki properties right.

We already have forums.  http://zelea.com/w/Concept:Forum
So the candidate's position page will have one or more Forum
properties that point to all the forums in which discussions are
happening.  http://zelea.com/w/Property:Forum

  Position
    -> Forum

Forums have a property defining the archive location:
http://zelea.com/w/Property:Archive_URL
All we need to add is an archive format:

  Forum
    -> Archive URL
    -> Archive format

The value could be a string such as "Pipermail".  I'll add this tonight
and look over the docs.  This is pretty much done if you agree.
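
To illustrate, here is a minimal Java sketch of how a harvester might
be chosen from those two forum properties.  The class ForumArchive,
its method and the name "PipermailHarvester" are all hypothetical, and
the case-insensitive match on the format string is an assumption:

```java
import java.util.Locale;

// Hypothetical sketch: holds the two property values read from a
// forum's wiki page.  None of these names exist in Votorola.
final class ForumArchive {
    final String url;    // value of the "Archive URL" property
    final String format; // value of the proposed "Archive format" property

    ForumArchive(String url, String format) {
        this.url = url;
        this.format = format;
    }

    /** Names the kind of harvester to use; only Pipermail is known here. */
    String harvesterKind() {
        String key = format.trim().toLowerCase(Locale.ROOT);
        if (key.equals("pipermail")) return "PipermailHarvester";
        throw new IllegalArgumentException("unsupported archive format: " + format);
    }
}
```

An unknown format fails loudly here instead of being silently skipped,
which seems the safer default while only one format is supported.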

> ... Also the command line interface is still a very cloudy idea to
> me (we haven't had something like that yet and the Cache and Kicker
> will run all the time). Do you picture the Harvesters connecting to
> these on demand? How will this connection happen (not the same JVM)?

It depends on the command.  What commands do we need?  I guess at
least these, for example:

  voharvest clear FORUM   - clear FORUM from the cache
  voharvest detect        - run the harvest detectors
  voharvest harvest FORUM - harvest any new messages

I don't really know if the admin needs "clear" and "harvest", but we
probably need them for testing, right?  The "clear" command would
construct its own HarvestCache and use its API to clear the cache.
The "harvest" command would construct the appropriate Harvester and
give it a kick or something to force an update.  It does not matter
(or should not matter) if instances of the HarvestCache and various
Harvesters are already running in the "detect" daemon.  The purpose is
not to control the daemon but to manipulate the cache, which is simply
a database/filebase.
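
A minimal Java sketch of the command dispatch might look like this.
Only the three command names come from the list above.  The class
name, the messages and the error handling are assumptions, and the
sketch only describes each action instead of touching a real cache:

```java
// Hypothetical sketch of the voharvest command line; the three commands
// come from the list above, everything else is an assumption.
final class VoharvestSketch {

    /** Returns a description of the action the given arguments select. */
    static String dispatch(String... args) {
        if (args.length == 0) throw new IllegalArgumentException("missing command");
        switch (args[0]) {
            case "clear":
                return "clear " + forum(args) + " from a freshly constructed cache";
            case "detect":
                return "run the harvest detectors";
            case "harvest":
                return "construct a harvester for " + forum(args) + " and kick it";
            default:
                throw new IllegalArgumentException("unknown command: " + args[0]);
        }
    }

    /** Returns the FORUM argument, which "clear" and "harvest" require. */
    private static String forum(String[] args) {
        if (args.length < 2) throw new IllegalArgumentException(args[0] + " requires FORUM");
        return args[1];
    }

    public static void main(String[] args) {
        try {
            System.out.println(dispatch(args));
        } catch (IllegalArgumentException x) {
            System.err.println("voharvest: " + x.getMessage());
        }
    }
}
```

Each command builds what it needs in its own JVM, so none of them has
to connect to the running "detect" daemon.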

> > > >      c) CacheWAP (Javadoc documenting the web API)
> > 
> > Does the client (the current diff feed) really need cDiff?  If so,
> > the format is unclear, because a diff is specified by up to 4
> > revision numbers.  See WP_D and DiffCacheSS for examples.
> 
> No, removed it. hPoll is not enough though, right?

Whatever the diff feed needs to run, because that's currently your
only client.  Later you can extend the API to support the talk track
if it needs additional request methods.

So 2 is almost done.  3a is pretty much done if you agree.  That
leaves 3b and 4 to document:

> > 3. Document the configuration of the Pipermail harvester.  The
> >     various harvesters should have similar forms of configuration,
> >     but this cannot be required.  There are two major parts to the
> >     configuration:
> >
> >       a) User configuration in pollwiki, such as archive location
> >       b) Administrative configuration on server
> >
> > 4. Draft the command interface for the Pipermail harvester.
> >     Again the various harvesters should have similar command
> >     interfaces, but this cannot be required.

-- 
Michael Allan

Toronto, +1 416-699-9528
http://zelea.com/


