Harvester Design

conseo 4consensus at web.de
Tue Mar 6 12:41:01 EST 2012


Hey M,

> 
> Thanks for posting this.  I reply by mail because there's lots to
> cover.
> 
> > > 2. Draft the Javadoc API for the various components beginning with
> > >    those most pointed at by arrows:
> > >      a) Cache
> 
> http://whiletaker.homeip.net/votorola/harvester/javadoc/votorola/a/diff/harvest/Cache.html
> 
> This does not seem enough to support CacheWAP.  CacheWAP can only
> access the messages (read) through the Cache API:
> 
> 
>                  Administrator       Reader         Reader
> 
>                       | harvest        | read         | read
> 
>             read      V      store     V     read     V
>   Archive <------ Harvester -------> Cache <------ CacheWAP
>      .                |
>      .                | listen
>      .                |
>      .                V
>      .              Kicker
>      .                ^
>      .                |
>      .                | trigger
>      .     listen     |
>    Forum <-------- Detector
> 
> 
> So CacheWAP is nothing but a dumb web wrapper and the wrapped Cache
> must expose *at least* as much functionality in terms of read methods
> in its own API.  For example, see CountWAP and CountTable.PollView.

I have added a getter for the DiffMessageTable, although I had pictured 
HarvestWAP constructing its own DiffMessageTable, as it has done until now. 
Does that fix it?
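
To make the getter idea concrete, here is a minimal sketch of a Cache that exposes its table for reading and a WAP that only wraps it. All class shapes and method names (store, diffMessageTable, respond, all) are hypothetical stand-ins and are not taken from the posted Javadoc.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; none of these signatures come from the Votorola Javadoc.
final class CacheSketch {

    /** Stand-in for one cached diff message. */
    static final class DiffMessage {
        final String title;
        final long parsedDate; // assumed unit: milliseconds since the epoch

        DiffMessage(String title, long parsedDate) {
            this.title = title;
            this.parsedDate = parsedDate;
        }
    }

    /** Stand-in for the table that backs the cache. */
    static final class DiffMessageTable {
        private final List<DiffMessage> rows = new ArrayList<>();

        void put(DiffMessage m) { rows.add(m); }

        /** Read-only view, so readers cannot modify the stored rows. */
        List<DiffMessage> all() { return Collections.unmodifiableList(rows); }
    }

    /** The cache owns the table and exposes it through a getter. */
    static final class Cache {
        private final DiffMessageTable table = new DiffMessageTable();

        /** Write path, used by a harvester. */
        void store(DiffMessage m) { table.put(m); }

        /** Read path, the getter that the web wrapper relies on. */
        DiffMessageTable diffMessageTable() { return table; }
    }

    /** A dumb web wrapper: it reads only through the Cache API. */
    static final class HarvestWAP {
        private final Cache cache;

        HarvestWAP(Cache cache) { this.cache = cache; }

        List<DiffMessage> respond() { return cache.diffMessageTable().all(); }
    }
}
```

In this arrangement the WAP never constructs a table of its own, so it cannot read anything that the Cache does not expose.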

> 
> > >      b) Kicker
> 
> http://whiletaker.homeip.net/votorola/harvester/javadoc/votorola/a/diff/harvest/Kicker.html
> 
> I don't think registerEventProvider will work for the detectors.  What
> they want is a "raise" method on the Kicker.  Then they just call it.
> 
> For the harvesters, we need to specify an archive format at time of
> registration.  Maybe something like this:
> 
>   register( KickReceiver receiver, String archiveFormat )

Done.
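
For reference, a minimal sketch of how I understand the two sides of the Kicker: harvesters register per archive format, detectors just call raise. Only register( KickReceiver, String ) comes from your mail. The kick callback, the argument to raise and the synchronization are my assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; only the register signature is taken from the thread.
final class KickerSketch {

    /** Implemented by a harvester. The callback name is an assumption. */
    interface KickReceiver {
        void kick(String archiveFormat);
    }

    static final class Kicker {
        // archive format (for example the name of a wiki page) -> receivers
        private final Map<String, List<KickReceiver>> receivers = new HashMap<>();

        /** A harvester registers for the one format it can deal with. */
        synchronized void register(KickReceiver receiver, String archiveFormat) {
            receivers.computeIfAbsent(archiveFormat, k -> new ArrayList<>())
                     .add(receiver);
        }

        /** A detector calls this; passing the format is an assumption. */
        void raise(String archiveFormat) {
            List<KickReceiver> targets;
            synchronized (this) {
                // Copy under the lock, then notify outside of it.
                targets = new ArrayList<>(receivers.getOrDefault(
                    archiveFormat, Collections.<KickReceiver>emptyList()));
            }
            for (KickReceiver r : targets) {
                r.kick(archiveFormat);
            }
        }
    }
}
```

A harvester registered for "Pipermail" is then kicked by raise("Pipermail") and not by raise("Sympa").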

> 
> A harvester only wants to receive kicks for the archive format that it
> knows how to deal with (Pipermail, Google Groups, Sympa, Freenode IRC
> Logger, etc.).  The various formats can be defined by wiki pages, so
> the actual parameter above will probably just name the page.  We'll
> have to doc all of this in step 3.
> http://mail.zelea.com/list/votorola/2012-February/001311.html

I would configure Forums in the Wiki (like Mailman, Freenode+Channel) and then 
add Archives to them (Pipermail, Irssi-Archive). The question is what users 
relate to. They are present on a Forum, but they might also prefer to be linked 
to a certain Archive. We have to talk about this, because I need to understand 
how to define the Wiki properties correctly. 
Also, the command line interface is still a very cloudy idea to me. We haven't 
had something like that yet, and the Cache and Kicker will run all the time. Do 
you picture the Harvesters connecting to these on demand? How will this 
connection happen, given that they are not in the same JVM?

> 
> > >      c) CacheWAP (Javadoc documenting the web API)
> 
> http://whiletaker.homeip.net/votorola/harvester/javadoc/votorola/a/diff/harvest/CacheWAP.html
> 
> You asked where to put this.  I think it belongs under s.wap, where
> the other WAP interfaces are exposed.  So maybe s.wap.HarvestCacheWAP,
> or just HarvestWAP (like you suggested).  The conventional prefix
> would then be "h" instead of "c".
> 
> Does the client (the current diff feed) really need cDiff?  If so, the
> format is unclear, because a diff is specified by up to 4 revision
> numbers.  See WP_D and DiffCacheSS for examples.

No, I removed it. hPoll alone is not enough though, right?

> 
> What is the purpose of the "resultTruncated" error?  What does the
> client do with this information?

The client might fetch Bites incrementally. "error" is the wrong field for 
that, though. For now the client can use parsedDate to do that, and we can add 
a truncation flag later, so I removed it.
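
A rough sketch of what I mean by using parsedDate for incremental fetching: the client remembers the oldest parsedDate it has seen and asks only for older bites. The method name, the limit parameter and the use of plain long dates are assumptions for illustration, not part of the API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of paging by parsedDate; not part of the posted API.
final class IncrementalFetchSketch {

    /**
     * Returns at most `limit` dates strictly older than `olderThan`.
     * The input is assumed to be sorted newest first.
     */
    static List<Long> nextPage(List<Long> newestFirst, long olderThan, int limit) {
        List<Long> page = new ArrayList<>();
        for (long d : newestFirst) {
            if (page.size() >= limit) break;  // page is full
            if (d < olderThan) page.add(d);   // skip what the client already has
        }
        return page;
    }
}
```

The client would start with Long.MAX_VALUE and then pass the last date of each page. Bites that share the same parsedDate across a page boundary would be skipped by this, which is one reason an explicit field might still be needed later.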

> 
> The form of the "list of difference diff messages" seems awkward for
> the client.  It's not in chronological order and it doesn't correspond
> to the bite form, which I guess we still need for the scenes on the
> client side.  Why not stick with something like this?
> 
>   "bite": [
>      // bites in order of parsed date, newest first
>      {
>         "difference": {
>           "a": A,
>           "aR": AR,
>           "b": B,
>           "bR": BR
>         },
>         "message": {
>            "content": "CONTENT",
>            "location": "LOCATION",
>            "title": "TITLE"
>         },
>         "parsedDate": PARSED DATE,
>         "persons": [
>            // etc
>         ]
>      }
>      // and so on, for each bite
>   ]
> 
> Field names can be Javadoc links into the API (cf. CountWAP), so
> there's no need to duplicate the descriptions here.  You could also
> omit mention of biter decorations because they're to be documented by
> the biters themselves.

Done.
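
Just to check that I read the form correctly, here is a minimal sketch that sorts bites by parsed date, newest first, and writes them out in that shape. The field types (int revision numbers, long date), the class names and the hand-rolled quoting are assumptions. The persons array is left empty because its content is not specified above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; field types and class names are assumptions.
final class BiteSketch {

    static final class Bite {
        final int a, aR, b, bR;
        final String content, location, title;
        final long parsedDate;

        Bite(int a, int aR, int b, int bR,
             String content, String location, String title, long parsedDate) {
            this.a = a; this.aR = aR; this.b = b; this.bR = bR;
            this.content = content; this.location = location;
            this.title = title; this.parsedDate = parsedDate;
        }

        String toJson() {
            return "{\"difference\":{\"a\":" + a + ",\"aR\":" + aR
                + ",\"b\":" + b + ",\"bR\":" + bR + "},"
                + "\"message\":{\"content\":" + quote(content)
                + ",\"location\":" + quote(location)
                + ",\"title\":" + quote(title) + "},"
                + "\"parsedDate\":" + parsedDate
                + ",\"persons\":[]}"; // persons left empty in this sketch
        }
    }

    /** Escapes backslash and double quote only; control characters are not handled. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Writes the bites in order of parsed date, newest first. */
    static String toJson(List<Bite> bites) {
        List<Bite> sorted = new ArrayList<>(bites); // leave the caller's list alone
        sorted.sort(Comparator.comparingLong((Bite bt) -> bt.parsedDate).reversed());
        StringBuilder sb = new StringBuilder("{\"bite\":[");
        for (int i = 0; i < sorted.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(sorted.get(i).toJson());
        }
        return sb.append("]}").toString();
    }
}
```

A real implementation would use a proper JSON writer rather than string concatenation.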

c


