Harvester Design

Michael Allan mike at zelea.com
Mon Mar 5 20:53:37 EST 2012


Hey C,

Thanks for posting this.  I'm replying by mail because there's a lot
to cover.

> > 2. Draft the Javadoc API for the various components beginning with
> >    those most pointed at by arrows:
> >
> >      a) Cache

http://whiletaker.homeip.net/votorola/harvester/javadoc/votorola/a/diff/harvest/Cache.html

This does not seem enough to support CacheWAP.  CacheWAP can only
read the messages through the Cache API:


                 Administrator       Reader         Reader
                      |                |              |
                      | harvest        | read         | read
                      |                |              |
            read      V      store     V     read     V
  Archive <------ Harvester -------> Cache <------ CacheWAP
     .                |
     .                | listen
     .                |
     .                V
     .              Kicker
     .                ^
     .                |
     .                | trigger
     .     listen     |
   Forum <-------- Detector


So CacheWAP is nothing but a dumb web wrapper and the wrapped Cache
must expose *at least* as much functionality in terms of read methods
in its own API.  For example, see CountWAP and CountTable.PollView.
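
To illustrate, here is a minimal sketch in Java.  Every name in it
(HarvestedMessage, CacheReader, readNewest, MemoryCache,
CacheWrapperSketch) is hypothetical and not taken from the drafted
Javadoc.  It only shows the shape of the dependency: the wrapper holds
no read logic of its own, so whatever it serves must already be a read
method on the cache.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical message type; the fields are illustrative only.
final class HarvestedMessage {
    final String location;  // where the message lives in the archive
    final long parsedDate;  // assumed to be milliseconds since the epoch

    HarvestedMessage(String location, long parsedDate) {
        this.location = location;
        this.parsedDate = parsedDate;
    }
}

// Hypothetical read side of the Cache API.
interface CacheReader {
    /** Returns at most maxCount messages, newest first. */
    List<HarvestedMessage> readNewest(int maxCount);
}

// In-memory stand-in for the real cache, not thread safe.
final class MemoryCache implements CacheReader {
    private final List<HarvestedMessage> messages = new ArrayList<>();

    /** Called by the harvester (the "store" arrow in the diagram). */
    void store(HarvestedMessage message) {
        messages.add(message);
    }

    @Override
    public List<HarvestedMessage> readNewest(int maxCount) {
        List<HarvestedMessage> sorted = new ArrayList<>(messages);
        sorted.sort(Comparator
            .comparingLong((HarvestedMessage m) -> m.parsedDate)
            .reversed());
        int end = Math.min(Math.max(maxCount, 0), sorted.size());
        return new ArrayList<>(sorted.subList(0, end));
    }
}

// The web wrapper only delegates to the cache and formats the result.
final class CacheWrapperSketch {
    private final CacheReader cache;

    CacheWrapperSketch(CacheReader cache) {
        this.cache = cache;
    }

    /** Formats the newest message locations as a JSON-like array. */
    String respond(int maxCount) {
        StringBuilder b = new StringBuilder("[");
        for (HarvestedMessage m : cache.readNewest(maxCount)) {
            if (b.length() > 1) b.append(",");
            b.append('"').append(m.location).append('"');
        }
        return b.append("]").toString();
    }
}
```

If the web API later needs another kind of query, the sketch suggests
that it would first have to appear as a method on the cache interface.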

> >      b) Kicker

http://whiletaker.homeip.net/votorola/harvester/javadoc/votorola/a/diff/harvest/Kicker.html

I don't think registerEventProvider will work for the detectors.  What
they want is a "raise" method on the Kicker.  Then they just call it.

For the harvesters, we need to specify an archive format at time of
registration.  Maybe something like this:

  register( KickReceiver receiver, String archiveFormat )

A harvester only wants to receive kicks for the archive format that it
knows how to deal with (Pipermail, Google Groups, Sympa, Freenode IRC
Logger, etc.).  The various formats can be defined by wiki pages, so
the actual parameter above will probably just name the page.  We'll
have to document all of this in step 3.
http://mail.zelea.com/list/votorola/2012-February/001311.html
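
A rough sketch of how the two sides might fit together, in Java.  Only
register( KickReceiver, String ) and the idea of a "raise" method come
from the text above.  The kick method on KickReceiver, the String
argument to raise, and the class name KickerSketch are assumptions
made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Assumed callback shape; the real KickReceiver may differ.
interface KickReceiver {
    void kick(String archiveFormat);
}

// Hypothetical kicker, not thread safe.
final class KickerSketch {
    // Receivers grouped by the archive format they can handle.
    private final Map<String, List<KickReceiver>> receivers =
        new HashMap<>();

    /** Called by a harvester to listen for one archive format. */
    void register(KickReceiver receiver, String archiveFormat) {
        receivers
            .computeIfAbsent(archiveFormat, k -> new ArrayList<>())
            .add(receiver);
    }

    /** Called by a detector; kicks only the matching receivers. */
    void raise(String archiveFormat) {
        List<KickReceiver> matching = receivers.getOrDefault(
            archiveFormat, Collections.emptyList());
        for (KickReceiver receiver : matching) {
            receiver.kick(archiveFormat);
        }
    }
}
```

Here a detector never registers anything.  It simply calls raise, and
a Pipermail harvester never hears about kicks for any other format.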

> >      c) CacheWAP (Javadoc documenting the web API)

http://whiletaker.homeip.net/votorola/harvester/javadoc/votorola/a/diff/harvest/CacheWAP.html

You asked where to put this.  I think it belongs under s.wap, where
the other WAP interfaces are exposed.  So maybe s.wap.HarvestCacheWAP,
or just HarvestWAP (like you suggested).  The conventional prefix
would then be "h" instead of "c".

Does the client (the current diff feed) really need cDiff?  If so, the
format is unclear, because a diff is specified by up to 4 revision
numbers.  See WP_D and DiffCacheSS for examples.

What is the purpose of the "resultTruncated" error?  What does the
client do with this information?

The form of the "list of difference diff messages" seems awkward for
the client.  It's not in chronological order and it doesn't correspond
to the bite form, which I guess we still need for the scenes on the
client side.  Why not stick with something like this?

  "bite": [
     // bites in order of parsed date, newest first
     {
        "difference": {
          "a": A,
          "aR": AR,
          "b": B,
          "bR": BR
        },
        "message": {
           "content": "CONTENT",
           "location": "LOCATION",
           "title": "TITLE"
        },
        "parsedDate": PARSED DATE,
        "persons": [
           // etc
        ]
     }
     // and so on, for each bite
  ]

Field names can be Javadoc links into the API (cf. CountWAP), so
there's no need to duplicate the descriptions here.  You could also
omit mention of biter decorations because they're to be documented by
the biters themselves.
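
On the server side, the bite form above could be backed by something
like the following Java sketch.  The class names, the field types (int
for the four difference fields, long for the parsed date) and the
newestFirst helper are all assumptions.  Only the field names and the
newest-first ordering come from the sample.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Mirrors the "difference" object of the sample; types are assumed.
final class DifferenceSketch {
    final int a, aR, b, bR;

    DifferenceSketch(int a, int aR, int b, int bR) {
        this.a = a;
        this.aR = aR;
        this.b = b;
        this.bR = bR;
    }
}

// Mirrors one element of the "bite" array, minus message and persons.
final class BiteSketch {
    final DifferenceSketch difference;
    final String title;
    final long parsedDate;  // assumed to be milliseconds since the epoch

    BiteSketch(DifferenceSketch difference, String title,
               long parsedDate) {
        this.difference = difference;
        this.title = title;
        this.parsedDate = parsedDate;
    }
}

final class BiteOrdering {
    /** Returns a copy of the bites ordered by parsed date, newest first. */
    static List<BiteSketch> newestFirst(List<BiteSketch> bites) {
        List<BiteSketch> ordered = new ArrayList<>(bites);
        ordered.sort(Comparator
            .comparingLong((BiteSketch x) -> x.parsedDate)
            .reversed());
        return ordered;
    }
}
```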

-- 
Michael Allan

Toronto, +1 416-699-9528
http://zelea.com/


