Harvester Roadmap
conseo
4consensus at web.de
Sun Apr 8 21:15:11 EDT 2012
Hi,
I want to give you an update on my work so far. As a quick reminder, my
current task is to write a harvesting solution that can scale to thousands
of parallel harvests (the current prototype doesn't).
Harvesting is the process of finding discussions related to differences
between drafts. It can cover any forum that users define in the Pollwiki
(this is enough to set the harvester up; no admin intervention is needed).
PipermailHarvester currently fills the DB and the (obsolete) feed in crossforum.
Next steps in getting the Harvest framework done are:
1. Extend the PipermailHarvester to track any number of forums and not just
one.
2. Track the state properly for each forum (internal to PipermailHarvester).
3. Make this state persistent on disk (internal to PipermailHarvester).
4. Design a proper SQL table layout for the gathered messages (internal to
DiffMessageTable).
5. Configure the forums by querying the wiki on startup instead of hardcoding
them (PipermailHarvester). This should also happen from time to time during
runtime.
6. Make HarvestCache configurable (the URL of the difference bridge).
7. Properly reuse TCP sessions in the IOReactor (internal to HarvestRunner).
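As a rough illustration of steps 1 to 3, here is a minimal sketch of how
per-forum state could be tracked and kept on disk. It is only one possible
approach. The class ForumStateStore and its methods are hypothetical names,
not existing Votorola code, and using "number of messages already harvested"
as the state of a forum is an assumption as well.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: harvest state for any number of forums,
 * persisted as a properties file. Not part of the existing code base.
 */
final class ForumStateStore {

    // Key: forum archive URL. Value: messages already harvested (assumed state).
    private final Map<String, Integer> harvested = new ConcurrentHashMap<>();
    private final Path file;

    /** Loads previously saved state from the file, if it exists. */
    ForumStateStore(Path file) throws IOException {
        this.file = file;
        if (Files.exists(file)) {
            Properties p = new Properties();
            try (InputStream in = Files.newInputStream(file)) {
                p.load(in);
            }
            for (String url : p.stringPropertyNames()) {
                harvested.put(url, Integer.parseInt(p.getProperty(url)));
            }
        }
    }

    /** Returns how many messages were harvested from the forum, 0 if unknown. */
    int harvestedCount(String forumUrl) {
        return harvested.getOrDefault(forumUrl, 0);
    }

    /** Records progress for one forum; the stored count never decreases. */
    void recordHarvested(String forumUrl, int count) {
        harvested.merge(forumUrl, count, Integer::max);
    }

    /** Writes the state to a temporary file, then moves it into place. */
    void save() throws IOException {
        Properties p = new Properties();
        harvested.forEach((url, n) -> p.setProperty(url, Integer.toString(n)));
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (OutputStream out = Files.newOutputStream(tmp)) {
            p.store(out, "harvest state (sketch)");
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

A harvester would call recordHarvested() after each batch and save() from
time to time. On restart, constructing the store from the same file resumes
where the last run stopped, independently for each forum.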
I hope to finish the first three by the end of the week. The database layout
might take a bit longer, since I have little experience with PostgreSQL
and my MySQL days are some years in the past. If somebody has SQL experience,
feel free to ping back.
Anybody can take any of the steps, as they barely build upon each other. Go ahead!
:-)
conseo
P.S.: I finally got a domain under proper control, so I might soon be able to
run a German instance of Votorola. But first we need to get the mini-beta
finished :-)