Harvester Roadmap

conseo 4consensus at web.de
Sun Apr 8 21:15:11 EDT 2012


Hi,

I want to give you an update on my work so far. As a quick reminder, my 
current task is to write a harvesting solution which can scale to thousands 
of parallel harvests (the current prototype doesn't). 
Harvesting is the process of finding discussions related to differences 
between drafts. It can include any forum that users define in the Pollwiki 
(this is enough to set the harvester up; no admin intervention is needed).

PipermailHarvester fills the DB and the (obsolete) feed in crossforum. 
The next steps in getting the Harvest framework done are:

1. Extend the PipermailHarvester to track any number of forums and not just 
one. 
2. Track the state properly for each forum (internal to PipermailHarvester).
3. Make this state persistent on disk (internal to PipermailHarvester).
4. Design a proper SQL table layout for the gathered messages (internal to 
DiffMessageTable).
5. Configure the forums by querying the wiki on startup instead of hardcoding 
them (PipermailHarvester). The query should also be repeated periodically at 
runtime.
6. Make HarvestCache configurable (the URL to the difference bridge).
7. Properly reuse TCP sessions in the IOReactor (internal to HarvestRunner).
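
As an illustration of steps 2 and 3, here is a minimal sketch of how 
per-forum state could be tracked and kept on disk. It is not the actual 
PipermailHarvester code: the class name ForumStateStore, its methods, and the 
choice of a java.util.Properties file are all assumptions made for the 
example. The state per forum is assumed to be just the identifier of the last 
harvested message.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: remembers, per forum URL, the last message harvested. */
final class ForumStateStore {

    // Key: forum archive URL; value: identifier of the last harvested message.
    private final Map<String, String> lastHarvested = new ConcurrentHashMap<>();
    private final Path file;

    /** Loads previously saved state from the file, if it exists. */
    ForumStateStore(Path file) throws IOException {
        this.file = file;
        if (Files.exists(file)) {
            Properties props = new Properties();
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            }
            for (String forum : props.stringPropertyNames()) {
                lastHarvested.put(forum, props.getProperty(forum));
            }
        }
    }

    /** Records progress for one forum (step 2). */
    void update(String forumUrl, String messageId) {
        lastHarvested.put(forumUrl, messageId);
    }

    /** Returns the last harvested message of a forum, or null if none yet. */
    String lastMessage(String forumUrl) {
        return lastHarvested.get(forumUrl);
    }

    /** Writes the state of all forums to disk (step 3). */
    void save() throws IOException {
        Properties props = new Properties();
        props.putAll(lastHarvested);
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "harvester state (illustrative format)");
        }
    }
}
```

A harvester would call update() after each processed message, save() every 
now and then, and construct a new ForumStateStore on the same file after a 
restart to resume where it left off. A plain properties file is only the 
simplest option here; the write is not atomic, so a real implementation would 
probably want something more robust.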


I hope to finish the first three by the end of the week. The database layout 
might take a bit longer, since I have little experience with PostgreSQL 
and my MySQL days are some years in the past. If somebody has SQL experience, 
feel free to get in touch.

Anybody can take any of the steps, as they barely build upon each other. Go 
ahead! :-)

conseo

P.S.: I finally got a domain under proper control, so I might soon be able to 
run a German instance of Votorola. But first we need to get the mini-beta 
finished :-)


