Solutions
SegmEdit
SegmEdit is a tool for browsing and editing XML files in the TrueViz format. These files describe the structure of PDF documents (words, lines, zones) and the classification of each zone (title, author, abstract, etc.). The solution also includes a server that distributes the documents to be processed.
SegmEdit was built to produce a test suite for the page segmentation and zone classification algorithms that are part of a metadata extraction framework developed at CeON.
It is open source software, written in Python using the wxWidgets library. The code is released under the GPL v3 license and can be downloaded from our repository: https://svn.ceon.pl/research/SegmEdit/.
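To give a rough idea of the kind of data SegmEdit works with, the sketch below reads a small XML document organized as zones, lines and words, and lists each zone with its classification label. The element and attribute names (Document, Page, Zone, Line, Word, label) and the read_zones helper are simplified assumptions made for illustration; they are not the actual TrueViz schema, and this is not SegmEdit's own code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified document. Real TrueViz files use a richer schema;
# the element and attribute names here are assumptions for illustration only.
SAMPLE = """<Document>
  <Page>
    <Zone label="title">
      <Line>
        <Word>Page</Word>
        <Word>segmentation</Word>
      </Line>
    </Zone>
    <Zone label="author">
      <Line>
        <Word>J.</Word>
        <Word>Smith</Word>
      </Line>
    </Zone>
  </Page>
</Document>"""


def read_zones(xml_text):
    """Return a list of (label, lines) pairs, one per zone, in document order."""
    root = ET.fromstring(xml_text)
    zones = []
    for zone in root.iter("Zone"):
        # Join the words of every line into a single string.
        lines = [
            " ".join((word.text or "").strip() for word in line.iter("Word"))
            for line in zone.iter("Line")
        ]
        zones.append((zone.get("label"), lines))
    return zones


if __name__ == "__main__":
    for label, lines in read_zones(SAMPLE):
        print(label, lines)
```

An editor built on such a structure could change a zone's label or move words between lines and then write the tree back to XML.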