Wednesday, August 11, 2010

"Remaking" AFP

I have been working on my AFP "remake" engine.

The engine involves three parts.  There is a "scanner" which processes an AFP file looking for items we are interested in, e.g., embedded color or objects like page segments.  The scanner emits a "stitch list" that tells subsequent passes what we would like to change in the AFP structure.  In addition, the scanner extracts those parts of the AFP file into sub-files and passes them on to a parallel array of processors to be processed.
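Purely as an illustration (this is not the engine's code), the scanner's job can be sketched in Python. The sketch assumes the AFP file has already been split into (kind, payload) records; the record kinds, the `scan` function, and the `part-NNNN` naming are all made up for the example, and a plain dictionary stands in for the sub-files.

```python
from typing import Dict, List, Optional, Tuple

# One stitch-list entry: (bytes to copy through unchanged, id of an extracted part).
# Exactly one of the two is set. This layout is an assumption for the sketch.
StitchEntry = Tuple[Optional[bytes], Optional[str]]


def scan(records: List[Tuple[str, bytes]],
         interesting=("color", "page_segment")):
    """Walk the records once, building a stitch list and the extracted parts."""
    stitch_list: List[StitchEntry] = []
    parts: Dict[str, bytes] = {}  # stands in for the sub-files
    for i, (kind, payload) in enumerate(records):
        if kind in interesting:
            # Pull the piece out and leave a placeholder in the stitch list.
            part_id = f"part-{i:04d}"
            parts[part_id] = payload
            stitch_list.append((None, part_id))
        else:
            # Nothing to change here: copy the bytes through as-is.
            stitch_list.append((payload, None))
    return stitch_list, parts
```

The stitch list keeps the original order of the file, so anything that later reads it knows both what is still outstanding and where each finished piece belongs.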

Each processor in the parallel array has the second part, the "element processor", available to crunch the pieces.  The element processors know what needs to be done to the pieces of the file, e.g., apply a specific color transform, and can recognize the pieces of AFP and their structure.  They read in the AFP, alter it, and write out a new version.
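A minimal sketch of fanning the pieces out to a pool of workers, using Python's standard `concurrent.futures`. The `process_element` function here is a placeholder (it just upper-cases the bytes) standing in for a real transform such as a color conversion, and `process_parts` is a hypothetical name. Threads keep the sketch simple; for CPU-heavy work on a multi-core machine, `ProcessPoolExecutor` offers the same `map` interface.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict


def process_element(payload: bytes) -> bytes:
    """Placeholder transform: read a piece, alter it, return the new version."""
    return payload.upper()


def process_parts(parts: Dict[str, bytes], workers: int = 4) -> Dict[str, bytes]:
    """Run process_element over every extracted part using a worker pool."""
    ids = list(parts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with ids.
        results = pool.map(process_element, (parts[i] for i in ids))
        return dict(zip(ids, results))
```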

The last part is the "stitcher".  The stitcher waits around for the parts of the file to complete processing and then reassembles the AFP file from the "stitch list".  The stitch list tells the stitcher what it's waiting for, so every once in a while it wakes up, looks to see if the parts are done, and, if they are, does its job.
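The wake-up-and-check behavior can be sketched as a simple polling loop. This is an assumption-laden toy, not the real stitcher: the stitch list is taken to be a list of (bytes, part id) pairs with exactly one of the two set, finished parts show up in a shared dictionary called `done`, and the `stitch` function, its polling interval, and its timeout are all invented for the example.

```python
import time
from typing import Dict, List, Optional, Tuple


def stitch(stitch_list: List[Tuple[Optional[bytes], Optional[str]]],
           done: Dict[str, bytes],
           poll_seconds: float = 0.01,
           timeout: float = 5.0) -> bytes:
    """Sleep until every part named in the stitch list is done, then reassemble."""
    needed = {part_id for _, part_id in stitch_list if part_id is not None}
    deadline = time.monotonic() + timeout
    # Wake up every so often and check whether all the parts have arrived.
    while not all(part_id in done for part_id in needed):
        if time.monotonic() > deadline:
            raise TimeoutError("some parts never finished processing")
        time.sleep(poll_seconds)
    # Everything is ready: walk the stitch list in order and join the pieces.
    return b"".join(
        data if part_id is None else done[part_id]
        for data, part_id in stitch_list
    )
```

Because the stitch list already records the order, reassembly is a single pass once the last part lands.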

The idea for this comes from an existing Lexigraph product called Krypton, which works in a similar fashion.  For PDF, though, we don't break the PDF apart to process it; most of the parallelization comes from processing pre-existing pieces of PDF into a larger aggregated PDF file.  So this idea works: it's been in production in Asia for many years at this point.

This is much more efficient than a "single pass" model, where a given application would run through a single AFP file from start to finish, particularly on today's multi-core servers.
