Tuesday, August 31, 2010

More formal specs...

We have been working in the background here (despite various distractions) on the detailed specifications for the AFP product APX Raster Pro:

APX Raster Pro Specifications

APX Raster Pro is available on Apple OSX, Linux, and Windows as a command line server application.  It supports simultaneous conversion of files between standard formats such as PDF, PostScript, JPEG, TIFF, and PSEG IOCA.  It also supports conversion of other file types via any third party raster application that can produce TIFF files as output (Apago Piktor, GhostScript, ImageMagick, etc.).
  • Support is standardized for AFP at 240, 300, and 600 dpi - though any resolution can be supported.
  • Support for other file formats is available at any resolution.
  • Output images may be automatically rotated 0, 90, 180, or 270 degrees (clockwise or counter-clockwise) during conversion, and output files may support multiple rotations within the same file.
  • Image transparency is supported through IOCA image transparency masks and can also be calculated through color-based specifications.
  • All output files can be encoded for color as CMYK or RGB.  Other file dependent color spaces are also supported depending on the format used.
  • Image resizing is supported, with multiple resampling algorithms: linear, Gaussian, Hamming, and Blackman.
  • All output files can be directed to unique output folders based on file type.

Color transformations are supported as follows:
  • APX Raster Pro supports a set of parameterized built-in color transforms (called parametric transforms):
             - RGB, CMYK, and GRAY to/from RGB, CMYK, and GRAY, and
             - IOCA YCrCb and YCbCr to CMYK and RGB.
  • Shading functions that support manipulation of intensity of CMYK and RGB grey.
  • Shade-based conversions to manipulate the intensity of all color shades.
  • Color specification as percentage, decimal or fractions.
  • Color processing may be applied to images (icmyk), strokes (cmyk), fills (CMYK) in any combination.
  • Color-table-based conversions, i.e., icmyk*(21#,93#,255#,0#)=>cmyk(1,0,0,0).
  • Parametric transforms can be applied over specific sets of colors and named as Color Procs.
  • Color procs can be prioritized in a specific order for application to images.  Color procs can also be triggered based on file extension or type.
  • When multiple color procs are stacked, support is provided to ensure that any given input color value is affected by exactly one color proc.
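The stacking and color-table behavior described above can be sketched in a few lines of Python.  The class and function names here are my own invention for illustration, not the product's actual API:

```python
# Hypothetical sketch of stacked "color procs": first match wins, so each
# input color value is affected by exactly one proc.

class ColorProc:
    def __init__(self, name, matches, transform):
        self.name = name            # e.g. "LightGray1"
        self.matches = matches      # predicate: does this proc claim the color?
        self.transform = transform  # maps an input color tuple to an output tuple

def apply_procs(color, procs):
    """Apply the first matching proc in priority order; later procs
    never see the color, so no input value is transformed twice."""
    for proc in procs:
        if proc.matches(color):
            return proc.transform(color)
    return color  # unmatched colors pass through unchanged

# A color-table style rule echoing icmyk*(21#,93#,255#,0#)=>cmyk(1,0,0,0):
table_rule = ColorProc(
    "TableRule1",
    matches=lambda c: c == (21, 93, 255, 0),
    transform=lambda c: (1.0, 0.0, 0.0, 0.0),
)

print(apply_procs((21, 93, 255, 0), [table_rule]))  # (1.0, 0.0, 0.0, 0.0)
print(apply_procs((0, 0, 0, 128), [table_rule]))    # (0, 0, 0, 128)
```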

Monday, August 30, 2010

Wednesday, August 25, 2010

Tweaking color...

We have a system that converts PDF to AFP.  Since we use any of several commercial tools to convert the PDF to a raster, some of them produce different results for PDF gray.

In general we require that customers always create CMYK images for projects destined to AFP because some AFP devices only support required image features, such as transparency, in CMYK.  Now customers don't always do what they are told so sometimes people do things like create a PDF Gray in a file instead of a CMYK Gray.

So what's the difference?  If you create a document in CMYK then you can generally rely on the creation tools to put the colors you define as CMYK colors into the file in a predictable way.  This means that if I say I want CMYK(23%, 10%, 5%, 6%) that's what I will get.  So if I say I want CMYK(0,0,0,50%) then I should get 50% black and no other colors.

Gray, on the other hand, is different.  While you might imagine that Gray(50%) is the same as CMYK(0,0,0,50%) it may not be - depending on the tools you are using.  Further, the AFP IOCA model does not have direct support for gray-only or black and white images (it only supports RGB, CMYK, YCrCb and YCbCr). 

Without getting into very complex color issues we can say that there are various methods for producing gray on a CMYK device.  One method is to use only CMY in approximately equal amounts to simulate gray.  Another method is to use CMYK and to vary the proportion of K inversely to C, M, and Y in order to render part of the image with K and part with CMY.  A third option is to use only K as if it were Gray and ignore CMY.
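These three methods can be sketched as a toy conversion.  The 50/50 GCR split below is a deliberate oversimplification of mine; real separations go through ICC profiles and dot-gain curves:

```python
def gray_to_cmyk(g, method):
    """g is gray ink coverage in [0, 1] (0 = white, 1 = black).
    Illustrative only: real devices apply ICC profiles on top of this."""
    if method == "cmy":    # simulate gray with roughly equal CMY, no K
        return (g, g, g, 0.0)
    elif method == "k":    # K-only: treat gray as pure black ink coverage
        return (0.0, 0.0, 0.0, g)
    elif method == "gcr":  # crude gray-component replacement: split K vs CMY
        k = g * 0.5
        c = g - k          # remainder carried equally by C, M, and Y
        return (c, c, c, k)
    raise ValueError(method)

print(gray_to_cmyk(0.5, "k"))    # (0.0, 0.0, 0.0, 0.5)
print(gray_to_cmyk(0.5, "cmy"))  # (0.5, 0.5, 0.5, 0.0)
```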

Additionally device ICC profiling further affects this by doing its own conversions between gray and various gray representations.

So rendering gray as an AFP color is driven, in this situation, by a number of factors:

  1. Device registration.
  2. Choice of representation for gray during AFP conversion.
  3. Device representation of an AFP gray representation.

So for my current application #1 is likely to take priority because, when printing very small type, errors in device registration between the colors create low-quality output.

So what we do is create a color transformation for APX Raster Pro that tells it to convert shades of CMY gray to K black, which eliminates the registration issues.
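A minimal sketch of such a transform, assuming values in [0, 1] and a simple tolerance test for "gray enough" (both the tolerance and the additive fold into K are my assumptions, not the product's actual rules):

```python
def cmy_gray_to_k(c, m, y, k, tol=0.02):
    """If C, M, and Y are (nearly) equal, fold them into K so that only
    one plate prints and registration between plates no longer matters."""
    if max(c, m, y) - min(c, m, y) <= tol:
        gray = (c + m + y) / 3.0
        return (0.0, 0.0, 0.0, min(1.0, k + gray))
    return (c, m, y, k)  # not a gray: leave the color alone

print(cmy_gray_to_k(0.5, 0.5, 0.5, 0.0))  # (0.0, 0.0, 0.0, 0.5)
print(cmy_gray_to_k(0.9, 0.1, 0.0, 0.0))  # not gray; unchanged
```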

Tuesday, August 24, 2010

I guess I am on the right track...

I received an email about a product called AFP Tuner from MakeAFP PTE LTD.

I don't know where they are from - somewhere in Asia I believe - but their product includes one of the elements I am working on: "Replaces legacy AFP page segments with JPEG/TIFF/GIF color images."

I suppose this is good news in a couple of ways.

First off it validates the idea of basic AFP tuning (there are corresponding PDF tuning products as well).  Somewhere, halfway around the world, someone has the same idea.  The bulk of this product focuses on other aspects of AFP like font substitution and there doesn't seem to be a parallelization aspect - but I am pleased overall to see other like-minded companies out there.

Secondly, they don't seem really interested in what you can do with color.  "Legacy AFP page segments" means ugly 1-bit images so fixing them is not hard or interesting.  Making them look good, on the other hand, is a different matter.

One interesting idea they present is this: "Replaces legacy shading patterns or shading images with vector color or gray background graphics..."  Basically this involves identifying "gray boxes" of various sorts that have been constructed with images and replacing them with graphic commands to draw and fill a box.  Others have asked me about this feature and I am very close to being able to provide it.

This offering is also a "command line" style application based on the information they provide.

I think that in the long run there will need to be a UI plus an underlying command line application to make this type of tool easier to deal with for end-users.

AFP vs PPML

I have been experimenting with AFP external resources (see previous posts).

It seems clear that this is a very powerful model for handling what PPML calls "reusable objects" - basically anything you want to reuse is placed in a resource package at the front of the job.  Each resource is given a unique name.  Inside the job when you wish to access the resource on a given page you map it into the page's environment and then place the object.

AFP only offers rotation, simple positioning, some limited scaling, clipping and other functions, as opposed to PPML's full CTM model.  But it's an effective model and fairly easy to use.

So at least from a "functional equivalence" perspective I can address some print capabilities requiring this type of function.

Monday, August 23, 2010

After some experimentation...

I have been fooling around with various means of doing external image resources.  Basically I think that, for the most part, the viewers do not support it very well.  Pasting the same IOCA image data into the page directly works just fine in most AFP viewers.

I am still waiting on some technical support for a couple of the viewers to see if that clears things up.

I also started working on how well AFP works when stitching together parts of an image.

For this I broke a larger image into parts and set up some AFP pages to stitch them.  To do this I set the resolution of the page and image to be the same, in this case 600 dpi, and then placed the images next to each other.  This seems to work well in all the various viewers.
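The stitching setup can be sketched as a tile-offset computation.  Because the page and image share one resolution (600 dpi here), each tile's pixel offset within the original image doubles as its placement offset on the page:

```python
def tile_positions(width_px, height_px, tile_w, tile_h):
    """Yield (x, y, w, h) tiles covering an image, in device pixels.
    Placing tile (x, y) at page offset (x, y) stitches the image back
    together exactly, with no fractional scaling involved."""
    for y in range(0, height_px, tile_h):
        for x in range(0, width_px, tile_w):
            yield (x, y, min(tile_w, width_px - x), min(tile_h, height_px - y))

# A 1200x900 pixel image broken into 512x512 tiles:
tiles = list(tile_positions(1200, 900, 512, 512))
print(len(tiles))   # 6 tiles (3 across, 2 down)
print(tiles[0])     # (0, 0, 512, 512)
```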

It is my as yet unproven belief that AFP, unlike PDF, allows device-specific pixel alignment and placement.  PDF is, as you may or may not know, totally device independent in that regard and, further, you are not allowed to "know" in the PDF code about the device resolution.  PDF images have a specific resolution at which they are defined but they are always transformed via a CTM prior to display.  Since CTMs are based on floating point numbers there is no guarantee that things will line up evenly on the display device.

AFP seems very focused on specific image and device resolutions.  So far I have no reason to believe that it does not allow stitching of images and so forth at the device pixel level.  There is no CTM-style scaling for IOCA images so I don't expect much problem.

A note on the viewers.

There are two main types - browser plug-ins and stand-alone applications.  Both IBM and ISIS offer browser-based viewers.  I have been using both for the last several days.  IBM also offers a stand-alone application.

The ISIS viewer offers the most impressive display capabilities so far.  I really like its anti-aliasing capabilities for display and its fast, smooth scrolling and scaling.

The IBM viewer is very tolerant of errant AFP and supports external resources as you would expect.  IBM set up a joint venture called InfoPrint with its printer division and Ricoh in 2007.  So far all of the AFP support still appears to be on the IBM site.

I guess the real question is how all of this will print.  So far I have not worried too much about that but that will no doubt be a source of misery very soon.

Friday, August 20, 2010

AFP Viewers...

Now that I am creating full AFP files I have started experimenting with various "free" or "demo" AFP viewers. Some simple googling will turn up a number of them:

IBM

ISIS

Compulsive Coder

CreDo

More here... 

Of course, there are many AFP commercial products available from some of these same companies as well as from companies like GMC, Elixir, Barr, and many, many more.

My experience so far playing around with some of these tools has been, to say the least, mixed.

For me there are two perspectives: a novice AFP user and an experienced software developer.

As a novice AFP user I can only equate my experience to my novice PDF user experiences from around 1998 and 1999.  At that time Adobe had just released Acrobat 3.0 and most of what I did with Acrobat started with that version.

As a software developer I am familiar with studying manuals and documents to determine how to create software and output, in this case AFP, that conforms.  I am familiar with building tools to check my own work and to validate it.  I am also familiar with discovering what is "missing" from the manuals and standards by experimentation.

So what's important as a novice?  Well, for me I'd like a tool that was simple and reliable and did what it was documented to do.  For the most part all the tools I have been looking at, at least on the surface, do this.  I like a tool that fits naturally with the AFP environment, i.e., something that's not a chore to deal with.

So, from this perspective, the first area of excitement I found was the notion of external AFP resources.  The idea in AFP, at least from what I can see, is that AFP print jobs can be split into two components: a set of resources and a job that uses them.  The resources (things like Page Segments, Images, and so forth) all have names that can be referenced in the job.   The idea is that you can separately transmit resources to a printer and then multiple jobs that reference them in order to save rasterizing and RIP time.

The resources can also be prepended to the job file so that they are "part of the job" in that a single transmission of resources and job together provide a complete definition.

The AFP manuals provide a complete and detailed description of what is supposed to work and how in this regard.  Each of the tools I have played with has a mechanism to support this.  Basically they all allow you to specify a directory where "external resources" can reside.  Jobs you present to the software do not need to have all the assets embedded and, when a reference to an asset is found that's not directly in the job stream, the directory is searched for the missing asset.
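The lookup behavior these tools describe can be sketched roughly like this.  The file-naming convention and function shape are my assumptions for illustration, not any vendor's actual implementation:

```python
import os

def resolve_resource(name, embedded, resource_dir):
    """Mimic the viewers' lookup: prefer an asset embedded in the job
    stream, then fall back to the configured external-resource directory."""
    if name in embedded:
        return embedded[name]
    path = os.path.join(resource_dir, name.strip())  # assumed: file named after resource
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    raise KeyError("unresolved AFP resource: %r" % name)

# An embedded PSEG resolves without touching the directory:
print(resolve_resource("821321", {"821321": b"pseg-bytes"}, "."))
```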

(Note to anyone following this blog - if your software is mentioned directly here - or you would be interested in me using your software and writing about it - please let me know.  I will promise that if I write about it here I will always give you a chance to respond to my findings before I write about them here - considering I may be doing something "wrong" as a novice.)

So far I have had a variety of different experiences with the external resource function.  Some software works as advertised and other software displays a variety of interesting issues - most notably either crashing or displaying nothing.

For example, one would imagine that attaching the resources to the job versus referencing the resources from a directory wouldn't make a difference.

Wednesday, August 18, 2010

A real AFP page...

So after much work and toil my AFP application is now working as a true AFP creation application.

Most of the previous work has been related to creation of Page Segments (PSEGs) which are images or graphic objects.  Work has been centered around creating images in this format, performing color transformations, and so on.

The creation of actual AFP pages is really not a large step beyond this but in terms of remade AFP it is nice to finally see months of work validated.  The remake system currently supports altering PSEGs and other AFP constructs in existing AFP files.

The thing I am currently interested in is creating AFP pages that use PSEG images - both as a way to work with new AFP pages and as a platform for my new PDF/AFP compression model.

So my first task was to figure out what the "simplest" valid AFP file might be with a single image.  It turns out not to be that complex.  Basically you can think of the file like this:

     (BEGIN_RESOURCE_GROUP rt='BRG' x='D3A8C6' name='LXG00000' )
     (INSERT  id='821321_XXXX' /)
        (END_RESOURCE_GROUP rt='ERG' x='D3A9C6' name='LXG00000' /)
      (/BEGIN_RESOURCE_GROUP)

      (BEGIN_DOCUMENT rt='BDT' x='D3A8A8' )
          (BEGIN_PAGE rt='BPG' x='D3A8AF')
            (BEGIN_ACTIVE_ENVIRONMENT_GROUP rt='BAG' x='D3A8C9' /)
          (MAP_PAGE_SEGMENT rt='MPS' name001='821321  ' x='D3B15F'/)
          (PAGE_DESCRIPTOR rt='PGD' XpgBase='00' YpgBase='00' XpgUnits='14400' YpgUnits='14400' XpgSize='11880' YpgSize='15840'  x='D3A6AF'/)
              (END_ACTIVE_ENVIRONMENT_GROUP rt='EAG' x='D3A9C9' /)
            (/BEGIN_ACTIVE_ENVIRONMENT_GROUP)
        (INCLUDE_PAGE_SEGMENT rt='IPS' x='D3AF5F' name='821321  ' XpsOset='400' YpsOset='700' /)
        (INCLUDE_PAGE_SEGMENT rt='IPS' x='D3AF5F' name='821321  ' XpsOset='800' YpsOset='1500' /)
            (END_PAGE rt='EPG' x='D3A9AF' /)
          (/BEGIN_PAGE)
        (END_DOCUMENT rt='EDT' x='D3A9A8' /)
      (/BEGIN_DOCUMENT)


The file consists of two parts (using simple non-XML XML so blogger won't choke): a resource, which is the image to display, and a document with a single page that displays that image.  The image is actually inserted from another AFP file (which we will talk about in another post) - but basically it's the wholesale insertion of an AFP PSEG.  This resource could already be stored on the printer as a reusable object but we include it to make the AFP file completely self-defining.  The PSEG has a name associated with it which is used by subsequent AFP to identify it.

The AFP document is quite straightforward and consists of a BEGIN/END document pair of AFP records.  Inside this pair is a BEGIN/END page to define the actual page.

In AFP an environment group (here enclosed by a BEGIN/END environment pair) appears immediately after the BEGIN page definition to describe the layout (height, width, resolution, included resources, and so on) of the page.  In this case two AFP records do this: a page descriptor (PGD) and a map page segment (MPS).

The page content consists simply of include page segment (IPS) records that reference the image by name and indicate a position on the page to place it.

Monday, August 16, 2010

Data Driven Color Debugging... (part 3)

This falls along the same lines as dealing with any other data driven aspect of industrial printing.

First, you have to set the process up correctly.  This involves several steps:
  1. Identifying what needs to be changed and why.
  2. Determining what color changes to apply.
  3. Testing the color changes relative to color approval.
  4. Testing the data aspects that determine the color changes.
The key difference here between testing data driven color and, say, data driven content is that three elements are involved in the color.  First, you have to pick the right color to change and make sure the color change process recognizes the appropriate shades, etc.  Second, you have to make sure what is produced passes all approvals.  Third, you have to make sure that the data you need to recognize when to apply the color change is present.

In general this is not too different from any other color work save for step #3.

Our general model for #3 has been to give transforms textual names, e.g., "LightGray1", and to match job identifiers with a table of transforms.  This allows a human to quickly determine what transforms go with what work.  The second element is to attach metadata to the job in order to trigger the proper transform group.  We do this with TLEs (or PDF Bookmarks).  Each TLE or bookmark describes a span of pages and links that span of pages to a set of transforms.
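A rough sketch of that matching logic follows.  The table contents, span tuples, and function names are invented for illustration; the real metadata lives in TLEs or PDF bookmarks:

```python
# Named transform groups, keyed the way a human would look them up.
transform_table = {
    "LightGray1": ["cmy_gray_to_k"],  # transform names are illustrative
    "LogoFix":    ["table_rule_1"],
}

# Each TLE/bookmark describes a span of pages: (first, last, group name).
spans = [(1, 10, "LightGray1"), (11, 14, "LogoFix")]

def transforms_for_page(page):
    """Return the transforms that should fire on a page, logging the
    decision so support staff can check it against the run."""
    for first, last, group in spans:
        if first <= page <= last:
            print("page %d -> %s %s" % (page, group, transform_table[group]))
            return transform_table[group]
    return []  # no metadata claims this page

transforms_for_page(3)   # page 3 -> LightGray1 ['cmy_gray_to_k']
```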

Having this data embedded in the document makes debugging and tracking problems straightforward.

On the application side pdfExpress XM and APX Raster Pro report what transforms are applied by page range and bookmark as the transformations occur.  This allows support personnel to quickly determine if the proper transforms are being applied.

Most imaging processes we are involved in already support some form of metadata per page for other reasons, e.g., mailing or mag strip encoding, so adding additional information for color support is not an issue.

On the debugging side, given this information, it's not too hard to see if the proper transforms trigger simply by inspecting the log.  However, that in and of itself may not be adequate.  Sometimes the output will be wrong.

Typically "wrong output" will come from one of a few sources: 
  • Changes to input that were not tested.
  • Lack of full spectrum testing on job setup.
  • Programming errors.
Input changes, or content creep as we described previously, are common.   The CSR end of the business has to be made aware that when customers supply new content art there may be workflow impact - especially if the change involves color, e.g., a logo change.  This is an organizational issue which must be addressed.

Testing failure is inevitable and is due to a number of issues.  Many programmers are basically unaware of color and how it works and make wrong assumptions - particularly when coding color related functions into the workflow. 

Another issue is lack of coverage.  A customer may be supplying dozens of logos - all with slightly different incorrect colors.  You have to make sure that you identify all the logos in question for correction - not just the first few you encounter.

You must also look for interactions between color transforms, e.g., if I am changing a gray to another color and I am also changing the shade of another, similar gray I don't want there to be a bad interaction.   This requires careful output inspection.

Programming errors are usually not found until real data is provided.  Test data will often cover only what a programmer expects to test and not the full range of real world issues.

Friday, August 13, 2010

Debugging Color... (cont.)

(Continued from the PDF Outsider because the same issues apply here...)

So once there is a problem it has to be sorted out.

Many times the source of the problem is hard to identify.  For example, proofs are always produced when a new customer is taken on, and the customer sees something and approves it.  It turns out to be quite difficult to manage color in that context if equipment and workflow are changing.  Remember there might be a dozen steps along the way in the process.  Each step involves its own version of software and hardware.  Since these jobs run over time (months or years) various elements get upgraded and changed along the way - perhaps by another silo in the organization unrelated to production.  We call this "configuration creep".

Configuration creep is a significant problem in large companies with multiple print devices and multiple plants, and debugging color relative to it can be challenging because it may not be possible to "go back" to the approval configuration.  For example, a conversion element is upgraded and some new default converts color to RGB instead of CMYK - either intentionally or inadvertently.

Configuration creep is the nightmare of any production manager because it's totally out of his control yet it can completely stop production.  Debugging this problem is horrific because you have to trace back and determine what, if anything, might have changed.

Another debugging area is what we call "content creep".

Content creep is the process by which elements of large, complex production jobs change over time.  This is particularly important in workflows that involve cached assets, i.e., VDP workflows with PPML or AFP.  In this context assets of various sorts are used in jobs but are not organized as part of the job.  What I mean by this is that in AFP you can reference external job elements that are not part of the job per se.  In the print job there is a reference to the element - typically cached in the printer - rather than the actual element.  There are analogous scenarios for PDF (PPML).

So as long as the elements don't change, everything works.  But what happens when a logo referenced in dozens of jobs changes?  For example, I am holding output from a job and it contains the new logo.  Is this particular job supposed to use the new logo?  Large workflows generally try to control content creep with elaborate asset management processes.  But these are only as good as the people who use them and quite often mistakes are made along the way.  Sometimes the shop floor personnel don't know that the asset should change and report problems that are not problems at all.

Another interesting debugging issue follows along these lines:  A customer calls and says that there is "ghosting" around small type on some specific jobs.  After much anguish it is determined that what's happening is that on some parts of the job, around 6pt type, the device is not registering colors accurately and CMYK black is causing a problem.  So here you have to debug the hardware first and determine if it's working correctly.  Given that it is, you next have to figure out how the CMYK black got into the workflow.  Since jobs come and go over the course of a year you may discover that someone working on the job last year created a bad asset.  Of course CMYK black is not allowed, but that doesn't mean people don't find creative ways to inject it into the workflow.

You have to look for everything from a new asset sent by a customer that bypassed checking (somebody was in a hurry to get the new asset into production and just assumed it was correct) to a personnel change and someone forgot to follow the standard asset management steps.

Next post we will cover data driven color problems...

Wednesday, August 11, 2010

"Remaking" AFP

I have been working on my AFP "remake" engine.

The engine involves three parts.  There is a "scanner" which processes an AFP file looking for items we are interested in, e.g., embedded color or objects like page segments.  The scanner emits a "stitch list" that tells subsequent passes what we would like to change in the AFP structure.  In addition the scanner extracts the parts of the AFP file into sub-files and passes them on to a parallel array of processors to be processed.

The parallel array of processors has the second part, the "element processor", available to crunch the pieces.  The element processors have knowledge of what needs to be done to the pieces of the file, i.e., apply a specific color transform, and can recognize the pieces of AFP and their structure.  They read in the AFP, alter it, and write out a new version.

The last part is the "stitcher".  The stitcher waits around for the parts of the file to complete processing and then reassembles the AFP file from the "stitch list".  The stitch list tells the stitcher what it's waiting for, so every once in a while it wakes up, looks to see if the parts are done, and, if they are, does its job.
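The three-part engine can be sketched in a few lines of Python.  Everything here is illustrative: the pieces are byte strings standing in for AFP sub-files, threads stand in for the parallel array of processors, and uppercasing stands in for a real color transform:

```python
from concurrent.futures import ThreadPoolExecutor

def scan(afp_pieces):
    """Scanner pass: record each piece's position in a stitch list."""
    return list(enumerate(afp_pieces))

def process_element(item):
    """Element processor: alter one piece (a real processor would apply
    a specific color transform to the AFP structures it recognizes)."""
    index, piece = item
    return (index, piece.upper())  # stand-in for the real transformation

def stitch(results):
    """Stitcher: once all pieces are done, reassemble in stitch-list order."""
    return b"".join(piece for _, piece in sorted(results))

stitch_list = scan([b"pseg1", b"pseg2", b"pseg3"])
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(process_element, stitch_list))
print(stitch(results))  # b'PSEG1PSEG2PSEG3'
```

The index carried through each stage is what makes the parallelism safe: pieces can finish in any order, and the stitcher still reassembles the original file layout.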

The idea for this comes from an existing Lexigraph product called Krypton which works in a similar fashion.  Though for PDF we don't break the PDF apart to process it - most of the parallelization comes from processing pre-existing pieces of PDF into a larger aggregated PDF file.  So this idea works - it's been in production in Asia for many years at this point.

This is much more efficient than a "single pass" model where a given application would run through a single AFP file - particularly on today's multi-core servers.

Tuesday, August 10, 2010

Compressing AFP...

I have become interested in a scheme for compressing PDF into an AFP stream.

This would be useful for applying color transforms "up front" of the conversion to AFP (rather than in AFP or out of AFP).

The idea is along the lines of this.

However, in the world of PDF documents some changes would be necessary.  First off, the data being examined has somewhat different properties than the images FITSIO is trying to compress.  Sequential images of planets and gas nebulae are unlike sequential images of bank statements.   Another difference is that we can assume basically unlimited CPU/disk for parallelization.  Finally, there is no "transmission" requirement to send the images long distances via radio.

Though AFP supports tiling directly as an IOCA image construct, my feeling is that it's not a commonly used construct and that making the tiles more general, i.e., full IOCA images on their own, would be a much better idea.

Another element of this is reuse of tiles.  Business documents tend to be constructed from templates with a long-running stream of pages, i.e., a mail stream, containing a small number of templates.  Within each type of template individual changes occupy a relatively small portion of the document.

The only catch is that you have to be able to quickly determine the reuse level of each tile...
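One cheap way to estimate reuse is to hash each tile's bitmap bytes and count duplicates.  This is a sketch under the assumption that identical bytes mean an identical tile (ignoring near-duplicates that differ by a pixel or two):

```python
import hashlib
from collections import Counter

def tile_reuse(tiles):
    """Count occurrences of each distinct tile bitmap by hashing its
    bytes; high counts mark tiles worth turning into shared resources."""
    return Counter(hashlib.sha1(t).hexdigest() for t in tiles)

# Pages built from the same template share most of their tiles:
page1 = [b"header", b"body-A", b"footer"]
page2 = [b"header", b"body-B", b"footer"]
counts = tile_reuse(page1 + page2)
print(sorted(counts.values(), reverse=True))  # [2, 2, 1, 1]
```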

Wednesday, August 4, 2010

Not to be forgotten...

As I said I've been posting over in the Lone Wolf blog.

At this point I am almost through the discussion of how the color transformation process functions - so that will be out of the way.

As for our product APX Raster Pro and AFP - the combination of these is currently functioning - though with a slightly less general set of transforms than described on Lone Wolf.

At this point we are actively looking for Beta testers and such for this AFP product.

I hope to finish up the Lone Wolf stuff within a week or two and return to this part of the blog.