Page-by-Page Encoding of DjVu Documents
A report by PlanetDjVu, February 22, 2003
Updated March 7, 2003
Why Page-by-Page DjVu Encoding is a Significant Feature:
Often, all of the page images for a DjVu document that you are creating are of the same type. For example, if you have scanned a contract containing only text, you likely will have saved the page images as bitonal TIFF files. Similarly, if you have scanned a magazine in color, then you most likely will have saved the page images as color JPEG files.
But what about the case of a book that is mostly text pages, but has some color photograph pages in the center? In this case you probably want to scan all the text pages as bitonal TIFF, and the pages in the center (and the covers) as color JPEG files. To encode this book into a DjVu file, you need the page-by-page encoding feature of JRAPublish.
JRAPublish encodes DjVu files by analysing each page, and then determining the best DjVu encoding profile based on the bit-depth (bitonal, grayscale, color) and the resolution (dots per inch) of the individual page. In our book example, the text pages will use a "Bitonal-300" encoding profile, while the color pages and covers might use a "Scan-200" encoding profile. Each page uses the correct and optimum encoding profile, automatically.
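The selection rule described above can be sketched in a few lines of Python. This is only an illustration, not the JRAPublish implementation: the `PageInfo` type and the `choose_profile` function are hypothetical, and the rule that grayscale and color pages both map to a "Scan" profile named after the page resolution is our assumption. Only the two profile names "Bitonal-300" and "Scan-200" come from the example in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageInfo:
    """Hypothetical description of one scanned page image."""
    name: str
    bits_per_pixel: int  # 1 = bitonal, 8 = grayscale, 24 = color
    dpi: int             # scan resolution in dots per inch


def choose_profile(page: PageInfo) -> str:
    """Pick an encoding profile name from bit-depth and resolution.

    Assumption: profile names follow the pattern "<Kind>-<dpi>", and
    anything deeper than 1 bit per pixel uses a "Scan" profile.
    """
    if page.bits_per_pixel == 1:
        return f"Bitonal-{page.dpi}"
    return f"Scan-{page.dpi}"


book = [
    PageInfo("cover.jpg", 24, 200),
    PageInfo("page001.tif", 1, 300),
    PageInfo("page002.tif", 1, 300),
]
for page in book:
    print(page.name, "->", choose_profile(page))
```

Because the decision is made per page, a mixed book needs no manual sorting of pages before encoding.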
This is a significant advance over other DjVu encoding products, which do not do page-by-page encoding. Those products require you to specify a single encoding profile to be used for all pages in the document, and indeed for the entire processing job.
Page-by-Page Encoding is also very useful when converting from existing multipage documents like PDF. Each page of the PDF file is analysed and determined to be either bitonal or color, before assigning an encoder profile.
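One way to picture the analysis of a rendered page is a scan over its pixel values, as in the Python sketch below. This is a simplified illustration under our own assumptions, not the method JRAPublish uses: the `classify_pixels` function is hypothetical, it expects the page as a sequence of (red, green, blue) values from 0 to 255, and it reports grayscale as a third category. A practical version would also need a tolerance for anti-aliased edges and scanner noise.

```python
def classify_pixels(pixels):
    """Classify a page as "bitonal", "grayscale" or "color".

    pixels: iterable of (r, g, b) tuples with values 0-255.
    A page is color as soon as one pixel has unequal channels,
    grayscale if any pixel is a gray other than pure black or white,
    and bitonal otherwise.
    """
    kind = "bitonal"
    for r, g, b in pixels:
        if not (r == g == b):
            return "color"       # one colored pixel is enough
        if r not in (0, 255):
            kind = "grayscale"   # keep scanning in case color appears later
    return kind


text_page = [(0, 0, 0), (255, 255, 255), (255, 255, 255)]
photo_page = [(0, 0, 0), (200, 120, 40)]
print(classify_pixels(text_page))
print(classify_pixels(photo_page))
```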
Page-by-Page encoding is enabled by default in JRAPublish - it is the "default" profile. In most cases this profile will work fine, so most of the time you don't have to think about encoding profiles.
For special jobs, you can pick a specific encoder profile, and you can even create your own custom encoder profiles using the interactive Segmentation Profile Editor, another feature that is unique to the JRAPublish application.
For exceptional cases where you need to use multiple custom profiles for a single document, you can process the pages in separate "encoder" page groups, and then you can combine the DjVu pages into a multipage DjVu document using JRAConvert, the companion product to JRAPublish.
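The grouping step can be sketched in Python as follows. This is a hypothetical helper, not part of JRAPublish or JRAConvert: `group_pages` and the sample file and profile names are ours, and we assume that each group should be a run of consecutive pages so that the original page order is easy to restore when the groups are combined.

```python
from itertools import groupby


def group_pages(pages):
    """Split pages into consecutive runs that share one profile.

    pages: list of (filename, profile) tuples in document order.
    Returns a list of (profile, [filenames]) tuples, also in
    document order, one per "encoder" page group.
    """
    return [
        (profile, [name for name, _ in run])
        for profile, run in groupby(pages, key=lambda item: item[1])
    ]


pages = [
    ("cover.jpg", "Custom-Color"),
    ("page001.tif", "Custom-Text"),
    ("page002.tif", "Custom-Text"),
    ("back.jpg", "Custom-Color"),
]
for profile, names in group_pages(pages):
    print(profile, names)
```

Each group would then be encoded with its own profile, and the resulting DjVu pages combined in the same order.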
Page-by-Page Encoding Not Supported by DjVu Encode SDK:
While designing and developing the JRAPublish and JRAConvert applications, we found that although the DjVu Encode SDK did not support page-by-page encoding, there was a work-around method that we could use. Unfortunately, the work-around meant that we could not support the "shared shape dictionary" compression feature. We reported the problem to LizardTech, and the fixes to the DjVu Encode SDK were made, in what was planned as a 3.5.3 upgrade to the SDK. Unfortunately, this upgrade was never released, so today the DjVu Encode SDK still does not support Page-by-Page encoding, and neither do any of LizardTech's DjVu encoding products.
Page-by-Page Encoding is available at the Any2DjVu Conversion Server!
To use the page-by-page encoding feature of the Any2DjVu Conversion Server, first create a ZIP or TAR file of all your mixed raster image pages, and then upload it to the server after selecting the "Scanned Document - Color/Mixed" option. Your pages will be encoded according to the bit-depth of each page image, just like in the JRAPublish application.
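If you have many page images, the ZIP file can be built with a short Python script such as the sketch below. The `bundle_pages` function is our own illustration and has nothing to do with the server itself. We assume the pages are TIFF or JPEG files in one folder, and that zero-padded file names (page001, page002, ...) give the intended page order; whether the server orders pages by file name is an assumption on our part.

```python
import zipfile
from pathlib import Path

IMAGE_SUFFIXES = {".tif", ".tiff", ".jpg", ".jpeg"}


def bundle_pages(page_dir, archive_path):
    """Write all page images in page_dir into one ZIP file.

    Files are added in sorted name order and stored without extra
    compression, since scanned TIFF and JPEG files are usually
    compressed already. Returns the list of names that were added.
    """
    pages = sorted(
        p for p in Path(page_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_STORED) as zf:
        for page in pages:
            zf.write(page, arcname=page.name)  # flat archive, no folders
    return [p.name for p in pages]
```

For example, `bundle_pages("scans", "book.zip")` would collect every page image in a folder named "scans" into "book.zip", ready for upload.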
What commercial page-by-page encoding software is available today?
None, unfortunately. Licensing problems with LizardTech prevent us from releasing the JRAPublish application, and the code behind the Any2DjVu Conversion Server similarly is not available for commercial licensing. But the good news is that the Any2DjVu conversion server is on-line now and free, so if you just have that occasional page-by-page encoding requirement, turn to Any2DjVu!
In the Future...
We hope that, some day in the future, licensing conditions for DjVu libraries and SDKs will change, and we can then offer you Page-by-Page encoding for DjVu documents in the JRAPublish application. We also hope that LizardTech will some day fix the problems in the DjVu Encode SDK and in their other DjVu encoding products so that Page-by-Page encoding is supported.
Example DjVu Document that was produced with Page-by-Page Encoding:
This sample issue of The Craftsman magazine was scanned at multiple resolutions. The pages containing only text were scanned as bitonal TIFF files. The pages containing photographs were scanned as grayscale TIFF files. The color covers were scanned as color JPEG files. All pages were encoded to DjVu in one batch operation using the "Default" (page-by-page) encoding feature of JRAPublish.
Further Reading:
For a full comparison of the many features in JRAPublish / JRAConvert to other DjVu-encoding products, click here to open the JRAPublish Comparison document in PDF format. Please note that LizardTech no longer publishes specifications for their DjVu products, so the features listed for their products may have changed since this comparison was made without our knowing about it. The comparison is as accurate as we could make it under these circumstances.