This file was compressed by first converting
the PDF into PostScript with xpdf and then running
the following command:
%djvudigital --cseparg=-p100 --words --threshold=99 \
--fg-image-colors=1024 --fg-colors=128 --psrotate=90 \
--bg-slices=72+11+10+6 superhero.ps
To compress this file, I had to recognize that the
content of the PDF file is designed to showcase
some of the strengths of PDF. It liberally uses gradients
and line-art features that are shared among pages.
To deal with such a file, I chose to move as many
things as possible into the foreground (even images)
and thus give them a chance to be shared between pages.
This is the purpose of the options --threshold
and --fg-image-colors. To help this strategy,
the --cseparg option attempts to maximize sharing
between pages. The --fg-colors option reduces the number
of distinct colors in the foreground in order to limit the size
of the foreground data. The --bg-slices option reduces the
quality of the background because, after all, the background
is no longer rich in detail once most elements have moved
to the foreground.
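To see why moving repeated elements into the foreground pays off,
here is a minimal sketch, assuming each foreground element can be
treated as an exact bitmap (a hypothetical tuple of row strings).
Each distinct bitmap is stored once in a shared dictionary and pages
keep only references to it. The names build_shared_dictionary and
stored_cells are illustrative only; DjVu's real foreground coder uses
more elaborate (approximate) shape matching, not exact hashing.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for a foreground bitmap: a tuple of row strings.
Bitmap = Tuple[str, ...]


def build_shared_dictionary(
    pages: List[List[Bitmap]],
) -> Tuple[Dict[str, Bitmap], List[List[str]]]:
    """Store each distinct bitmap once; each page keeps only keys into it."""
    shared: Dict[str, Bitmap] = {}
    page_refs: List[List[str]] = []
    for page in pages:
        refs = []
        for bitmap in page:
            # Exact-content key: identical bitmaps map to the same entry.
            key = hashlib.sha1("\n".join(bitmap).encode()).hexdigest()
            shared.setdefault(key, bitmap)
            refs.append(key)
        page_refs.append(refs)
    return shared, page_refs


def stored_cells(bitmaps: Iterable[Bitmap]) -> int:
    """Count pixels stored, as a crude proxy for data size."""
    return sum(len(row) for bmp in bitmaps for row in bmp)


# Usage: a logo repeated on every page and an arrow on two of them.
logo = ("####", "#..#", "####")
arrow = ("..#", "###", "..#")
pages = [[logo, arrow], [logo], [logo, arrow]]

shared, refs = build_shared_dictionary(pages)
naive = stored_cells(b for page in pages for b in page)  # every copy stored
dedup = stored_cells(shared.values())                    # each shape stored once
print(naive, dedup)
```

The more pages reuse the same element, the larger the gap between the
two counts. Background images get no such sharing in this sketch,
which is why pushing elements into the foreground helps.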
Mr. Isaacs explains on page 14 that
"A PDF file can never be better than the
content from which it is created".
His whole presentation boils down to this: one should avoid
intermediate steps that could hide the structure of the
original content.
DjVu was designed to remove this constraint.
We could print the superhero file, scan the pages
and still produce a DjVu file with a decent size
(not as good as this one, but decent).
In other words, the DjVu compressors are designed
to recover the structure of the document from whatever
data is available (pixels for djvudocument, PostScript
for djvudigital). The format itself implements only a simple
document structure (foreground/background), but it offers
many opportunities to conceal errors made while discovering
that structure.
In short:
----------------------------------------------------------------------
          Gold-in       Garbage-in
----------------------------------------------------------------------
PDF       Gold-out      Garbage-out
DJVU      Gold-out      Acceptable-out
----------------------------------------------------------------------
- Leon