DjVu-Digital vs. "Super Hero" PDF
A faceoff by James Rile, PlanetDjVu, November 20, 2002

One year ago, Dov Isaacs of Adobe produced a "Super Hero" PDF, an example of
the use of superior compression methods in PDF files.  See the review at:

Here is an excerpt:
"Do you think "super hero" is too strong a word to describe this file? OK, then you create a PDF like this:

84 slides from PowerPoint
Color graphics on every page
30 font faces subset embedded
17 line art drawings
30 screen shots
28 bitmaps (in addition to the screen shots)
Four languages
Looks great on-screen
Prints like a champ

... and is 1.14 MB in size."

We at PlanetDjVu decided to challenge this "super hero" example file by converting it to DjVu-Digital format.  Our first result was pretty good, but then Leon Bottou, the "original format author" of DjVu, joined in the challenge and provided the final DjVu file that is linked to below.

The resulting DjVu file wins the size comparison: it is 25% smaller than the best that PDF can offer!  Open both files using the links below and you will see that the DjVu version also opens and displays more quickly than this "Super Hero" PDF!

DjVu is 25% smaller than PDF

Now who is the "Super Hero"?   Why, DjVu is!
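
As a rough check on what "25% smaller" means in absolute terms, the short Python sketch below applies that saving to the 1.14 MB figure quoted in the excerpt. It assumes the 25% is exact and is measured against that 1.14 MB size; the article does not state the DjVu file size directly, so the result is only an implied estimate.

```python
# Size quoted in the excerpt for the "Super Hero" PDF, in megabytes.
PDF_SIZE_MB = 1.14

# Saving claimed in the article. Treating it as exactly 25% is an assumption.
CLAIMED_SAVING = 0.25


def implied_size(original_mb: float, saving: float) -> float:
    """Return the size left after removing `saving` (a fraction) of the original."""
    return original_mb * (1.0 - saving)


djvu_size_mb = implied_size(PDF_SIZE_MB, CLAIMED_SAVING)
print(f"Implied DjVu size: about {djvu_size_mb:.3f} MB")
```

On these assumptions the DjVu file would come to a little under 0.9 MB.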

Here is what Leon had to say about this winning DjVu rendition of superhero.pdf:
This file was compressed by first converting
the pdf into ps with xpdf and then using
the following command:

%djvudigital --cseparg=-p100 --words --threshold=99 \
                   --fg-image-colors=1024 --fg-colors=128 --psrotate=90 \

To compress this file, I had to recognize that the
content of the pdf file is designed to showcase
some of the strengths of pdf.  It liberally uses gradients
and line-art features that are shared among pages.

To deal with such a file, I chose to move as many
things as possible into the foreground (even images)
and thus give them a chance to be shared between pages.

This is the meaning of options --threshold
and --fg-image-colors.   To help this strategy,
the --cseparg option attempts to maximize sharing
between pages.  The --fg-colors option reduces the number
of distinct colors in the foreground in order to limit the size
of the foreground data. The --bg-slices option reduces the
quality of the background because, after all, the background
is no longer rich in details.

Mr. Isaacs explains on page 14 that
"A PDF file can never be better than  the
 content from which it is created".
All his presentation explains is that one should avoid
intermediate steps that could hide the structure of the
original content.

DjVu was designed to remove this constraint.
We could print the superhero file, scan the pages
and still produce a DjVu file with a decent size
(not as good as this one, but decent).

In other words, the DjVu compressors are designed
to recover the structure of the document from whatever
data is available (pixels for djvudocument, postscript
for djvudigital).  The format itself only implements a simple
document structure (foreground/background) but gives
many opportunities to conceal potential structure
discovery errors.

In short:

            Gold-in      Garbage-in
PDF         Gold-out     Garbage-out
DjVu        Gold-out     Acceptable-out

- Leon
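
For readers who want to experiment with Leon's settings, the Python sketch below pairs each option from his command with the role he gives it and then assembles a full command line. The option strings are copied from his message unchanged. The file names superhero.ps and superhero.djvu are hypothetical, as is the assumption that the input and output files come last on the command line, because the quoted command stops before its file arguments. The --bg-slices option is left out because its value is not given.

```python
import shlex

# Options copied verbatim from the djvudigital command quoted above, each
# paired with the role described in Leon's message. Nothing is added or changed.
OPTIONS = [
    ("--cseparg=-p100", "attempts to maximize sharing between pages"),
    ("--words", "not explained in the quoted message"),
    ("--threshold=99", "moves as much as possible into the foreground"),
    ("--fg-image-colors=1024", "moves as much as possible into the foreground, even images"),
    ("--fg-colors=128", "reduces the number of distinct foreground colors"),
    ("--psrotate=90", "not explained in the quoted message"),
]


def build_command(input_ps, output_djvu):
    """Assemble the argument list: program name, options, then the two files.

    Placing the files last is an assumption; the quoted command is cut off
    before its file arguments.
    """
    return ["djvudigital"] + [opt for opt, _ in OPTIONS] + [input_ps, output_djvu]


# Hypothetical file names, used only for illustration.
cmd = build_command("superhero.ps", "superhero.djvu")

# Print the command for inspection instead of running it.
print(shlex.join(cmd))
for opt, role in OPTIONS:
    print(f"{opt:24} {role}")
```

The sketch only prints the command so it can be checked before being run by hand.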
