Over 90 percent of the information in the world
is still on paper. Many of those paper documents include color graphics and/or photographs
that represent significant invested value. And almost none of that rich content is on the
Internet.
That's
because scanning such documents and getting them onto a Web site has been problematic at
best. At the high resolution necessary to ensure the readability of the text and to
preserve the quality of the images, file sizes become far too bulky for acceptable
download speed. Reducing resolution to achieve satisfactory download speed means
forfeiting quality and legibility. Conventional web formats such as JPEG, GIF, and PNG
produce prohibitively large image files at decent resolution. As a result, Web site
content developers have been largely unable to leverage existing printed materials.
DjVu (pronounced
"déjà vu") is a new image compression technology developed since 1996 at
AT&T Labs to solve precisely that problem. DjVu allows the distribution on the
Internet of very high resolution images of scanned documents, digital documents, and
photographs. DjVu allows content developers to scan high-resolution color pages of books,
magazines, catalogs, manuals, newspapers, and historical or ancient documents, and make them
available on the Web.
Information
that was previously trapped in hard copy form can now be made available to a wide audience.
Research institutions,
libraries, and government agencies can give access to their archives. Companies can
distribute internal documents on their intranets.
The commercialization of DjVu is handled by
Seattle-based LizardTech Inc. in partnership with
AT&T Labs. DjVu is an open standard. The file format specification and open
source implementations of the decoder (and part of the encoder) are available.
DjVu typically achieves compression ratios about 5 to 10 times better
than existing methods such as JPEG and GIF for color documents, and 3 to 8 times better than TIFF
for black-and-white documents. Scanned pages at 300 DPI in full color can be compressed
from 25MB down to files of 30 to 100KB. Black-and-white pages at 300 DPI typically occupy 5
to 30KB when compressed. This puts the size of high-quality scanned pages within the realm
of an average HTML page (which is typically around 50KB).
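The figures above can be checked with a little arithmetic. The following is a minimal sketch, assuming a 2,500 by 3,300 pixel page (the size this text quotes for a standard page at 300 DPI), 3 bytes per pixel for full color, and 1KB = 1,000 bytes. The function names are illustrative only and are not part of any DjVu software.

```python
def raw_size_bytes(width_px, height_px, bytes_per_pixel=3):
    """Uncompressed size of a raster image (3 bytes per pixel = 24-bit color)."""
    return width_px * height_px * bytes_per_pixel


def compression_ratio(raw_bytes, compressed_bytes):
    """How many times smaller the compressed file is than the raw image."""
    return raw_bytes / compressed_bytes


# Assumed page size: 2,500 x 3,300 pixels in full color.
raw = raw_size_bytes(2500, 3300)             # 24,750,000 bytes, roughly 25MB

# Ratios implied by the quoted 30KB to 100KB range (1KB taken as 1,000 bytes).
ratio_at_100kb = compression_ratio(raw, 100_000)
ratio_at_30kb = compression_ratio(raw, 30_000)

print(raw, ratio_at_100kb, ratio_at_30kb)
```

Under these assumptions, the quoted file sizes correspond to overall compression ratios of roughly 250:1 to 800:1 relative to the raw scan.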
For color
document images that contain both text and pictures, DjVu files are typically 5 to 10
times smaller than JPEG at similar quality. For black-and-white pages, DjVu files are
typically 10 to 20 times smaller than JPEG and five times smaller than GIF. DjVu files are
also about 3 to 8 times smaller than black-and-white PDF files produced from scanned
documents (scanned documents in color are impractical in PDF).
In addition
to scanned documents, DjVu can also be applied to documents produced electronically in
formats such as Adobe's PostScript or PDF. In that case, the file sizes are between 15 and
20KB per page at 300 DPI.
The DjVu plug-in is available for standard Web
browsers on various platforms. The DjVu plug-in allows for easy panning and zooming of
document images. A unique on-the-fly decompression technology allows images that would normally
need 25MB of RAM when fully decompressed to be displayed using only 2MB of RAM.
Conventional
image viewing software decompresses images in their entirety before displaying them. This
is impractical for high-resolution document images since they typically go beyond the
memory capacity of many PCs, causing excessive disk swapping. DjVu, on the other hand,
never decompresses the entire image,
but instead keeps
the image in memory in a compact form, and decompresses the piece displayed on the screen
in real time as the user views the image. Images as large as 2,500 pixels by 3,300 pixels
(a standard page image at 300 DPI) can be downloaded and displayed on very low-end PCs.
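The idea of keeping the image compact in memory and decoding only the visible part can be illustrated with a toy example. This is a sketch under stated assumptions, not the DjVu decoder: it uses Python's zlib as a stand-in codec, and the tile size, class name, and method names are hypothetical.

```python
import zlib

TILE = 256  # hypothetical tile edge length, in pixels


class TiledImage:
    """Grayscale image held in memory as independently compressed tiles."""

    def __init__(self, width, height, pixel_at):
        self.width, self.height = width, height
        self.tiles = {}
        for ty in range(0, height, TILE):
            for tx in range(0, width, TILE):
                w = min(TILE, width - tx)
                h = min(TILE, height - ty)
                # Row-major pixel bytes for this tile only.
                raw = bytes(pixel_at(x, y)
                            for y in range(ty, ty + h)
                            for x in range(tx, tx + w))
                self.tiles[(tx, ty)] = zlib.compress(raw)

    def tiles_for_view(self, x0, y0, x1, y1):
        """Keys of the tiles that overlap the viewport [x0, x1) x [y0, y1)."""
        return [(tx, ty) for (tx, ty) in self.tiles
                if tx < x1 and tx + TILE > x0 and ty < y1 and ty + TILE > y0]

    def decode_view(self, x0, y0, x1, y1):
        """Decompress only the tiles needed to paint the viewport."""
        return {key: zlib.decompress(self.tiles[key])
                for key in self.tiles_for_view(x0, y0, x1, y1)}


# A small synthetic 512 x 512 image: four tiles in total.
page = TiledImage(512, 512, lambda x, y: (x + y) % 256)

# A 100 x 100 window in the top-left corner touches a single tile.
view = page.decode_view(0, 0, 100, 100)
```

Only the tiles under the viewport are ever expanded, so peak memory depends on the window size rather than on the full page. The real viewer uses its own codec and data layout; the sketch only shows the access pattern.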
The DjVu format is progressive. Users get an
initial version of the page very quickly, and the visual quality of the page progressively
improves as more bits arrive. For example, the text of a typical magazine page
appears in just three seconds over a 56Kbps modem connection. In another second or two, the
first versions of the pictures and backgrounds appear. Then, after a few more
seconds, the final full-quality version of the page is complete.
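To put these timings in perspective, here is a rough calculation of transfer times. It is a sketch that assumes an ideal link carrying exactly 56,000 bits per second with no protocol overhead, which real modems do not achieve; the function name is illustrative.

```python
def transfer_seconds(size_bytes, bits_per_second=56_000):
    """Ideal transfer time: 8 bits per byte, no overhead or latency."""
    return size_bytes * 8 / bits_per_second


# Three seconds on such a link carries at most 21,000 bytes.
text_budget = transfer_seconds(21_000)

# A 50KB page (the typical HTML page size quoted earlier), 1KB = 1,000 bytes.
page_50kb = transfer_seconds(50_000)

# A raw 2,500 x 3,300 pixel page at 3 bytes per pixel, for comparison.
raw_page = transfer_seconds(2500 * 3300 * 3)

print(text_budget, round(page_50kb, 2), round(raw_page / 60, 1))
```

On this idealized link a 50KB file arrives in about seven seconds, while the same page sent uncompressed would take close to an hour.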
One of the
main technologies behind DjVu is the ability to separate an image into a background layer
(i.e., paper texture and pictures) and a foreground layer (text and line drawings).
Traditional image compression techniques are fine for simple photographs, but they
drastically degrade sharp color transitions between adjacent highly contrasted areas -
which is why they render type so poorly. By separating the text from the backgrounds, DjVu
can keep the text at high resolution (thereby preserving the sharp edges and maximizing
legibility), while at the same time compressing the backgrounds and pictures at lower
resolution with a wavelet-based compression technique.
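The layer separation described above can be sketched in a few lines. This is a deliberately simplified illustration, not the DjVu segmentation algorithm: it assumes a grayscale page, picks out text with a fixed darkness threshold, and averages blocks of pixels in place of wavelet compression. The function name, threshold, and downsampling factor are hypothetical.

```python
def separate_layers(gray, threshold=128, factor=2):
    """Split a grayscale page (rows of 0-255 ints) into a full-resolution
    text mask and a background downsampled by `factor`."""
    height, width = len(gray), len(gray[0])

    # Foreground mask: 1 where the pixel is dark enough to count as text.
    mask = [[1 if px < threshold else 0 for px in row] for row in gray]

    background = []
    for by in range(0, height, factor):
        out_row = []
        for bx in range(0, width, factor):
            # Collect only the non-text pixels of this block.
            block = [gray[y][x]
                     for y in range(by, min(by + factor, height))
                     for x in range(bx, min(bx + factor, width))
                     if mask[y][x] == 0]
            # Average them; fall back to white if the block is all text.
            out_row.append(sum(block) // len(block) if block else 255)
        background.append(out_row)
    return mask, background


# A tiny 4 x 4 "page": light paper (200 or 220) with dark strokes (10).
page = [
    [200, 200, 10, 200],
    [200, 200, 10, 200],
    [220, 220, 220, 220],
    [220, 220, 10, 10],
]
mask, background = separate_layers(page)
```

The mask keeps every stroke at full resolution, while the background shrinks to a quarter of the pixels and no longer contains the sharp edges that are hard to compress.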
DjVu is used by many commercial and non-commercial Web sites today.