DjVuServe - now serves SearchPDF!
A report by PlanetDjVu, March 10, 2003
Introduction to DjVuServe
DjVuServe is a CGI script developed and released in January 2002 by Leon Bottou, one of the authors of the DjVu format. A web server executes it to serve DjVu documents: the program converts a BUNDLED multi-page document into an INDIRECT document on the fly.
Why is this significant?
This is significant because the INDIRECT format is optimal for web serving, while the BUNDLED format is optimal for file storage. With DjVuServe, it is now possible to "have your cake and eat it too!"
An INDIRECT document stored on disk needs its own subfolder to hold the many component files that make up the multi-page document. A document collection stored in INDIRECT format, then, is a collection of subfolders.
The BUNDLED format, when stored on a disk, is just a single file, so all documents can be stored in one folder. This is easier to maintain, and also occupies slightly less disk space.
How is this done?
So how can a DjVu file be stored on disk in BUNDLED format, but be served up on the web in INDIRECT format?
Suppose that a large bundled multi-page DjVu document is available at the following URL.
The CGI program djvuserve lets you access this same document as an indirect multi-page DjVu document using the following URL.
The special component file name index.djvu is recognized as a request for the index of the corresponding indirect multi-page document. In fact, when you access a bundled document using djvuserve, the browser gets redirected to the following URL:
and then behaves as if the bundled file was a directory containing the various component files of an equivalent indirect document.
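The mapping described above can be sketched in a few lines of Python. This is only an illustration: the server name, the document path, and the /cgi-bin/djvuserve prefix are hypothetical, and the real prefix depends on how the CGI script is installed on your server.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the CGI script is reachable under this path on the server.
CGI_PREFIX = "/cgi-bin/djvuserve"


def component_url(bundled_url, component="index.djvu", cgi_prefix=CGI_PREFIX):
    """Build the URL of one component of the equivalent INDIRECT document.

    The bundled file path is treated as if it were a directory, and the
    component name is appended to it. The default component, index.djvu,
    is the special name that requests the document index.
    """
    parts = urlsplit(bundled_url)
    path = cgi_prefix.rstrip("/") + parts.path + "/" + component
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


# Hypothetical bundled document stored as a single file on the server.
bundled = "http://example.org/docs/report.djvu"
index = component_url(bundled)
```

With these assumed names, `index` is `http://example.org/cgi-bin/djvuserve/docs/report.djvu/index.djvu`, the kind of URL the browser ends up at after the redirect.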
Why is DjVuServe beneficial for SearchPDF?
Running the DjVuServe CGI script on the same server as SearchPDF is beneficial because BUNDLED DjVu files can easily be combined with the other searchable file types that SearchPDF supports, such as PDF, HTML and XML. All of these are stored on disk as single files (these formats have no INDIRECT equivalent). If you wish, all these formats, including DjVu, can reside in a single folder.
Another benefit is that when DjVu files are stored in BUNDLED format, SearchPDF can "spider" to them on a remote web server and include them in a searchable document index. This cannot be done with DjVu files stored on disk in the INDIRECT format. Now a searchable index can be created from DjVu files residing on many different remote servers!
Where can I find out more about DjVuServe and get this CGI Script?
The homepage for the DjVuServe CGI Script is: http://djvu.sourceforge.net/doc/man/djvuserve.html
My DjVu files are currently stored in INDIRECT format. How can I batch-convert them to BUNDLED for use with DjVuServe?
You can easily perform this conversion with JRAConvert, if and when it is licensed for release. Alternatively, you can write a script that uses the Command Line Encoder (Document Express Enterprise Edition Non-GUI Version), or a script that uses DjVuLibre on a Unix platform.
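As a rough sketch of the DjVuLibre route, the Python script below calls DjVuLibre's djvmcvt tool with its -b (bundled) option once per document. It assumes that djvmcvt is on the PATH, that each INDIRECT document sits in its own subfolder, and that the index file in each subfolder is named index.djvu. The folder names and the function names are hypothetical, so adjust them to your own collection.

```python
import subprocess
from pathlib import Path
from typing import List


def bundle_command(index_file: Path, out_dir: Path) -> List[str]:
    """Build the djvmcvt call that bundles one INDIRECT document.

    The bundled file is named after the subfolder that holds the
    INDIRECT document, e.g. docs/report/index.djvu -> out/report.djvu.
    """
    target = out_dir / (index_file.parent.name + ".djvu")
    return ["djvmcvt", "-b", str(index_file), str(target)]


def find_indexes(root: Path, index_name: str = "index.djvu") -> List[Path]:
    """Find the index file of every INDIRECT document directly under root."""
    # Assumption: one subfolder per document, each with the same index name.
    return sorted(root.glob("*/" + index_name))


def convert_all(root: Path, out_dir: Path, dry_run: bool = True) -> None:
    """Bundle every INDIRECT document found under root into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for index_file in find_indexes(root):
        cmd = bundle_command(index_file, out_dir)
        if dry_run:
            # Only show what would be run; nothing is converted.
            print(" ".join(cmd))
        else:
            # check=True stops the batch at the first failed conversion.
            subprocess.run(cmd, check=True)


# Example call (hypothetical folder names):
# convert_all(Path("indirect_docs"), Path("bundled_docs"), dry_run=False)
```

The default dry run prints the commands without running them, which makes it easy to check the file names before any conversion takes place.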
Where can I learn more about SearchPDF?
Demonstrations of SearchPDF (formerly DjVuSearch) can be accessed by clicking the Search menu at PlanetDjVu. Further information, including published prices, is available at www.searchpdf.com.