DjVu Text-behind-Handwriting Introduced!
by James Rile, PlanetDjVu, October, 2001
PlanetDjVu presents the first example of text-behind-handwriting
PlanetDjVu is pleased to present the first example of hidden text behind handwriting in a DjVu document. This was made possible with the DjVu Encode SDK 3.5 and the new XML export and import features of this SDK.
What is hidden text behind handwriting?
Let's begin with hidden text behind printed words. This innovation was introduced in PDF with the release of the Acrobat Capture product six years ago. The ASCII text, which can be copied and searched, is hidden behind the image of the printed words visible in the page image.
DjVu became the second format to offer this feature, in October 2000.
Acrobat Capture and the DjVu encoders create the text that is placed under the image of the text by performing a process called OCR (Optical Character Recognition). Simply put, the text is derived (recognized) from the image of the text.
OCR doesn't work with handwriting, so in this example the text was typed in by a person and then PLACED word-for-word under the handwriting.
How can hidden text be placed behind handwriting?
This capability has been possible for some time in Acrobat Capture Reviewer, for the PDF Image + Hidden Text format. The technique in this tool is to first draw a Text Zone, then place a Text Block within it, and finally place each line of text within the block. A line of text is a string of ASCII text that is copied into a text line. A text line in PDF is defined by its end points, and the string of ASCII text is stretched proportionally between them. Since handwriting is not proportionally spaced, the ASCII words do not always fall exactly under the image of the handwritten words. Although this technique can be used in Acrobat Capture Reviewer, it is not easy to do. There is not yet a commercial application designed to facilitate the placing of ASCII text under images of handwriting.
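The mismatch described above can be illustrated with a small sketch. It assumes every character, including spaces, receives the same share of the line width, which is a simplification and not a description of how Acrobat Capture Reviewer actually lays out text. The function names, the line end points, and the handwritten word extents are all hypothetical.

```python
def stretch_words(text, x_start, x_end):
    """Return (word, x_left, x_right) for text stretched evenly between two x end points."""
    if not text:
        return []
    # Assumption: each character (spaces included) gets an equal slice of the line.
    char_width = (x_end - x_start) / len(text)
    spans = []
    pos = 0  # character index where the current word starts
    for word in text.split(" "):
        if word:
            left = x_start + pos * char_width
            right = left + len(word) * char_width
            spans.append((word, left, right))
        pos += len(word) + 1  # skip the word and the space after it
    return spans


def left_edge_errors(spans, handwritten_boxes):
    """Offset between each stretched word's left edge and the handwritten word's left edge."""
    return [span[1] - box[0] for span, box in zip(spans, handwritten_boxes)]


# Hypothetical line end points and handwritten word extents (x_left, x_right), in pixels.
spans = stretch_words("Flow of compressible fluids", 100, 900)
handwritten = [(100, 260), (300, 360), (420, 740), (780, 900)]
for (word, left, right), err in zip(spans, left_edge_errors(spans, handwritten)):
    print(f"{word:>14}: stretched {left:6.1f}-{right:6.1f}, offset {err:+6.1f}")
```

A nonzero offset means that a search highlight drawn from the stretched text would land beside the handwritten word rather than on it.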
Our demonstrated technique in DjVu takes a different approach: highlight rectangles are first drawn around the handwritten words in DjVu Solo, then the metadata is exported to XML using the DjVu 3.5 SDK. The ASCII text is added using an XML editor, and the XML is then imported back into the DjVu file, again using the DjVu 3.5 SDK. This is a proof-of-concept process, not a production solution.
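The manual XML editing step could in principle be scripted. The sketch below pairs each exported rectangle with one transcribed word, in document order. It is only an illustration: the element names PAGE, MAP, AREA, HIDDENTEXT, LINE and WORD, the coords attribute, and the sample rectangles are assumptions made for this example and may not match the schema that the SDK actually exports or expects on import.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the exported XML: one AREA per
# highlight rectangle drawn around a handwritten word.
SAMPLE_EXPORT = """
<PAGE>
  <MAP>
    <AREA shape="rect" coords="100,200,330,250"/>
    <AREA shape="rect" coords="350,200,400,250"/>
    <AREA shape="rect" coords="420,200,520,250"/>
  </MAP>
</PAGE>
"""


def add_hidden_text(page_xml, words):
    """Attach one transcribed word to each rectangle, in document order."""
    root = ET.fromstring(page_xml)
    areas = root.findall(".//AREA")
    if len(areas) != len(words):
        raise ValueError(f"{len(areas)} rectangles but {len(words)} words")
    # Assumed layout: a HIDDENTEXT element holding a single LINE of WORD elements.
    hidden = ET.SubElement(root, "HIDDENTEXT")
    line = ET.SubElement(hidden, "LINE")
    for area, word in zip(areas, words):
        node = ET.SubElement(line, "WORD", coords=area.get("coords"))
        node.text = word
    return ET.tostring(root, encoding="unicode")


print(add_hidden_text(SAMPLE_EXPORT, ["Conservation", "of", "mass"]))
```

Checking that the number of rectangles equals the number of words catches the most likely transcription slip, a skipped or doubled word, before anything is imported back.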
How would a text-for-handwriting editor function?
First, object zones would be generated around each handwritten word. Then, using a drag-and-drop operation, each line or word would be dropped into the appropriate image zone.
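The core of such an editor is a hit test: when a word is dropped at a point on the page, find the zone that contains that point and store the word there. The sketch below shows one way this could look. The Zone class, the drop_word function, and the coordinates are hypothetical and do not describe any existing tool.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """A rectangle around one handwritten word, in page pixel coordinates."""
    left: int
    top: int
    right: int
    bottom: int
    text: str = ""

    def contains(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def drop_word(zones, word, x, y):
    """Assign word to the first zone containing the drop point; return that zone or None."""
    for zone in zones:
        if zone.contains(x, y):
            zone.text = word
            return zone
    return None


# Hypothetical zones for two handwritten words on one line.
zones = [Zone(100, 200, 250, 240), Zone(270, 200, 420, 240)]
drop_word(zones, "Jet", 150, 220)         # lands in the first zone
drop_word(zones, "propulsion", 300, 230)  # lands in the second zone
print([zone.text for zone in zones])
```

Returning None for a drop outside every zone lets the editor reject the drop instead of silently attaching the word to the wrong place.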
What is the benefit of hidden text behind handwriting?
The benefit is that historic and significant handwritten documents become full-text searchable with search term highlighting, and the text can be copied from the page.
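Search term highlighting follows directly from storing a rectangle with each word: a viewer matches the search phrase against the hidden words and highlights the rectangles of the words that match. The sketch below shows the idea under simple assumptions (case-insensitive, whole-word matching). The find_phrase function and the sample coordinates are hypothetical and do not reflect how the DjVu Viewer is implemented.

```python
def find_phrase(words, phrase):
    """Return one list of boxes per occurrence of phrase.

    words is a list of (text, (left, top, right, bottom)) in reading order.
    """
    terms = phrase.lower().split()
    hits = []
    if not terms:
        return hits
    texts = [text.lower() for text, _ in words]
    for i in range(len(texts) - len(terms) + 1):
        if texts[i:i + len(terms)] == terms:
            hits.append([box for _, box in words[i:i + len(terms)]])
    return hits


# Hypothetical hidden-text layer for one handwritten line.
page_words = [
    ("Gas", (100, 200, 170, 240)),
    ("turbine", (190, 200, 330, 240)),
    ("jet", (350, 200, 410, 240)),
    ("engine", (430, 200, 560, 240)),
]
print(find_phrase(page_words, "turbine jet"))
```

Each returned box can be drawn as a highlight over the page image, so the reader sees the handwritten words themselves marked.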
Open and perform word finds on the Thermodynamics notes DjVu file.
Click here to open these handwritten thermodynamics notes in DjVu format. Use the Find command in the DjVu Viewer toolbar to find the following words with highlighting:
Heat flow in a thick sphere hollow
Conduction problems involving surface heat emission
Thick cylinder with surface loss
Flow of compressible fluids
Conservation of mass
Jet propulsion systems
Expansion efficiency through a turbine
Gas turbine jet engine
Design of a Parsons turbine
Limitation on the power that a given turbine may develop