A Diversity of Written as well as Spoken Languages
by James Rile, PlanetDjVu, March 5, 2002
It is a well-known fact that hundreds of languages are spoken around the world, but those of us in the digital publishing and document imaging industries often do not stop to consider that written and published documents exist in many of these languages as well.
Optical Character Recognition (OCR) products today tend to focus on just a few primary written languages to the exclusion of all the rest.
At PlanetDjVu, we are busy developing the means to OCR all these other written languages for the DjVu file format, so that documents in them become full-text searchable and their text can be copied and pasted.
In the International Collection of the Gallery, you will find a sampling of the languages listed below that we can OCR for DjVu.
Keep visiting this site, as we will be announcing our new OCR product for DjVu soon. Meanwhile, you might like to browse the list of languages (176 of them!) that we support:
Languages
Armenian (Eastern, Western, Grabar): An Indo-European language forming its own group. Official language of Armenia, also spoken in Georgia, Azerbaijan and Russia. The old literary Armenian, Grabar, is now used exclusively as the language of the clergy. The modern literary language has two main varieties: Eastern (Yerevan), spoken in Armenia, and Western, spoken in the Near East and Western Europe. A mother tongue for some 7 million people.
Bulgarian: A South Slavic language. Official language of Bulgaria. A mother tongue for some 9 million people.
Catalan: A Romance language (Ibero-Romance subgroup). A mother tongue for some 8 million people in Spain (Catalonia, Valencia, the Balearic Islands), France (Roussillon, Eastern Pyrenees), Andorra and Sardinia. One of the official languages of the above-mentioned Spanish regions and of Andorra.
Croatian: A South Slavic language. Until Croatia became independent, it was considered the same language as Serbian (together forming a single Serbo-Croatian language, the only difference being the script: Cyrillic for Serbian and Latin for Croatian). Official language of Croatia. A mother tongue for some 5 million people.
Czech: A West Slavic language. Official language of the Czech Republic, also spoken in Slovakia. A mother tongue for some 10 million people.
Danish: A Germanic (Scandinavian) language. Official language of Denmark, also spoken in Greenland and the Faroe Islands. A mother tongue for some 5 million people.
Dutch (Netherlands and Belgium): A Germanic language. Official language of the Netherlands and Belgium. A mother tongue for some 20 million people.
English: A Germanic language. The main international language and a UN language. The official language of the USA, Canada, Great Britain, Ireland (officially second to Irish), Australia, New Zealand, India (on a temporary status) and 15 African states, including the Republic of South Africa, Nigeria, Ghana and Uganda. A mother tongue for more than 400 million people.
Estonian: A Finno-Ugric (Baltic-Finnic) language. Official language of Estonia. A mother tongue for some 1 million people.
Finnish: A Finno-Ugric (Baltic-Finnic) language. Official language of Finland, also spoken in Russia (Karelia, St. Petersburg region) and Sweden. A mother tongue for some 5 million people.
French: A Romance language and a UN language. Official language of France, Belgium, Switzerland, Luxembourg, Monaco, Andorra, Canada and Haiti; of several African states: Benin, Côte d'Ivoire, Burkina Faso, Gabon, Guinea, Zaire, Congo, Mali, Niger, Senegal, Togo, Chad, Burundi, Rwanda, the Central African Republic, Madagascar, Cameroon, the Seychelles, the Comoros and Djibouti; and of Vanuatu (Oceania). A mother tongue for more than 100 million people.
German (new and old spelling): A Germanic language. Official language of Germany, Austria, Switzerland, Luxembourg and Belgium. A mother tongue for some 100 million people.
Greek: An Indo-European language forming its own group. Official language of Greece and Cyprus. A mother tongue for some 12 million people.
Hungarian: An Ugric (Uralic) language. Official language of Hungary, also spoken in neighboring countries such as Yugoslavia, Austria, Slovakia, Romania and Ukraine. A mother tongue for some 14 million people.
Italian: A Romance language. Official language of Italy. A mother tongue for some 70 million people.
Latvian: A Baltic language. Official language of Latvia. A mother tongue for some 2 million people.
Lithuanian: A Baltic language. Official language of Lithuania. A mother tongue for some 3 million people.
Norwegian (Nynorsk and Bokmål): A Germanic (Scandinavian) language. Official language of Norway. The literary language exists in two forms, Nynorsk and Bokmål (the latter is closer to Danish). A mother tongue for some 4 million people.
Polish: A West Slavic language. Official language of Poland. A mother tongue for some 40 million people.
Portuguese (Portugal and Brazil): A Romance language. Official language of Portugal, Brazil, Angola, Mozambique, Guinea-Bissau, Cape Verde, and São Tomé and Príncipe. A mother tongue for some 170 million people.
Romanian: A Romance language. Official language of Romania. A mother tongue for some 25 million people.
Russian: An East Slavic language. Official language of the Russian Federation, also spoken in all CIS states and the Baltic states. A mother tongue for some 200 million people.
Slovak: A West Slavic language. Official language of Slovakia, also spoken in nearby regions of Hungary, Romania and Ukraine. A mother tongue for some 5 million people.
Spanish: A Romance language. Official language of Spain, all Latin American countries except Brazil, and Equatorial Guinea. A UN language. A mother tongue for some 325 million people.
Swedish: A Germanic (Scandinavian) language. Official language of Sweden and Finland. A mother tongue for some 10 million people.
Tatar: A Turkic language. Spoken in Russia (Tatarstan, Bashkiria, Chuvashia, Mari El, etc.). A mother tongue for some 6 million people.
Turkish: A Turkic language. Official language of Turkey and Cyprus, also spoken in Greece, Bulgaria, Romania, Iran and Iraq. A mother tongue for some 55 million people.
Ukrainian: An East Slavic language. Official language of Ukraine, also spoken in Russia and Belarus. A mother tongue for some 40 million people.
Additional languages
Abkhaz: An Abkhazo-Adyghian (Caucasian) language. Spoken in Georgia (Abkhazia). A mother tongue for some 100 thousand people.
Adyghian: An Abkhazo-Adyghian (Caucasian) language. Spoken in Russia (Adygea, Krasnodar region). A mother tongue for some 120 thousand people.
Afrikaans: A Germanic language. One of the official languages of the Republic of South Africa. A mother tongue for some 6 million South African Afrikaners (Boers), descendants of Dutch colonists.
Agul: A Lezgian (Dagestanian) language. Spoken in Russia (Dagestan, Stavropol region) and Azerbaijan. A mother tongue for some 15 thousand people.
Albanian: An Indo-European language forming its own group. Official language of Albania. A mother tongue for some 5 million people in Albania, Yugoslavia (Kosovo), Italy and Greece.
Altai: A Turkic language. Spoken in Russia (Altai). A mother tongue for some 55 thousand people.
Avar: An Avar-Andi-Dido (Dagestanian) language. Spoken in Russia (Dagestan) and Azerbaijan. A mother tongue for some 600 thousand people.
Aymara: A Quechumaran language (one of the languages of the South American Indians). One of the three official languages of Bolivia. A mother tongue for some 2 million Aymara Indians living in Peru and Bolivia. Most Aymara speakers also speak Quechua and Spanish. Some scholars prefer to treat Aymara not as a single language with some 10 dialects but as a group of Aymara languages.
Azerbaijani (Cyrillic), Azerbaijani (Latin): A Turkic language. Official language of Azerbaijan. A mother tongue for some 14-20 million people in Iran, Azerbaijan, Armenia and Georgia.
Bashkir: A Turkic language. Spoken in Russia (Bashkiria and nearby regions). A mother tongue for some 900 thousand people.
Basque: A language isolate. A mother tongue for some 600 thousand people in Spain and France.
Belarusian: An East Slavic language. Official language of Belarus. A mother tongue for some 9 million people.
Bemba: A Bantu language. A mother tongue for some 5 million people in Zambia, Zaire, Congo and Tanzania.
Blackfoot: A West Algonkian language. A mother tongue for fewer than 10 thousand Indians in the USA and Canada.
Breton: A Brythonic (Celtic) language. A mother tongue for some 1 million Bretons in France.
Bugotu: An Oceanic language (a member of the Malayo-Polynesian branch of the Austronesian languages) spoken in the southeastern Solomon Islands.
Buryat: A Mongolian language. Spoken in Russia (Buryatia). A mother tongue for some 360 thousand people.
Cebuano: A Philippine (Austronesian) language. Spoken in the central Philippines. Usually grouped with closely related languages as Bisayan. A mother tongue for some 24 million people.
Chamorro: An Austronesian language spoken in western Micronesia, particularly on Guam. A mother tongue for some 60 thousand people.
Chechen: A Nakh (Caucasian) language. A mother tongue for some 800 thousand people in Russia (Chechnya, Ingushetia and Dagestan).
Chukchee: A Luorawetlan language spoken in Russia (Chukchee and Koryak regions). A mother tongue for some 10 thousand people.
Chuvash: A Turkic language spoken in Russia (Chuvashia). A mother tongue for some 1.4 million people.
Corsican: Usually considered to be a dialect of Italian, spoken on Corsica. A mother tongue for some 100 thousand people.
Crimean Tatar: A Turkic language spoken in Ukraine (Crimea). A mother tongue for some 700 thousand people.
Crow: A Siouan language spoken in Montana, USA. A mother tongue for fewer than 10 thousand people.
Dakota: A Siouan language spoken in the northern USA (South Dakota, Montana). A mother tongue for a few thousand people.
Dargwa: A Dagestanian language. Spoken in Russia (Dagestan). A mother tongue for some 360 thousand people.
Dungan: A Sino-Tibetan language spoken in Kyrgyzstan, Kazakhstan and Uzbekistan. A mother tongue for some 50 thousand people.
Eskimo (Cyrillic), Eskimo (Latin): An Eskimo-Aleut language. Spoken on the southeastern Chukchee Peninsula (Russia), in Alaska and nearby regions (USA), in the Arctic regions of Canada, and in Greenland. A mother tongue for some 100 thousand people.
Even: A Manchu-Tungus language spoken in Russia (Okhotsk, Yakutia, Magadan region). A mother tongue for some 5 thousand people.
Evenki: A Manchu-Tungus language spoken in China, Russia (from the Yenisei to Sakhalin) and Mongolia. A mother tongue for some 30 thousand people (some 10 thousand of them in Russia).
Faroese: A Germanic (Scandinavian) language. Official language of the Faroe Islands (an autonomous Danish possession), also spoken in some other regions of Denmark. A mother tongue for some 40 thousand people.
Fijian: An Austronesian language spoken on the Fiji Islands. A mother tongue for some 300 thousand people.
Frisian: A Germanic language spoken in Noord-Holland and Friesland (Netherlands), the North Frisian Islands, Helgoland and Saterland (Germany). A mother tongue for some 400 thousand people.
Friulian: A Romance language, usually classed as Rhaeto-Romanic. Spoken in Friuli-Venezia Giulia (Italy). A mother tongue for some 700 thousand people.
Gagauz: A Turkic language spoken in southern Moldova. A mother tongue for some 180 thousand people.
Galician: A Romance language, frequently referred to as a dialect of Spanish or Portuguese, spoken in Spain (Galicia). A mother tongue for some 4 million people.
Ganda: A Bantu language spoken in Uganda. A mother tongue for some 3 million people.
German (Luxembourg): One of the official languages of Luxembourg, where it is also called Luxembourgish. Usually considered to be a Moselle Franconian dialect of German.
Guarani: A Tupian language spoken in Paraguay and nearby regions of Brazil, Argentina and Bolivia. A mother tongue for some 3 million Guarani Indians.
Hani: A Sino-Tibetan (Lolo-Burmese) language spoken in China, northern Myanmar, Thailand, Laos and Vietnam. Also called Akha. A mother tongue for some 1 million people.
Hausa: An Afro-Asiatic language. Spoken in Nigeria, Niger, Cameroon, Ghana, Benin and Togo. A mother tongue for some 40 million people.
Hawaiian: An Austronesian (Polynesian) language spoken on the Hawaiian Islands. A mother tongue for some 20 thousand people.
Icelandic: A Germanic (Scandinavian) language. Official language of Iceland. A mother tongue for some 250 thousand people.
Indonesian: An Austronesian language, called Malay before 1945 (some scholars consider it a dialect of Malay). Official language of Indonesia under the name Bahasa Indonesia, also used for international communication. A mother tongue for some 160 million people.
Ingush: A Nakh language spoken in Ingushetia. A mother tongue for some 200 thousand people.
Irish: A Celtic language. First official language of Ireland. A mother tongue for fewer than 100 thousand people.
Jingpo: A Tibeto-Burman language spoken in southern China and Myanmar. A mother tongue for some 600 thousand people.
Kabardian: An Abkhazo-Adyghian (Caucasian) language spoken in Kabardino-Balkaria, Karachay-Cherkessia, North Ossetia (Mozdok), Adygea and nearby areas of the Krasnodar and Stavropol regions. A mother tongue for some 300 thousand people.
Kalmyk: A Mongolian language spoken in Russia (Kalmykia). A mother tongue for some 140 thousand people.
Karachay-Balkar: A Turkic language (some prefer to treat it as two separate but closely related languages, Karachay and Balkar) spoken in Russia (Kabardino-Balkaria, Karachay-Cherkessia). A mother tongue for some 200 thousand people.
Karakalpak: A Turkic language spoken in Karakalpakstan (Uzbekistan). A mother tongue for some 300 thousand people.
Kasub: Usually considered a dialect of Polish, spoken in Poland.
Kawa: A Kadai language (considered to be related both to Thai and to the Austronesian languages) spoken in China. A mother tongue for fewer than 50 thousand people.
Kazakh: A Turkic language. Official language of Kazakhstan.
Khakass: A Turkic language spoken in Russia (Khakassia). A mother tongue for some 60 thousand people.
Khanty: An Ugric language spoken in Russia (Tyumen and Tomsk regions). A mother tongue for some 15 thousand people.
Kikuyu: A Bantu language spoken in central Kenya. A mother tongue for some 6 million people.
Kirghiz: A Turkic language. Official language of Kyrgyzstan, also spoken in China. A mother tongue for some 2 million people.
Kongo: A Bantu language spoken in Congo, Zaire and Angola. A mother tongue for some 10 million people.
Koryak: A Luorawetlan language spoken in Russia (Koryak region). A mother tongue for some 5 thousand people.
Kpelle: A Mande (Niger-Congo) language spoken in Guinea and Liberia. A mother tongue for fewer than 1 million people.
Kumyk: A Turkic language spoken in Russia (Dagestan). A mother tongue for some 230 thousand people.
Kurdish: A West Iranian language. The second official language of Iraq. Spoken in Turkey, Iran, Iraq, Syria and the CIS states (Kurdish diaspora). A mother tongue for some 20 million people.
Lak: A Dagestanian language spoken in Russia (Dagestan). A mother tongue for some 100 thousand people.
Latin: An Italic language. Official language of the Vatican and the language of classical Roman literature.
Lezgi: A Dagestanian language spoken in Russia (Dagestan) and Azerbaijan. A mother tongue for some 350 thousand people.
Luba: A Bantu language spoken in Zaire. A mother tongue for some 6 million people.
Macedonian: A South Slavic language. Official language of Macedonia. A mother tongue for some 2 million people.
Malagasy: An Austronesian language. Official language of Madagascar. A mother tongue for some 10 million people.
Malay: An Austronesian language. Official language of Malaysia. A mother tongue for some 20 million people.
Malinke: A Mande (Niger-Congo) language. Spoken in Senegal, Guinea, Mali, Liberia and Sierra Leone. A mother tongue for some 4 million people.
Maltese: A Semitic language. Official language of Malta. A mother tongue for some 400 thousand people.
Mansy: An Ugric language spoken in Russia (western Siberia). A mother tongue for some 4 thousand people.
Maori: A Polynesian language spoken in New Zealand. A mother tongue for some 300 thousand people.
Mari: A Finno-Ugric (Uralic) language; the common name for two closely related languages, Plain Mari and Mountain Mari. Spoken in Russia (Mari El, Tatarstan). A mother tongue for some 540 thousand people.
Maya: A Mayan language spoken in Mexico, Guatemala and Honduras. A mother tongue for some 1 million Indians.
Miao: A Miao-Yao language spoken in China, Vietnam and parts of Laos and Thailand. A mother tongue for some 8 million people.
Minangkabau: An Austronesian language spoken on Sumatra (central and western regions). A mother tongue for some 6 million people.
Mohawk: An Iroquoian language spoken in the northeastern USA and nearby regions of Canada (around Lakes Ontario and Erie). A mother tongue for some 10 thousand people.
Moldavian: A Romance language. Official language of Moldova. Usually considered to be a variety of Romanian. A mother tongue for some 3 million people.
Mongol: A Mongolian language. Official language of Mongolia, also spoken in China (Inner Mongolia). A mother tongue for some 5 million people.
Mordvin: A Volga-Finnic (Uralic) language; the common name for two closely related languages, Moksha-Mordvin and Erzya-Mordvin. Spoken in Russia (Mordovia). A mother tongue for some 1 million people.
Nahuatl: An Aztec-Tanoan language. Spoken in Mexico. A mother tongue for some 1 million people.
Nenets: A Samoyedic (Uralic) language spoken in Russia (Yamalo-Nenets and Dolgano-Nenets regions). A mother tongue for some 25 thousand people.
Nivkh: A language isolate spoken in Russia (Sakhalin, Amur region). A mother tongue for some 1 thousand people.
Nogay: A Turkic language spoken in Russia (Karachay-Cherkessia, Krasnodar region). A mother tongue for some 55 thousand people.
Nyanja: A Bantu language. Official language of Malawi, also spoken in Zambia, Mozambique and Zimbabwe. A mother tongue for some 7 million people.
Ojibway: An Algonkian language spoken in the USA and Canada. A mother tongue for a few thousand Indians.
Ossetian: An East Iranian language. Spoken in Russia (North Ossetia) and Georgia (South Ossetia). A mother tongue for some 500 thousand people.
Papiamento: A Spanish-based creole language. Spoken on Aruba, Bonaire and Curaçao. A mother tongue for fewer than 1 million people.
Provencal: A Romance language spoken in southern France and the Italian Alps. A mother tongue for some 2-10 million people.
Quechua: A Quechumaran language. One of the official languages of Peru and Bolivia, also spoken in Ecuador, northern Colombia, Chile and Argentina. A mother tongue for some 7-13 million Indians.
Rhaeto-Romanic: A Romance language. One of the official languages of Switzerland. A mother tongue for some 4 thousand people (in the canton of Graubünden).
Romany: An Indo-Aryan language spoken by the Gypsy diaspora throughout the world. A mother tongue for some 1-5 million people.
Rundi: A Bantu language. One of the official languages of Burundi, also spoken in Tanzania and Zaire. A mother tongue for some 7 million people.
Russian (old spelling): Standard Russian in the old (pre-revolutionary) spelling.
Rwanda: A Bantu language. One of the official languages of Rwanda, also spoken in Burundi, Zaire, Uganda and Tanzania. A mother tongue for some 12 million people.
Sami (Lappish): A Finno-Ugric language spoken in northern Norway, Sweden, Finland and Russia (Kola Peninsula). A mother tongue for some 50 thousand people.
Samoan: A Polynesian language. One of the official languages of Western Samoa. A mother tongue for some 240 thousand people.
Scottish Gaelic: A Celtic language spoken in Scotland, the Hebrides and Nova Scotia (Canada). A mother tongue for fewer than 100 thousand people.
Selkup: A Samoyedic (Uralic) language spoken in Russia (Krasnoyarsk and Tomsk regions). A mother tongue for some 2 thousand people.
Serbian (Cyrillic): A South Slavic language (see also Croatian). Official language of Yugoslavia. A mother tongue for some 8 million people.
Shona: A Bantu language spoken in Zimbabwe, Mozambique, Botswana and the Republic of South Africa. A mother tongue for some 8 million people.
Slovenian: A South Slavic language. Official language of Slovenia, also spoken in nearby regions of Austria and Italy. A mother tongue for some 2 million people.
Somali: An Afro-Asiatic (Cushitic) language. Official language of Somalia, also spoken in Djibouti, Ethiopia and Kenya. A mother tongue for some 9 million people.
Sorbian: A West Slavic language, usually considered to comprise two languages, Upper Sorbian and Lower Sorbian. Spoken in Germany (Saxony). A mother tongue for some 100 thousand people.
Sotho: A Bantu language. One of the official languages of Lesotho, also spoken in the Republic of South Africa. A mother tongue for some 5 million people.
Sunda: An Austronesian language spoken in Indonesia (western Java). A mother tongue for some 25 million people.
Swahili: A Bantu language. The main language of international communication (the commercial lingua franca) in central and eastern Africa (particularly Tanzania and Kenya). Official language of Uganda. A mother tongue for some 10-50 million people.
Swazi: A Bantu language. Official language of Swaziland, also spoken in the northeastern Republic of South Africa. A mother tongue for some 2 million people.
Tabasaran: A Dagestanian language spoken in Russia (Dagestan). A mother tongue for some 75 thousand people.
Tajik: An Iranian language. Official language of Tajikistan, also spoken in Uzbekistan. A mother tongue for some 3 million people.
Tagalog: A Philippine (Austronesian) language. Official language of the Philippines. A mother tongue for some 35 million people.
Tahitian: A Polynesian (Austronesian) language. Official language of French Polynesia, also spoken in New Caledonia and Vanuatu. A mother tongue for some 100 thousand people.
Tok Pisin: An English-based creole language. Official language of Papua New Guinea. A mother tongue for some 3 million people.
Tongan: A Polynesian (Austronesian) language. Official language of Tonga, also spoken in New Zealand, Fiji and Western Samoa. A mother tongue for some 120 thousand people.
Tswana: A Bantu language spoken in Botswana and the Republic of South Africa. One of the official languages of Botswana. A mother tongue for some 4 million people.
Tun: A Thai language spoken in southern China. A mother tongue for some 700 thousand people.
Turkmen: A Turkic language. Official language of Turkmenistan. A mother tongue for some 4 million people.
Tuvinian: A Turkic language spoken in Russia (Tuva). A mother tongue for some 190 thousand people.
Udmurt: A Permic language spoken in Russia (Udmurtia, Kirov region). A mother tongue for some 550 thousand people.
Uzbek (Cyrillic), Uzbek (Latin): A Turkic language. Official language of Uzbekistan, also spoken in China and Afghanistan. A mother tongue for some 17 million people.
Welsh: A Celtic language spoken in Wales (Great Britain). A mother tongue for some 500 thousand people.
Wolof: A Niger-Congo language (West Atlantic branch). Official language of Senegal, also spoken in Gambia and Mauritania. A mother tongue for some 4 million people.
Xhosa: A Bantu language spoken in the Republic of South Africa. A mother tongue for some 8 million people.
Yakut: A Turkic language spoken in Russia (Yakutia). A mother tongue for some 310 thousand people.
Zapotec: An American Indian language spoken in southern Mexico. A mother tongue for some 400 thousand people.
Zulu: A Bantu language spoken in the Republic of South Africa and Zimbabwe. A mother tongue for some 8 million people.