This is an archived web page, collected at the request of the United Nations Educational, Scientific and Cultural Organization (UNESCO) using Archive-It. The page was captured at 13:18:22 on Oct 20, 2016, and is part of the UNESCO collection. The information on this web page may be out of date.

WebWorld

Communication and Information Activities

Multilingualism in Cyberspace

Language constitutes the foundation of communication and is fundamental to cultural and historical heritage.
Documents

Methods to computerize "little equipped" languages and groups of languages

In 2004, fewer than 1% of the world's 6,000 or so languages benefit from the opportunities that computerization offers, such as a broad range of services going from text processing to machine translation. This thesis, which focuses on the lesser-used languages - the "pi-languages" - proposes solutions to remedy their digital underdevelopment.
The first part, intended to show the complexity of the problem, presents the diversity of languages, the technologies used, and the approaches of the various actors: linguistic populations, software publishers, the United Nations, States... A technique for measuring the degree of computerization of a language - the sigma-index - is proposed, along with several optimization methods.

The second part deals with the computerization of the Laotian language and presents the concrete results obtained for this language by applying the methods described in the first part. This work helped improve the sigma-index of the Laotian language by approximately 4 points; the index is currently evaluated at 8.7/20.
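To make the idea of an index scored out of 20 more concrete, here is a minimal sketch. It assumes, purely for illustration, that such an index is a weighted mean of per-service marks, each on a 0-20 scale; this abstract does not give the actual definition of the sigma-index. The names `ServiceScore` and `weighted_index`, and all weights and marks, are hypothetical and are not measurements of any language.

```python
from dataclasses import dataclass


@dataclass
class ServiceScore:
    name: str      # hypothetical service or resource label
    weight: float  # assumed importance of the service to its users
    mark: float    # assumed quality mark on a 0-20 scale


def weighted_index(scores):
    """Weighted mean of the marks; an assumed formula, not the thesis definition."""
    total_weight = sum(s.weight for s in scores)
    if total_weight == 0:
        raise ValueError("at least one service must have a non-zero weight")
    return sum(s.weight * s.mark for s in scores) / total_weight


# Invented example values, for illustration only.
scores = [
    ServiceScore("text input", 10, 12),
    ServiceScore("spell checking", 6, 4),
    ServiceScore("machine translation", 4, 0),
]

print(weighted_index(scores))  # (10*12 + 6*4 + 4*0) / 20 = 7.2
```

Under this assumption, raising the mark of a heavily weighted service moves the index more than raising the mark of a minor one, which is one way an index of this kind could guide where to invest effort first.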

In the third part, we show that an approach by groups of languages can reduce computerization costs through a modular architecture that combines existing general-purpose software with language-specific complements. For the most language-dependent parts, complementary generic lingware tools allow the populations to computerize their languages by themselves. We validated this method by applying it to the syllabic segmentation of Southeast Asian languages with unsegmented writing systems, such as Burmese, Khmer, Laotian and Siamese (Thai).
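The split between a generic tool and a language-specific complement can be sketched as follows. This is an illustrative sketch, not the tool described in the thesis: `make_segmenter` is a hypothetical generic engine, and `TOY_PATTERN` is an invented syllable pattern over romanized text, standing in for the per-language description that a speaker community would supply for its own script.

```python
import re


def make_segmenter(syllable_pattern):
    """Build a segmenter from a language-specific syllable pattern (generic part)."""
    syllable = re.compile(syllable_pattern)

    def segment(text):
        pieces = []
        pos = 0
        while pos < len(text):
            match = syllable.match(text, pos)
            if match and match.end() > pos:
                # A syllable was recognized at this position.
                pieces.append(match.group())
                pos = match.end()
            else:
                # Unrecognized character: emit it alone and move on.
                pieces.append(text[pos])
                pos += 1
        return pieces

    return segment


# Language-specific part (invented toy pattern, romanized text only):
# consonant onset, vowel nucleus, optional coda not followed by a vowel.
TOY_PATTERN = r"[bcdfghjklmnpqrstvwxyz]+[aeiou]+(?:ng|[nmk])?(?![aeiou])"

segment_toy = make_segmenter(TOY_PATTERN)
print(segment_toy("sabaidi"))  # ['sa', 'bai', 'di']
print(segment_toy("lanxang"))  # ['lan', 'xang']
```

In this sketch, supporting another language of the same group would mean supplying a different pattern while leaving the engine untouched. A real pattern for Burmese, Khmer, Laotian or Thai script would have to be written over that script's characters by someone who knows its syllable structure.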

Details
Collation: 277
French: these_Berment.pdf
Language: French
Author(s) Vincent Berment
Publication year 2004