A Method to Convert Sana’ani Accent to Modern Standard Arabic

G. H. Al-Gaphari, M. Al-Yadoumi

Abstract


This paper presents an efficient mechanism for converting the Sana’ani dialect to Modern Standard Arabic (MSA). The mechanism is based on morphological
rules that relate the Sana’ani dialect to MSA; these rules drive the conversion of dialect text to its MSA equivalent. The mechanism tokenizes the input
dialect text and divides each token into a stem and its affixes. The affixes fall into two categories, dialect affixes and MSA affixes, and a token may
carry either or both. Likewise, the stem may be a dialect stem or an MSA stem. The mechanism, which is implemented using a simple MSA stemmer, must
therefore handle all of these combinations. A dialect stemmer is then applied to strip the resulting token and extract the dialect affixes. At this point,
the rules decide whether an affix should be extracted. The experiment shows that the Sana’ani dialect exhibits three classes of distortion relative to
MSA: prefix, suffix, and stem distortions. The algorithm normalizes these distortions using the morphological rules. For each rule, the mechanism checks
whether the rule can be applied: if the rule's conditions are met, the dialect affix is replaced by its corresponding MSA affix. If a rule for a
distorted stem can be applied without restriction, the rule can be regarded as a parallel corpus of the dialect and MSA. Finally, the experiment computes
the distortion ratio of MSA in the Sana’ani dialect. In a Sana’ani dialect sample of 9386 words, 16.29% of the words have distorted suffixes, 0.70% have
distorted prefixes, and 2.17% contain distorted stems. These percentages refer only to the processed words.
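
The rule-checking and distortion-ratio steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Rule`, `RULES`, `STEM_MAP`, `convert_token`, and `distortion_ratios` are hypothetical, and the affixes and stems are made-up Latin placeholders, not real Sana’ani or MSA morphology. The sketch also skips the separate MSA stemming pass and assumes a simple length condition on the remaining stem.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Rule:
    kind: str                          # "prefix" or "suffix"
    dialect: str                       # dialect affix to look for (non-empty)
    msa: str                           # MSA affix that replaces it (may be empty)
    condition: Callable[[str], bool]   # must hold for the remaining stem


def long_enough(stem: str) -> bool:
    """Assumed rule condition: the remaining stem keeps at least 3 letters."""
    return len(stem) >= 3


# Hypothetical placeholder rules; NOT real Sana'ani morphology.
RULES: List[Rule] = [
    Rule("prefix", "xa", "sa", long_enough),
    Rule("suffix", "sh", "", long_enough),
]

# Hypothetical dialect-stem to MSA-stem table (a tiny "parallel corpus").
STEM_MAP: Dict[str, str] = {"dstem": "mstem"}


def convert_token(token: str) -> Tuple[str, Set[str]]:
    """Return the converted token and the distortion classes found in it."""
    found: Set[str] = set()
    prefix, suffix, stem = "", "", token

    # Try prefix rules: replace the dialect prefix only if the condition holds.
    for rule in RULES:
        if rule.kind == "prefix" and stem.startswith(rule.dialect):
            rest = stem[len(rule.dialect):]
            if rule.condition(rest):
                prefix, stem = rule.msa, rest
                found.add("prefix")
                break

    # Try suffix rules on what is left of the token.
    for rule in RULES:
        if rule.kind == "suffix" and stem.endswith(rule.dialect):
            rest = stem[:-len(rule.dialect)]
            if rule.condition(rest):
                suffix, stem = rule.msa, rest
                found.add("suffix")
                break

    # Replace a distorted stem by table lookup.
    if stem in STEM_MAP:
        stem = STEM_MAP[stem]
        found.add("stem")

    return prefix + stem + suffix, found


def distortion_ratios(text: str) -> Dict[str, float]:
    """Percentage of whitespace-separated tokens showing each distortion class."""
    tokens = text.split()
    counts = {"prefix": 0, "suffix": 0, "stem": 0}
    for token in tokens:
        for kind in convert_token(token)[1]:
            counts[kind] += 1
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {kind: 100.0 * n / total for kind, n in counts.items()}
```

For example, `convert_token("xadstemsh")` applies all three kinds of replacement, while `convert_token("xash")` leaves the token unchanged because neither rule's condition is met. A token can count toward more than one class, so the three percentages need not sum to the share of distorted words.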





E-ISSN: 2008-8310

ISSN: 2008-8302