Abstract: This paper proposes a Ge’ez part-of-speech (POS) tagger built with a hybrid approach. A Trigrams’n’Tags (TnT) tagger is combined with taggers based on human-written rules, regular expressions and morphological pattern analysis. Ge’ez literature on syntax, morphology and grammar is reviewed to understand the nature of the language and to identify a possible tag set. Experiments evaluate the influence of automatic pre-annotation on the manual part-of-speech annotation of a corpus, in terms of both efficiency and accuracy, with specific attention to biases. As a result, a tag set of 26 broad tags was identified, and 15,154 words from around 1,305 sentences were collected from a single genre, the Holy Bible. These words were then manually tagged by Ge’ez language professionals for training and testing purposes.
The hybrid of TnT with human-written rules, regular expressions (regex) and morphological pattern analysis of the Ge’ez language is hypothesized to perform better than the TnT tagger alone. Experiments were conducted with three taggers: the TnT tagger, the TnT with Regex tagger and the Hybrid tagger. They achieved 77.87%, 82.23% and 94.32% performance, respectively. The Hybrid approach therefore performs better than either of the other two taggers. Finally, this paper concludes that the Hybrid approach gives promising results for Semitic languages.
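The fallback idea behind such a hybrid can be illustrated with a minimal sketch. It is not the tagger evaluated in this paper: a most-frequent-tag lexicon stands in for the TnT model, the two regex rules (Ethiopic numerals and Ethiopic punctuation) and the tag names N, V, ADJ, NUM and PUNC are illustrative assumptions, and the training tokens w1 and w2 are placeholders rather than Ge’ez words.

```python
import re
from collections import Counter, defaultdict

def train_lexicon(tagged_sentences):
    """Learn the most frequent tag per word (a simple stand-in for TnT)."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

# Hypothetical regex rules; the tag names are assumptions, not the paper's tag set.
REGEX_RULES = [
    (re.compile(r"^[\u1369-\u137C]+$"), "NUM"),   # Ethiopic digits and numbers
    (re.compile(r"^[\u1361-\u1368]$"), "PUNC"),   # Ethiopic punctuation marks
]

def hybrid_tag(tokens, lexicon, default_tag="N"):
    """Tag with the lexicon first, then regex rules, then a default tag."""
    tagged = []
    for token in tokens:
        tag = lexicon.get(token)
        if tag is None:
            for pattern, rule_tag in REGEX_RULES:
                if pattern.match(token):
                    tag = rule_tag
                    break
        if tag is None:
            tag = default_tag  # last-resort rule for unknown words
        tagged.append((token, tag))
    return tagged

def accuracy(predicted, gold):
    """Token-level accuracy: share of tokens whose predicted tag matches gold."""
    correct = sum(p[1] == g[1] for p, g in zip(predicted, gold))
    return correct / len(gold)

# Toy usage with placeholder tokens (not real Ge'ez training data).
train = [[("w1", "N"), ("w2", "V")], [("w1", "N"), ("w1", "ADJ")]]
lexicon = train_lexicon(train)
tokens = ["w1", "\u1372\u136a", "\u1362", "unseen"]
result = hybrid_tag(tokens, lexicon)
```

In this sketch each later stage only handles tokens the earlier stage could not tag, which is one common way to combine a statistical tagger with rule-based components; the order of the stages is a design choice and may differ from the one used in the paper.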
Keywords: Ge’ez, POS tagger, NLP, TnT, Regex, Hybrid POS tagger.
Title: Ge’ez POS Tagger Using Hybrid Approach
Author: HAGOS GEBREMEDHIN GEBREMESKEL
International Journal of Computer Science and Information Technology Research
ISSN 2348-1196 (print), ISSN 2348-120X (online)
Research Publish Journals