Developing a Comprehensive Standard Persian Positional Tagset

Mohammad Amin Mahdavi

Abstract


One of the primary tools used in text processing tasks such as information retrieval, text extraction, and text mining is a corpus enriched with linguistic tags. In corpus development, the role of a POS tagger is to assign a linguistic tag to every textual token. POS annotation relies heavily on a tagset grounded in a linguistic theory. Text processing in Persian follows this common practice as well. Several tagsets have been introduced so far to annotate Persian corpora; however, each has followed a specific standard and linguistic theory. The resulting tagsets contain a limited number of tags, which makes them inadequate for a broader scope of research. This study draws on the EAGLES, MULTEXT-East, and positional tagset standards to produce a comprehensive standard positional tagset for Persian. The proposed tagset is also informed by the existing Persian tagsets. The proposed Persian Positional Tagset (PPT) is designed for the morphological, lexical, and syntactic annotation of Persian corpora.
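In a positional tagset of the MULTEXT-East kind, a tag is a string in which the first character names the part-of-speech category and each subsequent position encodes one attribute defined for that category, with `-` marking an attribute that does not apply. The following is a minimal sketch of decoding such a tag, assuming an entirely hypothetical attribute table (`SPEC`, the categories, attribute names, and value codes are illustrative only and are NOT the actual positions or codes defined by PPT).

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL attribute specification, for illustration only; it does NOT
# reproduce the positions or codes of the Persian Positional Tagset (PPT).
# Each category letter maps to (category name, ordered list of attributes),
# and each attribute is (attribute name, {code: value}).
SPEC: Dict[str, Tuple[str, List[Tuple[str, Dict[str, str]]]]] = {
    "N": ("Noun", [
        ("Type", {"c": "common", "p": "proper"}),
        ("Number", {"s": "singular", "p": "plural"}),
    ]),
    "V": ("Verb", [
        ("Tense", {"p": "past", "r": "present"}),
        ("Person", {"1": "first", "2": "second", "3": "third"}),
        ("Number", {"s": "singular", "p": "plural"}),
    ]),
}


def decode(tag: str) -> Dict[str, str]:
    """Decode a positional tag into attribute-value pairs.

    Position 0 selects the category; position i (i >= 1) is read against the
    i-th attribute of that category. Trailing positions may be omitted.
    """
    if not tag or tag[0] not in SPEC:
        raise ValueError(f"unknown category in tag {tag!r}")
    category, attributes = SPEC[tag[0]]
    codes = tag[1:]
    if len(codes) > len(attributes):
        raise ValueError(f"tag {tag!r} has more positions than {category} defines")
    features = {"Category": category}
    # zip stops at the shorter sequence, so omitted trailing positions are skipped
    for (name, values), code in zip(attributes, codes):
        if code == "-":  # '-' marks an attribute that does not apply
            continue
        if code not in values:
            raise ValueError(f"invalid code {code!r} for {name} in {tag!r}")
        features[name] = values[code]
    return features


if __name__ == "__main__":
    print(decode("Vp3s"))
    # {'Category': 'Verb', 'Tense': 'past', 'Person': 'third', 'Number': 'singular'}
    print(decode("Nc-"))
    # {'Category': 'Noun', 'Type': 'common'}
```

Because the meaning of each character depends only on its category and position, a single compact string can carry morphological detail that a flat tagset would need many separate tags to express.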

DOR: 98.1000/1726-8125.2018.16.165.0.1.68.116


Keywords


Persian Positional Tagset; Persian POS tagset; Standard Persian Tagset; Persian Morphosyntactic tagset

E-ISSN: 2008-8310

   ISSN: 2008-8302