Mohammad Reza Keyvanpour; Zahra Karimi Zandian; Zahra Abdolhosseini
Volume 16, Issue 2, July 2018
Abstract
Query expansion is a method for improving retrieval performance by supplementing an original query with additional terms. This process improves the quality of search engine results and helps users find the information they need. In recent years, different methods have been proposed in this area. Given the variety of approaches and the need to study their characteristics, the lack of a comprehensive classification based on candidate expansion term extraction methods, together with the lack of suitable and complete evaluation criteria, makes it difficult for researchers to study, compare, and evaluate query expansion methods precisely and to choose an appropriate method for their needs. Therefore, this paper presents a new framework. The proposed framework identifies three basic approaches to query expansion, based on how candidate expansion terms are extracted, describes their properties, and defines appropriate criteria for the qualitative evaluation of these methods. The approaches are then evaluated qualitatively against these criteria. The systematic and structured framework proposed in this paper gives researchers a platform for the comparative study of existing methods in the field, for investigating their features, especially their drawbacks, in order to improve them, and for choosing an appropriate method based on their needs.
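As an illustration of the basic idea of supplementing a query with additional terms, the following minimal sketch expands a query from a small lookup table of related terms. The table `RELATED_TERMS`, the function `expand_query`, and the parameter `max_new_per_term` are hypothetical names chosen for this example; they are not one of the three approaches classified in the paper.

```python
from typing import Dict, List

# Hypothetical table of related terms; a real system would derive candidates
# from a thesaurus, the document collection, or query logs.
RELATED_TERMS: Dict[str, List[str]] = {
    "car": ["automobile", "vehicle"],
    "cheap": ["affordable"],
}


def expand_query(
    query: str,
    related: Dict[str, List[str]] = RELATED_TERMS,
    max_new_per_term: int = 2,
) -> List[str]:
    """Return the original query terms followed by candidate expansion terms."""
    original = query.lower().split()
    expanded = list(original)
    for term in original:
        # Take at most max_new_per_term candidates for each original term.
        for candidate in related.get(term, [])[:max_new_per_term]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


if __name__ == "__main__":
    print(expand_query("cheap car"))
```

The original terms are kept first so that a retrieval system could, for example, weight them more heavily than the added ones.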
Fatemeh Momenipour; Mohammadreza Keyvanpour
Volume 14, Issue 2, July 2016
Abstract
Stemming is the process of finding the main morpheme of a word, and it is used in natural language processing, text mining, and information retrieval systems. A stemmer extracts the stems of words. Persian stemmers can be classified into three main classes: structural stemmers, dictionary-based stemmers, and statistical stemmers. The precision of structural stemmers is low and the cost of dictionary-based stemmers is high, so the main goal of this research is to design and implement a high-precision statistical stemmer based on a hidden Markov model (HMM) that can reduce the size of the index file and increase the speed of information retrieval systems. The proposed stemmer finds the prefixes and suffixes of a word and removes them, so the rest of the word is the stem. However, some exceptions among Persian words cause those words to be stemmed incorrectly. To handle them, we compiled a dictionary. The proposed stemmer first looks a word up in the dictionary; if the word is not there, it finds the stem with the HMM-based stemmer. The stemmer was tested on the Bijankhan corpus and the Hamshahri test collection. The results show an increase in mean average precision and recall. The algorithm also increases the speed of the information retrieval system and decreases the size of the index files.
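The dictionary-first control flow described above can be sketched as follows. This is only an illustration under stated assumptions: the HMM-based stemmer is replaced here by a simple rule-based affix stripper, the words are transliterated toy examples, and the names `EXCEPTIONS`, `PREFIXES`, `SUFFIXES`, `strip_affixes`, and `stem` are hypothetical and do not come from the paper.

```python
from typing import Dict, Tuple

# Hypothetical, transliterated toy data (not taken from the paper).
# "tanha" ends in the plural-like suffix "ha" but is a word in its own
# right, so naive stripping would stem it incorrectly.
EXCEPTIONS: Dict[str, str] = {"tanha": "tanha"}
PREFIXES: Tuple[str, ...] = ("mi", "na")
SUFFIXES: Tuple[str, ...] = ("ha", "an", "tar")


def strip_affixes(word: str, min_stem: int = 2) -> str:
    """Rule-based stand-in for the HMM stemmer: drop one prefix and one suffix."""
    stem_ = word
    for prefix in PREFIXES:
        if stem_.startswith(prefix) and len(stem_) - len(prefix) >= min_stem:
            stem_ = stem_[len(prefix):]
            break
    for suffix in SUFFIXES:
        if stem_.endswith(suffix) and len(stem_) - len(suffix) >= min_stem:
            stem_ = stem_[: -len(suffix)]
            break
    return stem_


def stem(word: str, exceptions: Dict[str, str] = EXCEPTIONS) -> str:
    """Look the word up in the exception dictionary first, then fall back."""
    if word in exceptions:
        return exceptions[word]
    return strip_affixes(word)


if __name__ == "__main__":
    for w in ("ketabha", "tanha"):
        print(w, "->", stem(w))
```

Checking the dictionary before the fallback means known exceptions never reach the affix-removal step, which is the behavior the abstract describes.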
Zahra Abdolhosseini; Mohammad Reza Keyvanpour
Volume 13, Issue 2, July 2015
Abstract
Persian natural language processing (NLP) researchers have limited access to linguistic tools suitable for text processing. As a result, research in Persian text processing is very limited. Since datasets are an important requirement for experiments and their evaluation, we aimed to create appropriate corpora for information retrieval and natural language processing in Persian. The corpora provided in this article are based on the HAMSHAHRI dataset, which, because it has not been tagged, is appropriate only for simple information retrieval and simple natural language processing. We converted this dataset into a tagged collection and improved its text quality. The new corpora minimize the text preprocessing required. We used the STep-1 tools for text processing and proposed some ideas for fixing bugs in these tools in order to increase their quality. Finally, we used the new corpora for text retrieval, and the results showed improved performance.