Document Type: Articles
Controlled vocabularies have been widely used in information retrieval systems. Two critical questions are how to control such vocabularies and how to evaluate the utility of their terms. This research aims to analyze the development of Persian subject headings statistically. The study was conducted on more than 450,000 records extracted from the electronic version of the National Bibliography of Iran (NBI), and the data were processed with data mining techniques. Correlation analysis was used to examine two relationships: between the number of items in NBI and the number of Persian subject headings, and between the rank of each subject heading and its use frequency in NBI. The count of new subject headings grew linearly with the count of newly catalogued materials at first, and then logarithmically once the number of catalogued materials reached 3,200. Analysis of the use frequency of distinct headings in NBI yielded three classes: most used, frequently used, and normally used subject headings. The findings partly agree with Lancaster's prediction that a controlled vocabulary grows very fast at the beginning. It was also found that the majority of subject headings are rarely used in NBI, which is attributed to the absence of a mechanism for controlling the creation of new headings.
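The two analyses described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes each bibliographic record is represented as a list of subject-heading strings in cataloguing order, and it uses Pearson correlation against `n` and against `log(n)` as one simple way to compare a linear and a logarithmic growth pattern. The function names (`growth_curve`, `pearson`, `growth_correlations`, `rank_frequency`) and the toy records are hypothetical.

```python
import math
from collections import Counter


def growth_curve(records):
    """Return (items_catalogued, distinct_headings_so_far) pairs.

    Assumption: `records` is a sequence of heading lists in cataloguing
    order; each inner list holds the headings assigned to one item.
    """
    seen = set()
    curve = []
    for n, headings in enumerate(records, start=1):
        seen.update(headings)          # new headings enlarge the vocabulary
        curve.append((n, len(seen)))   # vocabulary size after n items
    return curve


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def growth_correlations(curve):
    """Correlate vocabulary size with n (linear) and with log(n) (logarithmic).

    Hypothetical comparison: a higher coefficient for one form suggests
    that form describes the growth segment better.
    """
    ns = [n for n, _ in curve]
    ds = [d for _, d in curve]
    return {
        "linear": pearson(ns, ds),
        "logarithmic": pearson([math.log(n) for n in ns], ds),
    }


def rank_frequency(records):
    """Return (rank, use_frequency) pairs, most-used heading first."""
    counts = Counter(h for headings in records for h in headings)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


if __name__ == "__main__":
    # Toy, illustrative records only; not NBI data.
    records = [["A", "B"], ["A"], ["C"], ["B", "D"], ["A"]]
    curve = growth_curve(records)
    print(curve)
    print(growth_correlations(curve))
    print(rank_frequency(records))
```

To mirror the breakpoint reported in the abstract, one could split the curve into the points with `n <= 3200` and those with `n > 3200` and call `growth_correlations` on each segment separately. The rank-frequency pairs are the input one would correlate to study how heading use falls off with rank; the thresholds separating the three usage classes are not given here, so none are assumed.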