Document Type: Articles

Authors

1 Associate Prof., Information Technology Department, Iranian Research Institute for Information Science and Technology (IranDoc), Tehran, Iran

2 B.Sc., Industrial Engineering Department, Islamic Azad University, Central Tehran Branch, Tehran, Iran.

3 Professor, Industrial Engineering Department, Sharif University of Technology, Tehran, Iran

Abstract

Academic research today plays an influential role in the economic development of countries. This research is often recorded and disseminated in scientific institutes as theses and dissertations. The higher the quality of these data in the systems that collect and distribute them, the more organizations and businesses can use and exploit them. Providing these data therefore requires proper monitoring so that the output of the recording and dissemination process is in good condition. This paper offers a framework for evaluating the data quality of theses and dissertations. The framework introduces a coding structure for data inconsistencies in Word and PDF files and in metadata (bibliographic information). Approaches from the data quality methodologies TDQM and DWQ are also used to provide solutions for improving data quality in the provisioning phase. At this stage, approaches such as assigning owners to data or processes, root cause analysis, process control, and continuous monitoring are considered. The focus group method is used to determine the operational strategies for quality improvement. Finally, process-oriented techniques, such as quality control checklists and image processing, and data-driven approaches, such as data cleansing, are localized and developed to improve the quality of theses/dissertation documents. The improvement solutions were categorized into two groups. Guiding the user through the "Theses/Dissertations" registration process was identified as a process-driven solution, whereas introducing a specific format for "Theses/Dissertations" files and resolving the quality issues of PDF files were among the data-driven solutions.
