Developing a Prediction Model for Author Collaboration in Bioinformatics Research Using Graph Mining Techniques and Big Data Applications

Fezzeh Ebrahimi, Asefeh Asemi, Ahmad Shabani, Amin Nezarat


Scientific collaboration has increased dramatically with the spread of web-based technologies, advanced communication systems, and scientific information databases. The present study aims to provide a predictive model for author collaboration in bioinformatics research output using graph mining techniques and big data applications. It is applied-developmental research that adopts a mixed-method approach, combining quantitative and qualitative measures. The research population consisted of all bioinformatics research documents indexed in PubMed (n=699160). The correlations among bioinformatics articles were examined in terms of weight and strength, based on article sections including title, abstract, keywords, journal title, and author affiliation, using graph mining techniques and big data applications. The prediction model of author collaboration in bioinformatics research was then developed using these tools and expert-assigned weights. Calculations and data analysis were carried out using Expert Choice, Excel, Spark, and the Scala and Python programming languages on a big data server. The research was conducted in three phases: 1) identifying and weighting the factors contributing to the measurement of authors’ similarity; 2) implementing the co-authorship prediction model; and 3) integrating the first and second phases (i.e., integrating the weights obtained in the previous phases). The results showed that journal title, citation, article title, author affiliation, keywords, and abstract scored 0.374, 0.374, 0.091, 0.075, 0.055, and 0.031, respectively. Moreover, the journal title achieved the highest score in the model for the co-author recommender system. Because the data in bibliometric information networks are static, content-based features proved remarkably effective for similarity measures, allowing the recommender system to offer the most suitable collaboration suggestions.
The model is expected to work efficiently with other databases and to provide suitable recommendations for author collaboration in other subject areas. By integrating expert opinion and systemic weights, the model can help alleviate the current information overload and make it easier for authors to find collaborators.
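
As a rough illustration of how field-level scores such as those reported above could be combined into a single author-to-author similarity, the following minimal Python sketch computes a weighted sum of per-field similarities. It is not the authors' implementation: the use of the six reported scores as linear weights, the token-count cosine measure, and the names `WEIGHTS`, `cosine`, and `author_similarity` are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt

# Scores reported in the abstract, used here as linear weights (assumption).
WEIGHTS = {
    "journal_title": 0.374,
    "citation": 0.374,
    "article_title": 0.091,
    "affiliation": 0.075,
    "keywords": 0.055,
    "abstract": 0.031,
}


def cosine(text_a: str, text_b: str) -> float:
    """Cosine similarity over lowercase whitespace-token counts (assumed measure)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def author_similarity(profile_a: dict, profile_b: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of per-field similarities; a missing field contributes 0."""
    return sum(
        w * cosine(profile_a.get(field, ""), profile_b.get(field, ""))
        for field, w in weights.items()
    )
```

In a recommender setting, candidate collaborators for an author could then be ranked by `author_similarity` against that author's profile, with each profile being a hypothetical dictionary that maps the field names above to concatenated text drawn from the author's papers.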


Recommender System; Co-author; Graph Theory; Network Analysis; Bibliographic Networks; Research collaboration







E-ISSN: 2008-8310

ISSN: 2008-8302