Jalal Rezaeenour; Mansoureh Yari Eili; Esmaeil Hadavandi; Mohammad Hossein Roozbahani
Volume 16, Issue 1, February 2018
Abstract
This study aims to predict how much attention news articles ultimately receive, using data mining techniques. As is well known, useful knowledge in online social networking services (such as Digg, Twitter, Facebook and YouTube) is often hidden in large amounts of web data. Because such data are high-dimensional, irrelevant attributes degrade the performance of learning algorithms and increase training and testing times. To reduce this impact when predicting the popularity of online news, this paper proposes a new feature selection algorithm based on mutual information. Cellucci Mutual Information-based Feature Selection (C_MIFS) is first employed to select the most informative variables affecting the popularity of a news article. The selected features are then used to train an Extreme Learning Machine (ELM) neural network. Experiments on real-world datasets from the UCI repository were conducted to validate the performance of the proposed model. The analyses demonstrate that the proposed method can extract the most important features of online news data and accurately predict future popularity. Using C_MIFS improves the prediction accuracy of ELM dramatically, with error rates of RMSE = 0.16 and MAPE = 0.23. Hence, the proposed data mining model offers a practical tool for forecasting the popularity of online content on digital media websites.
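The idea behind mutual-information feature selection can be illustrated with a minimal sketch, assuming all columns (including the popularity target) have already been discretized. The abstract does not describe the Cellucci-based estimator used in C_MIFS, so this code substitutes a plain equal-frequency histogram estimate of mutual information and the classic greedy MIFS criterion attributed to Battiti, which picks the feature f maximizing I(f;C) - beta * sum over selected s of I(f;s). It is not the authors' C_MIFS, and the ELM training step is not shown. The function names `equal_frequency_bins`, `mutual_information` and `mifs`, the value beta = 1.0, and the toy data are hypothetical illustrations.

```python
# Sketch only (assumption): plain histogram MI + greedy Battiti-style MIFS,
# standing in for the paper's unspecified Cellucci-based C_MIFS estimator.
import math
from collections import Counter
from typing import Dict, List, Sequence


def equal_frequency_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Discretize a continuous column into roughly equal-count bins.

    Equal values always share a bin, so binary or low-cardinality columns
    are not split artificially.
    """
    n = len(values)
    first_rank: Dict[float, int] = {}
    for rank, v in enumerate(sorted(values)):
        first_rank.setdefault(v, rank)  # rank of the first occurrence of v
    return [first_rank[v] * n_bins // n for v in values]


def mutual_information(a: Sequence[int], b: Sequence[int]) -> float:
    """Plug-in estimate of I(A;B) in nats from two discrete label sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_a[x] * count_b[y]).
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def mifs(features: List[List[int]], target: List[int], k: int,
         beta: float = 1.0) -> List[int]:
    """Greedy MIFS: repeatedly add the feature maximizing
    I(f;C) - beta * sum of I(f;s) over already-selected features s."""
    relevance = [mutual_information(f, target) for f in features]
    redundancy = [0.0] * len(features)  # running sum of I(f; s) over selected s
    remaining = list(range(len(features)))
    selected: List[int] = []
    while remaining and len(selected) < k:
        # Ties go to the lowest index because `remaining` stays sorted.
        best = max(remaining, key=lambda j: relevance[j] - beta * redundancy[j])
        selected.append(best)
        remaining.remove(best)
        for j in remaining:
            redundancy[j] += mutual_information(features[j], features[best])
    return selected


if __name__ == "__main__":
    # Hypothetical toy data: f0 and f1 are identical, f2 is a second
    # independent driver of the target, and f3 is pure noise.
    n = 400
    a = [i % 4 for i in range(n)]
    b = [(i // 4) % 2 for i in range(n)]
    noise = [(i // 8) % 4 for i in range(n)]
    target = [2 * x + y for x, y in zip(a, b)]
    feats = [a, list(a), b, noise]
    print(mifs(feats, target, k=2))  # → [0, 2]
```

On this toy data, ranking by relevance alone would keep the duplicate f1 alongside f0, because both carry the most information about the target. The redundancy penalty drives the score of f1 to about zero once f0 is selected, so MIFS chooses f2 instead. In the pipeline described above, a reduced feature set of this kind is what would be passed on to train the ELM.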