Document Type : Articles


1 National Research and Innovation Agency

2 IPB University


R&D is one of the key drivers of technological progress and contributes to increased productivity and profit growth. Indonesia's ratio of Gross Domestic Expenditure on R&D (GERD) to GDP, one of the Global Competitiveness Index indicators, reached only 0.28% in 2018 and was dominated by the government sector, with the industrial sector accounting for only 7.34%. One reason for this low value is that R&D data collection in Indonesia's business sector has not been carried out optimally. A classification model is therefore needed to identify data collection targets so that the results improve. The main objective of this study is to classify industrial R&D actors in Indonesia using XGBoost and then to analyze the features that characterize these actors using SHAP. XGBoost is a black-box model that is difficult to interpret, and SHAP is one method for interpreting such models. The XGBoost classifier achieved an accuracy, AUC, and F1-score of 79.61%, 0.7646, and 84.44%, respectively. Based on the Shapley values from the SHAP method, the average growth in R&D expenditure made the highest contribution, and this contribution to the prediction is larger when the mean growth in R&D expenditure is higher (greater than 0). The next most influential feature is the ratio of researchers to R&D human resources: a ratio above 75% contributes negatively. Finally, the export and State-Owned Enterprise (BUMN) features have the smallest contributions.
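To make the notion of a Shapley value concrete, the following minimal sketch computes exact Shapley values for a single observation by enumerating every feature coalition. It is an illustration under stated assumptions, not the study's model. The scoring function `toy_score`, its coefficients, the three feature names, and the baseline observation are all hypothetical and chosen only to mirror the kinds of features discussed above. In practice the SHAP method obtains such attributions for tree ensembles far more efficiently than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def toy_score(z):
    """Hypothetical scoring function (NOT the study's XGBoost model).

    z = [rd_growth, researcher_ratio, is_exporter]; all coefficients are
    invented purely for illustration.
    """
    rd_growth, researcher_ratio, is_exporter = z
    score = 0.5 * rd_growth
    if researcher_ratio > 0.75:  # illustrative threshold
        score -= 0.2
    score += 0.05 * is_exporter
    score += 0.1 * rd_growth * is_exporter  # small interaction term
    return score


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline.

    Features outside a coalition are set to their baseline values.
    The cost grows as 2^n, so this only suits a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|!(n-|S|-1)!/n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


x = [1.0, 0.8, 1]         # hypothetical firm: positive growth, high ratio, exporter
baseline = [0.0, 0.5, 0]  # hypothetical reference firm
phi = shapley_values(toy_score, x, baseline)

for name, contribution in zip(["rd_growth", "researcher_ratio", "is_exporter"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The attributions sum to the difference between the score at `x` and the score at `baseline`, which is the efficiency property that makes Shapley values additive explanations. In this toy setup the interaction term is split equally between the two features involved.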


  1. Chen T, Guestrin C. 2016. XGBoost: A scalable tree boosting system. Knowledge discovery and data mining. ACM: 785-794.
  2. Inekwe, John Nkwoma. (2014). The Contribution of R&D Expenditure to Economic Growth in Developing Economies. Springer.
  3. Miller, Tim. (2017). “Explanation in artificial intelligence: Insights from the social sciences”. arXiv Preprint arXiv:1706.07269.
  4. Molnar, Christoph. (2020). A Guide for Making Black Box Models Explainable. Available at :
  5. Sartono, B & Syafitri, UD. (2010). Metode pohon gabungan: solusi pilihan untuk mengatasi kelemahan pohon regresi dan klasifikasi tunggal. Forum Statistika dan Komputasi 15 (1): 1-7.
  6. Shapley, Lloyd S. (1953). "A value for n-person games." Contributions to the Theory of Games 2.28: 307-317.
  7. Štrumbelj, Erik, and Igor Kononenko. (2014). "Explaining prediction models and individual predictions with feature contributions." Knowledge and information systems 41.3 : 647-665.
  8. Lundberg, Scott M., and Su-In Lee. (2017). "A unified approach to interpreting model predictions." Advances in Neural Information Processing Systems.
  9. Lundberg, Scott M., Gabriel G. Erion, and Su-In Lee. (2018). "Consistent individualized feature attribution for tree ensembles." arXiv preprint arXiv:1802.03888.
  10. Zhou ZH. (2012). Ensemble methods: foundations and algorithms. Chapman and Hall/CRC. Available at : %20methods%20-%20Zhou.pdf