A Hybrid Machine Learning Model Based on Global and Local Learner Algorithms for Diabetes Mellitus Prediction

Abstract:

Health has always been essential to living things, long before modern technology existed. Today the healthcare domain has evolved considerably and offers wide scope for research. The most researched areas of the health sector include diabetes mellitus (DM), breast cancer, and brain tumors. DM is a severe chronic disease that affects human health and has a high rate of occurrence throughout the world. Early prediction of DM is important for reducing its risk and even avoiding it. In this study, we propose a DM prediction model based on global and local learner algorithms. The proposed global and local learners stacking (GLLS) model combines prediction algorithms from two largely different but complementary machine learning paradigms: XGBoost and naive Bayes (NB) from global learning, and k-nearest neighbors (kNN) and support vector machine (SVM, with RBF kernel) from local learning. It aggregates them with a stacking ensemble technique that uses logistic regression (LR) as the meta-learner. The effectiveness of the GLLS model was assessed by comparing several performance measures and the results of different contrast experiments. Evaluation results on the UCI Pima Indian diabetes data set (PIDD) indicate that the model achieved 99.5%, 99.5%, 99.5%, 99.1%, and 100% in terms of accuracy, AUC, F1 score, sensitivity, and specificity, respectively, which is better than other results reported in the literature. Moreover, to further validate the GLLS model, three additional medical data sets (Messidor, WBC, and ILPD) were considered, on which the model achieved accuracies of 82.1%, 98.6%, and 89.3%, respectively. The experimental results demonstrate the effectiveness and superiority of the proposed GLLS model.
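The stacking arrangement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes scikit-learn is available, uses `GradientBoostingClassifier` as a stand-in for XGBoost so that the example needs only one library, and runs on synthetic data with the same shape as PIDD (768 rows, 8 features) rather than the real data set. The names `base_learners` and `glls_sketch`, the hyperparameters, and the train/test split are all illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): same shape as PIDD, not the real records.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Two "global" learners and two "local" learners, as named in the abstract.
base_learners = [
    # Assumption: gradient boosting from scikit-learn stands in for XGBoost.
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("nb", GaussianNB()),
    # Distance-based learners are scaled first (an illustrative choice).
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ("svm", make_pipeline(StandardScaler(), SVC(kernel="rbf"))),
]

# Logistic regression is the meta-learner; it is trained on cross-validated
# outputs of the base learners (predict_proba where available, otherwise
# decision_function, which is what the SVM pipeline provides here).
glls_sketch = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
glls_sketch.fit(X_train, y_train)

# Derive three of the measures listed in the abstract from the confusion matrix.
y_pred = glls_sketch.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)  # true positive rate
specificity = tn / (tn + fp)  # true negative rate
print(f"accuracy={accuracy:.3f} sensitivity={sensitivity:.3f} "
      f"specificity={specificity:.3f}")
```

The scores printed by this sketch come from synthetic data and say nothing about the figures reported for PIDD. To move closer to the described setup, one would load the real data set and replace the `"gb"` entry with an XGBoost classifier.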

Info:

Pages:

65-88

Online since:

January 2022

Copyright:

© 2022 Trans Tech Publications Ltd. All Rights Reserved
