Optimized Malware Detection Framework Using Principal Components Analysis

Abstract:

Software that can damage an information system asset is considered malware. Information systems have been exposed to numerous destructive attacks, mainly because of the emergence of the Internet. Conventional antimalware software is not effective at eliminating malware because of malware's many evasion techniques, such as polymorphism and code obfuscation. It is also ineffective against zero-day attacks, as it can only eliminate malware for which it has signatures. K Nearest Neighbor (KNN), Decision Tree, and Support Vector Machine are some of the leading classifiers that have successfully detected and classified malware, but optimal detection accuracy has not been achieved. In addition, false positives and false negatives persist because the hyperparameters of these classifiers were not optimized and noise was not filtered out of the datasets using a feature selection technique. The aim of this research is to develop an optimized malware detection and classification framework that employs Principal Components Analysis to mitigate the curse of dimensionality and uses optimal hyperparameters for the chosen classifiers, in order to boost the accuracy of malware detection and classification and to reduce false positives and false negatives. This research employed K Nearest Neighbor, Decision Tree, and Support Vector Machine to detect and classify malware, with the CICMalmem dataset used to train the model. Grid search optimization was combined with K-fold cross-validation to optimize the hyperparameters of the selected classifiers in order to boost the model's performance and achieve high detection accuracy as well as low false positives and low false negatives. Machine learning performance metrics such as the F1 Score, Precision, Recall, and the Confusion Matrix were used to evaluate the research model. K Nearest Neighbor generated zero false positives, while KNN, Decision Tree, and Support Vector Machine achieved accuracies of 99%, 98.64%, and 100%, respectively.
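The workflow described in the abstract (dimensionality reduction with Principal Components Analysis, followed by grid search with K-fold cross-validation over KNN, Decision Tree, and Support Vector Machine) could be sketched roughly as follows using scikit-learn. This is only an illustrative sketch, not the authors' implementation: the data is a synthetic stand-in rather than the CICMalmem dataset, and the scaling step, the 95% explained-variance threshold, the five folds, and every hyperparameter grid value are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for benign/malware feature vectors
# (assumption: this is NOT the CICMalmem dataset).
X, y = make_classification(
    n_samples=600, n_features=30, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical search spaces; the abstract does not list the grids used.
candidates = {
    "knn": (KNeighborsClassifier(), {"clf__n_neighbors": [3, 5, 7]}),
    "tree": (DecisionTreeClassifier(random_state=0), {"clf__max_depth": [3, 5, None]}),
    "svm": (SVC(), {"clf__C": [0.1, 1, 10], "clf__gamma": ["scale", 0.01]}),
}

# K-fold cross-validation (K=5 is an assumption), stratified by class.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, (clf, grid) in candidates.items():
    # Scale, project onto the components explaining 95% of the variance
    # (assumed threshold), then classify.
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=0.95, svd_solver="full")),
        ("clf", clf),
    ])
    search = GridSearchCV(pipe, grid, cv=cv, scoring="f1")
    search.fit(X_train, y_train)

    # Evaluate the refitted best model on the held-out split.
    y_pred = search.predict(X_test)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    results[name] = {
        "best_params": search.best_params_,
        "f1": f1_score(y_test, y_pred),
        "false_positives": int(fp),
        "false_negatives": int(fn),
    }

for name, summary in results.items():
    print(name, summary)
```

Placing the scaler and PCA inside the pipeline means they are refitted on each training fold, so the cross-validated scores are not influenced by the validation folds.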

Info:

Periodical: Engineering Headway (Volume 37)

Pages: 147-161

Online since: March 2026

Copyright: © 2026 Trans Tech Publications Ltd. All Rights Reserved
