Detecting Fake Job Posting Using ML Classifications and Ensemble Model

Abstract:

In this project, we build a fraud-checking tool that detects fake job postings using Natural Language Processing (NLP) and machine learning (ML) classifiers: Random Forest, Logistic Regression, Support Vector Machine (SVM), and XGBoost. These classifiers are compared and then combined into an ensemble model, which serves as our job detector. The aim is to classify job postings as real or fake with the highest possible accuracy. The dataset is analyzed with supervised machine learning techniques (SMLT), covering variable identification, missing-value handling, and data validation. Data cleaning, preparation, and visualization are performed on the entire dataset. The final ensemble combines XGBoost, SVM, Logistic Regression, and Random Forest, using four of the best contributing features. The resulting model will be implemented in a Flask application for demonstration.
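The abstract does not say how the four classifiers are combined. One common option is hard (majority) voting, sketched below under that assumption. The function name `majority_vote`, the tie-breaking rule, and the prediction lists are all hypothetical and serve only as illustration; they are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per posting (hard voting).

    predictions_per_model: list of equal-length label lists, one per model.
    Ties are broken in favour of the model listed first (an assumption).
    """
    if not predictions_per_model:
        raise ValueError("need at least one model's predictions")
    n = len(predictions_per_model[0])
    if any(len(p) != n for p in predictions_per_model):
        raise ValueError("all models must predict the same number of postings")

    combined = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        top = max(counts.values())
        # The first label (in model order) that reaches the top count wins.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


# Hypothetical outputs of the four classifiers on five postings
# (1 = fake, 0 = real); these are made-up values, not results from the paper.
rf_preds = [1, 0, 0, 1, 0]   # Random Forest
lr_preds = [1, 0, 1, 0, 0]   # Logistic Regression
svm_preds = [0, 0, 1, 1, 0]  # Support Vector Machine
xgb_preds = [1, 0, 0, 1, 1]  # XGBoost

ensemble = majority_vote([rf_preds, lr_preds, svm_preds, xgb_preds])
print(ensemble)  # prints [1, 0, 0, 1, 0]
```

With four voters a 2-2 split is possible (the third posting above), so any real implementation needs an explicit tie rule; averaging predicted probabilities (soft voting) is a common alternative that avoids most ties.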

Info:

Pages: 362-369

Online since: February 2023

© 2023 Trans Tech Publications Ltd. All Rights Reserved

