Clustering Tax Compliance Data Using k-Means Algorithm: A Study Case of Manufacturing Companies in Indonesia

Abstract:

This research examines the use of machine learning to group survey data on the tax compliance of manufacturing firms in the Greater Jakarta area of Indonesia. The data set was collected through a survey of 209 respondents representing the finance departments of the companies. The k-means algorithm is applied to build the machine learning model, with the aim of dividing the data set into three clusters on the basis of similarity. The results show that the model was able to partition the data into three groups. The clustering was evaluated by comparing it with a previously studied classification based on the average survey score. The evaluation shows only a small correlation between the clusters and the average survey score. Taxpayer compliance is a complex system and cannot be indicated merely by the survey score. The clustering technique proved useful in uncovering intriguing patterns, distributions, and the underlying structure of the data.
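The abstract describes two steps: clustering survey responses into three groups with k-means, then comparing those clusters against a classification derived from the average survey score. The Python sketch below illustrates both steps under loudly stated assumptions. The responses are randomly generated stand-ins for the 209 respondents, not the study's data. The 10-item Likert layout, the farthest-first (maximin) initialization, and the 2.5/3.5 score thresholds are all hypothetical. The adjusted Rand index is used here only as one common agreement measure, since the abstract does not say which measure the authors used.

```python
import random
from collections import Counter
from math import comb


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def maximin_init(points, k):
    """Farthest-first seeding (an assumed choice; k-means++ is a common alternative)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Pick the point whose nearest existing centroid is farthest away
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = maximin_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
    return labels, centroids


def score_class(row, low=2.5, high=3.5):
    """Hypothetical low/medium/high class (0/1/2) from the mean Likert score."""
    mean = sum(row) / len(row)
    return 0 if mean < low else 1 if mean < high else 2


def adjusted_rand_index(a, b):
    """Agreement between two labelings, corrected for chance (1.0 = identical partitions)."""
    pairs = lambda counts: sum(comb(v, 2) for v in counts)
    sum_ij = pairs(Counter(zip(a, b)).values())
    sum_a = pairs(Counter(a).values())
    sum_b = pairs(Counter(b).values())
    expected = sum_a * sum_b / comb(len(a), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Hypothetical data: 209 respondents x 10 Likert items (1-5), randomly generated
rng = random.Random(42)
survey = [tuple(rng.randint(1, 5) for _ in range(10)) for _ in range(209)]

clusters, _ = kmeans(survey, k=3)
score_classes = [score_class(row) for row in survey]
print("cluster sizes:", Counter(clusters))
print("ARI vs score classes:", round(adjusted_rand_index(clusters, score_classes), 3))
```

Because the responses here are random, the printed agreement value carries no substantive meaning; it only shows how such a comparison could be computed. Maximin seeding is deterministic but sensitive to outliers, so k-means++ seeding or several restarts keeping the lowest within-cluster sum of squares are common alternatives.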

Info:

Periodical:

Engineering Headway (Volume 27)

Pages:

711-718

Online since:

October 2025

Copyright:

© 2025 Trans Tech Publications Ltd. All Rights Reserved