Exploring Sacred Texts: Leveraging Computer Science for Dataset Similarity Analysis in Religious Studies

Abstract:

The Quran and the Hadith are the two fundamental sources of Islamic knowledge and law, and studying them side by side helps clarify how they relate. Preachers and scholars have long debated the similarities between these scriptures. Computational text analysis offers an alternative way to approach this question. There are at least two broad approaches to determining text similarity, the vector space model and semantic similarity, each of which defines a similarity or distance between texts. The similarity between words is often represented by the similarity between the concepts associated with those words. This paper presents a method for identifying semantic similarity between sentences drawn from the two datasets, using the semantic relations between word senses in different synsets as measured by WordNet path similarity and Wu-Palmer similarity. The method is evaluated and achieves acceptable accuracy. Although both path similarity and Wu-Palmer similarity successfully identify similar sentence pairs, their accuracy differs slightly: Wu-Palmer similarity outperforms path similarity when matching sentences between the Quran (Sahih International translation) and the An-Nawawi Forty Hadith translation. Future work could improve these results by applying weighting multipliers such as term frequency-inverse document frequency (TF-IDF), combining the results of several steps in the WordNet similarity computation, using vector space models, and applying optimal matching methods.
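To make the two measures named in the abstract concrete, the following is a minimal sketch of path similarity and Wu-Palmer similarity on a tiny is-a tree. The taxonomy, the word choices, and the `sentence_similarity` aggregation (a symmetric best-match average) are illustrative assumptions; they are not taken from the paper, which uses WordNet synsets and does not state its aggregation rule in this abstract.

```python
# Hypothetical toy is-a taxonomy (child -> parent), standing in for
# WordNet's hypernym hierarchy. Not the paper's data.
PARENT = {
    "entity": None,
    "act": "entity",
    "worship": "act",
    "prayer": "worship",
    "fasting": "worship",
    "charity": "act",
}


def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def lowest_common_subsumer(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward from b, deepest first
        if node in ancestors_a:
            return node
    raise ValueError("nodes do not share a root")


def path_similarity(a, b):
    """1 / (number of edges on the shortest is-a path + 1)."""
    lcs = lowest_common_subsumer(a, b)
    edges = (depth(a) - depth(lcs)) + (depth(b) - depth(lcs))
    return 1.0 / (edges + 1)


def wup_similarity(a, b):
    """Wu-Palmer: 2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


def sentence_similarity(words_a, words_b, word_sim):
    """Assumed aggregation: average best word match, in both directions."""
    def directed(src, dst):
        return sum(max(word_sim(w, v) for v in dst) for w in src) / len(src)
    return (directed(words_a, words_b) + directed(words_b, words_a)) / 2


if __name__ == "__main__":
    print(path_similarity("prayer", "fasting"))
    print(wup_similarity("prayer", "fasting"))
    print(sentence_similarity(["prayer", "charity"],
                              ["fasting", "charity"], wup_similarity))
```

In this toy tree, "prayer" and "fasting" are two edges apart and share the ancestor "worship" at depth 3, so path similarity is 1/3 while Wu-Palmer similarity is 0.75. The sketch shows how the two measures can score the same pair quite differently. With real data, a WordNet toolkit such as NLTK provides comparable measures on synsets (`path_similarity` and `wup_similarity`), though details such as sense selection would still need to be decided.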

Info:

Periodical:

Engineering Headway (Volume 6)

Pages:

227-235

Online since:

April 2024

Copyright:

© 2024 Trans Tech Publications Ltd. All Rights Reserved
