Evolving Text Matching: A Systematic Review of Classical and Modern Approaches in the Neural Network

Abstract:

Text matching has evolved dramatically from simple string comparison algorithms to sophisticated natural language processing (NLP) techniques. This literature review examines text matching methods from the last 20 years, with special emphasis on the transition from traditional frameworks to modern NLP methods. It explores the fundamental principles, strengths, and limitations of both families in order to identify opportunities for integrating them in theory and in practice. Our systematic review covers three main areas: (1) classical text matching algorithms, including Levenshtein distance, Boyer-Moore, and Knuth-Morris-Pratt; (2) modern NLP techniques, such as transformer-based models and contextual ontologies; and (3) emerging hybrid approaches that seek to combine the two. An analysis of more than 40 papers from information retrieval, natural language processing, and algorithmic evolution reveals key patterns in the adoption of text matching strategies and highlights promising directions for future research. The study identifies a significant gap between the computational efficiency of traditional methods and the logical comprehension capabilities of modern NLP methods. We examine various attempts to bridge this gap and discuss the challenges and opportunities in integrating classical and modern approaches. We also examine how different approaches manage the trade-off between computational complexity, logical clarity, and application-specific requirements.
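As a rough illustration of the classical side of this comparison, the sketch below implements two of the algorithms named in the abstract: Levenshtein distance (approximate matching) and Knuth-Morris-Pratt search (exact matching). It is a minimal textbook-style sketch, not code from the reviewed paper; the function names `levenshtein` and `kmp_search` are chosen here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def kmp_search(text: str, pattern: str) -> int:
    """Index of the first exact occurrence of `pattern` in `text`, or -1."""
    if not pattern:
        return 0
    # failure[i]: length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    # Scan the text once, reusing the failure table on mismatches.
    k = 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


if __name__ == "__main__":
    print(kmp_search("the colour red", "color"))  # exact search finds no match
    print(levenshtein("color", "colour"))         # edit distance sees one insertion
```

Both routines are cheap and fully deterministic, but neither captures meaning: an exact search treats "color" and "colour" as unrelated, and edit distance only measures character-level difference. This is the kind of limitation that the abstract contrasts with modern NLP methods.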

Info:

Periodical:

Engineering Headway (Volume 35)

Pages:

195-209

Online since:

February 2026

Copyright:

© 2026 Trans Tech Publications Ltd. All Rights Reserved
