Data Mining | Scientific.Net

A Hybrid Based CNN-RNN Model for Improved Mining Techniques for Twetter Social Media Data and Analytics

Authors: Donatus O. Njioku, Juliet N. Odii, Christopher I. Ofoegbu, Francisca O. Nwokoma, Uchenna C. Onyemauche, Innocent Harvey H. Ajunwa

Abstract: The rapid growth of social media platforms has created vast repositories of unstructured data, necessitating advanced techniques to extract actionable insights. This study addresses the challenge of analyzing large-scale social media data by developing and evaluating deep learning models for sentiment classification. A preprocessing pipeline incorporating emoticon replacement, text normalization, tokenization, and stemming was applied to a dataset of 160,000 tweets. Two neural network architectures, a baseline Recurrent Neural Network (RNN) and a hybrid Convolutional Neural Network-Recurrent Neural Network (CNN-RNN), were trained and compared. The hybrid CNN-RNN model performed better, achieving 76% accuracy against the RNN's 50%, which underscores the value of combining local feature extraction with sequential dependency modeling. Temporal and lexical analyses further revealed trends in user engagement and sentiment expression. These findings highlight the effectiveness of hybrid deep learning architectures for social media analytics and provide a framework for future research in real-time sentiment monitoring.
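
The four preprocessing stages named in the abstract can be sketched as below. This is a minimal illustration, not the authors' pipeline: the emoticon map, the normalization rules, and the naive suffix-stripping stemmer are all assumptions, since the abstract does not specify them.

```python
import re

# Hypothetical emoticon map; the paper's actual replacement table is not given.
EMOTICONS = {":)": "smile", ":-)": "smile", ":(": "sad", ":-(": "sad", ":D": "laugh"}

# Illustrative suffix list for a naive stemmer (an assumption, not the paper's stemmer).
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def replace_emoticons(text):
    """Swap each known emoticon for a word token."""
    for emoticon, word in EMOTICONS.items():
        text = text.replace(emoticon, " " + word + " ")
    return text


def normalize(text):
    """Lowercase, drop URLs and @mentions, and squeeze runs of 3+ repeated characters."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)  # "soooo" -> "soo"
    return text


def tokenize(text):
    """Split into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text)


def stem(token):
    """Strip one common suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(tweet):
    """Run the stages in order: emoticons, normalization, tokenization, stemming."""
    text = normalize(replace_emoticons(tweet))
    return [stem(token) for token in tokenize(text)]
```

Emoticons are replaced before lowercasing so that case-sensitive forms such as `:D` still match; a production pipeline would more likely use an established stemmer than the suffix list assumed here.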

EHD-ABC: An Enhanced History-Driven Artificial Bee Colony Algorithm for Improved Data Clustering

Authors: Hussein Ala’a Al-Kaabi, Fuqdan A. Al-Ibraheemi, Hussein Ali Hussein Al Naffakh, Mohammed Riyadh Al-Rikabi, Ali Kadhem Jasim

Abstract: Data clustering is a critical data mining technique for grouping similar objects and differentiating dissimilar ones. While advancements in machine learning, statistical, and metaheuristic methods have addressed some challenges, issues such as accuracy, efficiency, and scalability persist. Building on the History-Based Artificial Bee Colony (HD-ABC) algorithm, this paper introduces the Enhanced History-Driven Artificial Bee Colony (EHD-ABC) algorithm. By refining the historical memory mechanism and the optimization process, the proposed algorithm achieves improved clustering accuracy, reduced computational complexity, and enhanced efficiency. Experimental results on artificial and real-world datasets demonstrate EHD-ABC's superiority over existing methods, such as HD-ABC and K-means, in clustering quality and error reduction.

Development of a Group Decision Making Method for Ranking Alternatives: Selection of most Preferred Data Mining Algorithm for a Construction Project

Authors: Abdulqader Al-Khafaji

Abstract: This study presents a methodology for selecting the most preferred data mining algorithm for a construction project, leveraging the Analytical Hierarchy Process (AHP). AHP, known for its application to complex decision-making problems, is adapted in this research to fit the context of data mining. The methodology involves significant modifications, including creating a collective decision-making environment that accommodates participants from diverse backgrounds and establishing a suitable data collection method tailored for AHP. The study contributes in two key areas. First, it designs and develops the methodology, enabling AHP to be effectively used for selecting data mining algorithms in construction projects. This adaptation considers the specific needs of the domain, allowing experts from different fields to contribute without requiring a comprehensive understanding of the entire model. Second, the methodology is applied to the problem, addressing existing limitations in the literature. By incorporating all relevant performance measures and leveraging expert knowledge, it facilitates informed decision-making even in the absence of extensive model testing data. The study's data was collected from two distinct participant groups: construction practitioners and machine learning experts, focusing on their personal preferences. This approach enhances the methodology's robustness and relevance to real-world applications. The proposed methodology demonstrated its effectiveness through various applications. A preference for Artificial Neural Networks (ANN) was observed in predicting concrete compressive strength, with a 59.4% weighting due to their capability to handle large datasets and non-linear relationships. In cost estimation tasks, Support Vector Machines (SVM) outperformed other models, receiving a 64.9% preference and achieving a lower mean absolute percentage error (MAPE) of 7.06%. The AHP-based approach maintained consistency across evaluations, with consistency ratios below 0.10, confirming the reliability of group judgments in the algorithm selection process.
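
The consistency check mentioned in the abstract can be illustrated with a small sketch of the standard AHP calculation: priority weights from a pairwise comparison matrix, then the consistency ratio (CR) against Saaty's random index. The matrix below is hypothetical and the power-iteration approach is one common choice; neither is taken from the paper, and the group-aggregation step the study describes is not shown.

```python
# Saaty's random consistency index, keyed by matrix size.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def multiply(matrix, vector):
    """Matrix-vector product for a square list-of-lists matrix."""
    n = len(matrix)
    return [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]


def ahp_weights(matrix, iterations=100):
    """Approximate the principal eigenvector by power iteration, normalized to sum to 1."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = multiply(matrix, weights)
        total = sum(product)
        weights = [value / total for value in product]
    return weights


def consistency_ratio(matrix, weights):
    """CR = CI / RI, where CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    product = multiply(matrix, weights)
    lambda_max = sum(product[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


# Hypothetical judgments comparing three candidate algorithms pairwise.
judgments = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]
weights = ahp_weights(judgments)
cr = consistency_ratio(judgments, weights)
```

A CR below 0.10, the threshold quoted in the abstract, is conventionally taken to mean the judgments are acceptably consistent.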

Investigation of Machining Condition for Barrel End Mill Based on Data-Mining Method for Tool Catalog Database

Authors: Shu Uchida, Natsuki Oyaizu, Masao Nakagawa, Toshiki Hirogaki, Eiichi Aoyama

Abstract: The recent development of computer-aided design/computer-aided manufacturing (CAD/CAM) systems has enabled unskilled workers to generate NC programs easily. However, determining the cutting conditions, which is crucial for machining, still relies on the knowledge and experience of skilled workers. Therefore, this study aimed to uncover tacit knowledge about cutting using data mining methods and to construct a system to support unskilled workers. Given the recent progress in the practical use of barrel tools, this study attempts to predict the cutting conditions of barrel tools by utilizing catalog information on radius and ball end mills. First, the databases of all the tools were integrated. Next, new variables were introduced for highly accurate predictions. After their validity was verified through cutting experiments, the new variables were used to predict cutting conditions. It was found that the new variables could be used in the clustering process to achieve highly accurate predictions.

Analysis on Students’ Academic Performance in Relation to the Results of Pre-University Examination

Authors: Chong Qi, Sabariah Binti Saharan

Abstract: There is a great deal of uncertainty regarding the factors that influence students' final-year grades, including their entry qualification. This paper investigates the impact of entry qualification and pre-university CGPA on student performance at the university level. Entry qualifications are critical for educational institutions and providers in ensuring the quality of their graduates. The goal of this study is to analyze and compare the performance of Bachelor of Science (Industrial Statistics) with Honours (BWQ) students. A total of 54 students were selected from the Faculty of Applied Sciences and Technology (FAST), Universiti Tun Hussein Onn Malaysia (UTHM). The students came from the Malaysian Higher School Certificate (STPM) and the Malaysian Matriculation Programme. A paired t test and a Z test were carried out to analyze the impact of pre-university CGPA and each semester's GPA, as well as the impact of entry qualification, on final-year grades. Classification and Regression Tree (CART), K-Nearest Neighbors, and Naïve Bayes models were developed to predict the students' performance. The findings show no relationship between the results obtained in one semester and those obtained in the next. Meanwhile, STPM students outperform Matriculation students in GPA per semester, pre-university CGPA, and final CGPA. The K-Nearest Neighbors and Naïve Bayes models were found to be the most effective data mining techniques for predicting student performance, with the highest accuracy of 100%.
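
For readers unfamiliar with the paired t test used in the study, the statistic can be computed as in the sketch below. It assumes two equal-length lists of GPA values for the same students; the function name and any data passed to it are illustrative and do not come from the paper.

```python
import math
import statistics


def paired_t_statistic(first, second):
    """Return (t, degrees_of_freedom) for paired samples.

    t = mean(d) / (stdev(d) / sqrt(n)), where d holds the per-pair
    differences second - first and stdev is the sample standard deviation.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [b - a for a, b in zip(first, second)]
    sd_diff = statistics.stdev(diffs)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

The resulting t value is compared against the t distribution with n - 1 degrees of freedom to judge whether the mean difference between the paired measurements is significant.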

Farm Track Application Development Using Web Mining and Web Scraping

Authors: K. Karthigeiyan, M. Durga Devi

Abstract: As an agricultural country, India has an economy that relies heavily on growth in agricultural yield and agro-industry goods. Food demand is rising as the world's population grows by the day. Climatic conditions are the foundation for growing the best produce. Internet technology is advancing steadily, and companies are becoming digitized. Every company has a website or a mobile application that it uses to provide services to its customers.

A Hybrid Machine Learning Model Based on Global and Local Learner Algorithms for Diabetes Mellitus Prediction

Authors: Derara Duba Rufo, Taye Girma Debelee, Worku Gachena Negera

Abstract: Health has been vital to living things since long before modern technology existed. Today the healthcare domain offers wide scope for research, as it has evolved considerably. The most researched areas of the health sector include diabetes mellitus (DM), breast cancer, and brain tumors. DM is a severe chronic disease that affects human health and has a high prevalence throughout the world. Early prediction of DM is important to reduce its risk and even avoid it. In this study, we propose a DM prediction model based on global and local learner algorithms. The proposed global and local learners stacking (GLLS) model combines prediction algorithms from two largely different but complementary machine learning paradigms, specifically XGBoost and NB from global learning and kNN and SVM (with RBF kernel) from local learning, and aggregates them with a stacking ensemble technique using LR as the meta-learner. The effectiveness of the GLLS model was demonstrated by comparing several performance measures and the results of different contrast experiments. The evaluation results on the UCI Pima Indian diabetes data set (PIDD) indicate that the model achieves better prediction performance, 99.5%, 99.5%, 99.5%, 99.1%, and 100% in terms of accuracy, AUC, F1 score, sensitivity, and specificity respectively, than other results reported in the literature. Moreover, to better validate the GLLS model's performance, three additional medical data sets (Messidor, WBC, and ILPD) were considered, on which the model achieved accuracies of 82.1%, 98.6%, and 89.3% respectively. The experimental results demonstrate the effectiveness and superiority of the proposed GLLS model.

A Data Mining Approach to Investigate the Carbon Nanotubes Mechanical Properties via High-Throughput Molecular Simulation

Authors: Yi Xiang, Go Yamamoto

Abstract: The relationship between the geometrical properties and mechanical properties of carbon nanotubes (CNTs) was investigated using high-throughput molecular simulation. Geometrical properties such as diameter, number of walls, chirality, and crosslink density were considered. As a key factor in determining the mechanical properties of composites reinforced with CNTs, nominal tensile strength is the focus of this study; it is calculated as the fracture force divided by the full cross-sectional area, including the hollow core and the wall thickness. The fracture mode, nominal tensile strength, and nominal Young's modulus were evaluated in axial tensile tests with the load applied to the outermost tube of the CNTs. Three types of fracture modes, arising from different crosslink densities, were obtained. By data-mining large numbers of datasets, we showed that CNTs with small diameter, a large number of walls, and crosslinks between walls can have high nominal tensile strength. We demonstrated that zigzag-type CNTs with a crosslink density of approximately 1.5% - 2.5% and armchair-type CNTs with a crosslink density of approximately 3% - 4% improve the load transfer from the outer tube to the inner tube the most.

Improved BVBUC Algorithm to Discover Closed Itemsets in Long Biological Datasets

Authors: Fatimah Audah Md Zaki, Nurul Fariza Zulkurnain

Abstract: Mining closed frequent itemsets requires the algorithm to mine the frequent itemsets and then determine their closures. The efficiency of closure computation is very important, as it determines the total mining time and the required memory. Over the years, many closure computation methods have been proposed to achieve these goals. However, to the best of our knowledge, there is no suitable method that can be adapted for algorithms that enumerate the rowset lattice, an approach that is effective for biological datasets. Therefore, this paper proposes a method for computing closure and compares it with the method used in the BVBUC algorithm. Finally, BVBUC_I is proposed, and the performance of these algorithms was evaluated using two synthetic datasets and three real datasets. The results of these tests demonstrated the efficiency of the proposed method.
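
As background for the closure step the abstract refers to, the textbook definition can be sketched as follows: the closure of an itemset is the intersection of all rows (transactions) that contain it, and an itemset is closed when it equals its own closure. This is a generic illustration with hypothetical function names and data, not the BVBUC or BVBUC_I method.

```python
def support_rows(itemset, transactions):
    """Indices of the transactions (rows) that contain every item in itemset."""
    return [i for i, row in enumerate(transactions) if itemset <= row]


def closure(itemset, transactions):
    """Intersect all supporting rows to obtain the closure of itemset.

    Simplification: an itemset with no supporting rows is returned unchanged.
    """
    rows = support_rows(itemset, transactions)
    if not rows:
        return frozenset(itemset)
    closed = set(transactions[rows[0]])
    for i in rows[1:]:
        closed &= transactions[i]
    return frozenset(closed)


def is_closed(itemset, transactions):
    """An itemset is closed when no proper superset has the same support."""
    return closure(itemset, transactions) == frozenset(itemset)


# Hypothetical toy dataset: each row is the set of items present in one sample.
rows = [frozenset("abc"), frozenset("abd"), frozenset("ab"), frozenset("cd")]
```

With this toy data, `{"a"}` is not closed because every row containing `a` also contains `b`, so its closure is `{"a", "b"}`.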

Potentials for Error Detection and Process Visualization in Assembly Lines Using a Parallel Coordinates Plot

Authors: Christian Sand, Tobias Lechler, Patricia Schuh, Jörg Franke

Abstract: Assembly lines consist of chained or unchained stations, yet process and quality analytics usually consider single stations individually. Since the quality of the final product depends on interactions of process parameters along the assembly flow, it is insufficient to analyze the process parameters of each station separately. Therefore, data from every assembly station along the line has to be collected and stored. Different techniques have been established to explore such large amounts of multidimensional data and their correlations. In this paper, assembly flows and their respective data are visualized using a parallel coordinates plot (PCP). Here, this technique visualizes process parameter combinations along the whole assembly chain. The contribution of this paper is to show that the presented approach enables fast detection of stations with adverse impacts on product quality in complex assembly lines. The goal is to help users detect global problems in those lines, not only single-station problems. Furthermore, the relevance of various processes to the quality (good or defective) of the final product is to be revealed.

Papers by Keyword: Data Mining