Feature Selection | Scientific.Net

Effect of Feature Selection on Data-Driven Prediction of Catalyst Performance: A Case Study on Methanol Formation from Thermocatalytic CO₂ Hydrogenation on Cu-Based Catalysts

Authors: Novianto Nur Hidayat, Usman Sudibyo, Achmad Wahid Kurniawan, Fariz Hasim Arvianto, Muhammad Naufal, Wahyu Aji Eko Prabowo, Harun Al Azies, Muhamad Akrom

Abstract: CO₂ conversion to methanol via thermocatalytic hydrogenation is one of the viable alternatives for addressing the climate change problem while producing a valuable industrial product. However, this comes with a challenge, i.e., predicting the performance of catalytic systems. In this work, we present a data-driven study to predict the performance of Cu-based catalysts based on a compiled dataset consisting of 15 features obtained from experimental data. Furthermore, we implement feature selection techniques such as univariate selection, RFE, and XGBoost to investigate how the performance of the prediction model changes with the number of features. The results show that the features selected by the RFE method yield the best performance with 7 features, even outperforming the baseline model in terms of accuracy and feasibility. This suggests that feature selection is relevant to constructing a machine learning model for predicting methanol production via CO₂ thermocatalytic hydrogenation.

1
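
As a rough illustration of the univariate selection mentioned in the abstract above, the sketch below ranks features by the absolute Pearson correlation with the target and keeps the top k. The scoring function, the toy data, the feature names, and the helpers `pearson` and `select_top_k` are all assumptions made here for illustration; the abstract does not state which univariate criterion the authors used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)

def select_top_k(columns, target, k):
    """Return the names of the k features with the largest |correlation|."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]

# Hypothetical toy data: three candidate features and one target.
columns = {
    "temperature": [1.0, 2.0, 3.0, 4.0, 5.0],
    "pressure":    [2.0, 1.0, 4.0, 3.0, 5.0],
    "noise":       [3.0, 3.0, 3.0, 3.0, 3.0],
}
target = [1.1, 1.9, 3.2, 3.9, 5.1]

selected = select_top_k(columns, target, k=2)
```

Each feature is scored independently of the others, which is what distinguishes univariate selection from wrapper methods such as RFE that refit a model on the remaining features at every step.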

Application of Multilayer Perceptron for Estimating Relative Humidity in Cilacap, Indonesia

Authors: Wahyu Abdillah, Silmi Fauziati

Abstract: In recent decades, relative humidity has become a research topic that has received increasing attention due to its important role in climate change and global warming. One of the most typical issues with relative humidity measurements is data loss due to instrument deterioration. This research applies feature selection and hyperparameter tuning, combined in an MLP-CV framework, to improve the reliability of a multilayer perceptron (MLP) model for predicting relative humidity values. The coefficient of determination (R²), root mean squared error (RMSE), and mean absolute error (MAE) are used to assess the model's accuracy. The results showed that the MLP-CV model was more accurate than the MLP model in predicting missing relative humidity values, with R² = 0.788, RMSE = 1.838, and MAE = 1.431.

192
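
The three error measures named in the abstract above follow standard definitions, which can be sketched as below. The function name `regression_metrics` and the sample humidity values are hypothetical and are not taken from the paper.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute R², RMSE, and MAE from observed and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }

# Hypothetical relative humidity readings (%) and model predictions.
y_true = [80.0, 82.0, 85.0, 90.0]
y_pred = [81.0, 81.0, 86.0, 88.0]
metrics = regression_metrics(y_true, y_pred)
```

RMSE penalizes large errors more heavily than MAE, so reporting both gives a sense of whether a few large misses dominate the error.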

Additive Manufacturing Section Image Features for Magnetic Processing Characteristics Prediction

Authors: Lien Kai Chang, Tsung Wei Chang, Jhih Cheng Huang, De Qian Liu, Mi Ching Tsai, Ming Huwi Horng

Abstract: Metal additive manufacturing encompasses multiple techniques, among which Selective Laser Melting (SLM) is extensively employed for fabricating highly complex, precise, and uniquely shaped metal parts. However, obtaining accurate product characteristics often requires complex experimentation, which can potentially damage the products. Thus, there is a need to develop an automated method for predicting product characteristics. To forecast these attributes, details related to metal additive manufacturing products were documented, including process parameters and textural features. These features were extracted from the products' longitudinal section images and layer-by-layer images using the gray-level co-occurrence matrix (GLCM). Subsequently, machine learning (ML) models such as Support Vector Regression (SVR), XGBoost, and LightGBM were employed to predict product properties, and their performance was compared. The experimental results indicated stronger correlations between process parameters and textural features in longitudinal section images than in layer-by-layer ones. Moreover, the models demonstrated high predictive accuracy, particularly XGBoost and LightGBM, with R² scores approaching 0.9 for all properties. These findings highlight the superiority and feasibility of the proposed approach. Furthermore, this method shows potential for accurately predicting a variety of product properties, fulfilling the needs of multiple application scenarios.

99
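
To make the GLCM texture extraction in the abstract above more concrete, here is a minimal sketch that counts gray-level co-occurrences for one pixel offset and derives a contrast value from them. The 4×4 image, the horizontal offset, the non-symmetric counting, and the choice of contrast as the texture statistic are assumptions for illustration; the abstract does not say which offsets or statistics the authors used.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count how often gray level i has gray level j at offset (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix

def contrast(counts):
    """Contrast: sum of p(i, j) * (i - j)^2 over the normalized matrix."""
    total = sum(sum(row) for row in counts)
    size = len(counts)
    return sum(
        counts[i][j] / total * (i - j) ** 2
        for i in range(size)
        for j in range(size)
    )

# Hypothetical 4x4 image with four gray levels (0 to 3).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

counts = glcm(image, levels=4)   # horizontal neighbor, one pixel to the right
texture_contrast = contrast(counts)
```

In practice several offsets and statistics are usually computed per image, and the resulting values form the feature vector passed to a regressor or classifier.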

Prediction of High-Entropy Alloy Phases Using Soft Computing Techniques

Authors: Akeem Damilola Akinwekomi

Abstract: High-entropy alloys (HEAs) have excellent properties that are being explored for potential applications in many engineering fields. These properties depend strongly on their phases. The vastness of alloy compositions that can be synthesized makes it extremely challenging to experimentally investigate all the possible HEA types. To mitigate these challenges, more efficient and systematic computational techniques can be applied to the existing experimental data to accelerate HEA design and discovery. Therefore, this study developed three soft computing classification models based on an artificial neural network (ANN), k-nearest neighbor (kNN), and support vector machine (SVM) to classify solid solution, amorphous, and intermetallic phases in HEAs. Empirical studies showed that hyperparameter optimization improved the classification accuracies of the classifiers, with kNN (92%) outperforming ANN (86%) and SVM (90%) using all five predictive features. Feature selection did not improve the classification accuracy of any of the models. This study demonstrated the importance of applying soft computing techniques and hyperparameter optimization for enhancing the classification accuracies of models to predict the phases in HEAs.

3

Data Analysis and Visualization of Mechanical Properties of Aluminium Coils, Focusing on Chemical Composition, Annealing Temperatures and Holding Time

Authors: Patrick Pfeiffer, Josef Berneder, Alexander Haidenthaler, Peter Schulz

Abstract: As a producer of aluminium coils for a broad variety of applications, AMAG faces challenges in controlling and monitoring a long, multi-step production process with an immense number of parameters. Identifying impactful parameters or outliers becomes increasingly difficult when considering multiple production steps. Manually monitoring many coils over a large set of parameters is difficult, time-consuming, and error-prone, and thus an unreasonable endeavour. To support employees in technology- and process-oriented domains, AMAG data scientists develop analytical tools for data exploration and data analysis. Based on material data containing mechanical properties in deformation tests, chemical composition, hot rolling temperature, intermediate annealing, and pre-heating duration, we propose a framework of data collection mechanisms and subsequent statistical methods to analyse and visualise data. The produced visuals can be interactively explored by material experts to gain a better understanding of the complex interactions between production parameters and their effect on mechanical properties. By incorporating many coils at once, the framework offers a means to point out problems in process stability. Collaboration and a feedback loop between material scientists and data scientists are key to further developing advanced analytical methods.

123

Statistical Methods for Predicting of the Quality of Aluminum Ingots

Authors: Marco Johannes Tschimpke, Alexander Gerber, Steffen Neubert, Manuela Schreyer, Wolfgang Trutschnig

Abstract: In recent years, methods from Data Science and Artificial Intelligence have become more and more important in various fields of the economy and everyday life. Those methods are, for instance, used in the context of driving assistance systems or queries in search engines. Our current work aims at developing and/or improving methods from statistics and machine learning to select relevant features concerning the product quality of aluminum ingots. During the production of aluminum, numerous process signals, such as temperature curves, are recorded. To quantify the dependency of the ingot quality on different signals, existing statistical methods need to be adjusted and extended to the time series setting. The first problem tackled is the definition of a criterion that numerically describes the quality of the ingots and therefore allows ingots to be compared with respect to their quality (independent of the final format of the product). A second, nontrivial challenge is to detect those process signals relevant for the ingot quality and to account for possible interrelations. Our contribution sketches how time series information can be aggregated/discretized and describes various candidate approaches for feature selection.

109

The Effect of Feature Selection on Gray Level Co-Occurrence Matrix (GLCM) for the Four Breast Cancer Classifications

Authors: Marrisaeka Mawarni, Fitri Utaminingrum, Wayan Firdaus Mahmudy

Abstract: Breast cancer is the most common cancer affecting women worldwide. Early detection of breast cancer can increase patients' chances of survival. Radiologists play a necessary role in detecting breast cancer, but they are often limited in the number of consultations they can conduct given the large number of patients. Detection also gives subjective results because the process is based on the radiologists' decision-making. In this work, we propose a system to detect and classify breast cancer accurately in order to prevent delays in patient handling and subjective results. We propose a digital image processing method using mammograms to classify breast cancer into four categories based on tissue density, namely BI-RADS I, II, III, and IV. The main stages carried out in this research are image processing, feature extraction, data normalization, feature selection, classification, and parameter optimization. This method uses GLCM to extract texture features and two feature selection methods, namely RFE-RF and Chi-Square. The method was tested with various classifiers such as SVM, KNN, Random Forests, and Decision Trees. The hyperparameters of the classifiers were optimized using GridSearch. The final result is measured using accuracy. In this work, Random Forest with RFE-RF gives the highest accuracy of 99.7%. Feature selection has a significant impact on improving accuracy. The results of this work show that our system can classify breast cancer with high accuracy, so it can assist radiologists in screening mammograms and help them make decisions when diagnosing patients with breast cancer based on density.

168

Face Recognition Algorithm Based on Haar-Like Features and Gentle Adaboost Feature Selection via Sparse Representation

Authors: Qing Wei Wang, Zi Lu Ying, Lian Wen Huang

Abstract: This paper proposes a new face recognition algorithm based on Haar-Like features and Gentle Adaboost feature selection via sparse representation. First, all the images, including face images and non-face images, are normalized in size, and then Haar-Like features are extracted. The number of Haar-Like features can be as large as 12,519. In order to reduce the feature dimension and retain the most effective features for face recognition, the Gentle Adaboost algorithm is used for feature selection. The selected features are used for face recognition via the sparse representation classification (SRC) algorithm. Experiments were carried out on the AR database to test the performance of the proposed algorithm. Compared with traditional algorithms such as NS, NN, SRC, and SVM, the new algorithm achieved a better recognition rate. The change in recognition rate with feature dimension showed that the proposed algorithm consistently achieved a higher recognition rate than the SRC algorithm as the feature dimension increased, which demonstrates the effectiveness and superiority of the proposed algorithm.

299

Multi-Sensor Intelligent Monitoring of High-Speed Grinding for Brittle and Hard Materials

Authors: Yin Chen Ma, Jian Guo Yang

Abstract: This paper deals with an intelligent multi-sensor monitoring system, which focuses on the characteristics of transient occurrences in high-speed grinding and its application to the machining of brittle and hard materials. Different sensors are used to collect workpiece vibration, acoustic emission, force, and displacement signals, which are used to define the stability of the grinding process and to monitor faults in high-speed machining. Although many methods for monitoring the grinding process have been reported in recent literature, none offers a systematic approach that fully reflects the characteristics of high-speed grinding. Moreover, no single sensor or feature has been shown to detect all grinding faults successfully and precisely. This paper combines different feature selection approaches, including time-frequency domain or wavelet methods, with sensor fusion based on a clustering method to test the stability condition in high-speed grinding. The validity of the proposed method and its excellent detection accuracy are demonstrated through tests with SiC machining in high-speed grinding.

309

Feature Selection Algorithm for Hyperlipidemia Classification

Authors: Qi Rui Zhang, He Xian Wang, Jiang Wei Qin

Abstract: This paper reports a comparative study of feature selection algorithms on a hyperlipidemia data set. Three methods of feature selection were evaluated: document frequency (DF), information gain (IG), and the χ² statistic (CHI). The classification systems use a vector to represent a document and use tfidfie (term frequency, inverted document frequency, and inverted entropy) to compute term weights. To compare the effectiveness of feature selection, we used three classification methods: Naïve Bayes (NB), k-Nearest Neighbor (kNN), and Support Vector Machines (SVM). The experimental results show that IG and CHI significantly outperform DF, and that SVM and NB are more effective than kNN when the macro-averaged F₁ measure is used. DF is suitable for large-scale text classification tasks.

110
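
Two of the term-scoring criteria compared in the abstract above, document frequency (DF) and the χ² statistic (CHI), can be sketched on a toy corpus as follows. The sketch uses the common 2×2 contingency form of χ² for a term and a class; the corpus, the labels, and the helper names are hypothetical, and the abstract does not confirm that the authors used this exact formulation.

```python
def term_class_counts(docs, labels, term, positive_label):
    """Return (a, b, c, d): with/without the term, inside/outside the class."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == positive_label
        if has_term and in_class:
            a += 1      # term present, document in class
        elif has_term:
            b += 1      # term present, document in another class
        elif in_class:
            c += 1      # term absent, document in class
        else:
            d += 1      # term absent, document in another class
    return a, b, c, d

def chi_square(a, b, c, d):
    """Chi-square statistic of a 2x2 term/class contingency table."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom

# Hypothetical toy corpus: each document is a set of tokens.
docs = [
    {"cholesterol", "high"},
    {"cholesterol", "lipid"},
    {"fever", "cough"},
    {"cough", "high"},
]
labels = ["hyperlipidemia", "hyperlipidemia", "other", "other"]

doc_freq = {t: sum(t in doc for doc in docs) for t in ("cholesterol", "high")}
chi = {
    t: chi_square(*term_class_counts(docs, labels, t, "hyperlipidemia"))
    for t in ("cholesterol", "high")
}
```

In this toy corpus both terms occur in the same number of documents, so DF cannot separate them, while χ² scores only the term that is actually associated with the class.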
