Enhancing Hansen Solubility Predictions: A Combined Approach Using Integrated Datasets and Optimized Machine Learning Techniques

Abstract:

Accurate prediction of Hansen Solubility Parameters (HSPs) is important for understanding chemical compatibility in fields such as pharmaceuticals, cosmetics and chemical engineering. This study aims to improve HSP prediction by applying machine learning techniques to a large, extended dataset from the Hansen Solubility Parameters in Practice (HSPiP) software. Regression models including XGBoost, CatBoost and LightGBM, as well as ensemble methods, were optimized through hyperparameter tuning and feature selection, and evaluated using Root Mean Squared Error (RMSE), Mean Absolute Error (MAE) and R-squared (R²). The results indicate that using a wide variety of molecular components improves prediction accuracy and broadens the model’s applicability across different compounds. The findings also show that advanced machine learning methods can significantly improve HSP prediction accuracy, enabling more precise solubility estimates and supporting applications in chemical and materials science.
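
For reference, the three evaluation metrics named in the abstract can be sketched in plain Python as below. This is a minimal illustration, not the study's code: the function names and the sample values are assumptions made up for this example, and the numbers are not results from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean Absolute Error: mean of the absolute residuals."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


def r2(y_true, y_pred):
    """R-squared: 1 minus residual sum of squares over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference and predicted values (illustrative only,
# not taken from HSPiP or from the study).
y_true = [15.0, 16.0, 18.0, 19.0]
y_pred = [15.5, 15.5, 18.5, 18.5]

print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```

Lower RMSE and MAE and an R² closer to 1 indicate a better fit. RMSE penalizes large individual errors more heavily than MAE, which is one reason for reporting both.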

Pages: 53-60

Online since: October 2025

Copyright: © 2025 Trans Tech Publications Ltd. All Rights Reserved
