Enabling Data-Driven Solubility Modeling at GSK: Enhancing Purge Predictions for Mutagenic Impurities
Luigi Da Vià, Matthias Depoortere, Robert D. Willacy et al.

Organic Process Research & Development, Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 11, 2024

In the pharmaceutical industry, solubility is a critical parameter influencing various stages of drug development, from early discovery to commercial manufacturing. This work showcases a high-throughput screening workflow and describes the steps required to standardize and curate the data suitably to allow an automated data flow. Using high-quality data, we developed a quantitative structure–property relationship model using gradient boosting and molecular descriptors, requiring only a 2D structure to generate predictions. The accuracy is competitive with alternative approaches where additional physical data are not required. A key use case for predictions made in this way is developing control strategies for mutagenic impurities, allowing a data-driven and consistent method for calculating the solubility contribution to purge calculations. Further perspective is given on the future application of the prediction algorithm and of this approach to methodologies supporting development in general, highlighting the potential of federated learning as a technological means to overcome the barrier to cross-industry data sharing.

Language: English

Frontiers of molecular crystal structure prediction for pharmaceuticals and functional organic materials
Gregory J. O. Beran

Chemical Science, Journal Year: 2023, Volume and Issue: 14(46), P. 13290 - 13312

Published: Jan. 1, 2023

Molecular crystal structure prediction has matured to the point where it can routinely facilitate discovery and design of new organic materials.

Language: English

Citations

41

Pharmaceutical Digital Design: From Chemical Structure through Crystal Polymorph to Conceptual Crystallization Process
Christopher L. Burcham, Michael F. Doherty, Baron Peters et al.

Crystal Growth & Design, Journal Year: 2024, Volume and Issue: 24(13), P. 5417 - 5438

Published: June 24, 2024

A workflow for the digital design of crystallization processes starting from the chemical structure of the active pharmaceutical ingredient (API) is a multistep, multidisciplinary process. A simple version would be to first predict the API crystal structure and, from it, the corresponding properties of solubility, morphology, and growth rates, assuming that nucleation is controlled by seeding, and then use these parameters to design the crystallization process. This is usually an oversimplification, as most APIs are polymorphic, and the stable form alone may not have the properties required for development into a drug product. This perspective, drawing on experience from the Lilly Digital Design project, considers the fundamental theoretical basis of crystal structure prediction (CSP), free energy, and growth rate prediction, and the current state of simulation. It is illustrated by applying the modeling techniques to real examples, olanzapine and succinic acid. We demonstrate the promise of using ab initio computer modeling in the solid form selection process in development. We also identify open problems in the application of computational modeling, where achieving the accuracy needed for immediate implementation currently limits the applicability of the approach.

Language: English

Citations

10

Predicting polymer solubility from phase diagrams to compatibility: a perspective on challenges and opportunities
Jeffrey G. Ethier, Evan R. Antoniuk, Blair Brettmann et al.

Soft Matter, Journal Year: 2024, Volume and Issue: 20(29), P. 5652 - 5669

Published: Jan. 1, 2024

Advances in physical models and data science are improving predictions of polymer–solvent phase behavior; we discuss the different approaches taken today and the remaining barriers to making broadly useful predictions.

Language: English

Citations

10

A Machine Learning Approach for the Prediction of Aqueous Solubility of Pharmaceuticals: A Comparative Model and Dataset Analysis
Mohammad Amin Ghanavati, Soroush Ahmadi, Sohrab Rohani et al.

Digital Discovery, Journal Year: 2024, Volume and Issue: unknown

Published: Jan. 1, 2024

Three ML models and their ensemble predict the aqueous solubility of small organic molecules using different representations: a GCN with molecular graphs, EdgeConv with ESP maps, and XGBoost with tabular features from Mordred descriptors.

Language: English

Citations

6

Exploration of the Solubility Hyperspace of Selected Active Pharmaceutical Ingredients in Choline- and Betaine-Based Deep Eutectic Solvents: Machine Learning Modeling and Experimental Validation
Piotr Cysewski, Tomasz Jeliński, Maciej Przybyłek et al.

Molecules, Journal Year: 2024, Volume and Issue: 29(20), P. 4894 - 4894

Published: Oct. 16, 2024

Deep eutectic solvents (DESs) are popular green media used for various industrial, pharmaceutical, and biomedical applications. However, the possible compositions of these systems are so numerous that it is impossible to study all of them experimentally. To remedy this limitation, the solubility landscape of selected active pharmaceutical ingredients (APIs) in choline chloride- and betaine-based deep eutectic solvents was explored using theoretical models based on machine learning. The available solubility data for the APIs, comprising a total of 8014 points, were collected for neat solvents, binary solvent mixtures, and DESs. This set was augmented with new measurements of sulfa drugs in dry DESs. The descriptors for the machine learning protocol were obtained from σ-profiles of the considered molecules computed within the COSMO-RS framework. A combination of six sets of descriptors and 36 regressors was tested. Taking into account both accuracy and generalization, it was concluded that the best regressor is a nuSVR regressor-based predictive model trained on relative intermolecular interactions and a twelve-step averaged simplification of the σ-profiles.

Language: English

Citations

6

Prediction of acetylene solubility by a mechanism-data hybrid-driven machine learning model constructed based on COSMO-RS theory
Yao Mu, Tianying Dai, Jiahe Fan et al.

Journal of Molecular Liquids, Journal Year: 2024, Volume and Issue: 414, P. 126194 - 126194

Published: Oct. 5, 2024

Language: English

Citations

4

Towards the prediction of drug solubility in binary solvent mixtures at various temperatures using machine learning
Zeqing Bao, Gary Tom, Austin Cheng et al.

Journal of Cheminformatics, Journal Year: 2024, Volume and Issue: 16(1)

Published: Oct. 28, 2024

Drug solubility is an important parameter in the drug development process, yet it is often tedious and challenging to measure, especially for expensive drugs or those available in small quantities. To alleviate these challenges, machine learning (ML) has been applied to predict solubility as an alternative approach. However, the majority of existing ML research has focused on predictions of aqueous solubility and/or solubility at specific temperatures, which restricts model applicability in pharmaceutical development. To bridge this gap, we compiled a dataset of 27,000 datapoints, including molecules measured in a range of binary solvent mixtures under various temperatures. Next, a panel of ML models was trained, with their hyperparameters tuned using Bayesian optimization. The resulting top-performing models, both gradient boosted decision trees (light gradient boosting and extreme gradient boosting), achieved mean absolute errors (MAE) of 0.33 LogS (S in g/100 g) on the holdout set. These models were further validated through a prospective study, wherein the solubilities of four drugs were predicted by the models and then measured in in-house experiments. This study demonstrated that the models accurately predict the solubility of solutes under different conditions when their features closely align with those within the dataset (MAE < 0.5 LogS). To support future research and facilitate advancements in the field, we have made the dataset and code openly available.

Scientific contribution: Our work advances the state of the art in predicting solubility by leveraging a uniquely comprehensive dataset. Unlike existing studies, which predominantly focus on aqueous solvents at fixed temperatures, our work enables prediction in a variety of binary solvent mixtures over a broad temperature range, providing practical insights for modeling realistic applications. These advances, along with the open-access dataset and code, are significant steps in the process of new molecule discovery, analysis, and formulation.

Language: English

Citations

4

Developing a model-driven workflow for the digital design of small-scale batch cooling crystallisation with the antiviral lamivudine
Thomas Pickles, Chantal L. Mustoe, Christopher Boyle et al.

CrystEngComm, Journal Year: 2024, Volume and Issue: 26(6), P. 822 - 834

Published: Jan. 1, 2024

A model-driven workflow that uses digital tools and small-scale experiments to maximise the efficiency in achieving a desired set of crystallisation responses, kinetics, and objectives.

Language: English

Citations

3

Hybrid Semi-mechanistic and Machine Learning Solubility Regression Modeling for Crystallization Process Development
Gustavo Lunardon Quilló, Satyajeet Bhonsale, A. Collas et al.

Crystal Growth & Design, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 10, 2025

Solubility regression modeling is foundational for several chemical engineering applications, particularly crystallization process development. Traditionally, these models rely on parametric semimechanistic approaches such as the Van't Hoff Jouyban-Acree (VH-JA) cosolvency model. Although these models generally provide narrow prediction intervals, they can exhibit increased bias when dealing with significant solute heat capacities or complex mixture effects. This study explores machine learning models, including Random Forests, Support Vector Machines, Gaussian Process Regression, and Neural Networks, as potential alternatives. While most machine learning models offered a lower training error, it was observed that their predictive quality quickly deteriorates further from the training data. Hence, a hybrid approach was explored to leverage the low bias of machine learning and the low variance of the VH-JA model through heterogeneous locally weighted bagging ensembles. Key to the methodology is quantifying, tracking, and minimizing uncertainty using the ensemble. This is illustrated with the case of the solubility of ketoconazole in binary mixtures of 2-propanol and water. The optimal ensemble, comprising 58% stepwise VH-JA and 42% machine learning models, reduced the root-mean-squared error and the maximum absolute percentage error by ≈30% compared to the full VH-JA, while preserving a comparable prediction interval.

Language: English

Citations

0

Interactive Knowledge-Based Kernel PCA for Solvent Selection
Samuel Boobier, Joseph Heeley, Thomas Gärtner et al.

ACS Sustainable Chemistry & Engineering, Journal Year: 2025, Volume and Issue: 13(11), P. 4349 - 4368

Published: March 14, 2025

Selecting more sustainable solvents is a crucial component of mitigating the environmental impacts of chemical processes. Numerous tools have been developed to address this problem within the pharmaceutical industry, employing data-driven approaches such as multidimensional scaling or principal component analysis (PCA). Interactive knowledge-based kernel PCA is a variant that allows users to shape 2D solvent maps by defining the positions of data points, imparting expert knowledge that was not included in the original descriptor set. We applied interactive knowledge-based kernel PCA to the task of solvent selection and present an intuitive interface integrated into AI4Green, an electronic laboratory notebook that encourages sustainable chemistry. A set of evidence-based user guidelines was used in combination with the tool to identify four potential solvent substitutions for an example thioesterification reaction.

Language: English

Citations

0