Leveraging natural language processing to curate the tmCAT, tmPHOTO, tmBIO, and tmSCO datasets of functional transition metal complexes
Ilia Kevlishvili, Roland St. Michel, Aaron Garrison, et al.

Faraday Discussions, Journal Year: 2024, Volume and Issue: unknown

Published: June 26, 2024

The breadth of transition metal chemical space covered by databases such as the Cambridge Structural Database and the derived computational database tmQM is not conducive to application-specific modeling and the development of structure-property relationships. Here, we employ both supervised and unsupervised natural language processing (NLP) techniques to link experimentally synthesized compounds to their respective applications. Leveraging these NLP models, we curate four distinct datasets: tmCAT for catalysis, tmPHOTO for photophysical activity, tmBIO for biological relevance, and tmSCO for magnetism. Analyzing the substructures within each dataset reveals common motifs in the designated applications. We then use these common structures to augment our initial datasets for each application, yielding a total of 21 631 compounds in tmCAT, 4599 in tmPHOTO, 2782 in tmBIO, and 983 in tmSCO. These datasets are expected to accelerate more targeted screening and the development of refined structure-property relationships with machine learning.

Language: English

Simulating Metal-Imidazole Complexes
Zhen Li, Subhamoy Bhowmik, Luca Sagresti, et al.

Journal of Chemical Theory and Computation, Journal Year: 2024, Volume and Issue: 20(15), P. 6706 - 6716

Published: July 31, 2024

One commonly observed binding motif in metalloproteins involves the interaction between a metal ion and histidine's imidazole side chains. Although previous imidazole-M(II) parameters established the flexibility and reliability of the 12–6–4 Lennard-Jones (LJ)-type nonbonded model by simply tuning the ligating atom's polarizability, they have not been applied to multiple-imidazole complexes. To fill this gap, we systematically simulate multiple-imidazole complexes (ranging from one to six imidazoles) for five metal ions (Co(II), Cu(II), Mn(II), Ni(II), and Zn(II)) which commonly appear in metalloproteins. Using extensive (40 ns per PMF window) sampling to assemble free energy association profiles (using OPC water and standard HID charge models from AMBER) and comparing the equilibrium distances to DFT calculations, a new set of parameters was developed to focus on the energetic and geometric features of multiple-imidazole complexes. The obtained free energy profiles agree with the experimental binding free energy and the DFT-calculated distances. To validate our model, we show that we can close the thermodynamic cycle for metal-imidazole complexes with up to six imidazole molecules in the first solvation shell. Given the success in closing the thermodynamic cycles, we then used the same extended sampling method for six other metal ions (Ag(I), Ca(II), Cd(II), Cu(I), Fe(II), and Mg(II)) to obtain new parameters. Since these new parameters reproduce the one-imidazole geometry accurately, we hypothesize that they will reasonably predict higher-level coordination numbers. Hence, we did not extend the analysis of these ions to multiple-imidazole complexes. Overall, the results shed light on metal–protein interactions by emphasizing the importance of ligand–ligand interaction and metal-π-stacking within metalloproteins.

Language: English

Citations: 8

Integrating Machine Learning and Quantum Circuits for Proton Affinity Predictions
H.-Q. Jin, Kenneth M. Merz

Journal of Chemical Theory and Computation, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 17, 2025

A key step in interpreting gas-phase ion mobility coupled with mass spectrometry (IM-MS) data for unknown structure prediction involves identifying the most favorable protonated structure. In the gas phase, the site of protonation is determined using proton affinity (PA) measurements. Currently, mass spectrometry and ab initio computation methods are widely used to evaluate PA; however, both methods are resource-intensive and time-consuming. Therefore, there is a critical need for efficient methods to estimate PA, enabling the rapid identification of the most favorable protonation site in complex organic molecules with multiple binding sites. In this work, we developed a fast and accurate method for PA prediction by using descriptors in combination with machine learning (ML) models. Using a comprehensive set of 186 descriptors, our model demonstrated strong predictive performance, with an R² of 0.96 and an MAE of 2.47 kcal/mol, comparable to experimental uncertainty. Furthermore, we designed quantum circuits as feature encoders for a classical neural network. To evaluate the effectiveness of this hybrid quantum-classical model, we compared its performance with traditional ML models using a reduced feature set derived from the full set. A correlation analysis showed that the quantum-encoded representations have a stronger positive correlation with the target values than the original features do. As a result, the hybrid model outperformed its classical counterpart and achieved the same consistent performance on a noiseless simulator and on real quantum hardware, highlighting its potential for PA predictions.

Language: English

Citations: 0

Partial to Total Generation of 3D Transition-Metal Complexes
H.-Q. Jin, Kenneth M. Merz

Journal of Chemical Theory and Computation, Journal Year: 2024, Volume and Issue: unknown

Published: Sept. 9, 2024

The design of transition-metal complexes (TMCs) has drawn much attention over the years because of their important applications as metallodrugs and functional materials. In this work, we present an extension of our recently reported approach, LigandDiff [Jin et al.

Language: English

Citations: 3

Leveraging natural language processing to curate the tmCAT, tmPHOTO, tmBIO, and tmSCO datasets of functional transition metal complexes
Ilia Kevlishvili, Roland St. Michel, Aaron Garrison, et al.

Published: May 6, 2024

The breadth of transition metal chemical space covered by databases such as the Cambridge Structural Database and the derived computational database tmQM is not conducive to application-specific modeling and the development of structure–property relationships. Here, we employ both supervised and unsupervised natural language processing (NLP) techniques to link experimentally synthesized compounds to their respective applications. Leveraging these NLP models, we curate four distinct datasets: tmCAT for catalysis, tmPHOTO for photophysical activity, tmBIO for biological relevance, and tmSCO for magnetism. Analyzing the substructures within each dataset reveals common motifs in the designated applications. We then use these common structures to augment our initial datasets for each application, yielding a total of 21,631 compounds in tmCAT, 4,599 in tmPHOTO, 2,782 in tmBIO, and 983 in tmSCO. These datasets are expected to accelerate more targeted screening and the development of refined structure–property relationships with machine learning.

Language: English

Citations: 1

Partial to Total Generation of 3D Transition Metal Complexes
H.-Q. Jin, Kenneth M. Merz

Published: May 31, 2024

The design of transition metal complexes has drawn much attention over the years because of their important applications as metallodrugs and functional materials. In this work, we present an extension of our recently reported approach, LigandDiff. The new model, which we call multi-LigandDiff, is more flexible and greatly outperforms its predecessor. This scaffold-based diffusion model allows de novo ligand design either with existing ligands or without any ligand. Moreover, it allows users to predefine the denticity of the generated ligand. Our results indicate that multi-LigandDiff can generate well-defined ligands with great transferability with regard to metals and coordination geometries. In terms of application, multi-LigandDiff successfully designs 338 Fe(II) SCO complexes from only 47 experimentally validated complexes. These generated complexes are configurationally diverse and reasonable. Overall, the results show that multi-LigandDiff is an ideal tool to design novel transition metal complexes from scratch.

Language: English

Citations: 1

Stable and accurate atomistic simulations of flexible molecules using conformationally generalisable machine learned potentials
Christopher D. Williams, Jas Kalayan, Neil A. Burton, et al.

Chemical Science, Journal Year: 2024, Volume and Issue: 15(32), P. 12780 - 12795

Published: Jan. 1, 2024

We present a strategy for generating global machine learned potentials capable of accurate, fast and stable atomistic simulations of flexible molecules. Key to stability is training datasets that contain all conformers of the target molecule.

Language: English

Citations: 1

Toward AI/ML-assisted discovery of transition metal complexes
H.-Q. Jin, Kenneth M. Merz

Annual Reports in Computational Chemistry, Journal Year: 2024, Volume and Issue: unknown

Published: Jan. 1, 2024

Language: English

Citations: 1
