Data Generation for Machine Learning Interatomic Potentials and Beyond
Maksim Kulichenko, Benjamin Nebgen, Nicholas Lubbers

et al.

Chemical Reviews, Journal Year: 2024, Volume and Issue: 124(24), P. 13681 - 13714

Published: Nov. 21, 2024

The field of data-driven chemistry is undergoing an evolution, driven by innovations in machine learning models for predicting molecular properties and behavior. Recent strides in ML-based interatomic potentials have paved the way for accurate modeling of diverse chemical and structural properties at the atomic level. The key determinant defining MLIP reliability remains the quality of the training data. A paramount challenge lies in constructing training sets that capture specific domains in the vast chemical and structural space. This Review navigates the intricate landscape of essential components and integrity of training data that ensure the extensibility and transferability of the resulting models. We delve into the details of active learning, discussing its various facets and implementations. We outline different types of uncertainty quantification applied to atomistic data acquisition and the correlations between estimated uncertainty and true error. The role of atomistic data samplers in generating diverse and informative structures is highlighted. Furthermore, we discuss data acquisition via modified and surrogate potential energy surfaces as an innovative approach to diversify training data. The Review also provides a list of publicly available data sets that cover …

Language: English

Machine Learning for Electrocatalyst and Photocatalyst Design and Discovery
Haoxin Mai, Tu C. Le, Dehong Chen

et al.

Chemical Reviews, Journal Year: 2022, Volume and Issue: 122(16), P. 13478 - 13515

Published: July 21, 2022

Electrocatalysts and photocatalysts are key to a sustainable future, generating clean fuels, reducing the impact of global warming, and providing solutions to environmental pollution. Improved processes for catalyst design and a better understanding of electro/photocatalytic processes are essential for improving catalyst effectiveness. Recent advances in data science and artificial intelligence have great potential to accelerate electrocatalysis and photocatalysis research, particularly the rapid exploration of large materials chemistry spaces through machine learning. Here a comprehensive introduction to, and critical review of, machine learning techniques used in electrocatalysis and photocatalysis research are provided. Sources of electro/photocatalyst data and current approaches to representing these materials by mathematical features are described, the most commonly used machine learning methods are summarized, and the quality and utility of electro/photocatalyst models are evaluated. Illustrations of how machine learning models are applied to novel electro/photocatalyst discovery and used to elucidate electrocatalytic or photocatalytic reaction mechanisms are provided. The review offers a guide for materials scientists on the selection of machine learning methods for electrocatalysis and photocatalysis research. The application of machine learning to catalysis science represents a paradigm shift in the way advanced, next-generation catalysts will be designed and synthesized.

Language: English

Citations

290

SELFIES and the future of molecular string representations
Mario Krenn, Qianxiang Ai, Senja Barthel

et al.

Patterns, Journal Year: 2022, Volume and Issue: 3(10), P. 100588 - 100588

Published: Oct. 1, 2022

Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, or the design of new molecules. The machine needs to read and write fluently in a chemical language for each of these tasks. Strings are a common tool to represent molecular graphs, and the most popular molecular string representation, SMILES, has powered cheminformatics since the late 1980s. However, in the context of AI and ML in chemistry, SMILES has several shortcomings: most pertinently, most combinations of symbols lead to invalid results with no valid chemical interpretation. To overcome this issue, a new language for molecules was introduced in 2020 that guarantees 100% robustness: SELF-referencing embedded strings (SELFIES). SELFIES has since simplified and enabled numerous new applications in chemistry. In this perspective, we look to the future and discuss molecular string representations, along with their respective opportunities and challenges. We propose 16 concrete future projects for robust molecular representations. These involve the extension toward new chemical domains, exciting questions at the interface of AI and robust languages, and interpretability for both humans and machines. We hope that these proposals will inspire follow-up works exploiting the full potential of molecular string representations.

Language: English

Citations

160

Representations of molecules and materials for interpolation of quantum-mechanical simulations via machine learning
Marcel F. Langer, Alex Goeßmann, Matthias Rupp

et al.

npj Computational Materials, Journal Year: 2022, Volume and Issue: 8(1)

Published: March 16, 2022

Computational study of molecules and materials from first principles is a cornerstone of physics, chemistry, and materials science, but is limited by the cost of accurate and precise simulations. In settings involving many simulations, machine learning can reduce these costs, often by orders of magnitude, by interpolating between reference simulations. This requires representations that describe any molecule or material and support interpolation. We comprehensively review and discuss current representations and the relations between them, using a unified mathematical framework based on many-body functions, group averaging, and tensor products. For selected state-of-the-art representations, we compare energy predictions for organic molecules, binary alloys, and Al-Ga-In sesquioxides in numerical experiments controlled for data distribution, regression method, and hyper-parameter optimization.

Language: English

Citations

118

The central role of density functional theory in the AI age
Bing Huang, Guido Falk von Rudorff, O. Anatole von Lilienfeld

et al.

Science, Journal Year: 2023, Volume and Issue: 381(6654), P. 170 - 175

Published: July 13, 2023

Density functional theory (DFT) plays a pivotal role for the chemical and materials sciences due to its relatively high predictive power, applicability, versatility, and computational efficiency. We review recent progress in machine learning model developments, which has relied heavily on density functional theory for synthetic data generation and for the design of model architectures. The general relevance of these developments is placed in some broader context for the chemical and materials sciences. Resulting in DFT-based machine learning models with high efficiency, accuracy, scalability, and transferability (EAST), recent progress indicates probable ways for the routine use of successful experimental planning software within self-driving laboratories.

Language: English

Citations

106

Toward Excellence of Electrocatalyst Design by Emerging Descriptor‐Oriented Machine Learning
Jianwen Liu, Wenzhi Luo, Lei Wang

et al.

Advanced Functional Materials, Journal Year: 2022, Volume and Issue: 32(17)

Published: Jan. 15, 2022

Machine learning (ML) is emerging as a powerful tool for identifying quantitative structure–activity relationships to accelerate electrocatalyst design by learning from historic data without explicit programming. The algorithms, data/database, and descriptors are usually the decisive factors for ML, and the descriptors play a pivotal role in electrocatalysis as they contain the essence of catalysis from its physicochemical nature. Despite considerable research efforts regarding electrocatalyst design with ML, the lack of universal descriptor selection tactics for bridging the gap between structures and activity impedes its wider application. A timely summary of the application of ML in electrocatalyst design helps to deepen the understanding of the nature of descriptors and to improve the application scope and design efficiency. This review summarizes the geometrical, electronic, and activity descriptors used as input for ML training and prediction, to reveal the general rules for their application in the design of electrocatalysts. In response to the challenges of the hydrogen evolution reaction, oxygen reduction reaction, CO2 reduction reaction, and nitrogen reduction reaction, ML applications in these areas are tracked, along with progress and prospective changes. Additionally, the potential of automated discovery is discussed for other well‐known electrocatalytic processes.

Language: English

Citations

80

Accelerated chemical science with AI
Seoin Back, Alán Aspuru-Guzik, Michele Ceriotti

et al.

Digital Discovery, Journal Year: 2023, Volume and Issue: 3(1), P. 23 - 33

Published: Dec. 6, 2023

The ASLLA Symposium focused on accelerating chemical science with AI. Discussions on data, new applications, algorithms, and education were summarized. Recommendations for researchers, educators, and academic bodies were provided.

Language: English

Citations

46

Biomolecular dynamics with machine-learned quantum-mechanical force fields trained on diverse chemical fragments
Oliver T. Unke, Martin Stöhr, Stefan Ganscha

et al.

Science Advances, Journal Year: 2024, Volume and Issue: 10(14)

Published: April 5, 2024

The GEMS method enables molecular dynamics simulations of large heterogeneous systems at ab initio quality.

Language: English

Citations

31

ChatGPT in the Material Design: Selected Case Studies to Assess the Potential of ChatGPT
Jyotirmoy Deb, Lakshi Saikia, Kripa Dristi Dihingia

et al.

Journal of Chemical Information and Modeling, Journal Year: 2024, Volume and Issue: 64(3), P. 799 - 811

Published: Jan. 18, 2024

The pursuit of designing smart and functional materials is of paramount importance across various domains, such as material science, engineering, chemical technology, electronics, biomedicine, energy, and numerous others. Consequently, researchers are actively involved in the development of innovative models and strategies for material design. Recent advancements in analytical tools, experimentation, and computer technology additionally enhance the material design possibilities. Notably, data-driven techniques like artificial intelligence and machine learning have achieved substantial progress in exploring various applications within material science. One such approach, ChatGPT, a large language model, holds transformative potential for addressing complex queries. In this article, we explore ChatGPT's understanding of material science by assigning some simple tasks across various subareas of computational material science. The findings indicate that while ChatGPT may make some minor errors in accomplishing general tasks, it demonstrates the capability to learn and adapt through human interactions. However, issues like output consistency, probable hidden errors, and ethical consequences should be addressed.

Language: English

Citations

20

Extrapolative prediction of small-data molecular property using quantum mechanics-assisted machine learning
Hajime Shimakawa, Akiko Kumada, Masahiro Sato

et al.

npj Computational Materials, Journal Year: 2024, Volume and Issue: 10(1)

Published: Jan. 10, 2024

Data-driven materials science has realized a new paradigm by integrating materials domain knowledge and machine-learning (ML) techniques. However, ML-based research has often overlooked the inherent limitation in predicting unknown data: extrapolative performance, especially when dealing with small-scale experimental datasets. Here, we present a comprehensive benchmark for assessing extrapolative performance across 12 organic molecular properties. Our large-scale benchmark reveals that conventional ML models exhibit remarkable performance degradation beyond the training distribution of property range and molecular structures, particularly for small-data properties. To address this challenge, we introduce a quantum-mechanical (QM) descriptor dataset, called QMex, and an interactive linear regression (ILR), which incorporates interaction terms between QM descriptors and categorical information pertaining to molecular structures. The QMex-based ILR achieved state-of-the-art extrapolative performance while preserving its interpretability. Our benchmark results, the QMex dataset, and the proposed model serve as valuable assets for improving extrapolative predictions with small experimental datasets and for the discovery of novel materials/molecules that surpass existing candidates.

Language: English

Citations

18

Inverse design of promising electrocatalysts for CO2 reduction via generative models and bird swarm algorithm
Zhilong Song, Linfeng Fan, Shuaihua Lu

et al.

Nature Communications, Journal Year: 2025, Volume and Issue: 16(1)

Published: Jan. 26, 2025

Directly generating material structures with optimal properties is a long-standing goal in material design. Traditional generative models often struggle to efficiently explore the global chemical space, limiting their utility to localized space. Here, we present a framework named Material Generation with Efficient Global Chemical Space Search (MAGECS) that addresses this challenge by integrating the bird swarm algorithm and supervised graph neural networks, enabling effective navigation of generative models in the immense chemical space towards materials with target properties. Applied to the design of alloy electrocatalysts for CO2 reduction (CO2RR), MAGECS generates over 250,000 structures, achieving a 2.5-fold increase in high-activity structures (35%) compared to random generation. Five predicted alloys (CuAl, AlPd, Sn2Pd5, Sn9Pd7, and CuAlSe2) are synthesized and characterized, with two showing around 90% Faraday efficiency for CO2RR. This work highlights the potential of MAGECS to revolutionize functional material development, paving the way for fully automated, artificial intelligence-driven material design. Designing materials with optimal properties is a longstanding challenge, as current methods struggle to explore the vast chemical space effectively. The authors combine a generative model with an optimization algorithm to design novel, highly active alloy electrocatalysts for CO2 electroreduction.

Language: English

Citations

2