Materials property prediction with uncertainty quantification: A benchmark study DOI Open Access
Daniel Varivoda, Rongzhi Dong, Sadman Sadeed Omee

et al.

Applied Physics Reviews, Journal Year: 2023, Volume and Issue: 10(2)

Published: May 23, 2023

Uncertainty quantification (UQ) has increasing importance in the building of robust, high-performance, and generalizable materials property prediction models. It can also be used in active learning to train better models by focusing on gathering new training data from uncertain regions. There are several categories of UQ methods, each considering different types of uncertainty sources. Here, we conduct a comprehensive evaluation of UQ methods for graph neural network-based materials property prediction and evaluate how well they truly reflect the uncertainty that we want in error bound estimation or active learning. Our experimental results over four crystal materials datasets (including formation energy, adsorption energy, total energy, and bandgap properties) show that the popular ensemble methods for uncertainty estimation are NOT always the best choice for UQ in materials property prediction. For the convenience of the community, all the source code and datasets can be accessed freely at https://github.com/usccolumbia/materialsUQ.
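The ensemble approach mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `ensemble_predict` and the numbers are hypothetical, and it assumes the common convention of using the spread of member predictions as the uncertainty estimate.

```python
from statistics import mean, pstdev


def ensemble_predict(member_predictions):
    """Combine per-model predictions into a mean and a spread per sample.

    member_predictions: one inner list per ensemble member, each holding
    that member's prediction for every sample (same sample order).
    """
    per_sample = list(zip(*member_predictions))  # regroup values by sample
    means = [mean(p) for p in per_sample]
    stds = [pstdev(p) for p in per_sample]  # disagreement between members
    return means, stds


# Hypothetical predictions from three independently trained models.
preds = [
    [-1.0, 0.50, 2.0],
    [-1.0, 0.75, 1.0],
    [-1.0, 0.25, 3.0],
]
means, stds = ensemble_predict(preds)

# In active learning, the sample with the largest spread would be a
# natural candidate for gathering new training data.
most_uncertain = max(range(len(stds)), key=stds.__getitem__)
```

Whether such a spread actually tracks the prediction error is the kind of question the benchmark examines, so the sketch should be read as a baseline, not a recommendation.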

Language: English

Meta-MolNet: A Cross-Domain Benchmark for Few Examples Drug Discovery DOI
Qiujie Lv, Guanxing Chen, Ziduo Yang

et al.

IEEE Transactions on Neural Networks and Learning Systems, Journal Year: 2024, Volume and Issue: 36(3), P. 4849 - 4863

Published: Feb. 14, 2024

Predicting the pharmacological activity, toxicity, and pharmacokinetic properties of molecules is a central task in drug discovery. Existing machine learning methods are transferred from one resource-rich molecular property to another data-scarce molecular property in the same scaffold dataset. However, existing models may produce fragile and highly uncertain predictions for new scaffold molecules. Moreover, these models were tested on different benchmarks, which seriously affected the quality of their evaluation results. In this article, we introduce Meta-MolNet, a collection of data benchmarks and algorithms that serves as a standard benchmark platform for measuring model generalization and uncertainty quantification capabilities. Meta-MolNet manages a wide range of molecular datasets with a high ratio of molecules/scaffolds, which often leads to more difficult data shift and generalization problems. Furthermore, we propose a graph attention network based on cross-domain meta-learning, Meta-GAT, which uses bilevel optimization to learn meta-knowledge from the scaffold family molecular dataset in the source domain. Meta-GAT benefits from meta-knowledge that reduces the requirement of sample complexity to enable reliable predictions of new scaffold molecules in the target domain through internal iteration on a few examples. We evaluate existing methods as baselines for the community, and the Meta-MolNet benchmark demonstrates the effectiveness of the proposed algorithm in model generalization and uncertainty quantification. Extensive experiments demonstrate that the Meta-GAT model has state-of-the-art domain generalization performance and robustly estimates uncertainty under few-examples constraints. By publishing AI-ready data, evaluation frameworks, and baseline results, we hope to see the Meta-MolNet suite become a comprehensive resource for the AI-assisted drug discovery community. Meta-MolNet is freely accessible at https://github.com/lol88/Meta-MolNet.

Language: English

Citations

9

Benchmarking uncertainty quantification for protein engineering DOI Creative Commons
Kevin P. Greenman, Ava P. Amini, Kevin Yang

et al.

PLoS Computational Biology, Journal Year: 2025, Volume and Issue: 21(1), P. e1012639 - e1012639

Published: Jan. 7, 2025

Machine learning sequence-function models for proteins could enable significant advances in protein engineering, especially when paired with state-of-the-art methods to select new sequences for property optimization and/or model improvement. Such methods (Bayesian optimization and active learning) require calibrated estimations of model uncertainty. While studies have benchmarked a variety of deep learning uncertainty quantification (UQ) methods on standard and molecular machine-learning datasets, it is not clear if these results extend to protein datasets. In this work, we implemented a panel of deep learning UQ methods on regression tasks from the Fitness Landscape Inference for Proteins (FLIP) benchmark. We compared results across different degrees of distributional shift using metrics that assess each UQ method's accuracy, calibration, coverage, width, and rank correlation. Additionally, we compared these metrics using one-hot encoding and pretrained language model representations, and we tested the UQ methods in retrospective active learning and Bayesian optimization settings. Our results indicate that there is no single best UQ method across all datasets, splits, and metrics, and that uncertainty-based sampling is often unable to outperform greedy sampling in Bayesian optimization. These benchmarks enable us to provide recommendations for more effective design of biological sequences using machine learning.
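Three of the metric families named in this abstract (coverage, width, and rank correlation) can be sketched as follows. This is an illustrative example under stated assumptions, not the paper's code: the helper names are hypothetical, intervals are assumed to be Gaussian (prediction plus or minus `z` standard deviations), and the rank correlation is taken between predicted uncertainty and absolute error on tie-free data.

```python
def interval_metrics(y_true, y_pred, y_std, z=1.96):
    """Coverage and mean width of the intervals y_pred +/- z * y_std."""
    inside = [abs(t - p) <= z * s for t, p, s in zip(y_true, y_pred, y_std)]
    coverage = sum(inside) / len(inside)  # fraction of targets inside
    width = sum(2 * z * s for s in y_std) / len(y_std)
    return coverage, width


def ranks(values):
    """Rank values 1..n (assumes no ties, to keep the sketch short)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(a, b):
    """Spearman rank correlation for tie-free data."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical targets, predictions, and predicted standard deviations.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 2.5, 2.0, 4.0]
y_std = [0.1, 0.2, 0.3, 0.05]

coverage, width = interval_metrics(y_true, y_pred, y_std)
abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
rho = spearman(y_std, abs_err)  # do larger uncertainties mean larger errors?
```

In this toy data the uncertainties rank the errors perfectly while only half of the targets fall inside the nominal 95% intervals, which illustrates why a method can score well on one metric and poorly on another.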

Language: English

Citations

1

MaterialsAtlas.org: a materials informatics web app platform for materials discovery and survey of state-of-the-art DOI Creative Commons
Jianjun Hu, Stanislav Stefanov, Yuqi Song

et al.

npj Computational Materials, Journal Year: 2022, Volume and Issue: 8(1)

Published: April 11, 2022

The availability and easy access of large-scale experimental and computational materials data have enabled the accelerated development of algorithms and models for materials property prediction, structure prediction, and generative design of materials. However, the lack of user-friendly materials informatics web servers has severely constrained the wide adoption of such tools in the daily practice of materials screening, tinkering, and design space exploration by materials scientists. Herein we first survey current materials informatics web apps and then propose and develop MaterialsAtlas.org, a web-based materials informatics toolbox for materials discovery, which includes a variety of routinely needed tools for exploratory materials discovery, including a material's composition and structure validity check (e.g. charge neutrality, electronegativity balance, dynamic stability, Pauling rules), materials property prediction (e.g. band gap, elastic moduli, hardness, and thermal conductivity), search for hypothetical materials, and utility tools. These user-friendly tools can be freely accessed at http://www.materialsatlas.org. We argue that such materials informatics apps should be widely developed by the community to speed up materials discovery processes.

Language: English

Citations

36

Protein engineering via Bayesian optimization-guided evolutionary algorithm and robotic experiments DOI Open Access
Ruyun Hu, Lihao Fu, Yongcan Chen

et al.

Briefings in Bioinformatics, Journal Year: 2022, Volume and Issue: 24(1)

Published: Dec. 23, 2022

Directed protein evolution applies repeated rounds of genetic mutagenesis and phenotypic screening and is often limited by experimental throughput. Through in silico prioritization of mutant sequences, machine learning has been applied to reduce the wet lab burden to a level practical for human researchers. On the other hand, robotics permits large batches and rapid iterations for protein engineering cycles, but such capacities have not been well exploited in existing machine learning-assisted directed evolution approaches. Here, we report a scalable and batched method, the Bayesian Optimization-guided EVOlutionary (BO-EVO) algorithm, to guide multiple rounds of robotic experiments to explore protein fitness landscapes of combinatorial mutagenesis libraries. We first examined various design specifications based on an empirical landscape of protein G domain B1. Then, BO-EVO was successfully generalized to another empirical landscape of an Escherichia coli kinase PhoQ, as well as simulated NK landscapes with up to moderate epistasis. This approach was then applied to guide robotic library creation and screening to engineer the enzyme specificity of RhlA, a key biosynthetic enzyme for rhamnolipid biosurfactants. A 4.8-fold improvement in producing a target rhamnolipid congener was achieved after examining less than 1% of all possible mutants over four iterations. Overall, BO-EVO proves to be an efficient and general approach to guide combinatorial protein engineering without prior knowledge.

Language: English

Citations

30
