Large language models in bioinformatics: applications and perspectives DOI Creative Commons
Jiajia Liu, Mengyuan Yang, Yankai Yu et al.

arXiv (Cornell University), Year: 2024, Number: unknown

Published: Jan. 1, 2024

Large language models (LLMs) are a class of artificial intelligence models based on deep learning, which have great performance in various tasks, especially in natural language processing (NLP). Large language models typically consist of artificial neural networks with numerous parameters, trained on large amounts of unlabeled input using self-supervised or semi-supervised learning. However, their potential for solving bioinformatics problems may even exceed their proficiency in modeling human language. In this review, we will present a summary of the prominent large language models used in natural language processing, such as BERT and GPT, and focus on exploring the applications of large language models at different omics levels in bioinformatics, mainly including genomics, transcriptomics, proteomics, drug discovery and single cell analysis. Finally, this review summarizes the potential and prospects of large language models in solving bioinformatic problems.

Language: English

Machine Learning Methods for Small Data Challenges in Molecular Science DOI

Bozheng Dou, Zailiang Zhu, Ekaterina Merkurjev et al.

Chemical Reviews, Year: 2023, Number: 123(13), pp. 8736-8780

Published: June 29, 2023

Small data are often used in scientific and engineering research due to the presence of various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, while big data have been the focus for the past decade, small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. Overall, the small data challenge is often compounded by issues such as data diversity, imputation, noise, imbalance, and high-dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI), which enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR),

Language: English

Cited by: 181

Generative Models as an Emerging Paradigm in the Chemical Sciences DOI Creative Commons
Dylan M. Anstine, Olexandr Isayev

Journal of the American Chemical Society, Year: 2023, Number: 145(16), pp. 8736-8750

Published: April 13, 2023

Traditional computational approaches to design chemical species are limited by the need to compute properties for a vast number of candidates, e.g., by discriminative modeling. Therefore, inverse design methods aim to start from the desired property and optimize a corresponding chemical structure. From a machine learning viewpoint, the inverse design problem can be addressed through so-called generative modeling. Mathematically, discriminative models are defined by learning the probability distribution function of properties given the molecular or material structure. In contrast, a generative model seeks to exploit the joint probability of a chemical species with target characteristics. The overarching idea of generative modeling is to implement a system that produces novel compounds that are expected to have a desired set of chemical features, effectively sidestepping issues found in the forward design process. In this contribution, we overview and critically analyze popular generative algorithms like generative adversarial networks, variational autoencoders, flow, and diffusion models. We highlight key differences between each of the models, provide insights into recent success stories, and discuss outstanding challenges for realizing generative modeling discovered solutions in chemical applications.

Language: English

Cited by: 169

A Survey on Generative Diffusion Models DOI
Hanqun Cao, Cheng Tan, Zhangyang Gao et al.

IEEE Transactions on Knowledge and Data Engineering, Year: 2024, Number: 36(7), pp. 2814-2830

Published: Feb. 2, 2024

Deep generative models have unlocked another profound realm of human creativity. By capturing and generalizing patterns within data, we have entered the epoch of all-encompassing Artificial Intelligence for General Creativity (AIGC). Notably, diffusion models, recognized as one of the paramount generative models, materialize human ideation into tangible instances across diverse domains, encompassing imagery, text, speech, biology, and healthcare. To provide advanced and comprehensive insights into diffusion, this survey comprehensively elucidates its developmental trajectory and future directions from three distinct angles: the fundamental formulation of diffusion, algorithmic enhancements, and the manifold applications of diffusion. Each layer is meticulously explored to offer a profound comprehension of its evolution. Structured and summarized approaches are presented here.

Language: English

Cited by: 111

UVCGAN: UNet Vision Transformer cycle-consistent GAN for unpaired image-to-image translation DOI
D. Torbunov, Yi Huang, H. Yu et al.

2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Year: 2023, Number: unknown, pp. 702-712

Published: Jan. 1, 2023

Unpaired image-to-image translation has broad applications in art, design, and scientific simulations. One early breakthrough was CycleGAN, which emphasizes one-to-one mappings between two unpaired image domains via generative-adversarial networks (GAN) coupled with the cycle-consistency constraint, while more recent works promote one-to-many mapping to boost the diversity of the translated images. Motivated by scientific simulation needs, this work revisits the classic CycleGAN framework and boosts its performance to outperform more contemporary models without relaxing the cycle-consistency constraint. To achieve this, we equip the generator with a Vision Transformer (ViT) and employ necessary training and regularization techniques. Compared to previous best-performing models, our model performs better and retains a strong correlation between the original and translated image. An accompanying ablation study shows that both the gradient penalty and self-supervised pre-training are crucial to the improvement. To promote reproducibility and open science, the source code, hyperparameter configurations, and pre-trained model are available at https://github.com/LS4GAN/uvcgan.

Language: English

Cited by: 96

Exploring catalytic reaction networks with machine learning DOI
Johannes T. Margraf, Hyunwook Jung, Christoph Scheurer et al.

Nature Catalysis, Year: 2023, Number: 6(2), pp. 112-121

Published: Jan. 26, 2023

Language: English

Cited by: 88

Computer-aided multi-objective optimization in small molecule discovery DOI Creative Commons
Jenna C. Fromer, Connor W. Coley

Patterns, Year: 2023, Number: 4(2), p. 100678

Published: Feb. 1, 2023

Molecular discovery is a multi-objective optimization problem that requires identifying a molecule or set of molecules that balance multiple, often competing, properties. Multi-objective molecular design is commonly addressed by combining properties of interest into a single objective function using scalarization, which imposes assumptions about relative importance and uncovers little about the trade-offs between objectives. In contrast to scalarization, Pareto optimization does not require knowledge of relative importance and reveals the trade-offs between objectives. However, it introduces additional considerations in algorithm design. In this review, we describe pool-based and de novo generative approaches to multi-objective molecular discovery with a focus on Pareto optimization algorithms. We show how pool-based molecular discovery is a relatively direct extension of multi-objective Bayesian optimization and how the plethora of different generative models extend from single-objective to multi-objective optimization in similar ways, using non-dominated sorting in the reward function (reinforcement learning) or to select molecules for retraining (distribution learning) or propagation (genetic algorithms). Finally, we discuss some remaining challenges and opportunities in the field, emphasizing the opportunity to adopt techniques

Language: English

Cited by: 75

Application of Computational Biology and Artificial Intelligence in Drug Design DOI Open Access
Yue Zhang, Mengqi Luo, Peng Wu et al.

International Journal of Molecular Sciences, Year: 2022, Number: 23(21), p. 13568

Published: Nov. 5, 2022

Traditional drug design requires a great amount of research time and developmental expense. Booming computational approaches, including computational biology, computer-aided drug design, and artificial intelligence, have the potential to expedite the efficiency of drug discovery by minimizing the time and financial cost. In recent years, computational approaches have been widely used to improve the efficacy and effectiveness of the drug discovery pipeline, leading to the approval of plenty of new drugs for marketing. The present review emphasizes the applications of these indispensable computational approaches in aiding target identification, lead discovery, and lead optimization. Some challenges of using these approaches for drug design are also discussed. Moreover, we propose a methodology for integrating various computational techniques into new drug discovery and design.

Language: English

Cited by: 70

Reinvent 4: Modern AI–driven generative molecule design DOI Creative Commons
Hannes H. Loeffler, Jiazhen He, Alessandro Tibo et al.

Journal of Cheminformatics, Year: 2024, Number: 16(1)

Published: Feb. 21, 2024

REINVENT 4 is a modern open-source generative AI framework for the design of small molecules. The software utilizes recurrent neural networks and transformer architectures to drive molecule generation. These generators are seamlessly embedded within the general machine learning optimization algorithms, transfer learning, reinforcement learning and curriculum learning. REINVENT 4 enables and facilitates de novo design, R-group replacement, library design, linker design, scaffold hopping and molecule optimization. This contribution gives an overview of the software and describes its design. Algorithms and their applications are discussed in detail. REINVENT 4 is a command line tool which reads a user configuration in either TOML or JSON format. The aim of this release is to provide reference implementations for some of the most common algorithms in AI based molecule generation. An additional goal with the release is to create a framework for education and future innovation in AI based molecular design. The software is available from https://github.com/MolecularAI/REINVENT4 and released under the permissive Apache 2.0 license. Scientific contribution. The software provides an open-source reference implementation for generative molecular design where the software is also being used in production to support in-house drug discovery projects. The publication of the most common machine learning algorithms in one code and full documentation thereof will increase transparency of AI and foster innovation, collaboration and education.

Language: English

Cited by: 51

Generative AI for designing and validating easily synthesizable and structurally novel antibiotics DOI
Kyle Swanson, Gary Liu, Denise B. Catacutan et al.

Nature Machine Intelligence, Year: 2024, Number: 6(3), pp. 338-353

Published: March 22, 2024

Language: English

Cited by: 50

High-throughput property-driven generative design of functional organic molecules DOI
Julia Westermayr, Joe Gilkes, Rhyan Barrett et al.

Nature Computational Science, Year: 2023, Number: 3(2), pp. 139-148

Published: Feb. 6, 2023

Language: English

Cited by: 48