Hyena architecture enables fast and efficient protein language modeling
Y. T. Zhang, Bian Bian, Manabu Okumura

et al.

iMetaOmics, Journal Year: 2024, Volume and Issue: unknown

Published: Dec. 7, 2024

Abstract The emergence of self-supervised deep language models has revolutionized natural language processing tasks and has recently extended its applications to biological sequence analysis. Traditional models, primarily based on Transformer architectures, demonstrate substantial effectiveness in various applications. However, these models are inherently constrained by the attention mechanism's quadratic computational complexity, which limits their efficiency and leads to high computational costs. To address these limitations, we introduce ProtHyena, a novel approach that leverages the Hyena operator for protein language modeling. This innovative methodology alternates between subquadratic long convolutions and element-wise gating operations, which circumvents the constraints imposed by attention mechanisms and reduces the time complexity to subquadratic levels. This enables faster and more memory-efficient modeling of protein sequences. ProtHyena can achieve state-of-the-art or comparable performance in 8 downstream tasks, including protein engineering (protein fluorescence and stability prediction), protein property prediction (neuropeptide cleavage, signal peptide, solubility, disorder, and gene function prediction), and protein structure prediction, with only 1.6 M parameters. The ProtHyena architecture represents a highly efficient solution for protein language modeling, offering a promising avenue for fast and efficient analysis of biological sequences.
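The abstract attributes ProtHyena's efficiency to alternating long convolutions with element-wise gating. The sketch below illustrates why that step is subquadratic: a causal convolution whose filter is as long as the sequence is computed with an FFT in O(L log L), instead of the O(L²) cost of attention or direct convolution, and the result is then multiplied element-wise by a gate. It is a minimal pure-Python illustration under stated assumptions, not ProtHyena's implementation: the filter `h` and the `gate` values are passed in explicitly (in Hyena they come from learned, implicitly parameterized filters and input projections), only one convolution-plus-gating step is shown, and all function names are hypothetical.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two.

    With invert=True this computes the unnormalized inverse transform.
    """
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def causal_long_conv(x, h):
    """Causal convolution y[t] = sum_{s<=t} h[s] * x[t-s], computed via FFT."""
    length = len(x)
    if len(h) > length:
        raise ValueError("filter must not be longer than the input")
    size = 1
    while size < 2 * length:  # zero-pad so circular convolution equals linear convolution
        size *= 2
    x_hat = fft(list(x) + [0.0] * (size - length))
    h_hat = fft(list(h) + [0.0] * (size - len(h)))
    y = fft([a * b for a, b in zip(x_hat, h_hat)], invert=True)
    # Normalize the inverse FFT and keep only the first L (causal) outputs.
    return [v.real / size for v in y[:length]]


def gated_long_conv(x, gate, h):
    """One hypothetical Hyena-style step: long convolution, then element-wise gating."""
    return [g * y for g, y in zip(gate, causal_long_conv(x, h))]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
h = [0.5, -1.0, 0.25, 0.0, 2.0]    # made-up filter as long as the sequence
gate = [1.0, 0.0, 1.0, 0.5, 1.0]   # made-up gate values
print(gated_long_conv(x, gate, h))
```

Zero-padding to at least 2L makes the FFT's circular convolution coincide with the linear (causal) one, so the FFT result matches a direct O(L²) convolution up to floating-point error; a full Hyena operator would stack several such convolution-and-gate steps.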

Language: English

Deep Learning and Neural Networks: Decision-Making Implications
Hamed Taherdoost

Symmetry, Journal Year: 2023, Volume and Issue: 15(9), P. 1723 - 1723

Published: Sept. 8, 2023

Deep learning techniques have found applications across diverse fields, enhancing the efficiency and effectiveness of decision-making processes. The integration of these techniques underscores the significance of interdisciplinary research. In particular, decisions often rely on the output's projected value or probability from neural networks, considering different values of the relevant output factor. This review examines the impact of deep learning on decision-making systems, analyzing 25 papers published between 2017 and 2022. The review highlights improved accuracy but emphasizes the need to address issues such as interpretability and generalizability in order to build reliable decision support systems. Future research directions include transparency, explainability, and real-world validation, underscoring the importance of interdisciplinary collaboration for successful implementation.

Language: English

Citations: 41

Expectation management in AI: A framework for understanding stakeholder trust and acceptance of artificial intelligence systems
Marjorie Kinney, Maria Anastasiadou, Mijail Naranjo-Zolotov

et al.

Heliyon, Journal Year: 2024, Volume and Issue: 10(7), P. e28562 - e28562

Published: March 25, 2024

Language: English

Citations: 14

Leveraging a meta-learning approach to advance the accuracy of Nav blocking peptides prediction
Watshara Shoombuatong, Nutta Homdee, Nalini Schaduangrat

et al.

Scientific Reports, Journal Year: 2024, Volume and Issue: 14(1)

Published: Feb. 23, 2024

Abstract The voltage-gated sodium (Nav) channel is a crucial molecular component responsible for initiating and propagating action potentials. While the α subunit, forming the channel pore, plays a central role in this function, the complete physiological function of Nav channels relies on interactions between the α subunit and auxiliary proteins, known as protein-protein interactions (PPIs). Nav blocking peptides (NaBPs) have been recognized as a promising alternative therapeutic agent for pain and itch. Although traditional experimental methods can precisely determine the effect and activity of NaBPs, they remain time-consuming and costly. Hence, machine learning (ML)-based methods that are capable of accurately contributing to the in silico prediction of NaBPs are highly desirable. In this study, we develop an innovative meta-learning-based NaBP prediction method (MetaNaBP). MetaNaBP generates new feature representations by employing a wide range of sequence-based feature descriptors that cover multiple perspectives, in combination with powerful ML algorithms. Then, these feature representations were optimized to identify informative features using a two-step feature selection method. Finally, the selected informative features were applied to develop the final meta-predictor. To the best of our knowledge, MetaNaBP is the first meta-predictor for NaBP prediction. Experimental results demonstrated that MetaNaBP achieved an accuracy of 0.948 and a Matthews correlation coefficient of 0.898 over the independent test dataset, which were 5.79% and 11.76% higher than those of the existing method. In addition, the discriminative power of the new feature representations surpassed that of conventional feature descriptors over both the training and independent test datasets. We anticipate that MetaNaBP will be exploited for the large-scale prediction and analysis of NaBPs to narrow down potential NaBPs.
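MetaNaBP is evaluated with accuracy and the Matthews correlation coefficient (MCC). As a quick reference for how these two metrics differ, the sketch below computes both from the four cells of a binary confusion matrix. The counts are hypothetical and not taken from the paper; returning 0.0 when the MCC denominator is zero is a common convention assumed here.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when a marginal count is zero and the ratio is undefined.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for an independent test set (not the paper's data).
tp, tn, fp, fn = 90, 85, 15, 10
print(f"ACC = {accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC = {matthews_corrcoef(tp, tn, fp, fn):.3f}")
```

With these made-up counts, accuracy is 0.875 while MCC is about 0.751: MCC uses all four confusion-matrix cells and ranges from -1 to 1, so it is typically lower than accuracy and less flattering on imbalanced data.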

Language: English

Citations: 12

The Role of Generative Artificial Intelligence in Digital Agri-Food
Sakib Shahriar, Maria G. Corradini, Shayan Sharif

et al.

Journal of Agriculture and Food Research, Journal Year: 2025, Volume and Issue: unknown, P. 101787 - 101787

Published: March 1, 2025

Language: English

Citations: 2

Robust enzyme discovery and engineering with deep learning using CataPro
Zechen Wang, Dongqi Xie, Di Wu

et al.

Nature Communications, Journal Year: 2025, Volume and Issue: 16(1)

Published: March 20, 2025

Abstract Accurate prediction of enzyme kinetic parameters is crucial for enzyme exploration and modification. Existing models face the problem of either low accuracy or poor generalization ability due to overfitting. In this work, we first developed unbiased datasets to evaluate the actual performance of these methods and then proposed a deep learning model, CataPro, based on pre-trained models and molecular fingerprints, to predict the turnover number (kcat), the Michaelis constant (Km), and the catalytic efficiency (kcat/Km). Compared with previous baseline models, CataPro demonstrates clearly enhanced accuracy and generalization ability on the unbiased datasets. In a representative enzyme mining project, by combining CataPro with traditional methods, we identified an enzyme (SsCSO) with 19.53 times increased activity compared with the initial enzyme (CSO2) and then successfully engineered it to improve its activity by 3.34 times. This reveals the high potential of CataPro as an effective tool for future enzyme discovery and modification.
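CataPro predicts kcat, Km, and kcat/Km. To show how these three parameters relate, the sketch below encodes the standard Michaelis-Menten rate law; it says nothing about how CataPro itself works. The enzyme parameters are made up, and the units (kcat in 1/s, concentrations and Km in uM) are an assumption chosen for the example.

```python
def michaelis_menten_rate(kcat, enzyme_conc, km, substrate_conc):
    """Initial velocity v = kcat * [E] * [S] / (Km + [S]).

    Units must be consistent, e.g. kcat in 1/s and concentrations in uM,
    which gives v in uM/s.
    """
    return kcat * enzyme_conc * substrate_conc / (km + substrate_conc)


def catalytic_efficiency(kcat, km):
    """kcat / Km: the apparent second-order rate constant that governs v when [S] << Km."""
    return kcat / km


# Hypothetical enzyme: kcat = 10 /s, Km = 50 uM, [E] = 1 uM.
kcat, km, enzyme = 10.0, 50.0, 1.0
vmax = kcat * enzyme

# At [S] = Km the rate is half of Vmax = kcat * [E].
print(michaelis_menten_rate(kcat, enzyme, km, km) / vmax)

# At [S] << Km the rate approaches (kcat / Km) * [E] * [S].
low_s = 0.01
print(michaelis_menten_rate(kcat, enzyme, km, low_s))
print(catalytic_efficiency(kcat, km) * enzyme * low_s)
```

The low-substrate case is why kcat/Km is used as a single measure of catalytic efficiency: in that regime it is the only combination of the two parameters that determines the rate.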

Language: English

Citations: 2

Using protein language models for protein interaction hot spot prediction with limited data
Karen Sargsyan, Carmay Lim

BMC Bioinformatics, Journal Year: 2024, Volume and Issue: 25(1)

Published: March 16, 2024

Protein language models, inspired by the success of large language models in deciphering human language, have emerged as powerful tools for unraveling the intricate code of life inscribed within protein sequences. They have gained significant attention for their promising applications across various areas, including the sequence-based prediction of secondary and tertiary protein structure, the discovery of new functional protein sequences/folds, and the assessment of mutational impact on protein fitness. However, their utility in learning to predict protein residue properties from scant datasets, such as protein-protein interaction (PPI)-hotspots whose mutations significantly impair PPIs, has remained unclear. Here, we explore the feasibility of using protein language-learned representations as features for machine learning to predict PPI-hotspots, using a dataset containing 414 experimentally confirmed PPI-hotspots and 504 PPI-nonhot spots.
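The abstract does not say which learner is trained on the language-model representations. As a minimal sketch, assuming per-residue embeddings have already been extracted from some protein language model, the code below trains plain logistic regression (a stand-in classifier, not necessarily the authors' choice) to separate hot-spot from non-hot-spot residues. The 2-D vectors are made-up placeholders; real embeddings have hundreds to thousands of dimensions, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_logistic_regression(features, labels, lr=0.5, epochs=500):
    """Full-batch gradient descent on the mean log-loss; label 1 = hot spot, 0 = non-hot spot."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(dot(w, x) + b) - y  # derivative of log-loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that residue embedding x is a hot spot."""
    return sigmoid(dot(w, x) + b)


# Made-up 2-D stand-ins for per-residue embeddings and their labels.
train_x = [[1.0, 0.8], [0.9, 1.1], [1.2, 0.7], [-1.0, -0.7], [-0.8, -1.2], [-1.1, -0.9]]
train_y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic_regression(train_x, train_y)
print(predict_proba(w, b, [0.95, 0.9]), predict_proba(w, b, [-0.9, -1.0]))
```

A simple linear model is a reasonable first probe on a dataset of this size (918 labeled residues), since it has few parameters to overfit; in practice one would also cross-validate rather than judge on the training points.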

Language: English

Citations: 8

Advancing the accuracy of tyrosinase inhibitory peptides prediction via a multiview feature fusion strategy
Watshara Shoombuatong, Nalini Schaduangrat, Nutta Homdee

et al.

Scientific Reports, Journal Year: 2025, Volume and Issue: 15(1)

Published: Feb. 8, 2025

Language: English

Citations: 1

Advanced machine learning framework for enhancing breast cancer diagnostics through transcriptomic profiling
Mohamed J. Saadh, Hanan Hassan Ahmed, Radhwan Abdul Kareem

et al.

Discover Oncology, Journal Year: 2025, Volume and Issue: 16(1)

Published: March 17, 2025

This study proposes an advanced machine learning (ML) framework for breast cancer diagnostics by integrating transcriptomic profiling with optimized feature selection and classification techniques. A dataset of 1759 samples (987 patients, 772 healthy controls) was analyzed using Recursive Feature Elimination, Boruta, and ElasticNet feature selection. Dimensionality reduction techniques, including Non-Negative Matrix Factorization (NMF), Autoencoders, and transformer-based embeddings (BioBERT, DNABERT), were applied to enhance model interpretability. Classifiers such as XGBoost, LightGBM, ensemble voting, Multi-Layer Perceptron, and Stacking were trained using grid search and cross-validation. Model evaluation was conducted using accuracy, AUC, MCC, Kappa Score, ROC, and PR curves, and external validation was performed on an independent dataset of 175 samples. XGBoost and LightGBM achieved the highest test accuracies (0.91 and 0.90) and AUC values (up to 0.92), particularly with NMF and BioBERT. The Voting method exhibited the best accuracy (0.92), confirming its robustness. Transformer-based techniques significantly improved performance compared with conventional approaches such as PCA and Decision Trees. The proposed ML framework enhances diagnostic interpretability and demonstrates strong generalizability on the external dataset. These findings highlight its potential for precision oncology and personalized diagnostics.
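The abstract reports that an ensemble voting method gave the best accuracy but does not say whether hard or soft voting was used. The sketch below shows both standard variants for binary labels: hard voting takes the majority class label per sample, and soft voting averages predicted probabilities and applies a threshold (0.5 here, an assumption). The model outputs are hypothetical.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Majority vote over class labels; each inner list holds one model's predictions.

    Counter.most_common orders equal counts by first occurrence, so a tie goes to
    whichever tied label appears first in that sample's column.
    """
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions_per_model)]


def soft_vote(probabilities_per_model, threshold=0.5):
    """Average positive-class probabilities across models, then threshold."""
    n_models = len(probabilities_per_model)
    averaged = [sum(column) / n_models for column in zip(*probabilities_per_model)]
    return [1 if p >= threshold else 0 for p in averaged]


# Hypothetical outputs of three classifiers on four samples.
labels = [
    [1, 0, 1, 1],  # e.g. XGBoost
    [1, 1, 0, 1],  # e.g. LightGBM
    [0, 0, 1, 1],  # e.g. Multi-Layer Perceptron
]
probabilities = [
    [0.9, 0.2, 0.6, 0.8],
    [0.7, 0.6, 0.4, 0.9],
    [0.4, 0.3, 0.7, 0.6],
]
print(hard_vote(labels))
print(soft_vote(probabilities))
```

Soft voting keeps each model's confidence, so a very confident model can outweigh two hesitant ones; hard voting ignores confidence and only counts labels.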

Language: English

Citations: 1

Open-source large language models in action: A bioinformatics chatbot for PRIDE database
Jingwen Bai, Selvakumar Kamatchinathan, Deepti J Kundu

et al.

PROTEOMICS, Journal Year: 2024, Volume and Issue: unknown

Published: March 31, 2024

Abstract We here present a chatbot assistant infrastructure (https://www.ebi.ac.uk/pride/chatbot/) that simplifies user interactions with the PRIDE database's documentation and dataset search functionality. The framework utilizes multiple Large Language Models (LLMs): llama2, chatglm, mixtral (mistral), and openhermes. It also includes a web service API (Application Programming Interface), a web interface, and components for indexing and managing vector databases. An Elo-ranking system-based benchmark component is included in the framework as well, which allows evaluating the performance of each LLM and improving the PRIDE documentation. The chatbot not only allows users to interact with the PRIDE documentation but can also be used to find PRIDE datasets through an LLM-based recommendation system, enabling dataset discoverability. Importantly, while our infrastructure is exemplified through its application in the PRIDE database context, the modular and adaptable nature of our approach positions it as a valuable tool for improving user experiences across a spectrum of bioinformatics and proteomics tools and resources, among other domains. The integration of advanced LLMs, innovative vector-based construction, the benchmarking framework, and optimized documentation collectively forms a robust and transferable chatbot assistant infrastructure. The framework is open source (https://github.com/PRIDE-Archive/pride-chatbot).
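The PRIDE chatbot benchmarks its LLMs with an Elo-ranking system. The sketch below shows the standard Elo update applied to pairwise comparisons between models. The starting rating of 1500, the K-factor of 32, and the list of judgments are assumptions for illustration; the abstract does not state how the PRIDE benchmark sets these values or collects comparisons.

```python
def expected_score(rating_a, rating_b):
    """Probability-like expectation that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=32.0):
    """Update both ratings after one comparison; score_a is 1.0 (A wins), 0.5 (tie) or 0.0."""
    expected_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b


def rate_models(models, comparisons, initial=1500.0, k=32.0):
    """Apply Elo updates sequentially over (model_a, model_b, score_for_a) judgments."""
    ratings = {model: initial for model in models}
    for a, b, score_a in comparisons:
        ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], score_a, k)
    return ratings


# Hypothetical pairwise judgments between two of the LLMs named in the abstract.
judgments = [("mixtral", "llama2", 1.0), ("llama2", "mixtral", 0.5), ("mixtral", "llama2", 1.0)]
print(rate_models(["llama2", "mixtral"], judgments))
```

Because each update moves the two ratings by equal and opposite amounts, the total rating pool is conserved, and an upset against a higher-rated model moves the ratings more than an expected win does.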

Language: English

Citations: 6

Bioinfo-Bench: A Simple Benchmark Framework for LLM Bioinformatics Skills Evaluation
Qiyuan Chen, Cheng Deng

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: Oct. 21, 2023

Abstract Large Language Models (LLMs) have garnered significant recognition in the life sciences for their capacity to comprehend and utilize knowledge. The contemporary expectation in diverse industries extends beyond employing LLMs merely as chatbots; instead, there is a growing emphasis on harnessing their potential as adept analysts proficient in dissecting intricate issues within these sectors. The realm of bioinformatics is no exception to this trend. In this paper, we introduce Bioinfo-Bench, a novel yet straightforward benchmark framework suite crafted to assess the academic knowledge and data mining capabilities of foundational models in bioinformatics. Bioinfo-Bench systematically gathered data from three distinct perspectives (acquisition, analysis, and application), facilitating a comprehensive examination of LLMs. Our evaluation encompassed prominent models, including ChatGPT, Llama, and Galactica. The findings revealed that these LLMs excel at tasks that draw heavily on knowledge retained from their training data. However, their proficiency in addressing practical professional queries and conducting nuanced inference remains constrained. Given these insights, we are poised to delve deeper into this domain, engaging in further extensive research and discourse. It is pertinent to note that the project is currently in progress, and all associated materials will be made publicly accessible.
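The abstract does not describe how Bioinfo-Bench scores model answers. Assuming a multiple-choice format (an assumption, not confirmed by the text above) and questions tagged with one of the three perspectives, a per-perspective accuracy breakdown could be computed as in this sketch. The questions and answers are hypothetical, and `per_category_accuracy` is an illustrative name.

```python
from collections import defaultdict


def per_category_accuracy(items, predictions):
    """Accuracy per question category.

    items: iterable of (question_id, category, gold_choice) tuples.
    predictions: dict mapping question_id -> the model's chosen option (e.g. "B");
    missing answers count as wrong.
    """
    correct, total = defaultdict(int), defaultdict(int)
    for question_id, category, gold in items:
        total[category] += 1
        predicted = predictions.get(question_id)
        if predicted is not None and predicted.strip().upper() == gold.strip().upper():
            correct[category] += 1
    return {category: correct[category] / total[category] for category in total}


# Hypothetical questions tagged with the three perspectives, and one model's answers.
items = [
    ("q1", "acquisition", "A"),
    ("q2", "acquisition", "C"),
    ("q3", "analysis", "B"),
    ("q4", "application", "D"),
]
predictions = {"q1": "A", "q2": "c", "q3": "D"}
print(per_category_accuracy(items, predictions))
```

Reporting accuracy per perspective rather than one overall score is what makes a finding like "strong at knowledge retention, weaker at practical inference" visible.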

Language: English

Citations: 11