MuLan-Methyl—multiple transformer-based language models for accurate DNA methylation prediction
Wenhuan Zeng, Anupam Gautam, Daniel H. Huson, et al.

GigaScience, Year: 2022, Volume: 12

Published: Dec. 28, 2022

Abstract Transformer-based language models are successfully used to address massive text-related tasks. DNA methylation is an important epigenetic mechanism, and its analysis provides valuable insights into gene regulation and biomarker identification. Several deep learning-based methods have been proposed to identify DNA methylation, and each seeks to strike a balance between computational effort and accuracy. Here, we introduce MuLan-Methyl, a deep learning framework for predicting DNA methylation sites, which is based on 5 popular transformer-based language models. The framework identifies methylation sites for 3 different types of DNA methylation: N6-adenine, N4-cytosine, and 5-hydroxymethylcytosine. Each of the employed language models is adapted to the task using the “pretrain and fine-tune” paradigm. Pretraining is performed on a custom corpus of DNA fragments and taxonomy lineages using self-supervised learning. Fine-tuning aims at predicting the DNA methylation status of each type. The 5 models are used to collectively predict the DNA methylation status. We report excellent performance of MuLan-Methyl on a benchmark dataset. Moreover, we argue that the model captures characteristic differences between species that are relevant for methylation. This work demonstrates that language models can be adapted to applications in biological sequence analysis and that the joint utilization of different language models improves model performance. MuLan-Methyl is open source, and we provide a web server that implements the approach.

Language: English

Cited by: 19

Cancer Drug Sensitivity Prediction Based on Deep Transfer Learning
Weilin Meng, Xinyu Xu, Zhichao Xiao, et al.

International Journal of Molecular Sciences, Year: 2025, Volume: 26(6), pp. 2468-2468

Published: March 10, 2025

In recent years, many approved drugs have been discovered using phenotypic screening, which elaborates the exact mechanisms of action or molecular targets of drugs. Drug susceptibility prediction is an important type of phenotypic screening. Large-scale pharmacogenomics studies have provided us with large amounts of drug sensitivity data. By analyzing these data with computational methods, we can effectively build models to predict drug susceptibility. However, due to differences in data distribution among databases, researchers cannot directly utilize data from multiple sources. In this study, we propose a deep transfer learning model. We integrate the genomic characterization of cancer cell lines with chemical information on compounds, combined with the Cancer Cell Line Encyclopedia (CCLE) and Genomics of Drug Sensitivity in Cancer (GDSC) datasets, through a domain-adapted approach to predict half-maximal inhibitory concentrations (IC50 values). Afterward, the validity of the prediction results of our model is verified. This study addresses the challenge of cross-database distribution discrepancies in drug sensitivity prediction by integrating multi-source heterogeneous data and constructing a deep transfer learning model. This model serves as a reliable computational tool for precision drug development. Its widespread application can facilitate the optimization of therapeutic strategies in personalized medicine while also providing technical support for high-throughput drug screening and the discovery of new drug targets.

Language: English

Cited by: 0

A Review on the Applications of Transformer-based language models for Nucleotide Sequence Analysis
Nimisha Ghosh, Daniele Santoni, Indrajit Saha, et al.

Computational and Structural Biotechnology Journal, Year: 2025, Volume: unknown

Published: March 1, 2025

Language: English

Cited by: 0

Methyl-GP: accurate generic DNA methylation prediction based on a language model and representation learning
Hao Xie, Leyao Wang, Yuqing Qian, et al.

Nucleic Acids Research, Year: 2025, Volume: 53(6)

Published: March 20, 2025

Abstract Accurate prediction of DNA methylation remains a challenge. Identifying DNA methylation is important for understanding its functions and elucidating its role in gene regulation mechanisms. In this study, we propose Methyl-GP, a general predictor that accurately predicts three types of DNA methylation from DNA sequences. We found that the conservation of sequence patterns among different species contributes to enhancing the generalizability of the model. By fine-tuning a language model on a dataset comprising multiple species with similar sequence patterns and employing a fusion module to integrate embeddings into a high-quality comprehensive representation, Methyl-GP demonstrates satisfactory predictive performance in methylation identification. Experiments on 17 benchmark datasets covering three types of DNA methylation (4mC, 5hmC, and 6mA) demonstrate the superiority of Methyl-GP over existing predictors. Furthermore, by utilizing the attention mechanism, we have visualized the sequence patterns learned by the model, which may help us gain a deeper understanding of methylation patterns across various species.

Language: English

Cited by: 0

Artificial intelligence-driven plant bio-genomics research: a new era
Yang Lin, Hao Wang, Meiling Zou, et al.

Tropical Plants, Year: 2025, Volume: 4(1), pp. 0-0

Published: Jan. 1, 2025

Language: English

Cited by: 0
