Data augmentation with generative models improves detection of Non-B DNA structures
Oleksandr Cherednichenko, Maria Poptsova

Computers in Biology and Medicine, Year: 2024, Volume 184, p. 109440

Published: Nov. 16, 2024

Language: English

Benchmarking DNA large language models on quadruplexes
Oleksandr Cherednichenko, Alan Herbert, Maria Poptsova, et al.

Computational and Structural Biotechnology Journal, Year: 2025, Volume 27, pp. 992-1000

Published: Jan. 1, 2025

Large language models (LLMs) in genomics have successfully predicted various functional genomic elements. While their performance is typically evaluated using benchmark datasets, it remains unclear which LLM is best suited for specific downstream tasks, particularly generating whole-genome annotations. Current LLMs fall into three main categories: transformer-based models, long convolution-based models, and state-space models (SSMs). In this study, we benchmarked different types of LLM architectures for generating whole-genome maps of G-quadruplexes (GQ), a type of flipons, or non-B DNA structures, characterized by distinctive patterns and roles in diverse regulatory contexts. Although a GQ forms from the folding of guanosine residues into tetrads, the computational task is challenging, as the bases involved may be on different strands, separated by a large number of nucleotides, or made of RNA rather than DNA. All models performed comparably well, with DNABERT-2 and HyenaDNA achieving superior results based on F1 and MCC. Analysis of the whole-genome annotations revealed that [...] recovered more quadruplexes in distal enhancers and intronic regions. The [...] were better suited to detecting GQ arrays that likely contribute to nuclear condensates, gene transcription, and chromosomal scaffolds. Caduceus formed a separate grouping in the generated de novo quadruplexes, while [...] clustered together. Overall, our findings suggest that the different types of LLMs complement each other. Genomic LLMs with varying context lengths can detect distinct elements, underscoring the importance of selecting the appropriate model for the task. The code and data underlying this article are available at https://github.com/powidla/G4s-FMs.

Language: English

Cited by: 0
