DNA sequence analysis landscape: a comprehensive review of DNA sequence analysis task types, databases, datasets, word embedding methods, and language models
Muhammad Nabeel Asim, Muhammad Ali Ibrahim, Alam Zaib

et al.

Frontiers in Medicine, Journal year: 2025, Number: 12

Published: April 8, 2025

Deoxyribonucleic acid (DNA) serves as the fundamental genetic blueprint that governs the development, functioning, growth, and reproduction of all living organisms. DNA can be altered through germline and somatic mutations. Germline mutations underlie hereditary conditions, while somatic mutations can be induced by various factors, including environmental influences, chemicals, lifestyle choices, and errors in DNA replication and repair mechanisms, which can lead to cancer. DNA sequence analysis plays a pivotal role in uncovering the intricate information embedded within an organism's genetic blueprint and in understanding the factors that can modify it. This analysis helps in the early detection of diseases and the design of targeted therapies. DNA sequence analysis through traditional wet-lab experimental methods is costly, time-consuming, and prone to errors. To accelerate large-scale DNA sequence analysis, researchers are developing AI applications that complement wet-lab methods. These AI approaches can help generate hypotheses, prioritize experiments, and interpret results by identifying patterns in large genomic datasets. Effective integration of AI methods with experimental validation requires scientists to understand both fields. Considering the need for comprehensive literature that bridges the gap between the two fields, the contributions of this paper are manifold. It presents a diverse range of DNA sequence analysis tasks and AI methodologies. It equips AI researchers with essential biological knowledge of 44 distinct DNA sequence analysis tasks and aligns these tasks with 3 AI paradigms, namely classification, regression, and clustering. It streamlines the integration of AI into DNA sequence analysis by consolidating information on 36 databases that can be used to develop benchmark datasets for different tasks. To ensure performance comparisons between new and existing predictors, it provides insights into 140 benchmark datasets related to these tasks. It presents applications of word embeddings and language models across these tasks. It streamlines the development of new predictors by providing a survey of predictive pipelines based on 39 word embeddings and 67 language models, together with their performance values, as well as top-performing encoding-based predictors and their performances.

Language: English

Cited by

1