iLEC-DNA: Identifying Long Extra-chromosomal Circular DNA by Fusing Sequence-derived Features of Physicochemical Properties and Nucleotide Distribution Patterns (Creative Commons)

Ahtisham Fazeel Abbasi, Muhammad Nabeel Asim, Andreas Dengel, et al.

Research Square (Research Square), Year: 2023, Number: unknown

Published: Sep. 29, 2023

Abstract: Long extrachromosomal circular DNA (leccDNA) regulates several biological processes such as genomic instability, gene amplification, and oncogenesis. The identification of leccDNA holds significant importance to investigate its potential associations with cancer, autoimmune, cardiovascular, and neurological diseases. In addition, understanding these associations can provide valuable insights about disease mechanisms and therapeutic approaches. Conventionally, wet lab-based methods are utilized to identify leccDNA, which are hindered by the need for prior knowledge and resource-intensive processes, potentially limiting their broader applicability. To empower the process of leccDNA identification across multiple species, the paper in hand presents the very first computational predictor. The proposed iLEC-DNA predictor makes use of an SVM classifier along with sequence-derived nucleotide distribution patterns and physico-chemical properties-based features. The study introduces a set of 12 benchmark datasets related to three species, namely Homo sapiens (HM), Arabidopsis thaliana (AT), and Saccharomyces cerevisiae (SC/YS). It performs large-scale experimentation under different experimental settings using more than 140 baseline predictors. The proposed predictor outperforms the baseline predictors across diverse settings, producing average performance values of 80.699%, 61.45% and 80.7% in terms of ACC, MCC and AUC-ROC across all datasets. The source code is available at https://github.com/FAhtisham/Extrachrosmosomal-DNA-Prediction. To facilitate the scientific community, a web application is available at https://sds_genetic_analysis.opendfki.de/iLEC_DNA//.

Language: English

Application of machine learning and genomics for orphan crop improvement (Creative Commons)

Tessa R. MacNish, Monica F. Danilevicz, Philipp E. Bayer, et al.

Nature Communications, Year: 2025, Number: 16(1)

Published: Jan. 24, 2025

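Orphan crops are important sources of nutrition in developing regions and many are tolerant to biotic and abiotic stressors; however, modern crop improvement technologies have not been widely applied to orphan crops due to the lack of resources available. There are orphan crop representatives across major crop types, and the conservation of genes between these related species can be used in crop improvement. Machine learning (ML) has emerged as a promising tool for crop improvement. Transferring knowledge from major crops to orphan crops and using machine learning to improve accuracy and efficiency can be used to improve orphan crops. Here, the authors review transferring knowledge from major crops to orphan crops and using machine learning to improve accuracy and efficiency in orphan crop breeding.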

Language: English

Cited by: 0

Transitioning from wet lab to artificial intelligence: a systematic review of AI predictors in CRISPR (Creative Commons)

Ahtisham Fazeel Abbasi, Muhammad Nabeel Asim, Andreas Dengel, et al.

Journal of Translational Medicine, Year: 2025, Number: 23(1)

Published: Feb. 4, 2025

Abstract: The revolutionary CRISPR-Cas9 system leverages a programmable guide RNA (gRNA) and Cas9 proteins to precisely cleave problematic regions within DNA sequences. This groundbreaking technology holds immense potential for the development of targeted therapies for a wide range of diseases, including cancers, genetic disorders, and hereditary diseases. CRISPR-Cas9 based genome editing is a multi-step process that includes designing a precise gRNA, selecting the appropriate Cas protein, and thoroughly evaluating both the on-target and off-target activity of the Cas9-gRNA complex. To ensure the accuracy and effectiveness of the CRISPR-Cas9 system, after DNA cleavage, the process requires careful analysis of the resultant outcomes, such as indels and deletions. Following the success of artificial intelligence (AI) in various fields, researchers are now leveraging AI algorithms to catalyze and optimize the multi-step process of the CRISPR-Cas9 system. To achieve this goal, AI-driven applications are being integrated into each step, but existing AI predictors have limited performance and many steps still rely on expensive and time-consuming wet-lab experiments. The primary reason behind the low performance of AI predictors is the gap between the CRISPR and AI fields. Effective integration of AI into the multi-step CRISPR-Cas9 process demands comprehensive knowledge of both domains. This paper bridges the knowledge gap between AI and CRISPR-Cas9 research. It offers a unique platform for AI researchers to grasp a deep understanding of the biological foundations behind each step of the CRISPR-Cas9 multi-step process. Furthermore, it provides details of 80 available CRISPR-Cas9 system-related datasets that can be utilized to develop AI-driven applications. Within the landscape of AI predictors in the CRISPR-Cas9 multi-step process, it provides insights into representation learning methods, machine and deep learning method trends, and the performance values of 50 existing predictive pipelines. In the context of representation learning methods and classifiers/regressors, it provides a thorough analysis of existing predictive pipelines and recommendations for more robust

Language: English

Cited by: 0

DNA sequence analysis landscape: a comprehensive review of DNA sequence analysis task types, databases, datasets, word embedding methods, and language models (Creative Commons)

Muhammad Nabeel Asim, Muhammad Ali Ibrahim, Alam Zaib, et al.

Frontiers in Medicine, Year: 2025, Number: 12

Published: April 8, 2025

Deoxyribonucleic acid (DNA) serves as the fundamental genetic blueprint that governs the development, functioning, growth, and reproduction of all living organisms. DNA can be altered through germline and somatic mutations. Germline mutations underlie hereditary conditions, while somatic mutations can be induced by various factors including environmental influences, chemicals, lifestyle choices, and errors in DNA replication and repair mechanisms, which can lead to cancer. DNA sequence analysis plays a pivotal role in uncovering the intricate information embedded within an organism's DNA and in understanding the factors that can modify it. This analysis helps in the early detection of diseases and the design of targeted therapies. DNA sequence analysis through traditional wet-lab experimental methods is costly, time-consuming, and prone to errors. To accelerate large-scale DNA sequence analysis, researchers are developing AI applications that complement wet-lab experimental methods. These AI approaches can help generate hypotheses, prioritize experiments, and interpret results by identifying patterns in large genomic datasets. Effective integration of AI methods with experimental validation requires scientists to understand both fields. Considering the need for comprehensive literature that bridges the gap between both fields, the contributions of this paper are manifold: It presents a diverse range of DNA sequence analysis tasks and AI methodologies. It equips AI researchers with essential biological knowledge of 44 distinct DNA sequence analysis tasks and aligns these tasks with 3 distinct AI paradigms, namely classification, regression, and clustering. It streamlines the integration of AI into DNA sequence analysis by consolidating 36 diverse biological databases that can be used to develop benchmark datasets for the different DNA sequence analysis tasks. To ensure performance comparisons between new and existing AI predictors, it provides insights into 140 benchmark datasets related to the DNA sequence analysis tasks. It presents applications of word embeddings and language models across DNA sequence analysis tasks. It streamlines the development of new predictors by providing a comprehensive survey of the performance values of 39 word embedding and 67 language model based predictive pipelines, as well as the top-performing traditional sequence encoding-based predictors and their performances

Language: English

Cited by: 0

Long extrachromosomal circular DNA identification by fusing sequence-derived features of physicochemical properties and nucleotide distribution patterns (Creative Commons)

Ahtisham Fazeel Abbasi, Muhammad Nabeel Asim, Sheraz Ahmed, et al.

Scientific Reports, Year: 2024, Number: 14(1)

Published: April 24, 2024

Abstract: Long extrachromosomal circular DNA (leccDNA) regulates several biological processes such as genomic instability, gene amplification, and oncogenesis. The identification of leccDNA holds significant importance to investigate its potential associations with cancer, autoimmune, cardiovascular, and neurological diseases. In addition, understanding these associations can provide valuable insights about disease mechanisms and therapeutic approaches. Conventionally, wet lab-based methods are utilized to identify leccDNA, which are hindered by the need for prior knowledge and resource-intensive processes, potentially limiting their broader applicability. To empower the process of leccDNA identification across multiple species, the paper in hand presents the very first computational predictor. The proposed iLEC-DNA predictor makes use of an SVM classifier along with sequence-derived nucleotide distribution patterns and physicochemical properties-based features. The study introduces a set of 12 benchmark datasets related to three species, namely Homo sapiens (HM), Arabidopsis thaliana (AT), and Saccharomyces cerevisiae (SC/YS). It performs large-scale experimentation under different experimental settings using the proposed predictor, more than 140 baseline predictors, and 858 encoder ensembles. The proposed predictor outperforms the baseline predictors and encoder ensembles across diverse settings, producing average performance values of 81.09%, 62.2% and 81.08% in terms of ACC, MCC and AUC-ROC across all datasets. The source code is available at https://github.com/FAhtisham/Extrachrosmosomal-DNA-Prediction. To facilitate the scientific community, a web application is available at https://sds_genetic_analysis.opendfki.de/iLEC_DNA/.
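The abstract above describes "nucleotide distribution patterns" as input features for an SVM but does not say which encoders are used. As an illustration only, the following sketch computes normalized k-mer frequencies, one common sequence-composition feature. The function name `kmer_frequencies`, the choice of k = 2, and the example sequence are assumptions made for this sketch; it is not the iLEC-DNA encoder.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence: str, k: int = 2) -> list[float]:
    """Return normalized k-mer frequencies in a fixed A/C/G/T order.

    Hypothetical helper for illustration; not taken from the iLEC-DNA code.
    """
    sequence = sequence.upper()
    # Fixed vocabulary so every sequence maps to a vector of length 4**k.
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Windows with ambiguous bases (e.g. N) are outside the vocabulary and ignored.
    total = sum(counts[kmer] for kmer in vocabulary)
    return [counts[kmer] / total if total else 0.0 for kmer in vocabulary]


# Made-up example sequence, not from any benchmark dataset.
features = kmer_frequencies("ACGTACGTAC", k=2)
print(len(features), features[1])  # vector length and the frequency of "AC"
```

A vector like this could be concatenated with other descriptors and passed to any classifier that accepts fixed-length numeric input; which features and classifier settings the authors actually use is documented in their repository, not here.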

Language: English

Cited by: 3

BERT-5mC: an interpretable model for predicting 5-methylcytosine sites of DNA based on BERT (Creative Commons)

Shuyu Wang, Yinbo Liu, Yufeng Liu, et al.

PeerJ, Year: 2023, Number: 11, pp. e16600 - e16600

Published: Dec. 8, 2023

DNA 5-methylcytosine (5mC) is widely present in multicellular eukaryotes, where it plays important roles in various developmental and physiological processes and in a wide range of human diseases. Thus, it is essential to accurately detect the 5mC sites. Although current sequencing technologies can map genome-wide 5mC sites, these experimental methods are both costly and time-consuming. To achieve fast and accurate prediction of 5mC sites, we propose a new computational approach, BERT-5mC. First, we pre-trained a domain-specific BERT (bidirectional encoder representations from transformers) model by using human promoter sequences as the language corpus. BERT is a deep two-way language representation model based on the Transformer. Second, we fine-tuned the domain-specific BERT model on the 5mC training dataset to build the model. The cross-validation results show that our model achieves an AUROC of 0.966, which is higher than other state-of-the-art methods such as iPromoter-5mC, 5mC_Pred, and BiLSTM-5mC. Furthermore, our model was evaluated on the independent test set, which shows that our model is also higher than the other state-of-the-art methods. Moreover, we analyzed the attention weights generated by BERT to identify a number of nucleotide distributions that are closely associated with 5mC modifications. To facilitate the use of our model, we built a webserver which can be freely accessed at: http://5mc-pred.zhulab.org.cn

Language: English

Cited by: 6

Pretraining strategies for effective promoter-driven gene expression prediction (Creative Commons)

Aniketh Janardhan Reddy, Michael H. Herschl, Xinyang Geng, et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2023, Number: unknown

Published: Feb. 27, 2023

The ability to deliver genetic cargo to human cells is enabling rapid progress in molecular medicine, but designing this cargo for precise expression in specific cell types is a major challenge. Expression is driven by regulatory DNA sequences within short synthetic promoters, but relatively few of these promoters are cell-type-specific. The ability to design cell-type-specific promoters using model-based optimization would be impactful for research and therapeutic applications. However, models of expression from short synthetic promoters (promoter-driven expression) are lacking for most cell types due to insufficient training data in those cell types. Although there are many large datasets of both endogenous expression and promoter-driven expression in other cell types, which provide information that could be used for transfer learning, transfer strategies remain largely unexplored for predicting promoter-driven expression. Here, we propose a variety of pretraining tasks, transfer strategies, and model architectures for modelling promoter-driven expression. To thoroughly evaluate various methods, we propose two benchmarks that reflect data-constrained and large dataset settings. In the data-constrained setting, we find that pretraining followed by transfer learning is highly effective, improving performance by 24 to 27%. In the large dataset setting, transfer learning leads to more modest gains, improving performance by up to 2%. We also propose the best architecture to model promoter-driven expression when training from scratch. The methods we identify are broadly applicable for modelling promoter-driven expression in understudied cell types, and our findings will guide the choice of models that are best suited to designing promoters for gene delivery applications using model-based optimization. Our code is available at https://github.com/anikethjr/promoter_models.

Language: English

Cited by: 4

STM-ac4C: a hybrid model for identification of N4-acetylcytidine (ac4C) in human mRNA based on selective kernel convolution, temporal convolutional network, and multi-head self-attention (Creative Commons)

Mengyue Yi, Fenglin Zhou, Yu Deng, et al.

Frontiers in Genetics, Year: 2024, Number: 15

Published: May 30, 2024

N4-acetylcytidine (ac4C) is a chemical modification in mRNAs that alters the structure and function of mRNA by adding an acetyl group to the N4 position of cytosine. Researchers have shown that ac4C is closely associated with the occurrence and development of various cancers. Therefore, accurate prediction of ac4C sites on human mRNA is crucial for revealing its role in diseases and developing new diagnostic and therapeutic strategies. However, existing deep learning models still have limitations in prediction accuracy and generalization ability, which restrict their effectiveness in handling complex biological sequence data. This paper introduces a deep learning-based model, STM-ac4C, for predicting ac4C sites on human mRNA. The model combines the advantages of selective kernel convolution, temporal convolutional networks, and multi-head self-attention mechanisms to effectively extract and integrate multi-level features of RNA sequences, thereby achieving high-precision prediction of ac4C sites. On the independent test dataset, STM-ac4C showed improvements of 1.81%, 3.5%, and 0.37% in accuracy, Matthews correlation coefficient, and area under the curve, respectively, compared to the existing state-of-the-art technologies. Moreover, its performance on additional balanced and imbalanced datasets also confirmed the model's robustness and generalization ability. Various experimental results indicate that STM-ac4C outperforms existing methods in predictive performance. In summary, STM-ac4C excels in predicting ac4C sites on human mRNA, providing a powerful tool for a deeper understanding of the biological significance of mRNA modifications and of cancer treatment. Additionally, the model reveals key sequence features that influence the prediction of ac4C sites through sequence region impact analysis, offering new perspectives for future research. The source code and experimental data are available at https://github.com/ymy12341/STM-ac4C.

Language: English

Cited by: 1

An Integrated Multi-Model Framework Utilizing Convolutional Neural Networks Coupled with Feature Extraction for Identification of 4mC Sites in DNA Sequences

Muhammad Tahir, Shahid Hussain, Fawaz Khaled Alarfaj, et al.

Computers in Biology and Medicine, Year: 2024, Number: 183, pp. 109281 - 109281

Published: Oct. 30, 2024

Language: English

Cited by: 0

4mCPred-GSIMP: Predicting DNA N4-methylcytosine sites in the mouse genome with multi-Scale adaptive features extraction and fusion (Creative Commons)

Jianhua Jia, Yu Deng, Mengyue Yi, et al.

Mathematical Biosciences & Engineering, Year: 2023, Number: 21(1), pp. 253 - 271

Published: Jan. 1, 2023

The epigenetic modification of DNA N4-methylcytosine (4mC) is vital for controlling DNA replication and expression. It is crucial to pinpoint 4mC's location to comprehend its role in physiological and pathological processes. However, accurate 4mC detection is difficult to achieve due to technical constraints. In this paper, we propose a deep learning-based approach, 4mCPred-GSIMP, for predicting 4mC sites in the mouse genome. The approach encodes DNA sequences using four feature encoding methods and combines multi-scale convolution and improved selective kernel convolution to adaptively extract and fuse features from different scales, thereby improving feature representation and optimization effect. In addition, we also use convolutional residual connections, global response normalization and pointwise convolution techniques to optimize the model. On the independent test dataset, 4mCPred-GSIMP shows high sensitivity, specificity, accuracy, Matthews correlation coefficient and area under the curve, which are 0.7812, 0.9312, 0.8562, 0.7207 and 0.9233, respectively. Various experiments demonstrate that 4mCPred-GSIMP outperforms existing prediction tools.
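The metrics reported in this abstract are related by standard formulas, and a short calculation shows how. The sketch below derives accuracy and the Matthews correlation coefficient (MCC) from sensitivity and specificity. It assumes an equal number of positive and negative test samples; that class balance is an assumption made here, not something the abstract states. The function name `acc_and_mcc` is likewise hypothetical.

```python
import math


def acc_and_mcc(sensitivity: float, specificity: float, pos_fraction: float = 0.5):
    """Derive accuracy and MCC from sensitivity and specificity.

    pos_fraction is the share of positive samples in the test set; the
    default of 0.5 (a balanced set) is an assumption, not a reported figure.
    """
    neg_fraction = 1.0 - pos_fraction
    # Confusion-matrix cells expressed as fractions of the whole test set.
    tp = sensitivity * pos_fraction
    fn = (1.0 - sensitivity) * pos_fraction
    tn = specificity * neg_fraction
    fp = (1.0 - specificity) * neg_fraction
    accuracy = tp + tn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Sensitivity and specificity as quoted in the abstract.
acc, mcc = acc_and_mcc(0.7812, 0.9312)
print(f"ACC = {acc:.4f}, MCC = {mcc:.4f}")
```

Under the balanced-set assumption the derived accuracy matches the reported 0.8562, and the derived MCC comes out within rounding distance of the reported 0.7207. That agreement is consistent with a balanced test set but does not prove it.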

Language: English

Cited by: 1
