GRAViTy-V2: a grounded viral taxonomy application DOI Creative Commons
Richard Mayne, Pakorn Aiewsakun, Dann Turner

et al.

NAR Genomics and Bioinformatics, Journal year: 2024, Issue 6(4)

Published: Sep. 28, 2024

Abstract Taxonomic classification of viruses is essential for understanding their evolution. Genomic classification of viruses at higher taxonomic ranks, such as order or phylum, is typically based on alignment and comparison of amino acid sequence motifs in conserved genes. Classification at lower taxonomic ranks, such as genus or species, is usually based on nucleotide sequence identities between genomic sequences. Building on our whole-genome analytical classification framework, we here describe Genome Relationships Applied to Viral Taxonomy Version 2 (GRAViTy-V2), which encompasses a greatly expanded range of features and numerous optimisations, packaged as an application that may be used as a general-purpose virus classification tool. Using 28 datasets derived from the ICTV 2022 taxonomy proposals, GRAViTy-V2 output was compared against human expert-curated classifications used for taxonomic assignments in the 2023 round of ICTV taxonomy changes. GRAViTy-V2 produced taxonomies equivalent to the manually-curated versions down to the family level and, in almost all cases, to the genus and species levels. The majority of discrepant results arose from errors in coding sequence annotations in INSDC records, or from the inclusion of incomplete genome sequences in the analysis. Analysis times ranged from 1-506 min (median 3.59) with datasets of 17-1004 genomes and mean genome length of 3000–1 000 bases.

Language: English

ProbML: A Machine Learning‐Based Genome Classifier for Identifying Probiotic Organisms DOI Open Access

Arjun Orkkatteri Krishnan,

Lalit Narayan Mudgal,

Vivek Kumar Soni

et al.

Molecular Nutrition & Food Research, Journal year: 2025, Issue unknown

Published: Mar. 26, 2025

Probiotics are microorganisms that offer health benefits to the host. Traditional methods for identifying these organisms are time-consuming and resource-intensive. This study addresses the need for a more efficient and accurate approach to probiotic identification using machine learning (ML) techniques. The present study introduces ProbML, an ML-based approach for identifying probiotic organisms from whole genome sequences of prokaryotes. Among the five ML algorithms tested, XGBoost models demonstrated superior performance, achieving a maximum accuracy of 100% on the data and 95.45% on an independent test dataset. This surpasses existing tools, which achieved 97.77% and 66.28% on the same datasets, respectively. The ProbML models were used to analyze 4728 genomes in the Unified Human Gastrointestinal Genome database, classifying 650 as probiotics, with many previously unreported. A versatile GUI platform was also developed that employs the ProbML models for classification or can be used to generate custom classifiers based on user-specific needs (https://github.com/sysbio-iitmandi/MLG_Dashboard). This study emphasizes the power of genomic data and advanced ML techniques in accelerating probiotic discovery.

Language: English

Cited by

2

Harnessing AI-Powered Genomic Research for Sustainable Crop Improvement DOI Creative Commons
Elżbieta Wójcik‐Gront, Bartłomiej Zieniuk, Magdalena Pawełkowicz

et al.

Agriculture, Journal year: 2024, Issue 14(12), pp. 2299 - 2299

Published: Dec. 14, 2024

Artificial intelligence (AI) can revolutionize agriculture by enhancing genomic research and promoting sustainable crop improvement. AI systems integrate machine learning (ML) and deep learning (DL) with big data to identify complex patterns and relationships by analyzing vast genomic, phenotypic, and environmental datasets. This capability accelerates breeding cycles, improves predictive accuracy, and supports the development of climate-resilient, high-yielding crop varieties. Applications such as precision agriculture, automated phenotyping, predictive analytics, and early pest and disease detection demonstrate AI’s ability to optimize agricultural practices while promoting sustainability. Despite these advancements, challenges remain, including fragmented data sources, variability in phenotyping protocols, and data ownership concerns. Addressing these issues through standardized data integration frameworks, advanced analytical tools, and ethical AI practices will be critical for realizing AI’s full potential. This review provides a comprehensive overview of AI-powered genomic research, highlights the role of big data in training robust AI models, and explores the ethical and technological considerations for sustainable agricultural practices.

Language: English

Cited by

3

MAFin: Motif Detection in Multiple Alignment Files DOI Creative Commons

Michail Patsakis,

Kimonas Provatas,

Fotis A. Baltoumas

et al.

Bioinformatics, Journal year: 2025, Issue unknown

Published: Mar. 19, 2025

Whole Genome and Proteome Alignments, represented by the Multiple Alignment File (MAF) format, have become a standard approach in comparative genomics and proteomics. These alignments often require identifying conserved motifs, which is crucial for understanding functional and evolutionary relationships. However, current approaches lack a direct method for motif detection within MAF files. We present MAFin, a novel tool that enables efficient motif detection and conservation analysis in MAF files to address this gap, streamlining genomic and proteomic research. We developed MAFin, the first motif detection tool for the Multiple Alignment Format. MAFin enables the multithreaded search of conserved motifs using three approaches: 1) with user-specified k-mers to search the sequences, 2) with regular expressions, in which case one or more patterns are searched, and 3) with predefined Position Weight Matrices. Once a motif has been found, MAFin detects the motif instances and calculates the conservation across the aligned sequences. MAFin also calculates a conservation percentage, which provides information about the conservation levels of each motif across the aligned sequences, based on the number of matches relative to the length of the motif. A set of statistics enables the interpretation of each motif's conservation level, and the detected motifs are exported in JSON and CSV files for downstream analyses. MAFin is offered as a Python package under the GPL license and as a multi-platform application, available at: https://github.com/Georgakopoulos-Soares-lab/MAFin. Supplementary data are available at Bioinformatics online.
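The regular-expression search and the conservation percentage described above can be illustrated with a small Python sketch. This is not MAFin's code or API: the function name `motif_conservation`, the dictionary-based alignment block, and the choice of the first row as the reference are all assumptions made for illustration. The sketch searches the ungapped reference sequence, maps each hit back to alignment columns, and reports for every other row the share of motif positions that match.

```python
import re


def motif_conservation(block, pattern):
    """Find regex hits in the reference row and score them in the other rows.

    block: dict mapping sequence name -> aligned row (equal lengths, '-' for
    gaps). The first entry is treated as the reference (an assumption).
    Returns a list of hits with per-sequence conservation percentages.
    """
    names = list(block)
    ref_row = block[names[0]]
    # Alignment columns that hold a reference residue (gaps are skipped).
    cols = [i for i, ch in enumerate(ref_row) if ch != "-"]
    ref_seq = "".join(ref_row[i] for i in cols)

    hits = []
    for m in re.finditer(pattern, ref_seq):
        if m.end() == m.start():
            continue  # ignore empty matches
        motif = m.group()
        hit_cols = cols[m.start():m.end()]
        conservation = {}
        for name in names[1:]:
            row = block[name]
            matches = sum(
                1 for col, ch in zip(hit_cols, motif)
                if row[col].upper() == ch.upper()
            )
            # Matches relative to the motif length, as a percentage.
            conservation[name] = 100.0 * matches / len(motif)
        hits.append({"motif": motif, "start": m.start(),
                     "conservation": conservation})
    return hits


if __name__ == "__main__":
    example = {"ref": "ACG-TACGT", "s1": "ACGATACGA", "s2": "AC--TTCGT"}
    for hit in motif_conservation(example, "TACG"):
        print(hit)
```

A k-mer search is the special case where the pattern is a literal string; a Position Weight Matrix search would need a scoring step that this sketch does not attempt.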

Language: English

Cited by

0

MAFcounter: an efficient tool for counting the occurrences of k-mers in MAF files DOI Creative Commons

Michail Patsakis,

Kimonas Provatas,

Aris Karatzikos

et al.

BMC Bioinformatics, Journal year: 2025, Issue 26(1)

Published: May 30, 2025

Language: English

Cited by

0

Unraveling diversity by isolating peptide sequences specific to distinct taxonomic groups DOI Creative Commons
Eleftherios Bochalis,

Michail Patsakis,

Nikol Chantzi

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal year: 2025, Issue unknown

Published: Feb. 8, 2025

Abstract The identification of succinct, universal fingerprints that enable the characterization of individual taxonomies can reveal insights into trait development and can have widespread applications in pathogen diagnostics, human healthcare, ecology and biomes. Here, we investigated the existence of peptide k-mer sequences that are exclusively present in a specific taxonomy and absent in every other taxonomic level, termed quasi-primes. By analyzing proteomes across 24,073 species, we identified quasi-prime peptides specific to superkingdoms, kingdoms, and phyla, uncovering their taxonomic distributions and functional relevance. These peptides exhibit remarkable sequence uniqueness at six- and seven-amino-acid lengths, offering insights into evolutionary divergence and lineage-specific adaptations. Moreover, we show that human quasi-prime loci are more prone to harboring pathogenic variants, underscoring their functional significance. This study introduces quasi-primes and offers insights into their contributions to proteomic diversity, evolutionary pathways, and functional adaptations across the tree of life, while emphasizing their potential impact on human health and disease.
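The definition of a quasi-prime given above (a peptide k-mer present in one taxonomic group and absent from all others) can be sketched in a few lines of Python. The function names and the toy input are hypothetical and are not taken from the preprint; a real analysis over thousands of proteomes would need a far more memory-efficient index than the in-memory dictionary used here.

```python
from collections import defaultdict


def peptide_kmers(sequence, k):
    """Return the set of all length-k substrings of a protein sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def quasi_primes(proteomes, k):
    """Map each taxon to the k-mers that occur in that taxon only.

    proteomes: dict mapping taxon name -> list of protein sequences.
    """
    # Record, for every k-mer, the set of taxa in which it was seen.
    seen_in = defaultdict(set)
    for taxon, proteins in proteomes.items():
        for protein in proteins:
            for kmer in peptide_kmers(protein, k):
                seen_in[kmer].add(taxon)

    # Keep only the k-mers observed in exactly one taxon.
    exclusive = {taxon: set() for taxon in proteomes}
    for kmer, taxa in seen_in.items():
        if len(taxa) == 1:
            exclusive[next(iter(taxa))].add(kmer)
    return exclusive


if __name__ == "__main__":
    toy = {"A": ["MKTAY", "KTAYW"], "B": ["MKTGG"]}
    print(quasi_primes(toy, 3))
```

In the toy input, `MKT` occurs in both groups and is therefore excluded from both result sets.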

Language: English

Cited by

0

Cellular Activity of CQWW Nullomer-Derived Peptides DOI Creative Commons
Steven Shave, Rebecka Isaksson, Nhan T. Pham

et al.

ACS Omega, Journal year: 2025, Issue 10(7), pp. 6794 - 6800

Published: Feb. 11, 2025

Analysis of observed protein sequences across all species within the UniProtKB/Swiss-Prot data set reveals CQWW as the shortest absent stretch of amino acids. While DNA can be found encoding the CQWW sequence, it has never been observed to be translated or included in manually curated sets of proteins, existing only in predicted, tentative sequences and in a single mature antibody sequence. We have synthesized this "nullomer" peptide, along with 13 derivatives, including reversed, truncated, stereoisomers, and alanine-scanning peptides, conjugated to polyarginine stretches to increase cellular uptake. We observed their impact against a healthy neuronal line and six patient-derived glioblastoma cell lines spanning three clinical subtypes. Results reveal IC50 values averaging 4.9 μM for inhibition of survival of the tested oncogenic lines. High-content phenotypic analysis of cell features and reverse-phase protein arrays failed to discern a clear mode of action of the nullomer peptide but suggests mitochondrial impairment through GSK3 isoforms, supported by observations of reduced mitochondrial stain intensities. With recent interest in nullomer peptides, we see the results of this study as a starting point for further investigation into this potentially therapeutic peptide class.
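The idea of a shortest absent stretch of amino acids can be made concrete with a brute-force Python sketch. It enumerates every possible peptide of length 1, 2, 3, and so on over a given alphabet and returns the first length at which some peptide never occurs in the input sequences. The function name, the `max_k` cap, and the toy input are assumptions for illustration; this is not the procedure used in the article, and it has not been run against UniProtKB/Swiss-Prot.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def shortest_absent_peptides(sequences, max_k=4, alphabet=AMINO_ACIDS):
    """Return (k, absent) for the smallest k with at least one absent k-mer.

    sequences: iterable of protein sequences (strings).
    Returns (None, []) if every k-mer up to max_k is observed.
    """
    sequences = list(sequences)
    for k in range(1, max_k + 1):
        observed = set()
        for seq in sequences:
            for i in range(len(seq) - k + 1):
                observed.add(seq[i:i + k])
        # Enumerate all alphabet**k candidates and keep the unseen ones.
        absent = [
            "".join(candidate)
            for candidate in product(alphabet, repeat=k)
            if "".join(candidate) not in observed
        ]
        if absent:
            return k, absent
    return None, []


if __name__ == "__main__":
    # Toy two-letter alphabet so the result is easy to check by hand.
    print(shortest_absent_peptides(["AACCA"], max_k=3, alphabet="AC"))
```

With 20 residues there are 160,000 candidate 4-mers, so exhaustive enumeration is cheap at the lengths discussed here, but the candidate count grows by a factor of 20 with each extra residue.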

Language: English

Cited by

0

A distribution-guided Mapper algorithm DOI Creative Commons

Yuyang Tao,

Shufei Ge

BMC Bioinformatics, Journal year: 2025, Issue 26(1)

Published: Mar. 5, 2025

The Mapper algorithm is an essential tool for exploring the data shape in topological data analysis. With a dataset as an input, the Mapper algorithm outputs a graph representing the topological features of the whole dataset. This graph is often regarded as an approximation of a Reeb graph of the data. The classic Mapper algorithm uses fixed interval lengths and overlapping ratios, which might fail to reveal subtle features of a dataset, especially when the underlying structure is complex. In this work, we introduce a distribution-guided Mapper algorithm named D-Mapper, which utilizes the property of the probability model and data intrinsic characteristics to generate density-guided covers and provide enhanced topological features. Moreover, we introduce a metric accounting for both the quality of overlap clustering and extended persistent homology to measure the performance of Mapper-type algorithms. Our numerical experiments indicate that the D-Mapper outperforms the classic Mapper algorithm in various scenarios. We also apply the D-Mapper to a SARS-COV-2 coronavirus RNA sequence dataset to explore the topological structure of different virus variants. The results indicate that the D-Mapper algorithm can reveal both the vertical and horizontal evolutionary processes of the viruses. The code is available at https://github.com/ShufeiGe/D-Mapper . The D-Mapper generates covers from the data based on a probability model. This work demonstrates the power of fusing probabilistic models with the Mapper algorithm.
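The fixed-length, fixed-overlap cover used by the classic Mapper algorithm can be sketched as follows. This shows only the baseline that the abstract contrasts against, not D-Mapper itself. As a deliberate simplification, each interval's preimage is treated as a single node, with no clustering step inside the preimage; the function names are hypothetical.

```python
def fixed_cover(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] with n equal intervals overlapping by a fixed ratio.

    overlap is the fraction of each interval shared with its neighbour
    (0 <= overlap < 1).
    """
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    cover = [(lo + i * step, lo + i * step + length)
             for i in range(n_intervals)]
    # Pin the final endpoint to hi to avoid losing the maximum to rounding.
    cover[-1] = (cover[-1][0], hi)
    return cover


def mapper_graph(filter_values, n_intervals, overlap):
    """Build a simplified Mapper graph from 1-D filter values.

    Each non-empty interval preimage becomes one node (no clustering);
    two nodes are joined by an edge when they share at least one point.
    """
    cover = fixed_cover(min(filter_values), max(filter_values),
                        n_intervals, overlap)
    nodes = []
    for start, end in cover:
        members = {i for i, v in enumerate(filter_values)
                   if start <= v <= end}
        if members:
            nodes.append(members)
    edges = [(i, j)
             for i in range(len(nodes))
             for j in range(i + 1, len(nodes))
             if nodes[i] & nodes[j]]
    return cover, nodes, edges


if __name__ == "__main__":
    values = [0, 1, 3, 4, 6, 8, 10]
    print(mapper_graph(values, n_intervals=3, overlap=0.5))
```

Because every interval has the same length regardless of how the points are distributed, sparse and dense regions of the filter range are treated alike; that limitation is what a density-guided cover is meant to address.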

Language: English

Cited by

0

Inter-view contrastive learning and miRNA fusion for lncRNA-protein interaction prediction in heterogeneous graphs DOI Creative Commons
Yijun Mao,

Jiale Wu,

Jian Weng

et al.

Briefings in Bioinformatics, Journal year: 2025, Issue 26(2)

Published: Mar. 1, 2025

Abstract Predicting long non-coding RNA (lncRNA)-protein interactions is essential for understanding biological processes and discovering new therapeutic targets. In this study, we propose a novel model based on inter-view contrastive learning and miRNA fusion for lncRNA-protein interaction (LPI) prediction, called ICMF-LPI, which utilizes a heterogeneous information network to enhance LPI prediction. The model integrates miRNA as a mediator, constructing an lncRNA-miRNA-protein heterogeneous network, and employs metapath to extract diverse relationships from heterogeneous graphs. By fusing miRNA-related information and leveraging contrastive learning across inter-views, ICMF-LPI effectively captures potential interactions. Experimental results, including five-fold cross-validation, demonstrate the model’s superior performance compared with several state-of-the-art methods, with significant improvements in the area under the receiver operating characteristic curve and the area under the precision-recall curve metrics. Notably, even when direct lncRNA-protein connections are excluded, ICMF-LPI still achieves competitive predictive accuracy, performing comparably or better than some existing models. This demonstrates that the proposed model is effective in scenarios where direct interaction data is unavailable. This approach offers a promising direction for developing predictive models in bioinformatics, particularly in challenging data conditions.

Language: English

Cited by

0

Establishing genome sequencing and assembly for non-model and emerging model organisms: a brief guide DOI Creative Commons
Tilman Schell, Carola Greve, Lars Podsiadłowski

et al.

Frontiers in Zoology, Journal year: 2025, Issue 22(1)

Published: Apr. 17, 2025

Abstract Reference genome assemblies are the basis for comprehensive genomic analyses and comparisons. Due to declining sequencing costs and growing computational power, genome projects are now feasible in smaller labs. De novo genome sequencing for non-model or emerging model organisms requires knowledge about genome size and techniques for extracting high molecular weight DNA. Next to quality, the amount of DNA obtained from single individuals is crucial, especially when dealing with small organisms. While long-read sequencing technologies are the methods of choice for creating high quality genome assemblies, pure short-read assemblies might bear most of the coding parts of a genome but are usually much more fragmented and do not well resolve repeat elements or structural variants. Several genome initiatives produce more and more non-model organism genomes and provide rules for standards in genome sequencing and assembly. However, sometimes the organism of choice is not part of such an initiative or does not meet its standards. Therefore, if the scientific question can be answered with genome assemblies of low contiguity in intergenic parts, missing the high standards of chromosome scale assembly should not prevent publication. This review describes how to set up an animal genome sequencing project in the lab, how to estimate costs and resources, and how to deal with suboptimal conditions. Thus, we aim to suggest optimal strategies for genome sequencing that fulfil the needs according to specific research questions, e.g. “How are species related to each other based on whole genomes?” (phylogenomics), “How do genomes of populations within a species differ?” (population genomics), “Are differences between populations relevant for conservation?” (conservation genomics), “Which selection pressure is acting on certain genes?” (identification of genes under selection), “Did repeats expand or contract recently?” (repeat dynamics).

Language: English

Cited by

0

The architecture of the genome integrates scale independence with inverse symmetry DOI

Greg Warr,

Les Hatton

Academia molecular biology and genomics, Journal year: 2025, Issue 2(2)

Published: Apr. 18, 2025

The simplest building blocks of the genome, k-mers, show two properties that are widely observed. Their frequency distribution is scale-free (a variant of the Zipfian distribution), and inverse symmetry of k-mers is observable on the same strand. These two phenomena are linked; Watson–Crick base pairing generates inverse symmetry (IS) under the condition that a scale-free distribution of k-mers is present on both strands of the genome. A stable equilibrium of the scale-free k-mer distribution in all genomes is predicted by a purely probabilistic theory, the Conservation of Hartley–Shannon Information (CoHSI). This theory does not replace the diverse mechanism-based explanations of IS that have been advanced, but in principle, it aggregates all operative mechanisms. CoHSI predicts that IS follows from the scale-free distribution and that its precision should decay gradually and stochastically as genome size decreases and k-mer length increases. These predictions were tested in 178 genomes from the three domains of life and viruses. The precision of IS decayed progressively as genome size decreased and k-mer length increased, regardless of the structure of the genome; DNA or RNA, nuclear or plastid, double- or single-stranded. No clear partition of genomes into IS-compliant and non-compliant could be inferred. These results suggest that scale-free distributions and IS are linked and emerge probabilistically in a mechanism-agnostic manner across the three domains of life and viruses.
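Inverse symmetry on a single strand means that a k-mer and its reverse complement occur with roughly equal frequency in the same sequence. The following Python sketch counts k-mers on one strand and pairs each count with that of its reverse complement. The function names are illustrative and not taken from the article; the sketch assumes a DNA alphabet of A, C, G, and T and makes no attempt at the statistical analysis described in the abstract.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA k-mer."""
    return kmer.translate(COMPLEMENT)[::-1]


def kmer_counts(sequence, k):
    """Count every overlapping k-mer on the given strand."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def inverse_symmetry_pairs(sequence, k):
    """Pair each k-mer count with its reverse-complement count.

    Returns a dict keyed by the lexicographically smaller member of each
    pair, with values (count of key, count of its reverse complement).
    Under perfect inverse symmetry the two counts in each pair are equal.
    """
    counts = kmer_counts(sequence.upper(), k)
    pairs = {}
    for kmer in counts:
        key = min(kmer, reverse_complement(kmer))
        pairs[key] = (counts[key], counts[reverse_complement(key)])
    return pairs


if __name__ == "__main__":
    print(inverse_symmetry_pairs("AACGTT", 2))
```

Self-complementary k-mers such as `CG` pair with themselves, so both entries of their tuple hold the same count.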

Language: English

Cited by

0