Genotype-to-Phenotype Associations with Frequented Region Variants DOI
Indika Kahanda,

Buwani Manuweera,

Brendan Mumey

et al.

2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Journal Year: 2023, Volume and Issue: 12, P. 114 - 119

Published: Dec. 5, 2023

A pangenome represents the entire sequence content and variation of a population. As collections of complete reference-quality genomes become more common, so does the prevalence of pangenomes, creating a need for scalable computational methods for their analysis. Previously, we developed FindFRs for identifying Frequented Regions in pangenome graphs, where a Frequented Region is a subgraph that is frequently traversed by multiple sequences. In this work, we propose FindFRs3, an updated version of FindFRs with improved runtime and memory efficiency, enabling the analysis of much larger pangenome graphs. In addition, FindFRs3 identifies Frequented Region Variants (the unique subpaths through each region). We demonstrate the utility of these variants by using them as input features for machine learning models that predict genotype-to-phenotype associations in a large yeast pangenome. Biological insights gained from these models show that this novel technique allows a more nuanced and detailed analysis of pangenomes.

Language: English
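
The abstract describes Frequented Region Variants as the unique subpaths through each region, used as input features for machine learning. The Python sketch below illustrates that encoding under simplifying assumptions: each genome is a list of node IDs along its path through a pangenome graph, the region is already given as a node set (Frequented Region discovery itself is not implemented), and a variant is taken to be a maximal run of consecutive path nodes inside the region. The function names are hypothetical and are not part of FindFRs3.

```python
from typing import Dict, List, Set, Tuple

Variant = Tuple[int, ...]


def region_variants(path: List[int], region: Set[int]) -> List[Variant]:
    """Split a path into maximal runs of consecutive nodes that lie inside the region."""
    variants: List[Variant] = []
    current: List[int] = []
    for node in path:
        if node in region:
            current.append(node)
        elif current:
            # Leaving the region closes the current subpath.
            variants.append(tuple(current))
            current = []
    if current:
        variants.append(tuple(current))
    return variants


def variant_features(paths: Dict[str, List[int]], region: Set[int]):
    """Return (columns, rows): one binary presence/absence row per genome."""
    per_genome = {name: set(region_variants(p, region)) for name, p in paths.items()}
    columns = sorted(set().union(*per_genome.values()))
    rows = {name: [int(v in found) for v in columns] for name, found in per_genome.items()}
    return columns, rows


if __name__ == "__main__":
    paths = {
        "strainA": [1, 2, 3, 5, 6],
        "strainB": [1, 2, 4, 5, 6],
        "strainC": [7, 2, 3, 5, 8],
    }
    columns, rows = variant_features(paths, region={2, 3, 4, 5})
    print(columns)  # [(2, 3, 5), (2, 4, 5)]
    print(rows)
```

Each row can then be fed to any classifier as a feature vector; strainA and strainC share one variant of the region while strainB takes the alternative subpath.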

PanKmer: k-mer-based and reference-free pangenome analysis DOI Creative Commons
Anthony Aylward,

Semar Petrus,

Allen Mamerto

et al.

Bioinformatics, Journal Year: 2023, Volume and Issue: 39(10)

Published: Oct. 1, 2023

Summary: Pangenomes are replacing single reference genomes as the definitive representation of DNA sequence within a species or clade. Pangenome analysis predominantly leverages graph-based methods that require computationally intensive multiple genome alignments, do not scale to highly complex eukaryotic genomes, limit their scope to identifying structural variants (SVs), or incur bias by relying on a reference genome. Here, we present PanKmer, a toolkit designed for reference-free analysis of pangenome datasets consisting of dozens to thousands of individual genomes. PanKmer decomposes a set of input genomes into a table of observed k-mers and their presence–absence values in each genome. These are stored in an efficient k-mer index data format that encodes SNPs, INDELs, and SVs. PanKmer also includes functions for downstream analysis of the k-mer index, such as calculating sequence similarity statistics between individuals at whole-genome or local scales. For example, k-mers can be "anchored" to any individual genome to quantify sequence variability or conservation at a specific locus. This facilitates workflows with various biological applications, e.g. identifying cases of hybridization between plant species. PanKmer provides researchers with a valuable and convenient means to explore the full genetic variation of a population, without reference bias.

Availability and implementation: PanKmer is implemented as a Python package with components written in Rust, released under a BSD license. The source code is available from the Python Package Index (PyPI) at https://pypi.org/project/pankmer/ as well as from Gitlab at https://gitlab.com/salk-tm/pankmer. Full documentation is available at https://salk-tm.gitlab.io/pankmer/.

Language: English

Citations

12
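
The central data structure in the PanKmer abstract is a table of observed k-mers with their presence–absence values in each genome. The Python sketch below builds such a table in memory and computes a whole-genome similarity between two genomes. It is a hypothetical illustration, not PanKmer's API or on-disk index format; using canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement) and Jaccard similarity as the statistic are assumptions.

```python
from typing import Dict, Set

VALID = frozenset("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_set(seq: str, k: int) -> Set[str]:
    """Canonical k-mers of a sequence, skipping windows with non-ACGT characters."""
    seq = seq.upper()
    return {
        canonical(seq[i:i + k])
        for i in range(len(seq) - k + 1)
        if VALID.issuperset(seq[i:i + k])
    }


def presence_absence(genomes: Dict[str, str], k: int) -> Dict[str, Dict[str, int]]:
    """Map each observed k-mer to a {genome: 0/1} presence-absence row."""
    sets = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    observed = sorted(set().union(*sets.values()))
    return {km: {name: int(km in ks) for name, ks in sets.items()} for km in observed}


def jaccard(table: Dict[str, Dict[str, int]], a: str, b: str) -> float:
    """Shared k-mers divided by k-mers present in either genome."""
    in_a = {km for km, row in table.items() if row[a]}
    in_b = {km for km, row in table.items() if row[b]}
    union = in_a | in_b
    return len(in_a & in_b) / len(union) if union else 1.0


if __name__ == "__main__":
    table = presence_absence({"g1": "ACGTAC", "g2": "ACGTAA"}, k=3)
    for kmer, row in table.items():
        print(kmer, row)
    print(jaccard(table, "g1", "g2"))
```

Restricting the same computation to k-mers drawn from one genome's locus would give a local rather than whole-genome comparison, which is the idea behind anchoring described above.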

Genome Survey of Sphallerocarpus gracilis Based on High-throughput Sequencing DOI Creative Commons
Shiming Qi, Chunmei Zhang,

Fang Yan

et al.

Research Square (Research Square), Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 24, 2025

Sphallerocarpus gracilis is a high-value medicinal and green health food product. Analyzing the genomic characteristics of S. gracilis can lay a theoretical foundation for whole-genome sequencing and for research on the molecular mechanisms underlying the biosynthesis of its bioactive ingredients. In this study, genome survey technology was employed to evaluate the genomic characteristics of S. gracilis using K-mer analysis, and smudgeplot was used to infer its chromosome ploidy. The results showed that the genome size of the sample was approximately 1,071 Mb, or 1,063 Mb after correction. The heterozygosity rate, proportion of repeat sequences, and GC content were determined to be 1.22%, 76.33%, and 35.70%, respectively. Based on the maximum possible ploidy analysis, the species was assigned the AB type, corresponding to a diploid plant. BLAST analysis revealed a close relationship between S. gracilis and Daucus carota (4.78%). In summary, the results indicate that S. gracilis has a large, complex, and highly repetitive genome. This study provides a basis for future related research.

Language: English

Citations

0
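
K-mer-based genome survey estimates of the kind reported for S. gracilis commonly rest on the relation G ≈ N / D, where N is the total number of non-error k-mers and D is the depth of the main k-mer coverage peak. The Python sketch below applies that relation to a k-mer depth histogram. It is a simplified illustration under stated assumptions: a fixed low-depth cutoff (the hypothetical `min_depth` parameter) stands in for proper error-peak removal, and the study's actual tools and correction procedure are not described here. In heterozygous genomes the heterozygous peak at half depth can be taller than the homozygous peak, in which case the peak would have to be chosen explicitly.

```python
from typing import Dict


def estimate_genome_size(histogram: Dict[int, int], min_depth: int = 2) -> float:
    """Estimate genome size as total k-mers / main peak depth.

    `histogram` maps k-mer depth -> number of distinct k-mers at that depth.
    Bins below `min_depth` are treated as sequencing errors and discarded.
    """
    kept = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    # Depth bin holding the most distinct k-mers = main coverage peak.
    peak_depth = max(kept, key=kept.get)
    # Each distinct k-mer observed `depth` times contributes `depth` k-mers.
    total_kmers = sum(depth * n for depth, n in kept.items())
    return total_kmers / peak_depth


if __name__ == "__main__":
    toy = {1: 1000, 2: 50, 10: 100, 20: 400, 21: 300, 40: 20}
    print(estimate_genome_size(toy))  # 810.0
```

With a real histogram the result is in bases; the toy numbers above are invented purely to exercise the arithmetic.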

Inferring Staphylococcus aureus host species and cross-species transmission from a genome-based model DOI Creative Commons

Wenyin Du,

Sitong Chen, Rong Jiang

et al.

BMC Genomics, Journal Year: 2025, Volume and Issue: 26(1)

Published: Feb. 17, 2025

Language: English

Citations

0

Pato: prediction of probiotic bacteria using metabolic features DOI
Rafaella Sinnott Dias,

Daniela Peres Martinez,

Fábio Pereira Leivas Leite

et al.

Brazilian Journal of Microbiology, Journal Year: 2025, Volume and Issue: unknown

Published: April 23, 2025

Language: English

Citations

0

SNP-Based and Kmer-Based eQTL Analysis Using Transcriptome Data DOI Creative Commons
Mei Ge, Chengyu Li, Zhiyan Zhang

et al.

Animals, Journal Year: 2024, Volume and Issue: 14(20), P. 2941 - 2941

Published: Oct. 11, 2024

Traditional expression quantitative trait locus (eQTL) mapping associates single nucleotide polymorphisms (SNPs) with gene expression, where the SNPs are derived from large-scale whole-genome sequencing (WGS) data or transcriptome data. While WGS provides a high SNP density, it also incurs substantial costs. In contrast, RNA-seq data, which are more accessible and less expensive, can simultaneously yield gene expression levels and SNPs. Thus, eQTL analysis based on RNA-seq data offers significant potential for applications. Two primary strategies were employed for eQTL analysis in this study. The first involved analyzing gene expression levels in relation to variant sites detected between populations. The second approach utilized k-mers, sequences of length k derived from RNA-seq reads, to represent variants, and associated these k-mer genotypes with gene expression. We discovered 87 association signals involving eGenes on the basis of the SNP-based analysis. These genes include

Language: English

Citations

2
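
The k-mer strategy in the eQTL abstract treats the presence or absence of a k-mer in each sample's RNA-seq reads as a genotype and tests it against gene expression. The Python sketch below shows the simplest form of that test, a Pearson correlation with its t-statistic on n - 2 degrees of freedom. The helper names and the `min_count` threshold are hypothetical, reverse complements are ignored, and a real eQTL analysis would add covariates, population structure, and multiple-testing correction, none of which are shown here.

```python
import math
from typing import List, Tuple


def kmer_genotype(reads: List[str], kmer: str, min_count: int = 2) -> int:
    """1 if the k-mer occurs at least `min_count` times (overlaps counted), else 0."""
    k = len(kmer)
    count = sum(
        1 for read in reads for i in range(len(read) - k + 1) if read[i:i + k] == kmer
    )
    return int(count >= min_count)


def association(genotypes: List[int], expression: List[float]) -> Tuple[float, float]:
    """Pearson r between 0/1 genotypes and expression, plus its t-statistic (n - 2 df)."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((e - mean_e) ** 2 for e in expression)
    if sxx == 0 or syy == 0:
        return 0.0, 0.0  # monomorphic k-mer or constant expression: nothing to test
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1:
        return r, math.copysign(math.inf, r)
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return r, t


if __name__ == "__main__":
    samples = [["AAGCTT", "CCCC"], ["GGGG"], ["AAGCTTAAGCTT"], ["TAAGCTT", "AAGCTTG"]]
    genos = [kmer_genotype(reads, "AAGCTT") for reads in samples]
    expr = [1.0, 2.0, 3.0, 4.0]
    print(genos, association(genos, expr))
```

The t-statistic can be converted to a p-value with any Student's t distribution function; that step is left out to keep the sketch dependency-free.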

Harnessing AI-Powered Genomic Research for Sustainable Crop Improvement DOI Creative Commons
Elżbieta Wójcik‐Gront, Bartłomiej Zieniuk, Magdalena Pawełkowicz

et al.

Agriculture, Journal Year: 2024, Volume and Issue: 14(12), P. 2299 - 2299

Published: Dec. 14, 2024

Artificial intelligence (AI) can revolutionize agriculture by enhancing genomic research and promoting sustainable crop improvement. AI systems integrate machine learning (ML) and deep learning (DL) with big data to identify complex patterns and relationships by analyzing vast genomic, phenotypic, and environmental datasets. This capability accelerates breeding cycles, improves predictive accuracy, and supports the development of climate-resilient, high-yielding crop varieties. Applications such as precision agriculture, automated phenotyping, analytics, and early pest and disease detection demonstrate AI's ability to optimize agricultural practices while promoting sustainability. Despite these advancements, challenges remain, including fragmented data sources, variability in phenotyping protocols, and data ownership concerns. Addressing these issues through standardized data integration frameworks, advanced analytical tools, and ethical practices will be critical for realizing AI's full potential. This review provides a comprehensive overview of AI-powered genomic research, highlights the role of data in training robust models, and explores technological considerations for sustainable agricultural practices.

Language: English

Citations

2

A k-mer-based pangenome approach for cataloging seed-storage-protein genes in wheat to facilitate genotype-to-phenotype prediction and improvement of end-use quality DOI Creative Commons

Zhaoheng Zhang,

Dan Liu,

Binyong Li

et al.

Molecular Plant, Journal Year: 2024, Volume and Issue: 17(7), P. 1038 - 1053

Published: May 24, 2024

Wheat is a staple food for more than 35% of the world's population, with wheat flour used to make hundreds of baked goods. Superior end-use quality is a major breeding target; however, improving it is especially time-consuming and expensive. Furthermore, genes encoding seed-storage proteins (SSPs) form multi-gene families and are repetitive, and gaps are commonplace in several genome assemblies. To overcome these barriers and efficiently identify superior SSP alleles, we developed "PanSK" (Pan-SSP k-mer) for genotype-to-phenotype prediction based on an SSP-based pangenome resource. PanSK uses 29-mer sequences that represent each SSP gene at the pangenomic level to reveal untapped diversity across landraces and modern cultivars. Genome-wide association studies using k-mers identified 23 associated SSP genes as novel targets for improvement. We evaluated the effect of rye secalins and found that removal of ω-secalins from 1BL/1RS translocation lines enhanced end-use quality. Finally, using machine-learning-based prediction inspired by PanSK, we predicted phenotypes with high accuracy from genotypes alone. This study provides an effective approach for the design of SSP genes, enabling varieties with improved processing capabilities.

Language: English

Citations

1
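
PanSK is described as using 29-mers that represent each SSP gene at the pangenomic level. One plausible way to build such representative k-mers, sketched below in Python, is to keep only the k-mers that occur in exactly one gene sequence (diagnostic k-mers) and then call a gene present in a sample when most of its diagnostic k-mers are found. This selection rule, the `min_fraction` threshold, and all function names are assumptions for illustration, not the published PanSK procedure; reverse complements are ignored for brevity.

```python
from collections import defaultdict
from typing import Dict, Set

K = 29  # k-mer length reported for PanSK


def kmers(seq: str, k: int = K) -> Set[str]:
    """All distinct k-mers of a sequence (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def diagnostic_kmers(genes: Dict[str, str], k: int = K) -> Dict[str, Set[str]]:
    """For each gene, the k-mers that occur in that gene and in no other."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in genes.items():
        for km in kmers(seq, k):
            owners[km].add(name)
    markers: Dict[str, Set[str]] = {name: set() for name in genes}
    for km, names in owners.items():
        if len(names) == 1:
            markers[next(iter(names))].add(km)
    return markers


def detect_genes(sample_seq: str, markers: Dict[str, Set[str]], k: int = K,
                 min_fraction: float = 0.9) -> Set[str]:
    """Genes whose diagnostic k-mers are found in the sample at >= min_fraction."""
    sample = kmers(sample_seq, k)
    return {
        name for name, ks in markers.items()
        if ks and len(ks & sample) / len(ks) >= min_fraction
    }


if __name__ == "__main__":
    # Toy sequences with k=4 so the example stays readable.
    genes = {"geneA": "AAAACCCC", "geneB": "AAAAGGGG"}
    markers = diagnostic_kmers(genes, k=4)
    print(detect_genes("TTAAAACCCCTT", markers, k=4))  # {'geneA'}
```

The shared k-mer `AAAA` is dropped because it cannot tell the two toy genes apart; the resulting per-sample presence calls could then serve as genotype features for association or prediction.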
