PanKmer: k-mer-based and reference-free pangenome analysis
Anthony Aylward, Semar Petrus, Allen Mamerto et al.
Bioinformatics, Journal Year: 2023, Volume and Issue: 39(10)
Published: Oct. 1, 2023
Abstract
Summary
Pangenomes are replacing single reference genomes as the definitive representation of DNA sequence within a species or clade. Pangenome analysis predominantly leverages graph-based methods that require computationally intensive multiple genome alignments, do not scale to highly complex eukaryotic genomes, limit their scope to identifying structural variants (SVs), or incur bias by relying on a reference genome. Here, we present PanKmer, a toolkit designed for reference-free analysis of pangenome datasets consisting of dozens to thousands of individual genomes. PanKmer decomposes a set of input genomes into a table of observed k-mers and their presence–absence values in each genome. These are stored in an efficient k-mer index data format that encodes SNPs, INDELs, and SVs. It also includes functions for downstream analysis of the k-mer index, such as calculating sequence similarity statistics between individuals at whole-genome or local scales. For example, k-mers can be “anchored” in any individual genome to quantify sequence variability or conservation at a specific locus. This facilitates workflows with various biological applications, e.g. identifying cases of hybridization between plant species. PanKmer provides researchers with a valuable and convenient means to explore the full scope of genetic variation in a population, without reference bias.
Availability and implementation
PanKmer is implemented as a Python package with components written in Rust, released under a BSD license. The source code is available from the Python Package Index (PyPI) at https://pypi.org/project/pankmer/ as well as Gitlab at https://gitlab.com/salk-tm/pankmer. Full documentation is available at https://salk-tm.gitlab.io/pankmer/.
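The idea of a k-mer presence–absence table described in the Summary can be illustrated with a small, plain-Python sketch. This is not PanKmer's API or index format: the function names (`kmer_set`, `presence_absence_table`, `jaccard`), the use of canonical k-mers, and the choice of Jaccard similarity as the whole-genome statistic are all assumptions made for illustration.

```python
# Illustrative sketch only; none of these names come from PanKmer.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_set(sequence, k):
    """Collect the canonical k-mers of one sequence, skipping windows with non-ACGT bases."""
    seq = sequence.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= set("ACGT"):
            kmers.add(canonical(window))
    return kmers


def presence_absence_table(genomes, k):
    """Map each observed k-mer to a list of booleans, one per genome (hypothetical layout)."""
    names = list(genomes)
    table = {}
    for column, name in enumerate(names):
        for kmer in kmer_set(genomes[name], k):
            table.setdefault(kmer, [False] * len(names))[column] = True
    return names, table


def jaccard(table, i, j):
    """Jaccard similarity between genome columns i and j of the table."""
    shared = sum(1 for row in table.values() if row[i] and row[j])
    union = sum(1 for row in table.values() if row[i] or row[j])
    return shared / union if union else 0.0


if __name__ == "__main__":
    toy_genomes = {"a": "ACGTACGT", "b": "ACGTACGT", "c": "TTTTTTTT"}
    names, table = presence_absence_table(toy_genomes, k=3)
    print(names, jaccard(table, 0, 1), jaccard(table, 0, 2))
```

A dictionary of boolean lists is convenient for a toy example but would not scale to thousands of genomes; a real index needs a compact encoding.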
Language: English
Genome Survey of Sphallerocarpus gracilis Based on High-throughput Sequencing
Shiming Qi, Chunmei Zhang, Fang Yan et al.
Research Square, Journal Year: 2025, Volume and Issue: unknown
Published: Jan. 24, 2025
Abstract
Sphallerocarpus gracilis is a high-value medicinal and green health food product. Analysis of the genomic characteristics of S. gracilis can lay a theoretical foundation for whole genome sequencing and for research on the molecular mechanisms of the biosynthesis of its bioactive ingredients. In this study, genome survey technology was employed to evaluate the genomic characteristics of S. gracilis using K-mer analysis, and smudgeplot was used to evaluate its chromosome ploidy. The results showed that the genome size of the sample was approximately 1,071 Mb, and the corrected genome size was 1,063 Mb. The heterozygosity rate, proportion of repeat sequences, and GC content were determined to be 1.22%, 76.33%, and 35.70%, respectively. Based on the maximum possible ploidy, the analyzed species was of the AB type, corresponding to a diploid plant. Blast analysis revealed that S. gracilis is closely related to Daucus carota (4.78%). In summary, the results indicate that S. gracilis has a large, complex, and highly repetitive genome. This study provides a basis for future related research.
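A common way to obtain a genome size estimate from K-mer analysis is to divide the total number of k-mer occurrences by the depth of the main peak in the k-mer depth histogram. The sketch below assumes that approach. It is not the authors' pipeline, and the function name, the histogram layout, and the `min_depth` cutoff for discarding likely error k-mers are all hypothetical.

```python
def estimate_genome_size(histogram, min_depth=2):
    """Estimate genome size (in bases) from a k-mer depth histogram.

    histogram maps a depth (how many times a k-mer occurs in the reads)
    to the number of distinct k-mers observed at that depth.
    Depths below min_depth are treated as sequencing errors and ignored
    (an assumed, simplistic error filter).
    """
    usable = {depth: count for depth, count in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("histogram has no entries at or above min_depth")
    # Depth with the most distinct k-mers, taken here as the main coverage peak.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences contributed by the retained depths.
    total_kmers = sum(depth * count for depth, count in usable.items())
    return total_kmers / peak_depth


if __name__ == "__main__":
    toy_histogram = {1: 1000, 9: 10, 10: 100, 11: 10}
    print(estimate_genome_size(toy_histogram))
```

In a heterozygous genome the histogram typically shows two peaks, and the tallest one is not always the homozygous peak, so a real analysis would fit a model to the histogram rather than take the maximum.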
Language: English
Inferring Staphylococcus aureus host species and cross-species transmission from a genome-based model
Wenyin Du, Sitong Chen, Rong Jiang et al.
BMC Genomics, Journal Year: 2025, Volume and Issue: 26(1)
Published: Feb. 17, 2025
Language: English
Pato: prediction of probiotic bacteria using metabolic features
Brazilian Journal of Microbiology, Journal Year: 2025, Volume and Issue: unknown
Published: April 23, 2025
Language: English
SNP-Based and Kmer-Based eQTL Analysis Using Transcriptome Data
Animals, Journal Year: 2024, Volume and Issue: 14(20), P. 2941 - 2941
Published: Oct. 11, 2024
Traditional expression quantitative trait locus (eQTL) mapping associates single nucleotide polymorphisms (SNPs) with gene expression, where the SNPs are derived from large-scale whole-genome sequencing (WGS) data or transcriptome data. While WGS provides a high SNP density, it also incurs substantial costs. In contrast, RNA-seq data, which are more accessible and less expensive, can simultaneously yield gene expressions and SNPs. Thus, eQTL analysis based on RNA-seq data offers significant potential applications. Two primary strategies were employed for eQTL analysis in this study. The first involved analyzing expression levels in relation to variant sites detected between populations. The second approach utilized kmers, sequences of length k from RNA-seq reads, to represent variant sites, and associated these kmer genotypes with gene expression. We discovered 87 association signals involving eGenes on the basis of the SNP-based analysis. These genes include
Language: English
Harnessing AI-Powered Genomic Research for Sustainable Crop Improvement
Agriculture, Journal Year: 2024, Volume and Issue: 14(12), P. 2299 - 2299
Published: Dec. 14, 2024
Artificial intelligence (AI) can revolutionize agriculture by enhancing genomic research and promoting sustainable crop improvement. AI systems integrate machine learning (ML) and deep learning (DL) with big data to identify complex patterns and relationships by analyzing vast genomic, phenotypic, and environmental datasets. This capability accelerates breeding cycles, improves predictive accuracy, and supports the development of climate-resilient, high-yielding crop varieties. Applications such as precision agriculture, automated phenotyping, analytics, and early pest and disease detection demonstrate AI’s ability to optimize agricultural practices while promoting sustainability. Despite these advancements, challenges remain, including fragmented data sources, variability in phenotyping protocols, and data ownership concerns. Addressing these issues through standardized data integration frameworks, advanced analytical tools, and ethical AI practices will be critical for realizing AI’s full potential. This review provides a comprehensive overview of AI-powered genomic research, highlights the role of data in training robust models, and explores technological considerations for agricultural practices.
Language: English
A k-mer-based pangenome approach for cataloging seed-storage-protein genes in wheat to facilitate genotype-to-phenotype prediction and improvement of end-use quality
Zhaoheng Zhang, Dan Liu, Binyong Li et al.
Molecular Plant, Journal Year: 2024, Volume and Issue: 17(7), P. 1038 - 1053
Published: May 24, 2024
Wheat is a staple food for more than 35% of the world's population, with wheat flour used to make hundreds of baked goods. Superior end-use quality is a major breeding target; however, improving it is especially time-consuming and expensive. Furthermore, genes encoding seed-storage proteins (SSPs) form multi-gene families and are repetitive, with gaps commonplace in several genome assemblies. To overcome these barriers and efficiently identify superior SSP alleles, we developed "PanSK" (Pan-SSP k-mer) for genotype-to-phenotype prediction based on an SSP-based pangenome resource. PanSK uses 29-mer sequences that represent each SSP gene at the pangenomic level to reveal untapped diversity across landraces and modern cultivars. Genome-wide association studies with k-mers identified 23 SSP genes associated with end-use quality that represent novel targets for improvement. We evaluated the effect of rye secalin genes on end-use quality and found that removal of ω-secalins from 1BL/1RS wheat translocation lines enhanced quality. Finally, using machine-learning-based prediction inspired by PanSK, we predicted quality phenotypes with high accuracy from genotypes alone. This study provides an effective approach for design based on SSP genes, enabling the breeding of wheat varieties with superior processing capabilities and improved end-use quality.
Language: English
Genotype-to-Phenotype Associations with Frequented Region Variants
Indika Kahanda, Buwani Manuweera, Brendan Mumey et al.
2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Journal Year: 2023, Volume and Issue: 12, P. 114 - 119
Published: Dec. 5, 2023
A pangenome represents the entire sequence content and variation of a population. As collections of complete reference quality genomes become more common, so does the prevalence of pangenomes, creating a need for scalable computational methods for their analysis. Previously, we developed FindFRs for identifying Frequented Regions in pangenome graphs, where a Frequented Region is a subgraph that is frequently traversed by multiple sequences. In this work, we propose FindFRs3, which is an updated version of FindFRs with improved runtime and memory efficiency, enabling the analysis of much larger pangenome graphs. In addition, FindFRs3 identifies Frequented Region Variants (the unique subpaths through each region). We demonstrate the utility of these variants by using them as input features for machine learning models that can predict genotype-to-phenotype associations in a large yeast pangenome. Biological insights gained from these models show that this novel technique allows a more nuanced and detailed analysis of pangenomes.
Language: English