Briefings in Bioinformatics, Journal Year: 2025, Volume and Issue: 26(2)
Published: March 1, 2025
Metagenomic data, comprising mixed multi-species genomes, are prevalent in diverse environments like oceans and soils, significantly impacting human health and ecological functions. However, current research relies on K-mer, which limits the capture of structurally and functionally relevant gene contexts. Moreover, these approaches struggle with encoding biologically meaningful genes and fail to address the one-to-many and many-to-one relationships inherent in metagenomic data. To overcome these challenges, we introduce FGeneBERT, a novel pre-trained model that employs a protein-based gene representation as a context-aware and structure-relevant tokenizer. FGeneBERT incorporates masked modeling to enhance the understanding of inter-gene contextual relationships and triplet-enhanced contrastive learning to elucidate sequence-function relationships. Pre-trained on over 100 million sequences, FGeneBERT demonstrates superior performance on datasets at four levels, spanning gene, functional, bacterial, and environmental levels and ranging from 1 to 213 k input sequences. Case studies of ATP synthase and gene operons highlight FGeneBERT's capability for functional recognition and its biological relevance in metagenomic research.
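The abstract mentions triplet-enhanced contrastive learning without giving the loss. As a rough illustration only, the sketch below shows a generic triplet margin loss on embedding vectors; the function names, the Euclidean distance, and the margin value are assumptions for illustration and are not taken from FGeneBERT.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet loss: max(0, d(anchor, positive) - d(anchor, negative) + margin).

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin` (a hypothetical hyperparameter here).
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical 2-D embeddings: the positive is close to the anchor, the negative is far.
well_separated = triplet_margin_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
# Same points with positive and negative swapped, so the loss is large.
violating = triplet_margin_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

In this toy case the first triplet already satisfies the margin, while the second is penalized in proportion to how far the ordering is violated.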
Genome Biology, Journal Year: 2025, Volume and Issue: 26(1)
Published: March 17, 2025
Loose-skin mandarins (LSMs) are among the oldest domesticated horticultural crops, yet their domestication history and the genetic basis underlying the formation of key selected traits remain unclear. We provide a chromosome-scale and haplotype-resolved assembly for the ancient Chinese citrus variety Nanfengmiju tangerine. Through the integration of 77 resequenced and 114 published germplasm genomes, we categorize LSMs into 12 distinct groups based on population genomic analyses. We infer that the ancestors of modern cultivated LSMs diverged from wild LSMs in Daoxian approximately 500,000 years ago, when they entered the Yangtze and Pearl River Basins. There, they were domesticated into four cultivation groups, forming the cornerstone of LSM cultivation. We identify selective sweeps, quantitative trait loci and genes related to important fruit quality traits, including sweetness and size. We reveal co-selection of sugar transporter and metabolism genes associated with increased sweetness. Significant alterations in auxin and gibberellin signaling networks may contribute to the enlargement of fruits. We also provide a comprehensive, high-spatiotemporal-resolution atlas of allelic gene expression during fruit development. We detect 5890 allele pairs showing specific expression patterns and a significant increase in variation levels. Our study provides valuable genomic resources and further revises the origin and domestication history of LSMs, offering insights for the improvement of citrus plants.
Nature Ecology & Evolution, Journal Year: 2024, Volume and Issue: 8(8), P. 1505 - 1521
Published: July 19, 2024
Species within nearly all extant animal lineages are capable of regenerating body parts. However, it remains unclear whether the gene expression programme controlling regeneration is evolutionarily conserved. Brittle stars are a species-rich class of echinoderms with outstanding regenerative abilities, but investigations into the genetic bases of regeneration in this group have been hindered by limited genomic resources. Here we report a chromosome-scale genome assembly for the brittle star Amphiura filiformis. We show that the brittle star genome is the most rearranged among echinoderms sequenced so far, featuring a reorganized Hox cluster reminiscent of the rearrangements observed in sea urchins. In addition, we performed an extensive profiling of gene expression during brittle star adult arm regeneration and identified sequential waves of gene expression governing wound healing, proliferation and differentiation. We conducted comparative transcriptomic analyses with other invertebrate and vertebrate models of appendage regeneration and uncovered hundreds of genes with conserved expression dynamics, particularly during the proliferative phase of regeneration. Our findings emphasize the crucial importance of echinoderms to detect long-range expression conservation between vertebrates and classical invertebrate regeneration model systems.
DNA Research, Journal Year: 2022, Volume and Issue: 29(6)
Published: Sept. 12, 2022
Homologous chromosomes in the diploid genome are thought to contain equivalent genetic information, but this common concept has not been fully verified in animal genomes with high heterozygosity. Here we report a near-complete, haplotype-phased genome assembly of the pearl oyster, Pinctada fucata, using hi-fidelity (HiFi) long reads and chromosome conformation capture data. This assembly includes 14 pairs of long scaffolds (>38 Mb) corresponding to chromosomes (2n = 28). The accuracy of the assembly, as measured by an analysis of k-mers, is estimated to be 99.99997%. Moreover, the haplotypes contain 95.2% and 95.9%, respectively, of the complete and single-copy BUSCO genes, demonstrating the high quality of the assembly. Transposons comprise 53.3% of the assembly and are a major contributor to structural variations. Despite overall collinearity between haplotypes, one of the chromosomal scaffolds contains megabase-scale non-syntenic regions, which necessarily have never been detected and resolved in conventional haplotype-merged assemblies. These regions encode expanded gene families of NACHT, DZIP3/hRUL138-like HEPN, and immunoglobulin domains, multiplying the immunity gene repertoire, which we hypothesize is important for the innate immune capability of pearl oysters. The pearl oyster genome provides insight into remarkable haplotype diversity in animals.
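For readers who think of assembly accuracy in Phred terms, the reported k-mer-based accuracy can be converted to a consensus quality value with the standard relation QV = -10 · log10(error rate). The sketch below applies that relation to the 99.99997% figure; the function name is hypothetical, and whether the authors report a QV in this form is not stated in the abstract.

```python
import math

def accuracy_to_qv(accuracy_percent):
    """Convert a per-base accuracy (in percent) to a Phred-scaled quality value.

    Uses the standard Phred relation QV = -10 * log10(error_rate),
    where error_rate = 1 - accuracy.
    """
    error_rate = 1.0 - accuracy_percent / 100.0
    if error_rate <= 0.0:
        raise ValueError("accuracy must be below 100% for a finite QV")
    return -10.0 * math.log10(error_rate)

# Accuracy quoted in the abstract above: 99.99997%
qv = accuracy_to_qv(99.99997)
print(f"QV is roughly {qv:.1f}")
```

An accuracy of 99.99997% corresponds to about three errors per ten million bases, which is a QV of roughly 65.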
Genome Research, Journal Year: 2022, Volume and Issue: 32(10), P. 1862 - 1875
Published: Sept. 15, 2022
Despite insertions and deletions being the most common structural variants (SVs) found across genomes, not much is known about how these SVs vary within populations and between closely related species, nor their significance in evolution. To address these questions, we characterized the evolution of indel SVs using genome assemblies of three
Horticulture Research, Journal Year: 2022, Volume and Issue: 10(1)
Published: Nov. 3, 2022
Sweet orange originated from the introgressive hybridizations of pummelo and mandarin, resulting in a highly heterozygous genome. How alleles from the two species cooperate in shaping sweet orange phenotypes under distinct circumstances is unknown. Here, we assembled a chromosome-level phased diploid Valencia sweet orange (DVS) genome with over 99.999% base accuracy and 99.2% gene annotation BUSCO completeness. DVS enables allele-level studies for sweet orange and other hybrids between pummelo and mandarin. We first configured an allele-aware transcriptomic profiling pipeline and applied it to 740 sweet orange transcriptomes. On average, 32.5% of genes have significantly biased allelic expression. Different cultivars, transgenic lineages, tissues, development stages, and disease status all impacted allelic expressions and resulted in diversified allelic expression patterns in sweet orange, but particularly citrus Huanglongbing (HLB) shifted the allelic expression of hundreds of genes in leaves and calyx abscission zones. In addition, we detected allelic structural mutations in an HLB-tolerant mutant (T19) and a more sensitive mutant (T78) through long-read sequencing. The irradiation-induced structural mutations mostly involved double-strand breaks, while most spontaneous structural mutations were transposon insertions. In the mutants, most genes with significant allelic expression ratio alterations (≥1.5-fold) were directly affected by those structural mutations. In T19, alleles located at a translocated segment terminal were upregulated, including
Briefings in Bioinformatics, Journal Year: 2023, Volume and Issue: 24(4)
Published: May 30, 2023
Pathogen detection from biological and environmental samples is important for global disease control. Despite advances in pathogen detection using deep learning, current algorithms have limitations in processing long genomic sequences. Through the cross-fusion of cross, residual and deep neural networks, we developed DCiPatho for accurate pathogen detection based on the integrated frequency features of 3-to-7 k-mers. Compared with the existing state-of-the-art algorithms, DCiPatho can be used to accurately identify distinct pathogenic bacteria infecting humans, animals and plants. We evaluated DCiPatho on both learned and unlearned pathogen species using both genomics and metagenomics datasets. DCiPatho is an effective tool for the genomic-scale identification of pathogens by integrating the frequency of k-mers into deep cross-fusion networks. The source code is publicly available at https://github.com/LorMeBioAI/DCiPatho.
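To make "integrated frequency features of 3-to-7 k-mers" concrete, the sketch below computes normalized k-mer frequencies for k = 3 to 7 and concatenates them into one feature vector. It is a minimal illustration written from the abstract alone: the function names are hypothetical, and details such as skipping windows with ambiguous bases, counting a single strand, and plain concatenation are assumptions, not the DCiPatho implementation.

```python
from itertools import product

def kmer_frequencies(seq, k):
    """Return normalized frequencies of all 4**k DNA k-mers in lexicographic order.

    Windows containing characters other than A, C, G, T are skipped
    (an assumption made for this sketch).
    """
    seq = seq.upper()
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    total = 0
    for start in range(len(seq) - k + 1):
        slot = index.get(seq[start:start + k])
        if slot is not None:
            counts[slot] += 1
            total += 1
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]

def integrated_kmer_features(seq, k_values=range(3, 8)):
    """Concatenate the frequency vectors for every k in k_values (3 to 7 by default)."""
    features = []
    for k in k_values:
        features.extend(kmer_frequencies(seq, k))
    return features

# Toy sequence; real inputs would be genome or contig sequences.
features = integrated_kmer_features("ACGTACGTAC")
print(len(features))  # 4**3 + 4**4 + 4**5 + 4**6 + 4**7 = 21824
```

Under these assumptions every sequence, whatever its length, maps to a fixed-length vector of 21,824 values, which is what lets a neural network consume long genomic sequences directly.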