Briefings in Bioinformatics, Journal Year: 2025, Volume and Issue: 26(2)
Published: March 1, 2025
Metagenomic data, comprising mixed multi-species genomes, are prevalent in diverse environments like oceans and soils, significantly impacting human health and ecological functions. However, current research relies on K-mer representations, which limits the capture of structurally and functionally relevant gene contexts. Moreover, these approaches struggle with encoding biologically meaningful genes and fail to address the one-to-many and many-to-one relationships inherent in metagenomic data. To overcome these challenges, we introduce FGeneBERT, a novel metagenomic pre-trained model that employs a protein-based gene representation as a context-aware and structure-relevant tokenizer. FGeneBERT incorporates masked gene modeling to enhance the understanding of inter-gene contextual relationships and triplet enhanced metagenomic contrastive learning to elucidate gene sequence-function relationships. Pre-trained on over 100 million metagenomic sequences, FGeneBERT demonstrates superior performance on metagenomic datasets at four levels, spanning gene, functional, bacterial, and environmental levels and ranging from 1 to 213 k input sequences. Case studies of ATP synthase and gene operons highlight FGeneBERT's capability for functional recognition and its biological relevance in metagenomic research.
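For reference, the K-mer tokenization that the abstract contrasts FGeneBERT against can be sketched as below. This is a minimal illustration, not code from the paper: the function name `kmer_tokenize`, the choice of `k = 3`, the stride of 1, and the toy sequence are all assumptions made for the example.

```python
def kmer_tokenize(sequence: str, k: int = 3) -> list[str]:
    """Split a nucleotide sequence into overlapping k-mers (stride 1).

    Hypothetical helper for illustration only; real pipelines may use
    a different stride, alphabet handling, or vocabulary mapping.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    # A sequence shorter than k yields no tokens.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Toy example: a 9-base sequence produces 7 overlapping 3-mers.
tokens = kmer_tokenize("ATGGCGTAA", k=3)
print(tokens)
```

Each token here covers only `k` adjacent bases, so a token by itself says nothing about which gene or protein those bases belong to. That is the kind of limited context the abstract attributes to K-mer based approaches.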

Plants, Journal Year: 2024, Volume and Issue: 13(17), P. 2462
Published: Sept. 3, 2024
Transposable elements (TEs) significantly contribute to the evolution and diversity of plant genomes. In this study, we explored the roles of TEs in the genomes of Citrus and Citrus-related genera by constructing a pan-genome TE library from 20 published accessions. Our results revealed an increase in TE content and in the number of TE types compared to the original annotations, as well as a decrease in unclassified TEs. The average TE length per assembly was approximately 194.23 Mb, representing 41.76% (Murraya paniculata) to 64.76% (Citrus gilletiana) of the genomes, with a mean value of 56.95%. A significant positive correlation was found between genome size and both the number of TE types and TE content. Consistent with the difference in whole-genome size (39.83 Mb) between Citrus and Citrus-related genera, Citrus genomes contained 34.36 Mb more TE sequences than Citrus-related genomes. Analysis of the estimated insertion time and half-life of long terminal repeat retrotransposons (LTR-RTs) suggested that TE removal was not the primary factor contributing to the differences among genomes. These findings collectively indicate that TEs are the primary determinants of genome size and play a major role in shaping genome structures. Principal coordinate analysis (PCoA) of Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) identifiers revealed that fragmented TEs were predominantly derived from ancestral genomes, while intact TEs were crucial in the recent evolutionary diversification of Citrus. Moreover, the presence or absence of intact TEs near the AdhE superfamily was closely associated with the bitterness trait in Citrus species. Overall, this study enhances TE annotation in Citrus and Citrus-related genomes and provides valuable data for future genetic breeding and agronomic research in Citrus.

Molecular Ecology Resources, Journal Year: 2025, Volume and Issue: unknown
Published: Jan. 20, 2025
Pardosa spiders, belonging to the wolf spider family Lycosidae, play a vital role in maintaining the health of forest and agricultural ecosystems due to their function in pest control. This study presents chromosome-level genome assemblies for two allied Pardosa species, P. laura and P. agraria. Both species' genomes show a notable expansion of helitron transposable elements, which contributes to their large genome sizes. Methylome analysis indicates that one of the two species has higher overall DNA methylation levels than the other, which may not only aid transposable element-driven genome expansion but also positively affect three-dimensional genome organisation after transposon amplification, thereby potentially enhancing genome stability. Genes associated with hyper-differentially methylated regions (compared between the two species) are enriched in functions related to mRNA processing and energy production. Furthermore, combined transcriptome and methylome profiling uncovered a complex regulatory interplay between DNA methylation and gene expression, emphasising the important role of gene body methylation in the regulation of gene expression. Comparative genomic analysis shows significant expansion of cuticle protein and detoxification-related gene families in one of the species, which may improve its adaptability to various habitats. This study provides essential genomic and methylomic insights, offering a deeper understanding of the relationship between transposable elements and genome stability, and illuminating the adaptive evolution and species differentiation among allied spiders.

Nature Communications, Journal Year: 2025, Volume and Issue: 16(1)
Published: March 12, 2025
Comparative genomic studies can identify genes under evolutionary constraint or specialized for trait innovation. Growing evidence suggests that evolutionary constraint also acts on non-coding regulatory sequences, exerting significant impacts on fitness-related traits, although it has yet to be thoroughly explored in plants. Using the assay for transposase-accessible chromatin by sequencing (ATAC-seq), we profile over 80,000 maize accessible chromatin regions (ACRs), revealing that ACRs evolve faster than coding genes, with about one-third being maize-specific and regulating genes associated with speciation. We highlight the role of transposable elements (TEs) in driving intraspecific innovation of ACRs and identify hundreds of candidate ACRs potentially involved in transcriptional rewiring during maize domestication. Additionally, we demonstrate the importance of accessible chromatin in maintaining subgenome dominance and controlling complex trait variations. This study establishes a framework for analyzing the evolutionary trajectory of plant regulatory sequences and offers candidate loci for downstream exploration and application in maize breeding.

Intricate regulation of gene expression is important for the execution of biological processes. Here, the authors generate a comprehensive accessible chromatin map integrating ATAC-seq data of 12 major maize tissues and explore their interspecific evolutionary constraints in multiple Poaceae genomes.