Briefings in Bioinformatics, Journal Year: 2025, Volume and Issue: 26(2), Published: March 1, 2025
Metagenomic data, comprising mixed multi-species genomes, are prevalent in diverse environments like oceans and soils, significantly impacting human health and ecological functions. However, current research relies on K-mer, which limits the capture of structurally and functionally relevant gene contexts. Moreover, these approaches struggle with encoding biologically meaningful genes and fail to address the one-to-many and many-to-one relationships inherent in metagenomic data. To overcome these challenges, we introduce FGeneBERT, a novel metagenomic pre-trained model that employs a protein-based gene representation as a context-aware and structure-relevant tokenizer. FGeneBERT incorporates masked gene modeling to enhance the understanding of inter-gene contextual relationships and triplet enhanced metagenomic contrastive learning to elucidate gene sequence-function relationships. Pre-trained on over 100 million metagenomic sequences, FGeneBERT demonstrates superior performance on metagenomic datasets at four levels, spanning gene, functional, bacterial, and environmental levels and ranging from 1 to 213 k input sequences. Case studies of ATP synthase and gene operons highlight FGeneBERT's capability for functional recognition and its biological relevance in metagenomic research.
Nature, Journal Year: 2021, Volume and Issue: 590(7845), P. 284 - 289, Published: Jan. 18, 2021
Abstract
Lungfishes belong to lobe-finned fish (Sarcopterygii) that, in the Devonian period, ‘conquered’ the land and ultimately gave rise to all land vertebrates, including humans [1–3]. Here we determine the chromosome-quality genome of the Australian lungfish (Neoceratodus forsteri), which is known to have the largest genome of any animal. The vast size of this genome, which is about 14× larger than that of humans, is attributable mostly to huge intergenic regions and introns with high repeat content (around 90%), the components of which resemble those of tetrapods (comprising mainly long interspersed nuclear elements) more than they do those of ray-finned fish. The lungfish genome continues to expand independently (its transposable elements are still active), through mechanisms different to those of the enormous genomes of salamanders. The 17 fully assembled lungfish macrochromosomes maintain synteny to other vertebrate chromosomes, and all microchromosomes maintain conserved ancient homology with the ancestral vertebrate karyotype. Our phylogenomic analyses confirm previous reports that lungfish occupy a key evolutionary position as the closest living relatives to tetrapods [4,5], underscoring the importance of lungfish for understanding innovations associated with terrestrialization. Lungfish preadaptations to living on land include the gain of limb-like expression in developmental genes such as hoxc13 and sall1 in their lobed fins. Increased rates of evolution and the duplication of genes associated with obligate air-breathing, such as lung surfactants and the expansion of odorant receptor gene families (which encode proteins involved in detecting airborne odours), contribute to the tetrapod-like biology of lungfishes. These findings advance our understanding of this major transition during vertebrate evolution.
Horticulture Research, Journal Year: 2023, Volume and Issue: 10(5), Published: April 4, 2023
Grapevine is one of the most economically important crops worldwide. However, the previous versions of the grapevine reference genome typically consist of thousands of fragments with missing centromeres and telomeres, limiting the accessibility of the repetitive sequences, the centromeric and telomeric regions, and the study of inheritance of important agronomic traits in these regions. Here, we assembled a telomere-to-telomere (T2T) gap-free reference genome for the cultivar PN40024 using PacBio HiFi long reads. The T2T reference genome (PN_T2T) is 69 Mb longer, with 9018 more genes identified, than the 12X.v0 version. We annotated 67% repetitive sequences, 19 centromeres and 36 telomeres, and incorporated gene annotations of previous versions into the PN_T2T assembly. We detected a total of 377 gene clusters, which showed associations with complex traits, such as aroma and disease resistance. Even though PN40024 derives from nine generations of selfing, we still found genomic hotspots of heterozygous sites associated with biological processes, such as the oxidation-reduction process and protein phosphorylation. The fully annotated complete reference genome therefore constitutes an important resource for grapevine genetic studies and breeding programs.
Mobile DNA, Journal Year: 2022, Volume and Issue: 13(1), Published: March 30, 2022
In the study of transposable elements (TEs), the generation of a high confidence set of consensus sequences that represent the diversity of TEs found in a given genome is a key step in the path to investigate these fascinating genomic elements. Many algorithms and pipelines are available to automatically identify putative TE families present in a genome. Despite the availability of these valuable resources, producing a library of high-quality full-length TE consensus sequences largely remains a process of manual curation. This know-how is often passed on from mentor to mentee within research groups, making it difficult for those outside the field to access this highly specialised skill.
Communications Biology, Journal Year: 2023, Volume and Issue: 6(1), Published: Sept. 19, 2023
Abstract
Repetitive DNA sequences play critical roles in driving evolution, inducing variation, and regulating gene expression. In this review, we summarized the definition, arrangement, and structural characteristics of repeats. Besides, we introduced diverse biological functions of repeats and reviewed existing methods for automatic repeat detection, classification, and masking. Finally, we analyzed the type, structure, and regulation of repeats in the human genome and their role in the induction of complex diseases. We believe that this review will facilitate a comprehensive understanding of repeats and provide guidance for repeat annotation and in-depth exploration of its association with human diseases.
Proceedings of the Royal Society B Biological Sciences, Journal Year: 2020, Volume and Issue: 287(1933), Published: Aug. 26, 2020
Genome size (GS) variation is a fundamental biological characteristic; however, its evolutionary causes and consequences are the topic of ongoing debate. Whether GS is a neutral trait or one subject to selective pressures, and how strong these pressures are, may remain open questions. Fundamentally, the genomic sequences responsible for this variation directly impact the potential evolutionary outcomes and, equally, are the targets of different evolutionary pressures. For example, duplications and deletions of genic regions (large or small) can have immediate and drastic phenotypic effects, while an expansion or contraction of non-coding DNA is less likely to cause catastrophic phenotypic effects. However, in the long term, the accumulation or deletion of ncDNA is likely to have larger effects. Modern sequencing technologies are allowing the dissection of these proximate causes, but a combination of these new technologies with more traditional evolutionary experiments and approaches could revolutionize this debate and potentially resolve many of these arguments. Here, I discuss an ambitious way forward for GS research, putting it in the context of historical debates, theories and sometimes contradictory evidence, and highlighting the promise of combining new sequencing technologies and analytical developments with more traditional experimental evolution approaches.
Plant Biotechnology Journal, Journal Year: 2023, Volume and Issue: 21(11), P. 2348 - 2357, Published: Aug. 2, 2023
Summary
Millets are a class of nutrient‐rich coarse cereals with high resistance to abiotic stress; thus, they guarantee food security for people living in areas with extreme climatic conditions and provide stress‐related genetic resources for other crops. However, no platform is available to provide a comprehensive and systematic multi‐omics analysis for millets, which seriously hinders the mining of stress‐related genes and the molecular breeding of millets. Here, a free, web‐accessible, user‐friendly millets multi‐omics database platform (Milletdb, http://milletdb.novogene.com) has been developed. The Milletdb contains six millets and their one related species genomes, graph‐based pan‐genomics of pearl millet, and stress‐related multi‐omics data, which enable Milletdb to be the most complete millets multi‐omics database available. We stored GWAS (genome‐wide association study) results of 20 yield‐related trait data obtained under three environmental conditions [field (no stress), early drought and late drought] for 2 years in the database, allowing users to identify stress‐related genes that support yield improvement. Milletdb can simplify the functional genomics analysis of millets by providing users with different tools (e.g., ‘Gene mapping’, ‘Co‐expression’, ‘KEGG/GO Enrichment’ analysis, etc.). On the Milletdb platform, a gene PMA1G03779.1 was identified through ‘GWAS’, which has the potential to modulate yield and respond to different environmental stresses. Using the tools provided by Milletdb, we found that the stress‐related PLATZs TFs (transcription factors) family expands in 87.5% of millet accessions and contributes to vegetative growth and abiotic stress responses. Milletdb can effectively serve researchers in the mining of key genes, genome editing and molecular breeding of millets.
Genome Biology and Evolution, Journal Year: 2023, Volume and Issue: 15(9), Published: Sept. 1, 2023
Bats are exceptional among mammals for their powered flight, extended lifespans, and robust immune systems and therefore have been of particular interest in comparative genomics. Using the Oxford Nanopore Technologies long-read platform, we sequenced the genomes of two bat species with key phylogenetic positions, the Jamaican fruit bat (Artibeus jamaicensis) and the Mesoamerican mustached bat (Pteronotus mesoamericanus), and carried out a comprehensive comparative genomic analysis with a diverse collection of bats and other mammals. The high-quality, long-read genome assemblies revealed a contraction of interferon (IFN)-α at the immunity-related type I IFN locus in bats, resulting in a shift in relative IFN-ω and IFN-α copy numbers. Contradicting previous hypotheses of constitutive expression of IFN-α being a feature of the bat immune system, three bat species lost all IFN-α genes. This shift to IFN-ω could contribute to the increased viral tolerance that has made bats a common reservoir for viruses that can be transmitted to humans. Antiviral genes stimulated by type I IFNs also showed evidence of rapid evolution, including a lineage-specific duplication of IFN-induced transmembrane genes and positive selection in IFIT2. In addition, 33 tumor suppressors and 6 DNA-repair genes showed signs of positive selection, perhaps contributing to increased longevity and reduced cancer rates in bats. The robust immune systems of bats rely on both bat-wide and lineage-specific evolution in the immune gene repertoire, suggesting diverse immune strategies. Our study provides new genomic resources for bats and sheds new light on the extraordinary molecular evolution in this critically important group of mammals.
Briefings in Bioinformatics, Journal Year: 2024, Volume and Issue: 25(3), Published: March 27, 2024
Following the milestone success of the Human Genome Project, the 'Encyclopedia of DNA Elements (ENCODE)' initiative was launched in 2003 to unearth information about the numerous functional elements within the genome. This endeavor coincided with the emergence of numerous novel technologies, accompanied by the provision of vast amounts of whole-genome sequences and high-throughput data such as ChIP-Seq and RNA-Seq. Extracting biologically meaningful information from this massive dataset has become a critical aspect of many recent studies, particularly in annotating and predicting the functions of unknown genes. The core idea behind genome annotation is to identify genes and various functional elements within the genome sequence and infer their biological functions. Traditional wet-lab experimental methods still rely on extensive efforts for functional verification. However, early bioinformatics algorithms and software primarily employed shallow learning techniques; thus, the ability to characterize data and learn features was limited. With the widespread adoption of RNA-Seq technology, scientists in the community began to harness the potential of machine learning and deep learning approaches for gene structure prediction and functional annotation. In this context, we reviewed both conventional and contemporary deep learning frameworks, and highlighted novel perspectives on the challenges arising during annotation, underscoring the dynamic nature of this evolving scientific landscape.