Nucleic Acids Research,
Journal year: 2025, Issue 53(6)
Published: Feb. 25, 2025
Abstract
Despite many improvements over the years, the annotation of the human genome remains imperfect. The use of evolutionarily conserved sequences provides a strategy for selecting a high-confidence subset of the annotation. Using the latest whole-genome alignment, we found that splice sites from protein-coding genes in the high-quality MANE annotation are consistently conserved across >350 species. We also studied splice sites from the RefSeq, GENCODE, and CHESS databases that are not present in MANE. In addition, we analyzed the completeness of the alignment with respect to the annotations and described a method that would allow us to fix up to 60% of the missing alignments of exons. We trained a logistic regression classifier to distinguish between the conservation exhibited by splice sites and that exhibited by randomly chosen, neutrally evolving sequences. Splice sites classified by our model as well-supported have lower single nucleotide polymorphism rates and better transcriptomic evidence. We then computed a subset of transcripts using only “well-supported” splice sites or ones in MANE. This yields a subset enriched in transcripts from the major gene catalogs that appear to be under purifying selection and are more likely to be correct and functionally relevant.
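The classifier step described above can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes, purely for illustration, that each site is summarized by a vector of hypothetical per-clade conservation scores in [0, 1]. Each score could be, say, the fraction of aligned species in a clade that preserve the site. The model is then plain logistic regression fitted by batch gradient descent on the log-loss. The feature definition, the function names (`train_logistic_regression`, `make_toy_data`, and so on), and the toy data are all assumptions.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, b, x):
    # Probability that feature vector x belongs to the "conserved splice site" class.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss.

    X: feature vectors (hypothetical per-clade conservation scores in [0, 1]).
    y: labels, 1 = splice site, 0 = neutrally evolving control sequence.
    """
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/d(logit)
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def make_toy_data(n_per_class=100, n_clades=4, seed=0):
    # Synthetic data (assumption): splice sites are conserved in most species of
    # each clade; neutral controls are conserved only sporadically.
    rng = random.Random(seed)
    clip = lambda v: min(1.0, max(0.0, v))
    X, y = [], []
    for _ in range(n_per_class):
        X.append([clip(rng.gauss(0.85, 0.10)) for _ in range(n_clades)])
        y.append(1)
        X.append([clip(rng.gauss(0.30, 0.15)) for _ in range(n_clades)])
        y.append(0)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    w, b = train_logistic_regression(X, y)
    # A site is called "well-supported" here if its predicted probability is >= 0.5.
    print("weights:", [round(v, 2) for v in w], "bias:", round(b, 2))
    print("well-supported:", predict_proba(w, b, [0.9, 0.85, 0.8, 0.9]) >= 0.5)
```

The 0.5 decision threshold and the plain gradient-descent fit are simplifying choices for readability. A real analysis would likely use a library implementation with regularization and held-out evaluation.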
bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2023, Issue unknown
Published: Nov. 29, 2023
Developing a universal representation of cells that encompasses the tremendous molecular diversity of cell types within the human body and, more generally, across species, would be transformative for cell biology. Recent work using single-cell transcriptomic approaches to create molecular definitions of cell types in the form of cell atlases has provided the necessary data for such an endeavor. Here, we present the Universal Cell Embedding (UCE) foundation model. UCE was trained on a corpus of cell atlas data from human and other species in a completely self-supervised way, without any annotations. UCE offers a unified biological latent space that can represent any cell, regardless of tissue or species. This universal cell embedding captures important biological variation despite the presence of experimental noise across diverse datasets. An important aspect of UCE's universality is that any new cell from any organism can be mapped to this embedding space with no additional data labeling, model training, or fine-tuning. We applied UCE to create the Integrated Mega-scale Atlas, embedding 36 million cells with more than 1,000 uniquely named cell types, from hundreds of experiments, dozens of tissues, and eight species. We uncovered new insights about the organization of cell types and tissues within this universal cell embedding space, and leveraged it to infer the function of newly discovered cell types. UCE's embedding space exhibits emergent behavior, uncovering new biology that it was never explicitly trained for, such as identifying developmental lineages and embedding data from novel species not included in the training set. Overall, by enabling a universal representation for every cell state and type, UCE provides a valuable tool for analysis, annotation, and hypothesis generation as the scale and diversity of single-cell datasets continue to grow.
CHESS 3 represents an improved human gene catalog based on nearly 10,000 RNA-seq experiments across 54 body sites. It significantly improves current genome annotation by integrating the latest reference data and algorithms, machine learning techniques for noise filtering, and new protein structure prediction methods. CHESS 3 contains 41,356 genes, including 19,839 protein-coding genes, and 158,377 transcripts, with 14,863 transcripts not in other catalogs. It includes all MANE transcripts and at least one transcript for most RefSeq and GENCODE genes. On the CHM13 genome, the CHESS 3 catalog contains an additional 129 protein-coding genes. CHESS 3 is available at http://ccb.jhu.edu/chess.
Cell Genomics,
Journal year: 2023, Issue 3(8), pp. 100375-100375
Published: Aug. 1, 2023
Within the next decade, the genomes of 1.8 million eukaryotic species will be sequenced. Identifying genes in these sequences is essential to understand the biology of these species. This is challenging due to the transcriptional complexity of eukaryotic genomes, which encode hundreds of thousands of transcripts of multiple types. Among these, a small set of protein-coding mRNAs play a disproportionately large role in defining phenotypes. Due to their sequence conservation, orthology can be established, making it possible to define a universal catalog of protein-coding genes. Such a catalog should substantially contribute to uncovering the genomic events underlying the emergence of phenotypes. This piece briefly reviews the basics of gene prediction, discusses the challenges in finalizing the annotation of the human genome, and proposes strategies for producing annotations across the Tree of Life. This lays the groundwork for obtaining all genes, the Earth's code of life.
Abstract
The human genome project's lasting legacies are the emerging insights into human physiology and disease, and the ascendance of biology as the dominant science of the 21st century. Sequencing revealed that >90% of the human genome is not coding for proteins, as originally thought, but rather is overwhelmingly transcribed into non‐protein coding, or non‐coding, RNAs (ncRNAs). This discovery initially led to the hypothesis that most genomic DNA is “junk”, a term still championed by some geneticists and evolutionary biologists. In contrast, molecular biologists and biochemists studying the vast number of transcripts produced from this “junk” often surmise that these ncRNAs have biological significance. What gives? This essay contrasts the two opposing, extant viewpoints, aiming to explain their bases, which arise from the distinct reference frames of the underlying scientific disciplines. Finally, it aims to reconcile these divergent mindsets in the hope of stimulating synergy between the fields.
Cell Death and Disease,
Journal year: 2024, Issue 15(3)
Published: Mar. 11, 2024
Antisense RNAs (asRNAs) represent an underappreciated yet crucial layer of gene expression regulation. Generally thought to modulate their sense genes in cis through sequence complementarity or the act of transcription, asRNAs can also regulate different molecular targets in trans, in the nucleus or in the cytoplasm. Here, we performed an in-depth characterization of NFYC antisense 1 (NFYC-AS1), the asRNA transcribed head-to-head to NFYC, a subunit of the proliferation-associated NF-Y transcription factor. Our results show that NFYC-AS1 is a prevalently nuclear asRNA peaking early in the cell cycle. Comparative genomics suggests a narrow phylogenetic distribution, with a probable origin in the common ancestor of the mammalian lineages. NFYC-AS1 is overexpressed pan-cancer, preferentially in association with RB1 mutations. Knockdown of NFYC-AS1 by antisense oligonucleotides impairs the growth of lung squamous cell carcinoma and small cell lung cancer cells, a phenotype recapitulated by CRISPR/Cas9 deletion of its transcription start site. Surprisingly, NFYC is affected only when endogenous NFYC-AS1 is manipulated. This regulation of proliferation is therefore at least in part independent of a transcription-mediated effect on NFYC, and is possibly exerted through RNA-dependent trans effects converging on G2/M cell cycle phase genes. Accordingly, NFYC-AS1-depleted cells are stuck in mitosis, indicating defects in mitotic progression. Overall, NFYC-AS1 emerged as a cell cycle-regulating asRNA with a dual action, holding therapeutic potential in different cancer types, including the very aggressive RB1-mutated tumors.
Non-coding RNA Research,
Journal year: 2024, Issue 9(4), pp. 1271-1279
Published: Jun. 26, 2024
Long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs) have emerged as critical regulators in essentially all biological processes across eukaryotes. They exert their functions through chromatin remodeling, transcriptional regulation, interaction with RNA-binding proteins (RBPs), acting as microRNA sponges, and other mechanisms. Although lncRNAs and circRNAs are typically more species-specific than coding RNAs, a number of well-characterized lncRNA (such