bioRxiv (Cold Spring Harbor Laboratory)
Year: 2023, Issue: unknown
Published: July 10, 2023
Differential transcript usage (DTU) plays a crucial role in determining how gene expression differs among cells, tissues, and different developmental stages, thereby contributing to the complexity and diversity of biological systems. In abnormal cases, it can also lead to deficiencies in protein function, potentially leading to the pathogenesis of diseases. Detecting such events for single-gene genetic traits is relatively uncomplicated; however, the heterogeneity of populations with complex diseases presents an intricate challenge due to the presence of diverse causal events and undetermined subtypes. SPIT is the first statistical tool that quantifies heterogeneity in DTU within a population and identifies predominant subgroups along with their distinctive sets of DTU events. We provide comprehensive assessments of SPIT's methodology and report the results of applying SPIT to analyze brain samples from individuals with schizophrenia. Our analysis reveals previously unreported DTU events in six candidate genes.
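
The abstract does not spell out SPIT's statistics, but the quantity that DTU analyses compare is standard: for each gene, the fraction of its expression contributed by each transcript. The sketch below computes those usage proportions and a simple case-minus-control mean shift in plain Python. It assumes a samples-by-transcripts count matrix for a single gene and a case/control labelling; `transcript_proportions` and `mean_usage_shift` are hypothetical illustration helpers, not SPIT's API or test.

```python
import math
from typing import List, Sequence


def transcript_proportions(counts: Sequence[Sequence[float]]) -> List[List[float]]:
    """Turn a samples x transcripts count matrix for ONE gene into usage proportions.

    Each output row sums to 1; samples with zero total expression become NaN rows.
    """
    props = []
    for row in counts:
        total = float(sum(row))
        if total > 0:
            props.append([c / total for c in row])
        else:
            props.append([math.nan] * len(row))
    return props


def mean_usage_shift(props: Sequence[Sequence[float]], is_case: Sequence[bool]) -> List[float]:
    """Per-transcript mean usage in cases minus mean usage in controls.

    NaN rows (unexpressed samples) are skipped; each group must keep at least one sample.
    """
    n_tx = len(props[0])

    def group_mean(flag: bool) -> List[float]:
        rows = [p for p, g in zip(props, is_case) if bool(g) == flag and not math.isnan(p[0])]
        return [sum(r[j] for r in rows) / len(rows) for j in range(n_tx)]

    cases, controls = group_mean(True), group_mean(False)
    return [c - k for c, k in zip(cases, controls)]


# Toy usage: hypothetical counts for a two-isoform gene, two controls and two cases.
controls = [[90, 10], [90, 10]]
all_cases_switched = [[10, 90], [10, 90]]
half_cases_switched = [[10, 90], [90, 10]]
labels = [False, False, True, True]

full = mean_usage_shift(transcript_proportions(controls + all_cases_switched), labels)
half = mean_usage_shift(transcript_proportions(controls + half_cases_switched), labels)
print([round(v, 3) for v in full])  # [-0.8, 0.8]
print([round(v, 3) for v in half])  # [-0.4, 0.4]
```

When only one of the two cases carries the isoform switch, the group-level shift halves. This is a simple illustration of how heterogeneity within a population dilutes an averaged DTU signal.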

RNA splicing is highly prevalent in the brain and has strong links to neuropsychiatric disorders; yet, the role of cell type-specific transcript-isoform diversity during human brain development has not been systematically investigated. In this work, we leveraged single-molecule long-read sequencing to deeply profile the full-length transcriptome of the germinal zone and cortical plate regions of the developing human neocortex at tissue and single-cell resolution. We identified 214,516 distinct isoforms, of which 72.6% were novel (not previously annotated in Gencode version 33), and uncovered a substantial contribution of transcript-isoform diversity, regulated by RNA-binding proteins, in defining cellular identity in the developing neocortex. We leveraged this comprehensive isoform-centric gene annotation to reprioritize thousands of rare de novo risk variants and to elucidate genetic mechanisms for neuropsychiatric disorders.

CHESS 3 represents an improved human gene catalog based on nearly 10,000 RNA-seq experiments across 54 body sites. It significantly improves on current genome annotation by integrating the latest reference data and algorithms, machine learning techniques for noise filtering, and new protein structure prediction methods. CHESS 3 contains 41,356 genes, including 19,839 protein-coding genes, and 158,377 transcripts, with 14,863 transcripts not found in other catalogs. It includes all MANE transcripts and at least one transcript for most RefSeq and GENCODE genes. On the CHM13 genome, CHESS 3 contains an additional 129 protein-coding genes. CHESS 3 is available at http://ccb.jhu.edu/chess.

The process of splicing messenger RNA to remove introns plays a central role in creating genes and gene variants. We describe Splam, a novel method for predicting splice junctions in DNA using deep residual convolutional neural networks. Unlike previous models, Splam looks at a 400-base-pair window flanking each splice site, reflecting the biological splicing process, which relies primarily on signals within this window. Splam also trains on donor and acceptor pairs together, mirroring how the splicing machinery recognizes both ends of an intron. Compared to SpliceAI, Splam is consistently more accurate, achieving 96% accuracy in predicting human splice junctions.
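
According to the abstract, Splam scores a 400-bp window around each splice site and treats the donor and acceptor of an intron as a pair. The sketch below shows one way to assemble such a paired input. It makes several illustrative assumptions, none of which come from Splam's documented preprocessing: 0-based, half-open intron coordinates; a window centered on each exon/intron boundary (200 bp per side); 'N' padding past the sequence ends; and a 4-channel one-hot encoding. All function names are hypothetical.

```python
from typing import List

BASES = "ACGT"
HALF_WINDOW = 200  # assumption: a 400-bp window = 200 bp on each side of the boundary


def flank_window(seq: str, boundary: int, half: int = HALF_WINDOW) -> str:
    """Return the 2*half bases around a 0-based boundary, padding with 'N' off the ends."""
    return "".join(
        seq[i] if 0 <= i < len(seq) else "N" for i in range(boundary - half, boundary + half)
    )


def one_hot(seq: str) -> List[List[int]]:
    """4-channel one-hot encoding (A, C, G, T); 'N' and other symbols become all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def paired_junction_input(seq: str, intron_start: int, intron_end: int) -> List[List[int]]:
    """Concatenate donor and acceptor windows for the intron seq[intron_start:intron_end].

    The donor window is centered on intron_start (exon/intron boundary) and the acceptor
    window on intron_end (intron/exon boundary), giving 4 * HALF_WINDOW one-hot rows.
    """
    donor = flank_window(seq, intron_start)
    acceptor = flank_window(seq, intron_end)
    return one_hot(donor + acceptor)


# Toy usage: a 100-bp GT...AG intron between two 300-bp exons.
seq = "A" * 300 + "GT" + "C" * 96 + "AG" + "T" * 300
x = paired_junction_input(seq, 300, 400)
print(len(x), x[200], x[201])  # 800 [0, 0, 1, 0] [0, 0, 0, 1]
```

Encoding both ends of the intron into a single input mirrors the paired donor/acceptor training described above. A network that scores the pair then sees both splice signals at once, rather than scoring each site independently.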

bioRxiv (Cold Spring Harbor Laboratory)
Year: 2024, Issue: unknown
Published: October 31, 2024
Abstract
Accurate and complete gene annotations are indispensable for understanding how genome sequences encode biological functions. For twenty years, the GENCODE consortium has developed reference annotations for the human and mouse genomes, becoming a foundation for biomedical and genomics communities worldwide. Nevertheless, collections of important yet poorly understood gene classes like long non-coding RNAs (lncRNAs) remain incomplete and scattered across multiple, uncoordinated catalogs, slowing down progress in the field. To address these issues, GENCODE has undertaken the most comprehensive lncRNA annotation effort to date. This effort is founded on manual annotation of full-length targeted long-read sequencing data from matched embryonic and adult tissues, covering orthologous regions in human and mouse. Altogether, 17,931 novel human genes (140,268 transcripts) and 22,784 novel mouse genes (136,169 transcripts) have been added to the GENCODE catalog, representing a 2-fold and a 6-fold increase in transcripts, respectively; this is the greatest increase since the sequencing of the human genome. Novel genes display evolutionary constraints, have well-formed promoter regions, and link to phenotype-associated genetic variants. They greatly enhance the functional interpretability of the human genome, as they help explain millions of previously mapped "orphan" omics measurements corresponding to transcription start sites, chromatin modifications, and transcription factor binding sites. Crucially, our design assigned human-mouse orthologs at a rate beyond previous studies, tripling the number of disease-associated human lncRNAs with mouse orthologs. The expanded and enhanced GENCODE lncRNA annotations mark a critical step towards deciphering the human and mouse genomes.

International Journal of Molecular Sciences
Year: 2025, Issue: 26(5), p. 2004
Published: February 25, 2025
Different types of information are combined during variation interpretation. Computational predictors, most often pathogenicity predictors, provide one type of information for this purpose. These tools are based on various kinds of algorithms. Although the American College of Medical Genetics and Genomics and Association for Molecular Pathology guidelines classify variants into five categories, practically all predictors provide binary pathogenic/benign predictions. We developed a novel artificial intelligence-based tool, PON-P3, on the basis of a carefully selected training dataset, meticulous feature selection, and optimization. We started with 1526 features describing variations, their sequence and structural context, and parameters of the affected genes and proteins. The final random boosting method was tested and compared with a total of 23 predictors. PON-P3 performed better than recently introduced predictors that utilize large language models, as well as methods that use evolutionary data alone or in combination with different gene and protein properties. PON-P3 classifies cases into three categories: benign, pathogenic, and variants of uncertain significance (VUSs). When test datasets were used, some metapredictors performed slightly better than PON-P3; however, in real-life situations with patient data, those metapredictors overpredict both pathogenic and benign cases. We used PON-P3 to predict all possible amino acid substitutions in human proteins encoded from MANE transcripts. PON-P3 was also used to predict unambiguous VUSs (i.e., those without conflicts) in ClinVar. A total of 12.9% of these were predicted to be pathogenic and 49.9% to be benign.
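
PON-P3 reports three classes rather than a binary call. The abstract does not say how the VUS class is derived. One common way to obtain three-way output from a scored classifier is to treat scores that fall between two cut-offs as uncertain, and the sketch below shows that approach under this assumption. The 0.3 and 0.7 thresholds and the helper names are hypothetical and are not PON-P3's published values.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical cut-offs for illustration only; they are NOT PON-P3's published values.
LOWER, UPPER = 0.3, 0.7


def three_class(p_pathogenic: float, lower: float = LOWER, upper: float = UPPER) -> str:
    """Map a pathogenicity probability to 'benign', 'VUS', or 'pathogenic'."""
    if not 0.0 <= lower < upper <= 1.0:
        raise ValueError("expected 0 <= lower < upper <= 1")
    if p_pathogenic >= upper:
        return "pathogenic"
    if p_pathogenic <= lower:
        return "benign"
    return "VUS"  # the uncertain middle band is not forced into a binary call


def class_fractions(scores: Iterable[float]) -> Dict[str, float]:
    """Fraction of variants assigned to each of the three classes."""
    labels = [three_class(s) for s in scores]
    if not labels:
        raise ValueError("no scores given")
    counts = Counter(labels)
    return {k: counts[k] / len(labels) for k in ("benign", "VUS", "pathogenic")}


print(class_fractions([0.1, 0.2, 0.5, 0.9]))  # {'benign': 0.5, 'VUS': 0.25, 'pathogenic': 0.25}
```

Widening the band between the two cut-offs trades coverage for reliability: fewer variants receive a definite call, but those that do are further from the decision boundary.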

Nucleic Acids Research
Year: 2025, Issue: 53(6)
Published: February 25, 2025
Abstract
Despite many improvements over the years, the annotation of the human genome remains imperfect. The use of evolutionarily conserved sequences provides a strategy for selecting a high-confidence subset of the annotation. Using the latest whole-genome alignment, we found that splice sites from protein-coding genes in the high-quality MANE annotation are consistently conserved across more than 350 species. We also studied splice sites in the RefSeq, GENCODE, and CHESS databases that are not present in MANE. In addition, we analyzed the completeness of the alignment with respect to the human annotation and described a method that would allow us to fix up to 60% of the missing alignments of exons. We trained a logistic regression classifier to distinguish between the conservation exhibited by splice sites from MANE and that of sites chosen randomly from neutrally evolving sequences. Splice sites classified by our model as well-supported have lower single nucleotide polymorphism rates and better transcriptomic evidence. We then computed a subset of transcripts using only "well-supported" splice sites or ones present in MANE. This subset is enriched in major gene catalogs, appears to be under purifying selection, and is more likely to be correct and functionally relevant.
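
The classifier described above is a logistic regression over conservation features. The sketch below shows that technique in self-contained form, trained by batch gradient descent on a tiny synthetic dataset. The single feature (the fraction of aligned species that conserve the splice-site dinucleotide), the toy values, and the 0.5 "well-supported" cut-off are all hypothetical. They are not the paper's features, data, or threshold.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Probability that a site looks like a conserved (MANE-like) splice site."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Synthetic toy data (hypothetical): label 1 = MANE-like splice site,
# label 0 = site sampled from neutrally evolving sequence.
X = [[0.95], [0.90], [0.85], [0.80], [0.10], [0.20], [0.30], [0.15]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = fit_logistic(X, y)
well_supported = predict_proba([0.9], w, b) >= 0.5  # hypothetical cut-off
print(round(predict_proba([0.9], w, b), 3), round(predict_proba([0.1], w, b), 3))
```

In a real setting each site would carry several conservation features (for example, per-clade conservation). Sites from non-MANE catalogs would then be scored with the fitted model to decide which ones count as well-supported.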

Abstract
With the advent of complete genome assemblies, genome annotation has become essential for the functional interpretation of genomic data. Long-read RNA sequencing (LR-RNAseq) technologies have significantly improved transcriptome annotation by enabling full-length transcript reconstruction of both coding and non-coding RNAs. However, challenges such as transcript fragmentation and incomplete isoform representation persist, highlighting the need for robust quality control (QC) strategies. This study presents an updated version of ANNEXA, a pipeline designed to enhance genome annotation using LR-RNAseq data while also providing QC of reconstructed genes and transcripts. ANNEXA integrates two transcript reconstruction tools, StringTie2 and Bambu, and applies stringent filtering criteria to improve accuracy. It incorporates deep learning models to evaluate transcription start sites (TSSs) and employs the tool FEELnc for systematic annotation of long non-coding RNAs (lncRNAs). Additionally, ANNEXA offers intuitive visualizations for comparative analyses of transcript repertoires. Benchmarking against multiple reference annotations revealed distinct patterns of sensitivity and precision for known and novel transcripts of mRNAs and lncRNAs. To demonstrate its utility, ANNEXA was applied in an oncology study involving human and eight canine cancer cell lines. The pipeline successfully identified novel transcripts across species, expanding the catalog of protein-coding and lncRNA genes in both species. Implemented in Nextflow for scalability and reproducibility, ANNEXA is available as an open-source tool at https://github.com/IGDRion/ANNEXA.
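
Sensitivity and precision against a reference annotation are commonly computed by matching transcripts on their intron chains, as in gffcompare's intron-chain level metrics. The sketch below assumes that definition: an exact intron-chain match, multi-exon transcripts only, and 0-based half-open exon coordinates. ANNEXA's own benchmarking criteria may differ, and the helper names are hypothetical.

```python
from typing import Iterable, Set, Tuple

# (chromosome, strand, ((intron_start, intron_end), ...)) with 0-based, half-open coordinates
IntronChain = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def intron_chain(chrom: str, strand: str, exons: Iterable[Tuple[int, int]]) -> IntronChain:
    """Derive a transcript's intron chain from its exons ([start, end) intervals)."""
    ex = sorted(exons)
    introns = tuple((ex[i][1], ex[i + 1][0]) for i in range(len(ex) - 1))
    return (chrom, strand, introns)


def sensitivity_precision(predicted: Set[IntronChain],
                          reference: Set[IntronChain]) -> Tuple[float, float]:
    """Intron-chain sensitivity (matched / reference) and precision (matched / predicted).

    Single-exon transcripts have empty intron chains and are excluded from both sets.
    """
    pred = {c for c in predicted if c[2]}
    ref = {c for c in reference if c[2]}
    matched = pred & ref
    sensitivity = len(matched) / len(ref) if ref else 0.0
    precision = len(matched) / len(pred) if pred else 0.0
    return sensitivity, precision


reference = {
    intron_chain("chr1", "+", [(100, 200), (300, 400), (500, 600)]),
    intron_chain("chr1", "+", [(1000, 1100), (1200, 1300)]),
}
predicted = {
    intron_chain("chr1", "+", [(90, 200), (300, 400), (500, 650)]),  # same introns, other ends
    intron_chain("chr1", "+", [(100, 200), (350, 400)]),             # novel intron chain
    intron_chain("chr2", "-", [(10, 20), (30, 40)]),                 # absent from reference
}
sens, prec = sensitivity_precision(predicted, reference)
print(round(sens, 3), round(prec, 3))  # 0.5 0.333
```

Matching on intron chains ignores differences in the transcript ends. This tolerance is useful for long-read models, whose 5' and 3' ends often vary, but it means end accuracy (such as TSS placement) has to be assessed separately.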

Cell Reports Methods
Year: 2024, Issue: 4(3), p. 100736
Published: March 1, 2024
Differential transcript usage (DTU) plays a crucial role in determining how gene expression differs among cells, tissues, and developmental stages, contributing to the complexity and diversity of biological systems. In abnormal cases, it can also lead to deficiencies in protein function and underpin disease pathogenesis. Analyzing DTU via RNA sequencing (RNA-seq) data is vital, but the genetic heterogeneity in populations with complex diseases presents an intricate challenge due to diverse causal events and undetermined subtypes. Although the majority of common diseases in humans are categorized as complex, state-of-the-art DTU analysis methods often overlook this heterogeneity in their models. We therefore developed SPIT, a statistical tool that identifies predominant subgroups in DTU within a population along with their distinctive sets of DTU events. This study provides comprehensive assessments of SPIT's methodology and applies it to analyze brain samples from individuals with schizophrenia, revealing previously unreported DTU events in six candidate genes.