bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2023, Issue: unknown
Published: July 10, 2023
Differential transcript usage (DTU) plays a crucial role in determining how gene expression differs among cells, tissues, and different developmental stages, thereby contributing to the complexity and diversity of biological systems. In abnormal cells, it can also lead to deficiencies in protein function, potentially leading to the pathogenesis of diseases. Detecting such events for single-gene genetic traits is relatively uncomplicated; however, the heterogeneity of populations with complex diseases presents an intricate challenge due to the presence of diverse causal events and undetermined subtypes. SPIT is the first statistical tool that quantifies the heterogeneity in transcript usage within a population and identifies predominant subgroups along with their distinctive sets of DTU events. We provide comprehensive assessments of SPIT's methodology in both single-gene and complex genetic traits and report the results of applying SPIT to analyze brain samples from individuals with schizophrenia. Our analysis reveals previously unreported DTU events in six candidate genes.
bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2023, Issue: unknown
Published: July 29, 2023
The process of splicing messenger RNA to remove introns plays a central role in creating genes and gene variants. Here we describe Splam, a novel method for predicting splice junctions in DNA based on deep residual convolutional neural networks. Unlike some previous models, Splam looks at a relatively limited window of 400 base pairs flanking each splice site, motivated by the observation that the biological process of splicing relies primarily on signals within this window. Additionally, Splam introduces the idea of training the network on donor and acceptor pairs together, based on the principle that the splicing machinery recognizes both ends of each intron at once. We compare Splam's accuracy to recent state-of-the-art splice site prediction methods, particularly SpliceAI, another method that uses deep neural networks. Our results show that Splam is consistently more accurate than SpliceAI, with an overall accuracy of 96% at predicting human splice junctions. Splam generalizes even to non-human species, including distant ones like the flowering plant
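As a rough illustration of the input described in the Splam abstract, the Python sketch below cuts a fixed window around a donor site and an acceptor site and joins the two into one paired input. The function names, the split of the 400 bp window into 200 bases on each side of the site, and the 'N' padding at sequence ends are assumptions made for this example. They are not taken from the Splam code, and the neural network itself is not shown.

```python
def flank(seq: str, pos: int, half: int = 200) -> str:
    """Return the 2*half bases centred on 0-based position pos, padded with 'N'."""
    if not 0 <= pos <= len(seq):
        raise ValueError("pos must lie within the sequence")
    start, end = pos - half, pos + half
    # Pad with 'N' wherever the window runs past either end of the sequence.
    left_pad = "N" * max(0, -start)
    right_pad = "N" * max(0, end - len(seq))
    return left_pad + seq[max(0, start):min(len(seq), end)] + right_pad


def junction_input(seq: str, donor: int, acceptor: int, half: int = 200) -> str:
    """Concatenate the donor and acceptor windows so both intron ends are seen together."""
    if donor >= acceptor:
        raise ValueError("donor must precede acceptor on the forward strand")
    return flank(seq, donor, half) + flank(seq, acceptor, half)


if __name__ == "__main__":
    genome = "ACGT" * 300  # toy 1,200-base sequence, invented for the example
    pair = junction_input(genome, donor=100, acceptor=900)
    print(len(pair))  # 800: two windows of 400 bases each
```

Keeping the two windows in a single input is the simplest way to express the "donor and acceptor together" idea; a real model would additionally encode the bases numerically before passing them to the network.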
bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2024, Issue: unknown
Published: March 18, 2024
ABSTRACT
Long-read RNA sequencing has shed light on transcriptomic complexity, but questions remain about the functionality of downstream protein products. We introduce Biosurfer, a computational approach for comparing protein isoforms, while systematically tracking the transcriptional, splicing, and translational variations that underlie differences in the sequences of the protein products. Using Biosurfer, we analyzed the differences in 32,799 pairs of GENCODE annotated protein isoforms, finding that a majority (70%) of variable N-termini are due to alternative transcription start sites, while only 9% arise from 5’ UTR alternative splicing. Biosurfer’s detailed tracking of nucleotide-to-residue relationships helped reveal an uncommonly tracked source of single amino acid residue changes arising from codon splits at junctions. For 17% of internal sequence changes, such split codon patterns lead to single residue differences, termed “ragged codons”. Of variable C-termini, 72% involve splice- or intron retention-induced reading frameshifts. We found an unusual pattern of frame changes, in which the first frameshift is closely followed by a distinct second frameshift that restores the original frame, which we term a “snapback” frameshift. We analyzed a long read RNA-seq-predicted proteome of a human cell line and found similar trends as compared to our GENCODE analysis, with the exception of a higher proportion of isoforms predicted to undergo nonsense-mediated decay. Biosurfer’s comprehensive characterization of long-read RNA-seq datasets should accelerate insights into the functional role of protein isoforms, providing a mechanistic explanation of the origins of the proteomic diversity driven by alternative splicing. Biosurfer is available as a Python package at https://github.com/sheynkman-lab/biosurfer.
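The codon splits mentioned in the Biosurfer abstract can be made concrete with a small Python sketch. It computes, for each exon-exon junction in a coding sequence, how many bases of a codon lie upstream of the junction, and extracts the codon that straddles it. The function names and the toy exon sequences are invented for this illustration and do not reflect the Biosurfer package's API.

```python
from typing import List, Optional


def junction_phases(cds_exons: List[str]) -> List[int]:
    """For each junction, return how many bases of a codon lie upstream of it (0, 1 or 2)."""
    phases, total = [], 0
    for exon in cds_exons[:-1]:
        total += len(exon)
        phases.append(total % 3)
    return phases


def split_codon(cds_exons: List[str], junction: int) -> Optional[str]:
    """Return the codon straddling the given junction, or None if it falls between codons."""
    if not 0 <= junction < len(cds_exons) - 1:
        raise ValueError("junction index out of range")
    cds = "".join(cds_exons)
    boundary = sum(len(exon) for exon in cds_exons[: junction + 1])
    phase = boundary % 3
    if phase == 0:
        return None  # the junction sits exactly on a codon boundary
    start = boundary - phase
    return cds[start:start + 3]


if __name__ == "__main__":
    # Same upstream exon, two hypothetical downstream exons.
    reference = ["ATGGCCA", "AGTAA"]
    alternative = ["ATGGCCA", "GGTAA"]
    print(split_codon(reference, 0))    # AAG (lysine)
    print(split_codon(alternative, 0))  # AGG (arginine)
```

In the toy pair, the shared upstream exon contributes only the first base of the junction codon, so swapping the downstream exon changes a single residue while leaving the reading frame intact, which is the situation the abstract calls a "ragged codon".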
bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2024, Issue: unknown
Published: April 16, 2024
Recently developed long-read RNA sequencing technologies promise to provide a more accurate and comprehensive view of transcriptomes compared to short-read sequencers, primarily due to their capability to achieve full-length sequencing of transcripts. However, realizing this potential requires computational tools tailored to process long reads, which exhibit a higher error rate than short reads. Existing methods for assembling and quantifying long-read data often disagree on expressed transcripts and their abundance levels, leading researchers to lack confidence in the transcriptomes produced using long-read data. One approach to address the uncertainties in transcriptome assembly and quantification is by assigning reads to transcripts, enabling a more detailed characterization of transcript support at the read level. Here, we introduce TranSigner, a versatile tool that assigns long reads to any input transcriptome. TranSigner consists of three consecutive modules performing: read alignment to the given transcriptome, computation of compatibility scores based on alignment positions, and execution of an expectation-maximization algorithm to probabilistically assign read fractions to transcripts while estimating transcript abundances. Using simulated and experimental datasets from well studied organisms (Homo sapiens, Arabidopsis thaliana, and Mus musculus), we demonstrate that TranSigner achieves higher accuracy in transcript abundance estimation and read assignment compared to existing tools.
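The third module described in the TranSigner abstract, an expectation-maximization step that splits each read across compatible transcripts while estimating abundances, can be sketched generically in Python. This is a textbook EM for a mixture model under the assumption that each read already carries a positive compatibility score for every candidate transcript; the function name, data layout, and toy scores are hypothetical and are not TranSigner's implementation.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, float]  # transcript id -> positive compatibility score


def em_assign(compat: List[Scores], n_iter: int = 200, tol: float = 1e-9
              ) -> Tuple[Dict[str, float], List[Scores]]:
    """Return (relative abundances, per-read fractional assignments)."""
    if not compat or any(not row for row in compat):
        raise ValueError("every read needs at least one compatible transcript")
    transcripts = sorted({t for row in compat for t in row})
    abundance = {t: 1.0 / len(transcripts) for t in transcripts}
    fractions: List[Scores] = []
    for _ in range(n_iter):
        # E-step: share each read among transcripts in proportion to
        # current abundance times the read's compatibility score.
        fractions = []
        for row in compat:
            weights = {t: abundance[t] * score for t, score in row.items()}
            total = sum(weights.values())
            if total <= 0.0:
                raise ValueError("scores must be positive")
            fractions.append({t: w / total for t, w in weights.items()})
        # M-step: a transcript's abundance is its mean share of all reads.
        updated = {t: 0.0 for t in transcripts}
        for read_fractions in fractions:
            for t, f in read_fractions.items():
                updated[t] += f / len(compat)
        change = max(abs(updated[t] - abundance[t]) for t in transcripts)
        abundance = updated
        if change < tol:
            break
    return abundance, fractions


if __name__ == "__main__":
    # Toy input: two reads fit only A, one fits only B, one is ambiguous.
    reads = [{"A": 1.0}, {"A": 1.0}, {"A": 1.0, "B": 1.0}, {"B": 1.0}]
    abundance, fractions = em_assign(reads)
    print(abundance)
    print(fractions[2])
```

With this toy input the ambiguous read is shared between A and B in favour of A, because the uniquely assigned reads make A the more abundant transcript.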
bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2024, Issue: unknown
Published: May 17, 2024
Abstract
As the number and variety of assembled genomes continues to grow, the number of annotated genomes is falling behind, particularly for eukaryotes. DNA-based mapping tools help to address this challenge, but they are only able to transfer annotation between closely-related species. Here we introduce LiftOn, a homology-based software tool that integrates DNA and protein alignments to enhance the accuracy of genome-scale annotation and to allow mapping between relatively distant species. LiftOn’s protein-centric algorithm considers both types of alignments, chooses optimal open reading frames, resolves overlapping gene loci, and finds additional gene copies where they exist. LiftOn can reliably transfer annotation between genomes representing members of the same species, as we demonstrate on human, mouse, honey bee, rice, and Arabidopsis thaliana. It can further map annotation effectively across species pairs as far apart as mouse and rat or Drosophila melanogaster and D. erecta.
International Journal of Molecular Sciences,
Journal year: 2024, Issue: 25(19), p. 10309
Published: Sep. 25, 2024
Genome-wide association studies have identified a locus on chromosome 10q22, where many co-inherited single nucleotide polymorphisms (SNPs) are associated with atrial fibrillation (AF). This study seeks to identify the impact of this locus on gene expression at the transcript isoform level in human left atria and to gain insight into potential causal variants. Bulk RNA sequencing was analyzed to identify myozenin 1 (MYOZ1) and synaptopodin 2-like (SYNPO2L) isoforms and the association of common SNPs in this region with transcript isoform expression levels. Chromatin marks were used to suggest candidate regulatory SNPs in this region. Protein amino acid changes were examined for predicted functional consequences. Transfection of MYOZ1 and two SYNPO2L isoforms was performed to localize their encoded proteins in cardiomyocytes derived from human stem cells. We identified one MYOZ1 and four SYNPO2L isoforms, two of which encode proteins, while the other two encode long noncoding RNAs (lncRNAs). The risk allele of the strongest AF susceptibility SNP on chromosome 10q22 is associated with decreased MYOZ1 expression and increased expression of the two SYNPO2L lncRNA isoforms. There are many SNPs co-inherited with the top AF-associated SNP due to linkage disequilibrium (LD), including rs11000728, which we propose as the MYOZ1 regulatory SNP, confirmed by reporter gene transfection. In addition, this LD block includes three missense SNPs in the SYNPO2L gene, with the minor protective haplotype predicted to be detrimental to protein function. MYOZ1 and both SYNPO2L protein isoforms were localized to the sarcomere. This is a complex locus with the potential for several SNPs to alter AF susceptibility by opposing effects on MYOZ1 and SYNPO2L expression, along
PLoS Computational Biology,
Journal year: 2024, Issue: 20(11), p. e1012543
Published: Nov. 20, 2024
Several recent studies have presented evidence that the human gene catalogue should be expanded to include thousands of short open reading frames (ORFs) appearing upstream or downstream of existing protein-coding genes, each of which might create an additional bicistronic transcript in humans. Here we explore an alternative hypothesis that would explain the translational and evolutionary evidence for these ORFs without the need for novel genes or bicistronic transcripts. We examined 2,199 short ORFs that have been proposed as high-quality candidates for novel genes, to determine if they could instead represent protein-coding exons that can be added to existing genes. We checked for conservation in four recently sequenced, high-quality human genomes, and found that a large majority (87.8%) were conserved in all four, as expected. We then looked for splicing evidence that would connect each short ORF to an existing protein-coding gene at the same locus, thus creating a novel variant using the ORF as its first exon. These protein coding exon candidates were further evaluated using protein structure predictions of the protein sequences that included the proposed new exons. We determined that 541 out of 2,199 ORFs have strong evidence that they could form protein coding exons that are part of an existing gene, and that the resulting protein is predicted to have similar or better structural quality than the currently annotated isoform.
NAR Genomics and Bioinformatics,
Journal year: 2024, Issue: 6(4)
Published: Sep. 28, 2024
The ACMG/AMP guidelines include five categories, of which variants of uncertain significance (VUSs) have received increasing attention. Recently, Fowler and Rehm claimed that all or most VUSs could be reclassified as pathogenic or benign within a few years. To test this claim, we collected validated benign, pathogenic, VUS and conflicting variants from ClinVar and LOVD and investigated differences at gene, protein, structure, and variant levels. The gene and protein features included inheritance patterns, actionability, functional categories for housekeeping, essential, complete knockout, lethality and haploinsufficient proteins, Gene Ontology annotations, and protein network properties. Structural properties included the location at secondary structural elements, intrinsically disordered regions, transmembrane regions, repeats, conservation, and accessibility. The variant features were distributions of nucleotides, their groupings, codons, and location relative to CpG islands. The amino acids and their groups were investigated. VUSs did not markedly differ from other variants. The only major differences were the accessibility and conservation of pathogenic variants, and the reduced ratio of repeat-locating variants in VUSs. Thus, VUSs cannot be distinguished from other types of variants. They display one form of natural biological heterogeneity. Instead of concentrating on eradicating VUSs, the community would benefit from investigating and understanding factors that contribute to phenotypic heterogeneity.
NAR Genomics and Bioinformatics,
Journal year: 2024, Issue: 6(4)
Published: Sep. 28, 2024
Eukaryotic cells express a large number of transcripts from a single gene due to alternative splicing. Despite hundreds of thousands of splice isoforms being annotated in databases, it has been reported that the current exon catalogs remain incomplete. At the same time, introns of human protein-coding (PC) genes contain a large number of evolutionarily conserved elements with unknown function. Here, we explore the possibility that some of them represent cryptic exons that are expressed in rare conditions. We identified a group of cryptic exons that are similar to the annotated exons in terms of evolutionary conservation and RNA-seq read coverage in the Genotype-Tissue Expression dataset. Most of them were poison, i.e. generated a nonsense-mediated decay (NMD) isoform upon inclusion, and many showed signs of tissue-specific and cancer-specific expression and regulation. We performed RNA-seq in the A549 cell line treated with cycloheximide to inactivate NMD and confirmed using quantitative polymerase chain reaction that seven of eight tested exons are, indeed, expressed. This study shows that introns of human PC genes contain cryptic poison exons, which reside in conserved intronic regions and remain not fully annotated due to insufficient representation in RNA-seq libraries.
bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2023, Issue: unknown
Published: March 25, 2023
ORFanage is a system designed to assign open reading frames (ORFs) to both known and novel gene transcripts while maximizing similarity to annotated proteins. The primary intended use of ORFanage is the identification of ORFs in the assembled results of RNA sequencing (RNA-seq) experiments, a capability that most transcriptome assembly methods do not have. Our experiments demonstrate how ORFanage can be used to find novel protein variants in RNA-seq datasets, and to improve the annotations of ORFs in tens of thousands of transcript models in the RefSeq and GENCODE human annotation databases. Through its implementation of a highly accurate and efficient pseudo-alignment algorithm, ORFanage is substantially faster than other ORF annotation methods, enabling its application to very large datasets. When used to analyze transcriptome assemblies, ORFanage can aid in the separation of signal from transcriptional noise and the identification of likely functional transcript variants, ultimately advancing our understanding of biology and medicine.
bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2023, Issue: unknown
Published: Dec. 2, 2023
Abstract
Despite many improvements over the years, the annotation of the human genome remains imperfect, and different annotations of the human reference genome sometimes contradict one another. The use of evolutionarily conserved sequences provides a strategy for selecting a high-confidence subset of the annotation that is more likely to be related to biological functions, and the rapidly growing number of genomes from other species increases its power. Using the latest whole genome alignment, we found that splice sites from protein-coding genes in the high-quality MANE annotation are consistently conserved across more than 400 species. We also studied splice sites in the RefSeq, GENCODE, and CHESS databases that are not present in MANE. We trained a logistic regression classifier to distinguish between the conservation exhibited by splice sites from MANE versus sites chosen randomly from neutrally evolving sequence. We found that splice sites classified by our model as conserved have lower SNP rates and better transcriptomic support. We then computed a subset of transcripts using only “conserved” splice sites or ones from MANE. This subset is enriched in high-confidence transcripts of the major gene catalogs that appear to be under purifying selection and are more likely to be correct and functionally relevant.
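The classifier described in the abstract above can be illustrated with a minimal logistic regression in plain Python. The single input feature used here, a per-site conservation score between 0 and 1, is hypothetical, and the training values are invented for the example; the abstract does not state which features or software the authors used, so this is a sketch of the general technique only.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs: List[float], ys: List[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[float, float]:
    """Fit weight w and bias b by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of log-loss w.r.t. the logit
            grad_w += err * x / n
            grad_b += err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w: float, b: float, x: float) -> float:
    """Probability that a site with conservation score x belongs to the positive class."""
    return sigmoid(w * x + b)


if __name__ == "__main__":
    # Invented scores: label 1 = reference splice site, label 0 = random neutral site.
    scores = [0.90, 0.95, 0.80, 0.85, 0.10, 0.20, 0.30, 0.15]
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    w, b = fit_logistic(scores, labels)
    print(predict(w, b, 0.9), predict(w, b, 0.1))
```

Sites whose predicted probability exceeds a chosen threshold (0.5 is a common default) would be called "conserved" in this toy setup.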