Abstract
Background
Curated databases of genetic variants assist clinicians and researchers in interpreting genetic variation. Yet, these databases contain some misclassified variants. It is unclear whether variant misclassification is abating as these databases rapidly grow and implement new guidelines.
Methods
Using archives of ClinVar and HGMD, we investigated how variant misclassification has changed over 6 years, across different ancestry groups. We considered inborn errors of metabolism (IEMs) screened in newborns as a model system because these disorders are often highly penetrant with neonatal phenotypes. We used samples from the 1000 Genomes Project (1KGP) to identify individuals with genotypes that were classified by the databases as pathogenic. Due to the rarity of IEMs, nearly all such classified pathogenic genotypes indicate likely variant misclassification in ClinVar or HGMD.
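The genotype screen described in Methods can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: it assumes every screened IEM is autosomal recessive, counts any two pathogenic alleles in one gene as affecting (phase is ignored), and uses hypothetical sample, variant and gene IDs. The optional `max_af` cutoff mimics removing common variants before counting; a value such as 0.05 echoes the ACMG/AMP BA1 stand-alone benign criterion, which is an assumption here rather than a parameter reported in the abstract.

```python
from collections import defaultdict


def flag_affected(genotypes, pathogenic, allele_freq=None, max_af=None):
    """Return {sample_id: genes} for samples carrying >= 2 pathogenic alleles
    in the same gene, i.e. predicted affected under an assumed recessive model.

    genotypes:   {sample_id: {variant_id: alt allele count (0, 1 or 2)}}
    pathogenic:  {variant_id: gene} for variants a database calls pathogenic
    allele_freq: optional {variant_id: population allele frequency}
    max_af:      optional cutoff; variants above it are treated as benign
    """
    # Drop common variants first if a frequency cutoff is given.
    if max_af is not None and allele_freq is not None:
        pathogenic = {v: g for v, g in pathogenic.items()
                      if allele_freq.get(v, 0.0) <= max_af}

    affected = {}
    for sample, calls in genotypes.items():
        alleles_per_gene = defaultdict(int)
        for variant, count in calls.items():
            if count > 0 and variant in pathogenic:
                alleles_per_gene[pathogenic[variant]] += count
        # Two heterozygous hits are treated as compound heterozygous;
        # phase is ignored, which over-calls if both sit on one haplotype.
        genes = {g for g, n in alleles_per_gene.items() if n >= 2}
        if genes:
            affected[sample] = genes
    return affected


# Toy data with hypothetical sample, variant and gene IDs.
genotypes = {
    "S1": {"v1": 2},            # homozygous for v1
    "S2": {"v1": 1, "v2": 1},   # two different hits in GENE_A
    "S3": {"v3": 1},            # single heterozygous carrier
}
pathogenic = {"v1": "GENE_A", "v2": "GENE_A", "v3": "GENE_B"}
allele_freq = {"v1": 0.20, "v2": 0.001, "v3": 0.0005}

print(flag_affected(genotypes, pathogenic))
# {'S1': {'GENE_A'}, 'S2': {'GENE_A'}}
print(flag_affected(genotypes, pathogenic, allele_freq, max_af=0.05))
# {}
```

Because IEMs are rare, each sample flagged in a healthy reference cohort such as 1KGP points to a probable misclassification, and applying the frequency cutoff removes flags driven by common variants.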
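Results
While the false-positive rates of both ClinVar and HGMD have improved over time, HGMD variants currently imply two orders of magnitude more affected individuals in 1KGP than ClinVar variants. We observed that individuals of African ancestry have a significantly increased chance of being incorrectly indicated to be affected by a screened IEM when HGMD variants are used. However, this bias affecting genomes of African ancestry was no longer significant once common variants were removed in accordance with recent variant classification guidelines. We discovered that ClinVar variants classified as Pathogenic or Likely Pathogenic are reclassified sixfold more often than HGMD DM or DM? variants, which has likely resulted in ClinVar's lower false-positive rate.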
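Conclusions
Considering misclassified variants that have since been reclassified reveals our increasing understanding of rare genetic variation. We found that variant classification guidelines and allele frequency databases comprising genetically diverse samples are important factors in reclassification. We also discovered that ClinVar variants common in European and South Asian individuals were more likely to be reclassified to a lower confidence category, perhaps due to an increased chance of these variants being classified by multiple submitters. We discuss features for variant classification databases that would support their continued improvement.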

Nature Communications, 2020, 11(1). Published: Nov. 20, 2020
Abstract
Identifying pathogenic variants and underlying functional alterations is challenging. To this end, we introduce MutPred2, a tool that improves the prioritization of pathogenic amino acid substitutions over existing methods, generates molecular mechanisms potentially causative of disease, and returns interpretable pathogenicity score distributions on individual genomes. Whilst its prioritization performance is state-of-the-art, a distinguishing feature of MutPred2 is the probabilistic modeling of variant impact on specific aspects of protein structure and function that can serve to guide experimental studies of phenotype-altering variants. We demonstrate the utility of MutPred2 in the identification of structural and functional mutational signatures relevant to Mendelian disorders and in the prioritization of de novo mutations associated with complex neurodevelopmental disorders. We then experimentally validate the functional impact of several variants identified in patients with such disorders. We argue that mechanism-driven studies of human inherited disease have the potential to significantly accelerate the discovery of clinically actionable variants.

Nucleic Acids Research, 2021, 49(W1), pp. W446-W451. Published: April 1, 2021
Here we present an update to MutationTaster, our DNA variant effect prediction tool. The new version uses a different prediction model and attains higher accuracy than its predecessor, especially for rare benign variants. In addition, we have integrated many sources of data that only became available after the last release (such as gnomAD and ExAC pLI scores) and changed the splice site prediction model. To more easily assess the relevance of detected known disease mutations to the clinical phenotype of the patient, MutationTaster now provides information on the diseases they cause. Further changes represent a major overhaul of the interfaces to increase user-friendliness, whilst changes under the hood have been designed to accelerate the processing of uploaded VCF files. We also offer an API for the rapid automated query of smaller numbers of variants from within other software. MutationTaster2021 integrates our disease mutation search engine, MutationDistiller, to prioritise variants from VCF files using the patient's phenotype. The novel version is available at https://www.genecascade.org/MutationTaster2021/. This website is free and open to all users, and there is no login requirement.

bioRxiv (Cold Spring Harbor Laboratory), 2023. Published: Jan. 15, 2023
Closing the gap between measurable genetic information and observable traits is a longstanding challenge in genomics. Yet, the prediction of molecular phenotypes from DNA sequences alone remains limited and inaccurate, often driven by the scarcity of annotated data and the inability to transfer learnings between tasks. Here, we present an extensive study of foundation models pre-trained on DNA sequences, named the Nucleotide Transformer, ranging from 50M up to 2.5B parameters and integrating information from 3,202 diverse human genomes, as well as 850 genomes selected across diverse phyla, including both model and non-model organisms. These transformer models yield transferable, context-specific representations of nucleotide sequences, which allow for accurate molecular phenotype prediction even in low-data settings. We show that the developed models can be fine-tuned at low cost, even when little annotated data are available, to solve a variety of genomics applications. Without supervision, the transformer models learned to focus attention on key genomic elements, including those that regulate gene expression, such as enhancers. Lastly, we demonstrate that utilizing model representations can improve the prioritization of functional genetic variants. The training and application of foundational models in genomics explored in this study provide a widely applicable stepping stone toward accurate molecular phenotype prediction from DNA sequence. Code and weights are available at https://github.com/instadeepai/nucleotide-transformer (Jax) and https://huggingface.co/InstaDeepAI (PyTorch). Example notebooks to apply these models to any downstream task are available at https://huggingface.co/docs/transformers/notebooks#pytorch-bio.
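One common way to check the claim that frozen representations support accurate prediction with little labeled data is a linear probe: a logistic regression trained on fixed embeddings while the foundation model itself stays untouched. The sketch below assumes embeddings have already been extracted; the two synthetic Gaussian clusters stand in for them and are not Nucleotide Transformer outputs, and the learning rate, epoch count and L2 weight are arbitrary illustrative choices.

```python
import numpy as np


def train_linear_probe(X, y, lr=0.1, epochs=500, l2=1e-3):
    """Fit logistic regression by full-batch gradient descent.

    X: (n, d) frozen embeddings; y: (n,) binary labels in {0, 1}.
    Only w and b are learned, so the embedding model is never updated.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        grad_w = X.T @ (p - y) / n + l2 * w      # log-loss gradient + L2 term
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(X, w, b):
    """Hard 0/1 predictions at the 0.5 probability threshold."""
    return (X @ w + b > 0).astype(int)


# Synthetic stand-in for per-sequence embeddings (hypothetical data).
rng = np.random.default_rng(0)
d = 16
X_pos = rng.normal(loc=1.0, scale=1.0, size=(50, d))
X_neg = rng.normal(loc=-1.0, scale=1.0, size=(50, d))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(50), np.zeros(50)])

w, b = train_linear_probe(X, y)
accuracy = np.mean(predict(X, w, b) == y)
print(f"training accuracy: {accuracy:.2f}")
```

Because only `d + 1` parameters are fitted, a probe like this is cheap to train; if it performs well, the representation already encodes the target signal.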

Multiple computational approaches have been developed to improve our understanding of genetic variants. However, their ability to distinguish rare pathogenic variants from rare benign ones is still lacking. Using context annotations and deep learning methods, we present pathogenicity prediction models, MetaRNN and MetaRNN-indel, to help prioritize rare nonsynonymous single nucleotide variants (nsSNVs) and non-frameshift insertion/deletions (nfINDELs). We use independent test sets to demonstrate that these new models outperform state-of-the-art competitors and achieve a more interpretable score distribution. Importantly, prediction scores from both models are comparable, enabling easy adoption in integrated genotype-phenotype association analysis methods. All pre-computed nsSNV scores are available at http://www.liulab.science/MetaRNN. The stand-alone program is also available at https://github.com/Chang-Li2019/MetaRNN.

Personalized genome sequencing has revealed millions of genetic differences between individuals, but our understanding of their clinical relevance remains largely incomplete. To systematically decipher the effects of human genetic variants, we obtained whole-genome sequencing data for 809 individuals from 233 primate species and identified 4.3 million common protein-altering variants with orthologs in humans. We show that these variants can be inferred to have nondeleterious effects in humans based on their presence at high allele frequencies in other primate populations. We use this resource to classify 6% of all possible human protein-altering variants as likely benign and impute the pathogenicity of the remaining 94% of variants with deep learning, achieving state-of-the-art accuracy for diagnosing pathogenic variants in patients with genetic diseases.
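The benign-labeling step above can be sketched as a set lookup: a human protein-altering variant receives a "likely benign" label if the identical amino acid change is common in another primate population. This sketch assumes ortholog mapping onto human protein positions is already done, reduces "high allele frequency" to a hypothetical `min_af` threshold, and omits the deep-learning model that scores the remaining variants; the study's actual criteria may differ.

```python
def label_primate_benign(human_candidates, primate_af, min_af=0.1):
    """Split human candidates into 'likely benign' and 'unresolved'.

    human_candidates: list of (gene, protein_position, alt_amino_acid)
    primate_af:       {(gene, protein_position, alt_amino_acid): max AF
                       observed in any primate population}, assumed to be
                       already lifted onto human orthologous positions
    min_af:           hypothetical cutoff for "common" in primates
    """
    common = {key for key, af in primate_af.items() if af >= min_af}
    benign = [v for v in human_candidates if v in common]
    # Unresolved variants would be passed to a pathogenicity model (not shown).
    unresolved = [v for v in human_candidates if v not in common]
    return benign, unresolved


# Hypothetical genes, positions and frequencies for illustration only.
candidates = [("GENE_A", 10, "V"), ("GENE_A", 42, "P"), ("GENE_B", 7, "K")]
primate_af = {("GENE_A", 10, "V"): 0.30, ("GENE_B", 7, "K"): 0.02}

benign, unresolved = label_primate_benign(candidates, primate_af)
print(benign)                          # [('GENE_A', 10, 'V')]
print(len(benign) / len(candidates))   # fraction labeled likely benign
```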

Science, 2023, 380(6648), pp. 913-924. Published: June 1, 2023
Comparative analysis of primate genomes within a phylogenetic context is essential for understanding the evolution of human genetic architecture and primate diversity. We present such a study of 50 primate species spanning 38 genera and 14 families, including 27 genomes first reported here, with many from previously less well represented groups, the New World monkeys and the Strepsirrhini. Our analyses reveal heterogeneous rates of genomic rearrangement and gene evolution across primate lineages. Thousands of genes under positive selection in different lineages play roles in the nervous, skeletal, and digestive systems and may have contributed to primate innovations and adaptations. Our study reveals that key genomic innovations occurred in the Simiiformes ancestral node and may have had an impact on the adaptive radiation of the Simiiformes and on human evolution.

Nature Biotechnology, 2022, 40(7), pp. 1035-1041. Published: March 28, 2022
Abstract
Whole-genome sequencing (WGS) can identify variants that cause genetic disease, but the time required for sequencing and analysis has been a barrier to its use in acutely ill patients. In the present study, we develop an approach for ultra-rapid nanopore WGS that combines an optimized sample preparation protocol, distributing sequencing over 48 flow cells, near real-time base calling and alignment, accelerated variant calling, and fast variant filtration for efficient manual review. Application to two example clinical cases identified a candidate variant in <8 h from sample preparation to variant identification. We show that this framework provides accurate variant calls and efficient prioritization, and accelerates diagnostic genome sequencing twofold compared with previous approaches.

Nature Structural & Molecular Biology, 2023, 30(5), pp. 584-593. Published: Jan. 2, 2023
Anterograde intraflagellar transport (IFT) trains are essential for cilia assembly and maintenance. These trains are formed of 22 IFT-A and IFT-B proteins that link structural and signaling cargos to microtubule motors for import into cilia. It remains unknown how the IFT-A/-B proteins are arranged into complexes and how these complexes polymerize into functional trains. Here we use in situ cryo-electron tomography of Chlamydomonas reinhardtii cilia and AlphaFold2 protein structure predictions to generate a molecular model of the entire anterograde train. We show how the conformations of both IFT-A and IFT-B are dependent on lateral interactions with neighboring repeats, suggesting that polymerization is required to cooperatively stabilize the complexes. Following three-dimensional classification, we reveal how IFT-B extends two flexible tethers to maintain a connection with IFT-A that can withstand the mechanical stresses present in actively beating cilia. Overall, our findings provide a framework for understanding the fundamental processes that govern cilia assembly.

Abstract
Background
The Critical Assessment of Genome Interpretation (CAGI) aims to advance the state-of-the-art for computational prediction of genetic variant impact, particularly where relevant to disease. The five complete editions of the CAGI community experiment comprised 50 challenges, in which participants made blind predictions of phenotypes from genetic data, and these were evaluated by independent assessors.
Results
Performance was particularly strong for clinical pathogenic variants, including some difficult-to-diagnose cases, and extends to the interpretation of cancer-related variants. Missense variant interpretation methods were able to estimate biochemical effects with increasing accuracy. Assessment of methods for regulatory variants and complex trait disease risk was less definitive and indicates performance potentially suitable for auxiliary use in the clinic.
Conclusions
Results show that while current methods are imperfect, they have major utility for research and clinical applications. Emerging methods and increasingly large, robust datasets for training and assessment promise further progress ahead.
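Blind predictions like those collected in CAGI are scored against held-back experimental or clinical data. For binary pathogenic/benign labels, the area under the ROC curve is one widely used metric; treating it as representative of CAGI assessment is an assumption, since assessors chose metrics per challenge. The sketch below computes AUC through its Mann-Whitney interpretation (the probability that a random positive outscores a random negative, ties counting one half); the scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as P(score of random positive > score of random negative),
    with ties counted as 0.5. O(n_pos * n_neg), fine for small sets."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical blind pathogenicity scores and the assessor's held-back labels.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0]
print(roc_auc(scores, labels))  # 5 of 6 positive/negative pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the metric is convenient for comparing predictors whose raw score scales differ.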

The prediction of molecular phenotypes from DNA sequences remains a longstanding challenge in genomics, often driven by limited annotated data and the inability to transfer learnings between tasks. Here, we present an extensive study of foundation models pre-trained on DNA sequences, named Nucleotide Transformer, ranging from 50 million up to 2.5 billion parameters and integrating information from 3,202 human genomes and 850 genomes from diverse species. These transformer models yield context-specific representations of nucleotide sequences, which allow for accurate predictions even in low-data settings. We show that the developed models can be fine-tuned at low cost to solve a variety of genomics applications. Without supervision, the models learned to focus attention on key genomic elements and can be used to improve the prioritization of genetic variants. The training and application of foundational models in genomics provides a widely applicable approach for accurate molecular phenotype prediction from DNA sequence.
Nucleotide Transformer is a series of foundation models with different parameter sizes and training datasets that can be applied to various downstream tasks by fine-tuning.