PLoS Genetics, Year: 2025, Issue: 21(1), p. e1011540
Published: Jan. 6, 2025
Innovative and easy-to-implement strategies are needed to improve the pathogenicity assessment of rare germline missense variants. Somatic cancer driver mutations identified through large-scale tumor sequencing studies often impact genes that are also associated with rare Mendelian disorders. The use of cancer mutation data to aid in the interpretation of germline missense variants, regardless of whether the gene is associated with a hereditary cancer predisposition syndrome or a non-cancer-related developmental disorder, has not been systematically assessed. We extracted putative cancer driver missense mutations from the Cancer Hotspots database and annotated them as germline variants, including presence/absence and classification in ClinVar. We trained two supervised learning models (logistic regression and random forest) to predict variant classifications of germline missense variants in ClinVar using Cancer Hotspot data (training dataset). The performance of each model was evaluated with an independent test dataset generated in part from searching public and private genome-wide sequencing datasets from ~1.5 million individuals. Of the 2,447 cancer mutations, 691 corresponding germline variants had been previously classified in ClinVar: 426 (61.6%) as likely pathogenic/pathogenic, 261 (37.8%) as uncertain significance, and 4 (0.6%) as likely benign/benign. The odds ratio for a likely pathogenic/pathogenic classification in ClinVar for Cancer Hotspot variants was 28.3 (95% confidence interval: 24.2–33.1, p < 0.001), compared with all other germline missense variants in the same 216 genes. Both supervised learning models showed high correlation with pathogenicity assessments in the training dataset. There were high area under the precision-recall curve values (0.847 and 0.829) and area under the receiver-operating characteristic curve values (0.821 and 0.774) for the logistic regression and random forest models, respectively, when applied to the test dataset. With the use of cancer and germline datasets and supervised learning techniques, our study shows that cancer mutation data can be leveraged to improve the interpretation of germline missense variation potentially causing rare Mendelian disorders.
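The abstract above reports an odds ratio with a 95% confidence interval but does not give the underlying 2×2 table or name the interval method. As a minimal sketch, assuming a Wald (log-scale) interval, such a value could be computed as follows. The function name and the example counts are hypothetical and are not taken from the study.

```python
import math

def odds_ratio_wald_ci(a, b, c, d, z=1.959964):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.959964 gives a 95% interval)
    All four cells must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only (NOT the study's data):
# "exposed" = hotspot variants, "outcome" = pathogenic classification.
or_value, ci_low, ci_high = odds_ratio_wald_ci(20, 10, 5, 25)
print(f"OR = {or_value:.2f}, 95% CI: {ci_low:.2f} to {ci_high:.2f}")
```

With small or zero cell counts the Wald approximation is unreliable, and an exact or continuity-corrected method would be a safer choice.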

bioRxiv (Cold Spring Harbor Laboratory), Year: 2023, Issue: unknown
Published: Dec. 8, 2023
Predicting the effects of mutations in proteins is critical to many applications, from understanding genetic disease to designing novel proteins that can address our most pressing challenges in climate, agriculture and healthcare. Despite a surge in machine learning-based protein models to tackle these questions, an assessment of their respective benefits is challenging due to the use of distinct, often contrived, experimental datasets, and the variable performance of models across different protein families. Addressing these challenges requires scale. To that end we introduce ProteinGym, a large-scale and holistic set of benchmarks specifically designed for protein fitness prediction and design. It encompasses both a broad collection of over 250 standardized deep mutational scanning assays, spanning millions of mutated sequences, as well as curated clinical datasets providing high-quality expert annotations about mutation effects. We devise a robust evaluation framework that combines metrics for both fitness prediction and design, factors in known limitations of the underlying experimental methods, and covers both zero-shot and supervised settings. We report a diverse set of over 70 high-performing models from various subfields (e.g., alignment-based, inverse folding) into a unified benchmark suite. We open source the corresponding codebase, datasets, MSAs, structures, and model predictions, and develop a user-friendly website that facilitates data access and analysis.

JAMA Network Open, Year: 2023, Issue: 6(10), p. e2339571
Published: Oct. 25, 2023
Variants of uncertain significance (VUSs) are rampant in clinical genetic testing, frustrating clinicians, patients, and laboratories because the uncertainty hinders diagnoses and clinical management. A comprehensive assessment of VUSs across many disease genes is needed to guide efforts to reduce uncertainty.
To describe the sources, gene distribution, and population-level attributes of VUSs and to evaluate the impact of the different types of evidence used to reclassify them.
This cohort study used germline DNA variant data from individuals referred by clinicians for diagnostic genetic testing for hereditary disorders. Participants included individuals for whom gene panel testing was conducted between September 9, 2014, and September 7, 2022. Data were analyzed from September 1, 2022, to April 1, 2023.
The outcomes of interest were VUS rates (stratified by age; clinician-reported race, ethnicity, and ancestry groups; types of gene panels; and variant attributes), percentage of VUSs reclassified as benign or likely benign vs pathogenic or likely pathogenic, and enrichment of evidence types used for reclassifying VUSs.
The study cohort included 1 689 845 individuals ranging in age from 0 to 89 years at time of testing (median age, 50 years), with 1 203 210 (71.2%) female individuals. There were 39 150 Ashkenazi Jewish individuals (2.3%), 64 730 Asian individuals (3.8%), 126 739 Black individuals (7.5%), 5539 French Canadian individuals (0.3%), 169 714 Hispanic individuals (10.0%), 5058 Native American individuals (0.3%), 2696 Pacific Islander individuals (0.2%), 4842 Sephardic Jewish individuals (0.3%), and 974 383 White individuals (57.7%). Among all individuals tested, 692 227 (41.0%) had at least 1 VUS and 535 385 (31.7%) had only VUS results. The number of VUSs per individual increased as more genes were tested, and most VUSs were missense changes (86.6%). More VUSs were observed per sequenced gene in individuals who were not from a European White population, in middle-aged and older adults, and in individuals who underwent testing for disorders with incomplete penetrance. Of 37 699 unique VUSs that were reclassified, 30 239 (80.2%) were ultimately categorized as benign or likely benign. A mean (SD) of 30.7 (20.0) months elapsed for VUSs to be reclassified to benign or likely benign, and a mean (SD) of 22.4 (18.9) months elapsed for VUSs to be reclassified to pathogenic or likely pathogenic. Clinical evidence contributed most to reclassification.
This cohort study of approximately 1.6 million individuals highlighted the need for better methods for interpreting missense variants, increased availability of clinical and experimental evidence for variant classification, and more diverse representation of race, ethnicity, and ancestry groups in genomic databases. Data from this study could provide a sound basis for understanding the sources and resolution of VUSs and navigating appropriate next steps in patient care.
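The percentages in the abstract above follow directly from the reported counts. The short check below is an illustrative sketch, not part of the study: it recomputes several of them, rounded to one decimal place. The helper name `pct` and the dictionary layout are assumptions made for this example.

```python
# Cohort size and counts as reported in the abstract above.
COHORT = 1_689_845
RECLASSIFIED_VUS = 37_699

def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# label -> (count, denominator, percentage stated in the abstract)
reported = {
    "female individuals": (1_203_210, COHORT, 71.2),
    "at least 1 VUS": (692_227, COHORT, 41.0),
    "only VUS results": (535_385, COHORT, 31.7),
    "White individuals": (974_383, COHORT, 57.7),
    "reclassified to benign/likely benign": (30_239, RECLASSIFIED_VUS, 80.2),
}

for label, (count, denominator, stated) in reported.items():
    computed = pct(count, denominator)
    status = "matches" if computed == stated else "differs"
    print(f"{label}: computed {computed}%, stated {stated}% ({status})")
```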

Abstract
Background
The Critical Assessment of Genome Interpretation (CAGI) aims to advance the state-of-the-art for computational prediction of genetic variant impact, particularly where relevant to disease. The five complete editions of the CAGI community experiment comprised 50 challenges, in which participants made blind predictions of phenotypes from genetic data, and these were evaluated by independent assessors.
Results
Performance was particularly strong for clinical pathogenic variants, including some difficult-to-diagnose cases, and extends to interpretation of cancer-related variants. Missense variant interpretation methods were able to estimate biochemical effects with increasing accuracy. Assessment of methods for regulatory variants and complex trait disease risk was less definitive and indicates performance potentially suitable for auxiliary use in the clinic.
Conclusions
Results show that while current methods are imperfect, they have major utility for research and clinical applications. Emerging methods and increasingly large, robust datasets for training and assessment promise further progress ahead.

Germline BRCA2 loss-of-function variants, which can be identified through clinical genetic testing, predispose to several cancers [1–5]. However, variants of uncertain significance limit the clinical utility of test results. Thus, there is a need for functional characterization and clinical classification of all BRCA2 variants to facilitate the clinical management of individuals with these variants. Here we analysed all possible single-nucleotide variants from exons 15 to 26 that encode the BRCA2 DNA-binding domain hotspot for pathogenic missense variants. To enable this, we used saturation genome editing CRISPR–Cas9-based knock-in endogenous targeting of human haploid HAP1 cells [6]. The assay was calibrated relative to nonsense and silent variants and was validated using pathogenic and benign standards from ClinVar and results from a homology-directed repair functional assay [7]. Variants (6,959 out of 6,960 evaluated) were assigned to seven categories of pathogenicity based on a VarCall Bayesian model [8]. Single-nucleotide variants that encode loss-of-function missense variants were associated with increased risks of breast cancer and ovarian cancer. The functional assay results were integrated into models from ClinGen, the American College of Medical Genetics and Genomics, and the Association for Molecular Pathology [9] for clinical classification of BRCA2 variants. Using this approach, 91% were classified as pathogenic or likely pathogenic or as benign or likely benign. These classified variants can be used to improve clinical management of individuals with a BRCA2 variant.
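Results from this comprehensive evaluation of BRCA2 variants, particularly variants of uncertain significance, provide a useful resource for the clinical management of individuals who carry such variants.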

Nature Communications, Year: 2023, Issue: 14(1)
Published: Dec. 6, 2023
Loss-of-function of DDX3X is a leading cause of neurodevelopmental disorders (NDD) in females. DDX3X is also a somatically mutated cancer driver gene proposed to have tumour promoting and suppressing effects. We perform saturation genome editing of DDX3X, testing in vitro the functional impact of 12,776 nucleotide variants. We identify 3432 functionally abnormal variants, in three distinct classes. We train a machine learning classifier to identify functionally abnormal variants of NDD-relevance. This classifier has at least 97% sensitivity and 99% specificity to detect variants pathogenic for NDD, substantially out-performing in silico predictors, and resolving up to 93% of variants of uncertain significance. Moreover, functionally-abnormal variants can account for almost all of the excess nonsynonymous DDX3X somatic mutations seen in DDX3X-driven cancers. Systematic maps of variant effects generated in experimentally tractable cell types have the potential to transform clinical interpretation of both germline and somatic disease-associated variation.