medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Nov. 4, 2024
ABSTRACT
Background
As healthcare moves from a one-size-fits-all approach towards precision care, individual risk prediction is an important step in disease prevention and early detection. Biobank-linked healthcare systems can generate knowledge about genomic risk and test the impact of implementing that care. Risk-stratified prostate cancer screening is one clinical application that might benefit from such an approach.
Methods
We developed a translation pipeline for a genomics-informed national healthcare system. We used data from 585,418 male participants in the Veterans Affairs (VA) Million Veteran Program (MVP), among whom 101,920 self-identify as Black/African-American, to develop and validate the Prostate CAncer integrated Risk Evaluation (P-CARE) model, a model based on a polygenic score, family history, and genetic principal components. The model was externally validated in 18,457 PRACTICAL Consortium participants. A novel blended genome-exome (BGE) platform was developed as a laboratory assay for both the P-CARE model and rare variants in cancer-associated genes, including additional validation in 74,331 samples from the All of Us Research Program.
Results
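In overall and ancestry-stratified analyses, the polygenic score of 601 variants was associated with any, metastatic, and fatal prostate cancer in both MVP and PRACTICAL. Values at or above the 80th percentile in the multiancestry cohort were associated with hazard ratios (HR) of 2.75 (95% CI 2.66-2.84), 2.78 (95% CI 2.54-2.99), and 2.59 (95% CI 2.22-2.97) for any, metastatic, and fatal prostate cancer in MVP, respectively, compared with the median. When high- and low-risk groups were defined as HR>1.5 and HR<0.75 for metastatic prostate cancer, the 220,062 (37.6%) high-risk vs. 146,826 (25.1%) low-risk participants had 47.9% vs. 14.1%, 9.3% vs. 2.0%, and 3.6% vs. 0.8% cumulative cause-specific incidence of any, metastatic, and fatal prostate cancer by age 90, respectively. P-CARE reports are now being implemented in a clinical trial within the VA healthcare system (Clinicaltrials.gov NCT05926102).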
Conclusions
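The P-CARE model, consisting of a polygenic score, family history, and genetic principal components, describes a clinically important risk gradient for prostate cancer in a diverse patient population and demonstrates the potential of a learning health system to implement and evaluate precision care approaches.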
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown
Published: Nov. 16, 2023
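Family-based genome-wide association studies (GWAS) have emerged as a gold standard for assessing causal effects of alleles and polygenic scores. Notably, family studies are often claimed to provide an unbiased estimate of the average effect (or average treatment effect; ATE) of an allele, on the basis of an analogy between the random transmission of alleles from parents to children and a randomized controlled trial. Here, we show that this interpretation does not hold in general. Because Mendelian segregation only randomizes alleles among the children of heterozygotes, the effects of alleles in the children of homozygotes are not observable. Consequently, if an allele has different average effects in the children of homozygotes and heterozygotes, as can arise in the presence of gene-by-environment interactions, gene-by-gene interactions, or differences in LD patterns, family studies provide a biased estimate of the average effect in the sample. At a single locus, family-based association studies can be thought of as providing an unbiased estimate of the average effect in the children of heterozygotes (i.e., a local average treatment effect; LATE). This interpretation does not extend to polygenic scores, however, because different sets of SNPs are heterozygous in each family. Therefore, other than under specific conditions, the within-family regression slope of a PGS cannot be assumed to provide an unbiased estimate for any subset or weighted average of families. Instead, family-based studies can be reinterpreted as enabling an unbiased estimate of the extent to which Mendelian segregation at loci in the PGS contributes to the population-level variance in the trait. Because this estimate does not include the between-family variance, however, this interpretation applies to only (roughly) half of the sample PGS variance. In practice, the potential biases of a family-based GWAS are likely smaller than those arising from confounding in a standard, population-based GWAS, and so family studies remain important for the dissection of genetic contributions to phenotypic variation. Nonetheless, the interpretation of family-based GWAS estimates is less straightforward than has been widely appreciated.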
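The statement that within-family variation covers only (roughly) half of the PGS variance can be checked with a toy simulation. The sketch below rests on several assumptions that are not the authors' setup: random mating, Hardy-Weinberg equilibrium, unlinked biallelic loci, and additive effects. It draws two parents, transmits one allele per parent at random, and compares two quantities: the variance of each child's deviation from the mid-parent expectation, and the total child PGS variance. A homozygous parent transmits the same allele either way, so only heterozygous parents contribute to the within-family term. The function name is hypothetical.

```python
import random
from statistics import pvariance


def segregation_share(n_families=3000, n_loci=100, seed=1):
    """Fraction of child PGS variance due to Mendelian segregation.

    Toy model (assumption): random mating, Hardy-Weinberg equilibrium,
    unlinked biallelic loci and additive effects.
    """
    rng = random.Random(seed)
    freqs = [rng.uniform(0.1, 0.9) for _ in range(n_loci)]
    betas = [rng.gauss(0.0, 1.0) for _ in range(n_loci)]
    child_pgs, within_family = [], []
    for _ in range(n_families):
        pgs = 0.0
        midparent = 0.0
        for p, beta in zip(freqs, betas):
            # Each parent carries two alleles (1 = effect allele).
            mother = (int(rng.random() < p), int(rng.random() < p))
            father = (int(rng.random() < p), int(rng.random() < p))
            # Mendelian segregation: one allele from each parent at random.
            # A homozygous parent transmits the same allele either way.
            child = rng.choice(mother) + rng.choice(father)
            pgs += beta * child
            midparent += beta * (sum(mother) + sum(father)) / 2
        child_pgs.append(pgs)
        within_family.append(pgs - midparent)
    return pvariance(within_family) / pvariance(child_pgs)


print(round(segregation_share(), 2))  # expected to be close to 0.5
```

Under these assumptions the ratio is close to 0.5 for a simple reason. At a locus with allele frequency p, the two parents together contribute p(1-p) of segregation variance, against a total child genotype variance of 2p(1-p).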
The Annals of Statistics, Journal Year: 2024, Volume and Issue: 52(3)
Published: June 1, 2024
Genetic prediction holds immense promise for translating genetic discoveries into medical advances. As the high-dimensional covariance matrix (or linkage disequilibrium (LD) pattern) of genetic variants often presents a block-diagonal structure, numerous methods account for the dependence among variants in predetermined local LD blocks. Moreover, due to privacy considerations and data protection concerns, genetic variant dependence in each LD block is typically estimated from external reference panels rather than the original training data set. This paper presents a unified analysis of blockwise and reference panel-based estimators in a high-dimensional prediction framework without sparsity restrictions. We find that, surprisingly, even when the covariance matrix has a block-diagonal structure with well-defined boundaries, blockwise estimation methods adjusting for local dependence can be substantially less accurate than methods controlling for the whole covariance matrix. Further, estimation methods built on the original training data set and on external reference panels are likely to have varying performance in high dimensions, which may reflect the cost of having only access to summary level data from the training data set. This analysis is based on novel results in random matrix theory. We numerically evaluate our results using extensive simulations and real data analysis in the UK Biobank.
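A blockwise estimator of the kind discussed above keeps the estimated LD (covariance) within each predetermined block and sets every cross-block entry to zero. The plain-Python sketch below shows only that construction, with hypothetical function names and toy genotypes. It does not reproduce the paper's random-matrix analysis, nor its finding that the truncation can be less accurate than using the whole matrix.

```python
from statistics import fmean


def sample_covariance(rows):
    """Unbiased sample covariance of the columns of `rows` (n samples x p variants)."""
    n, p = len(rows), len(rows[0])
    means = [fmean(row[j] for row in rows) for j in range(p)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


def blockwise(cov, block_of):
    """Keep entries whose two variants share an LD block; zero out the rest."""
    p = len(cov)
    return [
        [cov[i][j] if block_of[i] == block_of[j] else 0.0 for j in range(p)]
        for i in range(p)
    ]


genotypes = [[1, 2, 0], [2, 4, 1], [3, 6, 3]]   # 3 samples x 3 variants (toy)
full = sample_covariance(genotypes)
block = blockwise(full, block_of=[0, 0, 1])     # variants 0 and 1 share block 0
print(round(full[0][2], 6), block[0][2])        # 1.5 0.0
```

The cross-block covariance of 1.5 between variants 0 and 2 is simply discarded. Information of this kind is what a whole-matrix method would retain.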
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: May 17, 2024
Collaborative efforts, such as the Human Cell Atlas, are rapidly accumulating large amounts of single-cell data. To ensure that atlases are representative of human genetic diversity, we need to determine the ancestry of the donors from whom single-cell data are generated. Self-reported race and ethnicity, although important, can be biased and is not always available for datasets already collected. Here, we introduce scAI-SNP, a tool to infer ancestry directly from single-cell genomics data. To train scAI-SNP, we identified 4.5 million ancestry-informative single-nucleotide polymorphisms (SNPs) in the 1000 Genomes Project dataset across 3201 individuals from 26 population groups. For a query single-cell dataset, scAI-SNP uses these SNPs to compute the contribution of each of the 26 population groups to the ancestry of the donor from whom the cells were obtained. Using diverse single-cell datasets with matched whole-genome sequencing data, we show that scAI-SNP is robust to the sparsity of single-cell data. It can accurately and consistently infer ancestry from samples derived from diverse types of tissues and cancer cells, and can be applied to different modalities of single-cell profiling assays, such as single-cell RNA-seq and ATAC-seq. Finally, we argue that ensuring that single-cell atlases represent diverse ancestry, ideally alongside race and ethnicity, is ultimately important for improved and equitable health outcomes by accounting for human diversity.
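The abstract does not spell out how scAI-SNP computes the contribution of each group. The sketch below therefore swaps in a standard technique: expectation-maximization for ancestry proportions with known reference allele frequencies (the supervised-admixture likelihood). It simply skips SNPs without coverage, which is one way to cope with single-cell sparsity. This is an illustration under those assumptions, not scAI-SNP's method. The function names, the three toy groups and the simulated donor are hypothetical.

```python
import random


def estimate_ancestry(genotypes, freqs, n_iter=100):
    """EM estimate of ancestry proportions given known reference frequencies.

    genotypes[j]: alt-allele count (0, 1, 2) at SNP j, or None if uncovered.
    freqs[k][j]:  alt-allele frequency of SNP j in reference group k.
    """
    k_groups = len(freqs)
    q = [1.0 / k_groups] * k_groups
    covered = [j for j, g in enumerate(genotypes) if g is not None]
    for _ in range(n_iter):
        expected = [0.0] * k_groups
        for j in covered:
            g = genotypes[j]
            alt = [q[k] * freqs[k][j] for k in range(k_groups)]
            ref = [q[k] * (1.0 - freqs[k][j]) for k in range(k_groups)]
            alt_total, ref_total = sum(alt), sum(ref)
            for k in range(k_groups):
                # E-step: posterior group membership of each observed allele copy.
                expected[k] += g * alt[k] / alt_total + (2 - g) * ref[k] / ref_total
        # M-step: proportion = expected allele copies from group k / all copies.
        q = [e / (2 * len(covered)) for e in expected]
    return q


def simulate_donor(q, freqs, missing_rate, rng):
    """Toy donor: each allele copy comes from group k with probability q[k]."""
    genotypes = []
    for j in range(len(freqs[0])):
        if rng.random() < missing_rate:
            genotypes.append(None)  # SNP not covered by any read
            continue
        count = 0
        for _ in range(2):
            k = rng.choices(range(len(q)), weights=q)[0]
            count += int(rng.random() < freqs[k][j])
        genotypes.append(count)
    return genotypes


rng = random.Random(0)
freqs = [[rng.uniform(0.05, 0.95) for _ in range(3000)] for _ in range(3)]
donor = simulate_donor([0.6, 0.3, 0.1], freqs, missing_rate=0.6, rng=rng)
print([round(x, 2) for x in estimate_ancestry(donor, freqs)])
```

With 60% of SNPs uncovered, the estimate should still land near the simulated proportions [0.6, 0.3, 0.1]. Real single-cell data would add allelic dropout and sequencing error, which this toy model ignores.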
medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: July 26, 2024
Abstract
Importance
Polygenic risk scores (PRSs) for coronary artery disease (CAD) are a growing clinical and commercial reality. Whether existing scores provide similar individual-level assessments of disease liability is a critical consideration for clinical implementation that remains uncharacterized.
Objective
Characterize the reliability of CAD PRSs that perform equivalently at the population level in predicting CAD risk.
Design
Cross-sectional study.
Setting
All of Us Research Program (AOU), Penn Medicine Biobank (PMBB), and UCLA ATLAS Precision Health Biobank.
Participants
Volunteers of diverse genetic backgrounds enrolled in AOU, PMBB, and UCLA ATLAS with available electronic health record and genotyping data.
Exposures
Previously published CAD PRSs and new PRSs developed separately from the testing cohorts.
Main Outcomes and Measures
Sets of PRSs with similar population-level prediction were identified by comparing the calibration and discrimination (Brier score and AUROC) of generalized linear models of prevalent CAD using Bayesian analysis of variance. Among equivalently performing scores, agreement between individual-level risk estimates was tested using the intraclass correlation (ICC) and Light’s Kappa, which are measures of inter-rater reliability.
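The two population-level performance measures named above have short definitions. A minimal sketch follows, with hypothetical function names and toy data. The Bayesian analysis of variance and the region of practical equivalence used to compare models are not reproduced here.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def auroc(scores, outcomes):
    """Probability that a random case outranks a random control (ties count 1/2)."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = 0.0
    for c in cases:
        for d in controls:
            wins += 1.0 if c > d else 0.5 if c == d else 0.0
    return wins / (len(cases) * len(controls))


probs = [0.9, 0.8, 0.3, 0.1]   # hypothetical model-predicted CAD probabilities
status = [1, 1, 0, 0]          # prevalent CAD (1) or not (0)
print(round(brier_score(probs, status), 4), auroc(probs, status))  # 0.0375 1.0
```

Both measures summarize a whole cohort. Two PRSs can score identically on them while ranking the same individuals very differently, which is the gap the agreement analysis targets.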
Results
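50 PRSs were calculated for 171,095 AOU participants. When included in a model of prevalent CAD, 48 had practically equivalent Brier scores and AUROCs (region of practical equivalence = 0.02). Across these 48 PRSs, 84% of participants had at least one PRS placing them in the top quintile and at least one placing them in the bottom quintile. Continuous individual predictions had poor agreement, with an ICC of 0.351 (95% CI 0.349, 0.352). Agreement between two statistically equivalent PRSs was moderate, with an ICC of 0.649 (95% CI 0.646, 0.652). Light’s Kappa, used to evaluate the consistency of assignment using high-risk thresholds, did not exceed 0.56 (interpreted as ‘fair’) across scores. Repeating the analysis among 41,193 PMBB and 50,748 UCLA ATLAS participants yielded different sets of equivalently performing PRSs, which also lacked strong agreement.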
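Light’s Kappa is the average of Cohen's kappa over every pair of raters. Treating each PRS as a rater that flags participants in its top quintile gives the kind of agreement check described above. A minimal sketch with hypothetical helper names and toy scores follows; the study's exact thresholds and tie handling are not specified here.

```python
from itertools import combinations


def cohen_kappa(a, b):
    """Chance-corrected agreement between two raters (requires p_exp < 1)."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    p_exp = sum((a.count(label) / n) * (b.count(label) / n) for label in labels)
    return (p_obs - p_exp) / (1.0 - p_exp)


def lights_kappa(ratings):
    """Light's Kappa: mean Cohen's kappa over all pairs of raters (here, PRSs)."""
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


def top_quintile(scores):
    """1 if a participant's score is in the top 20% for this PRS, else 0."""
    cutoff = sorted(scores)[int(0.8 * len(scores))]
    return [int(s >= cutoff) for s in scores]


# Three hypothetical PRSs scored on the same five participants.
prs_scores = [[5, 1, 9, 3, 7], [4, 2, 8, 1, 9], [9, 3, 1, 2, 8]]
flags = [top_quintile(s) for s in prs_scores]
print(flags)                          # each PRS flags a different participant
print(round(lights_kappa(flags), 2))  # -0.25
```

In this toy case every PRS places a different person in its top quintile, so agreement falls below chance even though the scores are similar overall.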
Conclusions and Relevance
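Across three biobanks, CAD PRSs that performed equivalently at the population level produced unreliable individual-level risk estimates. Approaches to clinical implementation must consider the potential for discordant individual-level risk estimates from otherwise indistinguishable PRSs.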
medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: July 30, 2024
The transferability of polygenic scores across population groups is a major concern with respect to the equitable clinical implementation of genomic medicine. Since genetic associations are identified relative to the population mean, differences in disease or trait prevalence among social strata inevitably influence the relationship between PGS and risk. Here we quantify the magnitude of PGS-by-Exposure (PGSxE) interactions for seven human diseases (coronary artery disease, type 2 diabetes, obesity thresholded on body mass index and on waist-to-hip ratio, inflammatory bowel disease, chronic kidney disease, and asthma) in pairs with 75 exposures in the White-British subset of the UK Biobank study (n=408,801). Across 24,198 PGSxE models, 746 (3.1%) were significant by two criteria, at least three-fold more than expected by chance under each criterion. Predictive accuracy was significantly improved in high-risk strata by including interaction terms, with effects as large as those documented for low-transferability ancestries. The predominant mechanism of PGSxE interaction is shown to be amplification of genetic effects in the presence of adverse exposures such as polyunsaturated fatty acids, mediators of obesity, and social determinants of ill health. We introduce the notion of the proportion needed to benefit (PNB), which is the cumulative number needed to treat across the PGS range, and show that typically this is halved … 70
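A PGSxE interaction of the kind screened in this study amounts to letting the PGS slope differ by exposure in a logistic model. The sketch below is a generic illustration with hypothetical coefficients and function names; it is not one of the study's 24,198 fitted models. A positive interaction coefficient corresponds to the amplification mechanism: the PGS gradient is steeper in the exposed stratum.

```python
import math


def pgs_odds_ratios(b_pgs, b_pgs_x_e):
    """Per-SD PGS odds ratio when unexposed (E=0) and exposed (E=1).

    Model: logit P(disease) = b0 + b_pgs*PGS + b_e*E + b_pgs_x_e*PGS*E
    """
    return math.exp(b_pgs), math.exp(b_pgs + b_pgs_x_e)


def predicted_risk(pgs, exposed, b0, b_pgs, b_e, b_pgs_x_e):
    """Predicted probability from the same logistic PGSxE model."""
    eta = b0 + b_pgs * pgs + b_e * exposed + b_pgs_x_e * pgs * exposed
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients: OR 1.5 per PGS SD when unexposed, interaction OR 1.4.
unexposed_or, exposed_or = pgs_odds_ratios(math.log(1.5), math.log(1.4))
print(round(unexposed_or, 2), round(exposed_or, 2))  # 1.5 2.1
```

On the log-odds scale the interaction adds to the PGS slope, so the odds ratios multiply: 1.5 × 1.4 = 2.1 per SD in the exposed group.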
medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: July 14, 2024
Abstract
Polygenic risk scores (PRSs) are promising tools for advancing precision medicine. However, existing PRS construction methods rely on static summary statistics derived from genome-wide association studies (GWASs), which are often updated at lengthy intervals. As genetic data and health outcomes are continuously being generated at an ever-increasing pace, the current PRS training and deployment paradigm is suboptimal in maximizing the prediction accuracy of PRSs for incoming patients in healthcare settings. Here, we introduce real-time PRS-CS (rtPRS-CS), which enables online, dynamic refinement and calibration of PRS as each new sample is collected, without the need to perform intermediate GWASs. Through extensive simulation studies, we evaluate the performance of rtPRS-CS across various genetic architectures and training sample sizes. Leveraging quantitative traits from the Mass General Brigham Biobank and UK Biobank, we show that rtPRS-CS can integrate massive streaming data to enhance PRS prediction over time. We further apply rtPRS-CS to 22 schizophrenia cohorts in 7 Asian regions, demonstrating its clinical utility in dynamically predicting and stratifying disease risk across diverse ancestries.
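rtPRS-CS itself updates a continuous-shrinkage (PRS-CS) posterior, which is not reproduced here. As a deliberately simpler stand-in, the sketch below shows the general streaming idea the abstract describes. It keeps running sufficient statistics (X'X and X'y) and refreshes a joint ridge (Gaussian-prior) fit after new samples arrive, with no intermediate GWAS. Several things are assumptions: the class and function names are hypothetical, genotypes and phenotype are assumed to be centred, and the plain-Python solver is only suitable for a handful of variants.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class StreamingRidge:
    """Joint ridge fit refreshed from running sums X'X and X'y.

    Each new sample updates the sums in O(p^2); nothing is re-estimated
    per variant. Assumes centred genotypes and phenotype (no intercept).
    """

    def __init__(self, p, lam=1.0):
        self.xtx = [[0.0] * p for _ in range(p)]
        self.xty = [0.0] * p
        self.lam = lam

    def update(self, x, y):
        for i, xi in enumerate(x):
            self.xty[i] += xi * y
            for j, xj in enumerate(x):
                self.xtx[i][j] += xi * xj

    def coefficients(self):
        p = len(self.xty)
        a = [[self.xtx[i][j] + (self.lam if i == j else 0.0) for j in range(p)]
             for i in range(p)]
        return solve(a, self.xty)


rng = random.Random(0)
true_beta = [0.5, -0.3, 0.0]   # hypothetical variant effects
model = StreamingRidge(p=3, lam=1.0)
for _ in range(2000):          # samples arriving one at a time
    x = [rng.gauss(0.0, 1.0) for _ in true_beta]
    y = sum(b * v for b, v in zip(true_beta, x)) + rng.gauss(0.0, 0.1)
    model.update(x, y)
print([round(b, 2) for b in model.coefficients()])
```

Calling `coefficients()` at any point gives the fit for all samples seen so far. This mirrors, in a much simpler model, the idea of refining a PRS as each new sample is collected.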
medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Aug. 2, 2024
Background:
Genetic factors play an important role in prostate cancer (PCa) development, with polygenic risk scores (PRS) predicting disease across genetic ancestries. However, there are few convincing modifiable risk factors for PCa, and little is known about their potential interaction with genetic risk. We analyzed incident PCa cases (n=6,155) and controls (n=98,257) of European and African ancestry from the UK Biobank (UKB) cohort to evaluate the role of neighborhood socioeconomic status (nSES) in PCa risk and how it may interact with PRS.
Methods:
We evaluated a multi-ancestry PRS containing 269 variants to understand the association of germline genetics with PCa in the UKB. Using the English Indices of Deprivation, a set of validated metrics that quantify the lack of resources within geographical areas, we performed logistic regression to investigate the main effects of, and interactions between, nSES deprivation and PRS on PCa.
Results:
The PRS was strongly associated with PCa (OR=2.04; 95%CI=2.00-2.09; P<0.001). Additionally, deprivation indices were inversely associated with PCa: employment (OR=0.91; 95%CI=0.86-0.96; P<0.001), education (OR=0.94; 95%CI=0.83-0.98; P<0.001), health, and income. The PRS association showed little heterogeneity across deprivation indices, except for the Townsend Index (P=0.03).
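The odds ratios and 95% CIs reported above are exponentiated logistic-regression coefficients. The sketch below converts between the two scales under the usual Wald assumption, namely a CI that is symmetric on the log-odds scale. Applied to the reported PRS estimate, it recovers an approximate coefficient and standard error; the result is approximate because the published values are rounded. Function names are hypothetical.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def odds_ratio_ci(beta, se, z=Z_975):
    """OR and Wald 95% CI from a logistic-regression coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def beta_se_from_ci(odds_ratio, lower, upper, z=Z_975):
    """Approximate log-OR and SE back-calculated from a reported OR and 95% CI."""
    return math.log(odds_ratio), (math.log(upper) - math.log(lower)) / (2 * z)


beta, se = beta_se_from_ci(2.04, 2.00, 2.09)      # PRS estimate reported above
print(round(beta, 3), round(se, 4))               # 0.713 0.0112
print([round(v, 2) for v in odds_ratio_ci(beta, se)])  # [2.04, 2.0, 2.09]
```

The same conversion applies to the inverse associations. For example, OR=0.91 for employment deprivation corresponds to a negative coefficient, log(0.91) ≈ -0.094.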
Conclusions:
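The PRS was reaffirmed as a strong risk factor for PCa, and we identified nSES deprivation domains that may influence PCa detection or may reflect potentially correlated environmental exposures. These findings also suggest that PRS and nSES act independently.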
Polygenic risk scores are widely used in disease risk stratification, but their accuracy varies across diverse populations. Recent methods that leverage large-scale multi-ancestry data to improve accuracy in under-represented populations require labelling individuals by ancestry for prediction. This poses challenges for practical use, as clinical practices are typically not based on ancestry. We propose SPLENDID, a novel penalized regression framework for diverse biobank-scale data. Our method utilizes ancestry principal component interactions to model genetic ancestry as a continuum within a single prediction model for all ancestries, eliminating the need for discrete ancestry labels. In extensive simulations and analyses of 9 traits from the All of Us Research Program (N=224,364) and UK Biobank (N=340,140), SPLENDID significantly outperformed existing methods in prediction accuracy and model sparsity. By directly incorporating continuous genetic ancestry in model training, SPLENDID stands as a valuable tool for robust and fairer PRS implementation.
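Modelling ancestry as a continuum through principal component interactions can be illustrated by how the design matrix is built. Each variant contributes a main-effect column plus one column per PC, so its effect varies smoothly with the PCs. This is a hypothetical sketch of that feature construction only. SPLENDID's penalty, tuning, and fitting procedure are not shown, and the function names are illustrative.

```python
def pc_interaction_features(genotypes, pcs):
    """Design row: one column per SNP, then one SNP x PC product per pair."""
    row = [float(g) for g in genotypes]
    for g in genotypes:
        row.extend(g * pc for pc in pcs)
    return row


def effects_at(beta, gamma, pcs):
    """Per-SNP effect at one point on the ancestry continuum:
    beta_j(PC) = beta_j + sum_k gamma[j][k] * PC_k."""
    return [b + sum(gk * pc for gk, pc in zip(g_row, pcs))
            for b, g_row in zip(beta, gamma)]


def score(genotypes, pcs, beta, gamma):
    """PRS = sum_j x_j * beta_j(PC); equals a linear model on the design row."""
    return sum(x * e for x, e in zip(genotypes, effects_at(beta, gamma, pcs)))


print(pc_interaction_features([0, 1, 2], [0.5, -1.0]))
# [0.0, 1.0, 2.0, 0.0, -0.0, 0.5, -1.0, 1.0, -2.0]
```

Because the score is linear in the main-effect and interaction coefficients, one penalized regression on these rows serves every ancestry at once. No discrete ancestry label enters the model.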
Statistics in Medicine, Journal Year: 2024, Volume and Issue: unknown
Published: Oct. 23, 2024
ABSTRACT
Polygenic risk scores (PRS) aim to predict a trait from genetic information, relying on common genetic variants with low to medium effect sizes. As genotype data are high-dimensional in nature, it is crucial to develop methods that can be applied to large-scale data (large n and large p). Many PRS tools aggregate univariate summary statistics from genome-wide association studies into a single score. Recent advancements allow simultaneous modeling of variant effects from individual-level genotype data. In this context, we introduced snpboost, an algorithm that applies statistical boosting on individual-level genotype data to estimate PRS via multivariable regression models. By processing variants iteratively in batches, snpboost can deal with large cohort data. Having solved the technical obstacles due to data dimensionality, we have now broadened the methodological scope, focusing on key objectives for the clinical application of PRS. Similar to most PRS methods, snpboost has, so far, been restricted to quantitative and binary traits. Now, we incorporate more advanced alternatives, targeted to the particular type of outcome. Adapting the loss function extends the snpboost framework to further data situations such as time-to-event and count data. Furthermore, alternative loss functions for continuous outcomes allow us to focus not only on the mean of the conditional distribution. They also capture other aspects, such as the median or quantiles in median or quantile regression, that may be more helpful for the risk stratification of individual patients and can quantify prediction uncertainty. This work enhances PRS fitting across multiple model classes that were previously unfeasible for this data type.
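The idea of adapting the loss function can be seen in generic component-wise gradient boosting: the update rule stays the same, and only the negative gradient (the pseudo-residuals) changes with the loss. The sketch below is a minimal illustration of that technique. It is not the snpboost package and omits its batch processing of variants; the function names are hypothetical. Passing `quantile_gradient(0.5)` instead of `l2_gradient` turns the fit into median regression.

```python
def l2_gradient(y, f):
    """Negative gradient of squared-error loss 0.5 * (y - f) ** 2."""
    return y - f


def quantile_gradient(tau):
    """Negative (sub)gradient of the check loss for quantile tau."""
    def grad(y, f):
        return tau if y > f else tau - 1.0
    return grad


def componentwise_boost(X, y, neg_gradient, n_steps=200, nu=0.1):
    """Each step: fit every variant to the pseudo-residuals by univariate least
    squares, then move only the best-fitting variant's coefficient by nu * b."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    fitted = [0.0] * n
    for _ in range(n_steps):
        u = [neg_gradient(yi, fi) for yi, fi in zip(y, fitted)]
        best = None
        for j in range(p):
            xj = [row[j] for row in X]
            sxx = sum(v * v for v in xj)
            if sxx == 0:
                continue  # monomorphic variant carries no information
            b = sum(v * ui for v, ui in zip(xj, u)) / sxx
            sse = sum((ui - b * v) ** 2 for v, ui in zip(xj, u))
            if best is None or sse < best[2]:
                best = (j, b, sse)
        j, b, _ = best
        coef[j] += nu * b
        fitted = [fi + nu * b * row[j] for fi, row in zip(fitted, X)]
    return coef


X = [[1, 0], [0, 1], [1, 1], [-1, 0], [2, -1]]   # toy genotype matrix
y = [2, 0, 2, -2, 4]                              # y = 2 * variant 0
print([round(c, 3) for c in componentwise_boost(X, y, l2_gradient)])
# [2.0, 0.0]
```

Because only one coefficient moves per step, variants that never win a step keep a coefficient of exactly zero. This built-in variable selection is what makes boosting attractive for sparse PRS.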