Nature Communications,
Journal year: 2025, Issue: 16(1)
Published: March 12, 2025
Abstract
Sarcoidosis is a complex inflammatory disease with a strong genetic component. Here, we perform a genome-wide association study in 9755 sarcoidosis cases to identify risk loci and map associated genes. We then use transcriptome-wide association studies and enrichment analyses to explore pathways involved in sarcoidosis and use Mendelian randomization to examine associations with modifiable factors and circulating biomarkers. We identify 28 genomic loci associated with sarcoidosis, with the C1orf141-IL23R locus showing the largest effect size. We observe gene expression patterns related to sarcoidosis in spleen, whole blood, and lung, and highlight 75 tissue-specific genes through transcriptome-wide association studies. Furthermore, we use enrichment analysis to establish key roles for T cell activation, leukocyte adhesion, and cytokine production in sarcoidosis. Additionally, we find associations between sarcoidosis and genetically predicted body mass index, interleukin-23 receptor, and eight circulating proteins.
Importance
Polygenic risk scores (PRSs) for coronary heart disease (CHD) are a growing clinical and commercial reality. Whether existing scores provide similar individual-level assessments of disease susceptibility remains incompletely characterized.
Objective
To characterize the individual-level agreement of CHD PRSs that perform similarly at the population level.
Design, Setting, and Participants
Cross-sectional study of participants from diverse backgrounds enrolled in the All of Us Research Program (AOU), Penn Medicine BioBank (PMBB), and University of California, Los Angeles (UCLA) ATLAS Precision Health Biobank with electronic health record and genotyping data.
Exposures
Polygenic risk of CHD from published PRSs and new PRSs developed separately from testing samples.
Main Outcomes and Measures
PRSs that performed similarly in population-level prediction were identified by comparing calibration and discrimination of models of prevalent CHD. Individual-level agreement was tested with intraclass correlation coefficient (ICC) and Light κ.
Results
A total of 48 PRSs were calculated for 171 095 AOU participants. The mean (SD) age was 56.4 (16.8) years. A total of 104 947 participants (61.3%) were female. A total of 35 590 participants (20.8%) were most genetically similar to an African reference population, 29 801 (17.4%) to an admixed American reference population, 100 493 (58.7%) to a European reference population, and the remaining participants to Central/South Asian, East Asian, and Middle Eastern reference populations. There were 17 589 participants (10.3%) with and 153 506 participants without (89.7%) CHD. When included in a model of prevalent CHD, 46 scores had practically equivalent Brier scores and area under the receiver operator curves (region of practical equivalence ±0.02). Twenty percent of participants had at least 1 score in both the top and bottom 5% of risk. Continuous agreement of individual predictions from the 46 scores was poor (ICC, 0.373 [95% CI, 0.372-0.375]). Light κ, used to evaluate consistency of assignment, did not exceed 0.56. Analysis among 41 193 PMBB and 53 092 ATLAS participants yielded different sets of equivalent scores, which also lacked individual-level agreement.
Conclusions and Relevance
CHD PRSs that performed similarly at the population level demonstrated highly variable individual-level estimates of risk. Recognizing that CHD PRSs may generate incongruent individual-level risk estimates, effective clinical implementation will require refined statistical methods to quantify uncertainty and new strategies to communicate this uncertainty to patients and clinicians.
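The finding that some participants have at least 1 score in both the top and bottom 5% of risk can be illustrated with a small calculation. The following is a minimal sketch, not the study's code: the function names, the rank-based definition of the tails, the tie handling, and the toy data are all assumptions made for illustration.

```python
def tail_flags(values, tail=0.05):
    """Flag which entries fall in the bottom and top `tail` fraction by rank.

    Ties are broken by input order (an illustrative choice, not the study's).
    """
    n = len(values)
    k = max(1, round(n * tail))  # number of participants in each tail
    order = sorted(range(n), key=lambda i: values[i])
    bottom = [False] * n
    top = [False] * n
    for i in order[:k]:
        bottom[i] = True
    for i in order[n - k:]:
        top[i] = True
    return bottom, top


def fraction_in_both_tails(score_columns, tail=0.05):
    """Fraction of participants placed in the top tail by at least one score
    and in the bottom tail by at least one score.

    score_columns: list of scores, each a list with one value per participant.
    """
    n = len(score_columns[0])
    ever_bottom = [False] * n
    ever_top = [False] * n
    for column in score_columns:
        bottom, top = tail_flags(column, tail)
        for i in range(n):
            ever_bottom[i] = ever_bottom[i] or bottom[i]
            ever_top[i] = ever_top[i] or top[i]
    both = sum(1 for b, t in zip(ever_bottom, ever_top) if b and t)
    return both / n


# Hypothetical toy data: two scores that rank 20 participants in opposite order.
score_a = list(range(20))
score_b = list(reversed(score_a))
print(fraction_in_both_tails([score_a, score_b]))  # prints 0.1
```

With two perfectly opposed scores, the participants at either extreme of one score sit at the opposite extreme of the other, so 2 of 20 are flagged. Two identical scores would flag no one.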
Genetic Epidemiology,
Journal year: 2025, Issue: 49(1)
Published: Jan. 1, 2025
ABSTRACT
In large cohort studies the number of unaffected individuals outnumbers the number of affected individuals, and power can be low to detect associations for outcomes with low prevalence. We consider how including recorded family history in regression models increases the power to detect associations between genetic variants and disease risk. We show theoretically and using Monte-Carlo simulations that including family history of a disease, with a weighting of 0.5 compared with true cases, increases the power to detect associations. This is a powerful approach for detecting variants with moderate effects, but for larger effect sizes a weighting of > 0.5 can be more powerful. We illustrate this both for common variants and for exome sequencing data for over 400,000 individuals in UK Biobank to evaluate the association between the burden of protein-truncating variants in genes and risk for four cancer types.
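One way to picture the weighting described above is as a recoded outcome: true cases are scored 1, unaffected individuals with a recorded family history are scored 0.5, and everyone else is scored 0. The sketch below is an illustration under that assumption only. The function names, the use of a simple least-squares slope, and the toy data are hypothetical and are not taken from the paper.

```python
def proxy_phenotype(is_case, has_family_history, family_weight=0.5):
    """Recode disease status so that family history counts as a partial case."""
    if is_case:
        return 1.0
    if has_family_history:
        return family_weight
    return 0.0


def regression_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical toy data: allele counts, case status, and family history.
genotypes = [0, 1, 2, 0, 1, 2]
is_case = [False, False, True, False, False, True]
family_history = [False, True, False, False, True, False]

phenotypes = [proxy_phenotype(c, f) for c, f in zip(is_case, family_history)]
slope = regression_slope(genotypes, phenotypes)
print(phenotypes, slope)
```

Raising `family_weight` above 0.5 corresponds to the heavier weighting that the abstract reports can be more powerful for larger effect sizes.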
bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2025, Issue: unknown
Published: Jan. 10, 2025
ABSTRACT
The All of Us Research Program (All of Us) seeks to accelerate biomedical research and address the underrepresentation of minorities by recruiting over one million ethnically diverse participants across the United States. A key question is how self-identification with discrete, predefined race and ethnicity categories compares with genetic diversity at continental and subcontinental levels. To contextualize the genetic diversity in All of Us, we analyzed ∼2 million common variants from 230,016 unrelated whole genomes using classical population genetics methods, alongside reference panels such as the 1000 Genomes Project, Human Genome Diversity Project, and Simons Genome Diversity Project. Our analysis reveals that participants within self-identified race and ethnicity groups exhibit a gradient of genetic diversity rather than discrete clusters. The distributions of continental and subcontinental ancestries show considerable variation within race and ethnicity, both nationally and across states, reflecting the historical impacts of U.S. colonization, the transatlantic slave trade, and recent migrations. All of Us samples filled most gaps along the top five principal components of current global reference panels. Notably, “Hispanic or Latino” participants spanned much of the three-way (African, Native American, and European) admixture spectrum. Ancestry was significantly associated with body mass index (BMI) and height, even after adjusting for socio-environmental covariates. In particular, West-Central and East African ancestries showed opposite associations with BMI. This study emphasizes the importance of assessing subcontinental ancestries, as the continental approach is insufficient to control for confounding in genetic association studies.
bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2025, Issue: unknown
Published: Jan. 14, 2025
Elucidating ancestry-specific structures in admixed populations is crucial for comprehending population history and mitigating confounding effects in genome-wide association studies. Existing methods for elucidating the ancestry-specific structures generally rely on frequency-based estimates of the genetic relationship matrix (GRM) among admixed individuals after masking segments from ancestry components not being targeted for investigation. However, these approaches disregard linkage information between markers, potentially limiting their resolution in revealing structure within an ancestry component. We introduce ancestry-specific expected GRM (as-eGRM), a novel framework for ancestry-specific relatedness among admixed individuals. The key design of as-eGRM consists of defining ancestry-specific pairwise relatedness between individuals based on genealogical trees encoded in the Ancestral Recombination Graph (ARG) and local ancestry calls, and computing the expectation of the ancestry-specific relatedness across the genome. Comprehensive evaluations using both simulated stepping-stone models and empirical datasets based on three-way admixed Latino cohorts showed that analysis based on as-eGRM robustly outperforms existing methods in revealing the structure in admixed populations with diverse demographic histories. Taken together, as-eGRM has the promise to better reveal the fine-scale structure within an ancestry component of admixed individuals, which can help improve the robustness and interpretation of findings from association studies of disease or complex traits for these understudied populations.
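For orientation, the frequency-based GRM that the abstract describes as the existing baseline can be sketched as follows. This is a minimal illustration of a commonly used allele-frequency-standardized estimator and is not the as-eGRM method. The function name, the input layout, the in-sample allele frequency estimate, and the handling of monomorphic markers are assumptions, and the ancestry masking step mentioned in the abstract is omitted.

```python
def frequency_based_grm(genotypes):
    """Estimate a genetic relationship matrix from allele counts.

    genotypes: list of markers, each a list of allele counts (0, 1, or 2),
    one entry per individual. Allele frequencies are estimated in-sample.
    """
    n = len(genotypes[0])
    grm = [[0.0] * n for _ in range(n)]
    used = 0
    for marker in genotypes:
        p = sum(marker) / (2 * n)  # in-sample allele frequency
        if p <= 0.0 or p >= 1.0:
            continue  # monomorphic markers carry no information
        variance = 2 * p * (1 - p)
        centered = [g - 2 * p for g in marker]
        for j in range(n):
            for k in range(n):
                grm[j][k] += centered[j] * centered[k] / variance
        used += 1
    if used == 0:
        raise ValueError("no polymorphic markers")
    # Average the per-marker contributions over the markers actually used.
    return [[value / used for value in row] for row in grm]


# Hypothetical toy data: one informative marker and one monomorphic marker.
print(frequency_based_grm([[0, 2], [2, 2]]))
```

Each marker contributes independently to the average, which is the sense in which such estimators disregard linkage information between markers.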
JMIR Medical Informatics,
Journal year: 2025, Issue: 13, p. e59452
Published: Jan. 28, 2025
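Background
In data-sparse areas such as health care, computer scientists aim to leverage as much available information as possible to increase the accuracy of their machine learning models’ outputs. As a standard, categorical data, such as patients’ gender, socioeconomic status, or skin color, are used to train models in fusion with other data types, such as medical images and text-based information. However, the effects of including categorical data features for model training in such data-scarce areas are underexamined, particularly regarding models intended to serve individuals equitably in a diverse population.
Objective
This study aimed to explore categorical data’s effects on machine learning model outputs, rooted the effects in data collection and dataset publication processes, and proposed a mixed methods approach to examining datasets’ data categories before using them for machine learning training.
Methods
Against the theoretical background of the social construction of categories, we suggest a mixed methods approach to assess the utility of categorical data for machine learning model training. As an example, we applied our approach to a Brazilian dermatological dataset (Dermatological and Surgical Assistance Program at the Federal University of Espírito Santo [PAD-UFES] 20). We first present an exploratory, quantitative study that assesses the effects when including or excluding each of the unique categorical data features of the PAD-UFES 20 dataset for training a transformer-based model using a data fusion algorithm. We then pair our quantitative analysis with a qualitative examination of the data categories based on interviews with the dataset authors.
Results
Our quantitative study suggests scattered effects of including categorical data for machine learning model training across predictive classes. Our qualitative analysis gives insights into how the categorical data were collected and why they were published, explaining some of the quantitative effects that we observed. Our findings highlight the social constructedness of categorical data in publicly available datasets, meaning that the data in a category heavily depend on both how these categories are defined by the dataset creators and the sociomedico context in which the data are collected. This reveals relevant limitations of using datasets in contexts different from those of the collection of their data.
Conclusions
We caution against using data features of publicly available datasets without reflection on the social construction and context dependency of their categorical data features, particularly in data-sparse areas. We conclude that social scientific, context-dependent analysis of available data is helpful in judging the utility of categorical data for the population for which a model is intended.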
Non-communicable diseases (NCDs) such as cardiovascular diseases, chronic respiratory diseases, cancers, diabetes, and mental health disorders pose a significant global health challenge, accounting for the majority of fatalities and disability-adjusted life years worldwide. These diseases arise from complex interactions between genetic, behavioral, and environmental factors, necessitating a thorough understanding of these dynamics to identify effective diagnostic strategies and interventions. Although recent advances in multi-omics technologies have greatly enhanced our ability to explore these interactions, several challenges remain. These challenges include the inherent complexity and heterogeneity of multi-omic datasets, limitations in analytical approaches, and severe underrepresentation of non-European genetic ancestries in most omics datasets, which restricts the generalizability of findings and exacerbates health disparities. This scoping review evaluates the global landscape of multi-omics data related to NCDs from 2000 to 2024, focusing on recent advancements in multi-omics data integration, translational applications, and equity considerations. We highlight the need for standardized protocols, harmonized data-sharing policies, and advanced approaches such as artificial intelligence/machine learning to integrate multi-omics data and study gene-environment interactions. We also explore challenges and opportunities in translating insights from gene-environment (GxE) research into precision medicine strategies. We underscore the potential of global multi-omics research in advancing our understanding of NCDs and enhancing patient outcomes across diverse and underserved populations, emphasizing the need for equity and fairness-centered research and strategic investments to build local capacities in underrepresented populations and regions.
Proceedings of the National Academy of Sciences,
Journal year: 2025, Issue: 122(7)
Published: Feb. 12, 2025
Alzheimer’s disease (AD) affects more than 10% of the population ≥65 y of age, but the underlying biological risks of most AD cases are unclear. We show that anti-poly-glycine-arginine (a-polyGR) positive aggregates frequently accumulate in sporadic AD autopsy brains (45/80 cases). We hypothesize that these aggregates are caused by one or more polyGR-encoding repeat expansion mutations. We developed a CRISPR/deactivated-Cas9 enrichment strategy to identify candidate GR-encoding repeat expansion mutations directly from genomic DNA isolated from a-polyGR(+) AD cases. Using this approach, we identify an interrupted (GGGAGA)n intronic repeat expansion within a SINE-VNTR-Alu element in CASP8 (CASP8-GGGAGA EXP). Immunostaining using a-polyGR and locus-specific C-terminal antibodies demonstrates that the CASP8-GGGAGA EXP expresses hybrid poly(GR)n(GE)n(RE)n proteins that accumulate in CASP8-GGGAGA EXP(+) AD brains. In cells, expression of CASP8-GGGAGA EXP minigenes leads to increased p-Tau (Ser202/Thr205) levels. Consistent with other types of repeat-associated non-AUG (RAN) proteins, poly(GR)n(GE)n(RE)n protein levels are increased by stress. Additionally, levels of these stress-induced proteins are reduced by metformin. Association studies show that specific aggregate promoting interrupted sequence variants found in ~3.6% of controls and 7.5% of AD cases increase AD risk [CASP8-GGGAGA-AD-R1; OR 2.2, 95% CI (1.5185 to 3.1896), P = 3.1 × 10^−5]. Cells transfected with a high-risk CASP8-GGGAGA-AD-R1 variant show increased toxicity and aggregates. Taken together, these data identify polyGR(+) aggregates as a frequent and unexpected type of brain pathology in AD and CASP8-GGGAGA-AD-R1 alleles as a relatively common AD risk factor. These data support a model in which CASP8-GGGAGA EXP alleles combined with stress increase AD risk.
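As a rough consistency check on the reported association, an unadjusted odds ratio can be computed directly from the two carrier frequencies quoted in the abstract (~3.6% of controls and 7.5% of AD cases). This is a back-of-the-envelope sketch: the function names are hypothetical, the frequencies are the rounded values from the abstract, and the published OR may come from a different model or from exact counts.

```python
def odds(probability):
    """Convert a probability into odds."""
    return probability / (1 - probability)


def odds_ratio(freq_cases, freq_controls):
    """Unadjusted odds ratio from carrier frequencies in cases and controls."""
    return odds(freq_cases) / odds(freq_controls)


# Rounded frequencies quoted in the abstract.
estimate = odds_ratio(0.075, 0.036)
print(round(estimate, 2))  # prints 2.17
```

The result rounds to 2.2, in line with the reported OR, and it falls inside the reported 95% CI.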
Current Opinion in Structural Biology,
Journal year: 2025, Issue: 92, p. 103023
Published: Feb. 22, 2025
Despite massive sequencing efforts, understanding the difference between human pathogenic and benign variants remains a challenge. Computational variant effect predictors (VEPs) have emerged as essential tools for assessing the impact of genetic variants, although their performance varies. Initially, sequence-based methods dominated the field, but recent advances, particularly in protein structure prediction technologies like AlphaFold, have led to an increased utilization of structural information by VEPs aimed at scoring human missense variants. This review highlights the progress in integrating structural information into VEPs, showcasing novel models such as AlphaMissense, PrimateAI-3D, and CPT-1 that demonstrate improved variant evaluation. Structural data offers more interpretability, especially for non-loss-of-function variants, and provides insights into complex variant interactions in vivo. As the field advances, utilizing biomolecular structures will be pivotal for future VEP development, with recent breakthroughs in protein-ligand and protein-nucleic acid complex prediction offering new avenues.