bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Nov. 4, 2024
Abstract
Tandem repeats (TRs), highly polymorphic, repetitive sequences dispersed across the human genome, are crucial regulators of gene expression and diverse biological processes. Yet, due to historical challenges in their accurate calling and analysis, TRs have remained underexplored compared to single nucleotide variants (SNVs). Here, we introduce a cell type-specific resource exploring the impact of TR variation on human gene expression. Leveraging whole genome and single-cell RNA sequencing, we catalog over 1.7 million polymorphic TR loci and their associations with gene expression across more than 5 million blood-derived cells from 1,790 individuals. We identify over 58,000 single-cell expression quantitative trait TR loci (sc-eTRs), 16.6% of which are specific to one of 28 distinct immune cell types. Further fine-mapping uncovers 6,210 sc-eTRs as candidate causal drivers of gene expression in 21% of genes tested genome-wide. We show through colocalization that TRs are likely causal drivers of over 2,000 GWAS loci associated with immune-mediated and hematological traits, and further identify novel TRs warranting investigation in rare disease cohorts. TRs are pivotal, yet long-overlooked, contributors to cell type-specific gene expression, with promising implications for understanding rare disease pathogenesis and the genetic architecture of complex traits.
Nature Communications, Journal Year: 2023, Volume and Issue: 14(1)
Published: April 12, 2023
Short tandem repeats (STRs) are abundant and highly mutagenic in the human genome. Many STR loci have been associated with a range of genetic disorders. However, most population-scale studies on STR variation in humans have focused on European ancestry cohorts or are limited by sequencing depth. Here, we depicted a comprehensive map of 366,013 polymorphic STRs (pSTRs) constructed from 6487 deeply sequenced genomes, comprising 3983 Chinese samples (~31.5x, NyuWa) and 2504 samples from the 1000 Genomes Project (~33.3x, 1KGP). We found that STR mutations were affected by motif length, chromosome context and epigenetic features. We identified 3273 and 1117 pSTRs whose repeat numbers were associated with gene expression and 3'UTR alternative polyadenylation, respectively. We also implemented population analysis, investigated population differentiated signatures, and genotyped 60 known disease-causing STRs. Overall, this study further extends the scale of STR variation in humans and propels our understanding of the semantics of STRs.
Nature Communications, Journal Year: 2023, Volume and Issue: 14(1)
Published: Oct. 23, 2023
Abstract
Tandem repeats (TRs) represent one of the largest sources of genetic variation in humans and are implicated in a range of phenotypes. Here we present a deep characterization of TR variation based on high coverage whole genome sequencing from 3550 diverse individuals from the 1000 Genomes Project and H3Africa cohorts. We develop a method, EnsembleTR, to integrate genotypes from four separate methods, resulting in high-quality genotypes at more than 1.7 million TR loci. Our catalog reveals novel sequence features influencing TR heterozygosity, identifies population-specific trinucleotide expansions, and finds hundreds of eQTL signals. Finally, we generate a phased haplotype panel which can be used to impute most TRs from nearby single nucleotide polymorphisms (SNPs) with high accuracy. Overall, the TR genotypes and reference haplotype panel generated here will serve as valuable resources for future genome-wide and population-wide studies of TRs and their role in human phenotypes.
Cell, Journal Year: 2023, Volume and Issue: 186(17), P. 3659 - 3673.e23
Published: July 31, 2023
Many regions in the human genome vary in length among individuals due to variable numbers of tandem repeats (VNTRs). To assess the phenotypic impact of VNTRs genome-wide, we applied a statistical imputation approach to estimate the lengths of 9,561 autosomal VNTR loci in 418,136 unrelated UK Biobank participants and 838 GTEx participants. Association and statistical fine-mapping analyses identified 58 VNTRs that appeared to influence a complex trait in UK Biobank, 18 of which also appeared to modulate expression or splicing of a nearby gene. Non-coding VNTRs at TMCO1 and EIF3H appeared to generate the largest known contributions of common human genetic variation to risk of glaucoma and colorectal cancer, respectively. Each of these two VNTRs associated with a >2-fold range of risk across individuals. These results reveal a substantial and previously unappreciated role of non-coding VNTRs in human health and gene regulation.
Nature Genetics, Journal Year: 2024, Volume and Issue: 56(4), P. 569 - 578
Published: March 28, 2024
Abstract
Copy number variants (CNVs) are among the largest genetic variants, yet CNVs have not been effectively ascertained in most genetic association studies. Here we ascertained protein-altering CNVs from UK Biobank whole-exome sequencing data (n = 468,570) using haplotype-informed methods capable of detecting subexonic CNVs and variation within segmental duplications. Incorporating CNVs into analyses of rare variants predicted to cause gene loss of function (LOF) identified 100 associations of predicted LOF variants with 41 quantitative traits. A low-frequency partial deletion of RGL3 exon 6 conferred one of the strongest protective effects of gene LOF on hypertension risk (odds ratio = 0.86 (0.82–0.90)). Protein-coding variation in rapidly evolving gene families within segmental duplications (previously invisible to most analysis methods) generated some of the human genome’s largest contributions to variation in type 2 diabetes risk, chronotype and blood cell traits. These results illustrate the potential for new genetic insights from genomic variation that has escaped large-scale analysis to date.
Journal of Molecular Biology, Journal Year: 2023, Volume and Issue: 435(20), P. 168260 - 168260
Published: Sept. 7, 2023
Short tandem repeats (STRs) are consecutive repetitions of one to six nucleotide motifs. They are hypervariable due to the high prevalence of repeat unit insertions or deletions primarily caused by polymerase slippage during replication. Genetic variation at STRs has been shown to influence a range of traits in humans, including gene expression, cancer risk, and autism. Until recently STRs have been poorly studied since they pose significant challenges to bioinformatics analyses. Moreover, genome-wide analysis of STR variation in population-scale cohorts requires large amounts of data and computational resources. However, the recent advent of genome-wide analysis tools has resulted in multiple datasets of STR variation spanning nearly two million genomic loci in thousands of individuals from diverse populations. Here we present WebSTR, a database of genetic variation and other characteristics of STRs across human populations. WebSTR is based on reference panels of more than 1.7 million human STRs created with state of the art repeat annotation methods and can easily be extended to include additional species. It currently contains STR genotypes for individuals from the 1000 Genomes Project, H3Africa, the Genotype-Tissue Expression (GTEx) Project and colorectal cancer patients from the TCGA dataset. WebSTR is implemented as a relational database with programmatic access available through an API and a web portal for browsing data. The web portal is publicly available at https://webstr.ucsd.edu.
medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Jan. 23, 2024
Abstract
Most genetic association studies focus on binary variants. To identify the effects of multi-allelic variation of tandem repeats (TRs) on human traits, we performed direct TR genotyping and phenome-wide association studies in 168,554 individuals from the UK Biobank, identifying 47 TRs showing causal associations with 73 traits. We replicated 23 of 31 (74%) of these causal associations in the All of Us cohort. While this set included several known repeat expansion disorders, the novel associations we found were attributable to common polymorphic variation in TR length rather than rare expansions and include, e.g., a coding polyhistidine motif in HRCT1 influencing risk of hypertension and a poly(CGC) in the 5’UTR of GNB2 influencing heart rate. Causal TRs were strongly enriched for associations with local gene expression and DNA methylation. Our study highlights the contribution of multi-allelic TRs to the “missing heritability” of the human genome.
Cell Genomics, Journal Year: 2024, Volume and Issue: 4(6), P. 100562 - 100562
Published: May 14, 2024
The phenotypic impact of genetic variation of repetitive features in the human genome is currently understudied. One such feature is the multi-copy 47S ribosomal DNA (rDNA) that codes for rRNA components of the ribosome. Here, we present an analysis of rDNA copy number (CN) variation in the UK Biobank (UKB). From the first release of UKB whole-genome sequencing (WGS) data, a discovery analysis in White British individuals reveals that rDNA CN associates with altered counts of specific blood cell subtypes, such as neutrophils, and with the estimated glomerular filtration rate, a marker of kidney function. Similar trends are observed in other ancestries. A range of analyses argue against reverse causality or common confounder effects, and all core results replicate in the second UKB WGS release. Our work demonstrates that rDNA CN is a genetic influence on trait variance in humans.