medRxiv (Cold Spring Harbor Laboratory), Journal year: 2023, Issue: unknown
Published: Nov. 28, 2023
Identifying causal mutations accelerates genetic disease diagnosis and therapeutic development. Missense variants present a bottleneck in genetic diagnoses as their effects are less straightforward than truncations or nonsense mutations. While computational prediction methods are increasingly successful at prediction for variants in known disease genes, they do not generalize well to other genes as the scores are not calibrated across the proteome. To address this, we developed a deep generative model, popEVE, that combines evolutionary information with population sequence data and achieves state-of-the-art performance at ranking variants by severity to distinguish patients with severe developmental disorders from potentially healthy individuals. popEVE identifies 442 genes in a cohort of developmental disorder cases, including evidence of 119 novel genetic disorders, without the need for gene-level enrichment and without overestimating the prevalence of pathogenic variants in the population. By placing variants on a unified scale, our model offers a comprehensive perspective on the distribution of fitness effects across the entire proteome and the broader human population. popEVE provides compelling evidence for genetic diagnoses even in exceptionally rare single-patient disorders where conventional techniques relying on repeated observations may not be applicable. Interactive web viewer and downloads are available at pop.evemodel.org.
Nucleic Acids Research, Journal year: 2024, Issue: 52(D1), pp. D1143-D1154
Published: Jan. 5, 2024
Machine Learning-based scoring and classification of genetic variants aids the assessment of clinical findings and is employed to prioritize variants in diverse genetic studies and analyses. Combined Annotation-Dependent Depletion (CADD) is one of the first methods for the genome-wide prioritization of variants across different molecular functions and has been continuously developed and improved since its original publication. Here, we present our most recent release, CADD v1.7. We explored and integrated new annotation features, among them state-of-the-art protein language model scores (Meta ESM-1v), regulatory variant effect predictions (from sequence-based convolutional neural networks) and sequence conservation scores (Zoonomia). We evaluated the new version on data sets derived from ClinVar, ExAC/gnomAD and 1000 Genomes variants. For coding effects, we tested CADD on 31 Deep Mutational Scanning (DMS) data sets from ProteinGym and, for regulatory effect prediction, we used saturation mutagenesis reporter assay data of promoter and enhancer sequences. The inclusion of new features further improved the overall performance of CADD. As with previous releases, all data sets, genome-wide CADD v1.7 scores, scripts for on-site scoring and an easy-to-use webserver are readily provided via https://cadd.bihealth.org/ or https://cadd.gs.washington.edu/ to the community.
Nature Communications, Journal year: 2024, Issue: 15(1)
Published: July 29, 2024
Abstract
The effective design of combinatorial libraries to balance fitness and diversity facilitates the engineering of useful enzyme functions, particularly those that are poorly characterized or unknown in biology. We introduce MODIFY, a machine learning (ML) algorithm that learns from natural protein sequences to infer evolutionarily plausible mutations and predict enzyme fitness. MODIFY co-optimizes predicted fitness and sequence diversity of starting libraries, prioritizing high-fitness variants while ensuring broad sequence coverage. In silico evaluation shows that MODIFY outperforms state-of-the-art unsupervised methods in zero-shot fitness prediction and enables ML-guided directed evolution with enhanced efficiency. Using MODIFY, we engineer generalist biocatalysts derived from a thermostable cytochrome c to achieve enantioselective C-B and C-Si bond formation via a new-to-nature carbene transfer mechanism, leading to biocatalysts six mutations away from previously developed enzymes while exhibiting superior or comparable activities. These results demonstrate MODIFY’s potential in solving challenging enzyme engineering problems beyond the reach of classic directed evolution.
Nature, Journal year: 2023, Issue: 625(7996), pp. 735-742
Published: Nov. 29, 2023
Abstract
Noncoding DNA is central to our understanding of human gene regulation and complex diseases [1,2], and measuring the evolutionary sequence constraint can establish the functional relevance of putative regulatory elements in the human genome [3–9]. Identifying the genomic elements that have become constrained specifically in primates has been hampered by the faster evolution of noncoding DNA compared to protein-coding DNA [10], the relatively short timescales separating primate species [11], and the previously limited availability of whole-genome sequences [12]. Here we construct a whole-genome alignment of 239 species, representing nearly half of all extant species in the primate order. Using this resource, we identified human regulatory elements that are under selective constraint across primates and other mammals at a 5% false discovery rate. We detected 111,318 DNase I hypersensitivity sites and 267,410 transcription factor binding sites that are constrained specifically in primates but not across other placental mammals and validate their cis-regulatory effects on gene expression. These regulatory elements are enriched for human genetic variants that affect gene expression and complex traits and diseases. Our results highlight the important role of recent evolution in regulatory sequence elements differentiating primates, including humans, from other placental mammals.
We examined 454,712 exomes for genes associated with a wide spectrum of complex traits and common diseases and observed that rare, penetrant mutations in genes implicated by genome-wide association studies confer ~10-fold larger effects than common variants in the same genes. Consequently, an individual at the phenotypic extreme and at the greatest risk for severe, early-onset disease is better identified by a few rare penetrant variants than by the collective action of many common variants with weak effects. By combining rare variants across phenotype-associated genes into a unified genetic risk model, we demonstrate superior portability across diverse global populations compared with common-variant polygenic risk scores, greatly improving the clinical utility of genetic-based risk prediction.
Abstract
Three and a half years after the pandemic outbreak, now that WHO has formally declared that the emergency is over, COVID-19 is still a significant global issue. Here, we focus on recent developments in genetic and genomic research on COVID-19, and give an outlook on state-of-the-art therapeutical approaches, as the pandemic is gradually transitioning to an endemic situation. The sequencing and characterization of rare alleles in different populations has made it possible to identify numerous genes that affect either the susceptibility to or the severity of the disease. These findings provide a beginning for new avenues and pan-ethnic therapeutic approaches, as well as potential screening protocols. The causative virus, SARS-CoV-2, is still in the spotlight, but a novel threatening virus could appear anywhere at any time. Therefore, continued vigilance and further research is warranted. We also note emphatically that to prevent future pandemics and other world-wide health crises, it is imperative to capitalize on what we have learnt from COVID-19: specifically, regarding its origins, the world’s response, and insufficient preparedness. This requires unprecedented international collaboration and timely data sharing for the coordination of effective response and the rapid implementation of containment measures.
American Journal of Medical Genetics Part C Seminars in Medical Genetics, Journal year: 2023, Issue: 193(3)
Published: July 28, 2023
Abstract
The transition from analog to digital technologies in clinical laboratory genomics is ushering in an era of “big data” in ways that will exceed human capacity to rapidly and reproducibly analyze those data using conventional approaches. Accurately evaluating complex molecular data to facilitate timely diagnosis and management of genomic disorders will require supportive artificial intelligence methods. These are already being introduced into clinical laboratory genomics to identify variants in DNA sequencing data, predict the effects of DNA variants on protein structure and function to inform clinical interpretation of pathogenicity, link phenotype ontologies to genetic variants identified through exome or genome sequencing to help clinicians reach diagnostic answers faster, correlate genomic data with tumor staging and treatment approaches, utilize natural language processing to identify critical published medical literature during analysis of genomic data, and use interactive chatbots to identify individuals who qualify for genetic testing or to provide pre‐test and post‐test education. With careful and ethical development and validation of artificial intelligence for clinical laboratory genomics, these advances are expected to significantly enhance the abilities of geneticists to translate complex data into clearly synthesized information for clinicians to use in managing the care of their patients at scale.
Communications Biology, Journal year: 2024, Issue: 7(1)
Published: July 9, 2024
Abstract
Significant progress has been made in the field of plant genomics, as demonstrated by the increased use of high-throughput methodologies that enable the characterization of multiple genome-wide molecular phenotypes. These findings have provided valuable insights into plant traits and their underlying genetic mechanisms, particularly in model plant species. Nonetheless, effectively leveraging them to make accurate predictions represents a critical step in crop genomic improvement. We present AgroNT, a foundational large language model trained on genomes from 48 plant species with a predominant focus on crop species. We show that AgroNT can obtain state-of-the-art predictions for regulatory annotations, promoter/terminator strength, tissue-specific gene expression, and prioritize functional variants. We conduct a large-scale in silico saturation mutagenesis analysis on cassava to evaluate the regulatory impact of over 10 million mutations and provide their predicted effects as a resource for variant characterization. Finally, we propose the use of the diverse datasets compiled here as the Plants Genomic Benchmark (PGB), providing a comprehensive benchmark for deep learning-based methods in plant genomic research. The pre-trained AgroNT model is publicly available on HuggingFace at https://huggingface.co/InstaDeepAI/agro-nucleotide-transformer-1b for future research purposes.