GENA-LM: a family of open-source foundational DNA language models for long sequences
Nucleic Acids Research, Journal Year: 2025, Volume and Issue: 53(2)
Published: Jan. 11, 2025
Abstract
Recent advancements in genomics, propelled by artificial intelligence, have unlocked unprecedented capabilities in interpreting genomic sequences, mitigating the need for exhaustive experimental analysis of complex, intertwined molecular processes inherent in DNA function. A significant challenge, however, resides in accurately decoding genomic sequences, which inherently involves comprehending rich contextual information dispersed across thousands of nucleotides. To address this need, we introduce GENA language model (GENA-LM), a suite of transformer-based foundational DNA language models capable of handling input lengths up to 36 000 base pairs. Notably, integrating the newly developed recurrent memory mechanism allows these models to process even larger DNA segments. We provide pre-trained versions of GENA-LM, including multispecies and taxon-specific models, demonstrating their capability for fine-tuning and addressing a spectrum of complex biological tasks with modest computational demands. While language models have already achieved significant breakthroughs in protein biology, GENA-LM showcases a similarly promising potential for reshaping the landscape of genomics and multi-omics data analysis. All models are publicly available on GitHub (https://github.com/AIRI-Institute/GENA_LM) and on HuggingFace (https://huggingface.co/AIRI-Institute). In addition, we provide a web service (https://dnalm.airi.net/) allowing user-friendly DNA annotation with GENA-LM models.
Language: English
Genome Annotation and Analysis
Harsharan Singh, Mannatpreet Khaira, Karan Sharma, et al.
Elsevier eBooks, Journal Year: 2024, Volume and Issue: unknown
Published: Jan. 1, 2024
Language: English
Splice site variants in the canonical donor site of MED13L exon 7 lead to intron retention in patients with MED13L syndrome
Jade Fauqueux, Simon Boussion, C. Thuillier, et al.
Journal of Medical Genetics, Journal Year: 2024, Volume and Issue: 61(11), P. 1040-1044
Published: Aug. 24, 2024
Pathogenic variants in the MED13L gene are associated with the autosomal dominant MED13L syndrome, which is characterised by global developmental delay and cardiac malformations. We investigated two heterozygous variants located at the canonical donor splice site motif of MED13L exon 7: c.1009+1G>C and c.1009+5G>C. We report that in silico predictions suggested two possible outcomes: exon 7 skipping, resulting in the loss of a phosphodegron essential for MED13L regulation, or activation of a cryptic donor site in intron 7, leading to intron retention. RNA analysis confirmed that both variants affected the canonical donor site, resulting in the retention of 73 bp of intron 7. This retention caused a frameshift and premature translation termination, consistent with haploinsufficiency. Our results highlight the importance of combining predictive and experimental approaches to understand the functional impact of splice site variants. These insights into the molecular consequences provide a deeper understanding of the genetic basis of MED13L syndrome.
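The frameshift reported above follows from simple arithmetic: 73 is not a multiple of 3, so retaining 73 bp of intronic sequence shifts the downstream reading frame. A minimal Python sketch of that check is given here; the function name `retention_effect` is hypothetical and is not taken from the cited paper.

```python
def retention_effect(inserted_bp: int) -> str:
    """Classify retained/inserted sequence by its length modulo 3.

    Illustrative helper only (hypothetical name). A length divisible by 3
    preserves the reading frame, although an in-frame insertion can still
    introduce a stop codon; any other length shifts the frame.
    """
    if inserted_bp <= 0:
        raise ValueError("inserted_bp must be a positive number of base pairs")
    remainder = inserted_bp % 3
    if remainder == 0:
        return "in-frame"
    return f"frameshift (+{remainder})"


# 73 bp of intron 7 retained, as reported in the abstract: 73 = 24 * 3 + 1
print(retention_effect(73))  # frameshift (+1)
```

This only checks the reading frame. Whether a premature termination codon follows depends on the actual retained sequence, which the abstract does not give.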
Language: English