Genomic language models: opportunities and challenges
Gonzalo Benegas, Chengzhong Ye, Carlos Albors, et al.
Trends in Genetics, Journal Year: 2025, Volume and Issue: unknown
Published: Jan. 1, 2025
Language: English
Recipes and ingredients for deep learning models of 3D genome folding
Current Opinion in Genetics & Development, Journal Year: 2025, Volume and Issue: 91, P. 102308
Published: Jan. 24, 2025
Language: English
From computational models of the splicing code to regulatory mechanisms and therapeutic implications
Nature Reviews Genetics, Journal Year: 2024, Volume and Issue: unknown
Published: Oct. 2, 2024
Language: English
A Large-Scale Foundation Model for RNA Function and Structure Prediction
S. Zou, Tianhua Tao, Parvez Mahbub, et al.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Nov. 29, 2024
Abstract
Originally marginalized as an intermediate in the information flow from DNA to protein, RNA has become the star of modern biology, holding the key to precision therapeutics, genetic engineering, evolutionary origins, and our understanding of fundamental cellular processes. Yet RNA is as mysterious as it is prolific, serving as an information store, a messenger, and a catalyst, spanning many undercharacterized functional and structural classes. Deciphering the language of RNA is important not only for a mechanistic understanding of its biological functions but also for accelerating drug design. Toward this goal, we introduce AIDO.RNA, a pre-trained module for RNA in an AI-driven Digital Organism [1]. AIDO.RNA contains a scale of 1.6 billion parameters, trained on 42 million non-coding RNA (ncRNA) sequences at single-nucleotide resolution, and achieves state-of-the-art performance on a comprehensive set of tasks, including structure prediction, genetic regulation, molecular function across species, and RNA sequence design. AIDO.RNA after domain adaptation learns to model essential parts of protein translation that protein language models, which have received widespread attention in recent years, do not. More broadly, AIDO.RNA hints at the generality of biological sequence modeling and the ability to leverage the central dogma to improve many biomolecular representations. Models and code are available through ModelGenerator in https://github.com/genbio-ai/AIDO and on Hugging Face.
Language: English
Interpreting deep neural networks for the prediction of translation rates
Frederick Korbel, Ekaterina Eroshok, Uwe Ohler, et al.
BMC Genomics, Journal Year: 2024, Volume and Issue: 25(1)
Published: Nov. 9, 2024
The 5' untranslated region of mRNA strongly impacts the rate of translation initiation. A recent convolutional neural network (CNN) model accurately quantifies the relationship between massively parallel synthetic 5' untranslated regions (5'UTRs) and translation levels. However, the underlying biological features, which drive model predictions, remain elusive. Uncovering sequence determinants predictive of output may allow us to develop a more detailed understanding of translation regulation at the 5'UTR.
Language: English
Accurate and General DNA Representations Emerge from Genome Foundation Models at Scale
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Dec. 5, 2024
Abstract
Language models applied to protein sequences have become a panacea, enabling therapeutics development, materials engineering, and core biology research. Despite the successes of protein language models, genome language models remain nascent. Recent studies suggest the bottleneck is data volume or modeling context size, since long-range interactions are widely acknowledged but sparsely annotated. However, it may be the case that even short DNA sequences are modeled poorly by existing approaches, and current models are unable to represent the wide array of functions encoded by DNA. To study this, we develop AIDO.DNA, a pretrained module for DNA representation in an AI-driven Digital Organism [1]. AIDO.DNA is a seven billion parameter encoder-only transformer trained on 10.6 billion nucleotides from a dataset of 796 species. By scaling model size while maintaining a short context length of 4k nucleotides, AIDO.DNA shows substantial improvements across a breadth of supervised, generative, and zero-shot tasks relevant to functional genomics, synthetic biology, and drug development. Notably, AIDO.DNA outperforms prior encoder-only architectures without new data, suggesting that new scaling laws are needed to achieve compute-optimal DNA language models. Models and code are available through Model-Generator in https://github.com/genbio-ai/AIDO and on Hugging Face at https://huggingface.co/genbio-ai.
Language: English