bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown, Published: Jan. 13, 2025
Abstract
Background
Selection of individuals based on their estimated breeding values aims to maximize response to selection of the next generation in the additive model. However, when the aim is not only about short-term population-wide genetic gain but also about gain over multiple generations, an optimal strategy is not as clear-cut, as the maintenance of genetic diversity may become an important factor. This study provides an extended comparison of existing strategies in a unifying testing pipeline using the simulation software MoBPS.
Results
Applying a weighting factor to the SNP effects based on the frequency of the beneficial allele resulted in an increase in long-term genetic gain of 1.6% after 50 generations while reducing inbreeding rates by 16.2% compared to truncation selection based on estimated breeding values. However, this resulted in short-term losses of 1.2%, with the break-even point reached after 25 generations. In contrast, the inclusion of the average kinship of an individual to the top individuals of the population as an additional trait in the selection index, with an index weight of 17.5%, resulted in no short-term losses and increased long-term gains by 4.3% while reducing inbreeding rates by 15.8%, achieving very similar efficiency to the use of optimum contribution selection. Combining management strategies, with the weights for each strategy optimized by an evolutionary algorithm, resulted in a breeding scheme with 5.1% increased long-term gains and 37.3% reduced inbreeding rates. The proposed strategy included optimum contribution selection, weighting by allele frequency, kinship in the index, avoiding matings between related individuals, and lowering the proportion of selected individuals.
Conclusions
The combination of strategies was shown to be far superior to any singular method tested in this study. As the use of efficient diversity management methods does not necessarily lead to short-term gains and comes at extra costs, it is critical for breeding companies to implement such strategies for long-term success.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown, Published: May 14, 2024
In recent years, significant advancements have been observed in the domain of Natural Language Processing (NLP) with the introduction of pre-trained foundational models, paving the way for utilizing similar AI technologies to interpret the language of biology. In this research, we introduce “LucaOne”, a novel pre-trained foundational model designed to integratively learn from the genetic and proteomic languages, encapsulating data from 169,861 species encompassing DNA, RNA, and proteins. This work illuminates the potential for creating a biological language model aimed at universal bioinformatics application. Remarkably, through few-shot learning, the model efficiently learns the central dogma of molecular biology and demonstrably outperforms competing models. Furthermore, in tasks requiring inputs of DNA, RNA, proteins, or a combination thereof, LucaOne exceeds state-of-the-art performance using a streamlined downstream architecture, thereby providing empirical evidence and innovative perspectives on the potential of foundational models to comprehend complex biological systems.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2021, Volume and Issue: unknown, Published: Sept. 23, 2021
Predicting the function of a protein from its amino acid sequence is a long-standing challenge in bioinformatics. Traditional approaches use sequence alignment to compare a query sequence either to thousands of models of protein families or to large databases of individual protein sequences. Here we instead employ deep convolutional neural networks to directly predict a variety of protein functions – EC numbers and GO terms – from an unaligned amino acid sequence. This approach provides precise predictions which complement alignment-based methods, and the computational efficiency of a single neural network permits novel and lightweight software interfaces, which we demonstrate with an in-browser graphical interface for protein function prediction in which all computation is performed on the user’s personal computer with no data uploaded to remote servers. Moreover, these models place full-length amino acid sequences into a generalised functional space, facilitating downstream analysis and interpretation. To read the interactive version of this paper, please visit https://google-research.github.io/proteinfer/
Abstract Figure: QR code for the interactive version of the preprint at https://google-research.github.io/proteinfer/
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2022, Volume and Issue: unknown, Published: April 28, 2022
Clinical diagnosis typically incorporates physical examination, patient history, and various laboratory tests and imaging studies, but makes limited use of the human immune system's own record of antigen exposures encoded by receptors on B cells and T cells. We analyzed immune receptor datasets from 593 individuals to develop MAchine Learning for Immunological Diagnosis (Mal-ID), an interpretive framework to screen for multiple illnesses simultaneously or precisely test for one condition. This approach detects specific infections, autoimmune disorders, vaccine responses, and disease severity differences. Human-interpretable features of the model recapitulate known immune responses to SARS-CoV-2, Influenza, and HIV, highlight antigen-specific receptors, and reveal distinct characteristics of Systemic Lupus Erythematosus and Type-1 Diabetes autoreactivity. This analysis framework has broad potential for scientific and clinical interpretation of human immune responses.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown, Published: Feb. 22, 2023
Cryo-electron microscopy (cryo-EM) is currently the most powerful technique for determining the structures of large protein complexes and assemblies. Picking single-protein particles from cryo-EM micrographs (images) is a key step in reconstructing protein structures. However, the widely used template-based particle picking process is labor-intensive and time-consuming. Though emerging machine learning-based particle picking can potentially automate the process, its development is severely hindered by the lack of large, high-quality, manually labelled training data. Here, we present CryoPPP, a large, diverse, expert-curated cryo-EM image dataset for single protein particle picking and analysis to address this bottleneck. It consists of manually labelled cryo-EM micrographs of 32 non-redundant, representative protein datasets selected from the Electron Microscopy Public Image Archive (EMPIAR). It includes 9,089 diverse, high-resolution micrographs (∼300 cryo-EM images per EMPIAR dataset) in which the coordinates of protein particles were labelled by human experts. The protein particle labelling process was rigorously validated by both 2D particle class validation and 3D density map validation with the gold standard. The dataset is expected to greatly facilitate the development of machine learning and artificial intelligence methods for automated cryo-EM protein particle picking. The dataset and data processing scripts are available at https://github.com/BioinfoMachineLearning/cryoppp.