Genes to Cells, Journal Year: 2020, Volume and Issue: 25(1), P. 6 - 21
Published: Jan. 1, 2020
Abstract
Motility often plays a decisive role in the survival of species. Five systems of motility have been studied in depth: those propelled by bacterial flagella, eukaryotic actin polymerization and the eukaryotic motor proteins myosin, kinesin and dynein. However, many organisms exhibit surprisingly diverse motilities, and advances in genomics, molecular biology and imaging have shown that those motilities have inherently independent mechanisms. This makes defining the breadth of motility nontrivial, because novel motilities may be driven by unknown mechanisms. Here, we classify the known motilities based on the unique classes of movement-producing protein architectures. Based on this criterion, the current total of independent motility systems stands at 18 types. In this perspective, we discuss these modes of motility relative to the latest phylogenetic Tree of Life and propose a history of motility. During the ~4 billion years since the emergence of life, motility arose in Bacteria with flagella and pili, and in Archaea with archaella. Newer modes of motility became possible in Eukarya with changes to the cell envelope. Presence or absence of a peptidoglycan layer, the acquisition of robust membrane dynamics, the enlargement of cells and environmental opportunities likely provided the context for the (co)evolution of novel types of motility.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2019, Volume and Issue: unknown
Published: June 20, 2019
Abstract
Protein modeling is an increasingly popular area of machine learning research. Semi-supervised learning has emerged as an important paradigm in protein modeling due to the high cost of acquiring supervised protein labels, but the current literature is fragmented when it comes to datasets and standardized evaluation techniques. To facilitate progress in this field, we introduce the Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology. We curate tasks into specific training, validation, and test splits to ensure that each task tests biologically relevant generalization that transfers to real-life scenarios. We benchmark a range of approaches to semi-supervised protein representation learning, which span recent work as well as canonical sequence learning techniques. We find that self-supervised pretraining is helpful for almost all models on all tasks, more than doubling performance in some cases. Despite this increase, in several cases features learned by self-supervised pretraining still lag behind features extracted by state-of-the-art non-neural techniques. This gap in performance suggests a huge opportunity for innovative architecture design and improved modeling paradigms that better capture the signal in biological sequences. TAPE will help the machine learning community focus effort on scientifically relevant problems. Toward this end, all data and code used to run these experiments are available at https://github.com/songlab-cal/tape.
Genome Biology, Journal Year: 2019, Volume and Issue: 20(1)
Published: Oct. 22, 2019
Current-day metagenomics analyses increasingly involve de novo taxonomic classification of long DNA sequences and metagenome-assembled genomes. Here, we show that the conventional best-hit approach often leads to classifications that are too specific, especially when the sequences represent novel deep lineages. We present a classification method that integrates multiple signals to classify sequences (Contig Annotation Tool, CAT) and metagenome-assembled genomes (Bin Annotation Tool, BAT). Classifications are automatically made at low taxonomic ranks if closely related organisms are present in the reference database and at higher ranks otherwise. The result is a high classification precision even for sequences from considerably unknown organisms.
Microbiome, Journal Year: 2020, Volume and Issue: 8(1)
Published: April 6, 2020
Abstract
Background
The newly defined superphylum Patescibacteria, such as Parcubacteria (OD1) and Microgenomates (OP11), has been found to be prevalent in groundwater, sediment, lake, and other aquifer environments. Recently, increasing attention has been paid to this diverse superphylum, including > 20 candidate phyla (a large part of the candidate phylum radiation, CPR), because it refreshed our view of the tree of life. However, adaptive traits contributing to its prevalence are still not well known.
Results
Here, we investigated the genomic features and metabolic pathways of Patescibacteria in groundwater through genome-resolved metagenomics analysis of > 600 Gbp sequence data. We observed that, while the members of Patescibacteria have reduced genomes (~ 1 Mbp) exclusively, functions essential to growth and reproduction, such as genetic information processing, were retained. Surprisingly, they have sharply reduced redundant and nonessential functions, including specific metabolic activities and stress response systems. The Patescibacteria have ultra-small cells and simplified membrane structures, including flagellar assembly, transporters, and two-component systems. Despite the lack of CRISPR viral defense, the bacteria may evade predation through deletion of common membrane phage receptors and other alternative strategies, which may explain the low representation of prophage proteins in their genomes and the lack of CRISPR. By establishing the linkages between bacterial features and the groundwater environmental conditions, our results provide important insights into the functions and evolution of this CPR group.
Conclusions
We found that Patescibacteria has streamlined many functions while acquiring advantages, such as avoiding phage invasion, to adapt to the groundwater environment. The unique features of small genome size, ultra-small cell size, and lacking CRISPR of this large lineage are bringing new understandings on the life of Bacteria. Our results provide important insights into the mechanisms for adaptation of the superphylum in the groundwater environments, and demonstrate a case where less is more, and small is mighty.
arXiv (Cornell University), Journal Year: 2019, Volume and Issue: unknown
Published: Jan. 1, 2019
Machine learning applied to protein sequences is an increasingly popular area of research. Semi-supervised learning for proteins has emerged as an important paradigm due to the high cost of acquiring supervised protein labels, but the current literature is fragmented when it comes to datasets and standardized evaluation techniques. To facilitate progress in this field, we introduce the Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology. We curate tasks into specific training, validation, and test splits to ensure that each task tests biologically relevant generalization that transfers to real-life scenarios. We benchmark a range of approaches to semi-supervised protein representation learning, which span recent work as well as canonical sequence learning techniques. We find that self-supervised pretraining is helpful for almost all models on all tasks, more than doubling performance in some cases. Despite this increase, in several cases features learned by self-supervised pretraining still lag behind features extracted by state-of-the-art non-neural techniques. This gap in performance suggests a huge opportunity for innovative architecture design and improved modeling paradigms that better capture the signal in biological sequences. TAPE will help the machine learning community focus effort on scientifically relevant problems. Toward this end, all data and code used to run these experiments are available at https://github.com/songlab-cal/tape.
Nature Communications, Journal Year: 2019, Volume and Issue: 10(1)
Published: Dec. 2, 2019
Rapid growth of genome data provides opportunities for updating microbial evolutionary relationships, but this is challenged by the discordant evolution of individual genes. Here we build a reference phylogeny of 10,575 evenly-sampled bacterial and archaeal genomes, based on a comprehensive set of 381 markers, using multiple strategies. Our trees indicate remarkably closer evolutionary proximity between Archaea and Bacteria than previous estimates that were limited to fewer "core" genes, such as the ribosomal proteins. The robustness of the results was tested with respect to several variables, including taxon and site sampling, amino acid substitution heterogeneity and saturation, non-vertical evolution, and the impact of exclusion of candidate phyla radiation (CPR) taxa. Our results provide an updated view of domain-level relationships.