bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2023, Issue: unknown
Published: Nov. 29, 2023
Developing a universal representation of cells which encompasses the tremendous molecular diversity of cell types within the human body and, more generally, across species, would be transformative for cell biology. Recent work using single-cell transcriptomic approaches to create molecular definitions of cell types in the form of cell atlases has provided the necessary data for such an endeavor. Here, we present the Universal Cell Embedding (UCE) foundation model. UCE was trained on a corpus of cell atlas data from human and other species in a completely self-supervised way without any data annotations. UCE offers a unified biological latent space that can represent any cell, regardless of tissue or species. This universal cell embedding captures important biological variation despite the presence of experimental noise across diverse datasets. An important aspect of UCE's universality is that any new cell from any organism can be mapped to this embedding space with no additional data labeling, model training or fine-tuning. We applied UCE to create the Integrated Mega-scale Atlas, embedding 36 million cells, with more than 1,000 uniquely named cell types, from hundreds of experiments, dozens of tissues and eight species. We uncovered new insights about the organization of cell types and tissues within this universal cell embedding space, and leveraged it to infer function of newly discovered cell types. UCE's embedding space exhibits emergent behavior, uncovering new biology that it was never explicitly trained for, such as identifying developmental lineages and embedding data from novel species not included in the training set. Overall, by enabling a universal representation for every cell state and type, UCE provides a valuable tool for analysis, annotation and hypothesis generation as the scale and diversity of single cell datasets continues to grow.
Nature Biotechnology,
Journal year: 2023, Issue: 42(2), pp. 284-292
Published: May 25, 2023
Currently available single-cell omics technologies capture many unique features with different biological information content. Data integration aims to place cells, captured with different technologies, onto a common embedding to facilitate downstream analytical tasks. Current horizontal data integration techniques use a set of common features, thereby ignoring non-overlapping features and losing information. Here we introduce StabMap, a mosaic data integration technique that stabilizes mapping of single-cell data by exploiting the non-overlapping features. StabMap first infers a mosaic data topology based on shared features, then projects all cells onto supervised or unsupervised reference coordinates by traversing shortest paths along the topology. We show that StabMap performs well in various simulation contexts, facilitates 'multi-hop' mosaic data integration where some datasets do not share any features, and enables the use of spatial gene expression features for mapping dissociated single-cell data onto a spatial transcriptomic reference.
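The two steps named in this abstract (a topology built from shared features, then shortest paths along it) can be illustrated with a small sketch. This is not StabMap's implementation: the dataset names, feature names, and the functions `mosaic_topology` and `shortest_path` are hypothetical. The graph is assumed to be unweighted, and the projection of cells onto reference coordinates is left out entirely.

```python
from collections import deque


def mosaic_topology(features_by_dataset):
    """Link two datasets whenever they share at least one feature (assumed rule)."""
    names = list(features_by_dataset)
    graph = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if features_by_dataset[a] & features_by_dataset[b]:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; returns the datasets visited from source to target, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical toy datasets: "atac" and "rna" share no features directly.
datasets = {
    "rna": {"Gata1", "Sox2", "Tal1"},
    "multiome": {"Tal1", "peak_1", "peak_2"},
    "atac": {"peak_2", "peak_3"},
}
graph = mosaic_topology(datasets)
path = shortest_path(graph, "atac", "rna")
print(path)
```

In this toy case the path from `atac` to `rna` passes through `multiome`, which is the kind of 'multi-hop' situation the abstract describes, where two datasets are linked only through an intermediate one.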
Nature Communications,
Journal year: 2023, Issue: 14(1)
Published: July 18, 2023
Multiplexed imaging enables measurement of multiple proteins in situ, offering an unprecedented opportunity to chart various cell types and states in tissues. However, cell classification, the task of identifying the type of individual cells, remains challenging, labor-intensive, and limiting to throughput. Here, we present CellSighter, a deep-learning based pipeline to accelerate cell classification in multiplexed images. Given a small training set of expert-labeled images, CellSighter outputs the label probabilities for all cells in new images. CellSighter achieves over 80% accuracy for major cell types across imaging platforms, which approaches inter-observer concordance. Ablation studies and simulations show that CellSighter is able to generalize its training data and learn features of protein expression levels, as well as spatial features such as subcellular expression patterns. CellSighter's design reduces overfitting, and it can be trained with only thousands or even hundreds of labeled examples. CellSighter also outputs a prediction confidence, allowing downstream experts control over the results. Altogether, CellSighter drastically reduces hands-on time for cell classification in multiplexed images, while improving accuracy and consistency across datasets.
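As a rough illustration of how per-cell label probabilities and a prediction confidence might be used downstream, the sketch below keeps confident calls and flags the rest for expert review. It is not CellSighter's code: the function `triage_predictions`, the labels, the probabilities, and the 0.7 threshold are all hypothetical. Taking the highest class probability as the confidence is an assumption, since the abstract does not say how the confidence is computed.

```python
def triage_predictions(probabilities, labels, threshold=0.7):
    """Return (label, confidence, accepted) for each cell.

    The confidence is assumed to be the highest class probability;
    cells below the threshold are marked for manual review.
    """
    results = []
    for row in probabilities:
        best = max(range(len(row)), key=row.__getitem__)
        confidence = row[best]
        results.append((labels[best], confidence, confidence >= threshold))
    return results


# Hypothetical output of a classifier for two cells and three classes.
labels = ["T cell", "B cell", "tumor"]
probs = [
    [0.92, 0.05, 0.03],  # clear call
    [0.40, 0.35, 0.25],  # ambiguous call
]
calls = triage_predictions(probs, labels)
for label, confidence, accepted in calls:
    status = "accept" if accepted else "review"
    print(f"{label}\t{confidence:.2f}\t{status}")
```

Raising the threshold sends more cells to manual review, and lowering it accepts more automatic calls. That trade-off is one way to read the "control over the results" mentioned in the abstract.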
Molecules and Cells,
Journal year: 2023, Issue: 46(2), pp. 106-119
Published: Feb. 1, 2023
With the increased number of single-cell RNA sequencing (scRNA-seq) datasets in public repositories, integrative analysis of multiple scRNA-seq datasets has become commonplace. Batch effects among different datasets are inevitable because of differences in cell isolation and handling protocols, library preparation technology, and sequencing platforms. To remove these batch effects for effective integration of multiple scRNA-seq datasets, a number of methodologies have been developed based on diverse concepts and approaches. These methods have proven useful for examining whether cellular features, such as cell subpopulations and marker genes, identified from a certain dataset, are consistently present, or whether their condition-dependent variations, such as increases in cell subpopulations under particular disease-related conditions, are consistently observed in different datasets generated under similar or distinct conditions. In this review, we summarize the concepts and approaches of the integration methods and their pros and cons as reported in previous literature.