Circulation
Journal Year: 2018
Volume and Issue: 138(4), P. 377 - 393
Published: March 27, 2018
Background: No pharmacological therapy exists for calcific aortic valve disease (CAVD), which confers a dismal prognosis without invasive valve replacement. The search for therapeutics and early diagnostics is challenging because CAVD presents in multiple pathological stages. Moreover, it occurs in the context of a complex, multi-layered tissue architecture; a rich and abundant extracellular matrix phenotype; and a unique, highly plastic, multipotent resident cell population.
Methods: A total of 25 human stenotic aortic valves obtained from valve replacement surgeries were analyzed by multiple modalities, including transcriptomics and global unlabeled and label-based tandem-mass-tagged proteomics. Segmentation of the valves into disease stage–specific samples was guided by near-infrared molecular imaging, and anatomic layer-specificity was facilitated by laser capture microdissection. Side-specific cell cultures were subjected to calcifying stimuli, and their calcification potential and basal/stimulated proteomes were evaluated. Molecular (protein–protein) interaction networks were built, and central proteins and associations were identified.
Results: Global transcriptional and protein expression signatures differed between the nondiseased, fibrotic, and calcific stages of CAVD. Anatomic microlayers exhibited unique proteome profiles that were maintained throughout disease progression. These profiles identified glial fibrillary acidic protein as a specific marker of valvular interstitial cells from the spongiosa layer. CAVD progression was marked by the emergence of smooth muscle cell activation, inflammation, and calcification-related pathways. Proteins overrepresented in the disease-prone fibrosa are functionally annotated to fibrosis and calcification pathways. In vitro, we found that fibrosa-derived valvular interstitial cells demonstrated greater calcification potential than those derived from the ventricularis. These studies confirmed that the microlayer-specific proteome was preserved in cultured valvular interstitial cells. Cells exposed to alkaline phosphatase–dependent and alkaline phosphatase–independent calcifying stimuli had distinct proteome profiles, both of which overlapped with the whole-tissue proteome. Analysis of protein–protein interaction networks found significant closeness of CAVD proteins to inflammatory and fibrotic diseases.
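Conclusions: This spatially and temporally resolved multi-omics, network, and systems biology strategy identifies the first molecular regulatory networks in CAVD, a cardiac condition without a pharmacological cure. It also describes a novel means of systematic disease ontology that is broadly applicable to comprehensive omics studies of cardiovascular diseases.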
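The Results report "significant closeness" of CAVD proteins to inflammatory and fibrotic diseases in the protein–protein interaction network, but the abstract does not name the metric. One common choice, assumed here purely for illustration, is the closest-distance measure. For every protein in one set, take the shortest-path length to the nearest protein of the other set, then average over the first set. The standard-library sketch below computes this on a toy network with hypothetical protein names. Judging significance would additionally require comparison with randomized protein sets, which is not shown.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Undirected adjacency sets from an edge list."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def hop_distances(graph: Graph, source: str) -> Dict[str, int]:
    """Breadth-first search: shortest-path length (in edges) from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closest_distance(graph: Graph, set_a: Iterable[str], set_b: Iterable[str]) -> float:
    """Mean, over proteins in A, of the distance to the nearest protein in B."""
    targets = set(set_b)
    nearest: List[int] = []
    for protein in set_a:
        dist = hop_distances(graph, protein)
        reachable = [dist[t] for t in targets if t in dist]
        if reachable:  # proteins with no path to B are skipped
            nearest.append(min(reachable))
    if not nearest:
        raise ValueError("no protein in A is connected to any protein in B")
    return sum(nearest) / len(nearest)


# Toy interactome; protein names are hypothetical placeholders.
toy = build_graph([("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5"), ("P2", "P6")])
print(closest_distance(toy, ["P1", "P6"], ["P3", "P5"]))  # prints 2.0
```

A smaller value means the two protein sets lie closer together in the network.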
PLoS Computational Biology
Journal Year: 2021
Volume and Issue: 17(11), P. e1009442 - e1009442
Published: Nov. 16, 2021
It is challenging to associate features such as human health outcomes, diet, environmental conditions, or other metadata with microbial community measurements, due in part to their quantitative properties. Microbiome multi-omics data are typically noisy, sparse (zero-inflated), high-dimensional, extremely non-normal, and often in the form of count or compositional measurements. Here we introduce an optimized combination of novel and established methodology to assess multivariable association of microbial community features with complex metadata in population-scale observational studies. Our approach, MaAsLin 2 (Microbiome Multivariable Associations with Linear Models), uses generalized linear and mixed models to accommodate a wide variety of modern epidemiological studies. These include cross-sectional and longitudinal designs and a variety of data types (e.g., counts and relative abundances), with or without covariates and repeated measurements.
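MaAsLin 2 accepts both counts and relative abundances, so each feature is typically normalized and transformed before a per-feature linear model is fitted. The sketch below shows one such pipeline under stated assumptions:

- total-sum scaling (TSS);
- a log2 transform that replaces zeros with half the feature's smallest nonzero value;
- an ordinary least-squares slope for a single continuous covariate.

These steps mirror what we understand to be the package's default settings. The function names are hypothetical, not the R package's API, and mixed models for repeated measures are left out.

```python
import math
from typing import List


def tss_normalize(counts: List[List[float]]) -> List[List[float]]:
    """Total-sum scaling: divide each sample's (row's) counts by the sample total."""
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([c / total if total > 0 else 0.0 for c in row])
    return normalized


def log2_with_pseudocount(values: List[float]) -> List[float]:
    """log2 transform; zeros become half the smallest nonzero value (assumed rule)."""
    nonzero = [v for v in values if v > 0]
    if not nonzero:
        raise ValueError("feature is zero in every sample")
    pseudo = min(nonzero) / 2
    return [math.log2(v if v > 0 else pseudo) for v in values]


def ols_slope(x: List[float], y: List[float]) -> float:
    """Least-squares slope of y on one covariate x (model includes an intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def feature_slopes(counts: List[List[float]], covariate: List[float]) -> List[float]:
    """TSS + log2 per feature (column), then one linear model per feature."""
    rel = tss_normalize(counts)
    return [ols_slope(covariate, log2_with_pseudocount([row[j] for row in rel]))
            for j in range(len(rel[0]))]


# Hypothetical data: 4 samples (rows) x 2 features (columns) and one covariate.
counts = [[1, 7], [2, 6], [4, 4], [8, 0]]
covariate = [0.0, 1.0, 2.0, 3.0]
print(feature_slopes(counts, covariate))
```

In this toy data, feature 0 doubles its relative abundance at each step of the covariate, so its log2 slope is 1. Feature 1 declines and gets a negative slope.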
To construct this method, we conducted a large-scale evaluation of a broad range of scenarios under which straightforward identification of meta-omics associations can be challenging. These simulation studies reveal that MaAsLin 2's linear model preserves statistical power in the presence of repeated measures and multiple covariates, while accounting for the nuances of meta-omics features and controlling false discovery. We also applied MaAsLin 2 to a microbial multi-omics dataset from the Integrative Human Microbiome (HMP2) project. In addition to reproducing established results, this revealed a unique, integrated landscape of inflammatory bowel diseases (IBD) across multiple time points and omics profiles.
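Controlling false discovery across hundreds of per-feature tests is commonly done with the Benjamini–Hochberg (BH) procedure. We assume here that BH is the correction MaAsLin 2 applies by default, and the p-values below are made up. A minimal sketch of BH-adjusted q-values:

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical per-feature p-values from four association tests.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print([round(v, 4) for v in q])  # prints [0.04, 0.0533, 0.0533, 0.2]
```

The downward pass with a running minimum ensures a feature never receives a larger q-value than a feature with a bigger raw p-value.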
Bioinformatics and Biology Insights
Journal Year: 2020
Volume and Issue: 14, P. 117793221989905 - 117793221989905
Published: Jan. 1, 2020
To study complex biological processes holistically, it is imperative to take an integrative approach that combines multi-omics data to highlight the interrelationships of the involved biomolecules and their functions. With the advent of high-throughput techniques and the availability of multi-omics data generated from a large set of samples, several promising tools and methods have been developed for data integration and interpretation. In this review, we collected the tools and methods that adopt an integrative approach to analyze multiple omics data. We summarized their ability to address applications such as disease subtyping, biomarker prediction, and deriving insights into the data. We provide the methodology, use-cases, and limitations of these tools; a brief account of multi-omics data repositories and visualization portals; and the challenges associated with multi-omics data integration.
Molecular Systems Biology
Journal Year: 2018
Volume and Issue: 14(6)
Published: June 1, 2018
Method, 20 June 2018, Open Access, Transparent process

Multi-Omics Factor Analysis: a framework for unsupervised integration of multi-omics data sets

Ricard Argelaguet1,‡, Britta Velten2,‡, Damien Arnol1, Sascha Dietrich3, Thorsten Zenz3,4,5, John C Marioni1,6,7, Florian Buettner*,1,8, Wolfgang Huber*,2 and Oliver Stegle*,1,2

1 European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
2 European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
3 Heidelberg University Hospital, Heidelberg, Germany
4 German Cancer Research Center (dkfz) and National Center for Tumor Diseases (NCT), Heidelberg, Germany
5 Department of Medical Oncology & Hematology, University Hospital Zurich, Zurich, Switzerland
6 Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
7 Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
8 Helmholtz Zentrum München–German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany

‡ These authors contributed equally to this work
* Corresponding author
Molecular Systems Biology (2018) 14: e8124, https://doi.org/10.15252/msb.20178124
Abstract
Multi-omics studies promise the improved characterization of biological processes across molecular layers. However, methods for the unsupervised integration of the resulting heterogeneous data sets are lacking. We present Multi-Omics Factor Analysis (MOFA), a computational method for discovering the principal sources of variation in multi-omics data sets. MOFA infers a set of (hidden) factors that capture biological and technical sources of variability. It disentangles axes of heterogeneity that are shared across multiple modalities from those specific to individual data modalities. The learnt factors enable a variety of downstream analyses, including identification of sample subgroups, data imputation and the detection of outlier samples. We applied MOFA to a cohort of 200 patient samples of chronic lymphocytic leukaemia, profiled for somatic mutations, RNA expression, DNA methylation and ex vivo drug responses. MOFA identified major dimensions of disease heterogeneity. These included immunoglobulin heavy-chain variable region status, trisomy of chromosome 12 and previously underappreciated drivers, such as response to oxidative stress. In a second application, we used MOFA to analyse single-cell multi-omics data, identifying coordinated transcriptional and epigenetic changes along cell differentiation.
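Synopsis
Multi-Omics Factor Analysis (MOFA) is a computational framework for unsupervised discovery of the principal axes of biological and technical variation when multiple omics assays are applied to the same samples. MOFA is a broadly applicable approach for multi-omics data integration. The inferred latent factors represent the underlying principal axes of heterogeneity across the samples. Factors can be shared by multiple data modalities or can be data-type specific. The model flexibly handles missing values and different data types. In an application to Chronic Lymphocytic Leukaemia, MOFA discovers a low dimensional space spanned by known clinical markers and underappreciated axes of variation such as oxidative stress. In an application to multi-omics profiles from single cells, MOFA recovers differentiation trajectories and identifies coordinated variation between the transcriptome and the epigenome.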
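Introduction
Technological advances increasingly enable multiple biological layers to be probed in parallel, ranging from genome, epigenome, transcriptome, proteome and metabolome to phenome profiling (Hasin et al, 2017). Integrative analyses that use information across these data modalities promise to deliver more comprehensive insights into the biological systems under study. Motivated by this, multi-omics profiling has been applied across biological domains. These include cancer biology (Gerstung et al, 2015; Iorio et al, 2016; Mertins et al, 2016; Cancer Genome Atlas Research Network, 2017), regulatory genomics (Chen et al, 2016), microbiology (Kim et al, 2016) and host-pathogen biology (Soderholm et al, 2016). The most recent technological advances have also enabled multi-omics analyses at the single-cell level (Macaulay et al, 2015; Angermueller et al, 2016; Guo et al, 2017; Clark et al, 2018; Colomé-Tatché & Theis, 2018). A common aim of such applications is to characterize heterogeneity between samples, as manifested in one or several of the data modalities (Ritchie et al, 2015). Multi-omics profiling is particularly appealing if the relevant axes of variation are not known a priori. Such axes may be missed by studies that consider a single data modality or use targeted approaches.

A basic strategy for the integration of omics data is testing for marginal associations between different data modalities. A prominent example is molecular quantitative trait locus mapping, where large numbers of association tests are performed between individual genetic variants and gene expression levels (GTEx Consortium, 2015) or epigenetic marks. While eminently useful for variant annotation, such association studies are inherently local and do not provide a coherent global map of the molecular differences between samples. A second strategy uses kernel- or graph-based methods to combine different data types into a common similarity network between samples (Lanckriet et al, 2004; Wang et al, 2014). However, it is difficult to pinpoint the molecular determinants of the resulting graph structure. Relatedly, generalizations of other clustering methods exist that reconstruct discrete groups of samples based on multiple data modalities (Shen et al, 2009; Mo et al, 2013).

A key challenge that is not sufficiently addressed by these approaches is interpretability. In particular, it would be desirable to reconstruct the underlying factors that drive the observed variation across samples. These could be continuous gradients, discrete clusters of samples or combinations thereof. Such factors would help in establishing or explaining associations with external data such as phenotypes or clinical covariates. Factor models that aim to address this have been proposed (e.g. Meng et al, 2014, 2016; Tenenhaus et al, 2014; preprint: Singh et al, 2018). However, these methods either lack sparsity, which can reduce interpretability, or require a substantial number of parameters to be determined post hoc or by computationally demanding cross-validation. Existing methods face further challenges: computational scalability to larger data sets, and handling of missing values and of non-Gaussian data modalities, such as binary readouts or count-based traits.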
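Results
Multi-Omics Factor Analysis (MOFA) is a statistical method for integrating multiple modalities of omics data in an unsupervised fashion. Intuitively, MOFA can be viewed as a versatile and statistically rigorous generalization of principal component analysis (PCA) to multi-omics data. It takes several data matrices with measurements of multiple omics data types on the same or on partially overlapping sets of samples. From these, MOFA infers an interpretable low-dimensional data representation in terms of (hidden) factors (Fig 1A). These learnt factors capture major sources of variation across data modalities, thus facilitating the identification of continuous molecular gradients or discrete subgroups of samples. The inferred factor loadings can be sparse, thereby facilitating the linkage between the factors and the most relevant molecular features. Importantly, MOFA disentangles to what extent each factor is unique to a single data modality or is manifested in multiple modalities (Fig 1B), thereby revealing shared axes of variation between the different omics layers. Once trained, the model output can be used for a range of downstream analyses (Fig 1B):

- visualization, clustering and classification of samples in the low-dimensional space(s) spanned by the factors;
- automated annotation of factors using (gene set) enrichment analysis;
- identification of outlier samples;
- imputation of missing values.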
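The sketch below makes the variance decomposition concrete. It simulates two views from a linear factor model, Y_m = Z W_m^T + noise. For each view m and factor k it then computes R²(m,k) = 1 − SS(Y_m − z_k w_{m,k}^T) / SS(Y_m) on centred data. This is our reading of the per-factor, per-view quantity, not MOFA's code. The true Z and W are plugged in rather than inferred by variational inference, and the view sizes, zero-weight pattern and noise level are arbitrary choices.

```python
import random
from typing import List

Matrix = List[List[float]]


def times_transpose(z: Matrix, w: Matrix) -> Matrix:
    """Z @ W^T for Z (N x K) and W (D x K)."""
    return [[sum(zk * wk for zk, wk in zip(z_row, w_row)) for w_row in w] for z_row in z]


def center_columns(y: Matrix) -> Matrix:
    """Subtract each column's mean."""
    n = len(y)
    means = [sum(row[j] for row in y) / n for j in range(len(y[0]))]
    return [[v - means[j] for j, v in enumerate(row)] for row in y]


def r2_per_factor(y: Matrix, z: Matrix, w: Matrix) -> List[float]:
    """For each factor k: 1 - SS(Y - z_k w_k^T) / SS(Y), with Y already centred."""
    ss_total = 0.0
    for row in y:
        for v in row:
            ss_total += v * v
    r2 = []
    for k in range(len(z[0])):
        ss_resid = 0.0
        for i, row in enumerate(y):
            for d, v in enumerate(row):
                diff = v - z[i][k] * w[d][k]
                ss_resid += diff * diff
        r2.append(1.0 - ss_resid / ss_total)
    return r2


random.seed(0)
N, K = 200, 2
z = center_columns([[random.gauss(0, 1) for _ in range(K)] for _ in range(N)])
# view1 loads on both factors; view2 loads on factor 0 only (factor-1 weights are zero).
weights = {
    "view1": [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(30)],
    "view2": [[random.gauss(0, 1), 0.0] for _ in range(20)],
}
views = {}
for name, w in weights.items():
    noisy = [[v + random.gauss(0, 0.5) for v in row] for row in times_transpose(z, w)]
    views[name] = center_columns(noisy)

for name in views:
    print(name, [round(r, 2) for r in r2_per_factor(views[name], z, weights[name])])
```

View2 has zero weights on the second factor, so its second R² is 0.0. This is how a factor that is inactive in a modality shows up in the decomposition.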
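Figure 1. Multi-Omics Factor Analysis: model overview and downstream analyses
Model overview: MOFA takes M data matrices as input (Y1,…, YM), one or more from each data modality, with co-occurrent samples. The features need not be related and can differ in number. MOFA decomposes these matrices into a matrix of factors (Z) for each sample and M weight matrices, one for each data modality (W1,.., WM). White cells in the weight matrices correspond to zeros, i.e. inactive features, whereas the cross symbol in the data matrices denotes missing values.
The fitted MOFA model can be queried for different downstream analyses:
(i) variance decomposition, assessing the proportion of variance explained by each factor in each data modality;
(ii) semi-automated factor annotation based on the inspection of loadings and gene set enrichment analysis;
(iii) visualization of the samples in the factor space;
(iv) imputation of missing values, including missing assays.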
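The decomposition can be summarized in the notation of Figure 1, with N samples, K factors and D_m features in view m. This is a hedged summary of the Gaussian case only:

- the sparsity-inducing priors on the weights are omitted;
- the feature-wise noise precision τ is our assumption about the parameterization;
- for count or binary views, a different likelihood replaces the Gaussian noise term (Materials and Methods).

The last line defines the variance explained by factor k in view m on centred data.

```latex
\begin{gathered}
\mathbf{Y}_m = \mathbf{Z}\,\mathbf{W}_m^{\top} + \boldsymbol{\varepsilon}_m, \qquad m = 1, \dots, M, \\
\mathbf{Y}_m \in \mathbb{R}^{N \times D_m}, \quad \mathbf{Z} \in \mathbb{R}^{N \times K}, \quad \mathbf{W}_m \in \mathbb{R}^{D_m \times K}, \\
(\boldsymbol{\varepsilon}_m)_{nd} \sim \mathcal{N}\!\left(0,\; 1/\tau_{m,d}\right), \\
R^2_{m,k} = 1 - \frac{\sum_{n=1}^{N}\sum_{d=1}^{D_m}\left(y_{m,nd} - z_{nk}\, w_{m,dk}\right)^2}{\sum_{n=1}^{N}\sum_{d=1}^{D_m} y_{m,nd}^2}.
\end{gathered}
```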
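Technically, MOFA builds upon the statistical framework of group Factor Analysis (Virtanen et al, 2012; Khan et al, 2014; Klami et al, 2015; Bunte et al, 2016; Zhao et al, 2016; Leppäaho & Kaski, 2017). We have adapted this framework to the requirements of multi-omics studies (Materials and Methods):
(i) fast inference based on a variational approximation;
(ii) inference of sparse solutions, facilitating interpretation;
(iii) efficient handling of missing values;
(iv) flexible combination of different likelihood models for each data modality, which enables integrating diverse data types such as binary-, count- and continuous-valued data.
The relationship between MOFA and previous (multi-omics) factor models is discussed in Materials and Methods (see also Virtanen et al, 2012; Klami et al, 2013; Remes et al, 2015; Hore et al, 2016; Leppäaho & Kaski, 2017) and Appendix Table S3. MOFA is implemented as well-documented open-source software and comes with tutorials and example workflows for different application domains (Materials and Methods). Taken together, these functionalities provide a powerful and versatile tool for disentangling sources of variation in multi-omics studies.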
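Model validation and comparison on simulated data
First, to validate MOFA, we simulated data from its generative model, varying the number of views, the likelihood models, the number of latent factors and other parameters (Materials and Methods, Appendix Table S1). MOFA accurately reconstructed the latent dimension, except in settings with high proportions of missing values (Appendix Fig S1). We also confirmed that likelihood models that account for non-Gaussian observations improve the fit when simulating binary or count data (Appendix Figs S2 and S3). We then compared MOFA with two previously reported methods for multi-omics integration: GFA (Leppäaho & Kaski, 2017) and iCluster (Mo et al, 2013). Over a range of simulations, these methods tended to infer redundant factors (Appendix Fig S4). They were also less accurate in recovering patterns of factor activity across views (Appendix Fig S5). MOFA was also computationally more efficient than these existing methods (Fig EV1). For example, training on the CLL data, which we consider next, required 25 min with MOFA versus 34 h with GFA and 5–6 days with iCluster.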
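Figure EV1. Scalability of MOFA, GFA and iCluster
Time required for training MOFA (red), GFA (blue) and iCluster (green) as a function of K, D, N and M. Baseline parameters were M = 3, K = 10, D = 1,000, N = 100 and 5% missing values. Shown is the average time across 10 trials; error bars denote the standard deviation. iCluster is shown only for the lowest values in all settings because of its training time.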
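Application to chronic lymphocytic leukaemia
We applied MOFA to a study of chronic lymphocytic leukaemia (CLL) that combined ex vivo drug response measurements with somatic mutation status, transcriptome profiling and DNA methylation assays (Dietrich et al, 2018; Fig 2A). Notably, nearly 40% of the 200 samples were not profiled with some of the assays. Such missing data are not uncommon in large cohort studies, and MOFA is designed to cope with them (Materials and Methods). MOFA was configured to combine different likelihood models in order to accommodate the different data types in this study.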
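Figure 2. Application of MOFA to a study of chronic lymphocytic leukaemia
A. Study overview and data types. Data modalities are shown in rows (D = number of features) and samples (N) in columns, with missing samples shown as grey bars.
B, C. (B) Proportion of total variance explained (R2) by individual factors for each assay and (C) cumulative proportion of total variance explained.
D. Absolute loadings of the top features of Factors 1 and 2 in the Mutations data.
E. Visualization of samples using Factors 1 and 2. The colours denote the IGHV status of the tumours; symbol shape and colour tone indicate chromosome 12 trisomy status.
F. Number of enriched Reactome gene sets per factor (FDR < 1%). The colours denote categories of related pathways, defined as in Appendix Table S2.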
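MOFA identified 10 factors (minimum explained variance 2% in at least one data type; Materials and Methods). These factors were robust to algorithm initialization and subsampling of the data (Appendix Figs S6 and S7). The factors were largely orthogonal, capturing independent sources of variation (Appendix Fig S6). Among these, Factors 1 and 2 were active in most assays, indicating broad roles in multiple molecular layers (Fig 2B). In contrast, Factors 3 and 5 were specific to two data modalities, and Factor 4 was active in a single data modality only. Cumulatively, the factors explained 41% of the variation in the drug response data, 38% in the mRNA data, and 24% each in the DNA methylation and mutation data (Fig 2C). We also trained MOFA while excluding individual data modalities to probe their redundancy. Factors active in multiple data modalities were still recovered, while the identification of others depended on a specific data type (Appendix Fig S8). A comparison with GFA and iCluster (Mo et al, 2013) assessed the consistency of the inferred factors across model instances (Appendix Fig S9).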
MOFA identifies important clinical markers in CLL and reveals an axis of variation attributed to oxidative stress
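As part of the downstream pipeline, MOFA provides different strategies to identify the aetiology of the factors (Materials and Methods). For example, inspection of the weights aligned Factor 1 with the somatic mutation status of the immunoglobulin heavy-chain variable region gene (IGHV), and Factor 2 with trisomy of chromosome 12 (Fig 2D and E). Thus, MOFA correctly identified the two most important clinical markers in CLL as the major axes of molecular disease heterogeneity (Zenz et al, 2010; Fabbri & Dalla-Favera, 2016).

IGHV status, the marker associated with Factor 1, is a surrogate of the differentiation state of the tumour's cell of origin and the level of activation of the B-cell receptor. In clinical practice this axis is generally considered binary (Fabbri & Dalla-Favera, 2016). Our results, however, indicate a more complex substructure (Fig 3A, Appendix Fig S10). At the current resolution, the data are consistent with three subgroups, in agreement with Oakes et al (2016) and Queiros et al (2015) (Appendix Fig S11), although there is suggestive evidence for an underlying continuum. MOFA connected this factor to changes in gene expression (Appendix Figs S12 and S13). These included genes previously linked to IGHV status (Vasconcelos et al, 2005; Maloum et al, 2009; Trojani et al, 2012; Morabito et al, 2015; Plesingerova et al, 2017; Fig 3B and C). The factor was also linked to drugs that target kinases in the B-cell receptor pathway (Fig 3D).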
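Figure 3. Characterization of Factor 1
A. Beeswarm plot of Factor 1 values for each sample. Colours correspond to three groups found by 3-means clustering: low (LZ), intermediate (IZ) and high (HZ) factor values.
B. Absolute loadings of the genes with the largest absolute weights on Factor 1 in the mRNA data. Plus and minus symbols on the right indicate the sign of the loading. Genes highlighted in orange were previously described as prognostic markers in CLL.
C. Heatmap of gene expression values for the genes shown in (B).
D. Absolute loadings of the drugs with the largest absolute weights, annotated by target category.
E. Drug response curves, stratified by the groups defined in (A).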
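The three groups in Fig 3A come from 3-means clustering of a single factor, which is ordinary k-means in one dimension. The sketch below implements Lloyd's algorithm with quantile-based starting centres. The initialization, the iteration cap and the nine factor values are our assumptions, not details taken from the paper.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int = 3, max_iter: int = 100) -> Tuple[List[float], List[int]]:
    """Lloyd's algorithm on scalar values; returns centres and one label per value."""
    ordered = sorted(values)
    n = len(ordered)
    # Start from evenly spaced quantiles, so the centres begin in increasing order.
    centres = [ordered[int((c + 0.5) * n / k)] for c in range(k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centre.
        new_labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        if new_labels == labels:
            break  # assignments are stable, so the centres will not change either
        labels = new_labels
        # Update step: each centre moves to the mean of its members (kept if empty).
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centres[c] = sum(members) / len(members)
    return centres, labels


# Hypothetical Factor 1 values for nine samples.
factor1 = [-2.1, -1.9, -2.0, 0.1, -0.1, 0.0, 1.9, 2.0, 2.1]
centres, labels = kmeans_1d(factor1)
zones = ["LZ", "IZ", "HZ"]  # low, intermediate, high factor values
print([zones[lab] for lab in labels])
# prints ['LZ', 'LZ', 'LZ', 'IZ', 'IZ', 'IZ', 'HZ', 'HZ', 'HZ']
```

In this example, starting from sorted quantiles keeps labels 0, 1 and 2 ordered from low to high, so they map directly onto LZ, IZ and HZ.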
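Despite their importance, these factors accounted for only 20% of the variance, suggesting the existence of further sources of heterogeneity. One example is Factor 5. Gene set enrichment analysis tagged it to oxidative stress and senescence pathways (Figs 2F and EV2A). Its top weights corresponded to heat-shock proteins (HSPs; Fig EV2B and C), which are essential for protein folding and are up-regulated under stress conditions (Srivastava, 2002; Åkerfelt et al, 2010). HSP genes are up-regulated in some cancers and have known roles in tumour cell survival (Trachootham et al, 2009). So far, however, this gene family has received little attention in the context of CLL. Consistent with this annotation, the drugs with the strongest weights on Factor 5 were associated with response to oxidative stress, such as drugs targeting reactive oxygen species (ROS), DNA damage and apoptosis (Fig EV2D).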
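Figure EV2. Characterization of Factor 5 (oxidative stress factor)
Colours denote the expression of TNF, an inflammatory stress marker. Gene set enrichment (t-test) results are shown for six gene sets. Samples are ordered by their Factor 5 values, and features are scaled by their loading.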
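Factor 4 captured 9% of the variation in the mRNA data. Gene set enrichment analysis suggested aetiologies related to immune response pathways and T-cell receptor signalling (Fig 2F). This likely reflects differences in sample composition: the samples are comprised mainly of B cells, with possible contamination by T cells and monocytes (Appendix Fig S14). Factor 3 explained 11% of the variation in the drug response data, capturing differences in the samples' general level of drug sensitivity (Geeleher et al, 2016; Appendix Fig S15).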
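MOFA identifies outlier samples and imputes missing assays
Next, we explored the relationship between the factors and clinical annotations. Such annotations can be missing, mis-annotated or inaccurate, since they are frequently based on imperfect surrogates (Westra et al, 2011). Since IGHV status is a key biomarker impacting clinical care, we assessed the consistency between clinical IGHV status and Factor 1. For 176 patients, the clinical IGHV status agreed with Factor 1. The factor further allowed classifying the patients that lacked a clinically measured IGHV status (Fig EV3A and B). Interestingly, MOFA assigned some patients to a different group than their clinical label. Upon inspection of the underlying molecular data, nine of these cases showed intermediate molecular signatures, suggesting borderline cases that a binary classification does not capture well. The remaining cases were clearly discordant (Fig EV3C and D). Whole exome sequencing data provided additional confirmation for these outliers (Fig EV3E and F).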
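Figure EV3. Prediction of IGHV status
Samples are coloured by their predicted labels; a pie chart shows the distribution of the imputed labels. Sample-to-sample correlation. Response to ONO-4509 (not included in the training data): boxplots of viability for ONO-4509, for all samples (middle) and for M-CLL and U-CLL samples (left and right, respectively). The panels show the concentrations tested. Boxes span the first to third quartiles around the median value. Whole exome sequencing data: mutations are shown on the y-axis, with the outlier samples labelled separately.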
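Incomplete data sets are a common problem in high-throughput studies. We therefore assessed MOFA's ability to fill in individual missing values as well as entire missing assays. In both tasks, MOFA yielded more accurate predictions than established imputation strategies. These included imputation by the feature-wise mean, SoftImpute (Mazumder et al, 2010) and a k-nearest neighbour method (Troyanskaya et al, 2001; Fig EV4, Appendix Fig S16). MOFA also outperformed GFA, especially in the case of missing assays (Appendix Fig S17).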
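With a factor model, a missing entry is predicted as ŷ_nd = Σ_k z_nk w_dk. The feature-wise mean baseline instead uses the average of the observed values of that feature. The sketch below scores both by mean squared error on randomly masked entries of a simulated matrix, loosely mirroring the evaluation in Fig EV4. It is illustrative only: the true Z and W are plugged in instead of being inferred from the incomplete data. It therefore shows the computation, not MOFA's accuracy, and all sizes and noise levels are arbitrary.

```python
import random
from typing import List, Optional, Set, Tuple

Matrix = List[List[float]]


def reconstruct(z: Matrix, w: Matrix) -> Matrix:
    """Factor-model prediction Z @ W^T (N x D)."""
    return [[sum(zk * wk for zk, wk in zip(z_row, w_row)) for w_row in w] for z_row in z]


def mean_impute(y: List[List[Optional[float]]]) -> Matrix:
    """Fill each missing entry (None) with the mean of the observed values in its column."""
    means = []
    for j in range(len(y[0])):
        observed = [row[j] for row in y if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in y]


def masked_mse(truth: Matrix, pred: Matrix, mask: Set[Tuple[int, int]]) -> float:
    """Mean squared error over the held-out (masked) entries only."""
    return sum((truth[i][j] - pred[i][j]) ** 2 for i, j in mask) / len(mask)


random.seed(1)
N, D, K = 100, 15, 2
z = [[random.gauss(0, 1) for _ in range(K)] for _ in range(N)]
w = [[random.gauss(0, 1) for _ in range(K)] for _ in range(D)]
signal = reconstruct(z, w)
truth = [[v + random.gauss(0, 0.3) for v in row] for row in signal]

# Hide roughly 20% of the entries, completely at random.
mask = {(i, j) for i in range(N) for j in range(D) if random.random() < 0.2}
observed = [[None if (i, j) in mask else truth[i][j] for j in range(D)] for i in range(N)]

factor_mse = masked_mse(truth, signal, mask)  # true Z, W plugged in (no inference)
mean_mse = masked_mse(truth, mean_impute(observed), mask)
print(f"factor model MSE: {factor_mse:.3f}, feature-wise mean MSE: {mean_mse:.3f}")
```

Only masked entries enter the error, so both methods are judged on values they never saw.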
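Figure EV4. Imputation of missing values
A, B. Methods compared were MOFA, SoftImpute, imputation by the feature-wise mean (Mean) and a k-nearest neighbour method (kNN). Shown are averages of the mean squared error (MSE) across 15 imputation experiments for increasing fractions of missing data. Panel (A) considers values missing at random; panel (B) considers entire assays missing at random. Error bars denote plus or minus the standard error.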
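Latent factors inferred by MOFA are predictive of clinical outcomes
Finally, we explored the utility of the latent factors as predictors of clinical outcomes. Three factors were significantly associated with time to next treatment (Cox regression, FDR < 1%, Fig 4A and B): Factor 1, which relates to the tumour's cell of origin, and Factors 7 and 8. Factor 7 was associated with chemo-immunotherapy treatment prior to sample collection (P < 0.01, t-test). It captures del17p and TP53 mutations as well as differences in the expression of oncogenes (Garg et al; Fluhr et al; Appendix Fig S18). Factor 8 was associated with WNT signalling (Appendix Fig S19).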
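Figure 4. Relationship between the factors and clinical outcome
A. Association of the factors with time to next treatment using univariate Cox regression, N = 174 (96 …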
Biological aging is the gradual, progressive decline in system integrity that occurs with advancing chronological age, causing morbidity and disability. Measurements of the pace of aging are needed as surrogate endpoints in trials of therapies designed to prevent disease by slowing biological aging. We report a blood-DNA-methylation measure that is sensitive to variation in the pace of biological aging among individuals born in the same year. We first modeled change over time in 18 biomarkers tracking organ-system integrity across 12 years of follow-up in n = 954 members of the Dunedin Study born in 1972–1973. Rates of change in each biomarker over ages 26–38 years were composited to form a measure of aging-related decline, termed Pace-of-Aging.
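The abstract says that rates of change in each biomarker were composited into Pace-of-Aging, but it does not give the modelling details. The sketch below is a simplified illustration under our own assumptions:

- per-person least-squares slopes instead of the study's longitudinal models;
- a sign flip for biomarkers where a decrease indicates decline;
- z-scoring across people, then an unweighted average.

Biomarker names and values are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def slope(ages: List[float], values: List[float]) -> float:
    """Least-squares rate of change of one biomarker for one person."""
    mean_age, mean_val = mean(ages), mean(values)
    sxx = sum((a - mean_age) ** 2 for a in ages)
    return sum((a - mean_age) * (v - mean_val) for a, v in zip(ages, values)) / sxx


def zscores(xs: List[float]) -> List[float]:
    """Standardize values across people (population standard deviation)."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def pace_composite(ages: List[float],
                   data: Dict[str, List[List[float]]],
                   direction: Dict[str, int]) -> List[float]:
    """data[marker][person] holds that person's measurements at `ages`.
    direction[marker] is +1 if a rising value means decline, -1 if a falling one does."""
    n_people = len(next(iter(data.values())))
    per_person: List[List[float]] = [[] for _ in range(n_people)]
    for marker, people in data.items():
        rates = [direction[marker] * slope(ages, vals) for vals in people]
        for i, z in enumerate(zscores(rates)):
            per_person[i].append(z)
    return [mean(zs) for zs in per_person]


ages = [26.0, 32.0, 38.0]
data = {  # three hypothetical people, two hypothetical biomarkers
    "systolic_bp": [[110, 115, 120], [110, 110, 110], [110, 105, 100]],
    "lung_function": [[4.0, 3.8, 3.6], [4.0, 4.0, 4.0], [3.6, 3.8, 4.0]],
}
direction = {"systolic_bp": +1, "lung_function": -1}
print([round(p, 2) for p in pace_composite(ages, data, direction)])  # prints [1.22, 0.0, -1.22]
```

A higher composite means faster aging-related decline relative to the other people in the toy sample.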
Elastic-net regression was used to develop a DNA-methylation predictor of Pace-of-Aging, called DunedinPoAm for Dunedin(P)ace(o)f(A)ging(m)ethylation. Validation analyses in cohort studies and the CALERIE trial provide proof-of-principle for DunedinPoAm as a single-time-point measure of a person's pace of biological aging.
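Elastic-net regression combines lasso (L1) and ridge (L2) penalties, which suits DNA-methylation predictors that have far more CpG sites than people. Below is a minimal cyclic coordinate-descent sketch. It assumes standardized predictors, a centred outcome and the objective (1/2n)‖y − Xβ‖² + λ(α‖β‖₁ + (1 − α)‖β‖²/2), which is the glmnet parameterization. It is not the DunedinPoAm training code, and tuning λ and α by cross-validation is left out.

```python
from typing import List


def soft_threshold(x: float, t: float) -> float:
    """Proximal operator of the L1 penalty."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def elastic_net(X: List[List[float]], y: List[float], lam: float, alpha: float,
                max_iter: int = 1000, tol: float = 1e-10) -> List[float]:
    """Cyclic coordinate descent for
    (1/2n)||y - X b||^2 + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2).
    Assumes standardized columns of X and a centred y (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]  # standardized toy predictors
y = [3.0, 1.0, -1.0, -3.0]                                 # centred toy outcome
print(elastic_net(X, y, lam=0.0, alpha=1.0))  # prints [2.0, 1.0] (least squares)
print(elastic_net(X, y, lam=0.5, alpha=1.0))  # prints [1.5, 0.5] (lasso shrinkage)
print(elastic_net(X, y, lam=1.0, alpha=0.5))  # mixed L1/L2 penalty
```

With orthogonal toy columns, the lasso simply shrinks each least-squares coefficient by λ. The ridge part additionally divides by 1 + λ(1 − α).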
Nucleic Acids Research
Journal Year: 2018
Volume and Issue: unknown
Published: July 20, 2018
We report a new class of artifacts in DNA methylation measurements from Illumina HumanMethylation450 and MethylationEPIC arrays. These artifacts reflect failed hybridization to target DNA, often due to germline or somatic deletions, and manifest as incorrectly reported intermediate methylation. The artifacts survive existing preprocessing pipelines and masquerade as epigenetic alterations. They can confound discoveries in epigenome-wide association studies and studies of methylation-quantitative trait loci. We implement a solution, P-value with out-of-band (OOB) array hybridization (pOOBAH), in the R package SeSAMe. Our method effectively masks deleted and hyperpolymorphic regions. This reduces or eliminates spurious reports of epigenetic silencing at oft-deleted tumor suppressor genes such as CDKN2A and RB1 in cases with deletions. Furthermore, our method substantially decreases technical variation whilst retaining biological variation, both within and across HM450 and EPIC platform measurements. SeSAMe provides a light-weight, modular DNA methylation data analysis suite, with a performant implementation suitable for efficient analysis of thousands of samples.
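As we understand it, pOOBAH treats out-of-band (OOB) fluorescence as an empirical null distribution. OOB fluorescence is signal measured in the colour channel a probe is not designed to emit in. A probe whose signal is not clearly above that background is flagged. The sketch below computes such empirical p-values (1 minus the ECDF of the OOB intensities) and masks beta values whose p-value exceeds a threshold. The 0.05 cut-off, the use of total intensity per probe, the function names and the toy intensities are all assumptions; the SeSAMe documentation gives the package's exact definition.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence


def oob_pvalues(signals: Sequence[float], oob: Sequence[float]) -> List[float]:
    """Empirical p-value per probe: share of OOB background intensities above its signal.

    `signals` stands in for each probe's total (methylated + unmethylated) intensity;
    that choice, like the rest of this sketch, is an illustrative assumption.
    """
    background = sorted(oob)
    n = len(background)
    # bisect_right counts background values <= s, i.e. n * ECDF(s); p = 1 - ECDF(s).
    return [1.0 - bisect_right(background, s) / n for s in signals]


def mask_betas(betas: Sequence[float], pvalues: Sequence[float],
               threshold: float = 0.05) -> List[Optional[float]]:
    """Replace beta values of probes that fail detection (p > threshold) with None."""
    return [None if p > threshold else b for b, p in zip(betas, pvalues)]


oob = [float(v) for v in range(1, 11)]  # toy OOB intensities 1..10
signals = [0.5, 5.0, 12.0]               # toy probe intensities
p = oob_pvalues(signals, oob)
print(p)                                  # prints [1.0, 0.5, 0.0]
print(mask_betas([0.31, 0.52, 0.88], p))  # prints [None, None, 0.88]
```

Only the probe whose signal exceeds every background intensity keeps its beta value. The other two would be reported as missing rather than as intermediate methylation.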
International Journal of Molecular Sciences
Journal Year: 2019
Volume and Issue: 20(19), P. 4781 - 4781
Published: Sept. 26, 2019
Recent advances in omics technologies have led to unprecedented efforts characterizing the molecular changes that underlie the development and progression of a wide array of complex human diseases, including cancer. As a result, multi-omics analyses have been proposed and heralded as key to advancing precision medicine in the clinic. These analyses take advantage of these technologies in genomics, transcriptomics, epigenomics, proteomics, metabolomics, and other areas. In the field of oncology, genomics approaches and, more recently, multi-omics analyses have helped reveal several mechanisms of cancer development, treatment resistance, and recurrence risk. Several of these findings have been implemented in clinical oncology to help guide treatment decisions. However, truly integrated multi-omics analyses have not been applied widely, preventing further advances in precision medicine. Additional efforts are needed to develop the analytical infrastructure necessary to generate, analyze, and annotate multi-omics data to effectively inform precision medicine-based decision-making.
Computational and Structural Biotechnology Journal
Journal Year: 2021
Volume and Issue: 19, P. 3735 - 3746
Published: Jan. 1, 2021
Increased availability of high-throughput technologies has generated an ever-growing number of omics data sets that seek to portray many different but complementary biological layers, including genomics, epigenomics, transcriptomics, proteomics, and metabolomics. New insight from these data has been obtained by machine learning algorithms that have produced diagnostic and classification biomarkers. Most biomarkers to date, however, include only one omic measurement at a time. They therefore do not take full advantage of recent multi-omics experiments that now capture the entire complexity of biological systems. Multi-omics integration strategies are needed to combine the complementary knowledge brought by each omics layer. We summarized the most recent integration methods/frameworks into five strategies: early, mixed, intermediate, late and hierarchical. In this mini-review, we focus on the challenges and existing multi-omics integration strategies, paying special attention to machine learning applications.
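Two of the five strategies are easy to contrast in code. In early integration, the omics blocks are concatenated into one feature matrix before modelling, usually after per-feature scaling so that no block dominates. In late integration, a separate model is fitted per block and only the predictions are combined. The sketch below shows both steps with hypothetical toy blocks, using z-scoring for scaling and a plain average as the late-fusion rule. These are illustrative choices, not recommendations from the review.

```python
from statistics import mean, pstdev
from typing import List

Block = List[List[float]]  # rows = samples, columns = features


def zscore_columns(block: Block) -> Block:
    """Standardize each feature (column) to mean 0 and unit population variance."""
    stats = [(mean(col), pstdev(col)) for col in zip(*block)]
    return [[(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)] for row in block]


def early_integration(blocks: List[Block]) -> Block:
    """Concatenate standardized features from every block, sample by sample."""
    n = len(blocks[0])
    if any(len(b) != n for b in blocks):
        raise ValueError("all blocks must contain the same samples")
    scaled = [zscore_columns(b) for b in blocks]
    return [[v for b in scaled for v in b[i]] for i in range(n)]


def late_integration(block_predictions: List[List[float]]) -> List[float]:
    """Average per-sample predicted probabilities from models trained on each block."""
    return [mean(preds) for preds in zip(*block_predictions)]


transcriptome = [[1.0, 10.0], [3.0, 30.0]]  # 2 samples x 2 hypothetical genes
methylome = [[0.2], [0.6]]                   # 2 samples x 1 hypothetical CpG
print(early_integration([transcriptome, methylome]))
print(late_integration([[0.2, 0.8], [0.6, 0.4]]))  # per-block model outputs
```

Early integration lets a single model see cross-block interactions but inflates dimensionality. Late integration keeps models small at the cost of ignoring interactions between layers.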