Exploring the latent space of transcriptomic data with topic modeling
Filippo Valle, Michele Caselle, Matteo Osella

et al.

NAR Genomics and Bioinformatics, Journal Year: 2025, Volume and Issue: 7(2)

Published: March 29, 2025

Abstract: The availability of high-dimensional transcriptomic datasets is increasing at a tremendous pace, together with the need for suitable computational tools. Clustering and dimensionality reduction methods are popular go-to approaches for identifying basic structures in these datasets. At the same time, different topic modeling techniques have been developed to organize the deluge of available natural language data using its latent topical structure. This paper leverages the statistical analogies between text and transcriptomic data to compare topic modeling methods when applied to gene expression data. Specifically, we test their accuracy in the specific task of discovering and reconstructing the tissue structure of the human transcriptome and of distinguishing healthy from cancerous tissues. We examine the properties of the latent space recovered by the different methods and highlight their differences and their pros and cons across tasks. We focus in particular on how priors can affect the results and their interpretability. Finally, we show that the latent space can be a useful low-dimensional embedding space, where a neural network classifier can annotate profiles with high accuracy.

Language: English

Citations

0

Exploring the latent space of transcriptomic data with topic modeling
Filippo Valle, Michele Caselle, Matteo Osella

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 3, 2024

Abstract: The availability of high-dimensional transcriptomic datasets is increasing at a tremendous pace, together with the need for suitable computational tools. Clustering and dimensionality reduction methods are popular go-to approaches for identifying basic structures in these datasets. At the same time, different topic modeling techniques have been developed to organize the deluge of available natural language data using its latent topical structure. This paper leverages the statistical analogies between text and transcriptomic data to compare topic modeling methods when applied to gene expression data. Specifically, we test their accuracy in the specific task of discovering and reconstructing the tissue structure of the human transcriptome and of distinguishing healthy from cancerous tissues. We examine the properties of the latent space recovered by the different methods and highlight their differences and their pros and cons across tasks. Finally, we show that the latent space can be a useful embedding space, where a neural network classifier can annotate profiles with high accuracy.

Language: English

Citations

2