TrAnnoScope: A Modular Snakemake Pipeline for Full-Length Transcriptome Analysis and Functional Annotation (Open Access)

Aysevil Pektas, Frank Panitz, Bo Thomsen

et al.

Genes, Year: 2024, Volume 15(12), pp. 1547 - 1547

Published: Nov. 29, 2024

Background/Objectives: Transcriptome assembly and functional annotation are essential in understanding gene expression and biological function. Nevertheless, many existing pipelines lack the flexibility to integrate both short- and long-read sequencing data or fail to provide a complete, customizable workflow for transcriptome analysis, particularly for non-model organisms. Methods: We present TrAnnoScope, a transcriptome analysis pipeline designed to process Illumina short-read and PacBio long-read data. The pipeline provides a workflow to generate high-quality, full-length (FL) transcripts with broad functional annotation. Its modular design allows users to adapt specific steps to other sequencing platforms or data types. The pipeline encompasses steps from quality control to functional annotation, employing tools and established databases such as SwissProt, Pfam, Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Eukaryotic Orthologous Groups (KOG). As a case study, TrAnnoScope was applied to RNA-Seq and Iso-Seq data from zebra finch brain, ovary, and testis tissue. Results: The FL transcripts generated from these tissues demonstrated strong alignment to the reference genome (99.63%), and it was found that 93.95% of the matched protein sequences from the proteome were captured as nearly complete. Functional annotation provided matches in known databases and assigned relevant terms to the majority of transcripts. Conclusions: TrAnnoScope successfully integrates short- and long-read sequencing technologies to generate transcriptomes with minimal user input. Its modularity and ease of use make it a valuable tool for researchers analyzing complex datasets.

Language: English

Systematic assessment of long-read RNA-seq methods for transcript identification and quantification (Creative Commons)
Francisco J. Pardo-Palacios, Dingjie Wang, Fairlie Reese

et al.

Nature Methods, Year: 2024, Volume 21(7), pp. 1349 - 1363

Published: June 7, 2024

The Long-read RNA-Seq Genome Annotation Assessment Project Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. Using different protocols and sequencing platforms, the consortium generated over 427 million long-read sequences from complementary DNA and direct RNA datasets, encompassing human, mouse and manatee species. Developers utilized these data to address challenges in transcript isoform detection, quantification and de novo transcript detection. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. Incorporating additional orthogonal data and replicate samples is advised when aiming to detect rare and novel transcripts or when using reference-free approaches. This collaborative study offers a benchmark for current practices and provides direction for future method development.

Language: English

Cited by: 74

PyHMMER: a Python library binding to HMMER for efficient sequence analysis (Creative Commons)
Martin Larralde, Georg Zeller

Bioinformatics, Year: 2023, Volume 39(5)

Published: April 19, 2023

PyHMMER provides Python integration of the popular profile Hidden Markov Model software HMMER via Cython bindings. This allows the annotation of protein sequences with profile HMMs and the building of new ones directly with Python. PyHMMER increases flexibility of use, allowing users to create queries directly from Python code, launch searches, and obtain results without I/O, or to access previously unavailable statistics like uncorrected P-values. A new parallelization model greatly improves performance when running multithreaded searches, while producing the exact same results as HMMER.

Language: English

Cited by: 47

Systematic assessment of long-read RNA-seq methods for transcript identification and quantification (Creative Commons)
Francisco J. Pardo-Palacios, Dingjie Wang, Fairlie Reese

et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2023, Volume unknown

Published: July 27, 2023

The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. The consortium generated over 427 million long-read sequences from cDNA and direct RNA datasets, encompassing human, mouse, and manatee species, using different protocols and sequencing platforms. These data were utilized by developers to address challenges in transcript isoform detection and quantification, as well as de novo transcript identification. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. When aiming to detect rare and novel transcripts or when using reference-free approaches, incorporating additional orthogonal data and replicate samples is advised. This collaborative study offers a benchmark for current practices and provides direction for future method development.

Language: English

Cited by: 25

Comprehensive assessment of mRNA isoform detection methods for long-read sequencing data (Creative Commons)

Yaqi Su, Zhejian Yu, Siqian Jin

et al.

Nature Communications, Year: 2024, Volume 15(1)

Published: May 10, 2024

The advancement of Long-Read Sequencing (LRS) techniques has significantly increased the length of sequencing reads to several kilobases, thereby facilitating the identification of alternative splicing events and isoform expressions. Recently, numerous computational tools for isoform detection using long-read sequencing data have been developed. Nevertheless, there remains a deficiency in comparative studies that systematically evaluate the performance of these tools, which are implemented with different algorithms, under various simulations that encompass potential influencing factors. In this study, we conducted a benchmark analysis of thirteen methods implemented in nine tools capable of identifying isoform structures from long-read RNA-seq data. We evaluated their performances using simulated data, which represented diverse sequencing platforms and were generated by an in-house simulator, RNA sequins (sequencing spike-ins) data, as well as experimental data. Our findings demonstrate that IsoQuant is a highly effective tool for isoform detection with LRS, with Bambu and StringTie2 also exhibiting strong performance. These results offer valuable guidance for future research and for the ongoing improvement of LRS-based isoform detection tools.

Language: English

Cited by: 11

Identifying and quantifying isoforms from accurate full-length transcriptome sequencing reads with Mandalorion (Creative Commons)
Roger Volden, Kayla D. Schimke, Ashley Byrne

et al.

Genome Biology, Year: 2023, Volume 24(1)

Published: July 17, 2023

In this manuscript, we introduce and benchmark Mandalorion v4.1 for the identification and quantification of isoforms from full-length transcriptome sequencing reads. It further improves upon the already strong performance of Mandalorion v3.6 used in the LRGASP consortium challenge. By processing real and simulated data, we show three main features of Mandalorion: first, Mandalorion-based isoform identification has very high precision and maintains high recall even in the absence of any genome annotation. Second, isoform read counts as quantified by Mandalorion show a high correlation with simulated read counts. Third, isoforms identified by Mandalorion closely reflect the full-length transcriptome sequencing data sets they are based on.

Language: English

Cited by: 19

Single-cell and spatial transcriptomics: Bridging current technologies with long-read sequencing
Chengwei Ulrika Yuan, Fu Xiang Quah, Martin Hemberg

et al.

Molecular Aspects of Medicine, Year: 2024, Volume 96, pp. 101255 - 101255

Published: Feb. 17, 2024

Language: English

Cited by: 6

From words to complete phrases: insight into single-cell isoforms using short and long reads (Creative Commons)
Anoushka Joglekar, Careen Foord, Julien Jarroux

et al.

Transcription, Year: 2023, Volume 14(3-5), pp. 92 - 104

Published: June 14, 2023

The profiling of gene expression patterns to glean biological insights from single cells has become commonplace over the last few years. However, this approach overlooks the transcript contents that can differ between individual cells and cell populations. In this review, we describe early work in the field of single-cell short-read sequencing as well as full-length isoforms from single cells. We then describe recent work in single-cell long-read sequencing wherein some transcript elements have been observed in tandem. Based on earlier work in bulk tissue, we motivate the study of the combination of other RNA variables. Given that we are still blind to some aspects of isoform biology, we suggest possible future avenues such as CRISPR screens, which can further illuminate the function of RNA variables in distinct cell populations.

Language: English

Cited by: 15

Challenges in identifying mRNA transcript starts and ends from long-read sequencing data
Ezequiel Calvo-Roitberg, Rachel F. Daniels, Athma A. Pai

et al.

Genome Research, Year: 2024, Volume 34(11), pp. 1719 - 1734

Published: Nov. 1, 2024

Long-read sequencing (LRS) technologies have the potential to revolutionize scientific discoveries in RNA biology through the comprehensive identification and quantification of full-length mRNA isoforms. Despite great promise, challenges remain in the widespread implementation of LRS technologies for RNA-based applications, including concerns about low coverage, high sequencing error, and robust computational pipelines. Although much focus has been placed on defining mRNA exon composition and structure with LRS data, less careful characterization has been done of the ability to assess the terminal ends of isoforms, specifically, transcription start and end sites. Such characterization is crucial for completely delineating full mRNA molecules and their regulatory consequences. However, there are substantial inconsistencies in both the start and end coordinates of LRS reads spanning a gene, such that LRS reads often fail to accurately recapitulate annotated or empirically derived terminal ends of mRNA molecules. Here, we describe the specific challenges of identifying and quantifying mRNA terminal ends with LRS technologies and how these issues influence biological interpretations of LRS data. We then review recent experimental and computational advances designed to alleviate these problems, with ideal use cases for each approach. Finally, we outline anticipated developments and necessary improvements for the characterization of terminal ends from LRS data.

Language: English

Cited by: 4

IsoTools 2.0: software for comprehensive analysis of long-read transcriptome sequencing data (Creative Commons)
Yalan Bi, Tom Lukas Lankenau, Matthias Lienhard

et al.

Journal of Molecular Biology, Year: 2025, Volume unknown, pp. 169049 - 169049

Published: Feb. 1, 2025

Language: English

Cited by: 0

Discovery of Novel Protein-Coding and Long Non-coding Transcripts in Distinct Regions of the Human Brain (Creative Commons)
Kristina Santucci, Yuning Cheng, Si-Mei Xu

et al.

Journal of Molecular Neuroscience, Year: 2025, Volume 75(1)

Published: March 6, 2025

Recent improvements in the accuracy of long-read sequencing (LRS) technologies have expanded the scope for novel transcriptional isoform discovery. Additionally, these advancements have improved the precision of transcript quantification, enabling a more accurate reconstruction of complex splicing patterns and transcriptomes. Thus, this project aims to take advantage of these analytical developments for the discovery and analysis of novel RNA isoforms in the human brain. A set of novel isoforms was compiled using three bioinformatic tools, quantifying their expression across eight replicates of the cerebellar hemisphere, five of the frontal cortex, and six of the putamen. By taking the subset of isoforms consistent across all methods, 170 highly confident novel RNA isoforms were curated for downstream analysis. This set consisted of 104 messenger RNAs (mRNAs) and 66 long non-coding RNAs (lncRNAs). The detailed structure, expression, and potential encoded proteins of the novel mRNA isoform BambuTx321 have been further described as an exemplary representative. The tissue-specific expression [mean counts per million (CPM) of 5.979] of a novel lncRNA, BambuTx1299, in the cerebellar hemisphere was also observed. Overall, this project has identified and annotated several novel RNA isoforms across diverse tissues of the human brain, providing insights into their expression and a basis for investigating their functional roles. It has contributed to a more comprehensive understanding of the brain's transcriptomic landscape, with applications in basic research.

Language: English

Cited by: 0