Detecting differential transcript usage in complex diseases with SPIT (Creative Commons)
Beril Erdogdu, Ales Varabyou, Stephanie C. Hicks, et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2023, Volume: unknown

Published: July 10, 2023

Differential transcript usage (DTU) plays a crucial role in determining how gene expression differs among cells, tissues, and different developmental stages, thereby contributing to the complexity and diversity of biological systems. In abnormal cells, it can also lead to deficiencies in protein function, potentially leading to the pathogenesis of diseases. Detecting such events for single-gene genetic traits is relatively uncomplicated; however, the heterogeneity of populations with complex diseases presents an intricate challenge due to the presence of diverse causal events and undetermined subtypes. SPIT is the first statistical tool that quantifies the heterogeneity within a population and identifies predominant subgroups along with their distinctive sets of DTU events. We provide comprehensive assessments of SPIT's methodology in both single-gene and complex traits and report the results of applying SPIT to analyze brain samples from individuals with schizophrenia. Our analysis reveals previously unreported DTU events in six candidate genes.

Language: English

Developmental isoform diversity in the human neocortex informs neuropsychiatric risk mechanisms
Ashok Patowary, Pan Zhang, Connor Jops, et al.

Science, Year: 2024, Volume: 384(6698)

Published: May 23, 2024

RNA splicing is highly prevalent in the brain and has strong links to neuropsychiatric disorders; yet, the role of cell type-specific splicing and transcript-isoform diversity during human brain development has not been systematically investigated. In this work, we leveraged single-molecule long-read sequencing to deeply profile the full-length transcriptome of the germinal zone and cortical plate regions of the developing human neocortex at tissue and single-cell resolution. We identified 214,516 distinct isoforms, of which 72.6% were novel (not previously annotated in Gencode version 33), and uncovered a substantial contribution of transcript-isoform diversity, regulated by RNA binding proteins, in defining cellular identity in the developing neocortex. We leveraged this comprehensive isoform-centric gene annotation to reprioritize thousands of rare de novo risk variants and elucidate genetic risk mechanisms for neuropsychiatric disorders.

Language: English

Cited by: 23

CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure (Creative Commons)
Ales Varabyou, Markus J. Sommer, Beril Erdogdu, et al.

Genome Biology, Year: 2023, Volume: 24(1)

Published: Oct. 30, 2023

CHESS 3 represents an improved human gene catalog based on nearly 10,000 RNA-seq experiments across 54 body sites. It significantly improves current genome annotation by integrating the latest reference data and algorithms, machine learning techniques for noise filtering, and new protein structure prediction methods. CHESS 3 contains 41,356 genes, including 19,839 protein-coding genes and 158,377 transcripts, with 14,863 protein-coding transcripts not in other catalogs. It includes all MANE transcripts and at least one transcript for most RefSeq and GENCODE genes. On the CHM13 human genome, the CHESS 3 catalog contains an additional 129 protein-coding genes. CHESS 3 is available at http://ccb.jhu.edu/chess .

Language: English

Cited by: 23

The hidden impact of in-source fragmentation in metabolic and chemical mass spectrometry data interpretation
Martin Giera, Aries Aisporna, Winnie Uritboonthai, et al.

Nature Metabolism, Year: 2024, Volume: 6(9), pp. 1647-1648

Published: June 25, 2024

Language: English

Cited by: 15

Investigating open reading frames in known and novel transcripts using ORFanage
Ales Varabyou, Beril Erdogdu, Steven L. Salzberg, et al.

Nature Computational Science, Year: 2023, Volume: 3(8), pp. 700-708

Published: July 31, 2023

Language: English

Cited by: 11

Splam: a deep-learning-based splice site predictor that improves spliced alignments (Creative Commons)
Kuan-Hao Chao, Alan Mao, Steven L. Salzberg, et al.

Genome Biology, Year: 2024, Volume: 25(1)

Published: Sept. 16, 2024

The process of splicing messenger RNA to remove introns plays a central role in creating genes and gene variants. We describe Splam, a novel method for predicting splice junctions in DNA using deep residual convolutional neural networks. Unlike previous models, Splam looks at a 400-base-pair window flanking each splice site, reflecting the biological splicing process that relies primarily on signals within this window. Splam also trains on donor and acceptor pairs together, mirroring how the splicing machinery recognizes both ends of each intron. Compared to SpliceAI, Splam is consistently more accurate, achieving 96% accuracy in predicting human splice junctions.

Language: English

Cited by: 4

GENCODE: massively expanding the lncRNA catalog through capture long-read RNA sequencing (Creative Commons)
Gazaldeep Kaur, Tamara Perteghella, Sílvia Carbonell Sala, et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, Volume: unknown

Published: Oct. 31, 2024

Accurate and complete gene annotations are indispensable for understanding how genome sequences encode biological functions. For twenty years, the GENCODE consortium has developed reference annotations for the human and mouse genomes, becoming a foundation for biomedical and genomics communities worldwide. Nevertheless, collections of important yet poorly understood gene classes like long non-coding RNAs (lncRNAs) remain incomplete and scattered across multiple, uncoordinated catalogs, slowing down progress in the field. To address these issues, GENCODE has undertaken the most comprehensive lncRNA annotation effort to date. This is founded on the manual annotation of full-length targeted long-read sequencing, on matched embryonic and adult tissues, of orthologous regions in human and mouse. Altogether 17,931 novel human genes (140,268 novel transcripts) and 22,784 novel mouse genes (136,169 novel transcripts) have been added to the GENCODE catalog, representing a 2-fold and 6-fold increase in transcripts, respectively, the greatest increase since the sequencing of the human genome. Novel gene annotations display evolutionary constraints, have well-formed promoter regions, and link to phenotype-associated genetic variants. They greatly enhance the functional interpretability of the human genome, as they help explain millions of previously mapped “orphan” omics measurements corresponding to transcription start sites, chromatin modifications, and transcription factor binding sites. Crucially, our targeted design assigned human-mouse orthologs at a rate beyond previous studies, tripling the number of human disease-associated lncRNAs with mouse orthologs. The expanded and enhanced GENCODE lncRNA annotations mark a critical step towards deciphering the human and mouse genomes.

Language: English

Cited by: 4

PON-P3: Accurate Prediction of Pathogenicity of Amino Acid Substitutions (Open Access)
Muhammad Kabir, Saeed Ahmed, Haoyang Zhang, et al.

International Journal of Molecular Sciences, Year: 2025, Volume: 26(5), p. 2004

Published: Feb. 25, 2025

Different types of information are combined during variation interpretation. Computational predictors, most often pathogenicity predictors, provide one type of information for this purpose. These tools are based on various kinds of algorithms. Although the American College of Genetics and Association for Molecular Pathology guidelines classify variants into five categories, practically all pathogenicity predictors provide binary pathogenic/benign predictions. We developed a novel artificial intelligence-based tool, PON-P3, on the basis of a carefully selected training dataset, meticulous feature selection, and optimization. We started with 1526 features describing variations, their sequence and structural context, and parameters for the affected genes and proteins. The final random boosting method was tested and compared with a total of 23 predictors. PON-P3 performed better than recently introduced predictors, which utilize large language models, or methods that use evolutionary data alone or in combination with different gene and protein properties. PON-P3 classifies cases into three categories as benign, pathogenic, and variants of uncertain significance (VUSs). When binary test data were used, some metapredictors performed slightly better than PON-P3; however, in real-life situations, with patient data, those methods overpredict both pathogenic and benign cases. We predicted with PON-P3 all possible amino acid substitutions in all human proteins encoded from MANE transcripts. The method was also used to predict all unambiguous VUSs (i.e., without conflicts) in ClinVar. A total of 12.9% were predicted to be pathogenic and 49.9% benign.

Language: English

Cited by: 0

Conservation assessment of human splice site annotation based on a 470-genome alignment (Creative Commons)
Ilia Minkin, Steven L. Salzberg

Nucleic Acids Research, Year: 2025, Volume: 53(6)

Published: Feb. 25, 2025

Despite many improvements over the years, the annotation of the human genome remains imperfect. The use of evolutionarily conserved sequences provides a strategy for selecting a high-confidence subset of the annotation. Using the latest whole-genome alignment, we found that splice sites from protein-coding genes in the high-quality MANE annotation are consistently conserved across >350 species. We also studied splice sites from the RefSeq, GENCODE, and CHESS databases that are not present in MANE. In addition, we analyzed the completeness of the alignment with respect to the human annotations and described a method that would allow us to fix up to 60% of the missing alignments of the protein-coding exons. We trained a logistic regression classifier to distinguish between the conservation exhibited by splice sites from MANE versus sites chosen randomly from neutrally evolving sequences. We found that splice sites classified by our model as well-supported have lower single nucleotide polymorphism rates and better transcriptomic evidence. We then computed a subset of transcripts using only “well-supported” splice sites or ones from MANE. This subset is enriched in high-confidence transcripts of the major gene catalogs that appear to be under purifying selection and are more likely to be correct and functionally relevant.

Language: English

Cited by: 0

Facilitating genome annotation using ANNEXA and long-read RNA sequencing (Creative Commons)
N. Hoffmann, Aurore Besson, Édouard Cadieu, et al.

Published: April 20, 2025

With the advent of complete genome assemblies, genome annotation has become essential for the functional interpretation of genomic data. Long-read RNA sequencing (LR-RNAseq) technologies have significantly improved transcriptome annotation by enabling full-length transcript reconstruction for both coding and non-coding RNAs. However, challenges such as transcript fragmentation and incomplete isoform representation persist, highlighting the need for robust quality control (QC) strategies. This study presents an updated version of ANNEXA, a pipeline designed to enhance genome annotation using LR-RNAseq data while also providing QC for reconstructed genes and transcripts. ANNEXA integrates two transcriptome reconstruction tools, StringTie2 and Bambu, applying stringent filtering criteria to improve annotation accuracy. It also incorporates deep learning models to evaluate transcription start sites (TSSs) and employs the tool FEELnc for the systematic annotation of long non-coding RNAs (lncRNAs). Additionally, ANNEXA offers intuitive visualizations and comparative analyses of gene and transcript repertoires. Benchmarking against multiple reference annotations revealed distinct patterns of sensitivity and precision for known and novel genes and transcripts, and for mRNAs and lncRNAs. To demonstrate its utility, ANNEXA was applied in a comparative oncology study involving human and eight canine cancer cell lines. The pipeline successfully identified novel genes and transcripts across species, expanding the catalog of protein-coding and lncRNA annotations in both species. Implemented in Nextflow for scalability and reproducibility, ANNEXA is available as an open-source tool: https://github.com/IGDRion/ANNEXA .

Language: English

Cited by: 0

Detecting differential transcript usage in complex diseases with SPIT (Creative Commons)
Beril Erdogdu, Ales Varabyou, Stephanie C. Hicks, et al.

Cell Reports Methods, Year: 2024, Volume: 4(3), p. 100736

Published: March 1, 2024

Differential transcript usage (DTU) plays a crucial role in determining how gene expression differs among cells, tissues, and developmental stages, contributing to the complexity and diversity of biological systems. In abnormal cells, it can also lead to deficiencies in protein function and underpin disease pathogenesis. Analyzing DTU via RNA sequencing (RNA-seq) data is vital, but the genetic heterogeneity in populations with complex diseases presents an intricate challenge due to diverse causal events and undetermined subtypes. Although the majority of common diseases in humans are categorized as complex, state-of-the-art DTU analysis methods often overlook this heterogeneity in their models. We therefore developed SPIT, a statistical tool that identifies predominant subgroups within a population along with their distinctive sets of DTU events. This study provides comprehensive assessments of SPIT's methodology and applies it to analyze brain samples from individuals with schizophrenia, revealing previously unreported DTU events in six candidate genes.

Language: English

Cited by: 2