Detecting differential transcript usage in complex diseases with SPIT
Beril Erdogdu, Ales Varabyou, Stephanie C. Hicks et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2023, Issue: unknown

Published: July 10, 2023

Differential transcript usage (DTU) plays a crucial role in determining how gene expression differs among cells, tissues, and developmental stages, thereby contributing to the complexity and diversity of biological systems. In abnormal cases, it can also lead to deficiencies in protein function, potentially contributing to the pathogenesis of diseases. Detecting such events for single-gene genetic traits is relatively uncomplicated; however, the heterogeneity of populations with complex diseases presents an intricate challenge due to the presence of diverse causal events and undetermined subtypes. SPIT is the first statistical tool that quantifies heterogeneity in DTU within a population and identifies the predominant subgroups along with their distinctive sets of DTU events. We provide comprehensive assessments of SPIT's methodology and report the results of applying it to analyze brain samples from individuals with schizophrenia. Our analysis reveals previously unreported DTU events in six candidate genes.

Language: English

Splam: a deep-learning-based splice site predictor that improves spliced alignments
Kuan-Hao Chao, Alan Mao, Steven L. Salzberg et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2023, Issue: unknown

Published: July 29, 2023

The process of splicing messenger RNA to remove introns plays a central role in creating genes and gene variants. Here we describe Splam, a novel method for predicting splice junctions in DNA based on deep residual convolutional neural networks. Unlike some previous models, Splam looks at a relatively limited window of 400 base pairs flanking each splice site, motivated by the observation that the biological splicing machinery relies primarily on signals within this window. Additionally, Splam introduces the idea of training the network on donor and acceptor sites together, based on the principle that the splicing machinery recognizes both ends of each intron at once. We compare Splam's accuracy to recent state-of-the-art splice site prediction methods, particularly SpliceAI, another method that uses deep neural networks. Our results show that Splam is consistently more accurate than SpliceAI, with an overall accuracy of 96% on human splice junctions. Splam generalizes even to non-human species, including distant ones like the flowering plant…

Language: English

Cited by: 4

Biosurfer for systematic tracking of regulatory mechanisms leading to protein isoform diversity

Mayank Murali, Jamie Saquing, Senbao Lu et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, Issue: unknown

Published: March 18, 2024

Long-read RNA sequencing has shed light on transcriptomic complexity, but questions remain about the functionality of downstream protein products. We introduce Biosurfer, a computational approach for comparing protein isoforms while systematically tracking the transcriptional, splicing, and translational variations that underlie differences in their sequences. Using Biosurfer, we analyzed 32,799 pairs of GENCODE-annotated protein isoforms, finding that the majority (70%) of variable N-termini are due to alternative transcription start sites, while only 9% arise from 5′ UTR alternative splicing. Biosurfer's detailed tracking of nucleotide-to-residue relationships helped reveal an uncommonly tracked source of single amino acid residue changes arising from codon splits at splice junctions. For 17% of internal sequence changes, such split-codon patterns lead to single-residue differences, termed "ragged codons". Of the variable C-termini, 72% involve splice- or intron retention-induced reading frameshifts. We found an unusual frameshift pattern in which the first frameshift is closely followed by a distinct second frameshift that restores the original frame, which we term a "snapback" frameshift. A long-read RNA-seq-predicted proteome of a human cell line showed trends similar to those of our analysis, with the exception of a higher proportion of isoforms predicted to undergo nonsense-mediated decay. Comprehensive characterization of long-read RNA-seq datasets with Biosurfer should accelerate insights into the functional role of protein isoforms, providing a mechanistic explanation of the origins of proteomic diversity. Biosurfer is available as a Python package at https://github.com/sheynkman-lab/biosurfer.
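The nucleotide-to-residue bookkeeping described above can be illustrated with a minimal sketch, assuming each isoform is reduced to a list of per-exon coding lengths (for split codons) or to a list of successive coding-length changes relative to a reference isoform (for frameshifts). All function names are hypothetical and are not part of the Biosurfer package API; the snapback check also ignores how close together the two shifts are.

```python
from itertools import accumulate


def junction_phases(exon_cds_lengths):
    """Reading-frame phase (0, 1 or 2) at each exon-exon junction of a CDS.

    The phase is the number of nucleotides of the current codon already read
    when the junction is reached; a nonzero phase means a codon is split
    across the junction.
    """
    # Cumulative coding length at the end of every exon except the last one.
    return [total % 3 for total in accumulate(exon_cds_lengths[:-1])]


def split_codon_junctions(exon_cds_lengths):
    """0-based indices of the junctions that split a codon."""
    phases = junction_phases(exon_cds_lengths)
    return [i for i, phase in enumerate(phases) if phase != 0]


def frame_trajectory(length_changes):
    """Frame offset (mod 3) relative to a reference isoform after each
    successive coding-length change (insertions > 0, deletions < 0)."""
    return [total % 3 for total in accumulate(length_changes)]


def is_snapback(length_changes):
    """True if the frame leaves the reference frame and a later change
    restores it (hypothetical simplification of a 'snapback' frameshift)."""
    left_frame = False
    for offset in frame_trajectory(length_changes):
        if offset != 0:
            left_frame = True
        elif left_frame:
            return True
    return False


print(junction_phases([10, 8, 9]))        # [1, 0]
print(split_codon_junctions([10, 8, 9]))  # [0]
print(is_snapback([4, -1]))               # True
print(is_snapback([3]))                   # False
```

Working in coding-length space keeps the sketch independent of genomic coordinates; a real tool would also need to translate both isoforms to see which residues the split codons actually change.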

Language: English

Cited by: 1

Enhancing transcriptome expression quantification through accurate assignment of long RNA sequencing reads with TranSigner
Hyun Joo Ji, Mihaela Pertea

bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, Issue: unknown

Published: April 16, 2024

Recently developed long-read RNA sequencing technologies promise to provide a more accurate and comprehensive view of transcriptomes compared to short-read sequencers, primarily due to their ability to sequence full-length transcripts. However, realizing this potential requires computational tools tailored to process long reads, which exhibit a higher error rate than short reads. Existing methods for assembling and quantifying long-read data often disagree on which transcripts are expressed and on their abundance levels, leaving researchers without confidence in the transcriptomes produced from long-read data. One approach to address these uncertainties in transcriptome assembly and quantification is to assign reads to transcripts, enabling detailed characterization of transcript support at the read level. Here, we introduce TranSigner, a versatile tool that assigns long reads to any input transcriptome. TranSigner consists of three consecutive modules performing: alignment of reads to the given transcripts, computation of compatibility scores based on alignment positions, and execution of an expectation-maximization algorithm that probabilistically assigns read fractions to transcripts while estimating transcript abundances. Using simulated and experimental datasets from well-studied organisms (Homo sapiens, Arabidopsis thaliana, and Mus musculus), we demonstrate the accuracy TranSigner achieves in transcript abundance estimation and read assignment compared with existing tools.
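The expectation-maximization step can be sketched as follows. This is a generic EM for fractional read assignment, not TranSigner's implementation: it assumes each read comes with a hypothetical dictionary of positive compatibility scores for the transcripts it aligns to, and it omits any length normalization or score details the tool may use. In the E-step each read is split among its compatible transcripts in proportion to abundance times score; in the M-step each abundance becomes that transcript's expected share of all reads.

```python
def em_assign(compat, max_iter=1000, tol=1e-10):
    """Generic EM for fractional read-to-transcript assignment.

    compat: one dict per read, mapping transcript id -> positive
            compatibility score (hypothetical input format).
    Returns (abundances, assignments): abundances sum to 1, and
    assignments[i][t] is the fraction of read i given to transcript t
    in the last E-step.
    """
    transcripts = sorted({t for read in compat for t in read})
    theta = {t: 1.0 / len(transcripts) for t in transcripts}  # uniform start
    n_reads = len(compat)
    for _ in range(max_iter):
        # E-step: split each read in proportion to abundance x score.
        assignments = []
        for read in compat:
            weights = {t: theta[t] * score for t, score in read.items()}
            total = sum(weights.values())
            assignments.append({t: w / total for t, w in weights.items()})
        # M-step: new abundance = expected share of all reads.
        new_theta = dict.fromkeys(transcripts, 0.0)
        for fractions in assignments:
            for t, frac in fractions.items():
                new_theta[t] += frac / n_reads
        delta = max(abs(new_theta[t] - theta[t]) for t in transcripts)
        theta = new_theta
        if delta < tol:
            break
    return theta, assignments


# Toy example: two reads unique to A, one unique to B, one shared by both.
reads = [{"A": 1.0}, {"A": 1.0, "B": 1.0}, {"B": 1.0}, {"A": 1.0}]
theta, assignments = em_assign(reads)
print(round(theta["A"], 3), round(theta["B"], 3))  # 0.667 0.333
```

The shared read ends up split roughly 2:1 toward A because A has more unique support; this is the sense in which EM "probabilistically assigns read fractions" while estimating abundances.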

Language: English

Cited by: 1

Combining DNA and protein alignments to improve genome annotation with LiftOn
Kuan-Hao Chao, Jakob Heinz, Celine Hoh et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, Issue: unknown

Published: May 17, 2024

As the number and variety of assembled genomes continues to grow, genome annotation is falling behind, particularly for eukaryotes. DNA-based mapping tools help to address this challenge, but they are only able to transfer annotation between closely related species. Here we introduce LiftOn, a homology-based software tool that integrates DNA and protein alignments to enhance the accuracy of genome-scale annotation and to allow mapping between relatively distant species. LiftOn's protein-centric algorithm considers both types of alignments, chooses optimal open reading frames, resolves overlapping gene loci, and finds additional gene copies where they exist. LiftOn can reliably transfer annotation between genomes representing members of the same species, as we demonstrate on human, mouse, honey bee, rice, and Arabidopsis thaliana. It can further map annotation effectively across species pairs as far apart as mouse and rat, or Drosophila melanogaster and D. erecta.

Language: English

Cited by: 1

Transcriptomic Insights into the Atrial Fibrillation Susceptibility Locus near the MYOZ1 and SYNPO2L Genes
Sojin Y. Wass, Han Sun, Gregory Tchou et al.

International Journal of Molecular Sciences, Year: 2024, Issue: 25(19), Pages: 10309

Published: Sep. 25, 2024

Genome-wide association studies have identified a locus on chromosome 10q22 where many co-inherited single nucleotide polymorphisms (SNPs) are associated with atrial fibrillation (AF). This study seeks to identify the impact of this locus on gene expression at the transcript isoform level in the human left atria and to gain insight into potential causal variants. Bulk RNA sequencing was analyzed for myozenin 1 (MYOZ1) and synaptopodin 2-like (SYNPO2L) isoforms, and for the association of common SNPs in the region with isoform levels. Chromatin marks were used to suggest candidate regulatory SNPs in the region. Protein amino acid changes were examined for predicted functional consequences. Transfection of MYOZ1 and two SYNPO2L isoforms was performed to localize their encoded proteins in cardiomyocytes derived from stem cells. We identified one MYOZ1 and four SYNPO2L isoforms, two of which encode proteins, while the other two are long noncoding RNAs (lncRNAs). The risk allele of the strongest AF susceptibility SNP at 10q22 is associated with decreased MYOZ1 and increased SYNPO2L lncRNA isoforms. There are many top AF-associated SNPs due to linkage disequilibrium (LD), including rs11000728, which we propose as a candidate SNP, confirmed by reporter transfection. In addition, the LD block includes three missense SNPs in the SYNPO2L gene, and the minor protective haplotype is predicted to be detrimental to protein function. The MYOZ1 and SYNPO2L proteins both localized to the sarcomere. The locus is complex, with several SNPs that alter isoform levels with opposing effects on expression, along…

Language: English

Cited by: 1

Upstream open reading frames may contain hundreds of novel human exons
Hyun Joo Ji, Steven L. Salzberg

PLoS Computational Biology, Year: 2024, Issue: 20(11), Pages: e1012543

Published: Nov. 20, 2024

Several recent studies have presented evidence that the human gene catalogue should be expanded to include thousands of short open reading frames (ORFs) appearing upstream or downstream of existing protein-coding genes, each of which might create an additional bicistronic transcript in humans. Here we explore an alternative hypothesis that would explain the translational and evolutionary evidence for these ORFs without the need for novel genes or bicistronic transcripts. We examined 2,199 ORFs that have been proposed as high-quality candidates to determine if they could instead represent novel exons that can be added to existing genes. We checked the conservation of these ORFs in four recently sequenced, high-quality human genomes, and found that a large majority (87.8%) are conserved in all four genomes, as expected. We then looked for evidence of splicing that would connect each ORF to the protein-coding gene at the same locus, thus creating a novel transcript variant that uses the ORF as its first exon. These new protein-coding exons were further evaluated using protein structure predictions for the sequences that included the new exons. We determined that 541 of the 2,199 ORFs have strong evidence that they form novel exons that are part of an existing gene, and that the resulting protein is predicted to have similar or better structural quality than the currently annotated isoform.

Language: English

Cited by: 1

There will always be variants of uncertain significance. Analysis of VUSs
Haoyang Zhang, Muhammad Kabir, Saeed Ahmed et al.

NAR Genomics and Bioinformatics, Year: 2024, Issue: 6(4)

Published: Sep. 28, 2024

The ACMG/AMP guidelines include five categories of variants, of which variants of uncertain significance (VUSs) have received increasing attention. Recently, Fowler and Rehm claimed that all or most VUSs could be reclassified as pathogenic or benign within a few years. To test this claim, we collected validated benign, pathogenic, VUS, and conflicting variants from ClinVar and LOVD and investigated their differences at the gene, protein, structure, and variant levels. The gene and protein features included inheritance patterns, actionability, functional features for housekeeping, essential, complete knockout, lethality, and haploinsufficient proteins, Gene Ontology annotations, and network properties. Structural properties included the location in secondary structural elements, intrinsically disordered regions, transmembrane regions, and repeats, as well as conservation and accessibility. Variant features were the distributions of nucleotides, their groupings, codons, and their relation to CpG islands. Amino acids and their groups were also investigated. VUSs did not markedly differ from other variants. The only major differences were in the accessibility and conservation of variants, and a reduced ratio of repeat-located variants in VUSs. Thus, VUSs cannot be distinguished from other variant types. They display one form of natural biological heterogeneity. Instead of concentrating on eradicating VUSs, the community would benefit from investigating and understanding the factors that contribute to phenotypic…

Language: English

Cited by: 1

Human introns contain conserved tissue-specific cryptic poison exons

Sergey Margasyuk, Antonina Kuznetsova, Lev Zavileyskiy et al.

NAR Genomics and Bioinformatics, Year: 2024, Issue: 6(4)

Published: Sep. 28, 2024

Eukaryotic cells express a large number of transcripts from a single gene due to alternative splicing. Despite hundreds of thousands of splice isoforms being annotated in databases, it has been reported that the current exon catalogs remain incomplete. At the same time, introns of human protein-coding (PC) genes contain evolutionarily conserved elements with unknown function. Here, we explore the possibility that some of them represent cryptic exons that are expressed in rare conditions. We identified a group of intronic elements that are similar to exons in terms of evolutionary conservation and RNA-seq read coverage in the Genotype-Tissue Expression dataset. Most of them were poison exons, i.e. their inclusion generated a nonsense-mediated decay (NMD) isoform, and many showed signs of tissue-specific and cancer-specific expression regulation. We performed RNA-seq on the A549 cell line treated with cycloheximide to inactivate NMD, and confirmed using quantitative polymerase chain reaction that seven of the eight tested exons are, indeed, expressed. This study shows that introns of PC genes contain cryptic poison exons, which reside in intronic regions and are not fully annotated due to the insufficient representation of rare conditions in RNA-seq libraries.

Language: English

Cited by: 1

Investigating Open Reading Frames in Known and Novel Transcripts using ORFanage
Ales Varabyou, Beril Erdogdu, Steven L. Salzberg et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2023, Issue: unknown

Published: March 25, 2023

ORFanage is a system designed to assign open reading frames (ORFs) to both known and novel gene transcripts while maximizing similarity to annotated proteins. The primary intended use of ORFanage is the identification of ORFs in the assembled results of RNA sequencing (RNA-seq) experiments, a capability that most transcriptome assembly methods do not have. Our experiments demonstrate how ORFanage can be used to find novel protein variants in RNA-seq datasets, and to improve the annotations of tens of thousands of transcript models in the RefSeq and GENCODE human annotation databases. Through its implementation of a highly accurate and efficient pseudo-alignment algorithm, ORFanage is substantially faster than other ORF annotation methods, enabling its application to very large datasets. When used to analyze transcriptome assemblies, ORFanage can aid in separating signal from transcriptional noise and in identifying likely functional transcript variants, ultimately advancing our understanding of biology and medicine.
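The difference between picking the longest ORF and picking the ORF most similar to an annotated protein can be shown with a toy sketch. This is NOT ORFanage's pseudo-alignment algorithm; it is a hypothetical illustration that assumes the reference CDS has already been projected into the transcript's coordinates as a single half-open interval, and it scores candidates by in-frame overlap with that interval before falling back to length. All names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq):
    """All ATG-to-stop ORFs on the forward strand, as half-open (start, end)
    transcript coordinates. For each stop codon only the first upstream
    in-frame ATG is used (the longest ORF ending at that stop); ORFs that
    run off the end without a stop codon are ignored."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, i + 3))
                start = None
    return orfs


def in_frame_overlap(orf, ref_cds):
    """Bases shared by two intervals, counted only if they use the same frame."""
    if orf[0] % 3 != ref_cds[0] % 3:
        return 0
    return max(0, min(orf[1], ref_cds[1]) - max(orf[0], ref_cds[0]))


def pick_orf(seq, ref_cds):
    """Prefer the ORF that best matches the reference CDS, then the longest."""
    candidates = find_orfs(seq)
    if not candidates:
        return None
    return max(candidates, key=lambda o: (in_frame_overlap(o, ref_cds), o[1] - o[0]))


tx = "CCATGAAATAAGATGCCCGGGTTTTAG"
print(find_orfs(tx))          # [(12, 27), (2, 11)]
print(pick_orf(tx, (2, 11)))  # (2, 11): shorter, but matches the reference
```

A length-only rule would choose (12, 27) here; requiring agreement with the reference frame is what lets a reference-guided method keep a shorter but annotation-consistent ORF.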

Language: English

Cited by: 2

Quality assessment of splice site annotation based on conservation across multiple species
Ilia Minkin, Steven L. Salzberg

bioRxiv (Cold Spring Harbor Laboratory), Year: 2023, Issue: unknown

Published: Dec. 2, 2023

Despite many improvements over the years, the annotation of the human genome remains imperfect, and different annotations of the human reference genome sometimes contradict one another. The use of evolutionarily conserved sequences provides a strategy for selecting a high-confidence subset of the annotation that is more likely to be related to biological functions, and the rapidly growing number of genomes from other species increases its power. Using the latest whole-genome alignment, we found that splice sites of protein-coding genes in the high-quality MANE annotation are consistently conserved across more than 400 species. We also studied splice sites in the RefSeq, GENCODE, and CHESS databases that are not present in MANE. We trained a logistic regression classifier to distinguish between the conservation exhibited by MANE splice sites and that of sites chosen randomly from neutrally evolving sequence. Splice sites classified by our model as conserved have lower SNP rates and better transcriptomic support. We then computed the subsets of transcripts that use only either "conserved" or "non-conserved" splice sites. The "conserved" subset is enriched in major gene catalogs, and its members appear to be under purifying selection and more likely to be correct and functionally relevant.
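The classification step can be sketched with a plain logistic regression trained by batch gradient descent. The abstract does not specify the features or training procedure, so this is a hypothetical sketch: each splice site is assumed to be summarized by two conservation features (the fraction of aligned species in which the site region aligns, and the fraction in which its splice dinucleotide is conserved), and the feature values below are invented toy numbers, not data from the study.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability that site features x look like an annotated (label 1) site."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # dLoss/dz for the log-loss
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical features: [fraction of species aligned, fraction conserving the dinucleotide]
X = [[0.90, 0.80], [0.95, 0.90], [0.85, 0.70], [0.80, 0.85],  # label 1: annotated-like sites
     [0.10, 0.30], [0.20, 0.10], [0.30, 0.20], [0.15, 0.25]]  # label 0: neutral background
y = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = train_logreg(X, y)
print(predict_proba(w, b, [0.90, 0.85]) > 0.5)  # True
```

Once trained on annotated versus background sites, the same `predict_proba` call can be applied to sites found only in other catalogs, which mirrors how the study labels non-MANE sites as "conserved" or not.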

Language: English

Cited by: 2