Conservation assessment of human splice site annotation based on a 470-genome alignment
Ilia Minkin, Steven L. Salzberg

Nucleic Acids Research, Year: 2025, Number: 53(6)

Published: Feb. 25, 2025

Despite many improvements over the years, the annotation of the human genome remains imperfect. The use of evolutionarily conserved sequences provides a strategy for selecting a high-confidence subset of the annotation. Using the latest whole-genome alignment, we found that splice sites from protein-coding genes in the high-quality MANE annotation are consistently conserved across >350 species. We also studied splice sites from the RefSeq, GENCODE, and CHESS databases that are not present in MANE. In addition, we analyzed the completeness of the alignment with respect to the human genome annotations and described a method that would allow us to fix up to 60% of the missing alignments of protein-coding exons. We trained a logistic regression classifier to distinguish between the conservation exhibited by splice sites from MANE versus sites chosen randomly from neutrally evolving sequences. We found that splice sites classified by our model as well-supported have lower single nucleotide polymorphism rates and better transcriptomic evidence. We then computed a subset of transcripts using only “well-supported” splice sites or ones from MANE. This subset is enriched in high-confidence transcripts of the major gene catalogs that appear to be under purifying selection and are more likely to be correct and functionally relevant.

Language: English

Universal Cell Embeddings: A Foundation Model for Cell Biology
Yanay Rosen, Yusuf Roohani, Ayush Agrawal, et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2023, Number: unknown

Published: Nov. 29, 2023

Developing a universal representation of cells which encompasses the tremendous molecular diversity of cell types within the human body and, more generally, across species would be transformative for cell biology. Recent work using single-cell transcriptomic approaches to create molecular definitions of cell types in the form of cell atlases has provided the necessary data for such an endeavor. Here, we present the Universal Cell Embedding (UCE) foundation model. UCE was trained on a corpus of cell atlas data from human and other species in a completely self-supervised way without any data annotations. UCE offers a unified biological latent space that can represent any cell, regardless of tissue or species. This universal cell embedding captures important biological variation despite the presence of experimental noise across diverse datasets. An important aspect of UCE's universality is that any new cell from any organism can be mapped to this embedding space with no additional data labeling, model training or fine-tuning. We applied UCE to create the Integrated Mega-scale Atlas, embedding 36 million cells, with more than 1,000 uniquely named cell types, from hundreds of experiments, dozens of tissues and eight species. We uncovered new insights about the organization of cell types and tissues within this universal cell embedding space, and leveraged it to infer the function of newly discovered cell types. UCE's embedding space exhibits emergent behavior, uncovering new biology that it was never explicitly trained for, such as identifying developmental lineages and embedding data from novel species not included in the training set. Overall, by enabling a universal representation for every cell state and type, UCE provides a valuable tool for analysis, annotation and hypothesis generation as the scale and diversity of single-cell datasets continues to grow.

Language: English

Cited by: 43

Small and long non-coding RNAs: Past, present, and future
Ling‐Ling Chen, V. Narry Kim

Cell, Year: 2024, Number: 187(23), pp. 6451-6485

Published: Nov. 1, 2024

Language: English

Cited by: 33

Insights into eye genetics and recent advances in ocular gene therapy
Viktória Szabó, Balázs Varsányi, Mirella Telles Salgueiro Barboni, et al.

Molecular and Cellular Probes, Year: 2025, Number: 79, p. 102008

Published: Jan. 18, 2025

Language: English

Cited by: 2

CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure
Ales Varabyou, Markus J. Sommer, Beril Erdogdu, et al.

Genome Biology, Year: 2023, Number: 24(1)

Published: Oct. 30, 2023

CHESS 3 represents an improved human gene catalog based on nearly 10,000 RNA-seq experiments across 54 body sites. It significantly improves current genome annotation by integrating the latest reference data and algorithms, machine learning techniques for noise filtering, and new protein structure prediction methods. CHESS 3 contains 41,356 genes, including 19,839 protein-coding genes and 158,377 transcripts, with 14,863 transcripts not in other catalogs. It includes all MANE transcripts and at least one transcript for most RefSeq and GENCODE genes. On the CHM13 human genome, the CHESS 3 catalog contains an additional 129 protein-coding genes. CHESS 3 is available at http://ccb.jhu.edu/chess.

Language: English

Cited by: 23

The hidden impact of in-source fragmentation in metabolic and chemical mass spectrometry data interpretation
Martin Giera, Aries Aisporna, Winnie Uritboonthai, et al.

Nature Metabolism, Year: 2024, Number: 6(9), pp. 1647-1648

Published: June 25, 2024

Language: English

Cited by: 15

Natural antisense transcripts as versatile regulators of gene expression
Andreas Werner, Aditi Kanhere, Claes Wahlestedt, et al.

Nature Reviews Genetics, Year: 2024, Number: unknown

Published: April 17, 2024

Language: English

Cited by: 13

Genome annotation: From human genetics to biodiversity genomics
Roderic Guigó

Cell Genomics, Year: 2023, Number: 3(8), p. 100375

Published: Aug. 1, 2023

Within the next decade, the genomes of 1.8 million eukaryotic species will be sequenced. Identifying the genes in these sequences is essential to understand the biology of the species. This is challenging due to the transcriptional complexity of eukaryotic genomes, which encode hundreds of thousands of transcripts of multiple types. Among these, a small set of protein-coding mRNAs play a disproportionately large role in defining phenotypes. Due to their sequence conservation, orthology can be established, making it possible to define the universal catalog of eukaryotic protein-coding genes. This catalog should substantially contribute to uncovering the genomic events underlying the emergence of eukaryotic phenotypes. This piece briefly reviews the basics of protein-coding gene prediction, discusses the challenges in finalizing the annotation of the human genome, and proposes strategies for producing annotations across the eukaryotic Tree of Life. This lays the groundwork for obtaining the catalog of all genes, the Earth's code of life.

Language: English

Cited by: 17

Are non‐protein coding RNAs junk or treasure?
Nils G. Walter

BioEssays, Year: 2024, Number: 46(4)

Published: Feb. 13, 2024

The human genome project's lasting legacies are the emerging insights into human physiology and disease, and the ascendance of biology as the dominant science of the 21st century. Sequencing revealed that >90% of the human genome is not coding for proteins, as originally thought, but rather is overwhelmingly transcribed into non‐protein coding, or non‐coding, RNAs (ncRNAs). This discovery initially led to the hypothesis that most genomic DNA is “junk”, a term still championed by some geneticists and evolutionary biologists. In contrast, molecular biologists and biochemists studying the vast number of transcripts produced from this “junk” often surmise that these ncRNAs have biological significance. What gives? This essay contrasts the two opposing, extant viewpoints, aiming to explain their bases, which arise from the distinct reference frames of the underlying scientific disciplines. Finally, it aims to reconcile these divergent mindsets in hopes of stimulating synergy between scientific fields.

Language: English

Cited by: 6

The pancancer overexpressed NFYC Antisense 1 controls cell cycle mitotic progression through in cis and in trans modes of action
Cecilia Pandini, Giulia Pagani, Martina Tassinari, et al.

Cell Death and Disease, Year: 2024, Number: 15(3)

Published: March 11, 2024

Antisense RNAs (asRNAs) represent an underappreciated yet crucial layer of gene expression regulation. Generally thought to modulate their sense genes in cis through sequence complementarity or their act of transcription, asRNAs can also regulate different molecular targets in trans, in the nucleus or in the cytoplasm. Here, we performed an in-depth molecular characterization of NFYC Antisense 1 (NFYC-AS1), the asRNA transcribed head-to-head to the NFYC subunit of the proliferation-associated NF-Y transcription factor. Our results show that NFYC-AS1 is a prevalently nuclear asRNA peaking early in the cell cycle. Comparative genomics suggests a narrow phylogenetic distribution, with a probable origin in the common ancestor of mammalian lineages. NFYC-AS1 is overexpressed pancancer, preferentially in association with RB1 mutations. Knockdown of NFYC-AS1 by antisense oligonucleotides impairs cell growth in lung squamous cell carcinoma and small cell lung cancer cells, a phenotype recapitulated by CRISPR/Cas9-deletion of its transcription start site. Surprisingly, expression of the sense gene is affected only when endogenous transcription of NFYC-AS1 is manipulated. This suggests that regulation of cell proliferation is at least in part independent of the in cis transcription-mediated effect on NFYC and is possibly exerted by RNA-dependent in trans effects converging on the regulation of G2/M cell cycle phase genes. Accordingly, NFYC-AS1-depleted cells are stuck in mitosis, indicating defects in mitotic progression. Overall, NFYC-AS1 emerged as a cell cycle-regulating asRNA with dual action, holding therapeutic potential in different cancer types, including the very aggressive RB1-mutated tumors.

Language: English

Cited by: 6

Functional roles of conserved lncRNAs and circRNAs in eukaryotes
Jingxin Li, Xiaolin Wang

Non-coding RNA Research, Year: 2024, Number: 9(4), pp. 1271-1279

Published: June 26, 2024

Long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs) have emerged as critical regulators in essentially all biological processes across eukaryotes. They exert their functions through chromatin remodeling, transcriptional regulation, interacting with RNA-binding proteins (RBPs), serving as microRNA sponges, etc. Although non-coding RNAs are typically more species-specific than coding RNAs, a number of well-characterized lncRNA (such

Language: English

Cited by: 6