Journal of Polytechnic, Year: 2025, Issue: unknown, pp. 1-1
Published: May 22, 2025
In recent years, advancements in high-throughput technologies have uncovered numerous concealed layers known as non-coding ribonucleic acids (ncRNAs), shifting the protein-centric view of genomes. NcRNAs, previously considered insignificant segments of the genome, are now recognized as essential functional components of prokaryotic and eukaryotic organisms. Long non-coding RNAs (lncRNAs) are a unique category of ncRNAs with more than 200 nucleotides in length, which are instrumental in key biological functions, including cellular differentiation, regulatory mechanisms, and epigenetic modifications. Despite similarities between lncRNAs and messenger RNAs (mRNAs), there is a fundamental difference: mRNAs encode proteins, whereas lncRNAs do not. This study aims to distinguish these two RNA classes from each other by designing a robust machine learning (ML) pipeline that employs Recursive Feature Elimination (RFE) for dimensionality reduction of the dataset and XGBoost (XGB) as the classification model. Whereas previous studies trained and tested their models using the complete set of features, we employ the RFE technique to reduce the number of features and thereby obtain a more optimal set of relevant features. To evaluate the predictive performance of our pipeline, we used error rate, accuracy, precision, recall, and F1-score. Compared with three existing lncRNA identification tools in the literature, our pipeline demonstrated superior prediction accuracy and precision, at 93.42% and 94.19%, respectively.
Language: English
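
The abstract names five evaluation metrics (error rate, accuracy, precision, recall, and F1-score) without giving their definitions. The sketch below shows how they are conventionally computed for a binary classifier from paired true and predicted labels. It is an illustration only, not the authors' evaluation code: the function name `binary_metrics`, the `BinaryMetrics` container, the label convention (1 = lncRNA, 0 = mRNA), and the toy labels are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    """Hypothetical container for the five metrics named in the abstract."""
    accuracy: float
    error_rate: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute standard binary-classification metrics.

    Assumption: `positive` marks the lncRNA class (1) and any other
    label is treated as the negative (mRNA) class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return BinaryMetrics(accuracy, 1.0 - accuracy, precision, recall, f1)


# Toy labels (made up for illustration, not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
result = binary_metrics(y_true, y_pred)
print(result)
```

With these toy labels there are three true positives, three true negatives, one false positive, and one false negative, so accuracy, precision, recall, and F1 all come out to 0.75 and the error rate to 0.25. Error rate is simply the complement of accuracy, so reporting both is redundant but common.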