PhyloSophos: a high-throughput scientific name mapping algorithm augmented with explicit consideration of taxonomic science, and its application on natural product (NP) occurrence database processing
Minhyung Cho, Kwang‐Hwi Cho, Kyoung Tai No, et al.

BMC Bioinformatics, Journal Year: 2023, Volume and Issue: 24(1)

Published: Dec. 14, 2023

Abstract

Background: The standardization of biological data using unique identifiers is vital for seamless integration, comprehensive interpretation, and reproducibility of research findings, contributing to advancements in bioinformatics and systems biology. Despite being widely accepted as a universal identifier, scientific names of species have inherent limitations, including a lack of stability, uniqueness, and convertibility. These limitations hinder their effective use in databases, particularly natural product (NP) occurrence databases, and pose a substantial obstacle to utilizing this valuable data in large-scale applications.

Result: To address these challenges and facilitate high-throughput analysis of data involving scientific names, we developed PhyloSophos, a Python package that considers the properties of taxonomic systems to accurately map scientific name inputs to entries within a chosen reference database. Using NP occurrence databases as an example, we illustrate the importance of assessing multiple databases and of syntax-based pre-processing, with the ultimate goal of integrating heterogeneous information into a single, unified dataset.

Conclusions: We anticipate that PhyloSophos will significantly aid the systematic processing of poorly digitized and curated data, such as biodiversity and ethnopharmacological resources, enabling full-scale use of these resources.

Language: English
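
The abstract mentions syntax-based pre-processing of name strings before lookup but does not spell out the rules. As a rough illustration only, the sketch below normalizes a free-text name (case, underscores, infraspecific rank markers, trailing authorship). The function `preprocess_name` and the `RANK_MARKERS` table are hypothetical and are not taken from the PhyloSophos code base; the authorship heuristic (stop at the first capitalized or abbreviated token after the genus) is an assumption that will misfire on some inputs.

```python
import re

# Hypothetical table mapping lower-cased rank abbreviations to a canonical form.
RANK_MARKERS = {"var": "var.", "subsp": "subsp.", "ssp": "subsp.", "f": "f."}


def preprocess_name(raw: str) -> str:
    """Normalize a free-text name to 'Genus epithet [rank infraspecific-epithet]'."""
    # Remove parenthesized parts such as basionym authors, e.g. "(Willd.)".
    text = re.sub(r"\([^)]*\)", " ", raw)
    # All-caps input carries no case information, so lower-case it first.
    if text.isupper():
        text = text.lower()
    # Treat underscores as spaces; split() also collapses repeated whitespace.
    tokens = text.replace("_", " ").split()
    if not tokens:
        return ""
    out = [tokens[0].capitalize()]  # genus: initial capital only
    for tok in tokens[1:]:
        key = tok.lower().rstrip(".")
        if key in RANK_MARKERS:
            out.append(RANK_MARKERS[key])
        elif tok[0].isupper() or tok.endswith("."):
            # Heuristic (assumption): capitalized or abbreviated tokens after
            # the genus start the authorship string, which is discarded.
            break
        else:
            out.append(tok.lower())
    # Drop a rank marker left dangling without an epithet.
    if out[-1] in RANK_MARKERS.values():
        out.pop()
    return " ".join(out)


print(preprocess_name("Pueraria lobata (Willd.) Ohwi"))  # Pueraria lobata
```

Normalizing before lookup means that spelling variants such as "PANAX GINSENG" and "panax  ginseng C.A.Mey." collapse to the same key, so a single exact-match table can serve many raw inputs.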

PhyloSophos: a high-throughput scientific name mapping algorithm augmented with explicit consideration of taxonomic science
Minhyung Cho, Kwang‐Hwi Cho, Kyoung Tai No, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: March 20, 2023

Abstract

Summary: The nature of taxonomic science and the scientific nomenclature system makes it difficult to use scientific names as identifiers without running into complications. To facilitate high-throughput analysis of biological data involving scientific names, we designed PhyloSophos, a Python package that takes into account the properties of taxonomic systems to map scientific name inputs to entries within a reference database of choice. We present three case studies that demonstrate how our implementations, including rule-based pre-processing and recursive mapping, could improve performance and information availability. We expect PhyloSophos to help with the systematic processing of poorly digitized and curated data, such as biodiversity and ethnopharmacological resources, thus enabling full-scale bioinformatics analysis using these data.

Availability and implementation: PhyloSophos is available on GitHub at https://github.com/mhcho4096/phylosophos.

Supplementary information: Supplementary data are available at Bioinformatics online.

Language: English
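
Recursive mapping is named in this abstract but not described. One possible reading, assumed here purely for illustration, is that a name is resolved by following synonym links in the reference until an accepted name is reached, with a fallback that retries progressively coarser forms of the name. The toy dictionary and the functions `resolve` and `map_name` are hypothetical and do not reflect the PhyloSophos API or its actual algorithm.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical toy reference: each name maps to the name it is a synonym of,
# or to None when the name is itself an accepted entry.
Reference = Dict[str, Optional[str]]


def resolve(name: str, ref: Reference, seen: FrozenSet[str] = frozenset()) -> Optional[str]:
    """Recursively follow synonym links until an accepted name is reached.

    Returns None if the name is missing or the synonym chain loops.
    """
    if name not in ref or name in seen:
        return None
    target = ref[name]
    if target is None:  # accepted name: stop here
        return name
    return resolve(target, ref, seen | {name})


def map_name(name: str, ref: Reference) -> Optional[str]:
    """Try the full name, then progressively shorter prefixes (e.g. genus only)."""
    tokens = name.split()
    while tokens:
        hit = resolve(" ".join(tokens), ref)
        if hit is not None:
            return hit
        tokens = tokens[:-1]  # drop the last token and retry
    return None


# Toy entries for illustration only.
toy_ref: Reference = {
    "Pueraria lobata": "Pueraria montana var. lobata",
    "Pueraria montana var. lobata": None,
    "Panax": None,
}
print(map_name("Pueraria lobata", toy_ref))  # Pueraria montana var. lobata
print(map_name("Panax ginseng", toy_ref))    # Panax
```

Returning a genus-level hit for an unmatched species trades precision for coverage; a real pipeline would likely flag such matches rather than treat them as exact, and the `seen` set guards against synonym cycles in messy reference data.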
