
Computers in Biology and Medicine, Year: 2024, Volume: 179, Article: 108920
Published: July 23, 2024
This study introduces RheumaLinguisticpack (RheumaLpack), the first specialised linguistic web corpus designed for the field of musculoskeletal disorders. By combining web mining (i.e., web scraping) and natural language processing (NLP) techniques, as well as clinical expertise, RheumaLpack systematically captures and curates structured and unstructured data across a spectrum of web sources, including clinical trials registers (i.e., ClinicalTrials.gov), bibliographic databases (i.e., PubMed), medical agencies (i.e., European Medicines Agency), social media (i.e., Reddit), and accredited health websites (i.e., MedlinePlus, Harvard Health Publishing, and Cleveland Clinic). Given the complexity of rheumatic and musculoskeletal diseases (RMDs) and their significant impact on quality of life, this resource is proposed as a useful tool for training algorithms that could mitigate the diseases' effects. The corpus therefore aims to improve the training of artificial intelligence (AI) algorithms and to facilitate knowledge discovery in RMDs. The development of RheumaLpack involved a systematic six-step methodology covering data identification, characterisation, selection, collection, processing, and corpus description. The result is a non-annotated, monolingual, and dynamic corpus, featuring almost 3 million records spanning from 2000 to 2023. RheumaLpack represents a pioneering contribution to rheumatology research, providing a resource for advanced AI and NLP applications. It highlights the value of web data in addressing the challenges posed by these diseases, illustrating the corpus's potential to improve research and treatment paradigms in rheumatology. Finally, the methodology shown can be replicated to obtain data for other medical specialities. The code and details on how to build the corpus are also provided to facilitate the dissemination of such resources.
Language: English