New opportunities and challenges for conservation evidence synthesis from advances in natural language processing
Charlotte H. Chang, Susan C. Cook‐Patton, James T. Erbaugh

et al.

Conservation Biology, Journal Year: 2025, Volume and Issue: 39(2)

Published: April 1, 2025

Addressing global environmental conservation problems requires rapidly translating natural and social science evidence into policy-relevant information. Yet exponential increases in scientific production, combined with disciplinary differences in how research is reported, make interdisciplinary syntheses especially challenging. Ongoing developments in natural language processing (NLP), such as large language models, machine learning (ML), and data mining, hold the promise of accelerating cross-disciplinary synthesis of primary research. The evolution of ML, NLP, and artificial intelligence (AI) systems and computational infrastructure provides new approaches to accelerate all stages of synthesis science. To show how language processing and AI can help automate and scale synthesis science, we describe methods for querying the literature, processing unstructured bodies of textual evidence, and extracting parameters of interest from studies. Automation can also serve translation and other agendas by categorizing and labeling evidence at scale, yet there are major unanswered questions about how to use hybrid AI-expert systems ethically and effectively in conservation.
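The labeling-at-scale step that this abstract describes can be illustrated with a short sketch. This is a minimal, hypothetical example assuming the OpenAI Python SDK and an API key in the environment; the model name and label set are illustrative choices, not the authors' pipeline.

```python
# Minimal sketch: labeling abstracts at scale with an LLM for evidence
# synthesis. Assumes the OpenAI Python SDK (pip install openai) and an
# API key in OPENAI_API_KEY; model and labels are illustrative only.
from openai import OpenAI

client = OpenAI()

LABELS = ["reforestation", "protected areas", "species management", "other"]

def label_abstract(abstract: str) -> str:
    """Ask the model to assign one conservation-intervention label."""
    prompt = (
        "Classify the conservation intervention studied in this abstract. "
        f"Answer with exactly one of: {', '.join(LABELS)}.\n\n{abstract}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # near-deterministic labels for synthesis work
    )
    return response.choices[0].message.content.strip().lower()

abstracts = ["Planting native trees increased canopy cover by 40%..."]
print([label_abstract(a) for a in abstracts])
```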

Language: English

GPT is an effective tool for multilingual psychological text analysis
Steve Rathje, Dan-Mircea Mirea, Ilia Sucholutsky

et al.

Proceedings of the National Academy of Sciences, Journal Year: 2024, Volume and Issue: 121(34)

Published: Aug. 12, 2024

The social and behavioral sciences have been increasingly using automated text analysis to measure psychological constructs in text. We explore whether GPT, the large-language model (LLM) underlying the AI chatbot ChatGPT, can be used as a tool for automated psychological text analysis in several languages. Across 15 datasets (n = 47,925 manually annotated tweets and news headlines), we tested whether different versions of GPT (3.5 Turbo, 4, and 4 Turbo) can accurately detect psychological constructs (sentiment, discrete emotions, offensiveness, and moral foundations) across 12 languages. We found that GPT (r = 0.59 to 0.77) performed much better than English-language dictionary analysis (r = 0.20 to 0.30) at detecting psychological constructs as judged by manual annotators. GPT performed nearly as well as, and sometimes better than, several top-performing fine-tuned machine learning models. Moreover, GPT's performance improved with successive model versions, particularly for lesser-spoken languages, and became less expensive. Overall, GPT may be superior to many existing methods of automated text analysis, since it achieves relatively high accuracy across many languages, requires no training data, and is easy to use with simple prompts (e.g., "is this text negative?") and little coding experience. We provide sample code and a video tutorial for analyzing text with the GPT application programming interface. We argue that GPT and other LLMs could help democratize automated text analysis by making advanced natural language processing capabilities more accessible, and may help facilitate more cross-linguistic research with understudied languages.
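The simple-prompt approach the abstract describes can be sketched as follows, assuming the OpenAI Python SDK; the model name and example text are illustrative assumptions, and the paper's own sample code may differ.

```python
# Minimal sketch of zero-shot sentiment judgment with a simple prompt,
# assuming the OpenAI Python SDK; model and example are illustrative.
from openai import OpenAI

client = OpenAI()

def rate_negativity(text: str) -> str:
    """Ask GPT the paper's style of simple question about one text."""
    response = client.chat.completions.create(
        model="gpt-4-turbo",  # illustrative; the paper tests GPT-3.5/4/4 Turbo
        messages=[{
            "role": "user",
            "content": f'Is this text negative? Answer "yes" or "no".\n\n{text}',
        }],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower()

# Works across languages without training data (hypothetical example).
print(rate_negativity("Das war ein furchtbarer Tag."))
```

Validation against manual annotators, as in the paper, then reduces to correlating these model judgments with the human labels.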

Language: English

Citations

51

Using natural language processing to analyse text data in behavioural science
Stefan Feuerriegel, Abdurahman Maarouf, Dominik Bär

et al.

Nature Reviews Psychology, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 2, 2025

Language: English

Citations

2

Assessment of Resident and AI Chatbot Performance on the University of Toronto Family Medicine Residency Progress Test: Comparative Study
Ryan S. Huang, Kevin Lu, Christopher Meaney

et al.

JMIR Medical Education, Journal Year: 2023, Volume and Issue: 9, P. e50514 - e50514

Published: Sept. 5, 2023

Large language model (LLM)-based chatbots are evolving at an unprecedented pace with the release of ChatGPT, specifically GPT-3.5, and its successor, GPT-4. Their capabilities in general-purpose tasks and language generation have advanced to the point of performing excellently on various educational examination benchmarks, including medical knowledge tests. Comparing the performance of these 2 LLM models with that of Family Medicine residents on a multiple-choice medical knowledge test can provide insights into their potential as medical education tools.

Language: English

Citations

36

Exploring the Association Between Textual Parameters and Psychological and Cognitive Factors
Kadir Uludağ

Psychology Research and Behavior Management, Journal Year: 2024, Volume and Issue: 17, P. 1139 - 1150

Published: March 1, 2024

Textual data analysis has become a popular method for examining complex human behavior in various fields, including psychology, psychiatry, sociology, computer science, data mining, forensic sciences, and communication studies. However, identifying the most relevant textual parameters for analyzing texts remains a challenge.

Language: English

Citations

15

A tutorial on open-source large language models for behavioral science
Z. Hussain, Marcel Binz, Rui Mata

et al.

Behavior Research Methods, Journal Year: 2024, Volume and Issue: 56(8), P. 8214 - 8237

Published: Aug. 15, 2024

Large language models (LLMs) have the potential to revolutionize behavioral science by accelerating and improving the research cycle, from conceptualization to data analysis. Unlike closed-source solutions, open-source frameworks for LLMs can enable transparency, reproducibility, and adherence to data protection standards, which gives them a crucial advantage for use in behavioral science. To help researchers harness the promise of LLMs, this tutorial offers a primer on the Hugging Face ecosystem and demonstrates several applications that advance conceptual and empirical work in behavioral science, including feature extraction, fine-tuning of models for prediction, and the generation of responses. Executable code is made available at github.com/Zak-Hussain/LLM4BeSci.git . Finally, the tutorial discusses challenges faced with (open-source) LLMs related to interpretability and safety and offers a perspective on future research at the intersection of language modeling and behavioral science.
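To give a flavor of the feature-extraction workflow such a tutorial covers, here is a minimal sketch in the Hugging Face ecosystem; the checkpoint and mean-pooling choice are illustrative assumptions, not necessarily what LLM4BeSci uses.

```python
# Minimal sketch of text feature extraction with Hugging Face
# transformers; checkpoint and pooling are illustrative choices.
import torch
from transformers import AutoModel, AutoTokenizer

checkpoint = "distilbert-base-uncased"  # small open-source encoder
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

texts = ["I feel anxious before exams.", "The lecture was inspiring."]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, tokens, dims)

# Mean-pool token embeddings into one feature vector per text,
# masking out padding tokens.
mask = batch["attention_mask"].unsqueeze(-1)
features = (hidden * mask).sum(1) / mask.sum(1)
print(features.shape)  # torch.Size([2, 768])
```

These per-text vectors can then feed a downstream prediction model, which is the pattern the tutorial's feature-extraction application builds on.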

Language: English

Citations

11

Analyzing the concordance and consistency of AI and human ratings in hospitality reviews
Sandra Morini Marrero, José Manuel Ramos Henríquez, Anil Bilgihan

et al.

Journal of Hospitality and Tourism Technology, Journal Year: 2025, Volume and Issue: unknown

Published: March 12, 2025

Purpose: This study aims to explore the application of ChatGPT to analyze hotel guest satisfaction from online reviews. As guest feedback plays a critical role in consumer decision-making in the hospitality industry, this research evaluates the accuracy and reliability of ChatGPT's ratings compared with those of human raters and classic supervised machine learning classification techniques.

Design/methodology/approach: Using TripAdvisor reviews of five-star hotels, the authors use a structured two-phase approach to assess both inter- and intra-rater reliability.

Findings: The results highlight distinct differences in rating behavior between artificial intelligence (AI) and human judges, with the AI showing a tendency toward more moderate ratings. In addition, the authors observe a slight tendency for guests to overrate their experiences, supporting the literature on the subjective nature of ratings. Despite these variations, ChatGPT shows significant agreement with human ratings, especially when minor discrepancies are accounted for, suggesting its utility as an analysis tool in the industry. The paper highlights ChatGPT's ability to process and evaluate textual data, discusses the implications of using AI to improve review analysis and management processes, advocates the incorporation of AI tools into customer feedback systems, and suggests future work to refine the models for practical applications.

Originality/value: The study advances the understanding of AI's role in hospitality management by demonstrating its capacity for analyzing guest satisfaction through online reviews and by providing a methodological framework for assessing AI-generated content.
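One common way to quantify the kind of AI-human rating agreement this study examines is a weighted kappa plus a within-one-point tolerance for the "minor discrepancies" the abstract mentions. The sketch below uses hypothetical ratings and is not the study's exact analysis.

```python
# Minimal sketch of AI-human agreement on a 1-5 review scale;
# the ratings are hypothetical, not data from the study.
import numpy as np
from sklearn.metrics import cohen_kappa_score

human = np.array([5, 4, 3, 5, 2, 4, 1, 5])  # hypothetical human ratings
ai = np.array([4, 4, 3, 5, 3, 4, 2, 5])     # hypothetical ChatGPT ratings

# Quadratic weights penalize large disagreements more than small ones.
kappa = cohen_kappa_score(human, ai, weights="quadratic")

# "Within one point" agreement tolerates minor discrepancies.
within_one = np.mean(np.abs(human - ai) <= 1)

print(f"weighted kappa = {kappa:.2f}, within-1 agreement = {within_one:.2%}")
```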

Language: English

Citations

1

Can large language models help augment English psycholinguistic datasets?
Sean Trott

Behavior Research Methods, Journal Year: 2024, Volume and Issue: 56(6), P. 6082 - 6100

Published: Jan. 23, 2024

Research on language and cognition relies extensively on psycholinguistic datasets or "norms". These datasets contain judgments of lexical properties like concreteness and age of acquisition, and can be used to norm experimental stimuli, discover empirical relationships in the lexicon, and stress-test computational models. However, collecting human judgments at scale is both time-consuming and expensive. This issue is compounded for multi-dimensional norms and those incorporating context. The current work asks whether large language models (LLMs) can be leveraged to augment the creation of large, psycholinguistic datasets in English. I use GPT-4 to collect multiple kinds of semantic judgments (e.g., word similarity, contextualized sensorimotor associations, iconicity) for English words and compare these judgments against the human "gold standard". For each dataset, I find that GPT-4's judgments are positively correlated with human judgments, in some cases rivaling or even exceeding the average inter-annotator agreement displayed by humans. I then identify several ways in which LLM-generated norms differ from human-generated norms systematically. I also perform "substitution analyses", which demonstrate that replacing human-generated norms with LLM-generated norms in a statistical model does not change the sign of parameter estimates (though in select cases, there are significant changes to their magnitude). I conclude by discussing considerations and limitations associated with LLM-generated norms in general, including concerns about data contamination, the choice of LLM, external validity, and construct quality. Additionally, all of the judgments (over 30,000 in total) are made available online for further analysis.
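Comparing LLM judgments against human norms, as this paper does, typically reduces to correlation analysis. A minimal sketch with hypothetical numbers, not the paper's released data:

```python
# Minimal sketch of correlating LLM-generated norms with human norms;
# all values are hypothetical illustrations.
from scipy.stats import pearsonr, spearmanr

words = ["apple", "justice", "whisper", "cactus"]
human_norms = [4.8, 1.5, 2.9, 4.9]  # e.g., concreteness ratings (1-5 scale)
llm_norms = [4.9, 1.8, 3.2, 4.7]    # hypothetical GPT-4 judgments

r, p_r = pearsonr(human_norms, llm_norms)
rho, p_rho = spearmanr(human_norms, llm_norms)
print(f"Pearson r = {r:.2f} (p = {p_r:.3f}); Spearman rho = {rho:.2f}")
```

Benchmarking such an r against the average inter-annotator agreement among humans, as the paper does, indicates whether the model is within the range of ordinary human disagreement.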

Language: English

Citations

7

How large language models can reshape collective intelligence
Jason W. Burton, Ezequiel Lopez-Lopez, Shahar Hechtlinger

et al.

Nature Human Behaviour, Journal Year: 2024, Volume and Issue: 8(9), P. 1643 - 1655

Published: Sept. 20, 2024

Language: English

Citations

7

Multitask Learning for Crash Analysis: A Fine-Tuned LLM Framework Using Twitter Data
Shadi Jaradat, Richi Nayak, Alexander Paz

et al.

Smart Cities, Journal Year: 2024, Volume and Issue: 7(5), P. 2422 - 2465

Published: Sept. 1, 2024

Road traffic crashes (RTCs) are a global public health issue, with traditional analysis methods often hindered by delays and incomplete data. Leveraging social media for real-time traffic safety analysis offers a promising alternative, yet effective frameworks for this integration are scarce. This study introduces a novel multitask learning (MTL) framework utilizing large language models (LLMs) to analyze RTC-related tweets from Australia. We collected 26,226 traffic-related tweets from May 2022 to May 2023. Using GPT-3.5, we extracted fifteen distinct features categorized into six classification tasks and nine information retrieval tasks. These features were then used to fine-tune GPT-2 for multitask modeling, which outperformed baseline models, including GPT-4o mini in zero-shot mode and XGBoost, across most tasks. Unlike single-task classifiers that may miss critical details, our MTL approach simultaneously classifies tweets and extracts detailed information in natural language. Our fine-tuned GPT-2 model achieved an average accuracy of 85% across the classification tasks, surpassing the zero-shot GPT-4o mini model's 64% and XGBoost's 83.5%. In the information retrieval tasks, the fine-tuned model achieved a BLEU-4 score of 0.22, a ROUGE-1 of 0.78, and a WER of 0.30, significantly outperforming GPT-4o mini (0.0674, 0.2992, and 2.0715, respectively). These results demonstrate the efficacy of the framework in enhancing both classification and information retrieval, offering valuable insights for data-driven decision-making to improve road safety. This study is the first to explicitly apply social media data and LLMs within an MTL framework to enhance crash analysis.
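A fine-tuning setup along the general lines described, serializing tweet, task, and answer into training text for GPT-2, might look like the following sketch; the data, prompt format, and hyperparameters are illustrative assumptions, not the study's configuration.

```python
# Minimal sketch of fine-tuning GPT-2 on serialized multitask examples
# with Hugging Face; data and hyperparameters are illustrative only.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical examples: tweet + task instruction + target answer,
# all serialized into one text sequence per example.
examples = [
    {"text": "tweet: Multi-car pileup on the M1 near exit 4.\n"
             "task: severity\nanswer: major"},
    {"text": "tweet: Minor fender bender cleared on Pacific Hwy.\n"
             "task: severity\nanswer: minor"},
]
dataset = Dataset.from_list(examples).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-crash-mtl",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Serializing every task into the same text-to-text format is what lets a single causal LM handle classification and information retrieval jointly.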

Language: English

Citations

6

Efficacy of ChatGPT in Cantonese Sentiment Analysis: A Comparative Study (Preprint)
Ziru FU, Yu‐Cheng Hsu, Christian S. Chan

et al.

Journal of Medical Internet Research, Journal Year: 2023, Volume and Issue: unknown

Published: July 19, 2023

Background: Sentiment analysis is a significant yet difficult task in natural language processing. The linguistic peculiarities of Cantonese, including its high similarity with Standard Chinese, grammatical and lexical uniqueness, colloquialism, and multilingualism, make it different from other languages and pose additional challenges to sentiment analysis. Recent advances in models such as ChatGPT offer potentially viable solutions.

Objective: This study investigated the efficacy of GPT-3.5 and GPT-4 in Cantonese sentiment analysis in the context of web-based counseling and compared their performance with mainstream methods, including lexicon-based methods and machine learning approaches.

Methods: We analyzed transcripts from a web-based, text-based counseling service in Hong Kong, comprising a total of 131 individual sessions and 6169 messages between counselors and help-seekers. First, a codebook was developed for human annotation. A simple prompt ("Is this text positive, neutral, or negative? Respond with the label only.") was then given to the models to classify each message's sentiment. GPT-3.5's and GPT-4's performance was compared with that of a lexicon-based method and 3 state-of-the-art machine learning models: linear regression, support vector machines, and long short-term memory neural networks.

Results: Our findings revealed ChatGPT's remarkable accuracy in sentiment classification, with GPT-3.5 and GPT-4, respectively, achieving 92.1% (5682/6169) and 95.3% (5880/6169) accuracy, thereby outperforming the traditional lexicon-based method, which had an accuracy of 37.2% (2295/6169), and the machine learning models, whose accuracies ranged from 66% (4072/6169) to 70.9% (4374/6169).

Conclusions: Among the many techniques compared, ChatGPT demonstrates superior performance and emerges as a promising tool for Cantonese sentiment analysis. The study also highlights ChatGPT's applicability in real-world scenarios, such as monitoring the quality of counseling services and detecting message-level sentiments in vivo. The insights derived pave the way for further exploration of LLM capabilities in underresourced languages and specialized domains like psychotherapy.
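The study's simple-prompt method can be applied programmatically. A minimal sketch assuming the OpenAI Python SDK, with an illustrative model name and example messages (the prompt wording above is a reconstruction and may differ slightly from the paper's):

```python
# Minimal sketch of the simple-prompt sentiment classification described
# above; model name and example messages are illustrative.
from openai import OpenAI

client = OpenAI()
PROMPT = "Is this text positive, neutral, or negative? Respond with the label only."

def classify(message: str) -> str:
    """Return 'positive', 'neutral', or 'negative' for one message."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": f"{PROMPT}\n\n{message}"}],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower()

messages = ["今日好開心", "我唔知點算"]  # illustrative Cantonese texts
print([classify(m) for m in messages])
```

Accuracy figures like those reported then follow from comparing these labels against the human-annotated codebook labels.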

Language: English

Citations

14