Evaluating and Enhancing Trustworthiness of LLMs in Perception Tasks DOI
Malsha Ashani Mahawatta Dona, Beatriz Cabrero-Daniel, Yinan Yu

et al.

2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), Journal year: 2024, Issue: unknown, pp. 431–438

Published: Sep. 24, 2024

Language: English

Large language models for whole-learner support: opportunities and challenges DOI Creative Commons
Amogh Mannekote, Adam Davies, Juan D. Pinto

et al.

Frontiers in Artificial Intelligence, Journal year: 2024, Issue: 7

Published: Oct. 15, 2024

In recent years, large language models (LLMs) have seen rapid advancement and adoption and are increasingly being used in educational contexts. In this perspective article, we explore the open challenge of leveraging LLMs to create personalized learning environments that support the "whole learner" by modeling and adapting to both cognitive and non-cognitive characteristics. We identify three key challenges toward this vision: (1) improving the interpretability of LLMs' representations of whole learners, (2) implementing adaptive technologies that can leverage such representations to provide tailored pedagogical support, and (3) authoring and evaluating LLM-based agents. For interpretability, we discuss approaches for explaining LLM behaviors in terms of their internal representations of learners; for adaptation, we examine how LLMs can provide context-aware feedback and scaffold skills through natural interactions; for authoring, we highlight the opportunities involved in using natural-language instructions to specify desired behaviors. Addressing these challenges will enable AI tutors that enhance learning while accounting for each student's unique background, abilities, motivations, and socioemotional needs.

Language: English

Cited by

3

Understanding generative AI to harness its potentials and minimize risks: A perspective DOI Creative Commons
Tommaso Caselli

European Journal of Radiology, Journal year: 2025, Issue: unknown, pp. 111951

Published: Jan. 1, 2025

Language: English

Cited by

0

Collaborative Growth: When Large Language Models Meet Sociolinguistics DOI Creative Commons
Dong Nguyen

Language and Linguistics Compass, Journal year: 2025, Issue: 19(2)

Published: Feb. 3, 2025

ABSTRACT Large Language Models (LLMs) have dramatically transformed the AI landscape. They can produce remarkably fluent text and exhibit a wide range of natural language understanding and generation capabilities. This article explores how LLMs might be used for sociolinguistic research and, conversely, how sociolinguistics might contribute to the development of LLMs. It argues that both fields will benefit from thoughtful, engaged collaboration. Sociolinguists are not merely end users of LLMs; they have a crucial role to play in their development.

Language: English

Cited by

0

How to Write Effective Prompts for Screening Biomedical Literature Using Large Language Models DOI Creative Commons
Maria Teresa Colangelo, Stefano Guizzardi, Marco Meleti

et al.

BioMedInformatics, Journal year: 2025, Issue: 5(1), pp. 15

Published: Mar. 11, 2025

Large language models (LLMs) have emerged as powerful tools for (semi-)automating the initial screening of abstracts in systematic reviews, offering the potential to significantly reduce the manual burden on research teams. This paper provides a broad overview of prompt engineering principles and highlights how traditional PICO (Population, Intervention, Comparison, Outcome) criteria can be converted into actionable instructions for LLMs. We analyze the trade-offs between "soft" prompts, which maximize recall by accepting articles unless they explicitly fail an inclusion requirement, and "strict" prompts, which demand explicit evidence for every criterion. Using a periodontics case study, we illustrate how prompt design affects recall, precision, and overall screening efficiency, and we discuss metrics (accuracy, F1 score) to evaluate performance. We also examine common pitfalls, such as overly lengthy prompts or ambiguous instructions, and underscore the continuing need for expert oversight to mitigate the hallucinations and biases inherent in LLM outputs. Finally, we explore emerging trends, including multi-stage screening pipelines and fine-tuning, while noting ethical considerations related to data privacy and transparency. By applying rigorous evaluation, researchers can optimize LLM-based screening processes, allowing faster and more comprehensive evidence synthesis across biomedical disciplines.
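The soft-versus-strict trade-off described in this abstract can be sketched as two prompt templates built from PICO criteria. Everything below (the example criteria, the INCLUDE/EXCLUDE response convention, and the helper names) is invented for illustration and is not the paper's actual protocol; the LLM call itself is omitted.

```python
# Sketch of "soft" vs. "strict" abstract-screening prompts derived from
# PICO criteria, plus a parser for the model's verdict. Hypothetical example.

PICO = {
    "Population": "adults with periodontitis",
    "Intervention": "non-surgical periodontal therapy",
    "Comparison": "placebo or standard care",
    "Outcome": "probing depth reduction",
}

def build_prompt(abstract: str, strict: bool) -> str:
    criteria = "\n".join(f"- {k}: {v}" for k, v in PICO.items())
    if strict:
        # Strict: explicit evidence required for every criterion (favors precision).
        rule = ("Answer INCLUDE only if the abstract gives explicit evidence "
                "for EVERY criterion; otherwise answer EXCLUDE.")
    else:
        # Soft: include unless a criterion explicitly fails (favors recall).
        rule = ("Answer INCLUDE unless the abstract explicitly fails at least "
                "one criterion; when in doubt, answer INCLUDE.")
    return f"Screening criteria:\n{criteria}\n\n{rule}\n\nAbstract:\n{abstract}"

def parse_verdict(response: str) -> bool:
    # True = keep for full-text review; defaults to inclusion, matching
    # the recall-first spirit of screening.
    return "EXCLUDE" not in response.upper()
```

In a real pipeline the returned prompt would be sent to the model of choice, and the parsed verdicts compared against expert labels to compute accuracy and F1.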

Language: English

Cited by

0

Benchmarking Prompt Sensitivity in Large Language Models DOI

Amirhossein Razavi, Mina Soltangheis, Negar Arabzadeh

et al.

Lecture Notes in Computer Science, Journal year: 2025, Issue: unknown, pp. 303–313

Published: Jan. 1, 2025

Language: English

Cited by

0

A Primer for Evaluating Large Language Models in Social-Science Research DOI Creative Commons
Suhaib Abdurahman, Alireza S. Ziabari, Alexander Moore

et al.

Advances in Methods and Practices in Psychological Science, Journal year: 2025, Issue: 8(2)

Published: Apr. 1, 2025

Autoregressive large language models (LLMs) exhibit remarkable conversational and reasoning abilities and exceptional flexibility across a wide range of tasks. Consequently, LLMs are increasingly being used in scientific research to analyze data, generate synthetic data, or even write articles. This trend necessitates that authors follow best practices for conducting and reporting LLM research so that journal reviewers can evaluate the quality of works that use LLMs. We provide social-scientific researchers with essential recommendations to ensure replicable and robust results when using LLMs. Our recommendations also highlight considerations for reviewers, focusing on methodological rigor, replicability, and validity when evaluating studies that use LLMs to automate data processing or simulate human data. We offer practical advice for assessing the appropriateness of LLM applications in submitted studies, emphasizing the need for transparency and the challenges posed by the nondeterministic and continuously evolving nature of these models. By providing a framework for critical review, this primer aims to foster high-quality, innovative research in the evolving landscape of social science.

Language: English

Cited by

0

Generalization bias in large language model summarization of scientific research DOI Creative Commons
Uwe Peters, Benjamin Chin‐Yee

Royal Society Open Science, Journal year: 2025, Issue: 12(4)

Published: Apr. 1, 2025

Artificial intelligence chatbots driven by large language models (LLMs) have the potential to increase public science literacy and support scientific research, as they can quickly summarize complex scientific information in accessible terms. However, when summarizing scientific texts, LLMs may omit details that limit the scope of research conclusions, leading to generalizations of results that are broader than warranted by the original study. We tested 10 prominent LLMs, including ChatGPT-4o, ChatGPT-4.5, DeepSeek, LLaMA 3.3 70B, and Claude 3.7 Sonnet, comparing 4900 LLM-generated summaries to their original texts. Even when explicitly prompted for accuracy, most LLMs produced broader generalizations of scientific results than the original texts, with DeepSeek, ChatGPT-4o, and LLaMA 3.3 70B overgeneralizing in 26-73% of cases. In a direct comparison with human-authored summaries, LLM summaries were nearly five times more likely to contain broad generalizations (odds ratio = 4.85, 95% CI [3.06, 7.70], p < 0.001). Notably, newer models tended to perform worse in generalization accuracy than earlier ones. Our results indicate a strong generalization bias in many widely used LLMs, posing a significant risk of large-scale misinterpretations of research findings. We highlight potential mitigation strategies, including lowering temperature settings and benchmarking LLMs for generalization accuracy.
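The odds-ratio statistic reported in this abstract comes from a 2x2 contingency table (summary type x presence of broad generalizations). A minimal sketch of how such a value and its 95% confidence interval are computed; the counts below are invented for illustration, not the paper's data:

```python
import math

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table:
    a = LLM summaries with broad generalizations, b = without;
    c = human summaries with broad generalizations, d = without."""
    return (a / b) / (c / d)

def ci95(a: int, b: int, c: int, d: int) -> tuple:
    """95% confidence interval on the log-odds scale (Woolf's method)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se)

# Hypothetical counts: 60 of 100 LLM summaries vs. 24 of 100 human
# summaries contain broad generalizations.
print(round(odds_ratio(60, 40, 24, 76), 2))  # → 4.75
```

An odds ratio near 5, as in the paper, means the odds of a broad generalization are roughly five times higher in LLM summaries than in human-authored ones.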

Language: English

Cited by

0

Enhancing Analytic Hierarchy Process Modelling Under Uncertainty With Fine‐Tuning LLM DOI Creative Commons
Haeun Park, Hyunjoo Oh, Feng Gao

и другие.

Expert Systems, Journal year: 2025, Issue: 42(6)

Published: May 9, 2025

ABSTRACT Given that decision-making typically encompasses stages such as problem recognition, the generation of alternatives, and the selection of an optimal choice, Large Language Models (LLMs) are progressively being integrated into tasks requiring the enumeration and comparative evaluation of alternatives, thereby promoting more rational decision frameworks. Analysing the extent to which LLMs exhibit meaningful performance at each stage of this process has thus become a critical area of inquiry. In particular, LLMs hold the potential to identify latent relationships within contextual information and data related to a domain. This capability enables them to propose novel criteria or alternatives that may otherwise be overlooked by human designers. This study seeks to advance the modelling of the analytic hierarchy process (AHP), a widely utilised multiple-criteria decision making (MCDM) method, by leveraging LLMs. To achieve this, a methodology was developed for constructing AHP models using LLMs fine-tuned with domain-specific documents. The proposed methodology was assessed by evaluating how closely its outputs aligned with reference hierarchies created by experts under predefined criteria. Additionally, we examined the model's efficacy in generating complete hierarchies in scenarios where these criteria were not predefined. For empirical validation, the methodology was applied to assess and improve the management of six-sector agricultural enterprises. Comparative analysis of the LLM-based results and expert evaluations was conducted to determine the validity and robustness of the approach. The findings provide insights that contribute to structured decision-making and enhance the application of MCDM methods.
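The pairwise-comparison step at the heart of AHP can be illustrated with a short sketch. The comparison matrix and the geometric-mean weight approximation below are a generic textbook example, not the paper's fine-tuned LLM pipeline:

```python
import math

def ahp_weights(M):
    """Approximate AHP priority weights from a pairwise comparison matrix
    using the geometric mean of each row, normalized to sum to 1."""
    n = len(M)
    gm = [math.prod(row) ** (1 / n) for row in M]
    total = sum(gm)
    return [g / total for g in gm]

# Hypothetical judgments: criterion A is 3x as important as B and
# 5x as important as C; the matrix is reciprocal by construction.
M = [
    [1,     3,   5],
    [1 / 3, 1,   2],
    [1 / 5, 1 / 2, 1],
]
weights = ahp_weights(M)  # largest weight goes to criterion A
```

In the paper's setting, an LLM fine-tuned on domain documents would propose the hierarchy and criteria, while a derivation like this turns expert (or model-generated) pairwise judgments into priority weights.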

Language: English

Cited by

0

Developer and LLM Pair Programming: An Empirical Study of Role Dynamics and Prompt-Based Collaboration DOI Open Access

Sri Rama Chandra Charan Teja Tadi

International Journal of Advanced Research in Science Communication and Technology, Journal year: 2025, Issue: unknown, pp. 436–444

Published: May 12, 2025

With the introduction of large language models (LLMs) as coding partners, classic pair programming dynamics are being rewritten. This research empirically examines collaboration between software developers and LLMs on coding tasks, uncovering a dynamic role toggling informed by prompt accuracy and contextual cues. Instead of deterministic driver-navigator dichotomies, we find an emergent interdependence in which programmers function as orchestrators of intent while LLMs oscillate between executor, interpreter, and creative collaborator. Prompt design has emerged as a critical skill for orchestrating this collaboration, shifting the focus from code authorship to dialogical problem-solving. This perspective introduces a new vision of human-AI co-creation in coding, highlighting its potential within future intelligent development environments.

Language: English

Cited by

0

Towards Human-Like Educational Question Generation with Small Language Models DOI

Fares Fawzi, Sarang Balan, Mutlu Cukurova

et al.

Communications in Computer and Information Science, Journal year: 2024, Issue: unknown, pp. 295–303

Published: Jan. 1, 2024

Language: English

Cited by

0