Comparing large language models for supervised analysis of students’ lab notes
Rebeckah K. Fussell, Megan Flynn, Anil Damle, et al.

Physical Review Physics Education Research, Journal Year: 2025, Volume and Issue: 21(1)

Published: March 31, 2025

Recent advancements in large language models (LLMs) hold significant promise for improving physics education research that uses machine learning. In this study, we compare the application of various models for conducting a large-scale analysis of written text grounded in a physics education research classification problem: identifying skills in students’ typed lab notes through sentence-level labeling. Specifically, we use training data to fine-tune two different LLMs, BERT and LLaMA, and compare the performance of these models to both a traditional bag-of-words approach and a few-shot LLM (without fine-tuning). We evaluate the models based on their resource use, performance metrics, and research outcomes when identifying skills in students’ lab notes. We find that higher-resource models often, but not necessarily, perform better than lower-resource models. We also find that all models report similar trends in research outcomes, although the absolute values of the estimated measurements are not always within uncertainties of each other. We use these results to discuss relevant considerations for researchers seeking to select a model type to use as a classifier. Published by the American Physical Society 2025

Language: English
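
As a concrete illustration of the kind of sentence-level classifier this study compares, the sketch below trains a bag-of-words baseline with scikit-learn. It is a minimal sketch only: the paper's actual data, features, and training pipeline are not reproduced here, and the example sentences and binary "skill" label are hypothetical placeholders.

    # Minimal sketch of a bag-of-words baseline for sentence-level skill
    # labeling, in the spirit of the comparison described above. The paper's
    # actual data and pipeline are not reproduced here; the sentences and
    # the binary skill label below are hypothetical placeholders.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical training data: one label per sentence of typed lab notes,
    # where 1 means the sentence exhibits the target skill.
    sentences = [
        "We measured the period of the pendulum five times.",
        "Our measured value agrees with the model within uncertainty.",
        "Next we will shorten the string and repeat the measurement.",
        "The residuals suggest a systematic effect at large angles.",
    ]
    labels = [0, 1, 0, 1]

    # Bag-of-words baseline: unigram/bigram counts fed to a linear classifier.
    baseline = make_pipeline(
        CountVectorizer(ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )
    baseline.fit(sentences, labels)

    print(baseline.predict(["The fit matches our prediction within error."]))

Fine-tuning BERT or LLaMA, as the study does, amounts to replacing this pipeline with a transformer classifier trained on the same sentence-label pairs, at considerably higher resource cost.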

On opportunities and challenges of large multimodal foundation models in education
Stefan Küchemann, Karina E. Avila, Yavuz Dinc, et al.

npj Science of Learning, Journal Year: 2025, Volume and Issue: 10(1)

Published: Feb. 25, 2025

Recently, the option to use large language models as a middleware connecting various AI tools and other large language models has led to the development of so-called multimodal foundation models, which have the power to process spoken text, music, images, and videos. In this overview, we explain the new set of opportunities and challenges that arise from the integration of multimodal foundation models in education.

Language: English

Citations: 0

Natural Language Processing and Large Language Models
Peter Wulff, Marcus Kubsch, Christina Krist, et al.

Springer texts in education, Journal Year: 2025, Volume and Issue: unknown, P. 117 - 142

Published: Jan. 1, 2025

Language: English

Citations: 0

Application of retrieval-augmented generation for interactive industrial knowledge management via a large language model
Lun-Chi Chen, Mayuresh Sunil Pardeshi, Y Liao, et al.

Computer Standards & Interfaces, Journal Year: 2025, Volume and Issue: 94, P. 103995 - 103995

Published: March 6, 2025

Language: English

Citations: 0

Leveraging LLM respondents for item evaluation: A psychometric analysis

Yunting Liu, Shreya Bhandari, Zachary A. Pardos, et al.

British Journal of Educational Technology, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 24, 2025

Effective educational measurement relies heavily on the curation of well-designed item pools. However, item calibration is time consuming and costly, requiring a sufficient number of respondents to estimate the psychometric properties of the items. In this study, we explore the potential of six different large language models (LLMs; GPT-3.5, GPT-4, Llama 2, Llama 3, Gemini-Pro, and Cohere Command R Plus) to generate responses with psychometric properties comparable to those of human respondents. Results indicate that some LLMs exhibit proficiency in College Algebra that is similar to or exceeds that of college students. We find that the LLMs used in this study have narrow response distributions, limiting their ability to fully mimic the variability observed in human respondents, but an ensemble of LLMs can better approximate the broader distribution typical of human responses. Utilizing item response theory, the item parameters calibrated by LLM responses have high correlations (e.g., >0.8 for GPT-3.5) with their human counterparts. Several augmentation strategies are evaluated for their relative performance, with resampling methods proving most effective, enhancing the Spearman correlation from 0.89 (human only) to 0.93 (augmented human).

Practitioner notes

What is already known about this topic: The collection of candidate test items is a common practice when designing an assessment tool. Large language models (LLMs) have been found to rival human abilities in a variety of subject areas, making them a low-cost option for testing the efficacy of test items. Data augmentation using AI has been found to be an effective strategy for improving machine learning model performance.

What this paper adds: This study provides the first psychometric analysis of open-source and proprietary LLMs as respondents compared with humans. It finds that item parameters calibrated from LLM responses correlate highly with those produced by 50 undergraduate respondents. Using LLM responses to augment human data yields mixed results.

Implications for practice and/or policy: The moderate performance of the LLMs themselves suggests they could provide support in curating quality items for low-stakes formative and summative assessments. The methodology offers a scalable way to evaluate vast amounts of generative AI-produced test items.

Language: English

Citations: 0
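
The headline comparison in this abstract, rank correlation between human-calibrated and LLM-calibrated item parameters, can be sketched with SciPy. This is a minimal sketch with placeholder numbers, not the study's data; only the reported Spearman values (0.89 human-only, 0.93 augmented) come from the abstract.

    # Minimal sketch of the rank-correlation check described above: compare
    # item difficulty estimates calibrated from human responses with those
    # calibrated from LLM-generated responses. The arrays are hypothetical
    # placeholders, not the study's data.
    from scipy.stats import spearmanr

    # Hypothetical difficulty parameters for the same ten items, one set
    # per calibration source.
    human_difficulty = [-1.2, -0.4, 0.1, 0.5, 0.9, 1.3, -0.8, 0.2, 1.8, -0.1]
    llm_difficulty = [-1.0, -0.5, 0.3, 0.4, 1.1, 1.2, -0.9, 0.0, 1.6, -0.2]

    rho, pvalue = spearmanr(human_difficulty, llm_difficulty)
    print(f"Spearman rho = {rho:.2f} (p = {pvalue:.3f})")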
