
Google Gemini as a next generation AI educational tool: a review of emerging educational technology
Muhammad Imran, Norah Almusharraf

Smart Learning Environments, Journal Year: 2024, Volume and Issue: 11(1)

Published: May 23, 2024

This emerging technology report discusses Google Gemini as a multimodal generative AI tool and presents its revolutionary potential for future educational technology. It introduces Gemini's features, including its versatility in processing data from text, image, audio, and video inputs and in generating diverse content types. Drawing on recent empirical studies and practice, the report examines the relationship between Gemini and the educational landscape. It further explores Gemini's relevance to educational endeavors and the practical applications of its technologies. It also discusses significant challenges and ethical considerations that must be addressed to ensure responsible and effective integration of Gemini into education.

Language: English

Citations

25

Exploring the Prospects and Perils of Integrating Artificial Intelligence and ChatGPT in Academic and Research Libraries (ARL): Challenges and Opportunity
Satveer Singh Nehra, Sadanand Y. Bansode

Journal of Web Librarianship, Journal Year: 2024, Volume and Issue: unknown, P. 1 - 22

Published: Aug. 13, 2024

An attempt is made to find out the scope of AI and the AI-powered chatbot ChatGPT in academic and research libraries, their possible challenges and opportunities, and how they can make a difference. In this study, we found that chatbots have the potential to revolutionize libraries by promoting specialized librarianship, reshaping services, encouraging thinking outside the box, and enabling search and retrieval of information based on personal recommendations. These advancements might raise the standards of the services libraries offer. However, the challenges include user dependability on AI, which may affect users' reading habits, a lack of skilled staff in underdeveloped nations, and the need for high-quality data. ChatGPT also faces biases, ethical implications, and an inability to understand tone or context, leading to misunderstandings and poor communication. To effectively utilize ChatGPT as a library customer service tool, libraries must manage their data, monitor ChatGPT's responses, and consider its limitations.

Language: English

Citations

8

Accuracy and reliability of large language models in assessing learning outcomes achievement across cognitive domains

Swapna Haresh Teckwani, Amanda Huee‐Ping Wong, W. A. N. V. Luke et al.

AJP Advances in Physiology Education, Journal Year: 2024, Volume and Issue: 48(4), P. 904 - 914

Published: Nov. 8, 2024

The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to evaluate the accuracy and reliability of these LLMs in evaluating the achievement of learning outcomes across different cognitive domains in a scientific inquiry course on sports physiology. Human graders and three LLMs, GPT-3.5, GPT-4o, and Gemini, were tasked with scoring submitted student assignments according to a set of rubrics aligned with various cognitive domains, namely "Understand," "Analyze," and "Evaluate" from the revised Bloom's taxonomy, and "Scientific Inquiry Competency." Our findings revealed that while the LLMs demonstrated some level of competency, they do not yet meet the standards of human graders. Specifically, interrater reliability (percentage agreement and correlation analysis) between human graders was superior compared with that between two grading rounds of each LLM. Furthermore, the concordance between LLM and human scoring was mostly moderate to poor in terms of both overall scores and the pre-specified cognitive domains. These results suggest a future where AI could complement human expertise in grading but underscore the importance of adaptive adoption by educators and continuous improvement of current technologies to fully realize their potential.

Language: English

Citations

1
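
The interrater metrics named in the abstract above, percentage agreement and correlation analysis, can be illustrated with a minimal sketch. The Python snippet below uses hypothetical rubric scores (the variable names and values are illustrative assumptions, not data from the study) to show how agreement between a human grader and an LLM grader on the same assignments might be quantified.

from statistics import correlation  # Pearson's r; available in Python 3.10+

# Hypothetical rubric scores (0-4 scale) for ten assignments; not the study's data.
human_scores = [3, 4, 2, 3, 4, 1, 2, 3, 4, 2]
llm_scores = [3, 3, 2, 4, 4, 1, 2, 2, 4, 3]

# Percentage agreement: share of assignments receiving the identical score.
matches = sum(h == m for h, m in zip(human_scores, llm_scores))
agreement = 100 * matches / len(human_scores)

# Correlation analysis: Pearson correlation between the two score series.
r = correlation(human_scores, llm_scores)

print(f"Percentage agreement: {agreement:.1f}%")
print(f"Pearson correlation: {r:.2f}")

The concordance the study also reports would typically call for a chance-corrected or variance-based statistic such as weighted kappa or an intraclass correlation; the sketch covers only the two metrics explicitly named in the abstract.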