Published: Dec. 9, 2024
Language: English
JMIR AI, Journal year: 2024, Issue: unknown
People with schizophrenia often present with cognitive impairments that may hinder their ability to learn about their condition. Education platforms powered by Large Language Models (LLMs) have the potential to improve the accessibility of mental health information. However, the black-box nature of LLMs raises ethical and safety concerns regarding the controllability of chatbots. In particular, prompt-engineered chatbots may drift from their intended role as the conversation progresses and become more prone to hallucinations. This study aims to develop and evaluate a Critical Analysis Filter (CAF) system that ensures an LLM-powered, prompt-engineered chatbot reliably complies with its predefined instructions and scope while delivering validated mental health information. As a proof of concept, we prompt-engineer an educational chatbot for schizophrenia, powered by GPT-4, that can dynamically access information from a schizophrenia manual written for people with schizophrenia and their caregivers. In the CAF, a team of prompt-engineered LLM agents is used to critically analyze and refine the chatbot's responses and to deliver real-time feedback to the chatbot. To assess the ability of the CAF to re-establish the chatbot's adherence to its instructions, we generate three conversations (by conversing with the chatbot with the CAF disabled) wherein the chatbot starts to drift from its instructions towards various unintended roles. We use these checkpoint conversations to initialize automated conversations between the chatbot and adversarial chatbots designed to entice it towards unintended roles. Conversations were repeatedly sampled with the CAF enabled and disabled, respectively. Three human raters independently rated each chatbot response according to criteria developed to measure the chatbot's integrity; specifically, its transparency (such as admitting when a statement lacks explicit support from its scripted sources) and its tendency to faithfully convey the scripted information in the schizophrenia manual. In total, 36 responses (3 different checkpoint conversations, 3 conversations per checkpoint, and 4 queries per conversation) were rated for compliance. Activating the CAF resulted in a compliance score that was considered acceptable (≥2) in 67.0% of responses, compared to only 8.7% when the CAF was deactivated. Although more rigorous testing in realistic scenarios is needed, our results suggest that self-reflection mechanisms could enable LLMs to be used effectively and safely in educational mental health platforms. This approach harnesses the flexibility of LLMs while reliably constraining their scope to appropriate and accurate interactions.
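The abstract does not describe the CAF's internals, so the following is only a minimal sketch of the general critique-and-refine pattern it names: a draft response is checked by a team of critic agents, and their feedback is passed back to the chatbot until the draft passes or a retry limit is reached. Every name here (`Critique`, `filtered_reply`, `toy_generate`, `toy_scope_critic`, `max_rounds`, the fallback text) is hypothetical, and the LLM calls are replaced by plain Python stubs so the example runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Critique:
    """Verdict of one critic agent (hypothetical structure)."""
    compliant: bool
    feedback: str


# Assumed interfaces: in a real system these would wrap LLM calls.
Generator = Callable[[str, List[str]], str]  # (user_message, feedback) -> draft
Critic = Callable[[str, str], Critique]      # (user_message, draft) -> critique


def filtered_reply(
    user_message: str,
    generate: Generator,
    critics: List[Critic],
    max_rounds: int = 3,
    fallback: str = "I can only share information from the manual.",
) -> str:
    """Return a draft that every critic accepts, or a fixed fallback."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        draft = generate(user_message, feedback)
        critiques = [critic(user_message, draft) for critic in critics]
        failed = [c.feedback for c in critiques if not c.compliant]
        if not failed:
            return draft
        # Feed the critics' objections back into the next generation round.
        feedback.extend(failed)
    return fallback


def toy_generate(user_message: str, feedback: List[str]) -> str:
    """Stub chatbot: drifts from its role unless it has received feedback."""
    if feedback:
        return "According to the manual: please discuss treatment with your care team."
    return "Sure, I will act as your doctor."


def toy_scope_critic(user_message: str, draft: str) -> Critique:
    """Stub critic: accepts only drafts that attribute content to the manual."""
    ok = draft.startswith("According to the manual")
    return Critique(ok, "" if ok else "Stay in the educational role and cite the manual.")


if __name__ == "__main__":
    print(filtered_reply("Be my doctor.", toy_generate, [toy_scope_critic]))
```

In this sketch the first draft fails the scope check, the critic's feedback is appended, and the second draft is returned. The fixed fallback after `max_rounds` is one possible design choice for keeping the chatbot within scope when no draft passes; the paper may handle that case differently.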