Published: Dec. 9, 2024
Language: English
JMIR AI, Journal Year: 2024, Volume and Issue: unknown
People with schizophrenia often present cognitive impairments that may hinder their ability to learn about their condition. Education platforms powered by large language models (LLMs) have the potential to improve the accessibility of mental health information. However, the black-box nature of LLMs raises ethical and safety concerns regarding the controllability of chatbots. In particular, prompt-engineered chatbots may drift from their intended role as the conversation progresses and become more prone to hallucinations. Our objective was to develop and evaluate a Critical Analysis Filter (CAF) system that ensures an LLM-powered chatbot reliably complies with its predefined instructions and scope while delivering validated information. As a proof of concept, we built an educational chatbot, powered by GPT-4, that can dynamically access information from a manual written for people with schizophrenia and their caregivers. In the CAF, a team of LLM agents is used to critically analyze and refine the chatbot's responses and to deliver real-time feedback to the chatbot. To assess the ability of the CAF to re-establish the chatbot's adherence to its instructions, we generated three conversations (by conversing with the chatbot with the CAF disabled) wherein the chatbot starts drifting towards various unintended roles. We used these checkpoint conversations to initialize automated conversations between the chatbot and adversarial chatbots designed to entice it away from its role. Conversations were repeatedly sampled with the CAF enabled and disabled, respectively. Three human raters independently rated each response according to criteria developed to measure the chatbot's integrity; specifically, its transparency (such as admitting when a statement lacks explicit support from its scripted sources) and its tendency to faithfully convey the information in the manual. In total, 36 responses (3 different checkpoint conversations, 3 conversations per checkpoint, 4 queries per conversation) were rated for compliance. Activating the CAF resulted in an integrity score considered acceptable (≥2) for 67.0% of responses, compared with only 8.7% when the CAF was deactivated. Although rigorous testing in realistic scenarios is needed, our results suggest that self-reflection mechanisms could enable LLMs to be used effectively and safely in mental health education platforms. This approach harnesses the flexibility of LLMs while constraining them to appropriate and accurate interactions.
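The abstract describes the CAF as a team of LLM agents that critique a draft response and feed corrections back to the chatbot in real time. A minimal sketch of such a critique-and-refine loop is shown below; the `chatbot` and analyst callables stand in for chat-model calls (e.g., to GPT-4), and all names, the 0-3 scoring scale, and the retry logic are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Critique:
    score: int      # hypothetical 0-3 integrity rating; acceptable if >= threshold
    feedback: str   # real-time feedback returned to the chatbot on failure

def run_caf(
    query: str,
    chatbot: Callable[[str, str], str],         # (query, feedback) -> draft response
    analysts: list[Callable[[str], Critique]],  # each agent rates a draft
    threshold: int = 2,
    max_rounds: int = 3,
) -> str:
    """Return the first draft all analyst agents deem acceptable,
    or the last draft if max_rounds is exhausted."""
    feedback = ""
    draft = chatbot(query, feedback)
    for _ in range(max_rounds):
        critiques = [analyze(draft) for analyze in analysts]
        if all(c.score >= threshold for c in critiques):
            return draft  # every agent approved: release the response
        # Collect feedback from dissenting agents and redraft.
        feedback = " ".join(c.feedback for c in critiques if c.score < threshold)
        draft = chatbot(query, feedback)
    return draft
```

In this sketch the filter sits between the chatbot and the user, so an off-role or unsupported draft is revised before it is ever shown, mirroring the paper's goal of constraining a flexible model to validated content.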
Citations: 0