A recent scientific paper co-authored by Microsoft researchers scrutinized the “trustworthiness” and potential toxicity of large language models (LLMs), specifically focusing on OpenAI’s GPT-4 and its predecessor, GPT-3.5.
The research team found that GPT-4, although generally more reliable than GPT-3.5 in standard benchmarks, is more susceptible to “jailbreaking” prompts that bypass the model’s safety measures. These prompts can lead GPT-4 astray, following misleading instructions more precisely and generating harmful content.
The co-authors’ blog post accompanying the paper states, “We also find that although GPT-4 is usually more trustworthy than GPT-3.5 on standard benchmarks, GPT-4 is more vulnerable given jailbreaking system or user prompts, which are maliciously designed to bypass the security measures of LLMs, potentially because GPT-4 follows (misleading) instructions more precisely.”
Surprisingly, Microsoft’s involvement in this research, which appears to cast OpenAI’s GPT-4 in a negative light, can be attributed to a collaboration with Microsoft product groups to confirm that potential vulnerabilities do not impact customer-facing services.
It’s important to note that the research team worked with Microsoft product groups to confirm that the potential vulnerabilities identified do not impact current customer-facing services. This is in part true because finished AI applications apply a range of mitigation approaches to address potential harms that may occur at the model level of the technology.
The blog post assures that mitigation approaches are in place to address potential harms at the model level, and OpenAI has been made aware of the vulnerabilities identified in the system.
Furthermore, the study revealed that GPT-4 was more prone to leaking private and sensitive data, including email addresses, compared to other LLMs.
As the scientific community continues to explore the capabilities of LLMs, ensuring their ethical and responsible deployment remains a critical priority for the industry. Let us know your views on this in the comments section below.