OpenAI is taking up the mantle against AI “hallucinations,” the company announced Wednesday, with a new method for training AI models.
The research comes at a time when misinformation stemming from AI systems is more hotly debated than ever, amid the generative AI boom and the lead-up to the 2024 U.S. presidential election. OpenAI accelerated the generative AI boom last year when it released ChatGPT, its chatbot powered by GPT-3.5 and GPT-4, which surpassed 100 million monthly users in two months, reportedly setting a record for the fastest-growing app. To date, Microsoft has invested more than $13 billion in OpenAI, and the startup’s value has reached roughly $29 billion.
AI hallucinations occur when models like OpenAI’s ChatGPT or Google’s Bard fabricate information entirely while behaving as if they are stating facts. One example: in Google’s own February promotional video for Bard, the chatbot makes an untrue claim about the James Webb Space Telescope. More recently, ChatGPT cited “bogus” cases in a New York federal court filing, and the New York attorneys involved may face sanctions.
“Even state-of-the-art models are prone to producing falsehoods – they exhibit a tendency to invent facts in moments of uncertainty,” the OpenAI researchers wrote in the report. “These hallucinations are particularly problematic in domains that require multi-step reasoning, since a single logical error is enough to derail a much larger solution.”
OpenAI’s potential new strategy for fighting the fabrications: train AI models to reward each individual correct step of reasoning on the way to an answer, instead of rewarding only a correct final conclusion. The approach is called “process supervision,” as opposed to “outcome supervision,” and could lead to more explainable AI, according to the researchers, since the strategy encourages models to follow a more human-like chain-of-“thought” approach.
“Detecting and mitigating a model’s logical mistakes, or hallucinations, is a critical step towards building aligned AGI [or artificial general intelligence],” Karl Cobbe, mathgen researcher at OpenAI, told CNBC, noting that while OpenAI did not invent the process supervision approach, the company is helping to push it forward. “The motivation behind this research is to address hallucinations in order to make models more capable at solving challenging reasoning problems.”
OpenAI has released an accompanying dataset of 800,000 human labels it used to train the model mentioned in the research paper, Cobbe said.
Ben Winters, senior counsel at the Electronic Privacy Information Center and leader of its AI and human rights project, expressed skepticism, telling CNBC he would be interested to see the full dataset and accompanying examples.
“I just don’t think that this alone does any significant mitigation of concerns about misinformation and incorrect results… when it’s actually being used in the wild,” Winters said. He added, “It definitely matters whether they plan on implementing whatever they have found through their research here [into their products], and if they’re not, that does bring some fairly serious questions about what they are willing to release into the public.”
Suresh Venkatasubramanian, director of the center for tech responsibility at Brown University, told CNBC that because it’s not clear whether the OpenAI paper has been peer-reviewed or reviewed in another format, he views the research as more of a preliminary observation than anything else.
“This will need to shake out in the research community before we can say anything certain about this,” Venkatasubramanian said. “In this world, there are a lot of results that come out very regularly, and because of the overall instability in how large language models work, what might work in one setting, model and context may not work in another setting, model and context.”
Venkatasubramanian added, “Some of the hallucinatory stuff that people have been concerned about is [models] making up citations and references. There is no evidence in this paper that this would work for that… It’s not that I’m saying it won’t work; I’m saying that this paper does not provide that evidence.”
OpenAI did not respond to a request for comment asking whether the research had been externally reviewed in any capacity, or when, if ever, the company plans on implementing the new strategy into ChatGPT and its other products.
“It’s certainly welcome to see companies trying to tinker with the development of their systems to try and reduce these kinds of errors – I think what’s key is to interpret this as corporate research, in light of the many barriers that exist to deeper forms of accountability,” Sarah Myers West, managing director of the AI Now Institute, told CNBC.
West added, “[OpenAI is] releasing a small dataset of human-level feedback with this paper, but it hasn’t provided basic details about the data used to train and test GPT-4. So there’s still a tremendous amount of opacity that is challenging any meaningful accountability efforts in the field of AI, even as these systems are directly affecting people already.”