Invisible patterns intentionally inserted into AI-generated text can serve as a clue to whether the words you are reading were written by a human or by a machine.
These ‘watermarks’ are invisible to the naked eye, but they let computers estimate the likelihood that a piece of text was produced by an AI system. If watermarks were built into large language models, they could help avoid some of the problems these models have already caused.
For example, since OpenAI launched its chatbot ChatGPT last November, students have already begun using it to cheat by having it write their essays. And CNET, a news website, was embroiled in plagiarism accusations and had to publish corrections after using AI to write articles. Building watermarking into these systems would make it far easier to flag AI-generated articles like these before they cause problems.
Watermarks of this kind have already been used in studies to identify AI-generated text with near-perfect accuracy. For example, researchers at the University of Maryland were able to detect text produced by Meta’s open-source language model OPT-6.7B using a detection algorithm they built themselves. The paper describing the work has not yet been peer-reviewed, and the team plans to make the watermark freely available around February 15.
AI language models work by predicting and generating one word at a time. After each word, the watermarking algorithm randomly splits the model’s vocabulary into a ‘green list’ and a ‘red list’, then prompts the model to choose its next word from the green list.
The more green-listed words a passage contains, the more likely it is that the text was machine-generated; text written by a person tends to contain a more random mix of green and red words. For example, following the word ‘beautiful’, the watermarking algorithm might classify ‘flower’ as green and ‘orchid’ as red. An AI model using the watermarking algorithm would then be more likely to use the word ‘flower’ than ‘orchid’, explains Tom Goldstein, an assistant professor at the University of Maryland who worked on the research.
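To make the green- and red-list idea concrete, here is a minimal Python sketch of both the generation and detection sides. The toy vocabulary, the SHA-256 hash of the preceding word used to seed the split, and the 50/50 green/red ratio are illustrative assumptions for this sketch, not the exact design of the Maryland system.

```python
import hashlib
import random

# Toy vocabulary standing in for a real language model's word list (illustrative only).
VOCAB = ["flower", "orchid", "garden", "bloom", "petal", "rose", "sky", "stone"]

def green_list(previous_word, green_fraction=0.5):
    """Pseudo-randomly choose a green list, seeded by the preceding word so the
    detector can recompute exactly the same split later."""
    seed = int(hashlib.sha256(previous_word.encode("utf-8")).hexdigest(), 16)
    rng = random.Random(seed)
    words = sorted(VOCAB)
    rng.shuffle(words)
    return set(words[: int(len(words) * green_fraction)])

def pick_next_word(previous_word, candidates):
    """Generation side: steer the model's choice toward green-listed candidates."""
    greens = green_list(previous_word)
    allowed = [w for w in candidates if w in greens] or list(candidates)
    return random.choice(allowed)

def green_share(text):
    """Detection side: the fraction of in-vocabulary words that land on the green
    list of the word before them. Unwatermarked text should hover near 0.5."""
    words = [w.lower().strip(".,") for w in text.split()]
    pairs = [(prev, cur) for prev, cur in zip(words, words[1:]) if cur in VOCAB]
    if not pairs:
        return 0.0
    hits = sum(1 for prev, cur in pairs if cur in green_list(prev))
    return hits / len(pairs)

# Example: a watermarked model consistently prefers whichever candidates the
# seeded split marks green after the word "beautiful".
print(pick_next_word("beautiful", ["flower", "orchid"]))
print(green_share("the beautiful flower bloomed in the garden"))
```

In the real system, the split would be made over the model’s full token vocabulary at every step, and detection would rely on a statistical test over an entire passage rather than a raw fraction of a few words.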
ChatGPT, part of a new generation of large language models, produces text fluent enough to be mistaken for human writing. These AI models present information with great confidence, yet they are notorious for producing output riddled with falsehoods and bias. To an untrained reader, distinguishing text written by an AI model from text written by a human can be nearly impossible. And the breakneck pace of AI development means that existing tools for spotting AI-generated text quickly become outdated as newer, more capable models appear, leaving AI developers in a constant race to build safeguards that keep up with the latest models.
John Kirchenbauer, a researcher at the University of Maryland who worked on the watermarking project, describes the current situation as a kind of lawless zone. He hopes the watermarking tool will bolster efforts to distinguish AI-generated text. Kirchenbauer says the tool his team developed could be adapted to work with any AI language model that generates text by predicting the word that follows.
Irene Solaiman, policy director at the AI startup Hugging Face, says the results of the study are promising and timely. Solaiman previously worked on AI output detection as a researcher at OpenAI, but she was not involved in this research.
“As the models are applied more broadly, more and more people outside the field of AI, with no computer science training, will need ways to detect AI-generated text,” Solaiman says.
However, the new method has limitations. Watermarking only works if the creators of a large language model build it in from the start. OpenAI is known to be researching ways to detect AI-generated text, including watermarks, but it has disclosed few details. The company generally shares little about how ChatGPT works or how it is trained, and it is even more restrictive about letting outsiders modify its language models. OpenAI did not immediately respond to MIT Technology Review’s request for comment.
Solaiman says it is unclear whether the new findings will carry over to language models built by companies other than Meta, such as ChatGPT. The model used to test the watermark is also small compared with popular language models like ChatGPT.
More testing is needed to explore the various ways someone might try to defeat the watermark, but the researchers argue that would-be evaders have few options. “To get rid of a watermark, you have to remove about half of the words in a passage of text,” says Goldstein.
“I wouldn’t take the risk of underestimating high school students,” Solaiman adds, “but most ordinary people won’t be able to tamper with this kind of watermark.”