The tool, called PhotoGuard, works like a protective shield by altering photos in tiny ways that are invisible to the human eye but prevent them from being manipulated. If someone tries to use an editing app based on a generative AI model such as Stable Diffusion to manipulate an image that has been “immunized” by PhotoGuard, the result will look unrealistic or warped.
Right now, “anyone can take our image, modify it however they want, put us in very bad-looking situations, and blackmail us,” says Hadi Salman, a PhD researcher at MIT who contributed to the research. The work was presented at the International Conference on Machine Learning this week.
PhotoGuard is “an attempt to solve the problem of our images being manipulated maliciously by these models,” says Salman. The tool could, for example, help prevent women’s selfies from being made into nonconsensual deepfake pornography.
The need to find ways to detect and stop AI-powered manipulation has never been more urgent, because generative AI tools have made such manipulation quicker and easier than ever before. In a voluntary pledge with the White House, leading AI companies such as OpenAI, Google, and Meta committed to developing such methods in an effort to prevent fraud and deception. PhotoGuard complements one of these methods, watermarking: it aims to stop people from using AI tools to tamper with images in the first place, whereas watermarking uses similar invisible signals to let people detect AI-generated content after it has been created.
The MIT team used two different techniques to stop images from being edited using the open-source image generation model Stable Diffusion.
The first technique is called an encoder attack. PhotoGuard adds imperceptible signals to the image so that the AI model interprets it as something else. For example, these signals could cause the AI to categorize an image of, say, Trevor Noah as a block of pure gray. As a result, any attempt to use Stable Diffusion to edit Noah into other situations would produce unconvincing results.
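To make the idea concrete, here is a minimal sketch of an encoder-style attack on a toy model. It is not the researchers' code. It assumes the perturbation is found with projected gradient descent, a common way to build such signals, and it swaps Stable Diffusion's image encoder for a small random linear map. The names `encoder_attack`, `W`, `eps`, and `step` are hypothetical.

```python
import numpy as np

def encoder_attack(image, target, W, eps=0.05, step=0.01, iters=200):
    """Toy projected gradient descent (an assumption, not the authors' code).

    Finds a perturbation no larger than eps per pixel so that the encoding
    of the perturbed image moves toward the encoding of `target`.
    W is a hypothetical linear stand-in for a real image encoder.
    """
    z_target = W @ target
    delta = np.zeros_like(image)
    for _ in range(iters):
        residual = W @ (image + delta) - z_target
        grad = W.T @ residual                   # gradient of 0.5 * ||residual||^2
        delta = delta - step * np.sign(grad)    # signed gradient step
        delta = np.clip(delta, -eps, eps)       # keep the change small
        delta = np.clip(image + delta, 0.0, 1.0) - image  # keep pixels in [0, 1]
    return image + delta

# Demo on random data: a 64-"pixel" image and a flat gray target.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 64)) / 8.0
image = rng.uniform(0.0, 1.0, size=64)
target = np.full(64, 0.5)

immunized = encoder_attack(image, target, W)
dist_before = np.linalg.norm(W @ image - W @ target)
dist_after = np.linalg.norm(W @ immunized - W @ target)
print(f"latent distance to gray: {dist_before:.3f} -> {dist_after:.3f}")
```

In this toy setup no pixel changes by more than `eps`, yet the encoder's view of the image shifts toward the gray target. A real encoder is nonlinear, so the gradient would come from automatic differentiation rather than a single matrix product.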
The second, more effective technique is called a diffusion attack. It disrupts the way the AI models generate images, essentially by encoding images with secret signals that alter how the model processes them. By adding these signals to an image of Trevor Noah, the team managed to manipulate the diffusion model into ignoring its prompt and generating the image the researchers wanted. As a result, any AI-edited images of Noah would just look gray.
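The difference from the encoder attack is where the objective is measured: on the final edited image rather than on the encoder's output, which means the gradient has to be carried back through every generation step. The sketch below illustrates that on a toy pipeline. It is an assumption-laden illustration, not the team's implementation: `toy_edit` is a hypothetical stand-in for a diffusion-based editor, and `W`, `D`, `mix`, and the step count are invented for the example.

```python
import numpy as np

def toy_edit(x, prompt, W, D, steps=4, mix=0.3):
    """Hypothetical stand-in for a diffusion-based editor: encode the image,
    run a few nonlinear steps that pull the latent toward the prompt,
    then decode. Also returns the pre-activations needed for backprop."""
    z = W @ x
    pre_acts = []
    for _ in range(steps):
        a = (1.0 - mix) * z + mix * prompt
        pre_acts.append(a)
        z = np.tanh(a)
    return D @ z, pre_acts

def diffusion_attack(image, prompt, target, W, D,
                     eps=0.05, step=0.01, iters=100, mix=0.3):
    """Toy end-to-end attack: push the *edited output* toward `target`."""
    delta = np.zeros_like(image)
    for _ in range(iters):
        out, pre_acts = toy_edit(image + delta, prompt, W, D, mix=mix)
        g = D.T @ (out - target)            # gradient at the final latent
        for a in reversed(pre_acts):        # backpropagate through each step
            g = (1.0 - mix) * (1.0 - np.tanh(a) ** 2) * g
        grad = W.T @ g                      # gradient with respect to the pixels
        delta = np.clip(delta - step * np.sign(grad), -eps, eps)
        delta = np.clip(image + delta, 0.0, 1.0) - image
    return image + delta

# Demo on random data with a flat gray target for the edited result.
rng = np.random.default_rng(1)
W = rng.normal(size=(16, 64)) / 8.0
D = rng.normal(size=(64, 16)) / 4.0
image = rng.uniform(0.0, 1.0, size=64)
prompt = rng.normal(size=16)
target = np.full(64, 0.5)

immunized = diffusion_attack(image, prompt, target, W, D)
edit_before, _ = toy_edit(image, prompt, W, D)
edit_after, _ = toy_edit(immunized, prompt, W, D)
gap_before = np.linalg.norm(edit_before - target)
gap_after = np.linalg.norm(edit_after - target)
print(f"edited output distance to gray: {gap_before:.3f} -> {gap_after:.3f}")
```

The shift in this toy is small because the perturbation budget is tight and the pipeline is tiny. Backpropagating through many steps of a real diffusion model would also cost far more memory and compute than the encoder-only version.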
The work is “a good combination of a tangible need for something with what can be done right now,” says Ben Zhao, a computer science professor at the University of Chicago, who developed a similar protective method called Glaze that artists can use to prevent their work from being scraped into AI models.