Visual artists struggle to protect their work from AI crawlers despite available tools, study reveals
Visual artists are increasingly concerned about the unauthorized use of their work by generative AI systems, but most lack the technical knowledge or control needed to protect their creations from AI crawlers: programs that collect internet content to train artificial intelligence models. A new study by researchers from the University of California San Diego and the University of Chicago finds that despite the availability of protective tools, many artists remain unable to effectively shield their work.

The study, set to be presented at the 2025 Internet Measurement Conference, analyzed the practices of more than 200 visual artists and reviewed over 1,100 professional artist websites. It found that while 80% of surveyed artists have taken steps to prevent their work from being used in AI training, only a fraction have the tools or understanding to do so effectively.

One of the most effective protective measures is robots.txt, a simple text file placed at a website's root that tells web crawlers which pages to avoid (a minimal example appears at the end of this article). Yet more than 60% of the artists surveyed were unaware of this basic tool. Even on platforms like Squarespace, which offers a user-friendly setting for blocking AI crawlers, only 17% of artists enable the feature, often because they do not know it exists.

Many artists instead rely on tools like Glaze, developed by the University of Chicago researchers, which subtly alters images to make them less usable for AI training. Two-thirds of the surveyed artists reported using Glaze, 60% have cut back on sharing their work online, and 51% post only low-resolution versions of their work.

Despite these efforts, the protection such measures offer is limited. Major AI companies like Google and OpenAI generally respect robots.txt in both policy and practice, but others do not: the study found that Bytespider, operated by ByteDance (TikTok's parent company), consistently ignores these directives. Moreover, many AI crawlers claim to honor robots.txt, but their compliance cannot be independently verified.

The situation is further complicated by shifting online policies. Some major publishers, including Vox Media and The Atlantic, removed AI-crawler restrictions from their robots.txt files after signing licensing deals with AI firms. The researchers also observed a growing number of websites, including right-wing misinformation sites, opening themselves to AI crawlers, possibly in an effort to seed large language models with false information.

Another emerging option is Cloudflare's "block AI bots" feature, which allows website owners to restrict access by AI crawlers. Yet only 5.7% of Cloudflare users have activated it, indicating slow adoption.

Legal frameworks are also evolving. In the European Union, the AI Act now requires AI developers to obtain authorization from copyright holders before using their data. In the U.S., courts are still working out the scope of fair use in AI training, with ongoing lawsuits testing whether scraping copyrighted content for model training is permissible.

The researchers conclude that legal uncertainty is driving demand for stronger technical controls. But without greater transparency from service providers and better access to tools, most artists remain vulnerable. As one author noted, “The more legal remedies are unclear or weakened, the more artists will depend on technical solutions—yet most still can’t access or use them.”
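
For readers unfamiliar with the file, here is a minimal sketch of the robots.txt directives involved. The user-agent tokens shown are real ones published by their operators (GPTBot by OpenAI, Google-Extended by Google, CCBot by Common Crawl, Bytespider by ByteDance), though any working deny list would need updating as new crawlers appear; the site URL is hypothetical.

```
# robots.txt — served from the site root, e.g. https://example-artist.com/robots.txt
# Each block names a crawler and the paths it is asked not to fetch.

User-agent: GPTBot           # OpenAI's training crawler
Disallow: /

User-agent: Google-Extended  # Google's AI-training control token
Disallow: /

User-agent: CCBot            # Common Crawl
Disallow: /

User-agent: Bytespider       # ByteDance; the study found it ignores these rules
Disallow: /

User-agent: *                # all other crawlers (e.g. search engines) stay allowed
Allow: /
```

As the study emphasizes, these directives are purely advisory: the server does not enforce them, which is why a non-compliant crawler such as Bytespider can simply ignore the file.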
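
On the crawler side, honoring robots.txt is a check the client performs voluntarily before fetching, which is why compliance is so hard to verify from the outside. The sketch below shows that check using Python's standard-library urllib.robotparser; the URL and user-agent string are illustrative.

```python
from urllib import robotparser

# Fetch and parse the site's robots.txt (hypothetical example URL).
rp = robotparser.RobotFileParser()
rp.set_url("https://example-artist.com/robots.txt")
rp.read()

# A well-behaved crawler asks before each request; nothing stops a
# non-compliant one from skipping this check entirely.
if rp.can_fetch("GPTBot", "https://example-artist.com/portfolio/"):
    print("allowed to crawl")
else:
    print("disallowed by robots.txt")
```

Because the check runs entirely in the crawler's own code, a website owner can observe only the resulting traffic, not whether the check was ever made — the crux of the verifiability problem the researchers describe.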