Until now, AI companies have drawn on YouTube’s vast store of videos, captions, and other material without seeking permission. Calliope Networks, a startup focused on licensing content for AI, aims to change that with an initiative called “License to Scrape,” which targets YouTube creators directly.
“There’s clear interest from AI companies to gather YouTube content, as evidenced by their ongoing actions. Our objective is to develop a tool that facilitates legal and straightforward access for them,” says Calliope Networks CEO Dave Davis. Unlike large social platforms such as Reddit, YouTube has not struck deals with major AI firms to allow scraping of its videos. The advantage of License to Scrape is that it sidesteps the need for YouTube itself to hand over a massive archive of content all at once; instead, it organizes a collective of creators to negotiate a blanket license.
Davis, who comes from traditional media licensing and left his job at the Motion Picture Licensing Corporation to found Calliope, believes the AI industry will move away from unapproved scraping and toward a licensing model. That view is gaining traction amid a surge of startups focused on AI data licensing. Calliope Networks is a founding member of the Dataset Providers Alliance, a trade group that advocates requiring consent from all creators and rights holders before their work is scraped.
Davis envisions the process working like this: YouTube creators willing to license their content sign an agreement with Calliope, which in turn sublicenses the material for training generative AI models. A substantial amount of content will be needed to make the offering attractive to AI companies, so the initiative has to recruit enough YouTubers before it can get off the ground. Calliope would keep a share of the licensing fees collected from the AI firms.
There is no direct precedent for this approach in the AI sector, so Davis has modeled the scraping license on established practices in the entertainment industry, such as those of Broadcast Music Inc. (BMI) and the American Society of Composers, Authors, and Publishers (ASCAP), both of which issue blanket licenses for music.
“We are still in the preliminary recruitment phase,” Davis notes. He estimates that Calliope will need to offer a minimum of 25,000 to 50,000 hours of YouTube content to gain traction with AI companies. The sheer amount of footage required underscores why creators may need to band together to monetize AI training: volume matters in this business, because video generators rely on vast quantities of data.
So far, no major influencers have publicly backed the license, but Calliope has already approached several influencer marketing agencies, including Viral Nation, to sign up their creator clients. “The feedback from creators has been extremely positive,” says Bianca Serafini, head of content licensing at Viral Nation. She expects a significant portion of the agency’s nearly 900 YouTube creators to get involved. “This is the first time we’ve been presented with something like this.”
So what does YouTube make of the initiative? Davis hasn’t worked directly with the platform on the project, but he believes it aligns with YouTube’s goals. “I think YouTube aims to empower creators with greater control,” Davis explains.
While YouTube declines to comment on specific licensing firms, it does encourage users to strike their own agreements. “In general, creators can negotiate arrangements with third-party companies regarding their content on our platform,” says YouTube spokesperson Jack Malon. He points to a recent company blog post outlining its commitment to giving YouTubers “more control” in the era of AI. For YouTube, the key word is authorization, or explicit permission: “Unauthorized access to creator content is against YouTube’s Terms of Service, and we will continue to implement measures to ensure third parties uphold these terms.”
The success of the License to Scrape program hinges not only on attracting prominent YouTubers but also on a significant shift in how AI companies approach foundation model training. With more than 30 copyright cases over permissionless data scraping now working their way through US courts, such a shift may eventually be legally required. And because text-to-video generation tools demand especially large amounts of high-quality data to work well, the hunt for new data sources may force a change in tactics regardless.
Until then, it’s unclear whether major AI firms intend to stop scraping what they describe as “publicly available” information from sites like YouTube. When they do strike deals that cover foundation model training, such as AI startup Runway’s partnership with movie studio Lionsgate, the data typically isn’t “publicly available.” Most of the agreements they are making with platforms and publishers are aimed at supplying content for AI search products like SearchGPT rather than at foundation model training. Recently, after a legal threat from the popular UK-based parenting forum Mumsnet, OpenAI told WIRED it is primarily interested in licensing large datasets that aren’t publicly accessible.
In the meantime, proponents of the initiative believe it is essential to move forward rather than wait for AI companies to show interest. “We just have to get ahead of this,” says Serafini.