Cloudflare is Meeting its AI Moment
Two recent moves push back on the wild west of IP scraping
The wild west strategy, led by OpenAI and followed by most others, has been to innovate first and deal with IP concerns later. This "ask for forgiveness, not permission" approach to training data, adopted by nearly all industry-leading AI model vendors, has led to incredible growth as people and businesses across sectors begin to get their heads around the potential of AI in their domains.
But it’s also left those with their IP available on the internet with little short-term ability to prevent their IP from being consumed by web crawlers as training data for new AI models that then generate billions in revenue with no compensation flowing back to the original creators. Cloudflare’s recent actions suggest they see a unique opportunity to rebalance this dynamic between AI labs and publishers.
Cloudflare’s announcements
Cloudflare announced on July 1 the opening of a private beta of a pay-per-crawl service. This service allows a domain owner to set a flat price per request for crawlers to access their content. They're also working on new tech to provide transparency between AI web crawlers and content owners, allowing web crawlers to identify themselves and giving content owners control over which crawlers they will allow.
Cloudflare has also blocked all AI web crawlers from their customers' web content by default. Simply switching customer domains from opt-out to opt-in will greatly reduce the flow of free data that AI labs have been using for training. Essentially, Cloudflare is standing up a marketplace for AI training data, putting themselves at the center of what may end up being an enormous market opportunity.
A marketplace for expertise
Training data, or more specifically the knowledge and expertise scraped from the internet, is the raw material of large language models. If you can reduce your COGS (i.e., the cost of training data) to zero, you'd be irresponsible not to do so, and so this has been the approach of the AI labs. However, the long-term consequence of this reduction in COGS is a reduction in the supply of high-quality raw materials (i.e., new creations, innovations, knowledge, and expertise).
When the price of goods decreases, supply decreases. When the supply of knowledge and expertise decreases, innovation decreases.
With Cloudflare’s move, a free market solution starts to emerge for how individuals and businesses can maintain dominion over their expertise. By giving their customers the option to allow, deny, or charge a per-crawl fee for this type of traffic, they’ve set up a marketplace for those seeking expertise and knowledge for model training, one that respects the exchange of value between the IP rights holder and the AI model vendor. Since the release of ChatGPT and the popular rise of multimodal AI models, the value of many types of expertise has been in free fall, but this may be a first step in restoring some price stability around expertise in the AI economy.
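To make the allow, deny, or charge options concrete, here is a minimal sketch of how a per-domain crawl policy could be evaluated. This is not Cloudflare's implementation or API: the `Policy`, `Decision`, and `decide` names, the `max_price_usd` parameter, and the choice of status codes (403 for a denied crawler, 402 Payment Required when a crawler has not agreed to the price) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Policy(Enum):
    """Hypothetical per-domain setting chosen by the content owner."""
    ALLOW = "allow"
    DENY = "deny"
    CHARGE = "charge"


@dataclass(frozen=True)
class Decision:
    status: int         # HTTP status code returned to the crawler
    charged_usd: float  # amount billed for this single request


def decide(policy: Policy,
           price_usd: float = 0.0,
           max_price_usd: Optional[float] = None) -> Decision:
    """Return what happens to one crawler request under the owner's policy.

    price_usd is the owner's flat price per request; max_price_usd is the
    most the crawler has said it will pay (None if it declared nothing).
    """
    if policy is Policy.ALLOW:
        return Decision(status=200, charged_usd=0.0)
    if policy is Policy.DENY:
        return Decision(status=403, charged_usd=0.0)
    # Policy.CHARGE: serve the content only if the crawler accepts the price.
    if max_price_usd is not None and max_price_usd >= price_usd:
        return Decision(status=200, charged_usd=price_usd)
    return Decision(status=402, charged_usd=0.0)
```

Under these assumptions, a crawler willing to pay up to $0.05 per request against a $0.01 flat price gets the content and is billed $0.01, while a crawler that declares no budget gets a 402 and no content. The point of the sketch is that the decision sits in front of the content and is made per request, which is what lets the owner set a price rather than just a yes or no.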
Cloudflare’s unique advantage
For Cloudflare, this is a major opportunity: a chance to grow from a startup darling and disruptive innovator in their market (not a small one already) and move upstream in the value chain, essentially as an IP broker. They are uniquely positioned to do this because of where they sit in the tech stack today and where they aren't playing compared to their competitors (i.e., frontier model development, which requires massive amounts of training data).
Solutions at higher levels of the stack, such as application-level paywalls and other gatekeeping methods, all have undesirable effects on end users. Cloudflare handles the problem as web traffic flows through their infrastructure, an approach that is much more effective at blocking undesirable traffic (web crawlers) and much less intrusive on desirable traffic (human users). No other player has the same combination of scale to deliver and incentive to do so.
Pressure on the hyperscalers
I expect the precedent Cloudflare's actions have set is more important than the short-term impact they might deliver. They control more than 20% of the internet infrastructure market, but what about the other 80%? This is where Cloudflare has a unique advantage. Their moves will put other large infrastructure providers in a tricky position. Microsoft, Google, and Amazon all have large stakes in continuing to collect model training data, with a ton of skin in the frontier model development game. These companies will be pressured by their customers for similar offerings, but they have less incentive to provide them, since doing so will increase their own model training costs in the short term.
Rebalancing the supply chain of expertise
If Cloudflare is successful in creating a market for content crawls, something many in the AI space, including myself, have been expecting to emerge since late 2023, I expect others will be forced to follow. For Cloudflare, this means moving upmarket and greatly expanding their opportunity for growth. For IP rights holders, it's a first step towards being able to sleep at night again in the AI era, with a plausible option for protecting one's IP while monetizing it in a new way.
And for the AI labs, although I expect this to be perceived initially as slowing their ability to innovate, it actually begins a critical rebalancing in the AI economy. If the value of expertise and knowledge becomes effectively zero, the slowing of innovation that follows will dwarf any short-term impacts of these market shifts. At least we can squint and start to see pathways to a viable, sustainable, and more equitable sharing of the profits of this new technology across the broader supply chain of expertise. A functioning supply chain of expertise is fundamental for AI models to continue to improve at the pace we’ve seen over the last several years.

