They Need More: Why Your Expertise Is AI's Next Frontier
How Policy Shifts and Data Hunger Are Creating New Markets for Specialized Knowledge
America’s AI Action Plan and President Trump’s additional commentary made clear that the Administration will favor AI technology advancement over IP infringement concerns. Because the issue has been framed as a national security priority, there's little reason to expect any substantive political opposition, so you should assume this will be the government's posture for at least the next four years.
But this doesn’t mean IP rights holders and domain experts need to wait idly for the next election. Markets are already developing around this tension: better tooling to prevent IP scraping (as I wrote about in my last article), and brokers matching owners of desirable, hard-to-scrape IP with AI labs willing to pay a premium for high-quality, curated, and sometimes niche domain expertise and IP.
They Aren’t Coming to Help
If you've been hoping for regulation to curb the use of copyrighted IP by AI labs, it's unlikely to happen in the US through legislation. Some court rulings have favored rights holders, but those will create a patchwork of precedent, not universal protection. Given the administration's stated position and the political and economic power held by the big AI labs, we may even see legislation to overturn these rulings by explicitly classifying AI model training as "fair use" if they become a big enough problem for the labs.
This doesn't mean there's nothing you can do to maintain the value of your IP. Solutions are emerging in the market for protecting that value and monetizing it in new ways.
Assume Anything Public Will End Up in a Model
As a business owner or individual, you'll need to become the gatekeeper of your own IP. Be deliberate about what you make available online. Some assets benefit from broad reach: making it easy for web scrapers to consume your public sales and marketing materials and press releases may work in your favor, and consumable support documentation may help your customers serve themselves with off-the-shelf AI assistants rather than an in-house chatbot.
However, high-value IP that gives you a market advantage needs to be treated differently. Many companies already use paywalls for this purpose, but blanket paywall policies can also wall off material you might actually want to expose to web scrapers. New capabilities like the ones I covered in my previous article on Cloudflare are worth considering as you decide how to keep content available to humans while making it unavailable to web scraping bots (a rough sketch follows below). But beyond re-establishing good hygiene for publishing on the AI-era internet, there's a bigger opportunity emerging for domain experts and their IP.
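Before moving on, here's that sketch: a minimal example of selective exposure by user agent at the application layer. The crawler names are real but the list is illustrative and incomplete, the paths are hypothetical placeholders, and determined scrapers can spoof user agents, so treat this as one layer alongside CDN-level controls like Cloudflare's rather than a complete solution.

```python
# A rough sketch of selective exposure by user agent, not a complete
# bot-management solution. Crawler names are illustrative examples of
# AI-related crawlers; the path lists are hypothetical placeholders.
AI_CRAWLER_AGENTS = ("GPTBot", "ClaudeBot", "CCBot")

PUBLIC_PATHS = ("/press/", "/docs/", "/pricing")  # fine to let AI bots ingest

def allow_full_content(path: str, user_agent: str) -> bool:
    """Decide whether to serve full content for this request."""
    is_ai_crawler = any(bot.lower() in user_agent.lower() for bot in AI_CRAWLER_AGENTS)
    if not is_ai_crawler:
        return True   # humans and ordinary browsers pass through
    if any(path.startswith(p) for p in PUBLIC_PATHS):
        return True   # marketing and support content: let it be ingested
    return False      # default-deny: premium IP gets a licensing notice instead

# Example: an AI crawler asking for premium research is refused
# (simplified user-agent string for illustration).
print(allow_full_content("/research/annual-report", "GPTBot/1.2"))  # False
```

The same decision can of course live in robots.txt or at the CDN edge instead; the point is simply that "public to humans" and "available for model training" no longer have to be the same choice.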
They Need Mooooooooooore.
Here's the truth: the frontier model developers (OpenAI, Anthropic, Google, et al.) have already taken it all. They've essentially consumed the entire internet. And…it's not enough. It's not enough for them to deliver on the hype they've generated. They still need more. Specifically, they need domain expert knowledge, and they need it in large amounts. This is why the market for AI model training data is exploding. See Meta's recent multibillion-dollar investment in Scale AI and the surge in business for competitors like Handshake.ai: "Handshake AI connects domain experts with leading AI labs to test, challenge, and guide how LLMs learn by providing the human judgment AI needs."
But there's a problem with this business model. I've only been in the consulting business for a year, but over that time I've received the advice almost daily that you shouldn't charge an hourly rate; you can and should charge based on the size of the problem you're solving for the client. Domain experts are paid a decent hourly rate to provide their knowledge, but is that the true value of that knowledge to AI labs that have billions on the line now and trillions at stake in the future?
At the discipline level ("Physics", "Law", "Medicine", and so on), the general knowledge base is becoming commoditized, but new research is continuously produced and needs to be incorporated to keep models up to date. These are the areas on which the large AI training data brokers are focused, and their main customers are the big AI labs.
As I wrote about before, Cloudflare sees that its customers value solutions that protect their publications from being scraped without compensation. But those customers are still in the publishing business; they also value solutions that help them sell more of what they publish. Cloudflare's marketplace for web scraping empowers them to do just that.
There are other emerging players like Protege, which makes markets for film and audio rights holders: data curation tools to filter, tag, and otherwise make the IP more readily consumable by labs, plus brokering of the licensing deals themselves. These early movers are a strong signal of a broader opportunity to monetize IP before it ends up inadvertently (and without consent, for that matter) in the large-scale training data sets of the AI labs.
The Opportunity for Domain Experts and IP Owners
Even with all these efforts, however, there is still an enormous amount of domain knowledge that AI models can’t access today. The opportunity for those who hold it is to figure out, first, who would pay a premium for LLM-powered access to that knowledge and, second, how best to capture and package it for the AI systems those buyers would actually use.
While large-scale, generally applicable data sets attract the interest of the AI labs, there are many small to mid-sized organizations operating in market verticals that stand to benefit from AI tools enhanced with niche domain knowledge. The opportunity is there, but the market is still very nascent. There aren’t many people who have this kind of domain knowledge and also know how best to curate and package it for AI systems. Similarly, potential buyers aren’t yet aware of an immediate need beyond what they’ve already seen with off-the-shelf AI. But I expect players in a few niche areas to emerge with structured, curated knowledge bases that are easily integrated through MCP or other protocols as off-the-shelf agents.
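As a rough illustration of that integration path, here's a minimal sketch of a curated knowledge base exposed as an MCP server, assuming the MCP Python SDK's FastMCP interface. The server name, the lookup_guidance tool, and the in-memory "knowledge base" are all hypothetical stand-ins for a real curation pipeline.

```python
# A minimal sketch, assuming the MCP Python SDK (`pip install mcp`).
# The server name, tool, and in-memory knowledge base are hypothetical;
# a real offering would sit on top of a curated, versioned data store.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("niche-domain-knowledge")

# Stand-in for an expert-curated, regularly reviewed knowledge base.
KNOWLEDGE_BASE = {
    "usage-based pricing": "Curated guidance, with sources and a last-reviewed date.",
    "vertical-specific compliance": "Expert summary of the latest rules and common pitfalls.",
}

@mcp.tool()
def lookup_guidance(topic: str) -> str:
    """Return expert-curated guidance on a topic, if available."""
    # Naive keyword match; a real service would layer on retrieval,
    # authorship and review metadata, and licensing terms.
    for key, answer in KNOWLEDGE_BASE.items():
        if key in topic.lower():
            return answer
    return "No curated guidance on this topic yet."

if __name__ == "__main__":
    mcp.run()  # Serves the tool to any MCP-capable agent or assistant.
```

Any MCP-capable assistant can then call that tool the same way it calls its other tools, which is what makes this kind of packaging attractive: the expertise plugs into whatever agent the buyer already uses rather than requiring a bespoke integration.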
Cue the contrarians who will say “the foundation models can already do all that.” They can do a lot of very impressive things, indeed, but they don’t have domain knowledge that isn’t in their training data, and new domain knowledge is being developed constantly. Change is accelerating, and the latest methods and techniques in many disciplines keep evolving. Without a continuous process for incorporating new knowledge, AI systems will eventually fail to reflect even a heavily biased version of reality and will lose utility as they drift.
This is great news for domain experts.
If data is the new oil, domain expertise is the new gasoline.
It’s the refinement of data and experience into useful real-world insight. A market for it will emerge as a supply chain of domain experts, curators, brokers, and integrators figures out how best to work together to maximize its market value.
We’ll get more into some of these dynamics in future articles.

