ThenAI says "Hello world"

Jan 26, 2026

The secret ThenAI master plan (just between you and me)

One of the hardest parts about building something new is knowing when to stop being quiet about it.

Over the last six months, I've had more than 100 conversations with people across the AI and data ecosystem: enterprise data leaders, AI researchers, PE investors, compliance officers, telco executives, healthcare innovators, and AI founders on both sides of what I'm building. Every single conversation reinforced the same truth: AI companies are desperate for quality training data, enterprises are sitting on billions of dollars in untapped data assets, and nobody is solving this comprehensively.

The crisis nobody wants to talk about

As you may know, the AI industry is experiencing an unprecedented data crisis. Companies are spending $20 billion annually acquiring training data. OpenAI, Anthropic, and others are running out of internet data to train on. Models are plateauing. The entire industry knows this is coming.

Here's what they're doing about it: cobbling together Frankenstein datasets from Kaggle, Hugging Face, and scattered public sources; paying anywhere from $150k to $250k for single data batches of questionable quality from bespoke providers; or hoping synthetic data will be enough to feed data-hungry models.

Meanwhile, enterprises are sitting on the solution.

Around 60 to 80% of enterprise data is "data exhaust." It's not customer information. It's not competitively sensitive. It's the byproduct of doing business. Telecommunications companies have device connection patterns from millions of modems. Sports organizations have unused filming footage. Insurance companies have diagnostic data they're required to collect for compliance but never monetize.

This data is collecting dust while costing companies money in storage fees. All the while, AI companies would pay substantial sums for access to it.

The problem isn't scarcity. It's fear, friction, and infrastructure.

Why this market doesn't exist yet

Fear. The biggest barrier is risk aversion, especially at large companies. But that caution leaves enormous amounts of money on the table. Most of this data isn't competitively sensitive. It could be a major revenue source instead of an expense.

The infrastructure cost. To monetize data properly, you need multi-format support for structured, semi-structured, and unstructured data. You need versioning and provenance tracking. You need data validation, quality verification, and authentication. You need escrow systems for trust. You need ETL in and out of legacy systems where most enterprise data actually lives.

Building all of this in-house for unknown transaction volumes makes no economic sense. But it makes perfect sense for a shared vendor to build it once, just as cloud providers did in the early days of the cloud computing we now take for granted.

Trust (or the lack of it). There's a race to the bottom in the aggregation space, with thousands of vendors competing on price. Meanwhile, "premium" bespoke data providers charge anywhere from $100k to $250k for single batches of variable quality. Buyers don't trust the quality. Sellers don't trust the privacy protections.

The technology exists. The will does not.

I spent four years at Skyflow, where we solved PII, PCI, and PHI compliance challenges that were previously thought impossible. I've seen anonymization, data validation, and quality verification work at scale. The infrastructure patterns exist.

What's missing is someone willing to build the shared service layer that makes data transactions economically viable for everyone.

Consider the parallel to real estate. Nobody expects buyers and sellers to handle their own title searches, escrow, and legal compliance. Neutral third parties exist because trust requires infrastructure. Data transactions need the same thing.

Amazon doesn't ask every seller to build their own fulfillment center. They built the infrastructure once and created a marketplace worth hundreds of billions. That's the model.

What we are building

ThenAI is a two-sided marketplace for AI training data. We handle everything that's been keeping this market from existing.

For data sellers: A revenue-share model in which we absorb setup and processing costs. We take a variable rate of 10 to 40%, depending on service level, similar to Amazon's fulfillment model. We only get paid when we create value for you.

For AI buyers: End the Frankenstein approach. Stop manually combining disparate datasets that don't match your distribution requirements. Get access to domain-specific data with pre-validation and quality checks. Get exclusive datasets for competitive advantage instead of the same public data everyone else is using.

For the industry: We're building the infrastructure once and reusing it many times. That's a fixed-cost advantage. When a telco monetizes its customer usage patterns through ThenAI, it's helping unlock data sharing across healthcare, finance, and beyond. Shared infrastructure with high fixed costs and low marginal costs is how markets scale.

We're also inviting industry leaders to join us in creating the gold standard reference frameworks this market desperately needs. Quality verification, authentication standards, ethical guidelines. These don't exist yet because nobody has built the market that needs them.

Why me, why now

I spent the last fifteen years building toward this exact moment, though I didn't know it at the time.

At Accenture, I worked on Fortune 500 ERP systems and saw firsthand how much data gets generated and how much just sits there unused. Storage fees add up while potential revenue goes unrealized.

At Zuora, I watched companies around the world struggle to reinvent their monetization models, trying to extract value from assets they already possessed.

At AppZen, I learned how transformative good data is for AI products and how devastating bad data can be.

At Skyflow, I saw the data sourcing lifecycle up close: the technical challenges of privacy preservation and the complexity of data transformation.

Every role showed me a different piece of the puzzle. The data exists. The demand exists. The technology exists. What's been missing is someone willing to build the neutral platform that makes the transactions possible.

The master plan

The market timing creates urgency. In the post-LLM era, vision-language models need fewer but higher-quality examples: three to six instead of thousands. Better data provides the edge. Wall Street has spent 40 to 50 years using data as a differentiator. AI companies will do the same, and the window to establish this market is now.

Here's the plan:

  1. Build infrastructure for high-value verticals where pain is acute and sellers are ready

  2. Use that traction to expand horizontally across more industries as trust and understanding grow

  3. Use that scale to drive down costs through shared infrastructure while maintaining quality standards

  4. Establish industry standards for data quality verification, authentication, and ethics that don't exist today

Don't tell anyone (except I'm telling you right now because I need your help)

That's where you come in

The overarching purpose of ThenAI is to accelerate the shift from a data-hoarding economy to a data-sharing economy. That's the sustainable path for advancing AI.

We're looking for a small number of partners to start this conversation with us.

If you're an enterprise sitting on valuable data exhaust you're not using, we want to talk. If you're an AI company tired of cobbling together inadequate training sets, we want to talk. If you're someone who believes this market needs to exist and wants to help build it, we want to talk.

The early stages will require patience. We'll need significant handholding on both sides. But we're putting skin in the game. Revenue share means we only succeed when you succeed. We're building the shared infrastructure so this market can finally exist at scale.

The data exists. The demand exists. The technology exists. What's been missing is someone willing to bridge the gap.

That's where ThenAI comes in.

If you want to be part of building the data-sharing economy, reach out via email or schedule a call.

Founder and CEO of ThenAI,

Chih-Hsuan Wu