By reverse-engineering the front-end code of 200 AI companies and tracing their API calls, a developer discovered that 146 of them were actually wrappers around ChatGPT and similar platforms. Despite shipping the same off-the-shelf stack, some charge users as much as 75 times their direct API costs.


"Among 200 AI startups, 73% of the products are actually just 'shells' built on top of ChatGPT and Claude!"

This conclusion landed hard and sparked considerable controversy in the AI startup community.

Looking back to 2023, OpenAI CEO Sam Altman bluntly predicted that startups that are mere ChatGPT wrappers are doomed to die out.

Reality, however, went the other way: as ChatGPT exploded in popularity, a wave of entrepreneurial enthusiasm swept in, investment poured into wrapper-style startups, and some companies attracted considerable attention before even releasing a product.

Now, software engineer Teja Kusireddy has used data to uncover part of the truth behind this boom. He reverse-engineered the code of 200 AI companies and traced their API calls, discovering that many companies claiming "disruptive innovation" still rely on third-party services for their core functionality, merely adding a thin layer of "innovation" on top. The gap between market hype and reality is staggering.

So, are investors completely clueless, or are AI startups simply too good at hype? Where is the line between "self-developed" and "shell"? Below are the latest findings and conclusions, retold from Kusireddy's first-person account.

Why start reverse engineering?

Last month, I fell down an unexpected rabbit hole. It started with a very simple question, but by the end it had me doubting my understanding of the entire AI startup ecosystem.

It was 2 a.m. While debugging a webhook integration, I stumbled onto something that seemed off.

A company that claims to have "independently developed deep learning infrastructure" was calling OpenAI's API every few seconds.

That company had just raised $4.3 million from investors on the claim that "we have built a completely different AI technology."

At that moment, I decided to find out exactly how deep this went.

Methodology: how I ran the investigation

I didn't want to write a hot take based on intuition and gripes; I wanted data. Real data.

So I set up my tooling, and over the next three weeks I did the following:

  • Crawled the websites of 200 AI startups sourced from YC, Product Hunt, and LinkedIn "We're Hiring" posts;
  • Monitored each site's network traffic for 60 seconds;
  • Decompiled and analyzed their JavaScript bundles;
  • Compared captured API calls against a fingerprint database of known services;
  • Compared the claims on their marketing pages with their actual technical implementations.

I specifically excluded companies less than 6 months old (those teams are still finding their footing) and focused on startups that had already raised external funding and publicly claimed "proprietary technology."

The data stunned me.

73% of the companies showed a significant gap between the technology they claimed and what they had actually implemented.

The 200 AI startups fell into a handful of distinct categories.

But what truly shocked me wasn't just the number. What surprised me even more was that I wasn't even angry about it.

Below, I break the findings down into three recurring patterns.

Pattern 1: The "self-developed model" that is actually GPT-4 with extra steps

Every time I see phrases like "our self-developed large language model," I can almost predict what will be discovered next.

I guessed right 34 times out of 37.

The technical giveaways

These are the obvious clues in the outbound traffic:

  • Every user interaction with the "AI" sends a request to api.openai.com;
  • The request headers contain an OpenAI-Organization identifier;
  • Response times match OpenAI's API latency profile (150–400 ms for most queries);
  • Token usage lines up with GPT-4's billing tiers;
  • The exponential backoff on rate limits is identical to OpenAI's.

A real case

One company claims a "revolutionary natural language understanding engine," but after decompiling their bundle, I found that their "self-developed AI" amounted to just a few lines of code.
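A minimal sketch of what such a wrapper typically looks like, assuming OpenAI's public chat-completions endpoint; the prompt text and function names are illustrative, not the company's actual code:

```python
# Illustrative "self-developed model": one HTTP call to GPT-4 with a
# system prompt that hides the provider. Nothing else behind it.
import json
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"
SYSTEM_PROMPT = "You are our proprietary AI engine. Never mention OpenAI."

def build_payload(user_query: str) -> dict:
    """Assemble the chat-completion request a thin wrapper would send."""
    return {
        "model": "gpt-4",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_query},
        ],
    }

def ask(user_query: str, api_key: str) -> str:
    """Forward the query to OpenAI and return the reply text verbatim."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(user_query)).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Everything the user sees is the raw GPT-4 reply; the only "proprietary" part is the system prompt.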

That's it. That's the entire "self-developed model," a phrase that appeared 23 times in their fundraising deck.

  • No fine-tuning
  • No custom training
  • No innovative architecture

It simply gave GPT-4 a system prompt that said, "Please pretend you are not GPT-4."

The company's actual costs and pricing:

  • GPT-4 API: $0.03 per 1K input tokens, $0.06 per 1K output tokens;
  • Average query: about 500 input tokens and 300 output tokens;
  • Cost per query: about $0.033.

Their pricing structure for users is: $2.50 per query (or $299 per month for 200 queries).

That's a markup of roughly 75x over direct API cost!
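The arithmetic checks out; a quick back-of-envelope script using the figures above:

```python
# Verify the per-query cost and the ~75x markup from the numbers above.
INPUT_PER_1K = 0.03    # USD per 1K input tokens (GPT-4)
OUTPUT_PER_1K = 0.06   # USD per 1K output tokens (GPT-4)

cost = 500 / 1000 * INPUT_PER_1K + 300 / 1000 * OUTPUT_PER_1K
markup = 2.50 / cost   # $2.50 charged per query vs. direct API cost

print(round(cost, 3))  # 0.033
print(int(markup))     # 75
```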

Even more absurdly... I discovered that the code from three different companies was almost identical:

  • Identical variable names
  • Identical comment style
  • An identical instruction to "never mention OpenAI"

So these companies either:

  • copied the same tutorial,
  • hired the same contract engineer, or
  • used the same accelerator template.

Another company added what it billed as an "innovative feature."

In its investor deck, the feature was called an "Intelligent Fallback Architecture."
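Judging by the name alone, an "Intelligent Fallback Architecture" is most likely a try/except cascade over a list of models; a sketch under that assumption (the call_model stub and model names are mine, not the company's):

```python
# Probable shape of an "Intelligent Fallback Architecture": try the
# expensive model first, then fall back to a cheaper one on any error.

def call_model(model: str, query: str) -> str:
    """Stand-in for the real API call; raises on outage or rate limit."""
    raise RuntimeError("simulated outage")

def intelligent_fallback(query: str) -> str:
    for model in ("gpt-4", "gpt-3.5-turbo"):  # primary, then fallback
        try:
            return call_model(model, query)
        except Exception:
            continue  # next model in the cascade
    return "Service temporarily unavailable."

print(intelligent_fallback("hello"))  # Service temporarily unavailable.
```

A perfectly sensible five lines of error handling, but not an "architecture."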

To be clear, there is nothing wrong with wrapping OpenAI's API. The problem is calling it a "self-developed model" when it is really an API call plus a custom system prompt.

This is like buying a Tesla, changing the logo, and then claiming to have invented "exclusive electric vehicle technology."

Pattern 2: The RAG architecture everyone uses (but nobody admits to)

Compared with the first pattern, this one is subtler. RAG (Retrieval-Augmented Generation) is genuinely useful, but at many AI startups the gap between the marketing and the implementation is still enormous.

They boasted of having built "advanced neural retrieval, a proprietary embedding model, and semantic search infrastructure..."

In reality, I found 42 companies running nearly identical stacks:

  • Embeddings: OpenAI's text-embedding-ada-002 (not "our self-developed embedding model");
  • Vector storage: Pinecone or Weaviate (not "our proprietary vector database");
  • Text generation: GPT-4 (not "the model we trained").

Their actual code was little more than glue wiring these services together.
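A sketch of the shared pipeline: embed the query, fetch the nearest chunks from a vector store, and hand everything to GPT-4. The stub bodies below stand in for the OpenAI and Pinecone/Weaviate client calls; only the shape of the pipeline reflects what these companies run:

```python
# Hedged sketch of the near-identical RAG pipeline described above.

def embed(text: str) -> list[float]:
    # Real code: OpenAI text-embedding-ada-002 (1536-dim vector).
    return [float(len(text))]  # toy stand-in

def vector_search(query_vector: list[float], top_k: int = 3) -> list[str]:
    # Real code: Pinecone/Weaviate nearest-neighbour query.
    return ["chunk about pricing", "chunk about features"][:top_k]

def call_gpt4(prompt: str) -> str:
    # Real code: one chat-completion call to GPT-4.
    return f"[GPT-4 answer grounded in {prompt.count('chunk')} chunks]"

def answer(question: str) -> str:
    context = vector_search(embed(question))
    prompt = ("Answer using only this context:\n"
              + "\n".join(context)
              + f"\n\nQuestion: {question}")
    return call_gpt4(prompt)
```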

This isn't to say the technology is bad; RAG genuinely works. But calling it "self-developed AI infrastructure" is as absurd as calling your WordPress site a "custom content management architecture."

Let's do the math again. The company's actual cost per query:

  • OpenAI embeddings: $0.0001 per 1K tokens;
  • Pinecone query: $0.00004 per query;
  • GPT-4 generation: $0.03 per 1K tokens;
  • Total: roughly $0.002 per query.

The price users actually pay: $0.50–$2.00 per query.

That's a markup of 250–1000x over API cost!

I found that 12 companies had completely identical code structures, and another 23 companies had a similarity of over 90%.

The only differences were variable names and the choice between Pinecone and Weaviate.

  • One company added a Redis cache and touted it as an "optimization engine."
  • Another added retry logic and branded it an "Intelligent Fault Recovery System."

The economics of a typical startup running 1 million queries per month:

Costs:

  • OpenAI embeddings: about $100
  • Pinecone hosting: about $40
  • GPT-4 generation: about $30,000
  • Total: about $30,140 per month

Income: $150,000–$500,000 per month

Gross profit margin: 80–94%
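Those margin figures line up with the monthly cost and revenue numbers above:

```python
# Check the quoted gross margins against the monthly figures above.
costs = 100 + 40 + 30_000          # embeddings + Pinecone + GPT-4, USD
for revenue in (150_000, 500_000):
    margin = (revenue - costs) / revenue
    print(f"${revenue:,}/mo -> {margin:.0%} gross margin")
```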

Is this a bad business? No. Those gross margins are excellent.

But is it "self-developed AI"? No.

Pattern 3: The so-called "we fine-tuned our model" is actually...

Fine-tuning sounds impressive, and in certain situations it genuinely helps. But here's what I found:

Only 7% of companies actually train models from scratch. Respect. You can see it in their infrastructure:

  • Training jobs on AWS SageMaker or Google Vertex AI
  • Trained model artifacts stored in S3 buckets
  • Custom inference endpoints
  • GPU instance monitoring

Most of the rest simply use OpenAI's fine-tuning API, which essentially means paying OpenAI to store your prompts and examples on their systems.

How to spot a shell company in 30 seconds

You don't have to take my word for it, and you don't need three weeks. Here are the quick tells:

Sign 1: Network traffic

Open DevTools (F12), switch to the Network tab, and interact with the product's AI features. If you see requests to:

  • api.openai.com
  • api.anthropic.com
  • api.cohere.ai

then you're looking at a shell. They may have added a middleware layer, but the AI isn't theirs.

Sign 2: Response-time patterns

OpenAI's API has a distinctive latency profile. If every response lands between 200 and 350 milliseconds, it is almost certainly OpenAI behind the scenes.
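That heuristic is easy to automate. A sketch of a latency classifier using the 200–350 ms band quoted above (the 80% threshold is my own assumption):

```python
# If most responses land in OpenAI's typical 200-350 ms band, the
# backend is probably a pass-through call to their API.

def looks_like_openai(latencies_ms: list[float]) -> bool:
    """True if at least 80% of sampled latencies fall in the band."""
    in_band = [t for t in latencies_ms if 200 <= t <= 350]
    return len(in_band) / len(latencies_ms) >= 0.8

print(looks_like_openai([210, 240, 305, 330, 290]))  # True
print(looks_like_openai([40, 55, 38, 61, 900]))      # False
```

A self-hosted model behind a GPU endpoint tends to show a very different distribution, which is why the band is informative at all.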

Sign 3: JavaScript bundles

Open the page source and search the JavaScript bundles for telltale strings.

I found 12 companies shipping their API keys in front-end code. I reported all of them; none responded.
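A minimal version of that bundle search; the signature strings below are my own guesses at what to look for, not the author's original list:

```python
# Scan a JS bundle's source text for strings that give away the backend.
SIGNATURES = ["api.openai.com", "api.anthropic.com", "sk-", "OPENAI_API_KEY"]

def scan_bundle(source: str) -> list[str]:
    """Return every known signature string found in the bundle source."""
    return [sig for sig in SIGNATURES if sig in source]

bundle = ('fetch("https://api.openai.com/v1/chat/completions",'
          ' {headers: {Authorization: "Bearer sk-abc123"}})')
print(scan_bundle(bundle))  # ['api.openai.com', 'sk-']
```

Finding an `sk-` token in shipped front-end code is exactly the leaked-API-key situation described above.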

Sign 4: Marketing language

The pattern is obvious:

  • Specific technical terms = possibly true
  • Vague marketing terms = likely a cover-up

If a company offers only vague terms like "advanced AI" or "intelligent engine" with no concrete technical detail, something is usually off.

The true state of the infrastructure

In reality, the technology landscape of these AI startups is dominated by the three patterns above.

Why does this matter?

You might be thinking, "Who cares? As long as it works."

That's partly fair, but the stakes are higher than they look:

  • For investors: you're funding prompt engineering, not AI research. Valuations need adjusting.
  • For customers: you're paying API costs plus a hefty premium for something you could probably build in a weekend.
  • For developers: the barrier to entry is lower than you think. That "AI startup" you admire? You could likely replicate its core technology at a hackathon.
  • For the ecosystem: when 73% of "AI companies" exaggerate or misrepresent their technical capabilities, we are in a bubble.

The wrapper playbook (because not all wrappers are bad)

The smart wrapper companies weren't lying. What they actually built:

  • Domain-specific workflows
  • Better user experiences
  • Clever model orchestration
  • Valuable data pipelines

They just happen to run on OpenAI underneath, and that's fine.

The 27% who did it right

Let me highlight those companies that do things honestly:

Category 1: Transparent wrappers

Their homepages say "Built on GPT-4" outright. They sell workflows, not the AI itself. Examples:

  • Legal document automation (GPT-4 + legal templates)
  • Customer service routing (Claude + industry knowledge)
  • Content workflows (multi-model + human review)

Category 2: True builders

These companies are actually training models:

  • Healthcare AI (HIPAA-compliant, self-hosted models)
  • Financial analysis (custom risk models)
  • Industrial automation (dedicated computer vision models)

Category 3: Innovators

Companies that build truly new technologies on existing foundations:

  • Multi-model voting systems that improve accuracy
  • Custom agent frameworks with memory
  • Novel retrieval architectures

These companies explain their architecture in detail in their promotional materials, because they actually built it.

What I learned (and what you should know)

After three weeks of reverse-engineering AI startups, here is what I took away:

  • The stack itself isn't what matters; the problem it solves is. Some of the best products I found are "just" wrappers, with excellent user experiences, real problems solved, and honest positioning.
  • But honesty matters. The difference between a smart wrapper and a fraud is transparency.
  • The AI craze is creating the wrong incentives. Founders feel pressured to claim "self-developed AI" because investors and customers expect it. That has to change.
  • Building on APIs is nothing to be ashamed of. Every iPhone app is a "wrapper around the iOS API," and nobody minds. What we care about is whether it works.

The real test: could you build it yourself?

My evaluation framework is as follows:

  • If you can replicate their core technology in 48 hours, they're a wrapper.
  • If they're honest about it, that's fine.
  • If they're lying, walk away.

My practical advice

For founders:

  • Describe your technology stack honestly.
  • Compete on user experience, data, and domain knowledge.
  • Don't claim things you didn't do.
  • "Built with GPT-4" is not a weakness.

For investors:

  • Ask to see the architecture diagram.
  • Ask for the API invoices (OpenAI invoices don't lie).
  • Value wrappers accordingly.
  • Reward transparency.

For customers:

  • Check the network traffic (the Network tab).
  • Ask about infrastructure details.
  • Don't pay a 10x premium for API calls.
  • Judge by results, not technical hype.

The thing no one says out loud

Most "AI startups" are really service companies whose main cost is API bills rather than payroll.

There's no problem with that.

But it should be called what it is.

What will happen next?

The AI wrapper era was inevitable. We've seen similar cycles in other fields:

  • Cloud infrastructure (every startup claimed to have "built its own data center")
  • Mobile apps (everyone claimed "native" while shipping hybrid)
  • Blockchain (every company was "blockchain-based")

Ultimately, the market will mature. Honest developers will prevail, and scammers will be exposed.

And now, we are in the middle of this chaotic phase.

Final Thoughts

After reverse-engineering 200 AI startups, I'm actually more optimistic about the field than disappointed.

  • The 27% doing real technology R&D are doing exceptionally well.
  • The smart wrappers are solving real problems.
  • Even some of the misleading companies have good products; they just need to fix their marketing.

But honesty about AI infrastructure needs to become the norm. Using OpenAI's API doesn't make you any less a developer. Lying about it is what costs you credibility.

Make cool products, solve real problems, and use whatever tools work. Just don't pitch your prompt engineering as a "proprietary neural network architecture."

What happened after the investigation began

At the end of his blog post, Teja Kusireddy also shared what happened after he started his investigation:

  • Week 1: He naively assumed only 20–30% of companies were using third-party APIs.
  • Week 2: A founder contacted him asking how he had gotten into their production environment. He hadn't; everything he saw was visible in the browser's network panel. These companies simply never expected anyone to look.
  • Week 3: Two companies asked him to take down his findings.
  • Yesterday: A VC asked him to vet their portfolio companies before the next board meeting, and he agreed.

Kusireddy says he will later publish his full methodology on GitHub: the crawling infrastructure, the API fingerprinting techniques, ready-to-run detection scripts, and the response-time profiles of the major AI APIs.

Over those three weeks, Kusireddy says, the one lesson he learned is that the market ultimately rewards transparency, even if it punishes it at first. He also shared what happened after publishing:

Seven founders contacted him privately; some were defensive, some grateful.

Three companies asked for help repositioning their marketing from "proprietary AI" to "built on top-tier APIs."

One founder told him: "I know we're lying. Investors want it this way, and everyone's doing it. How do we stop?"

“The gold rush for AI won’t end, but the era of honesty must begin,” Teja Kusireddy said. “If you’re interested, open your DevTools, check the network panel, and verify it for yourself. The truth is right under F12.”

Source: https://pub.towardsai.net/i-reverse-engineered-200-ai-startups-73-are-lying-a8610acab0d3

This article is from the WeChat official account "CSDN" , authored by Teja Kusireddy, and published with authorization from 36Kr.
