Written by: Huang Shiliang
"Data is the new oil"—this phrase has been overused in the AI community. But in the mainstream narrative, it seems to have nothing to do with ordinary people like us—it's a capital game for tech giants, a contest of graphics cards and trillions of parameters.
But after thinking it over, I realized that this analogy makes a very good compass for ordinary people who want to put AI to use.
I. A seriously misunderstood metaphor
"Data is the new oil"—this phrase has now become almost the bible of the AI era.
But to be honest, most people's first reaction to this statement is: This is a big company's business, what does it have to do with ordinary people like me?
Because in the mainstream narrative, the "data" they talk about is the entire internet, Wikipedia, and other petabyte-scale stuff; "oil refining technology" is tens of thousands of H100 graphics cards plus a group of scientists with annual salaries of millions; and the "final product" is the omniscient and omnipotent God model like GPT-5.
This logic is certainly sound as business, but the problem is that it's essentially saying: don't get involved, you can't get a seat at the table.
We ordinary people are directly kicked out.
What's even darker is that there's another version of this story that makes me angrier the more I think about it:
Data is the new oil; our consumer data is Venezuela's oil field; and Meituan, Alibaba, and TikTok are like Trump in the United States.
They came here by accident (or, more likely, on purpose), sank their pipes, pumped out our data for free, refined it into "98-octane gasoline" (precise algorithms, big-data price discrimination), and then made us buy it back from them.
The result is that we are the suckers: not only did we hand over the raw material for free, we were sold out and still ended up helping the platforms count the money.
In this version of the story, the only players are the giants. We have neither massive amounts of data nor capital, and it's impossible for us to train a large model. So "data is the new oil" becomes a slogan that sounds impressive but is completely useless to individuals, and even a bit disgusting.
II. Look at it from a different angle, and there is a way forward
I think this consensus is problematic. We need to look at it from a different perspective.
If we insist on applying the concept of "data is the new oil" to ordinary people, then the question is no longer "is this analogy correct?", but rather: how exactly does this thing guide me in my work?
The oil industry is so powerful because it has a very clear and inescapable logical chain:
Find an oil field (exploration) → Build a refinery (processing) → Standardize the product (gasoline) → Build distribution channels (gas stations) → Sell to users.
For ordinary people like us, the "data oil" of the AI era has to be broken down step by step along the same chain. If any link is missing, your AI anxiety will never turn into productivity; it will only become the mental drain of "browsing news + saving links + watching others get rich".
Now, I'll break down how ordinary people should do things, following this logic.
III. Step one: Where are the oil fields? Find the "miniature gold mines" around you.
In traditional industries, you have to go to places like Saudi Arabia and Russia to find oil. But with our approach, the oil fields are practically right at your fingertips. I think there are at least two main categories.
1. Personal Private Data: Your Own Backyard
This is the most easily overlooked, yet most stable type of data. It doesn't need to be large in scale, but it has extremely high purity.
For example, your work process, the logic behind your decisions, the pitfalls you've encountered (failure reviews), and the unspoken rules you've learned over the years in the industry.
For example, your digital footprint: notes, codebases, drafts, emails, etc. written over the past ten years... all of these count.
The value of this lies in the fact that it belongs entirely to you. The "personal digital twin" or "domain expert agent" trained with this data cannot be replaced by any general-purpose large model.
If you haven't used a computer much in your work and life over the past 5 years and have relied solely on a mobile phone, you're unlikely to evolve into an AI producer and are destined to be an AI consumer.
If you really want to make money from AI, I think you need to buy a computer. Why?
Without a computer, you're unlikely to have accumulated data systematically, which makes you a thoroughly "oil-poor country." Don't expect to accomplish anything significant with the few pictures in your phone's album or the tens of gigabytes of voice messages and chatter in WeChat: too many impurities, too little structure. You'll never refine that into 92-octane gasoline; at best, you'll be lucky to get 29-octane.
2. A treasure trove of public data: Assemble your "exploration team"
The second category is data that everyone can see, but 99% of people are just "consuming" rather than "exploring": X.com, WeChat official accounts, arXiv, YouTube... These are the "public sea" of the data age.
The internet, especially social media, is deteriorating too rapidly. I dare say that more than 50%, possibly more than 90%, of the content is AGRC (AI-generated rubbish content).
These people are using AI to mass-produce nonsense, directly polluting the geological formations. If you're not aware of this when you're doing geological exploration, you'll be digging up nothing but garbage.
Worse still: if you feed garbage to the brain or to AI, all that will come out is garbage, and it might even clog your oil refinery.
Therefore, to make sure you don't dig up AGRC, I suggest you build a rigorously curated **"Inspiration Source Combination"**. But note: just reading it isn't enough; that's hoarding crude oil. You need to **learn crude oil refining**, that is, process each source with AI to turn it into machine-readable fuel:
Deep sedimentary rocks (books): These are the ballast. Create a yearly reading list, including professional classics and literature.
AI-powered approach: Don't just plow through the pages. Always read with Gemini or ChatGPT at your side. After finishing a chapter, discuss it with the AI and have it generate thought-provoking questions. After finishing the book, write up electronic reading notes and feed them to the AI; this is how you build your knowledge base.
Frontier Exploration Area (Papers and Reports): Spend your free time browsing arXiv or Google Scholar. Hold a weekly "paper luncheon" and force yourself to finish one paper.
AI-powered approach: Can't get through raw text? Just dump the PDF on NotebookLM or ChatGPT and let it summarize the core arguments and data for you, turning the "hard bones" into "broth" for you to save.
Surface runoff (news): I use RSS or a customized news feed. I scan the headlines when reading news, and only save the truly impressive ones.
AI-powered approach: Don't just save links. Copy the content and let AI help you tag it, extract keywords, and categorize it for storage in your note-taking app. Otherwise, it'll just gather dust and be forgotten.
Associated activities (podcasts and lectures): I listen to TED Radio Hour on my commute, and I force myself to attend one or two offline salons each month.
AI-powered approach: When you hear good ideas, don't just nod. Use Whisper to transcribe the recording into text, then let AI organize it into structured notes. Audio can't be searched, but text can.
High-yield wells (social media): Follow a group of genuine experts on Twitter/X. Regularly clean up your following list and unfollow those posting spam or negative content.
AI-powered approach: When you see a brilliant thread, copy it and feed it to the AI. Have it point out where the author's logic breaks down, or fold the author's viewpoints into your knowledge system.
Field research (life observation, field investigation): Deliberately practice "looking at life with questions in mind." This is the kind of intuitive data that AI web crawlers can't crawl at all.
AI-powered approach: When inspiration strikes, don't type; just speak, then let AI organize it into a diary. Let AI help you transform your random thoughts into logical insights.
We need to build the habit of picking up our phones and rambling out loud at any moment.
These six sources are your "mixed oil fields." Only when your inputs are wild and diverse enough, and have all undergone preliminary processing by AI, will what you refine avoid being clichéd.
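As a rough illustration of what "preliminary processing" could produce, here is a minimal Python sketch that turns a saved item into a tagged, structured note. Everything in it is an assumption made for illustration: the `Note` fields, the `SOURCE_TYPES` labels (one per source above), and especially `extract_keywords`, which is a simple word-frequency stand-in for the step where you would actually ask an AI model for tags.

```python
import json
import re
from collections import Counter
from dataclasses import dataclass, field, asdict

# Hypothetical stop-word list; extend it for your own material.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
              "for", "on", "with", "that", "this", "it", "as", "by", "be"}

# One illustrative label per source type described in the article.
SOURCE_TYPES = {"book", "paper", "news", "podcast", "social", "field"}


@dataclass
class Note:
    title: str
    source_type: str
    text: str
    keywords: list = field(default_factory=list)


def extract_keywords(text: str, top_n: int = 5) -> list:
    """Stand-in for the AI tagging step: the most frequent non-stop-words."""
    words = re.findall(r"[a-z][a-z\-]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


def make_note(title: str, source_type: str, text: str) -> Note:
    """Build a structured note, rejecting source types outside the known set."""
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"unknown source type: {source_type}")
    return Note(title, source_type, text, extract_keywords(text))


if __name__ == "__main__":
    note = make_note(
        "Refinery basics",
        "news",
        "A refinery turns crude oil into fuel. The refinery is the hard part.",
    )
    # JSON is easy to store in a notes app or feed back to a model later.
    print(json.dumps(asdict(note), indent=2))
```

The point of the fixed schema is that every source, whatever its origin, ends up in the same searchable shape instead of as a loose link.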
IV. Step two: Where is the refining equipment? Don't fixate on the large model.
Having found the oil, the next step is to refine it. Mainstream media keeps urging you to buy a graphics card, but for an individual, the real refinery is your own software stack plus your thought process.
1. The large model is just a "boiler".
Getting a ChatGPT Plus membership won't make us any better. It's like buying a boiler and standing next to it watching it shine—but you don't actually use it.
Large models like ChatGPT and DeepSeek are essentially basic power units, the foundation. They can burn, but that doesn't mean they can produce oil.
2. The real oil refinery is a "personal tool system".
An efficient personal oil refinery needs these components:
Pipeline (toolchain): VS Code, Python, Skills, and the like.
Process flow (methodology): This is the core barrier. It's about how you write a Prompt, how you build a RAG knowledge base, and how you get several agents (skills) to work together.
The key point is never "how powerful the model is", but rather: how you interact with AI, and how you translate the implicit experience in your mind into instructions that AI can understand.
This "personal engineering system" is your refinery, not the model itself.
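To make the "RAG knowledge base" idea a little more concrete, here is a minimal Python sketch of the retrieval half: pick the personal notes most relevant to a question and paste them into the prompt. It is an illustration under stated assumptions, not a recommended stack. The bag-of-words cosine similarity stands in for a real embedding model, the note titles and texts in `NOTES` are made up, and the final call to a large model is left out.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words counts; a crude stand-in for embeddings."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, notes: dict, k: int = 2) -> list:
    """Return the titles of the k notes most similar to the question."""
    q = tokenize(question)
    ranked = sorted(notes, key=lambda t: cosine(q, tokenize(notes[t])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, notes: dict, k: int = 2) -> str:
    """Assemble a prompt that grounds the model in the retrieved notes."""
    context = "\n\n".join(f"[{t}]\n{notes[t]}" for t in retrieve(question, notes, k))
    return f"Answer using only the notes below.\n\n{context}\n\nQuestion: {question}"


# Hypothetical personal notes, keyed by title.
NOTES = {
    "pricing-review": "Failure review: we priced the workshop too low and lost margin.",
    "reading-2024": "Reading notes on distribution channels and gas station economics.",
    "onboarding": "Checklist for onboarding a new client to the workflow.",
}

if __name__ == "__main__":
    print(build_prompt("Why did we lose margin on workshop pricing?", NOTES, k=1))
```

Even in this toy form, the model only ever sees what your own notes contain, which is why the quality of the "oil field" matters more than the size of the "boiler".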
V. Step three: The product is not the end goal; selling it is the real battle.
This is the most brutal link in the entire chain. PetroChina only needs to transport the oil to gas stations, and drivers will naturally line up. But in the AI era, productization and sales are incredibly difficult.
1. The "gasoline" refined by AI is extremely non-standard.
What you create with "personal data" + "large models" is most likely not general-purpose gasoline, but rather:
- A Python script that only you can use
- An article with a unique style
- An AI-processed report after a doctor's examination
- A set of personalized legal advice
These things are not universal, not standardized, and are particularly dependent on the scenario.
2. The real big question: Who to sell to?
So before you start, you have to ask yourself this question: Who are you going to sell this stuff to? This is actually a way of figuring out what kind of oil we're going to refine.
Selling to yourself (for personal use): Saving time is making money, and this is the easiest closed loop to achieve.
Selling to businesses (B2B): Package your prompt or workflow into a solution. This requires extremely strong pre-sales skills (persuasive abilities).
Selling to the general public (B2C): Turn it into an app or a content column. This depends on your ability to distribute traffic.
In fact, in the AI era, refining oil (generating content) is becoming increasingly easy, but building gas stations (distribution and sales) is more difficult than ever before.
VI. Don't forget environmental protection: Don't let the waste bury you.
Traditional oil refining produces waste residue, wastewater, and waste gas. If you don't treat them, the fumes will kill people before the refinery ever turns a profit.
The same applies to data refining; the **"cyber pollution"** is extremely serious, and it takes an **"environmental protection department"** to clean it up regularly.
1. Clean up expired "tool scraps".
AI is evolving freaking fast, ridiculously fast.
The "Top 10 AI Navigation Sites You Must Use in 2025" list you bookmarked last month might see five of them go out of business this week; the AI drawing parameters you're working hard on today might be rendered obsolete by "one-click generation" tomorrow.
Don't be a "cyber scavenger," hoarding a bunch of outdated tools you can't bear to throw away. Uninstall what needs uninstalling, unfollow what needs unfollowing. Tools are meant to be used, not worshipped.
Hoarding outdated tools is like filling your house with rusty scrap metal; it will only slow you down.
2. Discard the "empty shell" of data that has been squeezed dry.
Many people suffer from "squirrel syndrome": they download PDFs whenever they see them, collect videos whenever they see them, and fill their hard drives with several terabytes of data, feeling like they own the whole world.
That's not knowledge, that's landfill garbage.
The truly environmentally friendly approach is to use AI to extract the "essence" from PDFs, videos, and long articles—generating summaries, extracting key quotes, and transforming them into your notes.
Once you've squeezed the files dry, discard the original files (or archive them to cold storage). Your attention is an extremely expensive and limited resource; don't let these raw files consume your bandwidth.
A highly efficient refinery keeps only the refined fuel and throws away the spent crude.
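The "squeeze it dry, then archive it" rule can be partly automated. The sketch below is one possible approach and rests on an assumed folder layout that the article does not prescribe: raw downloads live in one folder, notes in another, and a raw file counts as "squeezed dry" once a non-empty Markdown note with the same file stem exists. The function name `archive_refined` and the folder names are hypothetical.

```python
import shutil
from pathlib import Path


def archive_refined(raw_dir: Path, notes_dir: Path, archive_dir: Path) -> list:
    """Move every raw file that already has a matching note into cold storage.

    Assumed convention: raw/report.pdf is considered processed when
    notes/report.md exists and is not empty.
    Returns the names of the files that were moved.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() materializes the listing first, so moving files is safe.
    for raw in sorted(raw_dir.iterdir()):
        if not raw.is_file():
            continue
        note = notes_dir / f"{raw.stem}.md"
        if note.is_file() and note.stat().st_size > 0:
            shutil.move(str(raw), str(archive_dir / raw.name))
            moved.append(raw.name)
    return moved


if __name__ == "__main__":
    base = Path.home() / "refinery"  # hypothetical location
    if (base / "raw").is_dir():
        print(archive_refined(base / "raw", base / "notes", base / "cold"))
```

Files without a note stay where they are, so the raw folder doubles as a to-do list of crude oil that has not been refined yet.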
3. Cut off those "vampire zombie bills"
AI anxiety has led us to do many foolish things, the most foolish of which is rushing to spend money to buy a sense of security.
Enrolling in classes, buying courses, attending events, purchasing Plus memberships... none of this is cheap. What's worse, once you sign up for a lot of monthly subscriptions, you tend to forget to cancel them.
I bought a server for testing at least three years ago. Every month it has quietly deducted a sum from my account, hidden among a pile of other bills. I had no idea, and I only ever used it on the day of the test.
I also impulsively bought a whole bunch of auto-renewing subscriptions for ChatGPT, Gemini, Claude, Perplexity… and some APIs too. And what happened? Most of the time they just sit there collecting dust.
Damn, what a waste!
These are all things that "environmental protection" must address. Otherwise, before you can even refine oil that you can sell, your resources will be completely depleted by this pollution.
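A subscription audit like this is easy to script. The following Python sketch assumes you can export your card statement as CSV with `date` (YYYY-MM-DD), `merchant`, and `amount` columns; real bank exports vary, so the column names, the merchant names in `STATEMENT`, and the three-month threshold are all assumptions for illustration. It flags charges that repeat with the same amount across several months and reports the yearly cost of the ones you no longer use.

```python
import csv
import io
from collections import defaultdict


def find_recurring(csv_text: str, min_months: int = 3) -> dict:
    """Return {merchant: monthly_amount} for charges that repeat across months.

    Assumes columns named date (YYYY-MM-DD), merchant, amount.
    """
    months = defaultdict(set)  # (merchant, amount) -> set of "YYYY-MM" strings
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["merchant"].strip(), round(float(row["amount"]), 2))
        months[key].add(row["date"][:7])
    return {m: amt for (m, amt), seen in months.items() if len(seen) >= min_months}


def zombie_report(recurring: dict, still_used: set) -> list:
    """List (merchant, yearly_cost) for recurring charges you no longer use."""
    return sorted(
        (merchant, round(amount * 12, 2))
        for merchant, amount in recurring.items()
        if merchant not in still_used
    )


# Made-up statement data for demonstration only.
STATEMENT = """date,merchant,amount
2025-01-03,TestServer,12.00
2025-02-03,TestServer,12.00
2025-03-03,TestServer,12.00
2025-01-10,ChatTool,20.00
2025-02-10,ChatTool,20.00
2025-03-10,ChatTool,20.00
2025-02-14,Bookshop,35.50
"""

if __name__ == "__main__":
    recurring = find_recurring(STATEMENT)
    print(zombie_report(recurring, still_used={"ChatTool"}))
```

Multiplying by twelve is deliberate: a small monthly fee looks harmless until you see what it costs per year.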
VII. Finally, a few words: A map of action
When we strip away the grand facade of "data is the new oil," it is no longer an unattainable capitalist story, but a stark roadmap for ordinary people.
In this era, if you want to win, quickly check your "balance sheet":
- Storage: Are you still scrolling through TikTok? Or are you already consciously accumulating high-quality data through "Inspiration Sources" + AI assistance? (Remember to avoid AGRC spam)
- Production capacity: Do you have your own set of tools and methodologies (for an oil refinery), and what kind of oil do you refine?
- Channels: Have you thought it through? Who exactly are you going to sell these non-standard products to? The answer also determines what you produce: whether you're refining 92-octane or 98-octane.
- Environmental concerns: Have you accumulated a lot of digital junk? Have you checked your credit card statements and unsubscribed from those zombie subscriptions?
Finally, a piece of advice: forget the headlines about billions of parameters. Start today: buy a computer, set up your "Inspiration Sources," drill your first micro oil well, sell to yourself first, and build an automated tool that turns your work into a process where AI does most of the driving and you assist.
Honestly, I'm quite confused myself. I've been tinkering with AI for over three years, and I haven't achieved anything significant. I've only managed to create an AI to manage my to-do list and an AI to manage my reading notes. I'm still wondering what else I can develop.




