Original title: "CZ invests $11 million in the seed round of a Chinese college junior building an education agent"
Original author: Founder Park, an entrepreneur community under GeekPark
A Chinese junior in college raised $11 million in seed funding, making it the highest-funded student startup in Silicon Valley to date.
VideoTutor, an education agent product for K-12 schools that allows users to generate personalized teaching/explanation videos with just one sentence, announced today that it has completed an $11 million seed round of financing. This round was led by YZi Labs, with participation from Baidu Ventures, Jinqiu Fund, Amino Capital, BridgeOne Capital, and several other well-known investors.
This is also the first AI product company invested in by YZi Labs.
Founder Kai Zhao stated that VideoTutor received recognition and support from the investment teams of CZ and YZi Labs, with YZi Labs ultimately leading this round of financing. They received more than 10 term sheets (TS) and ultimately selected these few.
The first version of the product launched on May 14 (debuting on the Founder Park product marketplace), was well received by the market, and validated product-market fit (PMF). Less than five months later, the company closed this $11 million seed round.
In Kai's view, the core reason they were able to secure this funding is that the direction is right and a "young genius team" used visual learning to solve the pain points of preparing for US college entrance exams in the K-12 sector.
"This is a field well suited to young people, the team has excellent engineering skills, the founder has very good insight and experience, and execution is very fast."
It's not just them: the college-student founders behind Cursor, Mercor, Pika, GPTZero, and many other Silicon Valley startups are redefining how people think about AI entrepreneurship, with AI products that set one funding record after another.
Starting a business in the AI era is truly different.
We talked to the young people at VideoTutor to find out why they were able to secure this seed funding round, what changes are happening in Silicon Valley startups today, and why they are so keen to hire employees from major Chinese companies.
Interviewees: CEO Kai Zhao, CTO James Zhan.
Interview & Editing | Wan Hu
The following is the interview transcript, edited and compiled by Founder Park.

In the K-12 education sector, visual learning is the true direction.
Founder Park: So many organizations are optimistic about you. In your opinion, what is the key point that impressed them?
Kai: I think the first thing is that we're on the right track. The AI education sector has great potential and a bright future. We've entered the education field of the American college entrance exams, specifically the SAT and AP exams. Our target user group is K-12 high school students, and the gap between us and this user group is very small, basically no generation gap. We've gone through the entire test preparation and learning cycle, we know where the pain points of the exams and test preparation are, and we can create a product that truly solves these pain points for this group.
Secondly, the team is outstanding. James comes from Gemini and was a core AI engineer and algorithm specialist at Google. I myself have three educational startup experiences, starting with educational software during my freshman year of college, and participating in the creation of MathGPTPro during my sophomore year, with the project being selected for the Miracle Innovation Forum, among others. I have experience in successfully developing educational products.
Thirdly, in the field of AI education, the core is the animation engine, and we are the core developers of VideoTutor. We are the team that understands the core technology best and can make the animation engine render very accurately.
The team itself has a very good marketing background and knows how to do communication.
VideoTutor perfectly aligns with a consensus among mainstream US VCs: the "young genius team." This refers to a field well-suited for young people, coupled with strong engineering skills, a founder with excellent insight and experience, and rapid execution. I believe this is a shared reason why all investors are optimistic about it.

VideoTutor presents at the YZi Labs EASY Residency Demo Day at the NYSE
Founder Park: What core problem in the education industry does your product aim to solve?
Kai: The learning products currently on the market fall into two categories: active learning products and passive learning products. Passive learning products, such as ByteDance's Gauth, Chegg, and AnswersAi, cover what we call "homework help" scenarios. The learning path is very short, and students mainly pay to get their homework done.
VideoTutor, on the other hand, covers active learning scenarios. We don't need to consider students' learning motivation because they have to study and take exams, such as the SAT and AP exams in the United States. In this scenario, there are many pain points related to visualization. 80% of the content in the SAT involves knowledge such as functions and calculus that require complex image rendering. VideoTutor's animation engine can solve this scenario very well.

Moreover, the average transaction value in this field is very high. About 2.6 million students in the US take the SAT each year, creating significant demand for paid services. In-person SAT courses are expensive and billed by the hour rather than in packages, starting at an average of $150 per hour, with most costing around $230. Many students and parents pay for these courses. VideoTutor can absorb or even replace this teacher-led tutoring because AI-generated videos are now almost indistinguishable from tutor-recorded lessons. This allows students to have their own personalized AI test preparation tutor at minimal cost.
Founder Park: What prompted you to decide to make this product?
Kai: Actually, before us, a team at Stanford had already done this, called Gatekeep AI. They were also trying to do visual learning. I had already realized the impact of this direction at that time. In the previous startups, the educational products that everyone made were basically connected to the GPT API, similar to a ChatGPT wrapper product. But we found that these products based solely on text-based question and answer have a ceiling. We can see that the business of companies like Chegg and Gauth is declining, and a large part of the scenarios have been replaced by ChatGPT, because students can solve many homework problems by paying $20 to use ChatGPT.
Products that rely on API wrappers and optimizations have reached their limit.
However, multimodal visual generation holds immense potential because there are numerous visual learning scenarios in US college entrance exam preparation. Gatekeep made a good start but didn't continue, because it launched a bit too early: the coding capabilities of foundation models weren't mature enough, and GPT-4 hadn't been released yet. Furthermore, mathematical animation engines involve rendering and algorithm problems that they couldn't overcome. Our team mastered all the core development aspects of the animation engine, solved this problem, and achieved highly accurate video rendering.
PMF: Users have a strong willingness to pay.
Founder Park: After your product was launched, you also reached cooperation agreements with several schools. In your opinion, when or which feature made you feel, "I've done the right thing with this product, and I've found the right pain point," and you felt that you had found PMF (Product-Market Fit)?
Kai: It can be explained from three dimensions.
First, from a revenue perspective, VideoTutor has received API access requests from 1,000 companies to date, including all the well-known large educational institutions in the United States and even some institutions in China. In addition, many schools want to purchase our services. Consumer (C-end) users are more direct about their intentions. One parent, who is also an investor, tried the product and gave it to all his friends and family to try, and everyone was willing to pay. Then he somehow got my phone number and texted me, wanting to invest in us. Consumer users have a very strong willingness to pay.
Secondly, from the perspective of user needs. Why is one-on-one tutoring so essential in the US? Because parents believe one-on-one teaching is effective and are willing to pay the price. Now, multimodal AI technology can achieve a human-like one-on-one teaching effect, answering questions directly. Moreover, the video lessons recorded by online one-on-one tutors in the US are practically indistinguishable from AI-generated videos. This is what I mean by "demand transfer": students pay a high price for pre-recorded courses that are indistinguishable from AI-generated ones, so why not use AI? Lower costs and better teaching results.
We received a lot of very positive feedback from students, and many teachers were also willing to spread the word about the product. The completion rate and usage time in the early stages were particularly good. The 200 seed users we have selected now are all from the early stages.
Thirdly, there's product taste and intuition. When you work backward from the progress of the entire education industry, to the core needs of the students and parents who pay, and then to the evolution of the product itself, the whole logic forms a closed loop. So, across these three dimensions, you can feel that product-market fit (PMF) is there. The most crucial element is an extremely strong willingness to pay.

A partnership was reached with FIZZ
Founder Park: Many users are proactively offering to pay, and some are even contacting you wanting to invest.
Kai: Yes. The willingness to pay is inherently strong in the SAT and AP fields. The average order value in this area starts at $100 to $200, and offline classes are even more expensive, potentially reaching $800. There are 2.6 million students in the US taking the SAT, and 37% of them will actively pay for it. This is a market with very strong willingness and demand. Our product can effectively bridge this demand gap.
Founder Park: Given the choice between a human tutor and an AI, will SAT test takers trust the AI?
Kai: Now, AI is rarely wrong in answering questions at the level of the American SAT and AP exams. In this case, why is it better than offline tutors? First, it's cheaper, and second, students can ask any questions they want without worrying about the teacher's opinion or impatience for asking silly questions. They can learn anytime, anywhere, 24 hours a day.
Moreover, this market is transferable. After completing the US market, we can also transfer it to A-Level exams in Canada and the UK, etc., where the demand for paid services is very high.
Founder Park: What are your current thoughts on the paid aspect?
Kai: We offer a monthly subscription option, and another option is pay-per-learning. I think AI is already capable of pay-per-results. We might launch a package where, for example, you pay $799 and we guarantee your child a perfect score on the SAT Math.
Founder Park: But if we pay based on exam results, doesn't that also depend on the student's individual initiative?
Kai: This might not be possible with the Chinese college entrance exam (Gaokao) because it covers a vast number of topics, thousands. However, the American SAT only has 62 topics, 50 of which are standard and most students have no problem with them. The remaining 12 are generally easy to master as well. Unless a student has a genuine problem with their logic, there's virtually no chance they won't be able to learn it. Furthermore, AI's efficiency-enhancing effects are very significant.
Many online tutoring services in the US already offer this. You pay a tutor $1,800, and the success rate is almost 100% because the SAT test points are fixed. As long as the student's IQ is normal, there shouldn't be any problems. This doesn't apply to the Chinese college entrance exam (Gaokao), where scores cannot be improved in a short period. The Gaokao is designed to spread scores apart and includes very hard questions, while the American SAT has no truly difficult questions because it tests your understanding of the material.
Pay-per-result is a model that human tutors already use, so the precedent is there.
Founder Park: Is model cost a factor in your pricing? Does it represent a large percentage?
Kai: The average order value in our field is very high, starting at $69 per month. Model costs are currently very low, so it's not a problem. The education industry is unlike the coding field, where everyone is competing on price, because coding requires supporting very long contexts.
For products targeting high school students, the web version is the most important.
Founder Park: I remember you mentioned last time that your first prototype only took about two months to develop. How did you plan to handle the entire development cycle, including the division of labor, deciding which features to include and which not to include?
Kai: The consensus of our entire team is that iterations must be fast, because only by being fast can we get feedback from early users quickly.
The first version caused a huge sensation after being released on Twitter, attracting a large number of users. However, many of these users were programmers, investors, or tech enthusiasts—we can collectively call them "early adopters." At that stage, the feedback from them was scattered and not very valuable. It was necessary to sift through this broad user base to identify the truly core seed users—high-quality high school students—and then obtain useful feedback through consultation.
The core feedback we received was that video rendering accuracy must reach 100%; this was the most crucial area for optimization. Lower-priority work, such as UI polish or support for a choice of TTS (Text-to-Speech) voices, was cut. Returning to the core of the product: we're working on knowledge learning for science-related scenarios, so the accuracy of graphics rendering is paramount.
Founder Park: How was the generation time decided at the time?
Kai: At that time, the peak duration was about 6 minutes. The main consideration was that the explanation of ordinary questions and knowledge points should not exceed 6 minutes. However, in subsequent feedback, we found that some students with weaker learning abilities hoped that the content would be explained more slowly and in greater depth. We realized that the duration should not be limited, but rather depend more on the user's learning ability.
Founder Park: What's the longest a session can run now?
Kai: It can run for up to about an hour, and the student can keep asking questions until they get to the bottom of things. Content is generated in real time during the conversation, but this feature was added recently; it wasn't in the initial versions.
Founder Park: Were there any features you initially considered but later decided against because you realized they weren't that important?
Kai: For example, apps. At the time, we thought we should develop an app quickly, but we later found that most students in the US use laptops or iPads for learning. Most K-12 schools in the US provide students with a Chromebook computer. Computers are very common, and students complete their homework on computers. High school students basically each have a computer, and mobile phones account for less than 5% of learning scenarios, which is a very low percentage.
Founder Park: So if it's a product that mainly targets education or students, the web version is the first thing to do, while the app is not so important.
Kai: Yes, we already knew this data at the time, after all, I had studied in the United States for many years. Later, we conducted a survey of 100 students from the initial tens of thousands of users, and more than 90 of these students had computers, so we were even more convinced of this.
Founder Park: When you launched your first version, were you also targeting the K-12 demographic?
Kai: Yes, that's what we were targeting afterward. We're not exactly competitors with Gauth; we focus more on test preparation. Many high school students in the US already choose either in-person tutoring or online learning platforms, and VideoTutor has successfully bridged that demand.
Founder Park: Will K12 be your core user group for at least a year?
Kai: K-12 should remain our core user group for the next two years.
Use large models, but don't rely solely on large models.
Founder Park: Could you briefly introduce your current technical implementation? VideoTutor does indeed perform much better than other video generation models in generating courses and charts. In fact, your technology is quite surprising, especially when many models cannot even accurately generate text.
James: The videos we generate contain both text and images. The general production process is as follows: a large language model generates text and corresponding animation instructions, our animation engine renders those instructions, and the result is presented in the video.
The text portion is relatively simple; we use a large language model to generate the text and then render it directly. However, the animation portion is generated by our own mathematical animation rendering engine. Its advantage lies in its extremely high accuracy in rendering coordinate axes, geometric shapes, and other elements—this is precisely our core technology.
Current large language models only output text. Our agent is like giving the large language model a piece of paper and a pen, allowing it to draw the teaching animations it envisions. The part that is drawn is entirely our technology.
Founder Park: How was the final compositing of the entire video, including audio and video, processed?
James: Initially, the user provides a prompt, such as "What is the Pythagorean theorem?". The first step is to have the large language model plan all the scenes, typically 3 to 5 depending on the difficulty of the question, and generate a rough script for each one. Next, based on each scene's script, a second inference generates the on-screen text, the corresponding images, and the narration text for that scene. The narration text is then synthesized into speech using TTS (Text-to-Speech).
Finally, we piece all the scenes together to create a complete video.
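As a rough illustration of the flow James describes (plan the scenes, expand each one, synthesize narration, then stitch), here is a minimal Python sketch. It is not VideoTutor's code: the function names, the `|`-separated reply format, and the fake model, renderer, and TTS stand-ins are all assumptions made so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scene:
    script: str          # rough script from the first (planning) pass
    on_screen_text: str  # text shown in the video
    animation: str       # instructions for the animation engine
    narration: str       # text handed to TTS


def plan_scenes(prompt: str, llm: Callable[[str], str]) -> List[str]:
    """First inference: one rough script per scene, one per line."""
    reply = llm(f"PLAN: {prompt}")
    return [line.strip() for line in reply.splitlines() if line.strip()]


def expand_scene(script: str, llm: Callable[[str], str]) -> Scene:
    """Second inference: turn a rough script into text, animation, narration."""
    reply = llm(f"EXPAND: {script}")
    text, animation, narration = reply.split("|", 2)
    return Scene(script, text.strip(), animation.strip(), narration.strip())


def build_video(prompt, llm, render, tts) -> List[dict]:
    """Run the whole pipeline and return clips in playback order."""
    clips = []
    for script in plan_scenes(prompt, llm):
        scene = expand_scene(script, llm)
        clips.append({
            "text": scene.on_screen_text,
            "frames": render(scene.animation),  # stand-in for the animation engine
            "audio": tts(scene.narration),      # stand-in for a TTS service
        })
    return clips


# Hypothetical stand-ins so the sketch needs no external service.
def fake_llm(request: str) -> str:
    if request.startswith("PLAN:"):
        return "State the theorem\nDraw a right triangle\nWork an example"
    script = request[len("EXPAND: "):]
    return f"{script} | draw({script!r}) | Narration for: {script}"


def fake_render(instructions: str) -> str:
    return f"<frames {instructions}>"


def fake_tts(text: str) -> str:
    return f"<audio {text}>"


video = build_video("What is the Pythagorean theorem?", fake_llm, fake_render, fake_tts)
print(len(video), "scenes")
```

Keeping the planning call separate from the per-scene expansion call mirrors the two inference passes in the interview; in a real system each stand-in would be replaced by an actual model, engine, or speech service.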
Founder Park: I understand that was the plan in the first version. Now that an interactive process has been added, has the generation process changed as well?
James: Yes, there have been changes. To ensure users see the content as quickly as possible, we now generate the first scene for them to view, while subsequent scenes are rendered in the background. When a user asks a question, we convert their voice into text, and then pass this text, along with the content from all previous scenes, to the large language model for reasoning, allowing it to plan the next teaching scene. The rendering process for subsequent scenes is the same as before.
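The two ideas in this answer, showing the first scene immediately while the rest render in the background and sending a question back to the model together with everything already covered, can be sketched as follows. This is an illustrative Python sketch under assumptions: the thread pool, the function names, and the prompt wording are hypothetical and are not taken from VideoTutor.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def render_progressively(scripts: List[str], render: Callable[[str], str]):
    """Render the first scene right away; queue the rest in the background."""
    pool = ThreadPoolExecutor(max_workers=2)
    first = render(scripts[0])                               # the viewer is waiting on this
    pending = [pool.submit(render, s) for s in scripts[1:]]  # runs while scene 1 plays
    return first, pending, pool


def build_followup_prompt(history: List[str], question: str) -> str:
    """Combine the scenes already taught with the transcribed question."""
    covered = "\n".join(f"- {scene}" for scene in history)
    return (
        f"Scenes covered so far:\n{covered}\n"
        f"Student question: {question}\n"
        "Plan the next scene."
    )


def fake_render(script: str) -> str:
    """Hypothetical stand-in for the animation engine."""
    return f"<clip {script}>"


scripts = ["State the theorem", "Draw a right triangle", "Work an example"]
first_clip, pending, pool = render_progressively(scripts, fake_render)
later_clips = [future.result() for future in pending]  # collected in original order
pool.shutdown()

prompt = build_followup_prompt(scripts[:1], "Why is the hypotenuse the longest side?")
print(first_clip)
print(prompt)
```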
Founder Park: If a user has a question after listening for one minute, they will ask it directly. After you receive the question, you send it back to the model for processing along with the previously covered content. During this process, after the user asks their question, does the animation continue playing or stop?
James: We've reduced the latency from 20-30 seconds initially to under 5 seconds. In terms of interaction, we've implemented transitions to minimize user focus on those 5 seconds, ensuring a smooth flow throughout the process. Within 4-5 seconds, they'll see the newly presented content tailored to their question.
The current design involves the AI teacher saying, "Hmm, let me think about it," and then erasing the blackboard, just like a real teacher. If you feel there's a problem with what was said, then I'll erase it and rewrite it for you. This process feels more natural.
Furthermore, we don't just passively wait for users to ask questions; we also run quizzes during the session. We feed the quiz results and the user's questions back into the model's reasoning. Also, the microphone isn't always on; users need to actively turn it on and off.
Founder Park: So with this mechanism, the longest explanation that can be generated is about an hour?
James: To be precise, there is no limit. If the student keeps having questions, they can keep asking.
Kai: Yes, there are no pre-set limitations. Actually, VideoTutor's approach is driven by the advancements in multimodal AI. We're not creating demand, but rather better fulfilling existing needs. Look at offline, live education; why are American parents willing to pay such high prices? Because the American education and training industry primarily offers one-on-one instruction, starting at $100 per hour. This is because offline teachers can provide guided questioning; they can observe where you don't understand and then ask you further questions. VideoTutor strives to achieve this kind of real-teacher learning effect, enabling each child to interact and learn in real time.
Founder Park: Are students required to turn on their cameras during class?
Kai: Not really. Whether students turn on their cameras depends mainly on US privacy laws. The product won't have a feature to force them to turn on; it depends on the student's preference. The main interaction will be through asking questions and voice feedback.
Founder Park: Technically, do you use a strategy of combining small models with large cloud models, or what?
Kai: We combine them. We have an internal dataset with over 100,000 video samples. The better ones are manually annotated and then used to train and fine-tune models. For example, we currently have over 8,000 SAT training samples. These fine-tuned small models are then used in conjunction with general-purpose commercial models in the cloud, such as Claude and Gemini.
Founder Park: Will using Claude, Gemini, or GPT affect the core performance of the product?
Kai: We primarily deal with the K-12 field, and foundation models are already good enough at that level. However, to ensure 100% accuracy, we have two models verify each answer simultaneously. If the answers from both models match, then there are basically no errors. For code generation, we mainly use Claude, as its coding capabilities are quite good.
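The two-model check Kai mentions can be pictured as a simple agreement gate. The Python sketch below is only an assumption about how such a gate might look: the `cross_check` and `normalize` helpers and the toy models are hypothetical, and the interview does not say how answers are compared or what happens when the models disagree.

```python
from typing import Callable, Optional


def normalize(answer: str) -> str:
    """Drop whitespace and case so trivially different answers still match."""
    return "".join(answer.split()).lower()


def cross_check(question: str,
                model_a: Callable[[str], str],
                model_b: Callable[[str], str]) -> Optional[str]:
    """Return the answer only when both models agree; otherwise return None."""
    a, b = model_a(question), model_b(question)
    return a if normalize(a) == normalize(b) else None


# Toy stand-ins for two independent models.
def toy_model_a(question: str) -> str:
    return "x = 5"


def toy_model_b(question: str) -> str:
    return "X=5"


def toy_model_c(question: str) -> str:
    return "x = 7"


agreed = cross_check("Solve 2x + 1 = 11", toy_model_a, toy_model_b)
disputed = cross_check("Solve 2x + 1 = 11", toy_model_a, toy_model_c)
print(agreed, disputed)
```

A `None` result would be the signal to retry, escalate, or withhold the answer; which of those applies is a product decision the sketch leaves open.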
Founder Park: What are the current technical bottlenecks in the product? Is it model capability or code generation?
Kai: Model capability is one aspect. Another is rendering; we've already broken through to under 5 seconds, and with more GPUs deployed, it can be even faster. Another is long-term memory capability. We need to accumulate long-term learning behavior data on students to know which knowledge points they don't understand. For example, if they've forgotten something they learned a month ago, we can remind them again.
James: We've actually put a lot of effort into improving rendering time, constantly making technological breakthroughs, from 2 minutes at the beginning to 1 minute, and now to less than 10 seconds. Our ultimate goal is to achieve rendering with virtually no latency, so that when a user asks a question, the result is immediately available after the inference is completed. This is a challenge our team is currently working on, but we've found a new direction.
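To make the long-term memory idea concrete (reminding a student about something last studied a month ago), here is a minimal Python sketch. The data layout, the 30-day threshold, and the function name are assumptions chosen for illustration; the interview does not describe how VideoTutor stores or schedules this.

```python
from datetime import date, timedelta
from typing import Dict, List


def topics_to_review(last_seen: Dict[str, date],
                     today: date,
                     max_gap_days: int = 30) -> List[str]:
    """Return topics not studied within the last `max_gap_days` days."""
    cutoff = today - timedelta(days=max_gap_days)
    return sorted(topic for topic, seen in last_seen.items() if seen <= cutoff)


# Hypothetical per-student history: topic -> date it was last studied.
history = {
    "quadratic functions": date(2025, 1, 5),
    "circle equations": date(2025, 2, 20),
}

due = topics_to_review(history, today=date(2025, 3, 1))
print(due)
```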
Ignoring completion rates, focusing only on final exam scores
Founder Park: How do you measure the core metrics of a product at this stage? How do you determine if a video is useful to users?
Kai: The most crucial metric is the exam. In the new version, after watching the video, there will be a quiz at the end. If you get it right, it proves you understand; if you don't, it proves the explanation wasn't clear.
Learning effectiveness cannot be judged solely by completion rates; some students may understand the material halfway through. If they take a test halfway through, and pass, they don't need to watch the rest. Our product's core metric is how many students improved their scores.
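The metric Kai describes, the share of students whose scores improved, is simple to compute once before-and-after scores are available. The Python sketch below uses made-up sample scores purely for illustration; they are not VideoTutor data.

```python
from typing import List, Tuple


def improvement_rate(scores: List[Tuple[int, int]]) -> float:
    """Fraction of students whose score after is higher than their score before."""
    if not scores:
        return 0.0
    improved = sum(1 for before, after in scores if after > before)
    return improved / len(scores)


# Invented (before, after) score pairs, for illustration only.
sample = [(1200, 1350), (1400, 1400), (1100, 1250), (1300, 1280)]
rate = improvement_rate(sample)
print(rate)
```

Unlike a completion rate, this number ignores how much of a video was watched, which matches the point that a student who understands halfway through does not need to finish.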
Founder Park: But the student's final exam happens outside your product. How do you find out whether they did well?
Kai: This brings us to the product culture in the US, where users spontaneously share their positive experiences and results after using a product. Many students who have used VideoTutor and taken the SAT will proactively share their experiences and scores. We also make them campus ambassadors to spread the word further.
We have 20 high school students serving as campus ambassadors. Actually, you see, Mercor was very successful in its early days, using a typical "user success story" model. Mercor helped many Indian programmers find jobs in the US, and then they would contact these users, create a user story for them, and explain how they found jobs using Mercor. This generated excellent word-of-mouth marketing. VideoTutor works on the same principle; we want more students to use the product and achieve great results, and then we'll create user stories from these students and share them.
Founder Park: Where do students primarily share their experiences?
Kai: Students are mainly on TikTok, and parents are in Facebook groups.
Founder Park: If we look at a timeframe of six months or a year, what are your plans for product growth?
Kai: I think that essentially, VideoTutor is still a consumer-facing (C-end) product, and word-of-mouth marketing is extremely important. Many successful AI applications relied on word-of-mouth from early adopters; for example, designers used it and liked it, so it spread. For us, the core metric is how many SAT test takers used the product and scored high, and then spread the word to other children and parents. Parents mainly use Facebook and Instagram, and students use TikTok; we spread the word on these platforms. When this kind of consensus-based word-of-mouth is formed, school teachers will naturally become aware of it. The reason so many schools knew about us early on was because many teachers used it, liked it, and recommended it to their school's purchasing managers. So, the most crucial factor is still word-of-mouth marketing from C-end users; the key metric is how many children improved their scores after using it.
Founder Park: What is the general status of the new version and the timeline for its release?
Kai: We hope to release it publicly within two months at the earliest. At that time, students will be able to get answers to questions with very low latency, and the graphics rendering for science scenarios will be 100% accurate. Of course, we will not cover competition scenarios or complex university knowledge like linear algebra for the time being; we will focus more on the K-12 field.
Founder Park: What are the current barriers or moats of VideoTutor?
Kai: I think there are a few points. First is the data flywheel. Behind every video is code. Good video data generated by users, after secondary annotation, can be used to retrain and fine-tune the model. The more data, the better the video quality. Another point is learning behavior data. We know which knowledge points different students are weak in, which allows us to build a data flywheel. The more people use it, the better the product understands the students. Second is leading technological advantages, such as the animation engine algorithm. Although the algorithm itself isn't the most core advantage, as we iterate rapidly and collect more and more data, the advantage will become more apparent.
Thirdly, there's the brand. VideoTutor has already become a leading brand in the AI education field among parents in North America, and parents' trust is also an invisible barrier.
Founder Park: What kind of product do you expect VideoTutor to become in three to five years?
Kai: We hope that VideoTutor will become an AI teacher for everyone learning STEM subjects. We only focus on STEM. I think it will surpass Duolingo in the future. Duolingo is a world-class language learning product, but the STEM field has never had a world-class product because STEM requires a lot of graphics rendering. Now that foundation model technology is ready, I think the next "Duolingo" will emerge in the STEM field.
We are hiring, especially looking for people from major Chinese companies.
Founder Park: You've had several entrepreneurial experiences before. What were they all about?
Kai: I'm a junior in college now. During my freshman year, James and I started an educational product startup and raised $200,000 in angel investment. Although it failed, we learned a valuable lesson: you can't get caught up in homogeneous competition. At the time, our app faced many similar products on the market, forcing us into early-stage competition for user acquisition, making it difficult to monetize.
For my second startup, I joined MathGPTPro as a co-founder and stayed for a few months. During that time, I learned how to analyze product metrics, how to build a product, and how to expand a user base. It was also then that I concluded that text-based, answer-oriented educational products had reached their limit. They were no different from ChatGPT, and the structured question banks that companies like Zuoyebang had invested heavily in were being replaced by the capabilities of large models. Therefore, for my third startup, I knew that visualization was an inevitable trend.

A photo of Kai Zhao with Sam Altman at Harvard University.
Founder Park: Besides making you realize the limitations of text-based products, how have your two past experiences helped you in your current work at VideoTutor, in terms of teamwork or other aspects?
Kai: It's very helpful.
First, it helps me better determine the direction and future potential of my product. I analyze competitors' website traffic and revenue to assess the overall evolution of the product.
Secondly, in terms of product development, it can better judge the product development pace, including product design, front-end and back-end integration, and which indicators to look at.
Thirdly, team management and organizational culture capabilities. I established a more comprehensive management system, including the division of labor, rewards, and stock option issuance for each team member. Also, I learned how to raise funds. We completed this round of $10 million in funding within 20 days.
Founder Park: How many people are on your team right now?
Kai: There are six of us, and we live together.
Founder Park: How was the team initially set up?
Kai: James and I have already started two businesses. We both come from the same university, and we made an app together in our freshman year. In our sophomore year, I started a business with two other people, and we all knew each other. When we realized that this technology could support a very big product vision, we contacted each other to form a team to develop this product. We are all from the same school, including Nick, another partner on the team, who was also my college roommate.
Founder Park: You're planning to expand your recruitment now. What kind of people are you looking to hire?
Kai: We are mainly recruiting for backend, frontend, large language model, and UI/UX positions, and we prefer experienced individuals. This is because we have now moved beyond the trial-and-error phase and entered a phase of rapid product builds, requiring experienced people to help us grow.
Founder Park: So you need experienced engineers, product managers, and growth leads to take the product from 1 to 10, or even from 10 to 100.
Kai: Yes, that's the stage. We expect to expand the team to 9 to 10 people, with hiring engineers as the primary priority.
This round of hiring will likely take place in China, so it will be a mix of in-person and remote recruitment.
Founder Park: What profile are you looking for in these hires?
Kai: We prefer someone with experience at large companies like ByteDance or Meituan. ByteDance has a fast-paced, dynamic organizational culture that values young people. People trained at ByteDance have strong methodologies and skills, and by joining us they can bring that successful experience with them and apply it to what we are building.
We're looking for people with experience in high-pressure, rapid-iteration strategies from top domestic companies. We've moved past the student startup phase and don't need to hire complete novices anymore. We need more experienced hires, but not absolute industry veterans. Industry veterans might have family responsibilities and can't be as driven. So, someone in the middle – young and driven – is ideal.
We're willing to offer generous stock options to top talent. We've raised $11 million, so why aren't we hiring engineers in the US? Because we believe China's product development and engineering capabilities are truly exceptional. This wave will almost certainly see Chinese-run teams creating great products that succeed internationally. Many AI applications are already being built by Chinese teams. This is also our advantage: we can leverage the strengths of both China and the US.
College students in Silicon Valley are all starting AI businesses.
Founder Park: The trend of college student entrepreneurship is particularly evident now, especially in Silicon Valley. What kind of situation do you see?
Kai: Let's look at the facts. Take the valuations from the latest funding rounds: Mercor, which focuses on AI-powered recruitment, has completed a new round of over $300 million, bringing its valuation to over $10 billion, and Cursor is already firmly established at a valuation of $10 billion. There are also companies like GPTZero and Pika, among others. These are all university student startups, especially Cursor and Mercor, whose founders dropped out of college in their junior year.
This wave of young entrepreneurs shares a common characteristic: highly differentiated competition. They focus on doing things in extremely narrow fields, avoiding generic approaches. For example, Mercor, which focuses on AI recruitment, initially only recruited Indian programmers.
The second point is the environment. The capital environment and underlying innovation of Silicon Valley, such as Stanford, Y Combinator, and Peter Thiel's fund, support college student entrepreneurship in the earliest stages. They are willing to support you regardless of whether you have a mature idea or not, and provide a strong network of contacts.
Third, I think it's the qualities of these college students themselves. Whether it's us or other college students in Silicon Valley, we all share a courageous spirit of adventure and extremely strong learning abilities. This adventurous spirit is something many students in China may lack, because in Silicon Valley you are surrounded by inspiring examples of successful peers, and the investment environment is more willing to believe in young people.
For me, I also weighed the costs and benefits at the time. If I chose to finish university and then find a job, I might not be able to repay my family for the cost of studying abroad, nor would I necessarily see a significant return on my investment. But if I chose to start a business, I could learn like crazy while I was young, and my life would have endless possibilities. I've wanted to create a great company since I was a child.
Founder Park: Why is it that today's generation of college students can build companies worth tens of billions of dollars, while in the past, selling for ten or twenty million dollars was considered a great achievement? Is there an AI boom or bubble factor involved?
Kai: I don't think it's entirely a bubble. Cursor has $450 million in real revenue, which is very reliable. Behind this is the crucial methodology and insight of this generation of young teams. Look at these teams, they all have excellent backgrounds, and they have very good learning abilities.
Cursor initially relied on its team of university student programmers, who were highly receptive to AI and provided strong feedback. The founder himself was also a gifted engineer with a deep understanding of users and strong engineering iteration capabilities; the product was built from scratch with just four people in the early days. After iterating the product well, they built a reputation among users, generated revenue, and investors, fearing they might miss out on the next Mark Zuckerberg, poured in capital.
The most fundamental condition is that many technologies in this wave of AI are new, and young people learn quickly and are pragmatic, reliable, and daring. As a result, they can beat traditional products with a deep understanding of users and ultra-fast iteration. For example, before Cursor, GitHub Copilot was also quite good, so why didn't it beat Cursor? Because of user experience and execution speed.
Founder Park: Could it be said that because AI is a new technology, many product perceptions also need to be viewed from a new perspective?
Kai: Yes, the younger generation of entrepreneurs has a deeper understanding and insight than the previous generation, and they are closer to users. Mainstream AI users today were born after 2000; they learn and iterate on feedback faster, and are more tolerant, than the previous generation of entrepreneurs.
Therefore, the speed of cognitive iteration is key. In the mobile internet era, technological iteration was measured in years or quarters, but in the AI era, it may be measured in days. As a founder, you must learn quickly, and young people are more likely to stay up late and have more drive.
Founder Park: Some media outlets have reported that many founders in Silicon Valley have started working 996 (9 am to 9 pm, 6 days a week). What are your thoughts on this?
Kai: Some of my white entrepreneur friends have raised a lot of money and they also work 996 (9am-9pm, 6 days a week). They, like us, rent a big house where everyone lives and works together. I think 996 is more of a forced choice due to the environment. Silicon Valley is a bit like the gold rush right now; nobody wants to fall behind, so the only competition is the speed of product iteration, which requires working late into the night to iterate quickly. It's an environment that shapes things, forcing people to do this.
Founder Park: What are the trends in the choice of business sectors among college student entrepreneurs in Silicon Valley?
Kai: I think there's a trend, whether we're in education or anyone else, of starting a business within our comfort zone. A comfort zone refers to a level of understanding you have of the field and its users. The founder of Cursor has a deep understanding of coding, and we started in education because we understand the target audience well. Nowadays, young people are more likely to start businesses within their existing comfort zones, rather than rashly jumping into unfamiliar areas. This is because the user feedback you get is faster and more accurate.
There's also the layering of knowledge. Having worked in education three times, my understanding has been constantly evolving. These college students are less likely to rashly do things they've never done before; they're always thinking about how to do them better. They have a new generation's way of thinking, constantly iterating within their own cognitive framework and daring to create opportunities.
Another key element is the spirit of daring to take risks. They're not easily swayed by others' opinions and have an attitude of "I don't care what you think about me," displaying great confidence. Behind this is a culture of "high-speed experimentation." I know my product isn't ready yet, but I don't care; I launch quickly, iterate quickly, and get feedback quickly.
Founder Park: When did this trend start?
Kai: I think the success comes from a growing consensus. When people see projects like GPTZero grow out of dorm rooms, iterate continuously, and then win capital support and user recognition, and as more such cases of rapid trial and error and rapid growth appear, a consensus forms.
In short, "done is better than perfect": completion is more important than perfection. And there's not much worry about competition; many Silicon Valley founders are willing to share their product ideas, unafraid of being copied, as long as they iterate quickly. I think this wave of young people also has excellent storytelling skills. This storytelling isn't empty rhetoric; it is grounded in pragmatism and truth, combined with their vision for the future.
Founder Park: So first, you market yourself.
Kai: Yes. I think the underlying mindset lies in a spirit of adventure and extreme self-confidence. Driven by this, they bravely try and fail, unafraid of saying the wrong thing. They boldly express their product ideas, boldly execute them, and if they make a mistake, they can simply correct it. This culture of not being afraid of trying and failing has fueled this wave of entrepreneurial enthusiasm and success among college students.
VCs in the US also look at projects from college students; Y Combinator regularly invests in some college student projects each round.
Fundraising is the last thing VideoTutor needs to worry about right now.
Founder Park: If you could go back to when you first started VideoTutor, what advice would you give yourself? What could you have done better?
Kai: I think the pace should have been faster. There's also the team composition. The VideoTutor team went through multiple rounds of refinement. If I had known earlier, I would have assembled the team around the skill profiles the product requires. In the end, I think organizational ability is crucial for startups. I will spend more time on organizational skills: selecting, identifying, and using talent effectively.
The current team is suitable for growth from 0 to 1, but to make VideoTutor bigger, we still need more experienced people to join and bring their excellent experience and abilities to the team to help the whole team grow together.
Founder Park: What kind of product or technical challenges do you think VideoTutor might encounter in the next six months?
Kai: I think one aspect is rendering. To achieve true zero latency, we still need breakthroughs in engineering. The second aspect is growth, which I think is about product taste. This involves many things, such as whether the UI and interaction design are smooth and perfect, whether the functional interactions are bug-free, and whether the visual layout is beautiful, etc. These are all challenges for us.
James: Initially, we positioned VideoTutor as a visual teaching and tutoring platform for all subjects, but later we became very focused, targeting only mathematics, because that's what we're best at. Our mathematical rendering engine is the most professional. Our next key breakthrough will likely be horizontal expansion. For example, how do we bring the advantages of visualization to humanities scenarios, such as explaining a line of classical poetry like "Hoeing the fields at noon, sweat drips onto the soil"? This is something we need to work out technically going forward.
Founder Park: Will the founder's background cause problems in subsequent expansion?
Kai: Not really. Actually, many large VCs have approached us, like a16z. They don't invest too early; they only help when the team already shows signs of success, so they know the investment won't fail. We maintain very good relationships with many large VCs.
Fundraising is the least of VideoTutor's worries; our biggest concerns are the user ecosystem and the product itself.



