Closed-door discussion between Chinese and American AI entrepreneurs: Changes and new trends in AI entrepreneurship after DeepSeek-R1

Chatbots may not necessarily be users' first AI product.

Source: Founder Park

Image source: Generated by Boundless AI

DeepSeek was undoubtedly the center of attention during the 2025 Spring Festival. From topping the App Store's free chart to cloud vendors racing to deploy DeepSeek-R1, DeepSeek has even become the first AI product many people have ever used. Entrepreneurs are discussing it from every angle: its technical innovations, its training and inference costs, and its impact on the entire AI industry.

On February 2nd, Founder Park and Global Ready, GeekPark's closed-door global community, organized a closed-door discussion, inviting more than 60 founders and technical experts from AI companies in Silicon Valley, China, London, Singapore, and Japan for an in-depth conversation on the new technical directions and product trends triggered by DeepSeek, covering technical innovation, product implementation, and the compute shortage.

With identifying details removed, we have summarized the key points of the closed-door discussion below.

01 Where is the innovation of DeepSeek?

DeepSeek released its V3 base model at the end of December. It is currently one of the most capable open-source models in the industry: a large-scale MoE (Mixture of Experts) model with 671B total parameters, of which 37B are activated per token.
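For a rough sense of that sparsity: 37B activated out of 671B total means only about 37 / 671 ≈ 5.5% of the model's parameters participate in computing any given token.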

The "Aha moment" of the R1 model released in January 2025 refers to the model's ability to exhibit a certain reflective capability during inference. For example, in the process of solving a problem, the model may realize that a certain method is no longer applicable and adjust to a more effective method during the process. This reflective capability comes from reinforcement learning (RL).

R1 is DeepSeek's flagship model, and its reasoning capability is on par with OpenAI's o1. The recipe can be summarized as two rounds of reinforcement learning and two rounds of SFT: the first RL and SFT rounds are mainly used to build a teacher model for data generation, which then guides the data generation for the later rounds. The goal is to produce the most powerful reasoning model currently available.
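As a rough illustration of that recipe, here is a minimal sketch in Python. The stage ordering follows the description above; the helpers passed in (sft, rl, sample_and_filter) and the reward function are hypothetical placeholders for real training steps, not DeepSeek's actual code.

```python
# Minimal sketch of the staged recipe described above; all helpers are
# hypothetical placeholders, not DeepSeek's training code.
def train_r1_like(base_model, cold_start_data, prompts, *,
                  sft, rl, sample_and_filter, reward_fn):
    # Round 1 SFT: cold-start fine-tuning on a small set of curated reasoning traces.
    teacher = sft(base_model, cold_start_data)
    # Round 1 RL: reasoning-focused RL with a rule-based reward builds a strong teacher.
    teacher = rl(teacher, prompts, reward_fn)
    # Round 2 SFT: the teacher generates and filters new reasoning data,
    # which is used to fine-tune the base model again.
    generated_data = sample_and_filter(teacher, prompts)
    student = sft(base_model, generated_data)
    # Round 2 RL: a final RL pass aligns the model across general scenarios.
    return rl(student, prompts, reward_fn)
```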

  • The core innovation of DeepSeek-R1-Zero is skipping the traditional supervised fine-tuning (SFT) stage and optimizing reasoning directly through reinforcement learning (RL). In addition, using DeepSeek-R1 as a teacher model to distill open-source small models (such as Qwen 1.5B/7B/14B/32B) can significantly improve the small models' capabilities.

  • In terms of coding capability, DeepSeek's R1 and OpenAI's recently released o3-mini are roughly on par, with o3-mini slightly stronger overall. The difference is that R1 is open source, which will encourage more application developers to build on it.

  • The key to DeepSeek's success is a highly integrated engineering solution that drives down the price. Taken individually, each of its methods can be found in last year's papers, but DeepSeek applies the newest methods aggressively. These methods can have side effects of their own, such as extra storage overhead, yet together they greatly improve cluster utilization.

  • Outside of a large-scale cluster serving a large number of users, the MLA architecture may have side effects. Many of DeepSeek's methods cannot deliver their full performance gains unless applied in the specific scenarios and environments they were designed for, and used in isolation they may even hurt. The system design is so tightly interlocked that lifting any single technique out of it will not reproduce DeepSeek's results.

  • One should not train only a process reward model; if that is the only model trained, the final results may fall short of expectations and may even overfit. DeepSeek instead chose the most primitive reinforcement learning setup: heuristic rules score the final result, and traditional reinforcement learning then corrects the process (a minimal sketch of such a rule-based reward appears after this list). This choice is itself the outcome of constant trial and error, made possible by DeepSeek's highly efficient infrastructure.

  • Even if DeepSeek never releases its inference code, other teams can still roughly work out which methods it used. The open-source model weights are already sufficient for others to reproduce its performance; the hard part is figuring out the special configurations inside, which takes time.

  • Relying solely on rewards from data labeling, it is difficult to reach superhuman capabilities. A real reward model grounded in real data or feedback from a real environment is needed to achieve higher-level reward optimization and produce superhuman capabilities.

  • Technical speculation: if the base model itself has strong generality and this is combined with mathematical and coding capabilities, the combination will produce stronger generalization. For example, a reasonably intelligent base model that is already good at writing, combined with some mathematical and coding reinforcement learning, may generalize well and end up with some very strong abilities, such as writing works across genres from parallel prose to regulated verse, where other models fall short.
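The "heuristic rules to score the final result" mentioned above can be as simple as checking an extracted final answer plus a formatting convention. Below is a minimal sketch; the markers, rules, and weights are illustrative assumptions, not DeepSeek's actual reward.

```python
import re

def rule_based_reward(response: str, ground_truth: str) -> float:
    """Heuristic, outcome-only reward: score the final result with simple rules
    instead of a learned process reward model. Rules and weights are illustrative."""
    # Format rule: reward responses that separate their reasoning from the answer.
    format_ok = bool(re.search(r"<think>.*</think>", response, flags=re.DOTALL))

    # Accuracy rule: extract the text after a final "Answer:" marker and compare
    # it with the reference answer.
    match = re.search(r"Answer:\s*(.+)\s*$", response.strip(), flags=re.IGNORECASE)
    answer = match.group(1).strip() if match else ""
    correct = answer == ground_truth.strip()

    return 1.0 * correct + 0.1 * format_ok

# Example: a correct, well-formatted response gets the full reward.
resp = "<think>17 * 3 = 51, minus 2 is 49.</think>\nAnswer: 49"
print(rule_based_reward(resp, "49"))  # 1.1
```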

02 Why is DeepSeek's cost so low?

  • The model's sparsity is very high. Although it has more than 600B parameters in total, only 37B are actually activated per token during inference, so its inference speed and resource consumption are comparable to a 37B-parameter model. Achieving this, however, requires many system-design changes.

  • In DeepSeek V3, the MoE architecture contains 256 expert modules, but only a small fraction of them are activated for each inference step (a toy routing sketch appears after this list). Under high load it can dynamically adjust resource utilization, in theory compressing the cost toward 1/256 of a fully dense model. This design reflects DeepSeek's forward-looking software architecture: with good enough system optimization, the price can be cut dramatically at the same order of magnitude of capability.

  • During training there are usually three axes of parallelism: data parallelism, pipeline parallelism, and tensor parallelism. To fit the sparse model design, DeepSeek heavily reworked its training framework and pipeline, abandoning tensor parallelism in favor of data and pipeline parallelism only, plus fine-grained expert parallelism: the experts (up to 256) are precisely partitioned and placed on different GPUs (the sketch after this list also illustrates a simple expert-to-GPU placement). This lets the H800 reach training efficiency close to the H100, bypassing hardware limitations.

  • In terms of model deployment, experiments show that the compute cost is controllable and the technical difficulty is not high; replication usually takes only one to two weeks, which is very good news for application developers.

  • A possible model architecture: stop confining reasoning RL to the large language model itself and attach an external "thinking machine" that handles the reasoning, which could cut the overall cost by several more orders of magnitude.
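To make the sparsity and expert-placement points above concrete, here is a toy sketch of top-k expert routing with experts spread across GPUs. The expert count, top-k value, and modulo placement are illustrative assumptions, not DeepSeek's actual configuration.

```python
import numpy as np

NUM_EXPERTS = 256   # experts in the MoE layer (illustrative)
TOP_K = 8           # experts activated per token (illustrative)
NUM_GPUS = 32       # devices holding the experts (illustrative)

# Fine-grained expert parallelism: assign each expert to a device.
expert_to_gpu = {e: e % NUM_GPUS for e in range(NUM_EXPERTS)}

def route(token_logits: np.ndarray):
    """Pick the TOP_K highest-scoring experts for one token and report
    which devices must do work for it."""
    top_experts = np.argsort(token_logits)[-TOP_K:]
    devices = sorted({expert_to_gpu[int(e)] for e in top_experts})
    return top_experts, devices

rng = np.random.default_rng(0)
experts, devices = route(rng.normal(size=NUM_EXPERTS))
print(f"activated experts: {sorted(experts.tolist())}")
print(f"fraction of experts used: {TOP_K / NUM_EXPERTS:.1%}")   # 3.1%
print(f"devices involved for this token: {devices}")
```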

03 Chatbots may not necessarily be users' first AI product

  • The success of DeepSeek R1 lies not only in its reasoning capability but also in its combination with search: a reasoning model plus search is, to some extent, a micro-agent framework (a minimal sketch of this loop appears after this list). For most users, this is their first experience of a reasoning model, and even for users who have already tried other reasoning models (such as OpenAI's o1), DeepSeek R1 with search is a completely new experience.

  • For users who have never used AI products, their first AI product may not be a language-interaction product like ChatGPT; it could be another model-driven product in a different scenario.

  • The competitive barrier for AI application companies lies in product experience: whoever moves faster and builds better, more comfortable features for users will hold the advantage in the market.

  • The thought process the model currently displays is a satisfying design, but it is more of an early artifact of improving the model's capabilities through reinforcement learning (RL). The length of the reasoning process is not the only criterion for judging whether the final result is correct, and in the future the focus will shift from long, thread-style reasoning chains to more concise, shorter reasoning.
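Below is a minimal sketch of the "reasoning model + search" loop mentioned in the first item above. The generate and web_search callables are hypothetical stand-ins for a reasoning-model API and a search backend, not any specific product's interface.

```python
# Minimal sketch of a reasoning-model + search micro-agent loop.
# `generate` and `web_search` are hypothetical placeholders.
def answer_with_search(question: str, generate, web_search, max_rounds: int = 3) -> str:
    context = ""
    for _ in range(max_rounds):
        # The reasoning model thinks, then either asks for a search or answers.
        reply = generate(
            f"Question: {question}\n"
            f"Search results so far:\n{context}\n"
            "If you need more information, reply with 'SEARCH: <query>'. "
            "Otherwise reply with 'ANSWER: <final answer>'."
        )
        if reply.startswith("SEARCH:"):
            query = reply[len("SEARCH:"):].strip()
            context += "\n" + "\n".join(web_search(query))
        elif reply.startswith("ANSWER:"):
            return reply[len("ANSWER:"):].strip()
    return "No answer within the search budget."
```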

04 AI Deployment in Vertical Scenarios Is Easier Now

  • For relatively vertical tasks, evaluation can be done with a rule system, without relying on complex reward models. On well-defined vertical tasks, small models along the lines of TinyZero or a 7B model can quickly produce usable results.

  • On a well-defined vertical task, training a DeepSeek-distilled model of 7B parameters or more can quickly produce an "aha moment". From a cost perspective, on a 7B model, solving simple arithmetic problems or Blackjack-style tasks with clear answers takes only 2-4 H100 or H200 GPUs, and the model converges to a usable state in less than half a day (a toy example of such a rule-based outcome reward appears at the end of this list).

  • In vertical domains, especially for tasks with clear answers, such as mathematical calculations and physical-rule judgments (where to place an item, whether a move is legal), DeepSeek R1 is indeed better than other models and its cost is controllable, so it can be applied across a wide range of vertical domains. For tasks without clear answers, however, such as judging whether something is aesthetically pleasing or whether an answer makes someone happy, subjective evaluation cannot be handled well by rule-based methods; that may have to wait three to six months until better approaches emerge.

  • With supervised fine-tuning (SFT) or similar methods, assembling datasets is time-consuming, and their domain distribution often cannot cover every level of the task. Now there is a new and better toolkit: paired with a high-quality model, it resolves the old difficulty of data collection for tasks with clear answers.

  • Relying solely on a rule system works for mathematics and code, where relatively clear rules can be defined, but for more complex or open-ended tasks, a pure rule system becomes very difficult. So people will likely end up exploring models better suited to evaluating the results of these complex scenarios, adopting ORM (outcome reward model) rather than PRM (process reward model) approaches, or other similar methods. Ultimately, they may build a simulator akin to a "world model" to provide better feedback for the decisions of various models.

  • When training reasoning capabilities into small models, you do not even need to rely on token-based solutions. In one solution for an e-commerce domain, the reasoning capability was pulled out of the Transformer-based model entirely, and a separate small model handled all of the reasoning, working together with the Transformer to complete the task.

  • For companies that build models for their own use (such as hedge funds), the challenge is cost. Large companies can spread the cost across customers, but small teams or companies cannot afford the high R&D spend. DeepSeek's open-sourcing matters greatly to them: teams that previously could not afford the R&D cost can now build their own models.

  • In finance, especially at quantitative funds, it is usually necessary to analyze large amounts of financial data, such as company filings and Bloomberg data. These firms often build their own datasets and run supervised training, but data labeling is very expensive. For them, applying reinforcement learning (RL) at the fine-tuning stage can significantly improve model performance, a qualitative leap.
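As a toy example of the rule-based, outcome-only evaluation discussed above for tasks with clear answers, here is a reward for a simplified Blackjack round. The scoring is illustrative; a real setup would also need an environment and an RL training loop.

```python
def blackjack_reward(player_total: int, dealer_total: int) -> float:
    """Rule-based outcome reward for a simplified Blackjack round:
    +1 for a win, 0 for a push, -1 for a loss or bust."""
    if player_total > 21:
        return -1.0          # player busts
    if dealer_total > 21:
        return 1.0           # dealer busts
    if player_total > dealer_total:
        return 1.0
    if player_total == dealer_total:
        return 0.0
    return -1.0

# The rule system gives unambiguous feedback without a learned reward model.
print(blackjack_reward(20, 19))  # 1.0
print(blackjack_reward(22, 18))  # -1.0
print(blackjack_reward(18, 18))  # 0.0
```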

05 Domestic Chips May Solve the Inference Computing Power Problem

  • Domestically, there are quite a few chips benchmarked against the A100 and A800, but the biggest bottleneck for domestic chips is not chip design; it is wafer fabrication. DeepSeek's adaptation to Huawei is also because Huawei can manufacture chips relatively reliably and can keep training and inference stable even under stricter future sanctions.

  • As Nvidia keeps pushing upward, these high-end chips have surplus compute in some application scenarios from the perspective of single-card training. For example, a single card's compute may not be fully utilized during training because of additional cache and memory constraints, which makes it not the best fit for training tasks.

  • In the domestic chip market, if one focuses purely on AI workloads rather than scientific computing, sharply cutting high-precision floating-point capability and targeting only AI tasks, it is possible to catch up with Nvidia's flagship chips on some performance metrics.

06 More Powerful Agents and Cross-Application Calling Capabilities

  • For many vertical domains, agent capabilities will improve significantly. You can start from a base model, turn some rules into a rule model (which may be a purely engineering solution), and then let the base model iterate and train against it. The result may already show some superhuman capability; on top of that, some preference tuning can make its answers more readable for humans, yielding a more powerful reasoning agent for a specific vertical domain.

  • This raises a problem: you may not get an agent that generalizes strongly across all vertical domains. An agent trained for one domain only works in that domain and does not transfer to others. But this is a plausible path to deployment: because DeepSeek's inference cost is so low, you can pick a model, run a series of reinforcement training passes, and have the result serve only one vertical domain without worrying about the rest. For vertical AI companies, this is an acceptable solution.

  • From an academic perspective, an important trend over the next year is that existing reinforcement learning methods will be carried over to large-model applications to address today's problems of weak generalization and inaccurate evaluation, further improving model performance and generalization. With reinforcement learning applied, the ability to output structured information will improve greatly, ultimately better supporting various application scenarios, especially the generation of charts and other structured content.

  • More and more people will be able to post-train on R1, and everyone can create their own agents. The model layer will split into different agent models using different tools to solve problems in different domains, ultimately realizing a multi-agent system (see the sketch at the end of this list).

  • 2025 may become the year of the agent, with many companies launching agents with task-planning capabilities. However, there is not yet enough data to support these tasks. Planning tasks might include helping users order takeout, book travel, or check ticket availability at tourist sites; they require large amounts of data and reward mechanisms to evaluate model accuracy. For example, when planning a trip to Zhangjiajie, how do you judge what is right and wrong, and how does the model learn? These questions will become the next research hotspot, and reasoning capabilities will ultimately be used to solve practical problems.

  • In 2025, cross-application calling will become a hotspot. On Android, thanks to its open-source nature, developers can perform cross-application operations via low-level permissions, and agents will eventually be able to control your browser, phone, computer, and other devices. In the Apple ecosystem, by contrast, strict permission management means agents still face great difficulty in fully controlling every application on the device; Apple would have to build such an all-application agent itself.
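Below is a minimal sketch of the "different agent models with different tools" idea mentioned above: route a task to a domain-specific agent, each bundling its own tools. The agent names and tool lists are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    tools: list = field(default_factory=list)

    def handle(self, task: str) -> str:
        # A real agent would call its model and tools here; this just reports the routing.
        return f"[{self.name}] handling '{task}' with tools {self.tools}"

# Hypothetical domain agents, each with its own toolset.
AGENTS = {
    "travel": Agent("travel-agent", ["flight_search", "hotel_booking", "ticket_availability"]),
    "coding": Agent("coding-agent", ["code_interpreter", "repo_search"]),
    "finance": Agent("finance-agent", ["filings_lookup", "market_data"]),
}

def route(task: str, domain: str) -> str:
    agent = AGENTS.get(domain)
    return agent.handle(task) if agent else "No agent for this domain."

print(route("Plan a trip to Zhangjiajie and check ticket availability", "travel"))
```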
