Six major AIs are locked in a trading war. Will crypto's version of the "Turing Test" end well?


Written by David, TechFlow



The good news: after the epic crash on October 11, crypto trading has become active again.



The bad news: it is AI doing the trading.



As the new week began and the market picked up, a project called nof1.ai sparked a lot of discussion on crypto social media.



What everyone is watching is simple: the project's six large AI models trade crypto on Hyperliquid in real time, and the question is which one makes the most money.





Note that this isn't a simulated trading platform. Claude, GPT-5, Gemini, Deepseek, Grok, and Qwen (Tongyi Qianwen) are each trading $10,000 of real money on Hyperliquid. All addresses are public, so anyone can watch this "AI Trader War" in real time.



Interestingly, all six AIs use exactly the same prompts and receive exactly the same market data. The only variable is how each model reasons.



Within just a few days of going live on October 18, some AIs had gained more than 20%, while others had lost nearly 40%.



In 1950, Alan Turing proposed the famous Turing Test as a way to approach the question, "Can machines think?" Now, in crypto, six major AIs are competing in the Alpha Arena to answer a more interesting question:



If the smartest AIs were allowed to trade in the real market, who would survive?



Perhaps in this crypto version of the "Turing Test," the account balance is the only judge.



A good AI is one that makes money, and Deepseek is currently leading the way.



Traditional AI evaluation, whether it asks a model to write code, solve math problems, or write articles, essentially takes place in a "static" environment.



The questions are fixed, the answers are predictable, and both may even have appeared in the training data.



But the crypto market is different.



Information is extremely asymmetric, and prices move every second. There is no single right answer, only profit and loss. More importantly, the crypto market is a classic zero-sum game: your profit is someone else's loss. The market punishes every wrong decision immediately and ruthlessly.



The Nof1 team, which is hosting the AI trading war, puts it this way on its website:



Markets are the ultimate test of intelligence.





If the traditional Turing test asks "Can you make it so that humans can't tell you are a machine?", then Alpha Arena actually asks:



Can you make money in the crypto market? That is what crypto players really expect from AI.



The Hyperliquid addresses of the six AI models are public, so anyone can easily look up their positions and transaction records.





The nof1.ai website also displays each model's trading history, current positions, profits, and reasoning on its front end, so everyone can consult them easily.



For readers who are completely unfamiliar with the project, the trading rules for the AIs are as follows:



Each AI receives an initial capital of $10,000 and can trade perpetual contracts for BTC, ETH, SOL, BNB, DOGE, and XRP. The goal is to maximize returns while controlling risk. Each AI must independently decide when to open and close positions, and how much leverage to use. Season 1 will run for a few weeks, depending on the situation, with major updates coming in Season 2.



As of October 20, the third day after trading began, the situation had clearly diverged.





The current leader is Deepseek Chat V3.1, with $12,533 (+25.33%), followed by Grok-4 with $12,147 (+21.47%), and Claude Sonnet 4.5 with $11,047 (+10.47%).



Qwen3 Max was modestly ahead, with a balance of $10,263 (+2.63%). GPT-5 lagged significantly behind, with a current balance of $7,442 (-25.58%). The worst performer was Gemini 2.5 Pro, with a balance of $6,062 (-39.38%).
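
The percentages quoted above follow directly from each balance and the $10,000 starting capital. As a minimal Python sketch (the variable and function names here are illustrative and not part of the nof1.ai site), the returns, the ranking, and the gap between first and last place can be reproduced like this:

```python
INITIAL_CAPITAL = 10_000  # starting balance per model, per the stated rules

# Balances as of October 20, copied from the figures quoted in this article
balances = {
    "Deepseek Chat V3.1": 12_533,
    "Grok-4": 12_147,
    "Claude Sonnet 4.5": 11_047,
    "Qwen3 Max": 10_263,
    "GPT-5": 7_442,
    "Gemini 2.5 Pro": 6_062,
}


def pct_return(balance, initial=INITIAL_CAPITAL):
    """Simple percentage return relative to the starting capital."""
    return (balance - initial) / initial * 100


# Return per model, rounded to two decimals as in the article
returns = {name: round(pct_return(b), 2) for name, b in balances.items()}

# Models ordered from best to worst return
ranking = sorted(returns, key=returns.get, reverse=True)

# Gap between the best and worst model, in percentage points
spread = round(max(returns.values()) - min(returns.values()), 2)

for name in ranking:
    print(f"{name:20s} {returns[name]:+.2f}%")
print(f"Top-to-bottom spread: {spread:.2f} percentage points")
```

This only measures raw return. It says nothing about how much risk each model took to get there.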



The most surprising performance, and yet also the most understandable, is Deepseek's.



It's surprising because this model is far less popular in the international AI community than GPT and Claude. It's also understandable because Deepseek is backed by the quantitative fund High-Flyer Quant.



This quantitative giant, with assets under management exceeding 100 billion RMB, began its career in algorithmic trading before venturing into AI. From quantitative trading to large-scale AI models, and then to using AI for real-world crypto trading, Deepseek has seemingly returned to its roots.



In comparison, OpenAI's flagship GPT-5 lost more than 25%, and Google's Gemini fared even worse, losing nearly 40% over 44 trades.



In real trading, strong language skills alone may not be enough. An understanding of the market may matter more.



Same gun, different techniques



If you track Alpha Arena from October 18 onward, you'll find that the AIs performed similarly at first, but the gap widened as time went by.



At the end of the first day, the best AI, Deepseek, had gained only 4%, while the worst, Qwen3, had lost 5.26%. Most AIs hovered within plus or minus 2%, seemingly testing the market.



But by October 20th, the picture had changed abruptly. Deepseek had soared to +25.33%, while Gemini had plummeted to -39.38%. In just three days, the gap between the top and bottom performers widened to nearly 65 percentage points.



Even more interesting is the difference in transaction frequency.



Gemini completed 44 trades, an average of about 15 per day, like an anxious speculator. Claude, on the other hand, completed only three, and Grok was even still holding an open position. This discrepancy cannot be explained by the prompts, since all six models use the same ones.





Looking at the profit and loss distribution, Deepseek's largest single loss was $348, but its overall profit was $2,533. Gemini's largest single profit was $329, but its largest loss was as high as $750.



Different AIs (the publicly available large models, with no additional tuning) strike completely different balances between risk and return.



In addition, you can see the chat records and thought processes of different models in the Model Chat option on the website. These monologues are particularly interesting.





Just as human traders have different styles, the AIs also seem to exhibit different personalities. Gemini's frequent trading and restless reasoning resemble someone with ADHD, Claude's caution resembles a conservative fund manager, and Deepseek's steady approach resembles a seasoned quant who discusses only positions and offers no emotional assessment.



This personality doesn't feel like it was designed, but rather emerged naturally during the training process. When faced with uncertainty, different AIs tend to respond in different ways.



All AIs see the same candlestick charts, the same volume, and the same market depth. They even use the same prompts. So, what causes such a huge difference?



The influence of training data may be key.



High-Flyer Quant, the company behind Deepseek, has accumulated a massive amount of trading data and strategies over the past decade. Even if this data isn't used directly for training, could it still influence the team's understanding of what constitutes a good trading decision?



In contrast, the training data of OpenAI and Google may lean more toward academic papers and online text, so their models' understanding of real trading may be less grounded in practice.



Some traders also speculate that Deepseek may have been optimized for time series prediction during training, while GPT-5 may be better at processing natural language. Different architectures may perform differently when faced with structured data such as price charts.



Watching AI trade is also a business



While everyone is paying attention to the profits and losses of AI, few people pay attention to the mysterious company behind it.



Nof1.ai, the company behind this AI trading war, isn't very well known, but the accounts it follows on social media offer some clues.



The people behind nof1.ai do not seem to be a group of typical crypto entrepreneurs, but rather a group of academic AI researchers.



The profile of founder Jay A Zhang is also very interesting:



"Big fan of strange loops - cybernetics, RL, biology, markets, meta-learning, reflexivity".



Reflexivity is a core concept of Soros's theory: the perceptions of market participants influence the market, and market changes in turn influence the perceptions of participants. There is something fitting about someone who studies reflexivity running an AI trading experiment in a live market.



Let everyone watch how AI trades, then see how "being observed" affects the market.





Co-founder Matthew Siper's profile shows him as a PhD candidate in machine learning at New York University and an AI research scientist. A project undertaken by a PhD student who hasn't yet graduated seems more like a validation of academic research.



Other accounts nof1 follows include a researcher at Google DeepMind and an associate professor at New York University who specialize in AI and games.



Judging by their actions and background, Nof1 is clearly after more than buzz. The name SharpeBench is quite ambitious: the Sharpe ratio is a standard measure of risk-adjusted return. Perhaps what they really want is to create a benchmarking platform for AI trading ability.
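
To make the Sharpe ratio concrete, here is a minimal Python sketch of the textbook formula: the mean excess return divided by its standard deviation, scaled to an annual figure. This is not Nof1's methodology, which the article does not describe. The function name, the 365-day annualization (chosen because crypto trades every day), and both return series are illustrative assumptions rather than Alpha Arena data.

```python
import math
import statistics


def sharpe_ratio(period_returns, risk_free=0.0, periods_per_year=365):
    """Annualized Sharpe ratio from a list of per-period returns.

    period_returns: returns as fractions (0.01 means +1%), at least two values.
    risk_free: per-period risk-free rate, assumed to be 0 here.
    periods_per_year: 365 is an assumption for daily crypto returns.
    """
    excess = [r - risk_free for r in period_returns]
    volatility = statistics.stdev(excess)  # sample standard deviation
    if volatility == 0:
        raise ValueError("Sharpe ratio is undefined when returns never vary")
    return statistics.mean(excess) / volatility * math.sqrt(periods_per_year)


# Two made-up daily return series with the same average return (+1% per day)
steady = [0.010, 0.012, 0.008, 0.011, 0.009]
volatile = [0.10, -0.08, 0.09, -0.07, 0.01]

print(f"steady:   {sharpe_ratio(steady):.2f}")
print(f"volatile: {sharpe_ratio(volatile):.2f}")
```

Both hypothetical series earn the same average return, but the steadier one scores far higher. That is the sense in which a Sharpe-based benchmark would reward a model for how it earns its profit, not only for how much it earns.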



Some speculate that Nof1 has strong capital backing, while others say they may be laying the groundwork for subsequent AI trading services.



If they launch a subscription service for Deepseek trading strategies, they'll likely attract a significant number of buyers. Based on this prototype, developing AI asset management, strategy subscriptions, and trading solutions for large enterprises is a foreseeable business opportunity.



Beyond the team itself, simply watching the AIs trade can also be profitable.



As soon as Alpha Arena went online, people started copy trading.



The simplest strategy is to follow Deepseek: buy what it buys, sell what it sells. Meanwhile, some people in the comment section do the opposite and deliberately act as Gemini's counterparty. They sell when Gemini buys and buy when it sells.



But there's a problem with copy trading: when everyone knows what Deepseek is buying, will the strategy still work? This is the reflexivity that project founder Jay Zhang talks about, meaning that observation itself changes the thing being observed.



There is also an illusion here that top trading strategies are being democratized.



On the surface, it may seem that everyone can understand the AI's trading strategy, but in reality what you see are the trading results, not the trading logic. Each AI's take-profit and stop-loss logic is not necessarily consistent or reliable.



While Nof1 is testing AI trading behavior, retail investors are looking for the secret to wealth, other traders are learning from it, and researchers are collecting data.



Only the AI itself is unaware of being watched, diligently executing each trade. If the classic Turing test is about deception and imitation, then the current Alpha Arena trading war is about how crypto traders respond to AI's capabilities and results.



In this result-driven crypto market, AI that can make money may be more important than AI that can chat.

