Researchers have trained a new kind of large language model (LLM) using GPU clusters scattered around the globe and a mix of private and public data, an advance that could upend the dominant way AI is built.
The model, called Collective-1, was created by two AI startups taking unconventional paths, Flower AI and Vana. Flower AI developed the technology that allows training to be spread across hundreds of computers connected over the internet, an approach already used by several companies to train AI models without centralized computing power or data. Vana supplied diverse data sources, including private messages from X, Reddit, and Telegram.
By modern standards, Collective-1 is relatively small: its 7 billion parameters (the values that collectively determine the model's capabilities) fall far short of the hundreds of billions found in today's most advanced models, such as those powering ChatGPT, Claude, and Gemini.
Nic Lane, a Cambridge University computer scientist and Flower AI co-founder, noted that the distributed approach should be able to scale well beyond Collective-1. He said Flower AI is currently training a 30-billion-parameter model using conventional data and plans to train a trillion-parameter model later this year, approaching the scale offered by industry leaders. "This could completely change how people think about AI, and we are pushing it as hard as we can," Lane said. The startup is also incorporating images and audio into training to create multimodal models.
Distributed model-building could also reshape the power dynamics of the AI industry.
AI companies currently build models on two pillars: massive amounts of training data and computing power concentrated in data centers linked by ultra-high-speed fiber-optic networks. They also lean heavily on datasets scraped from publicly accessible (though sometimes copyrighted) material, including web pages and books.
This approach means that only well-funded companies and nations with access to high-end chips can develop the most valuable, cutting-edge models. Even open-source models such as Meta's Llama and DeepSeek's R1 come from companies with large data centers. Distributed training could allow smaller companies and universities to build competitive AI by pooling dispersed resources, or let countries lacking conventional infrastructure network several data centers together to build a more powerful model.
Lane believes the AI industry will increasingly turn to new methods that break free of the single-data-center model. "Compared with the data center model, distributed solutions can scale computing power far more elegantly," he explained.
Helen Toner, an AI governance expert at the Center for Security and Emerging Technology, said Flower AI's approach could have "significant implications for AI competition and governance." She noted: "It may still struggle to match the most advanced technology, but it has value as a fast-follower approach."
Divide and Conquer
At the heart of distributed AI training is a rethinking of how computing power is allocated. Building a large language model involves feeding huge volumes of text into the system and adjusting its parameters until it produces useful responses. Inside a conventional data center, the training work is split across many GPUs and periodically consolidated into a single, unified master model.
The new technology allows that same work, traditionally confined to large data centers, to be spread across hardware located miles apart and connected only by ordinary networks.
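To make the contrast concrete, below is a minimal sketch, in Python with PyTorch, of the kind of scheme described above: several workers each train their own copy of a small model on a local data shard, and only occasionally average their parameters back into a shared model, so that only infrequent, compact synchronization has to cross the slow links between sites. The model architecture, worker count, and schedule here are illustrative assumptions, not Flower AI's or Photon's actual code.

```python
# Illustrative sketch of periodic parameter averaging across loosely connected
# workers (local training + occasional sync). Not Flower AI / Photon code.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_model():
    # Tiny stand-in for an LLM; real systems train billions of parameters.
    return nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

def average_into(global_model, worker_models):
    """Consolidate each worker's weights into the shared model (simple mean)."""
    with torch.no_grad():
        for name, param in global_model.named_parameters():
            stacked = torch.stack(
                [dict(m.named_parameters())[name] for m in worker_models]
            )
            param.copy_(stacked.mean(dim=0))

# Each worker keeps its own private data shard (toy regression data here).
num_workers, local_steps, rounds = 4, 10, 5
shards = [(torch.randn(64, 10), torch.randn(64, 1)) for _ in range(num_workers)]

global_model = make_model()
for r in range(rounds):
    # Each worker starts the round from the current shared weights.
    worker_models = [copy.deepcopy(global_model) for _ in range(num_workers)]
    for model, (x, y) in zip(worker_models, shards):
        opt = torch.optim.SGD(model.parameters(), lr=0.01)
        for _ in range(local_steps):  # many cheap local steps, no network traffic
            loss = nn.functional.mse_loss(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    average_into(global_model, worker_models)  # one sync crosses the slow links
    print(f"round {r}: synced {num_workers} workers")
```

The trade-off is the one Lane describes: each round is slower than tightly coupled GPUs exchanging gradients constantly, but because workers only meet at the sync points, hardware can in principle be added or dropped between rounds without restarting the run.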
Industry giants are also exploring distributed learning. Last year, Google researchers proposed a new framework called DiPaCo (Distributed Path Composition) that improves the efficiency of distributed training. To build models such as Collective-1, Lane and colleagues at institutions in China and the UK developed a new tool called Photon, which uses a more efficient way of representing model data along with a scheme for sharing and consolidating the training work. Lane acknowledged that the process is slower than conventional training, but said it offers greater flexibility, since new hardware can be added at any time to speed things up.
Photon was developed with researchers at Beijing University of Posts and Telecommunications and Zhejiang University and was open-sourced last month. Vana, Flower AI's partner on the project, is working to let users share personal data with AI builders in new ways: its software allows users to contribute private data from platforms such as X and Reddit, specify how it may be used, and even receive financial compensation.
Vana co-founder Anna Kazlauskas said the goal is to tap data that has so far gone unused while giving users more control. "This non-public data, which typically cannot go into AI models, is being used to train a foundation model for the first time, and users can own the rights to the models created with their data," she emphasized.
Mirco Musolesi, a computer scientist at University College London, pointed out that the key value of distributed training lies in unlocking new kinds of data: "Applying this to frontier models would let the AI industry train on dispersed, sensitive data from fields like healthcare and finance while avoiding the risks of centralizing it."