Author: elizeliu
Original: R3PO
The AI field has recently seen a wave of investments and acquisitions. Salesforce took part in Anthropic's $450 million funding round, and Runway raised $141 million. Snowflake announced that it had completed its acquisition of Neeva, and Chinese giant Meituan acquired the AI company Light Years Away for RMB 2.065 billion.
The most high-profile deal, however, was the acquisition of the startup MosaicML. Big data giant Databricks is reported to have acquired MosaicML for approximately US$1.3 billion, roughly six times its previous valuation, making it the largest acquisition in the first half of this year. MosaicML was founded only two years ago and has just over 60 employees. What supports such a high valuation?
Databricks Acquires MosaicML to Accelerate the Democratization of Generative AI Technologies
Databricks recently announced that it has signed an agreement to acquire MosaicML, a generative AI startup, for approximately US$1.3 billion (approximately RMB 9.3 billion), in order to help enterprises build ChatGPT-like tools.
After the acquisition, MosaicML will become part of the Databricks Lakehouse platform. MosaicML's entire team and technology will join Databricks, giving enterprises a unified platform on which to manage their data assets and to build, own, and secure their own generative AI models using their proprietary data.
MosaicML is a very young generative AI company. It was founded in San Francisco in 2021, has publicly disclosed only one round of financing, and has only 62 employees. It was valued at US$220 million in its last funding round, so the acquisition price is roughly six times that valuation. The deal is the largest acquisition announced in the generative AI field so far this year. Not long before, cloud computing giant Snowflake announced its acquisition of another generative AI company, Neeva. After several months of investment frenzy, a wave of corporate acquisitions of generative AI startups appears to be underway.
Databricks grew out of UC Berkeley, where its founders created the Apache Spark project. A data storage and analytics giant, it was valued at $31 billion as of 2022 and helps large companies such as AT&T, Shell, and Walgreens process data. It recently open-sourced its own large model, Dolly, which aims to achieve ChatGPT-like results with fewer parameters. As cloud computing became more popular, the "lakehouse" concept promoted by Databricks, which integrates data lakes and data warehouses, deeply influenced a number of big data startups. Since its founding in 2013, Databricks has rapidly grown into one of the world's hottest data infrastructure companies. Last year, Databricks announced annual revenue of more than $1 billion, and its round of financing in August 2021 valued the company at $38 billion.
Advantages of MosaicML MPT Series Models
MosaicML's MPT series models subclass the HuggingFace PreTrainedModel base class and are compatible with the HuggingFace ecosystem. MPT-7B, one of MosaicML's most popular models, has about 7 billion parameters and can handle more than 2,000 natural language processing tasks. Its performance-optimized layers include FlashAttention and low-precision layernorm, which can make training 2-7 times faster than traditional methods, and near-linear scaling with added resources means that models with billions of parameters can be trained in hours, not days. MosaicML has also released MPT-30B, a commercially usable open-source large language model with 30 billion parameters that, according to MosaicML, outperforms the original GPT-3.
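In practice, HuggingFace compatibility means an MPT checkpoint can be loaded through the standard Auto classes of the transformers library. The following is a minimal sketch, not an official recipe. The checkpoint name mosaicml/mpt-7b and the EleutherAI/gpt-neox-20b tokenizer are assumptions taken from the public MPT-7B model card rather than from this article, and the helper names load_mpt and generate are hypothetical. Running it requires the transformers and torch packages and downloads several gigabytes of weights.

```python
def load_mpt(model_name: str = "mosaicml/mpt-7b"):
    """Load an MPT checkpoint and a tokenizer via the HuggingFace Auto classes.

    Assumptions (not stated in the article): the checkpoint name, and the
    EleutherAI/gpt-neox-20b tokenizer named on the MPT-7B model card.
    """
    # Imported lazily so that defining this helper needs no extra packages.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
    # The checkpoint ships its own model class, so remote code must be trusted.
    model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
    return tokenizer, model


def generate(prompt: str, max_new_tokens: int = 32) -> str:
    """Generate a short continuation of `prompt` with the loaded model."""
    tokenizer, model = load_mpt()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling generate("MosaicML is") would then return the prompt followed by the model's continuation. Nothing is downloaded until one of the helpers is actually called.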
Data source: MT-Bench evaluation of MosaicML mainstream models
The strengths of the MPT series models are high efficiency and low cost. Models trained on large amounts of data have become far more complex, and training one now costs at least millions of dollars, which only large companies can afford and which is out of reach for most small and medium-sized enterprises. The MPT series models let enterprises train their own language models at lower cost and with higher efficiency, making generative AI technology easier to adopt.
Most open-source language models can only handle sequences of at most a few thousand tokens (see Figure 1). With the MosaicML platform and a single node of 8xA100-40GB, however, users can fine-tune MPT-7B to handle context lengths of up to 65k tokens. This adaptation to extreme context lengths is made possible by ALiBi (Attention with Linear Biases), one of the key architectural choices in MPT-7B.
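As a rough illustration of how ALiBi works, the following minimal Python sketch builds the per-head linear biases that are added to attention scores before the softmax. It follows the formulation in the ALiBi paper for a power-of-two number of heads and is not taken from MosaicML's code. The function names alibi_slopes and alibi_bias are hypothetical.

```python
from typing import List


def alibi_slopes(n_heads: int) -> List[float]:
    """Per-head slopes: a geometric sequence starting at 2 ** (-8 / n_heads)."""
    if n_heads <= 0 or n_heads & (n_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    start = 2.0 ** (-8.0 / n_heads)
    return [start ** (h + 1) for h in range(n_heads)]


def alibi_bias(n_heads: int, seq_len: int) -> List[List[List[float]]]:
    """bias[h][i][j] = -slope_h * (i - j) for keys j <= i.

    Future keys (j > i) get -inf, which doubles as the causal mask.
    """
    return [
        [
            [-m * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for m in alibi_slopes(n_heads)
    ]


if __name__ == "__main__":
    # With 8 heads the slopes are 1/2, 1/4, ..., 1/256.
    print(alibi_slopes(8))
    # Bias rows for the first head over a 4-token sequence.
    for row in alibi_bias(8, 4)[0]:
        print(row)
```

The penalty depends only on the distance between the query and key positions, not on learned position embeddings. The same rule therefore applies unchanged to positions beyond the training length, which is the extrapolation property the text attributes to ALiBi.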
For example, the full text of The Great Gatsby is just under 68k tokens. In one test, the MPT-7B-StoryWriter-65k+ model read The Great Gatsby and generated an epilogue; one of the generated epilogues is shown in Figure 2. StoryWriter read The Great Gatsby in about 20 seconds (about 150,000 words per minute). Because of the long sequence length, its "typing" speed is slower than that of other MPT-7B models, at about 105 words per minute. Although StoryWriter was fine-tuned with a context length of 65k, ALiBi lets the model extrapolate to inputs longer than those it was trained on: 68k tokens in the case of The Great Gatsby, and up to 84k tokens in testing.
Figure 2: MPT-7B-StoryWriter-65k+ writes an epilogue for The Great Gatsby. The epilogue was produced by giving the model the full text of The Great Gatsby (approximately 68k tokens) as input, followed by the word "Epilogue", and letting the model continue generating.
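The reading and writing speeds quoted above can be cross-checked with simple arithmetic. The short Python sketch below uses only the rounded figures from the text, so the derived values (word count, tokens per second, tokens per word) are approximations implied by those figures, not measurements.

```python
# Figures quoted in the text (all rounded by the source).
READ_SECONDS = 20                 # time to read the novel
READ_WORDS_PER_MINUTE = 150_000   # quoted reading speed
PROMPT_TOKENS = 68_000            # approximate length of the novel in tokens
WRITE_WORDS_PER_MINUTE = 105      # quoted "typing" speed

# Word count implied by the reading speed and the reading time.
implied_words = READ_WORDS_PER_MINUTE * READ_SECONDS / 60

# Prompt-processing throughput in tokens per second.
tokens_per_second = PROMPT_TOKENS / READ_SECONDS

# Average number of tokens per word implied by the two figures above.
tokens_per_word = PROMPT_TOKENS / implied_words

# How much faster the model reads its prompt than it generates new text.
read_to_write_ratio = READ_WORDS_PER_MINUTE / WRITE_WORDS_PER_MINUTE

print(f"implied word count: {implied_words:,.0f}")
print(f"prompt throughput:  {tokens_per_second:,.0f} tokens/s")
print(f"tokens per word:    {tokens_per_word:.2f}")
print(f"read/write ratio:   {read_to_write_ratio:,.0f}x")
```

The figures are mutually consistent: 150,000 words per minute for 20 seconds implies a text of about 50,000 words, or roughly 1.36 tokens per word at 68k tokens.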
Proliferation of Generative AI Technologies
Generative AI is a branch of artificial intelligence that uses large amounts of data and deep learning algorithms to automatically generate content such as original text, images, and computer code. The technology lets people process and analyze data more conveniently. With the rapid development of big data and AI, generative AI has been widely applied in natural language processing, image generation, virtual reality, and other fields. For example, in natural language processing, GPT-4 has become one of the most popular generative AI models and can be used for tasks such as writing articles, translating languages, and answering questions. In the image domain, StyleGAN2 can generate high-quality images for use in game development, film and television production, and virtual reality.
Naveen Rao, CEO of MosaicML, has said that since 2018 the complexity of AI models trained on large amounts of data has risen sharply, and that training a model now costs at least millions of dollars, more than small and medium-sized enterprises can generally afford. After the acquisition, the combined offering of Databricks' Lakehouse platform and MosaicML's technology will let enterprises use their own proprietary data to train and build generative AI models simply, quickly, and at low cost, so that they can develop custom AI models without giving up control or ownership. According to Databricks, with the combined platform and technical support of the two companies, the cost for enterprises of training and using LLMs will fall significantly, and is expected to drop to several thousand dollars. This will help popularize generative AI.
What Databricks' Acquisition of MosaicML Means
The main purpose of Databricks' acquisition of MosaicML is to accelerate the development and democratization of generative AI technology. By integrating the technologies and resources of the two companies, Databricks can better meet the needs of customers and provide more efficient and convenient solutions. Specifically, the acquisition will bring changes in the following aspects:
1. More efficient large language models
Once the acquisition is complete, Databricks can integrate the MPT series models into its Lakehouse platform to offer customers more efficient, lower-cost large language models. This will help enterprises handle natural language processing tasks better and improve business efficiency and accuracy.
2. Faster model training
MosaicML's MPT series models train quickly, which will help Databricks offer faster model training services. This is especially important for businesses that need to respond quickly to market demand.
3. Greater democratization
The acquisition also means that generative AI technology will become more widely accessible. MosaicML's MPT series models make it easier for small and medium-sized enterprises to train their own language models, so that they can apply generative AI and improve their business performance. This will in turn promote the development and wider adoption of generative AI and of AI technology in general.
Summary
Generative AI applications generate original text, images, and computer code from users' natural language prompts. Interest in the technology has surged since the AI startup OpenAI launched ChatGPT, an online generative AI chatbot, in November 2022. “Every organization should be able to benefit from the AI revolution and have more control over how its data is used. Databricks and MosaicML have an incredible opportunity to democratize AI and make the Lakehouse the best place to build generative AI,” said Ali Ghodsi, co-founder and CEO of Databricks.
Databricks' acquisition of MosaicML will not only accelerate the development and democratization of generative AI technology, but also combine the technologies and resources of the two companies to give customers more efficient and convenient solutions. As AI technology develops and spreads, generative AI will play an increasingly important role, and the acquisition reflects how much importance companies attach to this direction and how heavily they are investing in it. Companies like Anthropic and OpenAI license off-the-shelf language models to businesses, which then build generative AI applications on top of them. Strong commercial demand for these models has created opportunities for startups like MosaicML. The successive acquisitions by Snowflake and Databricks show that large technology companies are gradually moving from in-house R&D and strategic investment to mergers and acquisitions in pursuit of generative AI technology.
Reference sources:
https://www.databricks.com/company/newsroom/press-releases/databricks-signs-definitive-agreement-acquire-mosaicml-leading-generative-ai-platform
https://mattturck.com/mosaic/
https://twitter.com/lmsysorg/status/1672077353533730817/photo/1
https://www.mosaicml.com/blog/mpt-7b#appendix-eval
https://www.mosaicml.com/blog/mpt-30b
Original source: https://mp.weixin.qq.com/s/WG1zoLeROkrfD1jT2CZyIA




