Microsoft's full AI lineup went on a tear overnight: GPT-4o lands in the cloud, Nadella declares his devotion to OpenAI on stage, and Altman teases a new model

36kr
05-22

Overnight, Microsoft's AI universe took shape.

Early this morning, at the annual Microsoft Build 2024 conference, Microsoft CEO Satya Nadella announced more than 50 AI updates in one go, spanning GPT-4o in the cloud, the in-house Cobalt chip, Team Copilot, state-of-the-art small models, and more.

As a "developer feast" for the AI ​​circle, the releases of this Microsoft Build conference mainly have the following core highlights:

1. GPT-4o is now generally available on Azure AI, alongside models from Cohere, Databricks, Meta, Mistral, and the open-source community Hugging Face. While keeping OpenAI close, Microsoft is also leaning into third-party and open-source models.

2. The Windows Copilot Library launches in June, bundling more than 40 models and multiple out-of-the-box local APIs.

3. Phi-3-vision, a 4.2-billion-parameter multimodal SLM (small language model) that understands and reasons over images, debuts alongside the 7-billion-parameter Phi-3-small and 14-billion-parameter Phi-3-medium models, which run across operating systems and from cloud to edge.

Phi-3-vision open weights: https://huggingface.co/microsoft/Phi-3-vision-128k-instruct

4. Phi-Silica, the latest on-device small model, is designed specifically for the NPUs in Copilot+ PCs and achieves state-of-the-art results among SLMs.

5. Windows DirectML will natively support the PyTorch and WebNN frameworks, giving developers a web-native machine learning path with direct access to GPUs and NPUs.

6. Copilot connectors let enterprises wire business data, workflows, and third-party SaaS applications into Copilot to build and customize their own copilots.

7. Team Copilot can take on multiple roles, such as facilitating meetings, taking notes, building charts, and managing projects, and its agent capabilities will expand further.

8. Azure AI Studio will add custom-model capabilities, and the data analytics platform Microsoft Fabric gains new Real-Time Intelligence features.

9. Azure Cobalt, Microsoft's in-house custom CPU, opens to customers in public preview, with performance gains of up to 40%.

10. Microsoft will be among the first platforms to offer NVIDIA Blackwell GPUs and will ship Copilot+ PCs equipped with RTX GPUs; it is also expanding its partnership with AMD, with Azure becoming the first cloud platform to offer the generally available ND MI300X v5 accelerator.

Nadella said the most striking trend of the past year has been how developers are using the power of large models to change the world.

Microsoft has now built three platforms: first, Microsoft Copilot, a daily assistant that helps users take action; second, the Copilot stack, which helps developers build AI applications and solutions faster; and third, Copilot+ PC, the first AI PC. (Previous coverage: Microsoft fires at Apple as AI PCs ship with GPT-4o, real-time AI chat coaches gameplay, and Qualcomm wins big.)

Notably, in the closing moments of the conference, which ran for more than two hours, OpenAI CEO Sam Altman took the stage and revealed that new modalities and overall intelligence will be the key to OpenAI's next model, while speed and cost also matter.

01. In-house Cobalt chip enters public preview; Copilot+ PCs with RTX GPUs on the way

Nadella said Microsoft would announce more than 50 updates today, which can be read layer by layer, following the structure of the Copilot stack.

On AI infrastructure, Nadella said that to meet its sustainability goals, 100% of the energy Microsoft uses will come from zero-carbon sources by 2025.

Last November, Microsoft unveiled its first AI supercomputer in the cloud; since then, Azure's supercomputing capacity has grown 30-fold.

Its partnership with NVIDIA spans the full stack, from cloud and AI platform to applications.

Microsoft will be one of the first platforms to offer NVIDIA Blackwell GPUs, and in the coming months it will release Copilot+ PCs equipped with RTX GPUs, giving gamers, creators, and developers higher performance for local AI workloads alongside Microsoft's new Copilot+ features.

Microsoft also announced an expanded collaboration with AMD: Azure will become the first cloud platform to offer the generally available ND MI300X v5 accelerator, which Microsoft says delivers the best price/performance for GPT-4.

Microsoft's Azure Maia accelerator also continues to advance: its first cluster is online and is providing compute for services such as Copilot and Azure OpenAI.

Microsoft's Arm-based CPU Azure Cobalt has entered public preview, with performance improvements of up to 40%. Nadella said Cobalt already handles video processing and permission management in Microsoft 365, has supported billions of conversations in services such as Microsoft Teams, and is serving customers including Siemens and Snowflake.

02. GPT-4o comes to Azure; multimodal small model Phi-3-vision debuts

More than 50,000 organizations now use Azure AI. Nadella said it all began with the strategic partnership with OpenAI.

Microsoft announced that GPT-4o is now generally available on Azure AI, which means any application or website can be turned into a multimodal, full-duplex conversational interface.

In one demo, an agent on a web page proactively asks the user questions; when it learns the user is preparing for a camping trip, it offers suggestions and helps pick items to add to the shopping cart.
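For developers, "generally available on Azure AI" boils down to an ordinary API call against an Azure OpenAI deployment. Below is a minimal sketch using the Azure OpenAI Python SDK; the endpoint, API version, and deployment name are placeholders that depend on your own Azure resource, and the camping prompt simply mirrors the demo above.

```python
# Minimal sketch: calling a GPT-4o deployment on Azure OpenAI.
# Endpoint, API version, and deployment name are placeholders/assumptions.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",   # placeholder
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-05-01-preview",  # assumed; check your resource's supported versions
)

response = client.chat.completions.create(
    model="gpt-4o",  # the name of your Azure deployment, not the raw model id
    messages=[
        {"role": "system", "content": "You are a shopping assistant for a camping-gear site."},
        {"role": "user", "content": "I'm planning a weekend camping trip. What should I buy?"},
    ],
)
print(response.choices[0].message.content)
```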

Just last week, OpenAI launched its latest multimodal model, GPT-4o. Yesterday, Microsoft showed videos of Copilot using GPT-4o: users can share their screen or a conversation to get help from Copilot, whether in gaming, document editing, or programming.

For example, a player trying to craft a sword in a game can share the screen with Copilot, which recognizes what is happening and talks the player through the task: reminding them which materials are needed, telling them to press E to open the inventory, and suggesting they gather wood, stone, and other resources.

Beyond OpenAI, Microsoft is bringing many other models to Azure AI, including models from Cohere, Databricks, Meta, Mistral, Snowflake, and other companies, and it announced that new models from Core42, NTT DATA, and other providers are on the way.

Microsoft wants both OpenAI and open AI: it announced a deeper partnership with the open-source community Hugging Face and will bring more of its models into Azure AI Studio.

Microsoft is not only building large language models; it also aims to lead a revolution in small language models.

Microsoft is now expanding its Phi-3 family of small models with Phi-3-vision, a 4.2-billion-parameter multimodal model with language and vision capabilities that can reason about images, generate insights, and answer questions about them.

Microsoft is also offering the 7-billion-parameter Phi-3-small and the 14-billion-parameter Phi-3-medium. With Phi-3, developers can build applications across the web, Android, iOS, Windows, and edge devices, and can switch quickly between local hardware and the cloud.

Judging from the benchmarks, the title of strongest open model may change hands: the 14B-parameter Phi-3-medium approaches the performance of Mixtral 8x22B and the 70B-parameter Llama 3.

The ultra-small multimodal Phi-3-vision also performs well: with only 4.2B parameters, it is comparable to Gemini 1.0 Pro Vision and Claude 3 Haiku.
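For readers who want to try the vision model, the weights are the ones linked above on Hugging Face. The sketch below follows the usual transformers pattern for this model (trust_remote_code, an AutoProcessor, and an <|image_1|> placeholder in the prompt); exact arguments may differ from the official model card, so treat it as illustrative rather than authoritative.

```python
# Illustrative sketch of asking Phi-3-vision a question about a local image.
# The image path is a placeholder; generation settings are assumptions.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
    _attn_implementation="eager",  # use "flash_attention_2" if flash-attn is installed
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("chart.png")  # placeholder local image
messages = [{"role": "user", "content": "<|image_1|>\nSummarize what this chart shows."}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(prompt, [image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=200)
# Strip the prompt tokens before decoding so only the model's answer is printed.
answer_ids = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(answer_ids, skip_special_tokens=True)[0])
```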

Today, Microsoft also announced Phi-Silica, a state-of-the-art SLM built from the Phi-3 family and designed specifically for the NPUs in Copilot+ PCs, delivering fast on-device inference and first-token responsiveness. Windows is the first platform to ship an advanced SLM custom-tuned for NPUs.

03. Windows Copilot Library launches with native support for PyTorch and other frameworks

To make Windows the best platform for building AI applications, Microsoft will launch the Windows Copilot Library in June, with a variety of out-of-the-box local APIs and more than 40 models, spanning everything from low-code tools to complex pipelines to fully multimodal models.

Take the Recall experience as an example: it relies on an on-device model deeply integrated with Windows that captures on-screen context, converts the data into vector embeddings, and builds an index, so users can jump back to where they were in an application and pick up from there. Edge and Microsoft 365 applications already support this, and Recall will soon pull context from the Microsoft 365 graph as well.

The Windows Copilot Library also provides RAG (retrieval-augmented generation) capabilities that developers can use to ground models on local data inside their own applications.
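Microsoft did not show the local RAG APIs themselves, so the following is only a generic, framework-agnostic sketch of the pattern being described: embed local text, retrieve the chunk most similar to the user's question, and prepend it to the prompt. The sentence-transformers library and model name are assumptions for illustration, not part of the Windows Copilot Library.

```python
# Generic RAG sketch (NOT the Windows Copilot Library API): embed local text,
# retrieve the most relevant chunk, and prepend it to the prompt sent to a model.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed small embedding model

documents = [
    "Q3 budget review notes: marketing spend up 12%.",
    "Camping checklist: tent, sleeping bag, stove, headlamp.",
    "Team offsite agenda for June.",
]
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query by cosine similarity."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ q  # cosine similarity, since vectors are normalized
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

query = "What do I still need to pack for the trip?"
context = "\n".join(retrieve(query))
prompt = f"Answer using this local context:\n{context}\n\nQuestion: {query}"
print(prompt)  # this prompt would then be passed to a local or cloud model
```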

Microsoft announced that, starting today, Windows DirectML natively supports the PyTorch and WebNN frameworks, which means web developers finally have a web-native machine learning path with direct access to GPUs and NPUs.
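On the PyTorch side, DirectML acceleration is exposed today through the torch-directml package, which hands back a torch.device that routes work to the machine's DirectX 12 adapter. A minimal sketch, assuming the package is installed on Windows, looks like this (WebNN, by contrast, is a browser API and is not shown here).

```python
# Minimal sketch of running a PyTorch op on a DirectML device via torch-directml.
# Assumes `pip install torch-directml`; device index 0 is the default adapter.
import torch
import torch_directml

dml = torch_directml.device()          # torch.device backed by the DirectML adapter
x = torch.randn(1024, 1024)
w = torch.randn(1024, 1024)
y = (x.to(dml) @ w.to(dml)).relu()     # matmul + ReLU executed on the GPU/NPU adapter
print(y.device, y.shape)
```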

04. Copilot Runtime released, letting AI rewind to any page you have seen on your PC

Microsoft also said it wants to make Microsoft Teams the best place for developers to collaborate with AI on programming, with full details to be announced this week.

Developers will be able to access and share source code in Microsoft Teams. Microsoft also announced a "Meet Now" feature that lets Teams members jump into a call to solve problems within seconds, and users will be able to use custom emojis in Teams.

Yesterday, Microsoft brought Copilot to the PC, but building a powerful AI platform takes more than a chip or a model; it means rebuilding the entire system from top to bottom.

The new Windows Copilot Runtime extends the Copilot stack to Windows. A new component of Windows 11, it comprises the Windows Copilot Library, AI frameworks, and toolchains, all built on a foundation of powerful client silicon.

At the operating-system level, the Copilot Runtime lets users replay anything they have previously seen on their PC, while the Photos and Paint apps let users turn ideas into reality with real-time image generation and filter effects.

05. GitHub Copilot extensions let developers customize Copilot

Copilot was the first hit product of the generative AI era. GitHub Copilot now has more than 1.8 million developers, and Microsoft is enabling developers to work with programming languages and knowledge in their own native language.

GitHub Copilot Workspace can draft a specification from its deep understanding of a codebase, turn it into a plan, and then execute that plan to generate code; developers can edit anything along the way, from plan to code. It is a fundamentally new way to build software, and Microsoft will make the tool broadly available in the coming months.

At the same time, Microsoft is connecting to a broader ecosystem of developer tools and services through Copilot.

GitHub is launching a private preview of the first set of GitHub Copilot extensions, developed by Microsoft and third-party partners. These extensions let developers and organizations customize the GitHub Copilot experience with Azure, Docker, Sentry, and more, directly inside GitHub Copilot Chat.

Neha Batra, GitHub's vice president of engineering, demonstrated GitHub Copilot writing a prime-number test in Java while the developer interacted with it in Spanish.

Developers can @-mention Azure and ask it where their available resources are.

On the web, developers can also ask Copilot to help update the README document.

06. Copilot connectors link internal and external applications and enable custom agents

Developers can now build Copilot extensions at the data layer and the experience layer to further customize Copilot.

Nadella said Copilot is spreading across industries: 68% of marketers say it helps them kick-start the creative process, 70% of knowledge workers say it makes them more productive, and in customer-service scenarios it speeds up problem resolution by 12%.

Microsoft announced that it will introduce Copilot connectors, which help companies build and customize Copilot using business data, applications, and workflows. Companies can also use them to connect third-party SaaS applications, including services from Adobe, Snowflake, ServiceNow, and other companies.

Microsoft is expanding Copilot from a personal assistant to a team assistant and announced Team Copilot.

It can take on a range of roles in team collaboration, such as facilitating meetings, taking notes, building charts, and managing projects. The feature launches later this year.

Beyond that, Copilot's agent capabilities will expand: users can give natural-language instructions or pick from existing templates to turn Copilot into an expert in different domains. Nadella said: "I think this is a key step that will bring real change next year."

These capabilities carry across all Copilot experiences and Microsoft Teams; developers only need a few clicks in SharePoint to sync their data, applications, actions, and more.

Copilot extensions can also run on any device, anywhere. Copilot works by interpreting the user's prompt and routing it to the right extension, or drawing on the extension for a deeper conversation; the extension surfaces quick-action suggestions and relevant features, letting Copilot acquire knowledge in real time.

These Copilot extensions can also be used in various scenarios such as team meetings, one-on-one chats, etc.

Additionally, Microsoft announced that it will bring Windows Volumetric Apps to the Meta Quest headset, taking Copilot into 3D virtual spaces.

07. End-to-end tooling updates; Microsoft Fabric Real-Time Intelligence goes live

Azure AI Studio provides an end-to-end toolset that helps developers build, train, and fine-tune AI models. It also offers tools to evaluate the performance and quality of AI models and applications, as well as tools to detect and block prompt injection attacks.

Many use cases call for customized models, and Azure AI custom models will launch soon, letting developers build models tailored to their own domains and data.

The platform touts five advantages: anyone can build custom models, outputs are domain-specific, it handles multiple tasks, it delivers benchmark-leading multimodality, and it supports specific-language capabilities.

On the data front, Microsoft added new Real-Time Intelligence capabilities to its end-to-end data analytics platform, Microsoft Fabric, now available to developers in preview.

To support training and fine-tuning, Microsoft is building a platform that covers the full data estate, from operations and storage to analytics. At its core is Microsoft Fabric, which now has more than 11,000 customers.

Microsoft Fabric unifies compute, storage, user experience, and governance, and lets developers work with data that lives anywhere, even outside Azure.

Real-Time Intelligence serves both no-code analysts and professional developers. Within the platform, developers get real-time, actionable insights into data streams and can use them to discover, manage, and act on event data, with a wide range of governed experiences. Out-of-the-box connectors bring in data from Microsoft and other clouds, and a simple drag-and-drop adds the relevant data to Fabric's catalog.

Developers can analyze, explore, and act on data in real time. Microsoft also launched a new Microsoft Fabric Workload Development Kit that lets independent software vendors (ISVs) and developers extend their applications inside Fabric, creating a unified user experience.

Through the Fabric Workload Development Kit, Microsoft is building a new application platform that integrates spatial analytics, letting developers use Esri's tools and libraries to analyze their own data.

08. AI models are far from the point of diminishing returns, so larger supercomputers must be built

Microsoft CTO Kevin Scott said that over the past year Microsoft has done a great deal of work on the Copilot stack, optimizing the system to be cheaper and more capable, and building its functions, systems, services, and cloud around the core AI platform.

How is this possible? Because, he said, Microsoft has deployed more generative AI applications than anyone, has its own Copilot stack, and builds them in a safe and reliable way.

One of GPT-4o's striking achievements is responding to users' audio and video interactions in real time, naturally and smoothly. Behind the scenes, Microsoft and OpenAI keep pushing the efficiency frontier, building ever-larger supercomputers to create the next generation of large models.

From last year's GPT-4 to this year's GPT-4o, the price per conversation has dropped 12-fold and the model's time to first token is 6 times faster.

Behind the scenes, Microsoft is also optimizing the full stack, from networking and chips to data-center iteration, and doing extensive software work on top of that hardware to truly unlock its performance.

One of the remarkable things, in Microsoft's view, is that there is no sign of diminishing returns. The message Microsoft delivered today is that these systems will keep getting more powerful and cheaper over time, at an extremely fast rate.

Turning to small models: they cost less to run and are better suited to on-device use, which usually means lower quality. But over the past year Microsoft has found an efficiency frontier where small models reach surprisingly high quality for their target scenarios.

Wharton School professor Ethan Mollick commented that because Microsoft trains the models itself, it understands the impact of additional compute better than almost anyone, which is worth noting.

09. Altman closes the show as netizens pile on over the Scarlett Johansson voice controversy

At the end of the conference, OpenAI CEO Sam Altman appeared as a special guest. He avoided explicit predictions about the next generation of large models, but said that "the models will become more and more intelligent, and comprehensively so."

Altman revealed that new modalities and overall intelligence will be key to OpenAI's next model, while speed and cost will also be important.

He also said OpenAI's research team has done a great deal of work to make GPT-4 safe, but achieving true alignment requires separate teams across the whole pipeline, from model research and creation to safety systems, from policy to monitoring. It is a huge amount of work, but the product has to ship and reach users, and Altman is proud of what the team has accomplished together.

Netizens, however, had complaints about Altman's appearance: the replies under one tech reporter's tweet were almost entirely mocking or sarcastic comments about OpenAI allegedly using "Black Widow" Scarlett Johansson's voice without permission. (Previous coverage: OpenAI in big trouble again, accused of copying Black Widow's voice after she declined.)

One netizen said: "Someone go ask him about Scarlett Johansson?"

Some people also posted a gif of Black Widow and said, "Come on, ask that question."

Another netizen sarcastically said: "This is a list of celebrities whose voices we used without permission."

10. Conclusion: Copilot accelerates the buildout of Microsoft's AI universe

In keeping with this year's Build theme, "How will AI shape your future?", Microsoft's 50-plus updates across infrastructure, models, software toolchains, and applications make the changes AI is bringing to industries feel more concrete and tangible, and accelerate its spread into everyday life.

Nadella said that 70 years ago the field had two dreams: can computers truly understand us, and can computers help us reason, plan, and act effectively over ever more data? He believes real breakthroughs have now been made on both. Scaling laws, like Moore's Law did for the information revolution, will, together with model architectures, drive this intelligence revolution.

If launching Windows Copilot a year ago and embedding GPT-4 into the Windows operating system marked the start of Microsoft's AI universe, then today that universe has taken initial shape through Copilot's upgrades and its integration across major products, and moves such as faster, cheaper models and ecosystem partnerships are pushing the industry into a new round of reshuffling.

This article is from the WeChat public account "Zhidongxi" (ID: zhidxcom), written by the Zhidongxi editorial team and published by 36Kr with permission.
