Apple reveals details of its self-developed models for the first time: how Apple Intelligence is made, and why it works even without GPT-4o

36kr
06-12

Yesterday's Apple keynote was barely halfway through when the term "Apple Intelligence" swept the trending-topic lists.

At the keynote, Apple officially announced its partnership with OpenAI: GPT-4o will be integrated into Apple Intelligence.

Although Apple executive Craig Federighi said that OpenAI was only one of several candidates under consideration, this seemingly perfect collaboration still could not escape the nitpicking and rubbernecking of the outside world.

Even Musk joined the fray, first panning Apple's privacy protections and then threatening to ban Apple devices. But plots always twist: according to CNBC, Musk has since withdrawn his lawsuit against OpenAI and its CEO Sam Altman.

In addition, some sharp-eyed netizens noticed that the new Siri seems able to read every application on the phone. To find out whether that is true, it is worth reading Apple's latest blog post; the answer may be hidden there.

A hybrid of on-device and cloud: the 3-billion-parameter on-device model holds surprises

Apple Intelligence takes a two-pronged approach: an on-device model and a cloud model.

The cloud models, needless to say, are huge and complex; they run on servers built on Apple silicon and handle the more specialized, demanding tasks.

On device, Apple ships a model of roughly 3B parameters. Next to the 7B models that dominate among Chinese vendors, Apple's 3B looks a bit modest.

Generally speaking, more parameters mean stronger model capability, but a device's computing power and storage are limited. Although Apple's on-device model has only 3B parameters, it sets a benchmark for doing a lot with a little.

Over the past year, we have seen plenty of cases like this that seem to defy the scaling laws.

For example, Microsoft's latest Phi-3-mini packs only 3.8B parameters yet dares to challenge its 7B peers, and the Gemini Nano models running on the Google Pixel 8 Pro weigh in at just 1.8B (Nano-1) and 3.25B (Nano-2).

Rather than competing on paper specifications, Apple bets that user experience is king.

The blog post reveals that Apple tested the model's real-world effectiveness on many practical examples, spanning classification, question answering, mathematical reasoning, open-ended Q&A, safety, summarization and writing.

Moreover, pitted against models such as Phi-3-mini, Gemma-7B and Mistral-7B, Apple's on-device model was rated the best by the votes of human "judges".

Apple's pursuit of AI is not only about usability, but also about safety.

For example, on tests covering harmful content, sensitive topics and factual accuracy, Apple's foundation models clearly had effort poured into them: their violation rates are far lower than those of most competing models.

As a giant with more than 2.2 billion active devices, Apple has little choice but to push the violation rate ever lower, which is consistent with its usual stance on safety.

It has to understand you, grounded in your personal context: your daily routine, your relationships, your communications. That goes beyond artificial intelligence. It is personal intelligence, and it is Apple's next big move.

Although Cook never said the word "privacy" in these remarks, every sentence touched on privacy risks.

If AI is to become our "second brain", privacy protection cannot and should not be mere decoration. Apple's answer is to root Apple Intelligence deep inside iPhone, iPad and Mac: not a feature or a service, but part of the system itself.

Yet it is precisely because of this that Musk claimed he would ban employees from bringing iPhones into Tesla if Apple integrated ChatGPT at the system level.

There is probably no need to worry too much, however. The models behind Apple Intelligence fall into three layers.

  • On-device model: a fine-tuned 3B model dedicated to tasks such as summarization and text polishing; with adapter support, it is far from weak
  • Private cloud compute: when the on-device model falls short, the task is handed to the cloud, where Apple says end-to-end encryption protects the security and privacy of user data
  • Third-party LLMs: used for general-knowledge Q&A and chat; Siri and other entry points can call external models such as GPT-4o

In other words, Apple essentially treats OpenAI's ChatGPT as a plug-in, and it may partner with other model providers as well. If Apple's self-developed models grow strong enough, it could naturally drop third-party LLMs altogether.

The blog post also mentions other models inside Apple Intelligence, such as one that helps programmers write code in Xcode, and a diffusion model that helps users express their ideas more vividly and playfully when messaging.

How Apple Intelligence is made

If you want to edit video on a computer, you need to install some extra applications. In the world of AI models, the same goes for the "adapters" working behind Apple's models.

Simply put, an adapter is a small set of model weights: a little plug-in that lets the model quickly adapt to different tasks.

For example, summarizing emails and summarizing notifications look similar but differ in many subtle ways, so Apple attaches LoRA adapters to the model so that it can handle each task better.
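To make the idea concrete, here is a minimal LoRA-style sketch in plain Python (purely illustrative: Apple has not published its adapter code, and the matrix sizes and values below are invented). The base weight W is frozen; only two small low-rank matrices A and B are trained, and the effective weight is W plus B times A:

```python
def matmul(a, b):
    """Nested-list matrix multiply: (m x k) @ (k x n) -> (m x n)."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(n)]
            for i in range(m)]

def lora_effective_weight(w, a, b):
    """Frozen base weight plus the trainable low-rank update B @ A."""
    delta = matmul(b, a)  # (d x r) @ (r x d) -> (d x d)
    return [[w[i][j] + delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

d, r = 4, 1                              # toy sizes: model dim 4, adapter rank 1
w = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen base
a = [[0.5] * d]                          # r x d, trainable
b = [[0.2] for _ in range(d)]            # d x r, trainable
w_eff = lora_effective_weight(w, a, b)
# The adapter stores only 2*d*r = 8 numbers instead of d*d = 16; at realistic
# ranks and dimensions this is why an adapter stays at tens of megabytes
# while the 3B base model stays untouched.
print([round(x, 3) for x in w_eff[0]])   # [1.1, 0.1, 0.1, 0.1]
```

Switching tasks then means swapping in a different (A, B) pair; the base weights never change.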

Apple also hand-picked 750 different summaries to test the real-world effect, and found that the adapter-equipped model did outperform the others.

Apple's trick is to tune only these adapters, leaving the base model's "factory settings" untouched. The benefit is that the model keeps its broad general knowledge while picking up specialized skills through the adapters.

More importantly, each adapter takes up little space. Even attached to a model with a 3-billion-parameter "brain", an adapter occupies only a few tens of megabytes.

In order for the model to learn well, the quality of the data is critical.

Apple adopted a hybrid strategy when training the model, combining human-annotated data with data Apple generated itself.

To train these foundation models, Apple uses specific licensed data, including data selected to strengthen particular capabilities, plus public data collected from the web by its crawler, Applebot.

Apple also emphasized that it never used users' private information or any user interaction data when training these foundation models, and that it applied filters to carefully scrub personal information that had been posted publicly online.

During training, Apple developed two new techniques to improve the model's performance:

The first: during training, the model consults the opinions of several "teacher" models, which help it make choices when it runs into uncertain situations.

The second is reinforcement learning from human feedback (RLHF), which uses a special optimization strategy together with a leave-one-out algorithm to tune the model, so that it can better judge whether its own outputs are accurate.
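The blog does not spell the algorithm out, but one common reading of "leave-one-out" in RL fine-tuning is a leave-one-out baseline: sample k responses per prompt and score each against the average reward of the other k-1. The sketch below is that interpretation, an assumption on my part rather than Apple's published method:

```python
def leave_one_out_advantages(rewards):
    """For k sampled responses to one prompt, each sample's advantage is its
    reward minus the mean reward of the OTHER k-1 samples. This gives a
    variance-reducing baseline without training a separate value model."""
    k = len(rewards)
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]

# Hypothetical reward-model scores for 4 sampled responses to one prompt:
rewards = [1.0, 0.0, 0.5, 0.5]
adv = leave_one_out_advantages(rewards)
print([round(a, 3) for a in adv])   # [0.667, -0.667, 0.0, 0.0]
```

Samples scored above their peers get positive advantages and are reinforced; by construction the advantages across the k samples sum to zero.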

These two methods markedly improved the model's accuracy on its tasks and let it learn faster and more precisely. To cope with the limited resources of phones and cloud servers, Apple also deployed several new tricks:

  • Grouped-query attention: groups of query heads share key/value heads, cutting the memory and compute cost of attention
  • Shared input and output vocabulary embeddings: 49K tokens for the on-device model and 100K for the cloud model, with additional language- and technology-related vocabulary
  • Low-bit palletization: weights are compressed into a small palette of shared values, letting the model run faster while easing pressure on the phone's battery and memory
  • Mixed-precision configuration: combining 2-bit and 4-bit configurations maintains accuracy close to the uncompressed model even in limited space
  • Talaria tool: helps choose the most appropriate bit rate for each operation
  • Activation quantization and embedding quantization: make the key-value cache on the Neural Engine more flexible and efficient
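To illustrate what "palletization" means, here is a toy sketch (my own simplification, not Apple's implementation): weights are mapped to a tiny palette of shared values, so only the palette plus a few index bits per weight needs to be stored. The demo uses a 2-bit palette to keep the output readable:

```python
def build_palette(weights, bits=2):
    """Pick 2**bits representative values by sampling evenly spaced quantiles."""
    n = 2 ** bits
    ordered = sorted(weights)
    return [ordered[int(i * (len(ordered) - 1) / (n - 1))] for i in range(n)]

def palletize(weights, palette):
    """Replace each weight by the index of its nearest palette entry."""
    return [min(range(len(palette)), key=lambda i: abs(palette[i] - w))
            for w in weights]

def depalletize(indices, palette):
    """Reconstruct approximate weights from indices + palette."""
    return [palette[i] for i in indices]

weights = [0.31, -0.42, 0.05, 0.99, -0.77, 0.31, 0.06, -0.41]
palette = build_palette(weights, bits=2)   # 4 shared values, 2 bits per weight
indices = palletize(weights, palette)
restored = depalletize(indices, palette)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(indices)              # [2, 1, 2, 3, 0, 2, 2, 1]
print(round(max_err, 2))    # 0.25
```

In practice the palette would be learned (for example by clustering) per group of weights rather than taken from quantiles, but the storage principle is the same.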

With these optimizations in place, Apple's model performs impressively on an iPhone 15 Pro: a time-to-first-token latency of about 0.6 milliseconds per prompt token, and a generation rate of 30 tokens per second.
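As a quick sanity check of what those figures imply for a user (the prompt and reply lengths below are assumed for illustration, and I am reading the 0.6 ms as a per-prompt-token time-to-first-token cost):

```python
# Back-of-envelope reading of the published on-device figures.
prompt_tokens = 1000             # assumed prompt length
ttft_per_prompt_token_ms = 0.6   # published: 0.6 ms per prompt token
generation_rate = 30             # published: 30 generated tokens per second
reply_tokens = 300               # assumed reply length

time_to_first_token_s = prompt_tokens * ttft_per_prompt_token_ms / 1000
generation_time_s = reply_tokens / generation_rate
print(round(time_to_first_token_s, 3), round(generation_time_s, 3))  # 0.6 10.0
```

So a 1,000-token prompt would take about 0.6 seconds before the first token appears, and a 300-token reply would stream out in about 10 seconds.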

That's not all. Apple has also "hidden" some tricks that make token generation even faster, but the blog does not reveal much about them.

In fact, Apple Intelligence's debut is not particularly early, but it is not too late either.

Late, because while Android manufacturers have been racing on AI phones for a year or two, Apple seemed content to watch quietly from the sidelines, only recently taking steps of its own.

But don't forget that as the world's leading maker of consumer devices, Apple's every move shapes the market's pulse. In short, when it comes to actually shipping AI, Apple is the player that cannot be ignored.

Even the name "Apple Intelligence" works this way. On the surface it is a clever pun on AI; at a deeper level, once Apple Intelligence is woven into Apple's ecosystem, the name itself becomes a symbol of strength and confidence.

Of course, before all that, beyond the jockeying among AI vendors and the unavoidable privacy questions, what I most want to know is: who will power Apple's AI features in China?

Original blog: https://machinelearning.apple.com/research/introducing-apple-foundation-models

This article comes from the WeChat public account "APPSO" (ID: appsolution) , author: Mo Chongyu, and is authorized to be published by 36Kr.
