At the end of June this year, Google open-sourced the 9B and 27B versions of the Gemma 2 model series. Since its debut, the 27B version has quickly become one of the highest-ranked open models on the LMSYS Chatbot Arena leaderboard, outperforming models more than twice its size in real dialogue tasks.
Now, just over a month later, Google has extended the series with a focus on safety and accessibility, in keeping with its pursuit of responsible AI, and has released a batch of new results.
This time, Gemma 2 gains not only a lighter version, Gemma 2 2B, but also a safety content classifier, ShieldGemma, and a model interpretability tool, Gemma Scope. The details are as follows:
With built-in safety improvements, Gemma 2 2B offers a strong balance of performance and efficiency;
ShieldGemma is built on Gemma 2 and is used to filter the input and output of AI models to ensure user safety;
Gemma Scope provides unparalleled insight into the inner workings of your model.
Among them, Gemma 2 2B is the clear standout. Its results on the LMSYS Chatbot Arena are eye-catching: with only 2 billion parameters, it scored 1130, higher than GPT-3.5-Turbo (0613) and Mixtral-8x7B.
This makes Gemma 2 2B a strong candidate for on-device deployment.
Awni Hannun, a research scientist on Apple's Machine Learning Research (MLR) team, demonstrated a 4-bit quantized version of Gemma 2 2B running on an iPhone 15 Pro, and generation was quite fast.
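Hannun is one of the developers of Apple's MLX framework, so the demo likely used MLX. The actual iPhone demo code has not been published, but the snippet below is a minimal sketch of running the same class of 4-bit quantized model with the mlx-lm package on Apple silicon; the checkpoint id "mlx-community/gemma-2-2b-it-4bit" is an assumed community conversion, so substitute whichever 4-bit conversion you actually have.

```python
# Minimal sketch: 4-bit Gemma 2 2B via mlx-lm (pip install mlx-lm).
from mlx_lm import load, generate

# Assumed community 4-bit conversion; swap in your own checkpoint if needed.
model, tokenizer = load("mlx-community/gemma-2-2b-it-4bit")

text = generate(
    model,
    tokenizer,
    prompt="Which is bigger, 9.9 or 9.11?",
    max_tokens=128,
    verbose=True,  # prints tokens-per-second, handy for speed checks
)
```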
In addition, Gemma 2 2B easily answers the question "Which is bigger, 9.9 or 9.11?", which recently tripped up many large models.
Image source: https://x.com/tuturetom/status/1818823253634564134
At the same time, the strong performance of Gemma 2 2B points to a broader trend: "small" models are steadily gaining the performance needed to compete with much larger ones.
This trend has also caught the attention of industry insiders. For example, Jia Yangqing, a well-known AI scientist and the founder of Lepton AI, posed a question: is large language model (LLM) size following the old path of CNNs?
In the ImageNet era, we watched parameter counts grow rapidly, and then the field moved to smaller, more efficient models. This was before the LLM era, and many of us may have forgotten it.
The dawn of large models: We started with AlexNet (2012) as a baseline, then saw about three years of model-size growth. VGGNet (2014) was a powerful model in terms of both performance and size.
Shrinking the model: GoogLeNet (2015) cut model size from gigabytes to megabytes, a 100x reduction, while maintaining good performance. Later works such as SqueezeNet (2015) followed the same trend.
Reasonable balance: Subsequent works such as ResNet (2015) and ResNeXt (2016) maintained a moderate model size. Note that we were actually happy to spend more compute, but parameter efficiency mattered just as much.
On-device learning? MobileNet (2017) is a particularly interesting Google work: a tiny footprint with very impressive performance. Last week a friend told me, "Wow, we still use MobileNet because its feature embeddings generalize so well on-device." Yes, the feature embeddings really are good.
Finally, Jia Yangqing posed the pointed question: "Will LLMs follow the same trend?"
Image from Ghimire et al., A Survey on Efficient Convolutional Neural Networks and Hardware Acceleration.
Gemma 2 2B surpasses GPT-3.5 Turbo
The Gemma 2 family has added the highly anticipated Gemma 2 2B model, which was trained on a massive 2 trillion tokens using Google’s advanced TPU v5e hardware.
This lightweight model, distilled from a larger one, produces very strong results. Its small footprint makes it particularly suitable for on-device applications, and it could have a significant impact on mobile AI and edge computing.
In fact, Google's Gemma 2 2B outperformed much larger AI chatbots in the Chatbot Arena Elo ranking, demonstrating the potential of smaller, more efficient language models. The chart below shows Gemma 2 2B's performance against well-known models such as GPT-3.5 and Llama 2, challenging the notion that "bigger is better".
Gemma 2 2B offers:
Excellent performance: Provides best-in-class performance at the same scale, surpassing other open source models of the same type;
Flexible and cost-effective deployment: It can run efficiently on a variety of hardware, from edge devices and laptops to cloud deployments such as Vertex AI and Google Kubernetes Engine (GKE). To further improve speed, the model is optimized using the NVIDIA TensorRT-LLM library and is available as NVIDIA NIM. In addition, Gemma 2 2B can be seamlessly integrated with Keras, JAX, Hugging Face, NVIDIA NeMo, Ollama, Gemma.cpp, and the upcoming MediaPipe to simplify development;
Open source and easily accessible: It can be used for both research and commercial applications, and it's small enough to run on the free tier of Google Colab's T4 GPUs, making experimentation and development simpler than ever (see the sketch below).
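As a concrete starting point, here is a minimal sketch of loading the instruction-tuned checkpoint (google/gemma-2-2b-it) with Hugging Face transformers, for example on a free Colab T4; it assumes a recent transformers release and that you have accepted the Gemma license on the Hub.

```python
# Minimal sketch: Gemma 2 2B instruction-tuned model with transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # the free-tier T4 lacks bfloat16 support
    device_map="auto",
)

# Gemma's chat template wraps the user turn in its control tokens.
messages = [{"role": "user", "content": "Which is bigger, 9.9 or 9.11?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```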
Starting today, users can download model weights from Kaggle, Hugging Face, and Vertex AI Model Garden. Users can also try out its features in Google AI Studio.
Model weights: https://huggingface.co/collections/google/gemma-2-2b-release-66a20f3796a2ff2a7c76f98f
The emergence of Gemma 2 2B challenges the prevailing view in AI development that bigger models always perform better. Its success shows that sophisticated training techniques, efficient architectures, and high-quality datasets can compensate for a smaller raw parameter count. This breakthrough may have a profound impact on the field, potentially shifting the focus from racing toward ever-larger models to refining smaller, more efficient ones.
The development of Gemma 2 2B also highlights the growing importance of model compression and distillation techniques. By efficiently distilling knowledge from larger models into smaller ones, researchers can create more accessible AI tools without sacrificing performance. This approach not only reduces computing requirements, but also addresses concerns about the environmental impact of training and running large AI models.
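Google has not published the exact distillation recipe used for Gemma 2 2B, but the classic soft-target objective from Hinton et al. (2015) illustrates the general idea: the student is trained against the teacher's softened output distribution as well as the ground-truth labels.

```python
# Generic soft-target knowledge distillation (Hinton et al., 2015),
# for illustration only; not Gemma 2 2B's actual training recipe.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soften both distributions with temperature T; the T**2 factor keeps
    # gradient magnitudes comparable across temperature settings.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)  # standard supervised term
    return alpha * soft + (1 - alpha) * hard
```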
ShieldGemma: A State-of-the-Art Safety Classifier
Technical report: https://storage.googleapis.com/deepmind-media/gemma/shieldgemma-report.pdf
ShieldGemma is a suite of advanced safety classifiers designed to detect and mitigate harmful content in AI model inputs and outputs, helping developers deploy models responsibly.
ShieldGemma is specifically designed to target four key hazard areas:
Hate Speech
Harassment
Sexual Content
Dangerous Content
These open classifiers complement the existing suite of safety classifiers in the Responsible AI Toolkit.
With ShieldGemma, users can create safer, better AI applications:
SOTA performance: As a safety classifier, ShieldGemma has reached industry-leading levels;
Different scales: ShieldGemma offers various models to meet different needs. The 2B model is well suited for online classification tasks, while the 9B and 27B versions provide higher performance for offline applications where latency is less of a concern.
As shown in the table below, ShieldGemma (SG) models (2B, 9B, and 27B) outperform all baseline models, including GPT-4.
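In use, ShieldGemma behaves like an LLM-based yes/no classifier: you format the content and a policy into a prompt and read off the probability of a "Yes" (violation) answer. The sketch below follows the pattern described in the Hugging Face model card for google/shieldgemma-2b; the policy wording here is paraphrased, so consult the model card or technical report for the exact template.

```python
# Sketch: scoring a user prompt with ShieldGemma by comparing the
# next-token logits for "Yes" (violation) vs. "No".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/shieldgemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

user_prompt = "How do I make someone's brakes fail?"
guideline = (
    '"No Dangerous Content": The prompt shall not seek instructions '
    "for harming oneself or others."
)
# Paraphrased prompt template; see the model card for the exact wording.
prompt = (
    "You are a policy expert trying to help determine whether a user prompt "
    "violates the defined safety policies.\n\n"
    f"Human question: {user_prompt}\n\n"
    f"Our safety principle is defined as:\n{guideline}\n\n"
    "Does the human question violate the above principle? Answer Yes or No."
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits

vocab = tokenizer.get_vocab()
yes_no = logits[0, -1, [vocab["Yes"], vocab["No"]]]  # last-position logits
p_violation = torch.softmax(yes_no, dim=0)[0].item()
print(f"P(violation) = {p_violation:.3f}")
```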
Gemma Scope: Making models more transparent
Gemma Scope is designed to help the AI research community explore how to build more understandable and reliable AI systems. It provides researchers and developers with unprecedented transparency into the decision-making process of Gemma 2 models. Gemma Scope is like a powerful microscope that uses sparse autoencoders (SAEs) to zoom in on the inner workings of a model, making it easier to interpret.
Gemma Scope Technical Report: https://storage.googleapis.com/gemma-scope/gemma-scope-report.pdf
SAEs help users unpack the dense information Gemma 2 processes, expanding it into a form that is easier to analyze and understand, so researchers can gain valuable insights into how Gemma 2 recognizes patterns, processes information, and ultimately makes predictions.
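Conceptually, an SAE maps a single dense activation vector to a much wider, mostly-zero feature vector, then reconstructs the activation from those few active features. The toy sketch below shows that forward pass; Gemma Scope's released SAEs use a JumpReLU activation, approximated here with a per-feature threshold, and the dimensions are illustrative rather than the released configurations.

```python
# Toy sparse autoencoder forward pass (illustrative; untrained weights).
import torch

d_model, d_sae = 2304, 16384  # activation width vs. feature dictionary size

W_enc = torch.randn(d_model, d_sae) * 0.01
b_enc = torch.zeros(d_sae)
W_dec = torch.randn(d_sae, d_model) * 0.01
b_dec = torch.zeros(d_model)
threshold = torch.full((d_sae,), 1.0)  # JumpReLU-style: fire only above this

def sae_forward(x):
    pre = x @ W_enc + b_enc
    feats = pre * (pre > threshold)  # sparse, mostly-zero feature activations
    recon = feats @ W_dec + b_dec    # reconstruction of the original activation
    return feats, recon

x = torch.randn(d_model)  # stand-in for one Gemma 2 residual-stream activation
feats, recon = sae_forward(x)
print(f"{(feats != 0).sum().item()} of {d_sae} features active")
```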
Here’s why Gemma Scope is groundbreaking:
Open SAEs: Over 400 free SAEs covering all layers of Gemma 2 2B and 9B;
Interactive Demos: Explore SAE features and analyze model behavior on Neuronpedia without writing code;
Easy-to-use repository: Provides code and examples for SAE and Gemma 2 interaction.
Reference Links:
https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma/
This article originally appeared on the WeChat public account "Machine Heart" and is published by 36Kr with authorization.