The era of small but strong sLMs is fast approaching.
Haebom
Microsoft has released a new language model called Phi-3. Honestly, I didn't have high expectations, since it arrived not long after Phi-2's release last December, but the accompanying technical report shows meaningful results in the sLM field.
The most striking thing is its size. The Phi-3-mini model has only 3.8 billion parameters. Does 3.8 billion sound like a lot? Some of the language models coming out these days have hundreds of billions or even trillions of parameters. Phi-3-mini is less than 1/10th the size of those large models, and it is small enough to run on a mobile phone. (Please don't use the odd term sLLM... Let's say sLM.)
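To see why 3.8 billion parameters can fit on a phone, here is a rough back-of-envelope sketch (my own illustrative arithmetic, not figures from the report) of the raw weight-storage footprint at different numeric precisions:

```python
# Back-of-envelope memory footprint of a model's weights alone
# (ignores activations, KV cache, and runtime overhead).
def model_size_gb(n_params: float, bits_per_param: int) -> float:
    """Weight storage in gigabytes at a given precision."""
    return n_params * bits_per_param / 8 / 1e9

PARAMS = 3.8e9  # Phi-3-mini's parameter count

for bits in (32, 16, 4):
    print(f"{bits:>2}-bit: {model_size_gb(PARAMS, bits):.1f} GB")
# 32-bit: 15.2 GB
# 16-bit:  7.6 GB
#  4-bit:  1.9 GB
```

At 4-bit quantization the weights shrink to roughly 2 GB, which is the kind of footprint that makes on-device inference plausible on a recent smartphone.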
The subtitle of the technical report, "A Highly Capable Language Model Locally on Your Phone," makes clear it was designed with the on-device deployment that has recently been in the spotlight in mind. A small, smart model, in other words.
But what about performance?
It's unbelievably good. It's not far behind much larger recent models like Mixtral 8x7B or GPT-3.5, and it scores high on language-understanding and reasoning benchmarks such as MMLU and MT-Bench. It may be small in size, but its performance is top-notch.
What is the secret to achieving such high performance with such a small model? Carefully curated training data. By appropriately combining heavily filtered, high-quality web data with synthetically generated data, the model squeezes maximum performance out of a small capacity.
This miniaturization can greatly increase the utility of language models. Smaller models can run directly on a wider range of devices. It will also have great advantages in terms of privacy and response speed. In the future, more people will be able to use high-performance language AI in their daily lives.
Another encouraging point is safety. The Phi-3 model has significantly improved safety and robustness through post-training. The probability of giving harmful responses has also been greatly reduced. It is expected to provide users with a safer and more correct conversation experience.
Is it perfect?
Of course, its performance is not perfect. It still falls short on tests that require broad encyclopedic knowledge, such as TriviaQA. However, this weakness could likely be mitigated by pairing the model with a search engine or similar retrieval tools.
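The search-engine linkage mentioned above is essentially retrieval augmentation: fetch relevant text first, then let the small model answer from that context instead of from its limited internal knowledge. A minimal sketch of the idea, where `search` and `generate` are hypothetical stand-ins for a real search API and a local sLM inference call:

```python
# Minimal retrieval-augmentation sketch. `search` and `generate` are
# hypothetical stand-ins, not real APIs from the Phi-3 report.
def search(query: str) -> list[str]:
    # Stand-in: a real version would call a web search or vector index.
    return ["Paris is the capital of France."]

def generate(prompt: str) -> str:
    # Stand-in: a real version would run the local small model.
    return f"(model answer conditioned on: {prompt[:40]}...)"

def answer_with_retrieval(question: str) -> str:
    # Prepend retrieved passages so the model answers from the context,
    # not from its own (limited) encyclopedic memory.
    context = "\n".join(search(question))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)

print(answer_with_retrieval("What is the capital of France?"))
```

The design point is that factual recall moves out of the model's parameters and into the retrieval step, which is exactly where a small on-device model needs the help.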
With the recent release of LLaMA 3, meaningful open models are certainly appearing. But while LLaMA's license sits somewhere between open source and restrictive, Microsoft's Phi-3 is arguably more promising because it was released under the MIT license.
In this way, the Phi-3 model shows the new possibilities of small but powerful language models. The technology to make high-performance AI lighter and more accessible will keep developing. Good models are becoming accessible faster than expected, so I personally think change may arrive much sooner than I anticipated.