Introduction
In recent years, AI-powered language models have become a crucial part of applications ranging from chatbots to content generation. OpenAI's ChatGPT has been at the forefront of this revolution, but there is a new player in town: Zephyr-7B Beta. This language model, part of the Zephyr series, outperforms much larger open models such as Llama-2-70B-chat on chat benchmarks, surpasses GPT-3.5 Turbo on AlpacaEval, and even approaches GPT-4 on parts of MT-Bench. What sets Zephyr apart is its efficiency: at 7B parameters it is roughly 25 times smaller than GPT-3.5, making it a game-changer for developers and researchers looking to reduce inference times on large language models.
About Zephyr-7B Beta
Zephyr-7B-β, the second model in the series, is a fine-tuned version of mistralai/Mistral-7B-v0.1 that was trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO).
Alpaca Eval Leaderboard Triumph
One of the most exciting outcomes of Zephyr-7B-β's development is its performance on the Alpaca Eval leaderboard. By outperforming ChatGPT there, Zephyr-7B-β has proven its ability to generate high-quality, contextually relevant responses and has established itself as the leading 7B-parameter LLM currently available. In several categories of MT-Bench, it also outperforms larger open models such as Llama-2-70B-chat.
Tutorial - Using Zephyr-7B Beta on E2E Cloud
If you require extra GPU resources for the tutorial ahead, you can explore the offerings on E2E Cloud. We provide a diverse selection of GPUs, making E2E Cloud a suitable choice for more advanced LLM-based applications.
To get one, head over to MyAccount and sign up. Then launch a GPU node, as shown in the screenshot below:
Make sure you add your SSH keys during launch, or through the security tab after launching.
Once you have launched a node, you can use the VS Code Remote Explorer to SSH into it and use it as a local development environment.
Now follow these steps; a code sketch covering all four is given after the list:
- Install required libraries:
- Set up the model:
- Define a prompt and stream the input:
- Test the model:
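Below is a minimal sketch of these four steps, adapted from the usage example on the HuggingFaceH4/zephyr-7b-beta model card. The package list, the bfloat16 dtype, and the generation parameters (max_new_tokens, temperature, top_k, top_p) are assumptions that you may want to adjust for your node.

# Step 1 - install the required libraries (run in the node's terminal):
#   pip install -U torch transformers accelerate

import torch
from transformers import pipeline, TextStreamer

# Step 2 - set up the model as a text-generation pipeline.
# bfloat16 and device_map="auto" are assumptions suited to a recent GPU node.
pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-7b-beta",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Step 3 - define a prompt with Zephyr's chat template and prepare output streaming.
messages = [
    {"role": "system", "content": "You are a friendly chatbot who always responds concisely."},
    {"role": "user", "content": "Explain what makes Zephyr-7B Beta efficient."},
]
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
streamer = TextStreamer(pipe.tokenizer, skip_prompt=True)

# Step 4 - test the model: generate a response, streaming tokens as they are produced.
outputs = pipe(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    streamer=streamer,
)
print(outputs[0]["generated_text"])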
Key Components of Zephyr-7B Beta’s Success
Beyond its impressive performance, the Zephyr-7B Beta model is fascinating for how it was trained. Some of the key components that contribute to its success include:
- Fine-tuning of the best small open-source pre-trained model, Mistral 7B.
- Usage of a large-scale preferences dataset, UltraFeedback.
- Replacing Reinforcement Learning (RL) with Direct Preference Optimization (DPO).
- Overfitting on the preference dataset, which surprisingly yields better chat results.
Training Steps
- Distilled Supervised Fine-tuning (dSFT)
- AI Feedback (AIF) collection
- Distilled Direct Preference Optimization (dDPO)
Major Facts about DPO
- DPO training quickly overfits the preference dataset, yet benchmarks indicate this overfitting improves rather than harms chat performance.
- Ablation experiments confirm that SFT and DPO are necessary for the best results.
- Feedback from Zephyr-7B Alpha led to additional filtering of the training data to remove responses with incorrect casing or odd prefaces.
Performance
Upon its launch, Zephyr-7B-β holds the top position among 7B chat models on both the MT-Bench and Alpaca Eval leaderboards.
You can compare Zephyr-7B Beta with other language models in the LMSYS Chatbot Arena: http://arena.lmsys.org
Zephyr-7B Beta: A Fine-Tuned Marvel
Zephyr-7B-β’s exceptional performance can be attributed to its three-step fine-tuning process:
- Supervised Fine-Tuning: This initial step is crucial for teaching the model to understand and utilize chat templates effectively. Ablation studies have shown that without supervised fine-tuning, the model struggles to generate meaningful and contextually relevant responses.
- AI Feedback: In this step, Zephyr-7B-β goes the extra mile. For each prompt, four different responses are generated by four distinct large language models, and GPT-4 is then employed as a judge to rank these responses. This process, based on the UltraFeedback dataset, ensures that only the most relevant and contextually accurate responses are preferred. It's like having a panel of experts evaluate and choose the best answer.
- Direct Preference Optimization: Zephyr-7B-β takes its fine-tuning to the next level by training with DPO on the preference dataset, even to the point of overfitting it. This optimization process leads to improved performance, making the model more efficient at generating responses aligned with user preferences.
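For reference, the dDPO step optimizes the standard DPO objective from the original DPO paper that the Zephyr recipe builds on. A sketch of that loss is given below, where π_θ is the model being trained, π_ref is the dSFT model used as the reference, (x, y_w, y_l) are a prompt with its preferred and rejected responses from the AI-feedback data, σ is the logistic function, and β is a scaling hyperparameter:

\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)
  \right]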
Data Quality Matters
To achieve these astounding results, the Zephyr-7B-β team meticulously filtered the data they used. They removed issues related to incorrect casing and unusual sentence starts, ensuring that the model was trained on high-quality, consistent data. This data-cleaning process significantly contributes to the model's impressive performance and its ability to compete with ChatGPT.
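As a rough illustration of this kind of cleaning (not the team's actual filtering code), here is a hypothetical Python filter that drops responses with incorrect casing or odd prefaces; the BAD_PREFIXES list is an assumption made up for the example.

# A hypothetical illustration of the data filtering described above,
# not the actual code used to build Zephyr's training data.
BAD_PREFIXES = ("as an ai language model", "sure, here is")  # made-up examples of odd prefaces

def keep_response(text: str) -> bool:
    """Return True if a response passes the casing and preface checks."""
    stripped = text.strip()
    if not stripped:
        return False
    if stripped[0].isalpha() and stripped[0].islower():
        # incorrect casing: the response starts with a lowercase letter
        return False
    if stripped.lower().startswith(BAD_PREFIXES):
        # oddly prefaced response
        return False
    return True

responses = [
    "the capital of France is Paris.",        # dropped: incorrect casing
    "As an AI language model, I cannot ...",  # dropped: odd preface
    "Paris is the capital of France.",        # kept
]
cleaned = [r for r in responses if keep_response(r)]
print(cleaned)  # ['Paris is the capital of France.']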
Conclusion
The Zephyr-7B Beta model is a remarkable achievement in the field of natural language processing. Its outstanding performance, combined with its small size, makes it an ideal choice for developers and researchers looking to improve inference times. Because it can run efficiently on consumer hardware, Zephyr is set to change the way we interact with large language models, making them faster and more accessible for a wide range of applications. Hugging Face's commitment to openness, reflected in its language model alignment handbook, only reinforces the significance of this release for the NLP community.
References
Paper: https://arxiv.org/abs/2310.16944
Model: https://huggingface.co/HuggingFaceH4/zephyr-7b-beta
Demo: https://huggingfaceh4-zephyr-chat.hf.space/
LMSYS arena: http://arena.lmsys.org
Alpaca Eval Benchmarks: https://tatsu-lab.github.io/alpaca_eval/
MT-Bench Benchmarks: https://huggingface.co/spaces/lmsys/mt-bench