Introduction
In recent years, AI-powered language models have become a crucial part of applications ranging from chatbots to content generation. OpenAI's ChatGPT has been at the forefront of this revolution, but there is a new player in town: Zephyr-7B Beta. This language model, part of the Zephyr series, outperforms much larger open models such as Llama-2-70B-chat on chat benchmarks, surpasses GPT-3.5 Turbo on AlpacaEval, and even approaches GPT-4 on parts of MT-Bench. What sets Zephyr apart is its efficiency: at 7B parameters it is roughly 25 times smaller than GPT-3.5, making it a game-changer for developers and researchers looking to reduce inference times on large language models.
About Zephyr-7B Beta
Zephyr-7B-β, the second model in the series, is a fine-tuned version of mistralai/Mistral-7B-v0.1 that was trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO).
Alpaca Eval Leaderboard Triumph
One of the most exciting outcomes of Zephyr-7B-β's development is its performance on the Alpaca Eval leaderboard. By outperforming ChatGPT there, Zephyr-7B-β has proven its ability to generate high-quality, contextually relevant responses and has established itself as the leading 7B-parameter LLM currently available. In several categories of MT-Bench, it also outperforms larger open models such as Llama-2-70B-chat.
Tutorial - Using Zephyr-7B Beta on E2E Cloud
If you require extra GPU resources for the tutorial ahead, you can explore the offerings on E2E Cloud. We provide a diverse selection of GPUs, making E2E Cloud a suitable choice for more advanced LLM-based applications.
To get one, head over to MyAccount and sign up. Then launch a GPU node, as shown in the screenshot below:
Make sure you add your SSH keys during launch, or through the security tab after launching.
Once you have launched a node, you can use the VS Code Remote Explorer to SSH into it and use it as a local development environment.
Now follow these steps; a code sketch covering all four is given after the list:
- Install required libraries:
- Set up the model:
- Define a prompt and stream the input:
- Test the model:
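Below is a minimal sketch of these four steps, adapted from the usage example on the HuggingFaceH4/zephyr-7b-beta model card. The package list, the bfloat16 dtype, and the generation parameters (max_new_tokens, temperature, top_k, top_p) are assumptions that you may want to adjust for your node.

# Step 1 - install the required libraries (run in the node's terminal):
#   pip install -U torch transformers accelerate

import torch
from transformers import pipeline, TextStreamer

# Step 2 - set up the model as a text-generation pipeline.
# bfloat16 and device_map="auto" are assumptions suited to a recent GPU node.
pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-7b-beta",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Step 3 - define a prompt with Zephyr's chat template and prepare output streaming.
messages = [
    {"role": "system", "content": "You are a friendly chatbot who always responds concisely."},
    {"role": "user", "content": "Explain what makes Zephyr-7B Beta efficient."},
]
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
streamer = TextStreamer(pipe.tokenizer, skip_prompt=True)

# Step 4 - test the model: generate a response, streaming tokens as they are produced.
outputs = pipe(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    streamer=streamer,
)
print(outputs[0]["generated_text"])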
Key Components of Zephyr-7B Beta’s Success
Beyond its impressive performance, the Zephyr-7B Beta model is fascinating for how it was trained. Some of the key components that contribute to its success include:
- Fine-tuning of the best small open-source pre-trained model, Mistral 7B.
- Usage of a large-scale preferences dataset, UltraFeedback.
- Replacing Reinforcement Learning (RL) with Direct Preference Optimization (DPO).
- Overfitting on the preference dataset, which surprisingly yields better chat results.
Training Steps
- Distilled Supervised Fine-tuning (dSFT)
- AI Feedback (AIF) collection
- Distilled Direct Preference Optimization (dDPO)
Major Facts about DPO
- DPO training quickly overfits the preference dataset, yet benchmarks indicate this overfitting improves rather than harms chat performance.
- Ablation experiments confirm that SFT and DPO are necessary for the best results.
- Feedback from Zephyr-7B Alpha led to additional filtering of the training data to remove responses with incorrect casing or odd prefaces.
Performance
Upon its launch, Zephyr-7B-β holds the top position among 7B chat models on both the MT-Bench and Alpaca Eval leaderboards.
You can compare Zephyr-7B Beta with other language models in the LMSYS Chatbot Arena: http://arena.lmsys.org
Zephyr-7B Beta: A Fine-Tuned Marvel
Zephyr-7B-β’s exceptional performance can be attributed to its three-step fine-tuning process:
- Supervised Fine-Tuning: This initial step is crucial for teaching the model to understand and utilize chat templates effectively. Ablation studies have shown that without supervised fine-tuning, the model struggles to generate meaningful and contextually relevant responses.
- AI Feedback: In this step, Zephyr-7B-β goes the extra mile. For each prompt, four different responses are generated by four distinct large language models, and GPT-4 is then employed as a judge to rank these responses. This process, based on the UltraFeedback dataset, ensures that only the most relevant and contextually accurate responses are preferred. It's like having a panel of experts evaluate and choose the best answer.
- Direct Preference Optimization: Zephyr-7B-β takes its fine-tuning to the next level by training with DPO on the preference dataset, even to the point of overfitting it. This optimization process leads to improved performance, making the model more efficient at generating responses aligned with user preferences.
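For reference, the dDPO step optimizes the standard DPO objective from the original DPO paper that the Zephyr recipe builds on. A sketch of that loss is given below, where π_θ is the model being trained, π_ref is the dSFT model used as the reference, (x, y_w, y_l) are a prompt with its preferred and rejected responses from the AI-feedback data, σ is the logistic function, and β is a scaling hyperparameter:

\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)
  \right]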
Data Quality Matters
To achieve these astounding results, the Zephyr-7B-β team meticulously filtered the data they used. They removed issues related to incorrect casing and unusual sentence starts, ensuring that the model was trained on high-quality, consistent data. This data-cleaning process significantly contributes to the model's impressive performance and its ability to compete with ChatGPT.
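As a rough illustration of this kind of cleaning (not the team's actual filtering code), here is a hypothetical Python filter that drops responses with incorrect casing or odd prefaces; the BAD_PREFIXES list is an assumption made up for the example.

# A hypothetical illustration of the data filtering described above,
# not the actual code used to build Zephyr's training data.
BAD_PREFIXES = ("as an ai language model", "sure, here is")  # made-up examples of odd prefaces

def keep_response(text: str) -> bool:
    """Return True if a response passes the casing and preface checks."""
    stripped = text.strip()
    if not stripped:
        return False
    if stripped[0].isalpha() and stripped[0].islower():
        # incorrect casing: the response starts with a lowercase letter
        return False
    if stripped.lower().startswith(BAD_PREFIXES):
        # oddly prefaced response
        return False
    return True

responses = [
    "the capital of France is Paris.",        # dropped: incorrect casing
    "As an AI language model, I cannot ...",  # dropped: odd preface
    "Paris is the capital of France.",        # kept
]
cleaned = [r for r in responses if keep_response(r)]
print(cleaned)  # ['Paris is the capital of France.']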
Conclusion
The Zephyr-7B Beta model is a remarkable achievement in the field of natural language processing. Its outstanding performance, combined with its small size, makes it an ideal choice for developers and researchers looking to improve inference times. Because it can run efficiently on consumer hardware, Zephyr is set to change the way we interact with large language models, making them faster and more accessible for a wide range of applications. Hugging Face's commitment to openness, reflected in its language model alignment handbook, only reinforces the significance of this release for the NLP community.
References
Paper: https://arxiv.org/abs/2310.16944
Model: https://huggingface.co/HuggingFaceH4/zephyr-7b-beta
Demo: https://huggingfaceh4-zephyr-chat.hf.space/
LMSYS arena: http://arena.lmsys.org
Alpaca Eval Benchmarks: https://tatsu-lab.github.io/alpaca_eval/
MT-Bench Benchmarks: https://huggingface.co/spaces/lmsys/mt-bench