Magicoder: An Overview

April 2, 2025

Every now and then, the open-source community releases an incredible new model, a revolutionary dataset, or an improved training method. The recent wave started with Dolly 2.0 by Databricks. Since then, the journey has seen the rise of exceptional rivals to GPT such as Llama 2, eventually reaching multimodal LLMs like LLaVA. The open-source community has never failed to amaze us when we needed it most, especially when the incumbents moved our favorite AI applications behind paid plans. The journey of open-source LLMs is a testament to the power of collective intelligence and the spirit of sharing knowledge. As we move forward, we can expect this trend to continue, bringing more advanced and accessible AI tools to the world.

Code LLMs

Besides general-purpose language models that can handle almost any task, a class of LLMs emerged that specializes in coding. These models are trained on large datasets of code snippets and instructions, and with their distinctive training methods they have opened up new possibilities in automated code generation, becoming invaluable tools for developers around the world. Each model adopts its own training method, which sets it apart from its counterparts. StarCoder, an open-source language model, was trained on ‘The Stack’, a dataset of source code and related content from GitHub. CodeLlama, another open-source model, was trained in a ‘fill-in-the-middle’ fashion, learning to complete missing code based on the surrounding context. Despite their different training methods, both models have proven highly effective, demonstrating the versatility and potential of LLMs in coding. Both CodeLlama and StarCoder can generate high-quality code in seconds, with benchmark scores that vary across the tasks they are applied to.

Instruction Tuning

LLM capabilities can be further enhanced by training methods like instruction tuning, in which an LLM is trained on instruction-response pairs in a supervised fashion. This improves its ability to follow human instructions, as opposed to the typical objective of transformers, which is simply to predict the next word in a text sequence. There are a handful of instruction tuning methods; a minimal example of a training pair is shown below.
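For intuition, here is what a single training pair might look like; the field names are illustrative rather than any fixed standard:

```python
# A hypothetical instruction-tuning sample; real datasets use varying field names.
sample = {
    "instruction": "Write a Python function that reverses a string.",
    "response": "def reverse_string(s: str) -> str:\n    return s[::-1]",
}

# During supervised fine-tuning, the model learns to emit sample["response"]
# when conditioned on sample["instruction"].
```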

Self-Instruct Tuning

This method was largely aimed at reducing the dependence on human annotators. Large datasets of instructions and responses can be created with little effort: output generated by the LLM itself is turned into instruction data, on which the LLM is then fine-tuned. The approach has proven especially valuable for building code-generation datasets; a rough sketch of the loop follows.
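As a rough sketch (not the exact Self-Instruct pipeline), the loop looks something like this; `ask_llm` is a hypothetical stand-in for any chat-completion call:

```python
def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM API call (OpenAI, Hugging Face, etc.).
    return f"(model output for: {prompt})"

seed_tasks = ["Write a function that checks whether a number is prime."]
generated_pairs = []

for seed in seed_tasks:
    # 1. Ask the model to invent a new instruction similar to the seed.
    new_instruction = ask_llm(f"Propose a new coding task similar to: {seed}")
    # 2. Ask the model to answer its own instruction.
    response = ask_llm(new_instruction)
    generated_pairs.append({"instruction": new_instruction, "response": response})

# After filtering, generated_pairs becomes fine-tuning data for the LLM.
```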

Evol-Instruct Tuning

Evol-Instruct is another method for enhancing LLM capability. It involves gradually developing more complex instructions: starting from an initial instruction set, the data is rewritten at each step to produce harder instructions, called evolved instructions. The resulting dataset is then used to fine-tune the LLM.
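For instance, a single evolution step might wrap an existing instruction in a rewriting prompt along these lines (a paraphrase, not the exact template from the Evol-Instruct work):

```python
EVOLVE_TEMPLATE = """Rewrite the following programming task to make it more complex.
You may add constraints, edge cases, or extra requirements,
but the task must remain solvable.

Original task:
{instruction}
"""

instruction = "Write a function that sorts a list of integers."
evolved_prompt = EVOLVE_TEMPLATE.format(instruction=instruction)
# Sending evolved_prompt to an LLM yields an 'evolved instruction',
# which can itself be evolved again in the next round.
```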

Magicoder

Magicoder is the latest development in the code LLM space, contributed this time by researchers from the University of Illinois at Urbana-Champaign and Tsinghua University. Released in December 2023, it set a new benchmark for open-source code LLMs. Despite its relatively small size of roughly 7B parameters compared to larger LLMs, Magicoder has outperformed leading code-focused large language models at text-to-code generation, particularly for data science programs. The researchers achieved this milestone with yet another instruction tuning method, called OSS-Instruct, which uses open-source code snippets to produce high-quality, low-bias instruction data for fine-tuning the LLM.

OSS-Instruct yields more diverse and realistic data for fine-tuning the LLM. Unlike other code LLMs, Magicoder can produce coding problems and solutions of a high standard. The configurations currently available are Magicoder-DS and Magicoder-S-DS (6.7B parameters, built on DeepSeek-Coder) and Magicoder-CL and Magicoder-S-CL (7B parameters, built on CodeLlama).

OSS-Instruct

OSS-Instruct works by guiding an LLM to generate a coding problem along with its solution, using a seed code snippet drawn from freely available sources such as GitHub. The seed snippet gives a degree of control over the generation, steering the LLM toward diverse coding problems that reflect real-world programming scenarios. Because the seeds are real-world code, the resulting problems are not only varied but also practical and authentic, making them more relevant to actual programming situations.

Here is a detailed prompt design specified in the paper.

Prompt Design in OSS Instruct (Image from paper)
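In code, the idea translates roughly as follows; the template below paraphrases the paper’s prompt rather than reproducing it verbatim:

```python
# Paraphrased OSS-Instruct prompt: seed snippet in, problem + solution out.
OSS_INSTRUCT_TEMPLATE = """Gain inspiration from the following random code snippet
to create a high-quality programming problem, and provide a complete solution.

Code snippet for inspiration:
{seed_snippet}
"""

seed_snippet = "def chunk(lst, size):\n    return [lst[i:i + size] for i in range(0, len(lst), size)]"
oss_prompt = OSS_INSTRUCT_TEMPLATE.format(seed_snippet=seed_snippet)
# Sending oss_prompt to a strong LLM (the paper used gpt-3.5-turbo) yields
# one instruction-response pair for the fine-tuning corpus.
```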

A diverse dataset of 75K instruction samples is used to fine-tune Magicoder. Magicoder-OSS-Instruct-75K was generated through OSS-Instruct using gpt-3.5-turbo-1106 and is used to train both the Magicoder and Magicoder-S variants. Magicoder-Evol-Instruct-110K is a second dataset, used to apply Evol-Instruct to the S variants.
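Both datasets are published on the Hugging Face Hub, so you can inspect the data yourself; the dataset ID below assumes the ise-uiuc organization:

```python
from datasets import load_dataset

# Dataset ID assumed from the ise-uiuc organization on the Hugging Face Hub.
oss_75k = load_dataset("ise-uiuc/Magicoder-OSS-Instruct-75K", split="train")
print(oss_75k[0])  # one problem/solution pair generated via OSS-Instruct
```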

Comparison & Performance

Magicoder outperforms state-of-the-art LLMs of various sizes across a broad spectrum of coding benchmarks. It shines at Python text-to-code problems, coding tasks in other languages, and data-science-related challenges. Notably, Magicoder-S-DS-6.7B surpasses GPT-3.5-Turbo and Gemini Ultra on HumanEval. Here are some test results from the paper; further details are available on the leaderboard.

Overview of OSS-INSTRUCT and the pass@1 results of different LLMs on HumanEval (+) (Image from paper)

With that in mind, let’s move on to the practical side and explore how to bring Magicoder to our table.

Prerequisites

This tutorial uses the model and utilities from the Hugging Face ecosystem; the model is fetched from the Hugging Face Hub. Launch a Jupyter notebook on the E2E TIR AI Platform and log in with your Hugging Face credentials.


from huggingface_hub import notebook_login

notebook_login()

Inference

We will perform inference using the Magicoder-CL-7B model. Import the required libraries.


from transformers import pipeline
import torch

Define the prompt structure for Magicoder as shown.


Magicoder_PROMPT = """You are an intelligent coding assistant that delivers accurate and reliable responses to user instructions.

@@ Instruction
{instruction}

@@ Response
"""

@@ Instruction and @@ Response are special tokens used to structure the prompt for the model. They are not actual programming syntax but serve as markers within the text prompt to delineate the input instruction and the expected response.

Now create the prompt.


instruction = """
Create a
Python function that takes in two numbers as arguments and returns their sum. """ prompt = Magicoder_PROMPT.format(instruction=instruction)

Define the pipeline with the parameters as shown.


generator = pipeline(
    model="ise-uiuc/Magicoder-CL-7B",
    task="text-generation",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

task is set to ‘text-generation’ as we need to generate code from the input prompt; other task options such as ‘summarization’ and ‘question-answering’ are not needed here.

Now generate and fetch the response. The maximum token length is set to 1024 (this includes the prompt) and only a single sequence is returned. The temperature parameter adjusts the randomness of the generated text; at 0.0 the output is effectively deterministic, so feel free to try other values.


result = generator(prompt, max_length=1024, num_return_sequences=1, temperature=0.0)

generated_code = result[0]["generated_text"]
print("Generated Code:")
print(generated_code)

Output: 
@@ Response
```python
def add_numbers(num1, num2):
    return num1 + num2
```


The `add_numbers` function takes in two numbers as arguments and returns their sum. This solution accurately fulfills the requirement of adding two numbers and returning the result.
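Note that the pipeline echoes the prompt before the completion, so everything before the @@ Response marker can be stripped; pasting the returned function into the notebook also lets us sanity-check it:

```python
# Keep only the model's answer, discarding the echoed prompt.
response_only = generated_code.split("@@ Response")[-1].strip()
print(response_only)

# Quick sanity check of the generated function.
def add_numbers(num1, num2):
    return num1 + num2

assert add_numbers(2, 3) == 5
```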

That’s great. As Magicoder is known for its advanced coding capabilities, let’s try problems of gradually increasing difficulty in other programming languages.

Going up a level, let us test the model with a hard, tree-based LeetCode problem:


instruction = """
Write a C++ function vector sumOfDistancesInTree(int n, vector>& edges) that solves: There is an undirected connected tree with n nodes labeled from 0 to n - 1 and n - 1 edges.
You are given the integer n and the array edges where edges[i] = [ai, bi] indicates that there is an edge between nodes ai and bi in the tree.
Return an array answer of length n where answer[i] is the sum of the distances between the ith node in the tree and all other nodes..
"""




prompt = Magicoder_PROMPT.format(instruction=instruction)

Output:
@@ Response
```cpp
#include <iostream>
#include <vector>
#include <queue>
using namespace std;
vector<int> sumOfDistancesInTree(int n, vector<vector<int>>& edges) {
    vector<int> answer(n);
    vector<vector<int>> adj(n);
    vector<int> distance(n, -1);
    vector<int> parent(n, -1);

    for (auto& edge : edges) {
        int u = edge[0];
        int v = edge[1];
        adj[u].push_back(v);
        adj[v].push_back(u);
    }
    queue<int> q;
    q.push(0);
    distance[0] = 0;
    parent[0] = -1;
    while (!q.empty()) {
        int u = q.front();
        q.pop();

        for (int v : adj[u]) {
            if (distance[v] == -1) {
                distance[v] = distance[u] + 1;
                parent[v] = u;
                q.push(v);
            }
        }
    }
    for (int i = 0; i < n; i++) {
        int sum = 0;
        for (int j = 0; j < n; j++) {
            if (i != j && distance[j] != -1) {
                int dist = distance[j] - distance[i];
                sum += dist;
            }
        }
        answer[i] = sum;
    }

    return answer;
}
```

Lastly, let’s evaluate how the model performs on a task from the LLM space itself: deploying and serving other large language models.


instruction = """
Write an API handler in python that serves a codeLlama LLM using rayserve framework.
"""
prompt = Magicoder_PROMPT.format(instruction=instruction)

Output:
@@ Response
```python
import ray
from ray import serve

# Initialize ray
ray.init()

# Define the codeLlama LLM
class CodeLlamaLLM:
    def __init__(self, model_path):
        # Initialize the model or load the pre-trained model
        self.model = load_model(model_path)

    def handle_request(self, input_data):
        # Process the input data and generate the output
        output = self.model.generate(input_data)
        return output

# Initialize rayserve
serve.start()

# Define the API handler
@serve.deployment
class CodeLlamaHandler:
    def __init__(self, model_path):
        self.llm = CodeLlamaLLM(model_path)

    async def __call__(self, request):
        input_data = await request.json()
        return self.llm.handle_request(input_data)

# Deploy the API handler
CodeLlamaHandler.deploy("code_llama_handler", "/code_llama", model_path="path_to_model")

# Example usage
handle = CodeLlamaHandler.get_handle("/code_llama")
input_data = {"input": "example_input"}
result = ray.get(handle.handle_request.remote(input_data))
print(result)
```


In this solution, we define a `CodeLlamaLLM` class that represents the codeLlama LLM. The `handle_request` method processes the input data and generates the output. We then define an API handler `CodeLlamaHandler` using the `@serve.deployment` decorator to serve the LLM. The `__call__` method handles the incoming requests and delegates the processing to the LLM. Finally, we deploy the API handler with the specified route and model path.
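One caveat worth flagging: the `deploy()` and `get_handle()` calls in the generated snippet follow an older Ray Serve API, and `load_model` is left undefined. With recent Ray releases, a deployment is typically bound and launched as in this minimal sketch (assuming `ray[serve]` is installed; the handler here just echoes its input):

```python
from ray import serve

@serve.deployment
class EchoHandler:
    # Stand-in for a real model-serving class.
    async def __call__(self, request):
        data = await request.json()
        return {"echo": data}

# Bind constructor arguments (none here) and start the HTTP endpoint.
app = EchoHandler.bind()
serve.run(app, route_prefix="/code_llama")
# The deployment is now reachable at http://127.0.0.1:8000/code_llama
```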

The responses show that Magicoder is quite fluent across programming languages and even keeps up with the latest developments in the LLM space.

Wrapping Up

Congratulations! You have learned about Magicoder and how to run inference on it with the Hugging Face transformers pipeline. Language models dedicated to code generation still have substantial room for improvement in both the efficiency and the quality of the code they produce. Researchers are actively addressing these shortcomings by improving the quality of open-source datasets and employing novel training techniques. Magicoder achieved remarkable results with only around 7B parameters, signaling a promising trajectory for efficient, low-compute LLMs.

The E2E Cloud platform serves as an ideal tool for deploying state-of-the-art models like Magicoder into production. It offers user-friendly functionality for training, refining, and deploying code LLMs. You can create tailored inference endpoints with custom API handlers or use the pre-built containers available in the Inference Endpoints section, which ship with ready-to-use API handlers. Depending on the complexity and demands of the models you use, you may need to adjust the infrastructure scale. I hope you enjoyed this tutorial and found it useful.

References

Magicoder: Source Code Is All You Need (arXiv:2312.02120)

Official GitHub repo: ise-uiuc/magicoder
