The emergence of Large Language Models (LLMs) has sparked a new era of AI-assisted programming, helping developers streamline their coding processes and tackle complex problems more efficiently. Among the various LLMs available, open-source coding LLMs have gained significant attention due to their accessibility, transparency, and community-driven nature.
Open-source coding LLMs are powerful AI models that have been trained on vast amounts of programming-related data, including source code, documentation, and developer discussions. These models can understand and generate code in multiple programming languages, provide intelligent code suggestions, and even assist in debugging and optimization tasks. By leveraging the collective knowledge and expertise of the open-source community, these LLMs offer developers a valuable tool to enhance their productivity and overcome programming challenges.
Moreover, LLMs for coding provide significant benefits to software organizations. One of the key advantages is cost reduction compared to proprietary coding assistant subscriptions. By hosting open-source LLMs locally, organizations can avoid the recurring expenses associated with subscription-based services.
In addition to cost savings, these LLMs for coding offer organizations greater control, customization, and privacy. By hosting these models within their own infrastructure, companies can ensure data security and compliance with privacy requirements. The open-source nature of these LLMs also allows organizations to customize and fine-tune the models to align with their specific coding practices, standards, and domain-specific requirements.
In this article, we will explore the top open-source coding LLMs that are making waves in the developer community.
1. Mistral 7B & Mixtral 8x7B
Mistral 7B and Mixtral 8x7B are two open-source language models developed by Mistral AI, both released under the Apache 2.0 license.
Mistral 7B is a 7.3B parameter model that outperforms Llama 2 13B on all benchmarks and even surpasses Llama 1 34B on many tasks. It approaches the performance of CodeLlama 7B on coding tasks while maintaining strong performance in English-language tasks. Mistral 7B uses techniques like Grouped Query Attention (GQA) for faster inference and Sliding Window Attention (SWA) to efficiently handle longer sequences.
Mixtral 8x7B is a larger, 46.7B parameter Sparse Mixture-of-Experts (SMoE) model. Despite its high total parameter count, it only uses 12.9B parameters per token, allowing it to process input and generate output at roughly the speed and cost of a 12.9B dense model. Mixtral 8x7B matches or outperforms Llama 2 70B on most benchmarks.
Both models demonstrate strong performance on coding-related tasks:
1. Mistral 7B approaches the performance of CodeLlama 7B on code generation tasks while maintaining its proficiency in English-language tasks.
2. Mixtral 8x7B shows strong performance in code generation.
The models can be easily fine-tuned for various tasks. For example, Mistral 7B was fine-tuned on publicly available instruction datasets to create Mistral 7B Instruct, which outperforms all 7B models on the MT-Bench benchmark.
- Mistralai/Mistral-7B-Instruct-v0.2
- Mistralai/Mixtral-8x7B-Instruct-v0.1
- Mistralai/Mistral-7B-Instruct-v0.1
- Mistralai/Mixtral-8x7B-v0.1
- Mistralai/Mistral-7B-v0.1
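A quick way to try the instruction-tuned Mistral models locally is Ollama, the same tool we use in the hosting section later in this article. The sketch below assumes the `mistral` and `mixtral:8x7b` tags in the Ollama library point to the instruct variants; check the library before relying on the exact tags.

```bash
# Pull and chat with the instruction-tuned Mistral 7B model via Ollama.
# The "mistral" and "mixtral:8x7b" tags are assumptions about the Ollama library.
ollama pull mistral
ollama run mistral "Write a Python function that returns the n-th Fibonacci number."

# The larger Mixtral 8x7B SMoE model works the same way but needs far more GPU memory.
ollama run mixtral:8x7b "Explain Grouped Query Attention in two sentences."
```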
2. CodeLlama
CodeLlama by Meta is a state-of-the-art large language model (LLM) designed for code generation and natural language tasks related to code. It is built on top of Llama 2 and is available in three versions:
1. CodeLlama: The foundational code model.
2. CodeLlama - Python: Specialized for Python programming.
3. CodeLlama - Instruct: Fine-tuned for understanding natural language instructions.
Four sizes of CodeLlama have been released: 7B, 13B, 34B, and 70B parameters. The models are trained on a massive dataset of code and code-related data:
- 7B, 13B, and 34B models are trained on 500B tokens of code and code-related data.
- 70B model is trained on 1T tokens.
The 7B and 13B base and instruct models have also been trained with fill-in-the-middle (FIM) capability, allowing them to insert code into existing code for tasks like code completion.
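To make the FIM capability concrete, the base (code) variants accept a prompt containing `<PRE>`, `<SUF>`, and `<MID>` control tokens and generate the missing middle. Below is a minimal sketch; the `codellama:7b-code` tag and the exact token spacing are assumptions based on the Ollama library and Meta's published prompt format, so treat it as illustrative.

```bash
# Fill-in-the-middle: supply the code before the gap (<PRE>) and after it (<SUF>);
# the model generates the missing middle (<MID>).
# The "codellama:7b-code" tag is an assumption about the Ollama library.
ollama run codellama:7b-code '<PRE> def fibonacci(n):
    """Return the n-th Fibonacci number.""" <SUF>
    return b <MID>'
```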
CodeLlama - Python is further fine-tuned on 100B tokens of Python code, while CodeLlama - Instruct is instruction fine-tuned and aligned to better understand human prompts.
In benchmark tests using HumanEval and Mostly Basic Python Programming (MBPP), CodeLlama outperformed state-of-the-art publicly available LLMs on code tasks. CodeLlama 34B scored 53.7% on HumanEval and 56.2% on MBPP, the highest among open-source solutions.
The models are released under the same community license as Llama 2, and the training recipes and model weights are available on GitHub.
- CodeLlama-34b-Instruct-hf
- CodeLlama-13b-Instruct-hf
- CodeLlama-7b-Instruct-hf
- CodeLlama-70b-Instruct-hf
- CodeLlama-70b-Python-hf
- CodeLlama-70b-hf
- CodeLlama-7b-hf
- CodeLlama-13b-hf
- CodeLlama-34b-hf
- CodeLlama-7b-Python-hf
- CodeLlama-13b-Python-hf
- CodeLlama-34b-Python-hf
3. Phind-CodeLlama
Phind, an AI company, has fine-tuned two models, CodeLlama-34B and CodeLlama-34B-Python, using their internal dataset. The resulting models, named Phind-CodeLlama-34B-v1 and Phind-CodeLlama-34B-Python-v1, have achieved impressive results on the HumanEval benchmark, scoring 67.6% and 69.5% pass@1, respectively.
Phind's dataset consists of approximately 80,000 high-quality programming problems and solutions, structured as instruction-answer pairs rather than code completion examples. The models were trained over two epochs, totaling around 160,000 examples, using native fine-tuning without LoRA. The training process was optimized using DeepSpeed ZeRO 3 and Flash Attention 2, allowing the models to be trained in just three hours using 32 A100-80GB GPUs with a sequence length of 4096 tokens.
To ensure the validity of their results, Phind applied OpenAI's decontamination methodology to their dataset, which involves sampling substrings from each evaluation example and checking for matches in the processed training examples. No contaminated examples were found in Phind's dataset.
Phind-CodeLlama-34B-v2 is a newer version, which was initialized from Phind-CodeLlama-34B-v1 and trained on an additional 1.5 billion tokens. This new model achieved an even higher score of 73.8% pass@1 on the HumanEval benchmark, further demonstrating the effectiveness of Phind's fine-tuning approach.
- Phind-CodeLlama-34B-v2
- Phind-CodeLlama-34B-v1
- Phind-CodeLlama-34B-Python-v1
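Like the other models in this list, the Phind fine-tunes can be served locally. A minimal sketch, assuming the `phind-codellama:34b-v2` tag is available in the Ollama library:

```bash
# Run Phind's fine-tuned CodeLlama locally.
# The "phind-codellama:34b-v2" tag is an assumption about the Ollama library.
ollama run phind-codellama:34b-v2 \
  "Write a Python function that parses an ISO-8601 date string and returns a datetime object."
```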
4. StarCoder & StarCoder2
StarCoder and StarCoder2 are two large language models developed by the BigCode project, an open scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs).
StarCoder:
- StarCoder is a 15.5B parameter model with an 8K context length, infilling capabilities, and fast large-batch inference enabled by multi-query attention.
- It is built upon StarCoderBase, which was trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process.
- StarCoder is a fine-tuned version of StarCoderBase, trained on an additional 35B Python tokens.
StarCoder2:
- StarCoder2 is trained on The Stack v2, a dataset built in partnership with Software Heritage (SWH) that is 4× larger than the first StarCoder dataset.
- The Stack v2 contains over 3B files in 600+ programming and markup languages, derived from the Software Heritage archive.
- StarCoder2 models come in three sizes: 3B, 7B, and 15B parameters, trained on 3.3 to 4.3 trillion tokens.
- StarCoder2-3B outperforms other Code LLMs of similar size on most benchmarks and also outperforms StarCoderBase-15B.
- StarCoder2-15b
- StarCoder2-7b
- StarCoder2-3b
- StarCoder
- StarCoderBase
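Since the StarCoder2 checkpoints are base (completion) models rather than chat models, they are usually prompted with a code prefix to complete. A minimal sketch using Ollama's HTTP API, assuming the `starcoder2:15b` tag has already been pulled locally:

```bash
# Code completion through Ollama's HTTP API (default port 11434).
# The "starcoder2:15b" tag is an assumption about what has been pulled locally.
curl http://localhost:11434/api/generate -d '{
  "model": "starcoder2:15b",
  "prompt": "def read_csv_as_dicts(path):\n    \"\"\"Read a CSV file and return a list of row dicts.\"\"\"\n",
  "stream": false
}'
```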
5. WizardCoder
WizardCoder is a code large language model (LLM) that enhances the open-source StarCoder model through complex instruction fine-tuning using the Evol-Instruct method adapted for code.
The Evol-Instruct method, introduced by WizardLM, is a technique for generating more complex and diverse instruction data to improve the fine-tuning of language models. The key idea is to "evolve" an existing dataset of instructions by iteratively applying various transformations to make the instructions more challenging and varied.
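To make the idea concrete, here is a highly simplified sketch of a single evolution step: an existing instruction-tuned model is asked to rewrite a seed coding instruction into a harder variant. The real Evol-Instruct pipeline described in the WizardLM and WizardCoder papers uses several specific evolution operators (adding constraints, deepening, increasing reasoning steps, and so on) plus filtering of failed evolutions; the choice of `mistral` as the rewriting model below is arbitrary.

```bash
# One illustrative Evol-Instruct-style step: ask an LLM to make a seed coding
# instruction more complex. The real method applies several evolution operators
# iteratively and filters out failed evolutions.
SEED="Write a Python function that sorts a list of integers."
ollama run mistral "Rewrite the following programming instruction so that it is harder,
for example by adding an explicit constraint or requiring a specific time complexity.
Return only the rewritten instruction.

Instruction: ${SEED}"
```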
- WizardCoder-Python-34B-V1.0
- WizardCoder-15B-V1.0
- WizardCoder-Python-13B-V1.0
- WizardCoder-Python-7B-V1.0
- WizardCoder-3B-V1.0
- WizardCoder-1B-V1.0
- WizardCoder-33B-V1.1
6. Solar-10.7B
SOLAR 10.7B is a large language model with 10.7 billion parameters that demonstrates strong performance across a range of natural language processing tasks. It was built with a depth up-scaling approach, in which a Llama 2-style architecture is scaled up in depth and initialized from the pretrained weights of Mistral 7B before further pretraining.
For fine-tuning, SOLAR 10.7B underwent a two-stage process: instruction tuning and alignment tuning. The instruction tuning stage utilized mostly open-source datasets such as Alpaca-GPT4, OpenOrca, and a synthetically generated math question-answering dataset called “Synth. Math-Instruct”. In the alignment tuning stage, the model was further fine-tuned using human preference data from datasets like Orca DPO Pairs, Ultrafeedback Cleaned, and a synthesized math alignment dataset called “Synth. Math-Alignment”.
The resulting instruction-tuned and alignment-tuned model, SOLAR 10.7B-Instruct, outperforms larger models like Mixtral 8x7B-Instruct on benchmark tasks, demonstrating the effectiveness of the training approach.
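SOLAR 10.7B Instruct can be tried locally in the same way as the other models in this list. A minimal sketch, assuming the `solar` tag in the Ollama library points to the instruct-tuned variant:

```bash
# Run SOLAR 10.7B locally; the "solar" tag is an assumption about the Ollama library.
ollama run solar "Write a SQL query that returns the top 5 customers by total order value."
```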
The Economics of Hosting an Open-Source Coding LLM on E2E’s Cloud Server
E2E Networks provides a wide range of cloud GPUs for hosting and running inference on these memory-hungry coding LLMs.
To measure the GPU memory requirements, let's spin up a GPU node on E2E and load these models.
We’ll be using a V100 32 GB GPU node for loading the models.
You can install Ollama to run the models. Ollama is a lightweight tool for serving and running LLMs locally, and it delivers fast inference speeds.
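On a Linux GPU node, Ollama can be installed with its official install script:

```bash
# Install Ollama on Linux. The install script registers a systemd service;
# if the server is not already running, start it manually with `ollama serve`.
curl -fsSL https://ollama.com/install.sh | sh
```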
Now let's run WizardCoder 33B using the following command:
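```bash
# Pull (on first use) and run the 33B WizardCoder model interactively.
# If the tag differs in your Ollama library version, check `ollama list` or the library page.
ollama run wizardcoder:33b
```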
To check the GPU usage, open another terminal and run the following command:
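```bash
# Show per-process GPU memory usage; prepend `watch -n 1` to refresh continuously.
nvidia-smi
```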
This is the output we received:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-PCIE... Off | 00000000:01:01.0 Off | Off |
| N/A 27C P0 36W / 250W | 19082MiB / 32768MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2643 C /usr/local/bin/ollama 19078MiB |
+-----------------------------------------------------------------------------+
This shows that WizardCoder 33B takes about 20 GB of GPU memory when deployed.
Using the same approach, we measured the GPU memory requirements of several other models:
- Mixtral 8x7B: 25 GB
- CodeLlama-70b-Instruct-hf: 30.8 GB
- Phind-CodeLlama-34B-v2: 20 GB
- StarCoder2-15b: 9.51 GB
Now let's assume that an organization has 1,000 developers and that, at any given time, about 1% of them are sending requests to the LLM concurrently. That means we need at least 10 instances of the deployed LLM to keep latency low and avoid requests queuing up. For a team of 2,000 developers we would need 20 instances, and so on.
Based on the GPU requirements measured above, we can take the median value, which is roughly 20 GB per instance.
Each instance consumes around 20 GB, and our team of 1,000 developers needs 10 instances, so the total memory requirement is about 200 GB.
That means we would need 8× V100 32 GB GPUs, giving a total of 256 GB of GPU memory and leaving extra headroom for resource overheads.
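The back-of-the-envelope sizing above can be written down as a small script, so it is easy to re-run for a different team size or per-instance footprint. The numbers below are the assumptions used in this article, not measurements:

```bash
# Back-of-the-envelope capacity estimate using the article's assumptions.
DEVELOPERS=1000
CONCURRENT_PCT=1        # % of developers querying the LLM at the same time
PER_INSTANCE_GB=20      # median GPU footprint measured above
GPU_GB=32               # V100 32 GB

INSTANCES=$(( DEVELOPERS * CONCURRENT_PCT / 100 ))
TOTAL_GB=$(( INSTANCES * PER_INSTANCE_GB ))
GPUS=$(( (TOTAL_GB + GPU_GB - 1) / GPU_GB ))    # ceiling division

echo "Instances needed : ${INSTANCES}"    # 10
echo "Total GPU memory : ${TOTAL_GB} GB"  # 200 GB
echo "Min V100 32GB    : ${GPUS}"         # 7 by memory alone; the article rounds up to 8 (two 4xV100 nodes)
```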
E2E Networks offers a 4xV100 GPU node for 1,80,000 INR per month. Since we would need two of those, the cost comes to roughly 3,60,000 INR per month with V100s.
However, we recommend using H100s instead, due to their lower latency and far higher compute capability. The HGX-powered 8xH100 cloud GPU node has a total GPU memory of 640 GB, so roughly 30 instances of our model could be launched on it, enough to cater to about 3,000 developers.
The cost for this series of cloud GPUs is 20,00,000 INR per month. It comes with 200 CPU cores, 1,800 GB of RAM, 21,000 GB of SSD storage, a combined memory bandwidth of 24 TB/s, and around 32 PetaFLOPS of compute, making it a powerful scale-up platform for demanding AI and high-performance computing workloads.
On the other hand, if you want to reduce costs (and can tolerate higher latency and slower response times), you could host a model with lower GPU requirements, such as StarCoder2-15B, on a smaller cloud GPU node like the 4xL4 on E2E Networks, which costs about 1,27,000 INR per month. It has 96 GB of GPU memory and can host up to 10 instances of StarCoder2-15B at 9.51 GB each.
References
Refer to this table for a comprehensive comparison of all the available open-source coding LLMs.