In this article, we will offer a step-by-step guide on how to train the Falcon-7B model on E2E Cloud.

This is a follow-up post to our blog on Mastering Falcon-40B. We ran this experiment on E2E Cloud, and this post walks you through implementing the Falcon-7B model step by step. Falcon-7B is a smaller version of Falcon-40B, with fewer parameters. It was trained on 1.5 trillion tokens and is less expensive to train.
Getting Started
To get started with launching a Falcon-7B-based private LLM, head over to MyAccount on E2E Cloud and sign up or log in.
Once you have created your account or logged in, you will need to create a volume, and then a CPU node and a GPU node, as described below.
Creating Volume
The dataset the model is trained on is 2TB, so first we need to create a volume. In this instance, we will create a 4TB volume.

Under Compute menu on the sidebar, click on Volumes.

Click on Add Volume.

Since the dataset itself is 2TB and the Hugging Face repository may need additional working space, we opt for a 4TB volume.
Once the volume is created, we will be directed to this screen:

Creating a CPU Node
Now create a vCPU node, which has the following specifications:

The node can be created using the following steps. Under Compute, go to Nodes.

Click on Add New Node.

Make sure to click on Linux Smart Dedicated Compute tab, then click on Ubuntu 22.04.

Your node will be created successfully within a couple of minutes.
Attaching the 4 TB Volume
Once the node is running, go to Volumes.

You will notice that you are unable to attach the 4TB volume while the node is running. Under the node's Actions drop-down menu, power off the machine and then try attaching the volume.

Now, when you try, the Volume will be attached successfully.
Mounting the Volume into the Machine
Log in to your machine from a terminal via SSH, like this:
$ ssh username@ip_address
You will be logged in successfully. Type in:
lsblk
You will get the following output:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 253:0 0 30G 0 disk
└─vda1 253:1 0 30G 0 part /
vdb 253:16 0 4T 0 disk
The variables might change in your case.
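Before partitioning, it is worth double-checking which device is the new, empty volume, since partitioning is destructive. A quick sketch (the device names are assumptions matching the sample output above):

```shell
# List only whole disks: name, size, type and mountpoint.
# The disk with no partitions and no mountpoint (vdb above) is the new volume.
DISKS=$(lsblk -dn -o NAME,SIZE,TYPE,MOUNTPOINT)
echo "$DISKS"
```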
To create a partition, type in:
parted /dev/vdb
You will get:
GNU Parted 2.3
Using /dev/vdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted)
Create a new GPT disklabel:
(parted) mklabel gpt
You will get the following output.
Warning: The existing disk label on /dev/vdb will be destroyed and all data on this disk will be lost. Do you want to continue?
Yes/No? yes
Set the default unit to TB:
(parted) unit TB
To create a 4TB partition, enter:
(parted) mkpart primary 0.00TB 4.00TB
To print the current partition, type in:
(parted) print
You will get the following output:
Model: ATA ST33000651AS (scsi)
Disk /dev/vdb: 4.00TB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number Start End Size File system Name Flags
1 0.00TB 4.00TB 4.00TB ext4 primary
To quit and save changes, enter:
(parted) quit
You will get an output like this:
Information: You may need to update /etc/fstab.
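If you prefer, the whole interactive parted session above can be scripted in a single non-interactive command. A sketch, assuming the /dev/vdb device name from the steps above (destructive, so make sure the device name is right):

```shell
# -s runs parted non-interactively: label the disk GPT, then create one
# partition spanning the whole device.
DEV=/dev/vdb
if [ -b "$DEV" ]; then
    parted -s "$DEV" mklabel gpt mkpart primary ext4 0% 100%
else
    echo "$DEV not present on this machine; skipping"
fi
```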
Type in:
$ mkfs.ext4 /dev/vdb1
Mount /dev/vdb1 on a new directory inside the /mnt folder, which is in the root directory.
$ mkdir /mnt/data
$ mount /dev/vdb1 /mnt/data
Your disk is now successfully mounted.
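parted's note about /etc/fstab matters: without an entry there, the volume will not be remounted after a reboot. A minimal sketch, assuming the /dev/vdb1 partition and /mnt/data mount point from the steps above:

```shell
# Look up the partition's UUID and append a matching fstab entry.
# "nofail" keeps the machine bootable even if the volume is detached later.
DEV=/dev/vdb1
if [ -b "$DEV" ]; then
    UUID=$(blkid -s UUID -o value "$DEV")
    echo "UUID=$UUID /mnt/data ext4 defaults,nofail 0 2" | sudo tee -a /etc/fstab
else
    echo "$DEV not present; nothing to add to /etc/fstab"
fi
```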
Downloading the Data
As always, it is good practice to update and upgrade the machine.
$ sudo apt update && sudo apt upgrade
Hugging Face dataset repositories store their large files with Git LFS, so install it first:
$ sudo apt install git-lfs
$ git lfs install
Before downloading the data, start a screen session so the download keeps running even if your SSH connection drops (detach with Ctrl-A D, reattach later with screen -r data_download):
$ screen -S data_download
Then change into the mounted volume, so the dataset lands on the 4TB disk rather than the small root disk, and download the data:
$ cd /mnt/data
$ git clone https://huggingface.co/datasets/tiiuae/falcon-refinedweb
It will take approximately 5 hours to download the data. When the download is completed successfully, unmount the disk from the machine using:
$ umount /mnt/data
Shut down the CPU machine and detach the 4TB volume, which now contains the RefinedWeb dataset, just as you attached it earlier.
Training on GPU Machine
Create a GPU node, just like you created the CPU node. The specifications of the GPU machine are as follows:

Now, attach and mount the 4TB volume just as you did for the CPU machine.
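Before downloading the model, it is worth confirming that the GPU node actually sees its GPU. A quick check (assumes the NVIDIA drivers that ship with the GPU image):

```shell
# nvidia-smi prints the driver version, GPU model and memory if the
# drivers are working; its absence points at a driver problem.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi
    GPU_CHECK=done
else
    echo "nvidia-smi not found - check the node's NVIDIA driver installation"
    GPU_CHECK=missing
fi
```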
Download the Falcon-7B model repository using:
$ git clone https://huggingface.co/tiiuae/falcon-7b
Once it is downloaded, make sure the Python libraries the script depends on are installed (the package names below are the usual ones for Falcon; adjust to your environment):
$ pip install torch transformers accelerate einops
Then get started with the model by creating a new Python file called script.py.
$ touch script.py
Edit it using:
$ nano script.py
A text editor will open; type in the following:
from transformers import AutoTokenizer
import transformers
import torch

model = "tiiuae/falcon-7b"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
sequences = pipeline(
    "Girafatron is obsessed with giraffes, the most glorious animal on the "
    "face of this Earth. Giraftron believes all other animals are irrelevant "
    "when compared to the glorious majesty of the giraffe."
    "\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
Save the file with Ctrl-O, exit nano with Ctrl-X, and run the script:
$ python3 script.py
On the first run the model weights will be downloaded, and output like the following will be generated:

We can also feed text from the RefinedWeb dataset into the pipeline as the prompt. Note that the path should point at a text file extracted from the dataset, and that we read its contents before passing them to the pipeline:
from transformers import AutoTokenizer
import transformers
import torch

model = "tiiuae/falcon-7b"
data_path = "path/to/data"  # a text file extracted from the dataset

with open(data_path) as f:
    data = f.read()

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
sequences = pipeline(
    data,
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
And it will generate the result accordingly.
Closing Thoughts
- Compared to Falcon-40B, Falcon-7B is less powerful.
- Training this model is easier and faster.
- You can use your own private data to build an interactive chatbot for a specific domain, with the assurance that there will be no data leakage.