In this article, we will offer a step-by-step guide on how to train the Falcon-7B model on E2E Cloud.

This is a follow-up post to our blog on Mastering Falcon-40B. We ran this experiment on E2E Cloud, and this post walks you through implementing the Falcon-7B model step by step. Falcon-7B is a smaller version of Falcon-40B, with fewer parameters. It was trained on 1.5 trillion tokens and is less expensive to train.
Getting Started
To get started with launching a Falcon-7B-based private LLM, head over to MyAccount on E2E Cloud and sign up or log in.
Once you have created your account or logged in, you will need to create a volume, and then a CPU node and a GPU node, as described below.
Creating Volume
The dataset the model is trained on is 2TB, so first we need to create a volume. In this instance, we will create a 4TB volume.

Under Compute menu on the sidebar, click on Volumes.

Click on Add Volume.

Since the dataset itself is 2TB and the Hugging Face repository may need additional working space, we opt for a 4TB volume.
Once the volume is created, we will be directed to this screen:

Creating a CPU Node
Now create a vCPU node, which has the following specifications:

The node can be created using the following steps. Under Compute, go to Nodes.

Click on Add New Node.

Make sure to click on Linux Smart Dedicated Compute tab, then click on Ubuntu 22.04.

Your node will be created successfully within a couple of minutes.
Attaching the 4 TB Volume
Once the node is running, go to Volumes.

You will notice that you are unable to attach the 4TB volume while the node is running. Under the node's Actions drop-down menu, power off the machine and then try attaching the volume.

Now, when you try, the Volume will be attached successfully.
Mounting the Volume into the Machine
Log in to your machine from a terminal via SSH, like this:
$ ssh username@ip_address
You will be logged in successfully. Type in:
lsblk
You will get the following output:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 253:0 0 30G 0 disk
└─vda1 253:1 0 30G 0 part /
vdb 253:16 0 4T 0 disk
The variables might change in your case.
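Before partitioning, it is worth double-checking which device is the new, empty volume, since partitioning is destructive. A quick sketch (the device names are assumptions matching the sample output above):

```shell
# List only whole disks: name, size, type and mountpoint.
# The disk with no partitions and no mountpoint (vdb above) is the new volume.
DISKS=$(lsblk -dn -o NAME,SIZE,TYPE,MOUNTPOINT)
echo "$DISKS"
```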
To create a partition, type in:
parted /dev/vdb
You will get:
GNU Parted 2.3
Using /dev/vdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted)
Create a new GPT disklabel:
(parted) mklabel gpt
You will get the following output.
Warning: The existing disk label on /dev/vdb will be destroyed and all data on this disk will be lost. Do you want to continue?
Yes/No? yes
Set the default unit to TB:
(parted) unit TB
To create a 4TB partition, enter:
(parted) mkpart primary 0.00TB 4.00TB
To print the current partition, type in:
(parted) print
You will get the following output:
Model: ATA ST33000651AS (scsi)
Disk /dev/vdb: 4.00TB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number Start End Size File system Name Flags
1 0.00TB 4.00TB 4.00TB ext4 primary
To quit and save changes, enter:
(parted) quit
You will get an output like this:
Information: You may need to update /etc/fstab.
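If you prefer, the whole interactive parted session above can be scripted in a single non-interactive command. A sketch, assuming the /dev/vdb device name from the steps above (destructive, so make sure the device name is right):

```shell
# -s runs parted non-interactively: label the disk GPT, then create one
# partition spanning the whole device.
DEV=/dev/vdb
if [ -b "$DEV" ]; then
    parted -s "$DEV" mklabel gpt mkpart primary ext4 0% 100%
else
    echo "$DEV not present on this machine; skipping"
fi
```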
Type in:
$ mkfs.ext4 /dev/vdb1
Mount /dev/vdb1 on a new directory inside the /mnt folder, which is in the root directory.
$ mkdir /mnt/data
$ mount /dev/vdb1 /mnt/data
Your disk is now successfully mounted.
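parted's note about /etc/fstab matters: without an entry there, the volume will not be remounted after a reboot. A minimal sketch, assuming the /dev/vdb1 partition and /mnt/data mount point from the steps above:

```shell
# Look up the partition's UUID and append a matching fstab entry.
# "nofail" keeps the machine bootable even if the volume is detached later.
DEV=/dev/vdb1
if [ -b "$DEV" ]; then
    UUID=$(blkid -s UUID -o value "$DEV")
    echo "UUID=$UUID /mnt/data ext4 defaults,nofail 0 2" | sudo tee -a /etc/fstab
else
    echo "$DEV not present; nothing to add to /etc/fstab"
fi
```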
Downloading the Data
As always, it is good practice to update and upgrade the machine.
$ sudo apt update && sudo apt upgrade
Hugging Face dataset repositories store their large files with Git LFS, so install it first:
$ sudo apt install git-lfs
$ git lfs install
Before downloading the data, start a screen session so the download keeps running even if your SSH connection drops (detach with Ctrl-A D, reattach later with screen -r data_download):
$ screen -S data_download
Then change into the mounted volume, so the dataset lands on the 4TB disk rather than the small root disk, and download the data:
$ cd /mnt/data
$ git clone https://huggingface.co/datasets/tiiuae/falcon-refinedweb
It will take approximately 5 hours to download the data. When the download is completed successfully, unmount the disk from the machine using:
$ umount /mnt/data
Shut down the CPU machine and detach the 4TB volume, which now contains the RefinedWeb dataset, just as you attached it earlier.
Training on GPU Machine
Create a GPU node, just like you created the CPU node. The specifications of the GPU machine are as follows:

Now, attach and mount the 4TB volume just as you did for the CPU machine.
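Before downloading the model, it is worth confirming that the GPU node actually sees its GPU. A quick check (assumes the NVIDIA drivers that ship with the GPU image):

```shell
# nvidia-smi prints the driver version, GPU model and memory if the
# drivers are working; its absence points at a driver problem.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi
    GPU_CHECK=done
else
    echo "nvidia-smi not found - check the node's NVIDIA driver installation"
    GPU_CHECK=missing
fi
```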
Download the Falcon-7B model repository using:
$ git clone https://huggingface.co/tiiuae/falcon-7b
Once it is downloaded, make sure the Python libraries the script depends on are installed (the package names below are the usual ones for Falcon; adjust to your environment):
$ pip install torch transformers accelerate einops
Then get started with the model by creating a new Python file called script.py.
$ touch script.py
Edit it using:
$ nano script.py
A text editor will open; type in the following:
from transformers import AutoTokenizer
import transformers
import torch

model = "tiiuae/falcon-7b"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
sequences = pipeline(
    "Girafatron is obsessed with giraffes, the most glorious animal on the "
    "face of this Earth. Giraftron believes all other animals are irrelevant "
    "when compared to the glorious majesty of the giraffe."
    "\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
Save the file with Ctrl-O, exit nano with Ctrl-X, and run the script:
$ python3 script.py
On the first run the model weights will be downloaded, and output like the following will be generated:

We can also feed text from the RefinedWeb dataset into the pipeline as the prompt. Note that the path should point at a text file extracted from the dataset, and that we read its contents before passing them to the pipeline:
from transformers import AutoTokenizer
import transformers
import torch

model = "tiiuae/falcon-7b"
data_path = "path/to/data"  # a text file extracted from the dataset

with open(data_path) as f:
    data = f.read()

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
sequences = pipeline(
    data,
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
And it will generate the result accordingly.
Closing Thoughts
- Compared to Falcon-40B, Falcon-7B is less powerful.
- Training this model is easier and faster.
- You can use your own private data to build an interactive chatbot for a specific domain, with the assurance that there will be no data leakage.