Text similarity plays an important role in many natural language processing applications, including search engines, recommendation systems, and chatbots. This article examines two cutting-edge approaches for measuring text similarity: Jina Embeddings and the Llama Model. We will look at their fundamental mechanisms and walk through a practical implementation using the Hugging Face Transformers library. Let's proceed with our investigation.
Requirements for Initiating a GPU Node on E2E Cloud
Account and Access
- E2E Cloud Account: An active E2E Cloud account is a necessity to access the platform and initiate your GPU node. If you haven't created an account yet, the process is straightforward and can be completed through the website.
- Billing Information: Ensure that your billing information is current and contains sufficient funds to cover the expenses associated with launching and operating your GPU node.
Technical Requirements
- Operating System: Choose the operating system that aligns with your preferences for the GPU node. E2E Cloud provides a range of Linux distributions and Windows Server versions to cater to diverse needs. Consider compatibility with your software and tools when making your selection.
- Software Dependencies: Check if your application or workflow requires specific software libraries or dependencies pre-installed on the node. If so, compile a list of these requirements to specify during the configuration of the node.
- Network Connectivity: Confirm that your local internet connection can accommodate the bandwidth demands of running applications on a remote GPU node. E2E Cloud offers various network bandwidth options, allowing you to choose the one best suited for your expected data transfer and processing requirements.
Knowledge and Preparation
- Basic Cloud Computing Understanding: Acquaint yourself with fundamental cloud computing concepts, including virtual machines, instances, and resource allocation. This familiarity will facilitate your interaction with the E2E Cloud platform.
- Security Credentials: Have your SSH key or preferred security credentials ready for accessing your launched GPU node remotely.
- Application and Script Preparation: If you intend to run specific applications or scripts on the node, ensure they are prepared and compatible with the chosen operating system and GPU environment.
By fulfilling these prerequisites, you can confidently embark on launching your GPU node on E2E Cloud, unlocking the remarkable potential of accelerated computing for your projects. Remember, meticulous planning and preparation form the bedrock of a successful and fruitful cloud computing experience.
Jina Embeddings
Within this integration, we utilize Jina Embeddings, a text embedding model that works seamlessly with the Hugging Face Transformers library. Jina Embeddings is built on JinaBERT, a specialized BERT-based architecture tailored to English text with a maximum sequence length of 8192 tokens. The model is pre-trained on the C4 dataset and then fine-tuned on a curated set of over 400 million sentence pairs and hard negatives from diverse domains. This thorough training regimen ensures that the embeddings capture intricate semantic relationships, making them valuable for applications that demand a deep understanding of text.
Importing Libraries and Defining Cosine Similarity Function
In this section, the code includes the essential libraries. The use of AutoModel from the transformers library facilitates the loading of a pre-trained transformer model. The cos_sim function is employed to calculate cosine similarity between two vectors, utilizing the dot product and normalization.
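A minimal sketch of this step, assuming a NumPy-based helper as described above:

```python
from numpy.linalg import norm
from transformers import AutoModel

# Cosine similarity: dot product of the two vectors divided by the product of their norms.
cos_sim = lambda a, b: (a @ b.T) / (norm(a) * norm(b))
```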
Loading the Pre-Trained Transformer Model
This line of code loads a pre-trained transformer model named "jinaai/jina-embeddings-v2-base-en". The parameter trust_remote_code=True is required because the model ships custom modeling code on the Hugging Face Hub; setting it tells Transformers that you trust that remote code and allow it to be executed.
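A sketch of the loading step, following the standard Transformers API:

```python
model = AutoModel.from_pretrained(
    "jinaai/jina-embeddings-v2-base-en",
    trust_remote_code=True,  # allow the custom JinaBERT modeling code from the Hub repo to run
)
```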
Generating Embeddings for Sentences
The encode method of the model accepts a list of sentences and produces their respective embeddings. In this context, embeddings for two sample sentences are calculated.
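For example, with two placeholder sentences (the exact sentences used in the original snippet are not shown, so these are illustrative):

```python
sentences = [
    "How is the weather today?",
    "What is the current weather like today?",
]
embeddings = model.encode(sentences)  # one embedding vector per sentence
```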
Calculating Cosine Similarity
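As a quick check, the two embeddings generated above can be compared directly with the cos_sim helper; a minimal sketch:

```python
# Similarity between the first and second sentence embeddings.
print(cos_sim(embeddings[0], embeddings[1]))
```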
Defining compute_similarity Function
This function receives two sentences as input, generates their embeddings using the loaded model, and subsequently determines their cosine similarity using the cos_sim function. The outcome is then returned as the similarity score between the input sentences.
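A sketch of such a function, reusing the model and cos_sim helper from above:

```python
def compute_similarity(sentence1, sentence2):
    """Embed both sentences with the Jina model and return their cosine similarity."""
    emb1, emb2 = model.encode([sentence1, sentence2])
    return float(cos_sim(emb1, emb2))
```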
Example Usages of compute_similarity Function
These lines exemplify the application of the compute_similarity function with various pairs of sentences. The obtained similarity scores serve as indicators of the semantic similarity between the corresponding sentence pairs.
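The original snippet's exact pairs are not shown; the first pair below is the one discussed later in this article, while the second is purely illustrative:

```python
# Scores closer to 1 indicate higher semantic similarity.
print(compute_similarity("This is me", "A 2nd sentence"))
print(compute_similarity("How is the weather today?", "It is sunny and warm outside."))
```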
Result
To summarize, this code snippet illustrates the process of loading a pre-trained transformer model, producing sentence embeddings, computing cosine similarity, and encapsulating these steps into a reusable function for comparing the semantic similarity of arbitrary sentences.
Llama 2
The Llama Model, accessible via the Hugging Face Transformers library, provides cutting-edge generative text capabilities. Created by Meta, this model is available in multiple sizes, spanning from 7 billion to 70 billion parameters, thereby facilitating a diverse range of applications in natural language processing. A specialized version, Llama 2-Chat, fine-tuned for dialogue scenarios, surpasses numerous open-source chat models and demonstrates competitive performance against well-known closed-source models.
Importing Libraries and Loading Pre-Trained Llama Model
Within this code snippet, the necessary libraries are imported, and a pre-trained Llama model along with its associated tokenizer are loaded. The variable model_base_name is used to specify the name of the pre-trained model.
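A minimal sketch of this step. The exact checkpoint is an assumption here (the gated meta-llama/Llama-2-7b-hf checkpoint is used for illustration), and a causal-LM head is loaded because the later steps work with logits:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_base_name = "meta-llama/Llama-2-7b-hf"  # placeholder; substitute the Llama 2 checkpoint you have access to

tokenizer = AutoTokenizer.from_pretrained(model_base_name)
model = AutoModelForCausalLM.from_pretrained(model_base_name)
model.eval()
```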
Checking Vocabulary Size and Maximum Sequence Length
The provided code outputs the vocabulary size and the maximum sequence length permitted by the loaded model. Gaining insights into these values is essential for tokenization and processing the input data effectively.
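For example, using standard tokenizer and config attributes:

```python
print("Vocabulary size:", tokenizer.vocab_size)
print("Maximum sequence length:", model.config.max_position_embeddings)
```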
Modifying Tokenizer for Padding and Special Tokens
To manage variable-length sequences, the code includes a padding token in the tokenizer. Special tokens such as [PAD] play a crucial role in ensuring the proper functioning of the model during the tokenization process.
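A sketch of that adjustment, using the standard add_special_tokens API:

```python
# Llama tokenizers do not ship with a padding token, so one is added explicitly.
# Note: the new [PAD] id lies just past the model's original vocabulary, which is
# why token IDs are clamped in a later step.
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
```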
Tokenizing and Preprocessing Input Sentences
The Llama tokenizer is employed to tokenize the input sentences. The ensuing input_ids undergo further processing: padding is incorporated, sequences exceeding the specified max_seq_length are truncated, and token IDs are clamped to guarantee they fall within the vocabulary range of the model.
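A sketch of this preprocessing. The sentence pair comes from the comparison later in the article, while max_seq_length is an assumed illustrative value:

```python
sentences = ["This is me", "A 2nd sentence"]
max_seq_length = 32  # assumed value; anything up to the model's maximum works

encoded = tokenizer(
    sentences,
    padding="max_length",
    truncation=True,
    max_length=max_seq_length,
    return_tensors="pt",
)

# Clamp token IDs into the model's vocabulary range, since the added [PAD] id
# lies outside the original embedding table.
input_ids = encoded["input_ids"].clamp(0, model.config.vocab_size - 1)
attention_mask = encoded["attention_mask"]
```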
Obtaining Model Outputs (Logits) and Extracting Embeddings
The tokenized input IDs are fed through the Llama model, producing outputs in the form of logits. From these logits, embeddings for the [CLS] tokens are extracted. The [CLS] token conventionally encapsulates a condensed representation of the entire input sequence.
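A sketch of this step, building on the tensors prepared above; the logits at the first position are treated as the sentence-level ([CLS]-style) embedding, following the description above:

```python
with torch.no_grad():
    outputs = model(input_ids=input_ids, attention_mask=attention_mask)

logits = outputs.logits           # shape: (batch_size, seq_len, vocab_size)
cls_embeddings = logits[:, 0, :]  # first-token logits used as sentence embeddings
```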
Computing Cosine Similarity
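A minimal sketch of this computation, using the cls_embeddings tensor produced in the previous step:

```python
import torch.nn.functional as F

# Cosine similarity between the two sentence-level embeddings.
similarity = F.cosine_similarity(cls_embeddings[0], cls_embeddings[1], dim=0)
print("Cosine similarity:", similarity.item())
```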
By leveraging PyTorch's torch.nn.functional.cosine_similarity, the code calculates the cosine similarity between the [CLS] embeddings of the two input sentences. The outcome serves as an indicator of the semantic similarity between the sentences, where a value close to 1 signifies high similarity.
Result
The resulting output presents the cosine similarity score for the given input sentences, showcasing their semantic relatedness. This code snippet illustrates the procedure of extracting embeddings from a pre-trained Llama model and assessing sentence similarity through cosine similarity computation.
Unpacking the Cosine Similarity Discrepancy
The Notable Contrast in Cosine Similarity Scores
The significant difference in cosine similarity scores, 0.7132 for Jina versus 0.9999 for Llama 2, when evaluating the sentences "This is me" and "A 2nd sentence," prompts a closer examination. A single data point is too little to draw definitive conclusions from, but it is worth investigating the potential reasons for this divergence.
Potential Explanations
Model Focus
- Jina: Primarily focuses on capturing nuanced semantic relationships between words and phrases, potentially penalizing the absence of shared vocabulary and semantic connections between the two sentences.
- Llama2: A more expansive language model adept at handling intricate language tasks, potentially prioritizing the inherent self-referential nature of "This is me" and overlooking the lack of direct semantic overlap with "A 2nd sentence."
Training Data
- Jina: Trained on extensive text corpora specifically emphasizing semantic relationships and contextual understanding, making it more attuned to subtle semantic differences.
- Llama2: Trained on a diverse dataset covering various text formats, potentially prone to generalizing from simple self-referential statements, resulting in higher similarity scores even with limited overlap.
Conclusion
In the ever-evolving realm of natural language processing, the fusion of cutting-edge models like Jina Embeddings and the Llama Model with the user-friendly and versatile Hugging Face Transformers opens up avenues for groundbreaking applications. Jina Embeddings, rooted in the robust BERT architecture and extended with a bidirectional variant of ALiBi, gives developers an opportunity to explore the intricacies of textual semantics. With its support for long sequence lengths and its carefully curated training data, it becomes a potent tool for tasks such as long document retrieval and semantic textual similarity. The seamless integration with Hugging Face Transformers ensures accessibility, enabling developers to effortlessly leverage the capabilities of this sophisticated model.
On another front, the Llama Model family, particularly Llama 2, showcases the capabilities of generative language models. Trained on extensive corpora and optimized for a variety of dialogue applications, Llama 2 models empower developers to create intelligent virtual assistants, customer support bots, and interactive dialogue systems. Their integration with Hugging Face Transformers simplifies the tokenization process, allowing developers to concentrate on crafting engaging conversations without wrestling with intricate model internals.