A Comparative Study of Jina Embeddings vs. Llama Model for Computing Textual Semantic Similarity

Text similarity matters in many natural language processing applications, including search engines, recommendation systems, and chatbots. This article examines two approaches to measuring text similarity: Jina Embeddings and the Llama model. It covers how each one works and how to implement each with the Hugging Face Transformers library. Let's get started.

Requirements for Initiating a GPU Node on E2E Cloud

Account and Access

E2E Cloud Account: An active E2E Cloud account is a necessity to access the platform and initiate your GPU node. If you haven't created an account yet, the process is straightforward and can be completed through the website.
Billing Information: Ensure that your billing information is current and contains sufficient funds to cover the expenses associated with launching and operating your GPU node.

Technical Requirements

Operating System: Choose the operating system that aligns with your preferences for the GPU node. E2E Cloud provides a range of Linux distributions and Windows Server versions to cater to diverse needs. Consider compatibility with your software and tools when making your selection.
Software Dependencies: Check if your application or workflow requires specific software libraries or dependencies pre-installed on the node. If so, compile a list of these requirements to specify during the configuration of the node.
Network Connectivity: Confirm that your local internet connection can accommodate the bandwidth demands of running applications on a remote GPU node. E2E Cloud offers various network bandwidth options, allowing you to choose the one best suited for your expected data transfer and processing requirements.

Knowledge and Preparation

Basic Cloud Computing Understanding: Acquaint yourself with fundamental cloud computing concepts, including virtual machines, instances, and resource allocation. This familiarity will facilitate your interaction with the E2E Cloud platform.
Security Credentials: Have your SSH key or preferred security credentials ready for accessing your launched GPU node remotely.
Application and Script Preparation: If you intend to run specific applications or scripts on the node, ensure they are prepared and compatible with the chosen operating system and GPU environment.

With these prerequisites in place, you can launch your GPU node on E2E Cloud and use accelerated computing for your projects. Careful planning and preparation up front make the rest of the setup go more smoothly.

Jina Embeddings

This section uses Jina Embeddings, a text embedding model that loads through the Hugging Face Transformers library. Its backbone, known as JinaBERT, is based on the BERT architecture and is tailored to English text with a maximum sequence length of 8192 tokens. The model is pre-trained on the C4 dataset and then fine-tuned on a curated set of more than 400 million sentence pairs and hard negatives from diverse domains. This training is aimed at capturing semantic relationships between texts, which makes the embeddings well suited to applications that depend on sentence-level meaning.

Importing Libraries and Defining Cosine Similarity Function

from transformers import AutoModel 
from numpy.linalg import norm

# Defining a cosine similarity function using lambda expression

cos_sim = lambda a, b: (a @ b.T) / (norm(a) * norm(b))

In this section, the code imports the required libraries. AutoModel from the transformers library loads a pre-trained transformer model. The cos_sim function computes the cosine similarity between two vectors: their dot product divided by the product of their norms.
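To make the behaviour of cosine similarity concrete, here is a minimal sketch on hand-picked 2-D vectors. The vectors and variable names are illustrative only and are not taken from any model; the function is written with def rather than a lambda purely for readability.

```python
import numpy as np
from numpy.linalg import norm

def cos_sim(a, b):
    """Cosine similarity of two 1-D vectors: dot product over the product of L2 norms."""
    return float(a @ b) / float(norm(a) * norm(b))

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])
w = np.array([2.0, 0.0])
x = np.array([1.0, 1.0])

same_direction = cos_sim(u, w)  # parallel vectors: vector length is ignored
orthogonal = cos_sim(u, v)      # perpendicular vectors: no shared direction
diagonal = cos_sim(u, x)        # vectors 45 degrees apart
print(same_direction, orthogonal, diagonal)
```

The score depends only on the angle between the vectors, which is why embeddings of different magnitudes can still be compared directly.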

Loading the Pre-Trained Transformer Model

model = AutoModel.from_pretrained("jinaai/jina-embeddings-v2-base-en", trust_remote_code=True)

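This line of code loads the pre-trained model "jinaai/jina-embeddings-v2-base-en" from the Hugging Face Hub. The parameter trust_remote_code=True allows Transformers to download and execute the custom modeling code shipped in the model repository, which this model requires. It does not verify that code, so enable it only for repositories you trust.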

Generating Embeddings for Sentences

embeddings = model.encode(["This is me", "A 2nd sentence"])

The encode method of the model accepts a list of sentences and returns one embedding vector per sentence. Here, embeddings are computed for two sample sentences.

Calculating Cosine Similarity

cosine_similarity_score = cos_sim(embeddings[0], embeddings[1])
print(cosine_similarity_score)

Defining compute_similarity Function

def compute_similarity(sentence1, sentence2):
    embeddings = model.encode([sentence1, sentence2])
    result = cos_sim(embeddings[0], embeddings[1])
    return result

This function receives two sentences as input, generates their embeddings using the loaded model, and subsequently determines their cosine similarity using the cos_sim function. The outcome is then returned as the similarity score between the input sentences.

Example Usages of compute_similarity Function

similarity1 = compute_similarity("I love cricket.", "I like football.")
similarity2 = compute_similarity("I like basketball.", "I like basketball.")
similarity3 = compute_similarity("I like football.", "I don't like football.")

These lines exemplify the application of the compute_similarity function with various pairs of sentences. The obtained similarity scores serve as indicators of the semantic similarity between the corresponding sentence pairs.
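When many sentences need to be compared, calling a two-sentence function repeatedly re-encodes the same text. A possible alternative, sketched here under the assumption that all embeddings are already stacked in one array with one row per sentence, is to normalize the rows once and obtain every pairwise score with a single matrix product. The function name pairwise_cosine and the toy vectors are hypothetical; real rows would come from model.encode.

```python
import numpy as np

def pairwise_cosine(embeddings):
    """Return the matrix of cosine similarities between all rows of `embeddings`."""
    embeddings = np.asarray(embeddings, dtype=float)
    # Scale each row to unit length so a plain dot product equals cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return unit @ unit.T

# Hypothetical stand-in vectors; in the article's setup these would be the
# rows returned by model.encode([...]), one row per sentence.
toy_embeddings = np.array([
    [1.0, 2.0, 0.0],
    [2.0, 4.0, 0.0],  # same direction as row 0
    [0.0, 0.0, 3.0],  # orthogonal to rows 0 and 1
])
scores = pairwise_cosine(toy_embeddings)
print(np.round(scores, 3))
```

Entry (i, j) of the result is the similarity between sentence i and sentence j, so the diagonal is always 1 and the matrix is symmetric.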

Result

from transformers import AutoModel
from numpy.linalg import norm

# Define cosine similarity function

cos_sim = lambda a, b: (a @ b.T) / (norm(a) * norm(b))

# Load Jina Embeddings model from Hugging Face Transformers

model = AutoModel.from_pretrained("jinaai/jina-embeddings-v2-base-en", trust_remote_code=True)

# Encode sentences and compute embeddings

embeddings = model.encode(["This is me", "A 2nd sentence"])

# Calculate cosine similarity between the embeddings

similarity = cos_sim(embeddings[0], embeddings[1])
print("Cosine Similarity:", similarity)

Output

Cosine Similarity: 0.7132004

To summarize, this code snippet illustrates the process of loading a pre-trained transformer model, producing sentence embeddings, computing cosine similarity, and encapsulating these steps into a reusable function for comparing the semantic similarity of arbitrary sentences.

Llama 2

The Llama Model, accessible via the Hugging Face Transformers library, provides cutting-edge generative text capabilities. Created by Meta, this model is available in multiple sizes, spanning from 7 billion to 70 billion parameters, thereby facilitating a diverse range of applications in natural language processing. A specialized version, Llama 2-Chat, fine-tuned for dialogue scenarios, surpasses numerous open-source chat models and demonstrates competitive performance against well-known closed-source models.

Importing Libraries and Loading Pre-Trained Llama Model

from transformers import LlamaTokenizer, LlamaForCausalLM
import torch

# Load the pre-trained model and tokenizer

model_base_name = "meta-llama/Llama-2-7b-hf"
model = LlamaForCausalLM.from_pretrained(model_base_name)
tokenizer = LlamaTokenizer.from_pretrained(model_base_name)

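This code snippet imports the necessary libraries and loads a pre-trained Llama model together with its tokenizer. The variable model_base_name specifies the name of the pre-trained model. Note that the meta-llama repositories on Hugging Face are gated, so access must be requested and an authenticated login is needed before the download works.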

Checking Vocabulary Size and Maximum Sequence Length

vocab_size = tokenizer.vocab_size
max_seq_length = model.config.max_position_embeddings
print("Vocabulary Size:", vocab_size)
print("Max Sequence Length:", max_seq_length)

The provided code outputs the vocabulary size and the maximum sequence length permitted by the loaded model. Gaining insights into these values is essential for tokenization and processing the input data effectively.

Modifying Tokenizer for Padding and Special Tokens

tokenizer.add_special_tokens({'pad_token': '[PAD]'})

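The Llama 2 tokenizer does not define a padding token by default, so batching sentences of different lengths requires adding one. The code registers [PAD] as the padding token. Because this adds a new entry to the tokenizer without enlarging the model's embedding table, the new token's ID lies outside the range the model accepts. The usual remedy is to call model.resize_token_embeddings(len(tokenizer)); this walkthrough clamps the IDs in the next step instead.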

Tokenizing and Preprocessing Input Sentences

sentences = ["This is me", "A 2nd sentence"]
input_ids = tokenizer(sentences, return_tensors='pt', padding=True, truncation=True, max_length=max_seq_length)['input_ids']
input_ids = input_ids.clamp(max=vocab_size - 1)

The Llama tokenizer tokenizes the input sentences. Shorter sequences are padded, sequences longer than max_seq_length are truncated, and token IDs are clamped so that they fall within the model's vocabulary range. Two side effects are worth noting: clamping maps the newly added [PAD] ID onto the last regular vocabulary ID, and only input_ids is kept, so the attention mask returned by the tokenizer is not passed to the model.
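The interaction between padding, clamping, and the attention mask can be illustrated without loading a model. The sketch below uses made-up token IDs and assumes a 32000-entry vocabulary in which a newly added [PAD] token receives ID 32000; pad_batch is a hypothetical helper, not part of the Transformers API.

```python
PAD_ID = 32000      # assumed ID given to a newly added [PAD] token
VOCAB_SIZE = 32000  # valid embedding rows are 0 .. VOCAB_SIZE - 1

def pad_batch(sequences, pad_id):
    """Right-pad token ID lists to equal length and build the matching attention mask."""
    longest = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (longest - len(seq)) for seq in sequences]
    attention_mask = [[1] * len(seq) + [0] * (longest - len(seq)) for seq in sequences]
    return input_ids, attention_mask

# Made-up token IDs for two sentences of different length (1 stands for the BOS token).
batch = [[1, 910, 338, 592], [1, 319, 29871, 29906, 299, 10541]]
input_ids, attention_mask = pad_batch(batch, PAD_ID)

# Clamping keeps every ID inside the embedding table, but it turns the pad ID
# into the ID of a real token (VOCAB_SIZE - 1), so only the mask still marks padding.
clamped = [[min(tok, VOCAB_SIZE - 1) for tok in row] for row in input_ids]
print(clamped[0], attention_mask[0])
```

After clamping, the padded positions are indistinguishable from a real token unless the attention mask is kept and passed along with the IDs.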

Obtaining Model Outputs (Logits) and Extracting Embeddings

with torch.no_grad():
    outputs = model(input_ids)

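# outputs.logits holds per-position vocabulary scores, shape (batch, seq_len, vocab_size);
# the variable name hidden_states is kept from the original walkthrough

hidden_states = outputs.logits

# Take the vector at position 0 of each sequence
# (Llama has no [CLS] token; position 0 holds the beginning-of-sequence token)

cls_embeddings = hidden_states[:, 0, :]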

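The tokenized input IDs are passed through the Llama model, which returns logits: one score per vocabulary entry at each position, rather than hidden states. The code then takes the vector at position 0 of each sequence. In BERT-style encoders, that position holds a [CLS] token that conventionally summarizes the entire input sequence. Llama has no [CLS] token, so position 0 holds the beginning-of-sequence token instead, and because Llama attends causally, that position cannot see the rest of the sentence.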
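A common alternative to reading a single position is to average the per-token vectors while ignoring padding. The sketch below shows that pooling step on hypothetical stand-in arrays; masked_mean_pool is an illustrative helper, not the article's method. With Transformers, the per-token vectors would typically be taken from outputs.hidden_states[-1] after calling the model with output_hidden_states=True and the tokenizer's attention_mask.

```python
import numpy as np

def masked_mean_pool(hidden_states, attention_mask):
    """Average token vectors per sequence, ignoring positions where the mask is 0."""
    hidden_states = np.asarray(hidden_states, dtype=float)      # (batch, seq, dim)
    mask = np.asarray(attention_mask, dtype=float)[:, :, None]  # (batch, seq, 1)
    summed = (hidden_states * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1.0, None)  # guard against all-padding rows
    return summed / counts

# Hypothetical hidden states: 2 sequences, 3 positions, 2 dimensions.
toy_hidden = [
    [[1.0, 1.0], [3.0, 5.0], [100.0, 100.0]],  # last position is padding
    [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]],
]
toy_mask = [[1, 1, 0], [1, 1, 1]]
pooled = masked_mean_pool(toy_hidden, toy_mask)
print(pooled)
```

The large values at the padded position of the first sequence do not affect its pooled vector, which is the point of applying the mask before averaging.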

Computing Cosine Similarity

import torch.nn.functional as F
similarity = F.cosine_similarity(cls_embeddings[0].unsqueeze(0), cls_embeddings[1].unsqueeze(0))
print("Cosine Similarity:", similarity.item())

Result

from transformers import LlamaTokenizer, LlamaForCausalLM
import torch

# Load Llama Model and tokenizer from Hugging Face Transformers

model_base_name = "meta-llama/Llama-2-7b-hf"
model = LlamaForCausalLM.from_pretrained(model_base_name)
tokenizer = LlamaTokenizer.from_pretrained(model_base_name)

# Specify input sentences

sentences = ["This is me", "A 2nd sentence"]

# Tokenize the input sentences with padding and truncation

input_ids = tokenizer(sentences, return_tensors='pt', padding=True, truncation=True, max_length=4096)['input_ids']

# Ensure token IDs are within the vocabulary range

input_ids = input_ids.clamp(max=tokenizer.vocab_size - 1)

# Get model outputs (logits)

with torch.no_grad():
    outputs = model(input_ids)

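# outputs.logits holds per-position vocabulary scores, shape (batch, seq_len, vocab_size);
# the variable name hidden_states is kept from the original walkthrough

hidden_states = outputs.logits

# Take the vector at position 0 of each sequence
# (Llama has no [CLS] token; position 0 holds the beginning-of-sequence token)

cls_embeddings = hidden_states[:, 0, :]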

# Compute cosine similarity using torch.nn.functional.cosine_similarity

similarity = torch.nn.functional.cosine_similarity(cls_embeddings[0].unsqueeze(0), cls_embeddings[1].unsqueeze(0))
print("Cosine Similarity:", similarity.item())

Output

Loading checkpoint shards: 100%|██████████| 2/2 [01:55<00:00, 57.98s/it]
Vocabulary Size: 32000
Max Sequence Length: 4096
Cosine Similarity: 0.9999995419367911
Process finished with exit code 0

Using PyTorch's torch.nn.functional.cosine_similarity, the code computes the cosine similarity between the position-0 vectors of the two input sentences (named cls_embeddings in the code, although Llama has no [CLS] token). Cosine similarity ranges from -1 to 1, and a value close to 1 indicates that the two vectors point in nearly the same direction.

The resulting output presents the cosine similarity score for the given input sentences, showcasing their semantic relatedness. This code snippet illustrates the procedure of extracting embeddings from a pre-trained Llama model and assessing sentence similarity through cosine similarity computation.

Unpacking the Cosine Similarity Discrepancy

The Notable Contrast in Cosine Similarity Scores

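The cosine similarity scores differ sharply: 0.7132 for Jina and 0.9999995 for Llama 2 on the same pair of sentences, "This is me" and "A 2nd sentence." A single data point cannot support definitive conclusions, but the gap is large enough to warrant a closer look at possible causes. One caveat applies before comparing the models themselves: the Llama score is computed from the logits at position 0. Assuming the tokenizer prepends the same beginning-of-sequence token to both sentences and pads on the right, a causal model has seen only that token at position 0, so a value near 1 is expected for almost any sentence pair under this setup.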

Potential Explanations

Model Focus

Jina: Built specifically to produce sentence embeddings that capture semantic relationships between words and phrases, so it potentially penalizes the absence of shared vocabulary and semantic connections between the two sentences.
Llama 2: A general-purpose generative language model whose raw outputs are not optimized to serve as sentence embeddings, so vectors read directly from it may overlook the lack of semantic overlap between "This is me" and "A 2nd sentence."

Training Data

Jina: Fine-tuned on a large set of sentence pairs and hard negatives with an emphasis on semantic relationships, potentially making it more attuned to subtle semantic differences.
Llama 2: Trained as a next-token predictor on a diverse dataset covering various text formats, which may yield higher similarity scores even when two sentences have little in common.
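A more mechanical factor may also contribute to the near-1 score. In a causally masked model, the output at the first position depends only on the first token. The toy sketch below is a single attention layer in NumPy with random stand-in vectors and no positional encoding, not the Llama architecture, and it assumes both sentences start with the same beginning-of-sequence vector. It shows that two different sequences then produce identical outputs at position 0.

```python
import numpy as np

def causal_self_attention(x):
    """Single-head self-attention where position i can only attend to positions <= i."""
    seq_len, dim = x.shape
    scores = (x @ x.T) / np.sqrt(dim)
    # Mask out future positions (strict upper triangle) before the softmax.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
bos = rng.normal(size=(1, 8))                       # shared first-token vector
sent_a = np.vstack([bos, rng.normal(size=(3, 8))])  # stand-in for sentence A
sent_b = np.vstack([bos, rng.normal(size=(5, 8))])  # stand-in for sentence B

out_a = causal_self_attention(sent_a)
out_b = causal_self_attention(sent_b)

# Compare the outputs at position 0, the position the Llama code reads.
first_a, first_b = out_a[0], out_b[0]
cosine_at_first = first_a @ first_b / (np.linalg.norm(first_a) * np.linalg.norm(first_b))
print(cosine_at_first)
```

Later positions do differ between the two sequences, which is why pooling over all tokens or reading the last non-padding position is the more usual way to derive a sentence vector from a decoder-only model.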

Conclusion

Combining models such as Jina Embeddings and Llama with the Hugging Face Transformers library opens up a wide range of natural language processing applications. Jina Embeddings, built on the BERT architecture and extended with the ALiBi variant, gives developers a way to work directly with textual semantics. Its support for long sequences and its curated training data make it a strong tool for tasks such as long document retrieval and semantic textual similarity. Because it loads through Hugging Face Transformers, developers can use the model with only a few lines of code.

On another front, the Llama model family, particularly Llama 2, showcases the capabilities of generative language models. Trained on extensive corpora and optimized for a variety of dialogue applications, Llama 2 models let developers build intelligent virtual assistants, customer support bots, and interactive dialogue systems. Its integration with Hugging Face Transformers simplifies tokenization, allowing developers to concentrate on crafting engaging conversations rather than on low-level model interactions.
