What Is Giskard
Giskard is an open-source framework for testing ML models. It aims to evaluate AI models before deployment, alerting developers to risks such as bias, security vulnerabilities, and harmful or toxic output, including misinformation, inappropriate content, prompt injection, and malicious code generation.
Giskard welcomes emerging regulatory practices and stands out among developer tools for its emphasis on efficient testing methods. Its second offering, the AI Quality Hub, is designed to help debug large language models and compare them with other models; this Quality Hub is a key component of Giskard's premium services.
In this blog, we will demonstrate how to use Giskard to scan a Retrieval Augmented Generation (RAG) pipeline. More specifically, we will assess the open-source LLM Mistral 7B on its ability to answer questions about climate change based on the IPCC's 2023 Climate Change Synthesis Report.
Use Case: QA on the IPCC document
LLM: Mistral 7B
If you are looking for powerful GPUs to accelerate the inference of your models, check out the offerings made by E2E solutions.
Installing Dependencies
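A sketch of the installation step. The exact package list and extras (such as giskard[llm], faiss-cpu, and pypdf) are assumptions based on the libraries used below and may need adjusting to your environment and hardware:

```
pip install "giskard[llm]" langchain faiss-cpu transformers accelerate sentence-transformers pypdf
```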
Creating a RAG Chain with Langchain and FAISS
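Below is a minimal sketch of the RAG chain. The file path, the Mistral checkpoint (mistralai/Mistral-7B-Instruct-v0.1), the embedding model, and the chunking parameters are illustrative assumptions, and the import paths follow the classic langchain package layout, which may differ across LangChain versions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA

# Load and chunk the IPCC report (the path is a placeholder).
loader = PyPDFLoader("ipcc_2023_synthesis_report.pdf")
documents = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(documents)

# Build a FAISS vector store from sentence-transformer embeddings.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = FAISS.from_documents(chunks, embeddings)

# Load Mistral 7B Instruct and expose it as a LangChain LLM.
model_id = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
generate = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    return_full_text=False,
)
llm = HuggingFacePipeline(pipeline=generate)

# Retrieval-augmented QA chain: retrieve the top chunks and stuff them into the prompt.
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=db.as_retriever(search_kwargs={"k": 4}),
    chain_type="stuff",
)

# Quick sanity check against the report.
print(qa_chain.run("What are the main drivers of observed global warming?"))
```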
As we can see, the model has retrieved the relevant content from the document and answered our query.
Implementing Giskard
The first step is to wrap our RAG chain in a Giskard model object. The following lines of code achieve that:
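A sketch of the wrapping step, following Giskard's LLM scan API. The prediction function, the "question" column name, and the model name and description are assumptions about how the chain built above is exposed:

```python
import pandas as pd
import giskard

def model_predict(df: pd.DataFrame) -> list:
    """Run the RAG chain on each entry of the 'question' column."""
    return [qa_chain.run(question) for question in df["question"]]

giskard_model = giskard.Model(
    model=model_predict,
    model_type="text_generation",
    name="Climate Change QA over the IPCC 2023 Synthesis Report",
    description="Answers questions about climate change using passages retrieved "
                "from the IPCC 2023 Climate Change Synthesis Report.",
    feature_names=["question"],
)
```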
Let's check that the wrapped model works as expected by running it on an example set of queries.
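A small check along these lines; the example questions are placeholders:

```python
examples = [
    "How much have global surface temperatures risen since the pre-industrial era?",
    "What does the report say about sea level rise?",
]
giskard_dataset = giskard.Dataset(pd.DataFrame({"question": examples}), target=None)

# The predictions should be sensible answers grounded in the report.
print(giskard_model.predict(giskard_dataset).prediction)
```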
We can now run Giskard's scan to automatically generate a report identifying the model's vulnerabilities. This comprehensive evaluation covers various vulnerability types, including harmfulness, hallucination, and prompt injection.
The scan employs a blend of predefined test examples, heuristic methods, and GPT-4-based generative and evaluative techniques.
To begin, we'll restrict the analysis to the hallucination detectors, since running the full scan can be time-consuming.
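A sketch of the scan call. The GPT-4-based detectors assume an OpenAI API key is configured in the environment, and the `only` argument restricting the scan to hallucination detectors follows Giskard's scan API:

```python
import os

# The generative/evaluative detectors call GPT-4 under the hood,
# so an OpenAI API key must be available (placeholder value below).
os.environ["OPENAI_API_KEY"] = "sk-..."

# Restrict the scan to hallucination-related detectors to keep runtime manageable.
scan_results = giskard.scan(giskard_model, giskard_dataset, only="hallucination")

# Display the report in a notebook, or export it to share with the team.
display(scan_results)
scan_results.to_html("giskard_scan_report.html")
```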
Conclusion
Giskard alerts developers to potential vulnerabilities and misuse of Large Language Models (LLMs) that are augmented with external data. This functionality is important for monitoring how the addition of external data (RAG pipelines) influences the behavior of LLMs, and it helps teams understand the complexities of deploying RAG-enabled LLMs while keeping them effective and compliant with established guidelines.
Reference
https://github.com/Giskard-AI/giskard