What Is Giskard
Giskard is an open-source framework for testing ML models. It aims to evaluate AI models before deployment, alerting developers to risks such as bias, security vulnerabilities, and harmful or toxic output, including misinformation, inappropriate content, prompt injection, and malicious code generation.
Giskard welcomes emerging regulatory practices and stands out among developer tools for its emphasis on efficient testing methods. Its second offering, the AI Quality Hub, is designed to help debug large language models and compare them with other models; this Quality Hub is a key component of Giskard's premium services.
In this blog, we will demonstrate how to use Giskard to scan a Retrieval Augmented Generation (RAG) pipeline. More specifically, we will assess the open-source LLM Mistral 7B on its ability to answer questions about climate change based on the IPCC's 2023 Climate Change Synthesis Report.
Use Case: QA on the IPCC document
LLM: Mistral 7B
If you are looking for powerful GPUs to accelerate the inference of your models, check out the offerings made by E2E solutions.
Installing Dependencies
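A sketch of the installation step. The exact package list and extras (such as giskard[llm], faiss-cpu, and pypdf) are assumptions based on the libraries used below and may need adjusting to your environment and hardware:

```
pip install "giskard[llm]" langchain faiss-cpu transformers accelerate sentence-transformers pypdf
```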
Creating a RAG Chain with Langchain and FAISS
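Below is a minimal sketch of the RAG chain. The file path, the Mistral checkpoint (mistralai/Mistral-7B-Instruct-v0.1), the embedding model, and the chunking parameters are illustrative assumptions, and the import paths follow the classic langchain package layout, which may differ across LangChain versions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA

# Load and chunk the IPCC report (the path is a placeholder).
loader = PyPDFLoader("ipcc_2023_synthesis_report.pdf")
documents = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(documents)

# Build a FAISS vector store from sentence-transformer embeddings.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = FAISS.from_documents(chunks, embeddings)

# Load Mistral 7B Instruct and expose it as a LangChain LLM.
model_id = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
generate = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    return_full_text=False,
)
llm = HuggingFacePipeline(pipeline=generate)

# Retrieval-augmented QA chain: retrieve the top chunks and stuff them into the prompt.
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=db.as_retriever(search_kwargs={"k": 4}),
    chain_type="stuff",
)

# Quick sanity check against the report.
print(qa_chain.run("What are the main drivers of observed global warming?"))
```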
As we can see, the model has retrieved the relevant content from the document and answered our query.
Implementing Giskard
The first step is to wrap our RAG chain in a Giskard model object. The following lines of code achieve that:
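A sketch of the wrapping step, following Giskard's LLM scan API. The prediction function, the "question" column name, and the model name and description are assumptions about how the chain built above is exposed:

```python
import pandas as pd
import giskard

def model_predict(df: pd.DataFrame) -> list:
    """Run the RAG chain on each entry of the 'question' column."""
    return [qa_chain.run(question) for question in df["question"]]

giskard_model = giskard.Model(
    model=model_predict,
    model_type="text_generation",
    name="Climate Change QA over the IPCC 2023 Synthesis Report",
    description="Answers questions about climate change using passages retrieved "
                "from the IPCC 2023 Climate Change Synthesis Report.",
    feature_names=["question"],
)
```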
Let's check that the wrapped model works as expected by running it on an example set of queries.
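A small check along these lines; the example questions are placeholders:

```python
examples = [
    "How much have global surface temperatures risen since the pre-industrial era?",
    "What does the report say about sea level rise?",
]
giskard_dataset = giskard.Dataset(pd.DataFrame({"question": examples}), target=None)

# The predictions should be sensible answers grounded in the report.
print(giskard_model.predict(giskard_dataset).prediction)
```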
We can now run Giskard's scan to automatically generate a report identifying the model's vulnerabilities. This comprehensive evaluation covers various vulnerability types, including harmfulness, hallucination, and prompt injection.
The scan employs a blend of predefined test examples, heuristic methods, and GPT-4-based generative and evaluative techniques.
To begin, we'll restrict the analysis to the hallucination detectors, since running the full scan can be time-consuming.
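A sketch of the scan call. The GPT-4-based detectors assume an OpenAI API key is configured in the environment, and the `only` argument restricting the scan to hallucination detectors follows Giskard's scan API:

```python
import os

# The generative/evaluative detectors call GPT-4 under the hood,
# so an OpenAI API key must be available (placeholder value below).
os.environ["OPENAI_API_KEY"] = "sk-..."

# Restrict the scan to hallucination-related detectors to keep runtime manageable.
scan_results = giskard.scan(giskard_model, giskard_dataset, only="hallucination")

# Display the report in a notebook, or export it to share with the team.
display(scan_results)
scan_results.to_html("giskard_scan_report.html")
```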
Conclusion
Giskard alerts developers to potential vulnerabilities and misuse of Large Language Models (LLMs) that are augmented with external data. This functionality is important for monitoring how the addition of external data (RAG pipelines) influences the behavior of LLMs, and it helps teams understand the complexities of deploying RAG-enabled LLMs while keeping them effective and compliant with established guidelines.
Reference
https://github.com/Giskard-AI/giskard