Step-by-Step Guide to Building RAG Using Knowledge Graph and LangChain

August 29, 2024

Knowledge Graphs (KGs) are structured representations of knowledge that organize information in the form of queryable graphs. In a knowledge graph, entities such as people, places, things, and concepts are represented as nodes, while relationships between these entities are depicted as edges. Knowledge graphs are particularly valuable for reasoning over complex data.

Because they make entities and their relationships explicit and queryable, knowledge graphs have become crucial in the world of AI, especially for building systems that need to model complex relationships within data. They are particularly beneficial in Retrieval-Augmented Generation (RAG) applications, where, instead of relying solely on vector databases, knowledge graphs are used to index and reason over documents, creating a richer context for large language models (LLMs).

In this step-by-step guide, we'll explore how to create a RAG application using LangChain, integrating knowledge graphs to enhance data retrieval and generation capabilities.

Understanding Knowledge Graphs

Knowledge graphs are structured representations of information that organize data into entities and their relationships, forming a network of interconnected knowledge. This allows for a more natural understanding of how different pieces of information relate to each other, similar to how humans connect concepts.

These graphs are widely used in applications such as search engines, recommendation systems, and natural language processing, as their structured approach to modeling information enhances the accuracy and relevance of results.

Key Components of a Knowledge Graph

  1. Entities (Nodes): These are the objects or concepts in a knowledge graph, such as "Albert Einstein," "Physics," or "Theory of Relativity."
  2. Relationships (Edges): These connect the entities and define how they are related. For example, an edge could represent the relationship "invented by" between "Theory of Relativity" and "Albert Einstein."
  3. Attributes: These are properties or characteristics of entities. For instance, the entity "Albert Einstein" might have attributes like "date of birth" and "occupation."
  4. Ontology: This is a schema that defines the types of entities and relationships in the graph, ensuring consistency in how knowledge is represented.

Knowledge graphs are typically stored in graph databases such as Neo4j and queried using the Cypher query language. A key strength of Cypher is that it is highly readable and explainable, and can be easily understood by both machines and humans. The resulting graphs can also be visualized easily, making knowledge graphs useful for building explainable AI systems.

The best way to see this is through an example Cypher query around Albert Einstein. 

Below, we will first create ‘nodes’ in a knowledge graph using Cypher queries: 


// Create nodes for Albert Einstein and related entities
CREATE (einstein:Person {name: "Albert Einstein", birthDate: "1879-03-14", placeOfBirth: "Ulm, Germany", occupation: "Theoretical Physicist", nationality: ["German", "Swiss", "American"]})
CREATE (relativity:Theory {name: "Theory of Relativity"})
CREATE (nobel:Award {name: "Nobel Prize in Physics", year: 1921})
CREATE (mileva:Person {name: "Mileva Marić"})
CREATE (princeton:Institution {name: "Princeton University"})
CREATE (photoelectric:Concept {name: "Photoelectric Effect"})
CREATE (newton:Person {name: "Isaac Newton"})

Next we will create relationships between the nodes: 


// Create relationships between Albert Einstein and related entities
CREATE (einstein)-[:DEVELOPED]->(relativity)
CREATE (einstein)-[:AWARDED {year: 1921}]->(nobel)
CREATE (einstein)-[:SPOUSE]->(mileva)
CREATE (einstein)-[:WORKED_AT]->(princeton)
CREATE (einstein)-[:CONTRIBUTED_TO]->(photoelectric)
CREATE (einstein)-[:INFLUENCED_BY]->(newton)

Running these queries produces a graph of the nodes and the relationships between them, which you can visualize in the Neo4j Browser. Note that the attributes associated with each node aren't shown in the graph visualization.
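Once the graph exists, retrieving facts is a matter of pattern matching. As a minimal sketch (assuming the nodes and relationships created above), the following Cypher query returns the theory Einstein developed:


// Find the theory developed by Albert Einstein
MATCH (einstein:Person {name: "Albert Einstein"})-[:DEVELOPED]->(theory:Theory)
RETURN theory.name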

How Do Knowledge Graphs Help in RAG?

Retrieval-Augmented Generation (RAG) is a technique that combines information retrieval with generative models to produce more accurate and contextually relevant outputs. 

In a RAG setup, a large language model (LLM) is paired with an information retrieval system that searches a database or document repository for relevant context or knowledge. The retrieved information is then supplied as context in the prompt, and the LLM uses it to generate its response. This approach ensures that the output is not only coherent but also grounded in factual and contextually appropriate information, even for data that was not part of the LLM's training set. RAG systems have therefore emerged as a powerful way to apply an LLM's capabilities to a company's internal content or knowledge base.

RAG systems commonly use vector databases for the retrieval of documents. Vector databases work by first converting data into vector embeddings—high-dimensional numerical representations of data—and then using them to perform similarity searches to retrieve relevant information.

However, a significant challenge with vector databases is that these vector embeddings are inherently abstract and difficult for humans to visualize or interpret. This lack of transparency can make it challenging to understand why certain documents were retrieved or how the relationships between different pieces of data were established.

Knowledge Graphs (KGs) offer a distinct advantage in this regard. Unlike vector databases, KGs explicitly model entities and the relationships between them in a more intuitive, graph-based structure that is easier for humans to visualize and understand. This structured representation allows for more semantically rich retrieval and reasoning, enabling the RAG system to not only retrieve relevant documents but also provide contextually meaningful relationships between entities. 

By leveraging the reasoning capabilities of KGs, RAG systems can produce outputs that are not only more accurate and context-aware but also more transparent and easier to interpret, making them particularly valuable in complex domains where understanding the connections between concepts is crucial.
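To make this concrete, here is a hypothetical Cypher pattern a graph-backed retriever might run for a question like "Who influenced the physicist who developed the Theory of Relativity?" (using the Einstein graph from earlier). The multi-hop traversal is explicit and inspectable, which a pure embedding similarity search cannot offer:


// Hop from the theory to its developer, then to the people who influenced him
MATCH (p:Person)-[:DEVELOPED]->(:Theory {name: "Theory of Relativity"})
MATCH (p)-[:INFLUENCED_BY]->(influencer:Person)
RETURN p.name, influencer.name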

How to Build RAG Using Knowledge Graph

Now that we understand KG-RAG (also called GraphRAG) conceptually, let's walk through the steps to build one.

To do this, we will use cloud GPU nodes on E2E Cloud. This will allow us to locally deploy the LLM and the knowledge graph, and then build a RAG application. 

Prerequisites

First, sign up to MyAccount on E2E Cloud. Once that's done, launch a cloud GPU node. If you are using a large LLM like Llama 3.1, use an A100 or L40S node. Alternatively, pick a cloud GPU node that works for the LLM of your choosing. Also, add your SSH keys while launching the node.

Once that’s done, SSH into the node: 


$ ssh root@<your-node-public-ip>

You should now create a user with the adduser (or useradd) command.


$ adduser username

Also, give the user sudo permissions using visudo.


$ visudo

Add the following line in the file: 


username ALL=(ALL) NOPASSWD:ALL

Deploying Neo4j

We will now deploy Neo4j, a powerful graph database (which also includes vector handling capabilities). We will assume a Debian-based distribution. If you are installing on another Linux distribution, follow the steps here.

Method 1 - Using Docker

You can use Docker to install Neo4j using the following command. 


docker run \
   --name neo4j \
   -p 7474:7474 -p 7687:7687 \
   -d \
   -e NEO4J_AUTH=neo4j/password \
   -e NEO4J_PLUGINS=\[\"apoc\"\]  \
   neo4j:latest
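As a quick sanity check (assuming the default ports mapped above), you can confirm the container is running and that the Neo4j HTTP interface responds:


$ docker ps --filter name=neo4j
$ curl -I http://localhost:7474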

Method 2 - Using apt

You can also install Neo4j with apt-get. First, add an OpenJDK repository (Neo4j requires Java) and update the package index:


$ sudo add-apt-repository -y ppa:openjdk-r/ppa
$ sudo apt-get update

Next, add the Neo4j repository and its signing key to the apt sources:


$ wget -O - https://debian.neo4j.com/neotechnology.gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/neotechnology.gpg
$ echo 'deb [signed-by=/etc/apt/keyrings/neotechnology.gpg] https://debian.neo4j.com stable latest' | sudo tee -a /etc/apt/sources.list.d/neo4j.list
$ sudo apt-get update

We can now find out which versions of Neo4j are available using the following command: 


$ apt list -a neo4j

We can pick from the versions listed, and install in the following way: 


$ sudo apt-get install neo4j=1:5.23.0

This installs the Neo4j graph database, which we will use to store the knowledge graph.
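Depending on your setup, you may need to start and enable the Neo4j service explicitly. You can then verify connectivity with cypher-shell, which ships with the package (the default username is neo4j, and you will be asked to change the initial password on first login):


$ sudo systemctl enable neo4j --now
$ sudo systemctl status neo4j
$ cypher-shell -u neo4j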

Let's store the Neo4j connection details in a new .env file, which our code will read later.


NEO4J_URI="YOUR_NEO4J_URL"
NEO4J_USERNAME="YOUR_NEO4J_USERNAME"
NEO4J_PASSWORD="YOUR_NEO4J_PASSWORD"
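As an example, for a local deployment like the ones above, the values would typically look like the following (the password is whichever one you set, e.g. in NEO4J_AUTH for the Docker method):


NEO4J_URI="bolt://localhost:7687"
NEO4J_USERNAME="neo4j"
NEO4J_PASSWORD="password"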

Installing Ollama and LLM

One of the easiest ways to create an LLM endpoint is through TIR. You can follow the steps here to do so.  

However, here we will use Ollama to leverage the same cloud GPU node. Install Ollama like this: 


$ curl -fsSL https://ollama.com/install.sh | sh

Then, you can pull and serve the LLM easily. 


$ ollama pull llama3.1
$ ollama run llama3.1

We can now use the Llama 3.1 model as our LLM.
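To confirm that the model was pulled and that the Ollama server is listening on its default port (11434), you can run:


$ ollama list
$ curl http://localhost:11434/api/tags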

Installing the Dependencies

Create a workspace folder, and then create a Python virtual environment. (It is named .venv here so that it doesn't clash with the .env file holding the Neo4j credentials.)


$ python3 -m venv .venv
$ source .venv/bin/activate

Let’s install the dependencies.


$ pip install python-dotenv
$ pip install streamlit
$ pip install langchain
$ pip install langchain-community
$ pip install langchain-ollama
$ pip install neo4j

Importing Python Modules

Before getting into the code, let’s import all the libraries and modules that we need.


import os
import streamlit as st
from dotenv import load_dotenv
from langchain_community.graphs import Neo4jGraph
from langchain.chains import GraphCypherQAChain
from langchain_ollama.llms import OllamaLLM

load_dotenv()

Initiating Knowledge Graph and LLM

It's time to initiate the Neo4j knowledge graph and the LLM.


graph = Neo4jGraph(
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"),
    password=os.getenv("NEO4J_PASSWORD"),
)


llm = OllamaLLM(model="llama3.1")
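As a quick sanity check (a minimal sketch), you can print the graph schema that LangChain pulled from Neo4j at initialization and send a test prompt to the locally served model:


# Node labels, relationship types, and properties discovered in Neo4j
print(graph.schema)

# A simple round trip through the Llama 3.1 model served by Ollama
print(llm.invoke("Say hello in one short sentence."))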

Creating the Knowledge Graph 

As a demonstration of the approach, we will use a CSV of Amazon Prime user data, with columns such as User ID, Name, Email Address, Location, Subscription Plan, and Devices Used.

You can download the full CSV here.

To insert the data into the Neo4j database, first create the nodes representing each entity (e.g., name, email, location) using Cypher queries.

Then, define the relationships between these entities (e.g., LIVES_IN) to establish how they are connected within the graph.

Finally, execute the Cypher query in Neo4j using the following code, which loads the CSV, iterates through the rows, and creates the nodes and relationships.


users_query = """
LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/vansh-khaneja/test5/main/amazon_prime_users.csv' AS row

MERGE (person:Person {name: COALESCE(row.`Name`, 'Unknown'), email: COALESCE(row.`Email Address`, 'Unknown Email')})
MERGE (location:Location {name: COALESCE(row.`Location`, 'Unknown Location')})
MERGE (plan:SubscriptionPlan {name: COALESCE(row.`Subscription Plan`, 'Unknown Plan')})
MERGE (deviceNode:Device {name: COALESCE(row.`Devices Used`, 'Unknown Device')})
MERGE (person)-[:USES]->(deviceNode)

MERGE (person)-[:LIVES_IN]->(location)
MERGE (person)-[:SUBSCRIBED_TO]->(plan)
MERGE (person)-[:HAS_USER_ID]->(:UserID {id: COALESCE(row.`User ID`, 'Unknown ID')})
MERGE (person)-[:HAS_USERNAME]->(:Username {name: COALESCE(row.`Username`, 'Unknown Username')})
MERGE (person)-[:HAS_BIRTH_DATE]->(:BirthDate {date: COALESCE(row.`Date of Birth`, 'Unknown Date')})
MERGE (person)-[:HAS_GENDER]->(:Gender {type: COALESCE(row.`Gender`, 'Unknown Gender')})
MERGE (person)-[:MEMBERSHIP_STARTED_ON]->(:MembershipStartDate {date: COALESCE(row.`Membership Start Date`, 'Unknown Start Date')})
MERGE (person)-[:MEMBERSHIP_ENDED_ON]->(:MembershipEndDate {date: COALESCE(row.`Membership End Date`, 'Unknown End Date')})
MERGE (person)-[:HAS_PAYMENT_INFO]->(:PaymentInformation {info: COALESCE(row.`Payment Information`, 'Unknown Payment Info')})
MERGE (person)-[:HAS_RENEWAL_STATUS]->(:RenewalStatus {status: COALESCE(row.`Renewal Status`, 'Unknown Status')})
MERGE (person)-[:HAS_USAGE_FREQUENCY]->(:UsageFrequency {frequency: COALESCE(row.`Usage Frequency`, 'Unknown Frequency')})
MERGE (person)-[:HAS_PURCHASE_HISTORY]->(:PurchaseHistory {history: COALESCE(row.`Purchase History`, 'Unknown History')})
MERGE (person)-[:HAS_FAVORITE_GENRES]->(:FavoriteGenres {genres: COALESCE(row.`Favorite Genres`, 'Unknown Genres')})
MERGE (person)-[:HAS_ENGAGEMENT_METRICS]->(:EngagementMetrics {metrics: COALESCE(row.`Engagement Metrics`, 'Unknown Metrics')})
MERGE (person)-[:GAVE_FEEDBACK]->(:Feedback {ratings: COALESCE(row.`Feedback/Ratings`, 'No Feedback')})
MERGE (person)-[:INTERACTED_WITH_SUPPORT]->(:CustomerSupport {interactions: COALESCE(row.`Customer Support Interactions`, 'No Interactions')})

"""

graph.query(users_query)

This will create a knowledge graph with the entities and relationships defined in the Cypher query above. You can visualize it in the Neo4j Browser; for example, a graph of all the user data, or a graph of users grouped by annual/monthly subscription plan.
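Once the load completes, a quick spot-check (a minimal sketch) confirms that people and their locations were linked as expected:


# Spot-check a few Person -> Location relationships created by the load
result = graph.query(
    "MATCH (p:Person)-[:LIVES_IN]->(l:Location) RETURN p.name AS person, l.name AS location LIMIT 5"
)
print(result)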

If you have a piece of unstructured text, you can also use an LLM to generate the Cypher queries to create the knowledge graph. Try it out!
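As a hedged sketch of that idea, the snippet below uses LLMGraphTransformer from the langchain-experimental package (an extra dependency not installed above) to extract nodes and relationships from free text and write them into Neo4j. Extraction quality depends heavily on the model; function-calling models generally do better than a plain completion model.


from langchain_core.documents import Document
from langchain_experimental.graph_transformers import LLMGraphTransformer

text = "Albert Einstein developed the Theory of Relativity and won the Nobel Prize in Physics in 1921."

# Ask the LLM to extract entities and relationships as graph documents
transformer = LLMGraphTransformer(llm=llm)
graph_documents = transformer.convert_to_graph_documents([Document(page_content=text)])

# Write the extracted nodes and relationships into Neo4j
graph.add_graph_documents(graph_documents)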

Response Generation

Once the knowledge graph (KG) is created, we can develop a function that takes a user query as input and returns a response. This function will use a language model to generate a Cypher query, which fetches results from Neo4j. The language model then rephrases the answer based on the query and the context provided.

All of this is managed through a predefined chain provided by LangChain.


def response_generation(query):
    # Convert the question to Cypher, run it on Neo4j, and have the LLM phrase the result.
    # Depending on your LangChain version, you may need to pass allow_dangerous_requests=True here.
    chain = GraphCypherQAChain.from_llm(graph=graph, llm=llm, verbose=True)
    answer = chain.run(
        query + " Return what you find with proper naming conventions and rephrase it as an answer to the question."
    )
    return answer
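You can test the function directly before wiring up the UI; the name in the question below is hypothetical and should match a row in your CSV:


print(response_generation("Which subscription plan is Alice subscribed to?"))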

User Interface

To make the app more interactive, we will use Streamlit to create the frontend. Streamlit lets users enter queries and interact with the knowledge graph through a simple Python-based web interface.


st.set_page_config(page_title="Knowledge Graph Chatbot", page_icon=":robot_face:")

st.markdown("<h1>Knowledge Graph Chatbot</h1>", unsafe_allow_html=True)
st.markdown(
    "<p>A RAG chatbot created with LangChain and knowledge graphs using Llama 3.1.</p>",
    unsafe_allow_html=True,
)

user_query = st.text_input(
    "Enter your question:",
    placeholder="E.g., Which devices are used by Alice to watch shows?",
)

if st.button("Ask"):
    bot_response = response_generation(user_query)
    st.markdown(f"<h3>Answer:</h3><p>{bot_response}</p>", unsafe_allow_html=True)

Output

This is the final view of our chatbot. When a user enters a query in the input box, the query is converted into a Cypher query, which is then executed against the knowledge graph to retrieve context. This context is passed to a language model, which generates a rephrased response based on the user’s query. Finally, the answer is displayed on the screen.

As you can see, the LLM responds by leveraging the context data stored in the knowledge graph.

Conclusion

Building a Retrieval-Augmented Generation (RAG) system using knowledge graphs and LangChain offers a powerful way to enhance information retrieval and generate contextually relevant responses. By combining the structured relationships in knowledge graphs with the capabilities of LangChain, you can create applications that not only retrieve information efficiently but also generate accurate, context-aware outputs.

This guide has walked through the steps involved, enabling you to implement RAG solutions that meet the demands of modern applications. With these techniques, the potential for innovation in natural language processing and data retrieval is vast, paving the way for more intelligent and interactive systems.

To get started with building knowledge graph-powered RAG applications using LLMs, sign up to E2E Cloud today.
