

Implementing Graph RAG with NebulaGraph

Graph RAG has been making waves in the industry since its joint introduction by NebulaGraph and LlamaIndex in August 2023, capitalizing on the momentum of LLMs and RAG (Retrieval-Augmented Generation). In this blog post, we will walk you through what Graph RAG is, why it is revolutionary, and how to build your own Graph RAG with NebulaGraph that draws on data context to answer complex, multi-part questions.

Understanding Graph RAG and Its Rising Popularity

This cutting-edge technique has garnered widespread interest for its ability to harness knowledge graphs alongside LLMs, offering search engines a more comprehensive contextual understanding and yielding more cost-efficient, intelligent, and precise search results. Beyond providing more accurate responses that build user trust, it overcomes the primary obstacles of traditional retrieval-augmentation techniques: the lack of context comprehension and of training on the underlying data. Consequently, these advanced retrieval-enhancement techniques are poised to expand into various industries, such as medical diagnosis assistance, content creation and recommendation, interactive gaming, and knowledge graph construction.

Microsoft underscored the importance of Graph RAG in the blog post GraphRAG: Unlocking LLM Discovery on Narrative Private Data, authored by Jonathan Larson, Senior Principal Data Architect, and Steven Truitt, Principal Program Manager. They illustrated the effectiveness of Graph RAG by contrasting it with a baseline RAG system, concluding that the Graph RAG approach enables the LLM to anchor itself in the graph, resulting in a superior answer that includes provenance through original supporting text. Meanwhile, the baseline RAG struggles with queries that necessitate aggregation of information across the dataset to formulate an answer.

Building Your Graph RAG with NebulaGraph

The NebulaGraph database has integrated with LLM frameworks such as LlamaIndex and LangChain, so you can focus on LLM orchestration logic and pipeline design without worrying about complex implementation details. You can try Graph RAG in just four steps to see whether it performs as well as advertised, before exploring further and integrating it into applications or other scenarios.

Note:

As LlamaIndex evolves rapidly, recent refactoring and breaking changes have rendered some earlier Graph RAG tutorials non-runnable. This blog provides updated Graph RAG tutorials so that you can get hands-on without errors.

We have been exploring many approaches in-house before upstreaming them. We believe the basic concept of Graph + RAG in LlamaIndex is worth exploring, so we have put the original workshop and notebooks in the Reference section.

Set up your NebulaGraph cluster

You can quickly set up a NebulaGraph cluster in several ways, for example with Docker Compose or the nebula-up one-line installer.
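If you are starting from an empty cluster, the graph space and schema used in this tutorial must exist before you connect to them. Here is a minimal sketch using the ipython-ngql extension (installed in the Dependencies section below); it assumes a local cluster with the default root/nebula credentials:

%load_ext ngql
%ngql --address 127.0.0.1 --port 9669 --user root --password nebula
%ngql CREATE SPACE IF NOT EXISTS paul_graham_essay(vid_type=FIXED_STRING(256));
# Wait a few seconds for the new space to become available, then create the schema
%ngql USE paul_graham_essay;
%ngql CREATE TAG IF NOT EXISTS entity(name string);
%ngql CREATE EDGE IF NOT EXISTS relationship(relationship string);

With the space in place, configure the connection from Python and point LlamaIndex at it: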

import os
from llama_index.core import StorageContext
from llama_index.graph_stores.nebula import NebulaGraphStore

# Credentials and address of the NebulaGraph cluster
os.environ["NEBULA_USER"] = "root"
os.environ["NEBULA_PASSWORD"] = "nebula"
os.environ["NEBULA_ADDRESS"] = "127.0.0.1:9669"


space_name = "paul_graham_essay"
edge_types, rel_prop_names = ["relationship"], ["relationship"]  # defaults; can be omitted if creating from an empty kg
tags = ["entity"]  # default; can be omitted if creating from an empty kg

graph_store = NebulaGraphStore(
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
)

storage_context = StorageContext.from_defaults(graph_store=graph_store)

Now, with your NebulaGraph cluster set up, you can get hands-on with the knowledge graph building section. Please make sure to run all the steps below in a Jupyter environment.

Dependencies

Implementing Graph RAG requires the dependencies below.

pip install llama-index
pip install llama-index-llms-ollama
pip install llama-index-graph-stores-nebula
pip install ipython-ngql
pip install llama-index-embeddings-huggingface
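
The Ollama and HuggingFace embedding packages above let you tell LlamaIndex which LLM and embedding model to use. Here is a minimal sketch of that configuration; the model names (llama3 served by a local Ollama instance, and BAAI/bge-small-en-v1.5) are placeholder assumptions you can swap for your own:

from llama_index.core import Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# LLM served by a local Ollama instance; the model name is a placeholder
Settings.llm = Ollama(model="llama3", request_timeout=300.0)
# Local embedding model from HuggingFace; the model name is a placeholder
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")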

Create a LlamaIndex Knowledge Graph Index

KnowledgeGraphIndex handles automated knowledge graph construction from unstructured text as well as entity-based querying. You need to set up a LlamaIndex Knowledge Graph Index before creating a Graph RAG query engine. In this example, we demonstrate how to build a knowledge graph from textual data by loading documents and then creating the index.

from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    KnowledgeGraphIndex,
)

# Sanity check: confirm the connection to the NebulaGraph cluster
graph_store.query("SHOW HOSTS")

Settings.chunk_size = 512

# Load the source documents (e.g., the Paul Graham essay) from the ./data directory
documents = SimpleDirectoryReader(
    "data"
).load_data()

kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=10,
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
    max_knowledge_sequence=15,
)
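
Before wiring up the Graph RAG retriever in the next step, you can sanity-check the freshly built index by querying it directly. A minimal sketch; the question is just an example for the Paul Graham essay data:

# Query the KG index directly; include_text=False answers from extracted triplets only
kg_query_engine = kg_index.as_query_engine(
    include_text=False,
    response_mode="tree_summarize",
)
response = kg_query_engine.query("Tell me more about the author")
print(response)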

Create a Naive Graph RAG Retriever

The last step is to create a Graph RAG retriever, on top of which the query engine will be built, so that you can ask questions in natural language.

from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import KnowledgeGraphRAGRetriever

# Retriever that extracts entities from the question and expands
# their subgraph in NebulaGraph to build the answer context
graph_rag_retriever = KnowledgeGraphRAGRetriever(
    storage_context=storage_context,
    verbose=True,
)

# Query engine that synthesizes an answer from the retrieved subgraph
query_engine = RetrieverQueryEngine.from_args(
    graph_rag_retriever,
)

response = query_engine.query("What's the relationship between Bob and Alice?")
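
To inspect the answer nicely in the notebook, you can render it as Markdown; a small usage sketch:

from IPython.display import Markdown, display

display(Markdown(f"<b>{response}</b>"))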

Now you've tried Graph RAG. If you'd like to explore more about retrieving via both the GraphStore and the VectorStore, you can find more demo examples here: https://www.siwei.io/tutors/GraphRAG/101.html.

Join our RAG discussion and let us know how your Graph RAG goes in our Slack channel.

Reference: