Kuzu Graph Store (LlamaIndex docs)

Categories: kuzu, llamaindex, pyvis

Published: June 26, 2024

Kùzu Graph Store

This notebook walks through configuring Kùzu to be the backend for graph storage in LlamaIndex.

%%capture
%pip install llama-index
%pip install llama-index-llms-openai
%pip install llama-index-graph-stores-kuzu
%pip install pyvis
# My OpenAI Key
import os

os.environ["OPENAI_API_KEY"] = ""

Prepare for Kùzu

# Clean up all the directories used in this notebook
import shutil

shutil.rmtree("./test1", ignore_errors=True)
shutil.rmtree("./test2", ignore_errors=True)
shutil.rmtree("./test3", ignore_errors=True)
import kuzu

db = kuzu.Database("test1")

Using Knowledge Graph with KuzuGraphStore

from llama_index.graph_stores.kuzu import KuzuGraphStore

graph_store = KuzuGraphStore(db)
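
As an optional sanity check, you can open a plain Kùzu connection to the same database and run a trivial Cypher query to confirm it is reachable. This is a small sketch using the kuzu Python API directly; it is not required by the LlamaIndex integration.

# Optional: verify the database is reachable with a raw Cypher query
conn = kuzu.Connection(db)
result = conn.execute("RETURN 'Kùzu is ready' AS status;")
while result.has_next():
    print(result.get_next())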

Building the Knowledge Graph

from llama_index.core import SimpleDirectoryReader, KnowledgeGraphIndex
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
from IPython.display import Markdown, display
import kuzu
!curl -LJO https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 75042  100 75042    0     0   276k      0 --:--:-- --:--:-- --:--:--  276k
!mkdir -p data
!mv paul_graham_essay.txt data/
!ls -ltra data/
total 84
-rw-r--r-- 1 root root 75042 Aug 21 13:49 paul_graham_essay.txt
drwxr-xr-x 1 root root  4096 Aug 21 13:50 ..
drwxr-xr-x 2 root root  4096 Aug 21 13:50 .
documents = SimpleDirectoryReader("data/").load_data()
# define LLM

llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
Settings.llm = llm
Settings.chunk_size = 512
from llama_index.core import StorageContext

storage_context = StorageContext.from_defaults(graph_store=graph_store)

# NOTE: can take a while!
index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=2,
    storage_context=storage_context,
)
# To reload from an existing graph store without recomputing each time, use:
# index = KnowledgeGraphIndex(nodes=[], storage_context=storage_context)
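
If you want that reload to work in a fresh session, the docstore and index metadata also need to be persisted alongside the Kùzu database. A minimal sketch, assuming the default persistence flow (the ./storage_kuzu directory name is illustrative):

from llama_index.core import load_index_from_storage

# Persist the non-graph parts of the index (docstore, index metadata);
# the triplets themselves already live in the Kùzu database on disk.
index.storage_context.persist(persist_dir="./storage_kuzu")

# Later: point a storage context at the same Kùzu database and persist dir,
# then reload the index without re-extracting triplets.
storage_context = StorageContext.from_defaults(
    persist_dir="./storage_kuzu", graph_store=graph_store
)
index = load_index_from_storage(storage_context)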

Querying the Knowledge Graph

First, we can query and send only the triplets to the LLM.

query_engine = index.as_query_engine(
    include_text=False, response_mode="tree_summarize"
)
response = query_engine.query(
    "Tell me more about Interleaf",
)
display(Markdown(f"<b>{response}</b>"))

Interleaf was involved in making software and added a scripting language. Additionally, it was also associated with a reason for existence and eventually faced challenges due to Moore’s law.
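
To see the raw triplets behind such an answer, you can also read from the graph store directly: its get() method takes a subject and returns the stored relationship/object pairs. What comes back depends on which triplets the LLM happened to extract.

# Inspect the triplets stored in Kùzu for a given subject
print(graph_store.get("Interleaf"))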

For more detailed answers, we can also send the text from which the retrieved triplets were extracted.

query_engine = index.as_query_engine(
    include_text=True, response_mode="tree_summarize"
)
response = query_engine.query(
    "Tell me more about Interleaf",
)
display(Markdown(f"<b>{response}</b>"))

Interleaf was a company that made software for creating documents. They added a scripting language inspired by Emacs, which was a dialect of Lisp. The software they created had a specific purpose, which was to allow users to build their own online stores. Despite their impressive technology and smart employees, Interleaf ultimately faced challenges and was impacted by Moore’s Law, leading to its eventual decline.

Query with embeddings

# NOTE: can take a while!
db = kuzu.Database("test2")
graph_store = KuzuGraphStore(db)
storage_context = StorageContext.from_defaults(graph_store=graph_store)
new_index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=2,
    storage_context=storage_context,
    include_embeddings=True,
)
# Query using the top-5 triplets plus keywords (duplicate triplets are removed).
# NOTE: query the index that was built with embeddings (new_index).
query_engine = new_index.as_query_engine(
    include_text=True,
    response_mode="tree_summarize",
    embedding_mode="hybrid",
    similarity_top_k=5,
)
response = query_engine.query(
    "Tell me more about what the author worked on at Interleaf",
)
display(Markdown(f"<b>{response}</b>"))

The author worked at Interleaf, a company that made software for creating documents. Inspired by Emacs, Interleaf added a scripting language that was a dialect of Lisp. The author was hired as a Lisp hacker to write things in this scripting language. However, the author found it challenging to work with the software at Interleaf due to his lack of understanding of C and his reluctance to learn it. Despite this, the author managed to learn some valuable lessons at Interleaf, mostly about what not to do.
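
To see which source nodes the hybrid retrieval pulls in before the LLM summarizes them, you can use the index's retriever directly; this sketch passes the same options as the query engine above.

# Inspect the retrieved nodes (keyword + embedding matches) before synthesis
retriever = new_index.as_retriever(
    include_text=True,
    embedding_mode="hybrid",
    similarity_top_k=5,
)
for node_with_score in retriever.retrieve(
    "Tell me more about what the author worked on at Interleaf"
):
    print(node_with_score.node.node_id, node_with_score.score)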

Visualizing the Graph

## create graph
from pyvis.network import Network
from IPython.display import display, HTML

g = index.get_networkx_graph()
net = Network(notebook=True, cdn_resources="in_line", directed=True)
net.from_nx(g)
net.show("kuzugraph_draw.html")
display(HTML(filename="kuzugraph_draw.html"))

[Optional] Try building the graph and manually adding triplets!

from llama_index.core.node_parser import SentenceSplitter
node_parser = SentenceSplitter()
nodes = node_parser.get_nodes_from_documents(documents)
# initialize an empty database
db = kuzu.Database("test3")
graph_store = KuzuGraphStore(db)
storage_context = StorageContext.from_defaults(graph_store=graph_store)
index = KnowledgeGraphIndex(
    [],
    storage_context=storage_context,
)
# add keyword mappings and nodes manually
# add triplets (subject, relationship, object)

# for node 0
node_0_tups = [
    ("author", "worked on", "writing"),
    ("author", "worked on", "programming"),
]
for tup in node_0_tups:
    index.upsert_triplet_and_node(tup, nodes[0])

# for node 1
node_1_tups = [
    ("Interleaf", "made software for", "creating documents"),
    ("Interleaf", "added", "scripting language"),
    ("software", "generate", "web sites"),
]
for tup in node_1_tups:
    index.upsert_triplet_and_node(tup, nodes[1])
query_engine = index.as_query_engine(
    include_text=False, response_mode="tree_summarize"
)
response = query_engine.query(
    "Tell me more about Interleaf",
)
str(response)
'Interleaf was involved in creating documents and also added a scripting language to its software.'
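
Triplets can also be written straight to the graph store, bypassing the index. Unlike upsert_triplet_and_node above, this sketch only records the triplet and does not attach a source text node, so queries with include_text=True will have no supporting text for it.

# Write a triplet directly to Kùzu via the graph store interface
graph_store.upsert_triplet("Interleaf", "used", "Lisp")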