Kuzu - Export Query Results to NetworkX

kuzu
networkx
yfiles
Author

Kuzu docs

Published

June 26, 2024

Overview & Goals

One of the overarching goals of Kùzu is to function as the go-to graph database for data science use cases. NetworkX is a popular library in Python for graph algorithms and data science. In this notebook, we demonstrate Kùzu’s ease of use in exporting subgraphs to the NetworkX format using the get_as_networkx() function. In addition, the following two capabilities are demonstrated.

  • Graph Visualization: In the first part, we simply draw the nodes and edges in the results using NetworkX.
  • PageRank: In the second part, we compute PageRank on an extracted subgraph of nodes and edges, store the values back in Kùzu’s node tables, and query them.

MovieLens Dataset

We work with the popular MovieLens dataset from GroupLens. Its schema has two node types, User and Movie, connected by two relationship types that both point from User to Movie: Rating and Tags.

We use the small version of the dataset, which contains 610 user nodes, 9724 movie nodes, 100863 rating edges, and 3684 tag edges.

Necessary Package Installations and Imports

!pip install kuzu scipy networkx pandas
!pip install matplotlib ipywidgets yfiles-jupyter-graphs
import kuzu
import pandas as pd
import networkx as nx

Wget Dataset Files and Import Into Kùzu

!rm -rf *.csv ml-small_db
!wget https://kuzudb.com/data/movie-lens/movies.csv
!wget https://kuzudb.com/data/movie-lens/users.csv
!wget https://kuzudb.com/data/movie-lens/ratings.csv
!wget https://kuzudb.com/data/movie-lens/tags.csv

Create the node and relationship tables in Kùzu, then import the downloaded CSV files using the COPY FROM clause.

import shutil

db_path = './ml-small_db'
shutil.rmtree(db_path, ignore_errors=True)

def load_data(connection):
    connection.execute('CREATE NODE TABLE Movie (movieId INT64, year INT64, title STRING, genres STRING, PRIMARY KEY (movieId))')
    connection.execute('CREATE NODE TABLE User (userId INT64, PRIMARY KEY (userId))')
    connection.execute('CREATE REL TABLE Rating (FROM User TO Movie, rating DOUBLE, timestamp INT64)')
    connection.execute('CREATE REL TABLE Tags (FROM User TO Movie, tag STRING, timestamp INT64)')

    connection.execute('COPY Movie FROM "./movies.csv" (HEADER=TRUE)')
    connection.execute('COPY User FROM "./users.csv" (HEADER=TRUE)')
    connection.execute('COPY Rating FROM "./ratings.csv" (HEADER=TRUE)')
    connection.execute('COPY Tags FROM "./tags.csv" (HEADER=TRUE)')

db = kuzu.Database(db_path)
conn = kuzu.Connection(db)
load_data(conn)

Example 1: Graph Visualization with NetworkX

Extract a subgraph of 250 ratings edges using Cypher; convert to a NetworkX graph G; and draw a node-link visualization.

res = conn.execute('MATCH (u:User)-[r:Rating]->(m:Movie) RETURN u, r, m LIMIT 250')
G = res.get_as_networkx(directed=False)
colors = ['red' if G.nodes[node]['_label'] == 'User' else 'blue' for node in list(G.nodes())]
nx.draw_spring(G, node_color=colors, node_size=40)
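
To make the shape of the exported graph concrete without running a database, the sketch below hand-builds a tiny NetworkX graph that mimics the conventions this notebook relies on: node keys of the form Label_primaryKey (as implied by the Movie_ and User_ prefixes stripped later on) and a _label attribute on every node. The node names, property values, and edges are made up for illustration; this is not output from get_as_networkx().

```python
import networkx as nx

# Hypothetical miniature of an exported subgraph: two users rating two movies.
G_toy = nx.Graph()
G_toy.add_node("User_1", _label="User", userId=1)
G_toy.add_node("User_2", _label="User", userId=2)
G_toy.add_node("Movie_10", _label="Movie", movieId=10)
G_toy.add_node("Movie_20", _label="Movie", movieId=20)
G_toy.add_edge("User_1", "Movie_10", rating=4.0)
G_toy.add_edge("User_2", "Movie_10", rating=3.5)
G_toy.add_edge("User_2", "Movie_20", rating=5.0)

# Same colouring rule as in the cell above: users red, movies blue.
toy_colors = [
    "red" if data["_label"] == "User" else "blue"
    for _, data in G_toy.nodes(data=True)
]
print(toy_colors)  # ['red', 'red', 'blue', 'blue']
```

Passing toy_colors as node_color to nx.draw_spring(G_toy, ...) would then colour the nodes in the same way as the real subgraph.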

Example 2: Interactive graph visualization with yWorks

yfiles-jupyter-graphs is a free tool to generate interactive visualizations of graphs within a Jupyter notebook environment. Follow the docs to install the dependencies.

The following visualization uses the same graph G from above that consists of 250 movie ratings edges.

try:
  import google.colab
  from google.colab import output
  # On Google Colab, third-party widgets must be enabled explicitly
  output.enable_custom_widget_manager()
except ImportError:
  # Not running on Colab; nothing to enable
  pass

from typing import Any

def custom_node_color_mapping(node: dict[str, Any]):
    """Color User nodes red-orange and all other (Movie) nodes blue"""
    return ("#eb4934" if node['properties']['_label'] == "User" else "#2456d4")

from yfiles_jupyter_graphs import GraphWidget

w = GraphWidget(graph=G)
w.set_sidebar(enabled=False)
w.set_node_color_mapping(custom_node_color_mapping)
display(w)

Example 3: Compute PageRank, Store Back in the Database and Query

We extract only the subgraph between users and movies (so ignoring tags) and convert to a NetworkX graph G.

res = conn.execute('MATCH (u:User)-[r:Rating]->(m:Movie) RETURN u, r, m')
G = res.get_as_networkx(directed=False)

Next compute PageRanks of users and movies.

pageranks = nx.pagerank(G)
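
Why export with directed=False? The following is a minimal sketch on hypothetical toy data (not the MovieLens graph) that compares PageRank on an undirected and a directed version of the same user-to-movie edges. In the directed version users have no incoming edges, so they all receive the same minimal score; in the undirected version both users and movies get meaningful scores.

```python
import networkx as nx

# Hypothetical (user, movie) rating pairs, named like the exported nodes.
ratings = [
    ("User_1", "Movie_A"), ("User_2", "Movie_A"), ("User_3", "Movie_A"),
    ("User_1", "Movie_B"), ("User_3", "Movie_C"),
]

undirected = nx.Graph(ratings)    # edges can be walked in both directions
directed = nx.DiGraph(ratings)    # edges only point User -> Movie

pr_undirected = nx.pagerank(undirected)
pr_directed = nx.pagerank(directed)

# The movie rated by all three users comes out on top.
top_undirected = max(pr_undirected, key=pr_undirected.get)
print(top_undirected)  # Movie_A

# With directed edges, every user scores below every movie.
users = [n for n in directed if n.startswith("User_")]
movies = [n for n in directed if n.startswith("Movie_")]
users_below_movies = all(
    pr_directed[u] < pr_directed[m] for u in users for m in movies
)
print(users_below_movies)  # True
```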

Put the returned PageRank values into a pagerank_df DataFrame, then extract the movies and their PageRank values into a movie_df DataFrame.

pagerank_df = pd.DataFrame.from_dict(pageranks, orient="index", columns=["pagerank"])
movie_df = pagerank_df[pagerank_df.index.str.contains("Movie")]
movie_df.index = movie_df.index.str.replace("Movie_", "").astype(int)
movie_df = movie_df.reset_index(names=["id"])
print(f"Calculated pageranks for {len(movie_df)} nodes")
movie_df.sort_values(by="pagerank", ascending=False).head()
Calculated pageranks for 9724 nodes
       id  pagerank
20    356  0.001155
232   318  0.001099
16    296  0.001075
166  2571  0.001006
34    593  0.000987
user_df = pagerank_df[pagerank_df.index.str.contains("User")]
user_df.index = user_df.index.str.replace("User_", "").astype(int)
user_df = user_df.reset_index(names=["id"])
user_df.sort_values(by="pagerank", ascending=False).head()
      id  pagerank
598  599  0.016401
413  414  0.014711
473  474  0.014380
447  448  0.012942
609  610  0.008492
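
The two cells above repeat the same steps for movies and users. As a possible refactoring, the sketch below wraps them in a helper; split_by_label is a hypothetical name, and the scores are made-up toy values. It matches on the prefix with startswith rather than contains, which is slightly stricter, and works on a copy so the original DataFrame is left untouched.

```python
import pandas as pd

# Hypothetical scores keyed the way the exported graph names its nodes.
toy_pageranks = {"User_1": 0.30, "Movie_10": 0.45, "User_2": 0.10, "Movie_20": 0.15}
toy_df = pd.DataFrame.from_dict(toy_pageranks, orient="index", columns=["pagerank"])

def split_by_label(df: pd.DataFrame, label: str) -> pd.DataFrame:
    """Keep rows whose index starts with '<label>_' and turn the rest of
    the index into an integer 'id' column."""
    prefix = f"{label}_"
    part = df[df.index.str.startswith(prefix)].copy()
    part.index = part.index.str[len(prefix):].astype(int)
    return part.rename_axis("id").reset_index()

toy_movie_df = split_by_label(toy_df, "Movie")
toy_user_df = split_by_label(toy_df, "User")

print(toy_movie_df["id"].tolist())  # [10, 20]
print(toy_user_df["id"].tolist())   # [1, 2]
```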

Update node schemas with pagerank property

Alter the Movie and User table schemas by adding a new pagerank property of type DOUBLE, with a default value of 0.0.

try:
  # Alter original node table schemas to add pageranks
  conn.execute("ALTER TABLE Movie ADD pagerank DOUBLE DEFAULT 0.0;")
  conn.execute("ALTER TABLE User ADD pagerank DOUBLE DEFAULT 0.0;")
except RuntimeError:
  # If the column already exists, do nothing
  pass

Scan Pandas DataFrame and copy values to Kùzu

The next feature demonstrated is a powerful one: Kùzu can natively scan Pandas DataFrames in a zero-copy manner by using the LOAD FROM clause with the name of the Python variable that holds the DataFrame. Once scanned, the values are accessible to Kùzu via Cypher. We use this approach to write the computed PageRank values (held in Pandas DataFrames) into the respective node tables in Kùzu.

# Copy pagerank values to movie nodes
x = conn.execute(
  """
  LOAD FROM movie_df
  MERGE (m:Movie {movieId: id})
  ON MATCH SET m.pagerank = pagerank
  RETURN m.movieId AS movieId, m.pagerank AS pagerank;
  """
)
x.get_as_df().head()
   movieId  pagerank
0        1  0.000776
1        3  0.000200
2        6  0.000368
3       47  0.000707
4       50  0.000724
# Copy user pagerank values to user nodes
y = conn.execute(
  """
  LOAD FROM user_df
  MERGE (u:User {userId: id})
  ON MATCH SET u.pagerank = pagerank
  RETURN u.userId AS userId, u.pagerank AS pagerank;
  """
)
y.get_as_df().head()
   userId  pagerank
0       1  0.000867
1       2  0.000134
2       3  0.000254
3       4  0.000929
4       5  0.000151

Next, we find the top 10 movies by PageRank, and then the top 10 users.

conn.execute('MATCH (m:Movie) RETURN m.title, m.pagerank ORDER BY m.pagerank DESC LIMIT 10').get_as_df()
   m.title                                    m.pagerank
0  Forrest Gump (1994)                          0.001155
1  Shawshank Redemption, The (1994)             0.001099
2  Pulp Fiction (1994)                          0.001075
3  Matrix, The (1999)                           0.001006
4  Silence of the Lambs, The (1991)             0.000987
5  Star Wars: Episode IV - A New Hope (1977)    0.000903
6  Jurassic Park (1993)                         0.000825
7  Braveheart (1995)                            0.000810
8  Fight Club (1999)                            0.000797
9  Schindler's List (1993)                      0.000778
conn.execute('MATCH (u:User) RETURN u.userId, u.pagerank ORDER BY u.pagerank DESC LIMIT 10').get_as_df()
   u.userId  u.pagerank
0       599    0.016401
1       414    0.014711
2       474    0.014380
3       448    0.012942
4       610    0.008492
5       606    0.007245
6       274    0.006253
7        89    0.006083
8       380    0.005997
9       318    0.005920

As a final example, we compute the average rating that highly influential users (i.e., users with a high PageRank score) give to highly influential movies. To pick out influential users, we take the 10 highest user PageRank scores and use the smallest of them as a threshold; any user whose score is strictly above this threshold is considered “influential”. Because the comparison is strict, the user ranked 10th sits exactly at the threshold and is not included. We repeat the same computation to find the threshold for movies, then match all ratings between influential users and influential movies and take their average.

user_pr_threshold = conn.execute(
  """
  MATCH (u:User)
  WITH u.pagerank AS user_pr_threshold
  ORDER BY u.pagerank DESC LIMIT 10
  RETURN min(user_pr_threshold)
  """
).get_as_df().iloc[0,0];
user_pr_threshold
0.005919808556361484
movie_pr_threshold = conn.execute(
    """
    MATCH (m:Movie)
    WITH m.pagerank AS movie_pr_threshold
    ORDER BY m.pagerank DESC LIMIT 10
    RETURN min(movie_pr_threshold)
    """
).get_as_df().iloc[0,0];
movie_pr_threshold
0.0007781856282699511
avg_rating_df = conn.execute(
    """
    MATCH (u:User)-[r:Rating]->(m:Movie)
    WHERE u.pagerank > $user_pr_threshold  AND m.pagerank > $movie_pr_threshold
    RETURN avg(r.rating) as avgRBtwHighPRUserMovies;
    """,
    parameters={"user_pr_threshold": user_pr_threshold, "movie_pr_threshold": movie_pr_threshold}
).get_as_df()
avg_rating_df.head()
   avgRBtwHighPRUserMovies
0                 4.239437
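
The thresholding used above can be checked in plain Python. The sketch below uses hypothetical scores (not the MovieLens values) to show that taking the minimum of the top k scores and then filtering with a strict > keeps k - 1 entries, whereas >= keeps all k (assuming no ties at the threshold).

```python
# Hypothetical scores for 12 entities; higher means more "influential".
values = [0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35]
scores = {f"u{i}": s for i, s in enumerate(values, start=1)}

k = 10
top_k = sorted(scores.values(), reverse=True)[:k]
threshold = min(top_k)  # the k-th largest score

# Strict comparison, as in the WHERE clause of the final query.
strictly_above = [name for name, s in scores.items() if s > threshold]
# Inclusive comparison, which would keep the k-th entry as well.
at_or_above = [name for name, s in scores.items() if s >= threshold]

print(threshold)            # 0.45
print(len(strictly_above))  # 9
print(len(at_or_above))     # 10
```

If the intent is to include all of the top 10, switching the comparison in the query to >= would be one way to do it.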