Graph Database for Social Networks: 7 Fundamental Use Cases of NebulaGraph | EP 1
Two months ago, my colleague brought in a graph of Twitter interactions and asked whether I could write a post that uses graph technology to explore social network analysis and show how such connections are identified.
So, how are typical social relationship graphs like the one above, or even more complex social network-based graphs, generated? In the following episodes, I will use the NebulaGraph database to conduct the social network analysis, and you can find step-by-step guides to try all the methods in Playground.
Social Network Analysis on Graph Database
We can use a graph database in a social network system to represent users and the connections or relationships between them. Graph databases query relationships between users efficiently, which makes it practical to build social network features on top of connection discovery, statistics, and analysis.
For example, graph databases can identify "influential users" in a network, recommend new connections (friendships, favorite content) based on commonalities between users, or find different groups of people and communities to profile users. Graph databases are ideal for social networking systems where user relationships constantly change because they can support complex multi-hop queries and real-time writes and updates.
Social network graphs already have quite a few practical applications, including:
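As a minimal sketch of what a multi-hop query can look like in nGQL, assume the basketballplayer demo dataset used later in this post, where follow edges connect player vertices. The vertex ID "player100" is only an illustrative starting point, and the syntax assumes a NebulaGraph 3.x release.

```ngql
# Friends of friends: players exactly two follow-hops away from player100.
GO 2 STEPS FROM "player100" OVER follow YIELD DISTINCT dst(edge) AS friend_of_friend;

# A comparable openCypher-style pattern for the same two-hop traversal.
MATCH (a:player)-[:follow*2]->(b:player) WHERE id(a) == "player100" RETURN DISTINCT id(b) AS friend_of_friend;
```

The hop count is part of the query itself, so widening the search to three or four hops is a one-token change rather than an extra join.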
- Finding key people
- Identifying clusters of people and communities
- Determining the closeness between two users
- Recommending new friends
- Pinpointing important content using common neighbors
- Pushing information feeds based on friend relationships and geographic location
- Using spatio-temporal relationship mapping to query the relationships between people, finding those whose paths crossed in time and space, and the provinces they visited
I will dive into each application with hands-on demonstrations over the next few episodes to show what graph databases can do. Stay tuned!
Social Network Graph Modeling
To showcase the SNS graph use cases, I'll build most of the examples on a typical small social network. I started by adding extra data on top of the NebulaGraph default dataset, basketballplayer:
Three new tags of vertices:
- address
- place
- post
Four new types of edges:
- created_post
- commented_at
- lived_in
- belong_to
It looks like this:
Importing the data
Load the default dataset
In the Command Line Console, we can execute `:play basketballplayer` to load the default dataset.
Alternatively, in NebulaGraph Studio or Explorer, click Download under Demos on the welcome page:
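To confirm that the default dataset landed, something like the following can be run in the console. This is only a sketch: it assumes the graph space is named basketballplayer, as the console prompt later in this post suggests.

```ngql
# Switch to the demo space and list its schema.
USE basketballplayer;
SHOW TAGS;
SHOW EDGES;
```

At this point the lists should include the player and team tags and the follow and serve edges that the rest of this post builds on.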
Add the SNS Graph schema
First, the DDL for those new tags and edges:
CREATE TAG IF NOT EXISTS post(title string NOT NULL);
CREATE EDGE IF NOT EXISTS created_post(post_time timestamp);
CREATE EDGE IF NOT EXISTS commented_at(post_time timestamp);
CREATE TAG IF NOT EXISTS address(address string NOT NULL, `geo_point` geography(point));
CREATE TAG IF NOT EXISTS place(name string NOT NULL, `geo_point` geography(point));
CREATE EDGE IF NOT EXISTS belong_to();
CREATE EDGE IF NOT EXISTS lived_in();
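Schema changes in NebulaGraph are applied asynchronously, and the official documentation advises waiting a couple of heartbeat cycles (about 20 seconds with default settings) before writing data that uses a new tag or edge type. A quick way to check that the schema is visible before moving on might look like this sketch, which only uses the names created above:

```ngql
# The new tags and edge types should now appear next to player, team, follow and serve.
SHOW TAGS;
SHOW EDGES;

# Inspect one tag and one edge type to confirm their property definitions.
DESCRIBE TAG post;
DESCRIBE EDGE created_post;
```

If the INSERT statements that follow report that a tag or edge type does not exist, waiting a little longer and retrying is usually enough.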
Load the data
Then we run the DML statements to insert the vertices and edges:
INSERT VERTEX post(title) values \
"post1":("a beautify flower"), "post2":("my first bike"), "post3":("I can swim"), \
"post4":("I love you, Dad"), "post5":("I hate coriander"), "post6":("my best friend, tom"), \
"post7":("my best friend, jerry"), "post8":("Frank, the cat"), "post9":("sushi rocks"), \
"post10":("I love you, Mom"), "post11":("Let's have a party!");
INSERT EDGE created_post(post_time) values \
"player100"->"post1":(timestamp("2019-01-01 00:30:06")), \
"player111"->"post2":(timestamp("2016-11-23 10:04:50")), \
"player101"->"post3":(timestamp("2019-11-11 10:44:06")), \
"player103"->"post4":(timestamp("2014-12-01 20:45:11")), \
"player102"->"post5":(timestamp("2015-03-01 00:30:06")), \
"player104"->"post6":(timestamp("2017-09-21 23:30:06")), \
"player125"->"post7":(timestamp("2018-01-01 00:44:23")), \
"player106"->"post8":(timestamp("2019-01-01 00:30:06")), \
"player117"->"post9":(timestamp("2022-01-01 22:23:30")), \
"player108"->"post10":(timestamp("2011-01-01 10:00:30")), \
"player100"->"post11":(timestamp("2021-11-01 11:10:30"));
INSERT EDGE commented_at(post_time) values \
"player105"->"post1":(timestamp("2019-01-02 00:30:06")), \
"player109"->"post1":(timestamp("2016-11-24 10:04:50")), \
"player113"->"post3":(timestamp("2019-11-13 10:44:06")), \
"player101"->"post4":(timestamp("2014-12-04 20:45:11")), \
"player102"->"post1":(timestamp("2015-03-03 00:30:06")), \
"player103"->"post1":(timestamp("2017-09-23 23:30:06")), \
"player102"->"post7":(timestamp("2018-01-04 00:44:23")), \
"player101"->"post8":(timestamp("2019-01-04 00:30:06")), \
"player106"->"post9":(timestamp("2022-01-02 22:23:30")), \
"player105"->"post10":(timestamp("2011-01-11 10:00:30")), \
"player130"->"post1":(timestamp("2019-01-02 00:30:06")), \
"player131"->"post2":(timestamp("2016-11-24 10:04:50")), \
"player131"->"post3":(timestamp("2019-11-13 10:44:06")), \
"player133"->"post4":(timestamp("2014-12-04 20:45:11")), \
"player132"->"post5":(timestamp("2015-03-03 00:30:06")), \
"player134"->"post6":(timestamp("2017-09-23 23:30:06")), \
"player135"->"post7":(timestamp("2018-01-04 00:44:23")), \
"player136"->"post8":(timestamp("2019-01-04 00:30:06")), \
"player137"->"post9":(timestamp("2022-01-02 22:23:30")), \
"player138"->"post10":(timestamp("2011-01-11 10:00:30")), \
"player141"->"post1":(timestamp("2019-01-03 00:30:06")), \
"player142"->"post2":(timestamp("2016-11-25 10:04:50")), \
"player143"->"post3":(timestamp("2019-11-14 10:44:06")), \
"player144"->"post4":(timestamp("2014-12-05 20:45:11")), \
"player145"->"post5":(timestamp("2015-03-04 00:30:06")), \
"player146"->"post6":(timestamp("2017-09-24 23:30:06")), \
"player147"->"post7":(timestamp("2018-01-05 00:44:23")), \
"player148"->"post8":(timestamp("2019-01-05 00:30:06")), \
"player139"->"post9":(timestamp("2022-01-03 22:23:30")), \
"player140"->"post10":(timestamp("2011-01-12 10:01:30")), \
"player141"->"post1":(timestamp("2019-01-04 00:34:06")), \
"player102"->"post2":(timestamp("2016-11-26 10:06:50")), \
"player103"->"post3":(timestamp("2019-11-15 10:45:06")), \
"player104"->"post4":(timestamp("2014-12-06 20:47:11")), \
"player105"->"post5":(timestamp("2015-03-05 00:32:06")), \
"player106"->"post6":(timestamp("2017-09-25 23:31:06")), \
"player107"->"post7":(timestamp("2018-01-06 00:46:23")), \
"player118"->"post8":(timestamp("2019-01-06 00:35:06")), \
"player119"->"post9":(timestamp("2022-01-04 22:26:30")), \
"player110"->"post10":(timestamp("2011-01-15 10:00:30")), \
"player111"->"post1":(timestamp("2019-01-06 00:30:06")), \
"player104"->"post11":(timestamp("2022-01-15 10:00:30")), \
"player125"->"post11":(timestamp("2022-02-15 10:00:30")), \
"player113"->"post11":(timestamp("2022-03-15 10:00:30")), \
"player102"->"post11":(timestamp("2022-04-15 10:00:30")), \
"player108"->"post11":(timestamp("2022-05-15 10:00:30"));
INSERT VERTEX `address` (`address`, `geo_point`) VALUES \
"addr_0":("Brittany Forge Apt. 718 East Eric WV 97881", ST_Point(1,2)),\
"addr_1":("Richard Curve Kingstad AZ 05660", ST_Point(3,4)),\
"addr_2":("Schmidt Key Lake Charles AL 36174", ST_Point(13.13,-87.65)),\
"addr_3":("5 Joanna Key Suite 704 Frankshire OK 03035", ST_Point(5,6)),\
"addr_4":("1 Payne Circle Mitchellfort LA 73053", ST_Point(7,8)),\
"addr_5":("2 Klein Mission New Annetteton HI 05775", ST_Point(9,10)),\
"addr_6":("1 Vanessa Stravenue Suite 184 Baileyville NY 46381", ST_Point(11,12)),\
"addr_7":("John Garden Port John LA 54602", ST_Point(13,14)),\
"addr_8":("11 Webb Groves Tiffanyside MN 14566", ST_Point(15,16)),\
"addr_9":("70 Robinson Locks Suite 113 East Veronica ND 87845", ST_Point(17,18)),\
"addr_10":("24 Mcknight Port Apt. 028 Sarahborough MD 38195", ST_Point(19,20)),\
"addr_11":("0337 Mason Corner Apt. 900 Toddmouth FL 61464", ST_Point(21,22)),\
"addr_12":("7 Davis Station Apt. 691 Pittmanfort HI 29746", ST_Point(23,24)),\
"addr_13":("1 Southport Street Apt. 098 Westport KY 85907", ST_Point(120.12,30.16)),\
"addr_14":("Weber Unions Eddieland MT 64619", ST_Point(25,26)),\
"addr_15":("1 Amanda Freeway Lisaland NJ 94933", ST_Point(27,28)),\
"addr_16":("2 Klein HI 05775", ST_Point(9,10)),\
"addr_17":("Schmidt Key Lake Charles AL 13617", ST_Point(13.12, -87.60)),\
"addr_18":("Rodriguez Track East Connorfort NC 63144", ST_Point(29,30));
INSERT VERTEX `place` (`name`, `geo_point`) VALUES \
"WV":("West Virginia", ST_Point(1,2.5)),\
"AZ":("Arizona", ST_Point(3,4.5)),\
"AL":("Alabama", ST_Point(13.13,-87)),\
"OK":("Oklahoma", ST_Point(5,6.1)),\
"LA":("Louisiana", ST_Point(7,8.1)),\
"HI":("Hawaii", ST_Point(9,10.1)),\
"NY":("New York", ST_Point(11,12.1)),\
"MN":("Minnesota", ST_Point(15,16.1)),\
"ND":("North Dakota", ST_Point(17,18.1)),\
"FL":("Florida", ST_Point(21,22.1)),\
"KY":("Kentucky", ST_Point(120.12,30)),\
"MT":("Montana", ST_Point(25,26.1)),\
"NJ":("New Jersey", ST_Point(27,28.1)),\
"NC":("North Carolina", ST_Point(29,30.1));
INSERT EDGE `belong_to`() VALUES \
"addr_0"->"WV":(),\
"addr_1"->"AZ":(),\
"addr_2"->"AL":(),\
"addr_3"->"OK":(),\
"addr_4"->"LA":(),\
"addr_5"->"HI":(),\
"addr_6"->"NY":(),\
"addr_7"->"LA":(),\
"addr_8"->"MN":(),\
"addr_9"->"ND":(),\
"addr_10"->"MD":(),\
"addr_11"->"FL":(),\
"addr_12"->"HI":(),\
"addr_13"->"KY":(),\
"addr_14"->"MT":(),\
"addr_15"->"NJ":(),\
"addr_16"->"HI":(),\
"addr_17"->"AL":(),\
"addr_18"->"NC":();
INSERT EDGE `lived_in`() VALUES \
"player100"->"addr_4":(),\
"player101"->"addr_7":(),\
"player102"->"addr_2":(),\
"player103"->"addr_3":(),\
"player104"->"addr_0":(),\
"player105"->"addr_5":(),\
"player106"->"addr_6":(),\
"player107"->"addr_1":(),\
"player108"->"addr_8":(),\
"player109"->"addr_9":(),\
"player110"->"addr_10":(),\
"player111"->"addr_11":(),\
"player112"->"addr_12":(),\
"player113"->"addr_13":(),\
"player114"->"addr_14":(),\
"player115"->"addr_15":(),\
"player116"->"addr_16":(),\
"player117"->"addr_17":(),\
"player118"->"addr_18":();
First glance at the data
Let's start with the stats of the data.
[basketballplayer]> SUBMIT JOB STATS;
+------------+
| New Job Id |
+------------+
| 10         |
+------------+
[basketballplayer]> SHOW STATS;
+---------+----------------+-------+
| Type    | Name           | Count |
+---------+----------------+-------+
| "Tag"   | "address"      | 19    |
| "Tag"   | "place"        | 14    |
| "Tag"   | "player"       | 51    |
| "Tag"   | "post"         | 10    |
| "Tag"   | "team"         | 30    |
| "Edge"  | "belong_to"    | 19    |
| "Edge"  | "commented_at" | 40    |
| "Edge"  | "created_post" | 10    |
| "Edge"  | "follow"       | 81    |
| "Edge"  | "lived_in"     | 19    |
| "Edge"  | "serve"        | 152   |
| "Space" | "vertices"     | 124   |
| "Space" | "edges"        | 321   |
+---------+----------------+-------+
Got 13 rows (time spent 1038/51372 us)
These counts appear to predate post11. With the DML exactly as listed above, expect 11 for both post and created_post and 45 for commented_at: that statement has 46 rows, but "player141"->"post1" appears twice, and the second write overwrites the first rather than adding an edge.
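Beyond the aggregate counts, a few spot checks can confirm that the new vertices and edges are connected as intended. The queries below are a sketch assuming NebulaGraph 3.x syntax; the vertex IDs are taken from the DML above, and the column aliases are arbitrary.

```ngql
# Properties of a single post vertex.
FETCH PROP ON post "post1" YIELD properties(vertex) AS post;

# Posts created by player100, with their creation time.
GO FROM "player100" OVER created_post YIELD dst(edge) AS post_id, properties(edge).post_time AS created_at;

# Who commented on post11: walk the commented_at edges backwards from the post.
GO FROM "post11" OVER commented_at REVERSELY YIELD src(edge) AS commenter;

# The address player100 lived at and the place that address belongs to.
MATCH (p:player)-[:lived_in]->(a:`address`)-[:belong_to]->(pl:place) WHERE id(p) == "player100" RETURN properties(a).address AS address, properties(pl).name AS place;
```

According to the DML, player100 is linked to addr_4, which belongs to LA, so the last query is a convenient end-to-end check of the lived_in and belong_to edges.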
We can fetch all of the data:
MATCH ()-[e]->() RETURN e LIMIT 10000;
As the data volume is quite small, we can render all of it on the canvas of NebulaGraph Explorer:
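On a larger graph, pulling every edge is rarely practical. One alternative, sketched here under the assumption of NebulaGraph 3.x syntax, is to fetch only the neighborhood of a single vertex; "post1" and the aliases nodes and relationships are illustrative choices.

```ngql
# One-hop subgraph around post1: its author and commenters, with properties.
GET SUBGRAPH WITH PROP 1 STEPS FROM "post1" IN created_post, commented_at YIELD VERTICES AS nodes, EDGES AS relationships;
```

The result can be rendered on the same canvas and keeps the picture readable when the full graph would not be.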
Those are the two fundamental steps, graph modeling and data import, for starting to explore a social network. I'll introduce the other applications in the next episode.