Graph-computing
How to calculate the Betweenness Centrality against NebulaGraph
Betweenness Centrality (BC for short) reflects the significance of vertices in the entire network. This article will introduce how to compute Betweenness Centrality against NebulaGraph.
Relevant Concepts
Centrality represents how central a vertex is in the entire network graph, including Degree Centrality, Closeness Centrality, and Betweenness Centrality, etc. Degree Centrality reflects the popularity of a vertex by counting the number of its incoming and outgoing edges, while Closeness Centrality computes the sum of the length of the shortest paths between a vertex and all other vertices in the graph. Thus, the more central a vertex is, the closer it is to all other vertices.
Betweenness Centrality counts the number of times a vertex appears on the shortest path between any two other vertices, so as to represent the significance of this vertex to the network connectivity.
The Betweenness Centrality of a vertex is the proportion of the number of paths passing through this vertex in all the shortest paths to the total number of shortest paths.
Calculating the Betweenness Centrality of a vertex in a graph can be conducted in a weighted graph or in an unweighted graph. For unweighted graphs, Breadth-First Search (BFS for short) is used to find the shortest path, while for weighted graphs, Dijkstra’s algorithm is used.
The following algorithms are all targeted at undirected graphs.
Applicable Scenarios
Betweenness Centrality reflects the significance of vertices in the entire network by measuring how a vertex bridges all other vertices in a graph or network. As we can see, Vertex C in the following figure acts as an important bridging vertex.
Betweenness Centrality can be used to identify
a. The intermediary entities in anti-fraud scenarios in the field of financial risk control.
b. Specific disease control genes in the medical field to improve drug targets.
Betweenness Centrality Algorithm
The Betweenness Centrality of a vertex can be computed as follows:
$$C_B = \sum_{s{\not=} v {\not=} t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}$$
(Formula 1)
In this formula,
$\sigma_{st}(v)$ is the number of shortest paths from Vertex s
to Vertex t
.
$\sigma_{st}$ is the number of shortest paths that pass through Vertex v
.
Vertex s
and Vertex t
are a pair of vertices belonging to the vertex set.
To make it more convenient, the betweenness of each pair of vertices can be computed as:
$$\delta_{st}(v) = \frac{\sigma_{st}(v)}{\sigma_{st}}$$
(Formula 2)
So Formula 1 can be replaced by Formula 2, which gives rise to Formula 3 as follows:
$$C_B(v) = \sum_{s{\not=} v {\not=} t \in V} \delta_{st}(v)$$
(Formula 3)
Solution Procedure
To get the Betweenness Centrality of Vertex v
, namely to get $$\frac{, we need to know whether Vertex v
lies on the path from Vertex s
to Vertex t
.
(1) To know whether Vertex v
lies on the shortest path from Vertex s
to Vertex t
, use the following formula $d_G$ represents the shortest path from Vertex s
to Vertex t
):
If Vertex v
lies on the shortest path from Vertex s
to Vertex t
, then $$d_G(s,t) = d_G(s,v) + d_G(v,t)$$ is satisfied.
(Formula 4)
$d_G(s,v)$ and $d_G(v,t)$ are mutually independent. According to the rules of combination, the total number of shortest paths from Vertex s
to Vertex t
is the product of the number of shortest paths from Vertex s
to Vertex t
and the number of shortest paths from Vertex s
to Vertex t
.
Based on the above two situations, Formula 5 can be inferred:
$$\sigma_{st}(v) = \begin{cases}
\sigma_{sv} \times \sigma_{vt} &\text{if } d(s,v) + d(v,t) = d(s,t) \
0 &\text{if } other
\end{cases}$$
(Formula 5)
(2)According to the above formula, we can get the conclusion that the number of shortest paths from Vertex s
to Vertex t
that pass through Vertex w
is the result of
$\sigma_{st}(w) = \sigma_{sw} \times \sigma_{wt}$. In the graph, Vertex v
is the preceding vertex of Vertex w
. Therefore, the formula to count the number of shortest paths from Vertex s
to Vertex t
passing through Vertex v
to Vertex w
is:
$\sigma_{st}(v,{v,w}) = \sigma_{sv} \times \sigma_{wt}$
(Formula 6)
Here are two situations, $$t = w$$ and $$t \not= w$$.
a. If $$t = w$$, then $$\sigma_{wt}$$ will not exist and we can get
$$\delta(v,{v,w}) = \frac{\sigma_{sv}}{\sigma_{sw}}$$
(Formula 7)
b. If $$t \not= w$$, then we can get
$$\delta(v,{v,w}) = \frac{\sigma_{sw}(v)}{\sigma_{sw}} \times \frac{\sigma_{st}(w)}{\sigma_{st}} $$
(Formula 8)
(3) So considering the above two situations, we can get
$$\delta_s(v) = \sum_{w:v \in P_s(w)}(\frac{\sigma_{sw}(v)}{\sigma_{sw}} + \sum_{t \not= w \in V}\frac{\sigma_{sw}(v)}{\sigma_{sw}} \times \frac{\sigma_{st}(w)}{\sigma_{st}}) \ = \sum_{w:v \in P_s(w)}\frac{\sigma_{sw}(v)}{\sigma_{sw}}(1 + \sum_{t \not= w \in V} \frac{\sigma_{st}(w)}{\sigma_{st}}) \ = \sum_{w:v \in P_s(w)}\frac{\sigma_{sw}(v)}{\sigma_{sw}} (1 + \delta_s(w)) $$
(Formula 9)
In $$w:v \in P_s(w)$$, Vertex v
is the predecessor of Vertex w
in the path from Vertex s
to Vertex w
.
(4) According to the above formula that gets the result of $\delta_{sv}$, the algorithm workflow of Betweenness Centrality in unweighted graphs can be given as follows:
For unweighted graphs, follow the above process.
To calculate the Betweenness Centrality in weighted graphs requires Dijkstra's algorithm, that is, to change the code in the first while loop.
The Betweenness Centrality against NebulaGraph has achieved the algorithms for both weighted and unweighted graphs. For the code, see https://github.com/vesoft-inc/nebula-algorithm/blob/master/nebula-algorithm/src/main/scala/com/vesoft/nebula/algorithm/lib/BetweennessCentralityAlgo.scala.
Computation Examples
Firstly, read the graph data in NebulaGraph to specify the edge data for data reading.
Secondly, make a topological graph based on the edge data of NebulaGraph and perform centrality computation.
The graph data read in NebulaGraph can be illustrated in the following unweighted graph:
Compute the BC of Vertex 1:
The vertex pair with the shortest path passing through Vertex 1 | The total number of shortest paths between the vertex pair | The number of the shortest path passing through Vertex 1 |
---|---|---|
2-4 | 3 (2-3-4, 2-5-4, 2-1-4) | 1 |
The BC of Vertex 1: | 1/3 |
Compute the BC of Vertex 2:
The vertex pair with the shortest path passing through Vertex 2 | The total number of shortest paths between the vertex pair | The number of the shortest path passing through Vertex 2 |
---|---|---|
1-3 | 2 (1-2-3, 1-4-3) | 1 |
3-5 | 2(3-2-5, 3-4-5) | 1 |
The BC of Vertex 2: | 1 |
Compute the BC of Vertex 3:
The vertex pair with the shortest path passing through Vertex 3 | The total number of shortest paths between the vertex pair | The number of the shortest path passing through Vertex 3 |
---|---|---|
2-4 | 3 (2-3-4, 2-5-4, 2-1-4) | 1 |
The BC of Vertex 3: | 1/3 |
Compute the BC of Vertex 4:
The vertex pair with the shortest path passing through Vertex 4 | The total number of shortest paths between the vertex pair | The number of the shortest path passing through Vertex 4 |
---|---|---|
1-3 | 2 (1-4-3, 1-2-3) | 1 |
3-5 | 2(3-4-5, 3-2-5) | 1 |
The BC of Vertex 4: | 1 |
Compute the BC of Vertex 5:
The vertex pair with the shortest path passing through Vertex 5 | The total number of shortest paths between the vertex pairs | The number of the shortest path passing through Vertex 5 |
---|---|---|
2-4 | 3 (2-3-4, 2-5-4, 2-1-4) | 1 |
The BC of Vertex 5: | 1/3 |
Therefore, the BC of each vertex is: Vertex 1: 1/3 Vertex 2: 1 Vertex 3: 1/3 Vertex 4: 1 Vertex 5: 1/3
Result Examples
Data: Read the edge data in the NebulaGraph test, and take srcId, dstId, and rank as the triplet of edges in the topological graph (Source Vertex, Destination Vertex, and Rank).
(root@nebula) [test]> match (v:node) -[e:relation] -> () return e
+------------------------------------+
| e |
+------------------------------------+
| [:relation "3"->"4" @1 {col: "f"}] |
+------------------------------------+
| [:relation "2"->"3" @2 {col: "d"}] |
+------------------------------------+
| [:relation "2"->"5" @4 {col: "e"}] |
+------------------------------------+
| [:relation "4"->"5" @2 {col: "g"}] |
+------------------------------------+
| [:relation "1"->"5" @1 {col: "a"}] |
+------------------------------------+
| [:relation "1"->"2" @3 {col: "b"}] |
+------------------------------------+
| [:relation "1"->"4" @5 {col: "c"}] |
+------------------------------------+
Read the edge data in NebulaGraph, set the graph as unweighted, and calculate the Betweenness Centrality of each vertex. The results are as follows:
vid: 4 BC: 1.0
vid: 1 BC: 0.3333333333333333
vid: 3 BC: 0.3333333333333333
vid: 5 BC: 0.3333333333333333
vid: 2 BC: 1.0
Read the edge data of NebulaGraph, set the graph as weighted, and calculate the Betweenness Centrality of each vertex. The results are as follows:
vid: 4 BC: 2.0
vid: 1 BC: 0.5
vid: 3 BC: 1.0
vid: 5 BC: 2.0
vid: 2 BC: 0.0
References
- Paper: A Faster Algorithm for Betweenness Centrality
- The source code of Python's NetworkX realizing Betweenness Centrality: https://github.com/networkx/networkx/blob/master/networkx/algorithms/centrality
Join our Slack channel if you want to discuss with the rest of the NebulaGraph community!