Validating Import Performance of Nebula Importer
Machine Specifications for Testing
Host Name | OS | CPU Architecture | CPU Cores | Memory | Disk |
---|---|---|---|---|---|
hadoop10 | CentOS 7.6 | x86_64 | 32 | 128 GB | 1.8 TB |
hadoop11 | CentOS 7.6 | x86_64 | 32 | 64 GB | 1 TB |
hadoop12 | CentOS 7.6 | x86_64 | 16 | 64 GB | 1 TB |
Environment of NebulaGraph Cluster
- Operating System: CentOS 7.5+
- Software required by the NebulaGraph cluster: gcc 7.1.0+, cmake 3.5.0, glibc 2.12+, and other necessary dependencies, which can be installed as follows:
```bash
yum update
yum install -y make \
               m4 \
               git \
               wget \
               unzip \
               xz \
               readline-devel \
               ncurses-devel \
               zlib-devel \
               gcc \
               gcc-c++ \
               cmake \
               gettext \
               curl \
               redhat-lsb-core
```
- NebulaGraph version: V2.0.0
- Back-end storage: Three nodes, RocksDB
Process \ Host Name | hadoop10 | hadoop11 | hadoop12 |
---|---|---|---|
# of metad processes | 1 | 1 | 1 |
# of storaged processes | 1 | 1 | 1 |
# of graphd processes | 1 | 1 | 1 |
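Once the three processes are running on each host, a quick check from Nebula Console confirms that all storage hosts are online. `SHOW HOSTS` is a standard nGQL statement; the sketch below assumes the console is already connected to the cluster.

```ngql
# List the storaged hosts registered with the cluster and their status.
SHOW HOSTS;
```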
Preparing Data and Introducing Data Format
# of Vertices / File Size | # of Edges / File Size | # of Vertices and Edges / File Size |
---|---|---|
74,314,635 / 4.6 GB | 139,951,301 / 6.6 GB | 214,265,936 / 11.2 GB |
More details about the data:
- edge.csv: 139,951,301 records in total, 6.6 GB
- vertex.csv: 74,314,635 records in total, 4.6 GB
- 214,265,936 vertices and edges in total, 11.2 GB
```bash
[root@hadoop10 datas]# wc -l edge.csv
139951301 edge.csv
[root@hadoop10 datas]# head -10 vertex.csv
-201035082963479683,实体
-1779678833482502384,值
4646408208538057683,胶饴
-1861609733419239066,别名: 饴糖、畅糖、畅、软糖。
-2047289935702608120,词条
5842706712819643509,词条(拼音:cí tiáo)也叫词目,是辞书学用语,指收列的词语及其释文。
-3063129772935425027,文化
-2484942249444426630,红色食品
-3877061284769534378,红色食品是指食品为红色、橙红色或棕红色的食品。
-3402450096279275143,否
[root@hadoop10 datas]# wc -l vertex.csv
74314635 vertex.csv
[root@hadoop10 datas]# head -10 edge.csv
-201035082963479683,-1779678833482502384,属性
4646408208538057683,-1861609733419239066,描述
-2047289935702608120,5842706712819643509,描述
-2047289935702608120,-3063129772935425027,标签
-2484942249444426630,-3877061284769534378,描述
-2484942249444426630,-2484942249444426630,中文名
-2484942249444426630,-3402450096279275143,是否含防腐剂
-2484942249444426630,4786182067583989997,主要食用功效
-2484942249444426630,-8978611301755314833,适宜人群
-2484942249444426630,-382812815618074210,用途
```
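Before running the import, it can also be worth confirming that every row has the expected number of comma-separated fields (two for vertices, three for edges). A minimal sketch of such a check, assuming no field values contain embedded commas:

```bash
# Count rows whose field count differs from the expected value;
# both commands should print 0 for well-formed files.
awk -F',' 'NF != 2' vertex.csv | wc -l
awk -F',' 'NF != 3' edge.csv | wc -l
```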
Validating Solution
Solution: Use Nebula Importer to import the data in batches.
Edit a YAML configuration file for the import.
```yaml
version: v1rc1
description: example
clientSettings:
  concurrency: 10 # number of graph clients
  channelBufferSize: 128
  space: test2
  connection:
    user: user
    password: password
    address: 191.168.7.10:9669,191.168.7.11:9669,191.168.7.12:9669
logPath: ./err/test.log
files:
  - path: ./vertex.csv
    failDataPath: ./err/vertex.csv
    batchSize: 100
    type: csv
    csv:
      withHeader: false
      withLabel: false
    schema:
      type: vertex
      vertex:
        tags:
          - name: entity
            props:
              - name: name
                type: string
  - path: ./edge.csv
    failDataPath: ./err/edge.csv
    batchSize: 100
    type: csv
    csv:
      withHeader: false
      withLabel: false
    schema:
      type: edge
      edge:
        name: relation
        withRanking: false
        props:
          - name: name
            type: string
```
Create schema
In Nebula Console, create a graph space, and then create the tag and edge type in it.
```ngql
# 1. Create a graph space.
(admin@nebula) [(none)]> create space test2(vid_type = FIXED_STRING(64));
# 2. Switch to the specified graph space.
(admin@nebula) [(none)]> use test2;
# 3. Create a tag.
(admin@nebula) [test2]> create tag entity(name string);
# 4. Create an edge type.
(admin@nebula) [test2]> create edge relation(name string);
# 5. View the definition of the tag.
(admin@nebula) [test2]> describe tag entity;
+--------+----------+-------+---------+
| Field  | Type     | Null  | Default |
+--------+----------+-------+---------+
| "name" | "string" | "YES" |         |
+--------+----------+-------+---------+
Got 1 rows (time spent 703/1002 us)
# 6. View the definition of the edge type.
(admin@nebula) [test2]> describe edge relation;
+--------+----------+-------+---------+
| Field  | Type     | Null  | Default |
+--------+----------+-------+---------+
| "name" | "string" | "YES" |         |
+--------+----------+-------+---------+
Got 1 rows (time spent 703/1041 us)
```
Compile
Compile Nebula Importer and then run it, specifying the YAML configuration file.
```bash
# Compile Nebula Importer.
make build

# Run Nebula Importer, specifying the YAML configuration file.
/opt/software/nebulagraph/nebula-importer/nebula-importer --config /opt/software/datas/rdf-import2.yaml
```
View the output
```text
# View part of logs.
2021/04/19 19:05:55 [INFO] statsmgr.go:61: Tick: Time(2400.00s), Finished(210207018), Failed(0), Latency AVG(32441us), Batches Req AVG(33824us), Rows AVG(87586.25/s)
2021/04/19 19:06:00 [INFO] statsmgr.go:61: Tick: Time(2405.00s), Finished(210541418), Failed(0), Latency AVG(32461us), Batches Req AVG(33844us), Rows AVG(87543.20/s)
2021/04/19 19:06:05 [INFO] statsmgr.go:61: Tick: Time(2410.00s), Finished(210901218), Failed(0), Latency AVG(32475us), Batches Req AVG(33857us), Rows AVG(87510.88/s)
2021/04/19 19:06:10 [INFO] statsmgr.go:61: Tick: Time(2415.00s), Finished(211270318), Failed(0), Latency AVG(32486us), Batches Req AVG(33869us), Rows AVG(87482.50/s)
2021/04/19 19:06:15 [INFO] statsmgr.go:61: Tick: Time(2420.00s), Finished(211685318), Failed(0), Latency AVG(32490us), Batches Req AVG(33873us), Rows AVG(87473.27/s)
2021/04/19 19:06:20 [INFO] statsmgr.go:61: Tick: Time(2425.00s), Finished(211959718), Failed(0), Latency AVG(32517us), Batches Req AVG(33900us), Rows AVG(87406.07/s)
2021/04/19 19:06:25 [INFO] statsmgr.go:61: Tick: Time(2430.00s), Finished(212220818), Failed(0), Latency AVG(32545us), Batches Req AVG(33928us), Rows AVG(87333.67/s)
2021/04/19 19:06:30 [INFO] statsmgr.go:61: Tick: Time(2435.00s), Finished(212433518), Failed(0), Latency AVG(32579us), Batches Req AVG(33963us), Rows AVG(87241.69/s)
2021/04/19 19:06:35 [INFO] statsmgr.go:61: Tick: Time(2440.00s), Finished(212780818), Failed(0), Latency AVG(32593us), Batches Req AVG(33977us), Rows AVG(87205.25/s)
2021/04/19 19:06:40 [INFO] statsmgr.go:61: Tick: Time(2445.01s), Finished(213240518), Failed(0), Latency AVG(32589us), Batches Req AVG(33973us), Rows AVG(87214.69/s)
2021/04/19 19:06:40 [INFO] reader.go:180: Total lines of file(/opt/software/datas/edge.csv) is: 139951301, error lines: 0
2021/04/19 19:06:42 [INFO] statsmgr.go:61: Done(/opt/software/datas/edge.csv): Time(2446.70s), Finished(213307919), Failed(0), Latency AVG(32585us), Batches Req AVG(33968us), Rows AVG(87181.95/s)
2021/04/19 19:06:42 Finish import data, consume time: 2447.20s
2021/04/19 19:06:43 --- END OF NEBULA IMPORTER ---
```
Pay special attention to the statistics in the final lines of the output.
```text
Time(2446.70s), Finished(213307919), Failed(0), Latency AVG(32585us), Batches Req AVG(33968us), Rows AVG(87181.95/s)
2021/04/19 19:06:42 Finish import data, consume time: 2447.20s
2021/04/19 19:06:43 --- END OF NEBULA IMPORTER ---
```
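As a sanity check, assuming `Rows AVG` is simply the number of finished rows divided by the elapsed time, the reported throughput can be reproduced with a one-liner; the small difference from the logged value comes from timer rounding.

```bash
# 213,307,919 rows in 2,446.70 seconds is roughly 87,182 rows per second.
awk 'BEGIN { printf "%.2f rows/s\n", 213307919 / 2446.70 }'
```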
Resource Requirements
The import places high demands on machine specifications, including the number of CPU cores, memory size, and disk size.
(Resource usage monitoring of hadoop10, hadoop11, and hadoop12 during the import; screenshots omitted.)
Recommendations on the machine specifications:
- By comparing the memory consumption of the three machines, we found that memory consumption is high when more than 200 million records are imported, so we recommend configuring as much memory as possible.
- For information about the required CPU cores and disk size, see the documentation: https://docs.nebula-graph.io.
nGQL Statements Test
nGQL is the native graph query language of NebulaGraph and is compatible with openCypher. For now, nGQL does not support scanning all vertices and edges; for example, `MATCH (v) RETURN v` is not supported yet. Make sure that at least one index is available for a `MATCH` statement. If you create an index when the related vertices, edges, or properties already exist, rebuild the index after creating it to make it take effect.
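As an illustration of this workflow, the sketch below creates an index on the `entity` tag after the import and then rebuilds it; the index name and the string prefix length here are assumptions, not part of the original test.

```ngql
# Create an index on the name property of the entity tag;
# string properties require a fixed prefix length.
CREATE TAG INDEX entity_name_index ON entity(name(64));

# Rebuild the index so that vertices inserted before its creation are indexed.
REBUILD TAG INDEX entity_name_index;
```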
Run the following command to test whether nGQL is compatible with openCypher.
```bash
# Test openCypher statements by importing an nGQL file.
./nebula-console -addr 191.168.7.10 -port 9669 -u user -p password -t 120 -f /opt/software/datas/basketballplayer-2.X.ngql
```
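After the file is loaded, an openCypher-style query can be run against the dataset. A minimal sketch, assuming the standard basketballplayer dataset from the NebulaGraph documentation with an index on the `player` tag created and rebuilt:

```ngql
USE basketballplayer;
# An openCypher-style pattern match; it requires an index on the player tag.
MATCH (v:player) WHERE v.name == "Tim Duncan" RETURN v;
```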
Conclusion
This test validated the performance of importing a large amount of data into a three-node NebulaGraph cluster. The batch writing performance of Nebula Importer meets the requirements of production scenarios. However, to import the data as CSV files, the files must be stored in HDFS, and a YAML configuration file is needed to specify the configuration of the tags and edge types for the tool to process.
Would you like to know more about NebulaGraph? Join the Slack channel!