Community Contribution | Nebula Graph 2.0 Performance Testing
This article is shared by Fanfan from the Nebula Graph community. It covers his experience with performance testing on Nebula Graph 2.0 and with optimizing the data import performance of Nebula Importer. In this article, “I” refers to the author.
I did some research on Nebula Graph and ran tests to evaluate its performance. I got a lot of help from the Nebula Graph team along the way, and I would like to thank them.
In this article, I describe my test process in the hope that it gives you some ideas for your own performance testing on Nebula Graph. If you have other ideas, please share them with me.
Deploy Nebula Graph Cluster
In the test, four machines were used to deploy a Nebula Graph cluster: Machine 1, Machine 2, Machine 3, and Machine 4.
Here is the hardware configuration of these machines:
Nebula Graph was installed with the RPM package. For more information, see Nebula Graph Database Manual. Other tools were also needed: nebula-importer-2.0 and nebula-bench-2.0. Their source code was downloaded and compiled on every machine.
The data structure for the testing is composed of 7 tags and 15 edge types. The dataset is simple and not large in size, about 34 million records in total. It needed to be preprocessed to become vertex or edge tables.
A graph space was created with vid set to 100, replica_factor set to 3, and partition_num set to 100.
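As a rough illustration, the graph space settings could correspond to a CREATE SPACE statement like the one built below. This is only a sketch under assumptions: the space name perf_test and the helper create_space_statement are hypothetical, and it assumes that "vid" refers to a fixed-length string VID, that is, vid_type = FIXED_STRING(100) in Nebula Graph 2.0 syntax.

```python
def create_space_statement(name: str, partition_num: int,
                           replica_factor: int, vid_length: int) -> str:
    """Build an nGQL CREATE SPACE statement for a fixed-length string VID."""
    return (
        f"CREATE SPACE IF NOT EXISTS {name} "
        f"(partition_num={partition_num}, "
        f"replica_factor={replica_factor}, "
        f"vid_type=FIXED_STRING({vid_length}));"
    )


# "perf_test" is a made-up space name; the numbers come from the article.
print(create_space_statement("perf_test", 100, 3, 100))
```

The printed statement can be pasted into the Nebula console after adjusting the space name.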
Optimize Data Import Performance of Nebula Importer
At the beginning of the test, I used Nebula Importer to import data to Nebula Graph with its default configuration. The speed was only about 30 thousand records per second, which was unacceptable.
To optimize the import performance, I read the documentation of Nebula Importer and found that only three parameters needed to be adjusted: concurrency, channelBufferSize, and batchSize.
I tried modifying these parameters on my own, but it did not bring a significant improvement. Following the feedback from the Nebula Graph team, I made the following changes to the YAML file:
concurrency: 96 # the number of CPU cores
channelBufferSize: 20000
batchSize: 2500
After the changes, the import speed increased to 70 to 80 thousand records per second. Great. I tried larger values, but the nebula-graphd server crashed. Therefore, these parameters should be set to large values, but not too large.
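The trial-and-error search for a value that is "large, but not too large" can be sketched as a simple doubling loop. This is a minimal sketch, not how Nebula Importer works: run_import is a hypothetical hook that you would implement to run a trial import and report the speed, and fake_import with its numbers is made up purely so that the example runs.

```python
from typing import Callable, Optional


def find_batch_size(run_import: Callable[[int], Optional[float]],
                    start: int = 500, limit: int = 20000) -> int:
    """Double the batch size until the import fails or stops getting faster.

    run_import(batch_size) is a hypothetical hook: it should run a trial
    import and return records per second, or None if the cluster failed.
    """
    best_size, best_speed = start, 0.0
    size = start
    while size <= limit:
        speed = run_import(size)
        if speed is None or speed <= best_speed:
            break  # the cluster failed or throughput no longer improves
        best_size, best_speed = size, speed
        size *= 2
    return best_size


def fake_import(batch_size: int) -> Optional[float]:
    """Stand-in for a real trial import; the numbers are invented."""
    if batch_size > 4000:
        return None  # pretend the server cannot handle batches this large
    return 30.0 * batch_size


print(find_batch_size(fake_import))
```

In a real test, each trial should import the same sample of data so that the measured speeds are comparable.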
The next check was on the disk and the network. Originally, mechanical drives and a 1000 Mb/s network were used in my environment. I changed them to SSDs and a 10000 Mb/s network, and the import speed doubled to 170 thousand records per second. Therefore, hardware matters.
My last check was on the data. When I created the graph space, vid was set to a large value. How about a smaller value? The change failed because some vertices do need VIDs of the specified vid length. Then I turned to partition_num. According to Nebula Graph Database Manual, it should be set to 20 times the number of disks in the cluster, so I set it to 15. The change worked: the import speed increased to 250 thousand records per second. At that point, I was satisfied with the import performance. It may be improved further with more modifications, but I think it is acceptable for now.
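To put the reported speeds in perspective, the snippet below estimates how long a full import of the dataset would take at each stage. It is a back-of-the-envelope sketch: it assumes the dataset is exactly 34 million records and that each reported speed is sustained for the whole import, which a real run will not match exactly.

```python
TOTAL_RECORDS = 34_000_000  # approximate dataset size from the article

# Reported import speeds in records per second at each tuning stage.
SPEEDS = {
    "default settings": 30_000,
    "tuned importer settings": 80_000,  # upper end of the reported range
    "SSD and 10000 Mb/s network": 170_000,
    "partition_num set to 15": 250_000,
}


def import_seconds(total_records: int, records_per_second: int) -> float:
    """Return the estimated import time in seconds at a constant speed."""
    return total_records / records_per_second


for stage, speed in SPEEDS.items():
    minutes = import_seconds(TOTAL_RECORDS, speed) / 60
    print(f"{stage}: about {minutes:.1f} min")
```

Under these assumptions, the estimate drops from roughly 19 minutes with the default settings to a little over 2 minutes after all the changes.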
- concurrency must be set to the number of CPU cores.
- batchSize should be large enough to improve the performance, but must not exceed the load that the cluster can handle.
- SSDs and a 10000 Mb/s network are necessary.
- partition_num should be set to an appropriate value.
- The VID length, the number of properties, and the number of nebula-graphd servers may affect the performance. This is only my guess because I have not tested them.
I picked a business indicator for the pressure test.
Here is the query statement for the business indicator:
match (v:email)-[:emailid]->(mid:id)<-[:phoneid]-(phone:phone)-[:phoneid]->(ids:id)
where id(v)=="replace"
with v, count(distinct phone) as pnum, count(distinct mid) as midnum, count(distinct ids) as idsnum, sum(ids.isblack) as black
where pnum > 2 and midnum > 5 and midnum < 100 and idsnum > 5 and idsnum < 300 and black > 0
return v.value1, true as result
This statement performs a 3-hop expansion followed by conditional filtering. It queried two to four million concentrated data records.
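The "replace" literal in the statement is a placeholder for the VID of the starting vertex. The sketch below shows one way to fill it in for each test VID before sending the query. The helper build_query and the sample VIDs are hypothetical, and the escaping only covers backslashes and double quotes.

```python
QUERY_TEMPLATE = (
    'match (v:email)-[:emailid]->(mid:id)<-[:phoneid]-(phone:phone)'
    '-[:phoneid]->(ids:id) '
    'where id(v)=="replace" '
    'with v, count(distinct phone) as pnum, count(distinct mid) as midnum, '
    'count(distinct ids) as idsnum, sum(ids.isblack) as black '
    'where pnum > 2 and midnum > 5 and midnum < 100 '
    'and idsnum > 5 and idsnum < 300 and black > 0 '
    'return v.value1, true as result'
)


def build_query(vid: str) -> str:
    """Return the test query with the "replace" placeholder set to vid."""
    # Escape backslashes and double quotes so the VID stays one string literal.
    escaped = vid.replace("\\", "\\\\").replace('"', '\\"')
    return QUERY_TEMPLATE.replace('"replace"', f'"{escaped}"')


# Made-up VIDs, for illustration only.
for vid in ["user1@example.com", "user2@example.com"]:
    print(build_query(vid))
```

In a JMeter-based setup, the same substitution is normally done with a variable in the test plan instead of in Python.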
Nebula-bench needed to be modified as follows: Open the go_step.jmx configuration file of JMeter, set ThreadGroup.num_threads to the number of CPU cores, and then modify other parameters such as the loop-related parameters. The nGQL statement must be modified according to the actual situation. The variables in the nGQL statement must be replaced with
Because the tested data was relatively concentrated, the speed was about 700 records per second. When the data on all the nodes was tested, the speed rose above 6000 records per second. The concurrency performance looked good, and so did the query latency, which peaked at 300 ms.
In my environment, only one node was used, so I wanted to add one more nebula-graphd server to check the concurrency performance. After I started a new nebula-graphd server, however, there was no improvement.
At that point, Nebula Graph 2.0.1 was released, so I deployed it on the machines and imported the data again. On the new cluster, three nebula-graphd servers were deployed, and the performance tripled: the speed for concentrated data was higher than 2100 records per second, and the speed for data on all the machines was about 20 thousand records per second. It was weird, because adding a server earlier had made no difference. I guess the cause was that no COMPACT command was run after the new nebula-graphd server was added. I should give it a try next time.
Additionally, I used Linux commands rather than dedicated monitoring tools to view the database metrics, so I could not obtain accurate metrics.
- Before the test, make sure that the COMPACT command has been run to ensure the load balance of the cluster.
- Appropriately adjust the configuration of nebula-storaged by increasing the available threads and the cache memory capacity.
- The concurrency performance is affected by both data size and data distribution.
In my test, the default configurations of nebula-metad and nebula-graphd were used. Only the configuration of nebula-storaged was modified as follows:
rocksdb_block_cache=102400 # The Nebula Graph team recommend that 1/3 memory capacity should be used. In my test, 100 GB was used.
num_io_threads=48 # The number of available threads. It was set to a half of the number of CPU cores.
min_vertices_per_bucket=100 # The minimum number of vertices in a bucket.
vertex_cache_bucket_exp=8 # The total number of buckets was set to the eighth power of 2.
wal_buffer_size=16777216 # 16 M
write_buffer_size:268435456 # 256 M
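The raw numbers in this configuration are easier to check once converted to the units used in the comments. The snippet below is a small sanity check, not a tuning tool. It assumes that rocksdb_block_cache is expressed in MB (which matches the "100 GB" comment) and takes the 96 CPU cores from the importer setting earlier in the article. The helper describe is hypothetical.

```python
MIB = 1024 * 1024
CPU_CORES = 96  # from the importer setting "concurrency: 96"

settings = {
    "rocksdb_block_cache": 102400,  # assumed to be in MB
    "num_io_threads": 48,
    "vertex_cache_bucket_exp": 8,
    "wal_buffer_size": 16777216,    # bytes
    "write_buffer_size": 268435456,  # bytes
}


def describe(cfg: dict, cpu_cores: int) -> dict:
    """Convert the raw settings into the units used in the comments."""
    return {
        "block_cache_gb": cfg["rocksdb_block_cache"] / 1024,
        "io_threads_per_core": cfg["num_io_threads"] / cpu_cores,
        "vertex_cache_buckets": 2 ** cfg["vertex_cache_bucket_exp"],
        "wal_buffer_mib": cfg["wal_buffer_size"] / MIB,
        "write_buffer_mib": cfg["write_buffer_size"] / MIB,
    }


for name, value in describe(settings, CPU_CORES).items():
    print(f"{name}: {value:g}")
```

The derived values agree with the comments: a 100 GB block cache, one I/O thread per two cores, 256 buckets, a 16 MiB WAL buffer, and a 256 MiB write buffer.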
I referred to some posts on the forum and read the source code of Nebula Graph before making the preceding changes. These values came from my own experiments and are not necessarily optimal. I did not modify any other parameters.
The Nebula Graph team does not recommend that users modify the other parameters. If you are interested in them, read the source code.
In summary, my test is not professional. I ran it for specific business scenarios, and Nebula Graph performed very well.
I haven’t fully understood how to tune the parameters, and I will keep studying. If you have more ideas for performance tuning, please share them with me.