Community Contribution | NebulaGraph 2.0 Performance Testing

This article is shared by Fanfan from the NebulaGraph community. It describes his experience with performance testing NebulaGraph 2.0 and optimizing the data import performance of Nebula Importer. In this article, "I" refers to the author.

0. Background

I did some research on NebulaGraph and ran tests to evaluate its performance. During the process, I got a lot of help from the NebulaGraph team, and I would like to thank them.

In this article, I walk through my test process, hoping that it gives you some ideas for your own performance testing on NebulaGraph. If you have other ideas, please share them with me.

Deploy NebulaGraph Cluster

In the test, four machines were used to deploy a NebulaGraph cluster: Machine 1, Machine 2, Machine 3, and Machine 4.

Here is how the NebulaGraph services were distributed across these machines:

Machine 1: meta, storage
Machine 2: storage
Machine 3: storage
Machine 4: graphd

NebulaGraph was installed from the RPM package. For more information, see the NebulaGraph Database Manual. Two other tools were also needed: nebula-importer-2.0 and nebula-bench-2.0. Their source code was downloaded and compiled on every machine.

Import Data

The test schema consisted of 7 tags and 15 edge types. The dataset was simple and fairly small, about 34 million records in total. It had to be preprocessed into vertex and edge tables.

A graph space was created with the VID length set to 100, replica_factor set to 3, and partition_num set to 100.
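As a rough illustration, a graph space with these settings could be created with a statement like the one built below. This is only a sketch: the space name perf_test is hypothetical, and it assumes that the VID setting of 100 corresponds to vid_type = FIXED_STRING(100), which is the NebulaGraph 2.0 syntax for fixed-length string VIDs.

```python
def create_space_statement(name: str, vid_length: int,
                           replica_factor: int, partition_num: int) -> str:
    """Build an nGQL CREATE SPACE statement for a fixed-length string VID."""
    return (
        f"CREATE SPACE IF NOT EXISTS {name} "
        f"(partition_num = {partition_num}, "
        f"replica_factor = {replica_factor}, "
        f"vid_type = FIXED_STRING({vid_length}));"
    )


# "perf_test" is a hypothetical space name; the numbers come from the text above.
statement = create_space_statement(
    "perf_test", vid_length=100, replica_factor=3, partition_num=100
)
print(statement)
```

The generated statement can be pasted into the NebulaGraph console. Changing partition_num or the VID length later requires creating a new space, so it is convenient to keep these values as parameters.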

Optimize Data Import Performance of Nebula Importer

At the beginning of the test, I used Nebula Importer to import data into NebulaGraph with its default configuration. The speed was only about 30 thousand records per second, which was unacceptable.

To optimize the import performance, I read the documentation of Nebula Importer and found that only three parameters needed to be adjusted for optimization: concurrency, channelBufferSize, and batchSize.

I tried modifying the configuration of these parameters, but it did not bring a significant performance improvement. According to the feedback from the NebulaGraph team, I made the following changes to the YAML file:

concurrency: 96 # the number of CPU cores
channelBufferSize: 20000
batchSize: 2500

After the changes, the import speed increased to 70 or 80 thousand records per second. Great. I tried larger values, but the nebula-graphd server crashed. Therefore, these parameters should be set to large values, but not too large.
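To make the roles of the three parameters more concrete, here is a toy model of a batched, concurrent import. It is a sketch under stated assumptions, not Nebula Importer's actual implementation: run_import and its internals are hypothetical, with a bounded queue standing in for channelBufferSize, a group of worker threads for concurrency, and the slice length for batchSize.

```python
import queue
import threading


def run_import(records, concurrency, channel_buffer_size, batch_size):
    """Toy model of a batched, concurrent import.

    Returns (rows_sent, requests_sent). Nothing is sent over the network.
    """
    batches = queue.Queue(maxsize=channel_buffer_size)  # bounded buffer
    lock = threading.Lock()
    stats = {"rows": 0, "requests": 0}

    def worker():
        while True:
            batch = batches.get()
            if batch is None:  # sentinel: no more work for this worker
                return
            # A real client would send one INSERT statement per batch here.
            with lock:
                stats["rows"] += len(batch)
                stats["requests"] += 1

    workers = [threading.Thread(target=worker) for _ in range(concurrency)]
    for w in workers:
        w.start()

    # Reader side: group records into batches; put() blocks when the buffer is full.
    for start in range(0, len(records), batch_size):
        batches.put(records[start:start + batch_size])
    for _ in workers:
        batches.put(None)
    for w in workers:
        w.join()
    return stats["rows"], stats["requests"]


rows, requests = run_import(
    list(range(10_000)), concurrency=4, channel_buffer_size=8, batch_size=2500
)
print(rows, requests)
```

In this model, a larger batch size means fewer and bigger requests, and more workers means more requests in flight at once. That is consistent with the observation above that very large values can overload the server.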

The next check was on the disk and the network. Originally, mechanical drives and a 1000 Mb/s network were used in my environment. I changed them to SSDs and a 10000 Mb/s network, and the import speed roughly doubled to 170 thousand records per second. Therefore, hardware matters.

My last check was on the data. When I created the graph space, the VID length was set to a large value. How about a smaller one? That change failed because some vertices do need VIDs of the specified length. Then I turned to partition_num. According to the NebulaGraph Database Manual, it should be set to 20 times the number of hard disks in the cluster, so I set it to 15. The change worked. The import speed increased to 250 thousand records per second. At that point, I was satisfied with the import performance. It may be improved further with more modifications, but I think it is acceptable now.
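To put these speeds in perspective, the following back-of-the-envelope calculation estimates how long a full import of the dataset would take at each stage. It is a sketch that assumes a constant import rate and uses the rounded figures from the text (about 34 million records, and 80 thousand as the upper end of "70 or 80 thousand").

```python
TOTAL_RECORDS = 34_000_000  # approximate dataset size from the text

# Import speeds (records per second) reported at each tuning stage.
stages = {
    "default settings": 30_000,
    "tuned importer parameters": 80_000,
    "SSD and 10000 Mb/s network": 170_000,
    "partition_num = 15": 250_000,
}


def import_seconds(total_records: int, records_per_second: int) -> float:
    """Estimated import time, assuming a constant import rate."""
    return total_records / records_per_second


for stage, speed in stages.items():
    print(f"{stage}: about {import_seconds(TOTAL_RECORDS, speed):.0f} s")
```

Under these assumptions, the estimated import time drops from roughly 19 minutes with the default settings to a little over 2 minutes after all the changes.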

Summary

concurrency should be set to the number of CPU cores. channelBufferSize and batchSize should be large enough to improve the performance, but not so large that they exceed what the cluster can handle.
SSDs and a 10000 Mb/s network are necessary.
partition_num should be set to an appropriate value.
The VID length, the number of properties, and the number of nebula-graphd servers may also affect the performance. This is only my guess, because I have not tested them.

Pressure Testing

I picked one business indicator for the pressure test.

Here is the query statement for the business indicator:

match (v:email)-[:emailid]->(mid:id)<-[:phoneid]-(phone:phone)-[:phoneid]->(ids:id)
where id(v)=="replace"
with v, count(distinct phone) as pnum, count(distinct mid) as midnum, count(distinct ids) as idsnum, sum(ids.isblack) as black
where pnum > 2 and midnum > 5 and midnum < 100 and idsnum > 5 and idsnum < 300 and black > 0
return v.value1, true as result

This statement performs a 3-hop expansion followed by conditional filtering. It was run against a concentrated subset of two to four million records.

Nebula-bench needed to be modified as follows: Open the go_step.jmx configuration file of JMeter, set ThreadGroup.num_threads to the number of CPU cores, and then modify other parameters such as the loop-related ones. The nGQL statement must be adapted to the actual scenario, and each variable in it must be written as the placeholder replace.
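The placeholder substitution can be pictured with the following sketch. It does not show how nebula-bench performs the replacement internally: fill_query is a hypothetical helper and the sample VID is made up for illustration.

```python
QUERY_TEMPLATE = (
    'match (v:email)-[:emailid]->(mid:id)<-[:phoneid]-(phone:phone)-[:phoneid]->(ids:id) '
    'where id(v)=="replace" '
    'with v, count(distinct phone) as pnum, count(distinct mid) as midnum, '
    'count(distinct ids) as idsnum, sum(ids.isblack) as black '
    'where pnum > 2 and midnum > 5 and midnum < 100 '
    'and idsnum > 5 and idsnum < 300 and black > 0 '
    'return v.value1, true as result'
)


def fill_query(template: str, vid: str, placeholder: str = "replace") -> str:
    """Substitute the quoted placeholder with a VID.

    Backslashes and double quotes in the VID are escaped so that the
    resulting string literal stays well-formed.
    """
    escaped = vid.replace("\\", "\\\\").replace('"', '\\"')
    return template.replace(f'"{placeholder}"', f'"{escaped}"')


# "user@example.com" is a made-up VID used only for illustration.
query = fill_query(QUERY_TEMPLATE, "user@example.com")
print(query)
```

Matching the placeholder together with its surrounding quotes avoids accidentally rewriting other occurrences of the word in a statement.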

Because the tested data was relatively concentrated, the speed was about 700 records per second. When data spread across all the nodes was tested, the speed rose to more than 6000 records per second. The concurrency performance looked good, and so did the query latency, which peaked at 300 ms.

In my environment, only one nebula-graphd node was used, so I wanted to add one more nebula-graphd server and check the concurrency performance again. After I started the new nebula-graphd server, no improvement occurred.

At that point, NebulaGraph 2.0.1 was released, so I deployed it on the machines and imported the data again. On the new cluster, three nebula-graphd servers were deployed and the performance tripled: the speed for concentrated data was higher than 2100 records per second, and the speed for data on all the machines was about 20 thousand records per second. The earlier lack of improvement was strange. My guess is that it happened because no BALANCE or COMPACT command was run after the new nebula-graphd server was added. I should give that a try next time.

Additionally, I used plain Linux commands rather than dedicated monitoring tools to view the database metrics, so I could not obtain accurate metrics.
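The scaling claim can be checked with a quick calculation on the reported figures. This is only a sketch: the numbers in the text are approximate ("higher than", "about"), so the ratios are rough.

```python
def speedup(before: float, after: float) -> float:
    """Ratio of throughput after a change to throughput before it."""
    return after / before


# Throughput figures (records per second) quoted in the text.
concentrated = speedup(700, 2100)
all_nodes = speedup(6000, 20_000)

print(f"concentrated data: {concentrated:.1f}x")
print(f"data on all nodes: {all_nodes:.1f}x")
```

Both ratios come out at about 3, which matches the move from one nebula-graphd server to three.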

Summary

Before the test, make sure that the BALANCE and COMPACT commands have been run so that the load on the cluster is balanced.
Appropriately adjust the configuration of nebula-storaged by increasing the available threads and the cache memory capacity.
The concurrency performance is affected by both data size and data distribution.

Modify Configuration

In my test, the default configurations of nebula-metad and nebula-graphd were used. Only the configuration of nebula-storaged was modified as follows:

rocksdb_block_cache=102400  # The NebulaGraph team recommends using 1/3 of the memory capacity. In my test, 100 GB was used.
num_io_threads=48 # The number of available threads. It was set to half the number of CPU cores.
min_vertices_per_bucket=100 # The minimum number of vertices in a bucket.
vertex_cache_bucket_exp=8 # The total number of buckets was set to 2 to the power of 8.
wal_buffer_size=16777216  # 16 MB
write_buffer_size:268435456   # 256 MB
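The raw numbers in these settings are easier to read after conversion. The short check below is a sketch that assumes rocksdb_block_cache is expressed in MB (which matches the 100 GB mentioned in the comment) and that the two buffer sizes are in bytes. The 96 CPU cores are taken from the importer configuration earlier in this article.

```python
MIB = 1024 * 1024  # bytes per mebibyte

# Values copied from the configuration above.
rocksdb_block_cache_mb = 102400   # assumed to be in MB
num_io_threads = 48
vertex_cache_bucket_exp = 8
wal_buffer_size = 16777216        # assumed to be in bytes
write_buffer_size = 268435456     # assumed to be in bytes

cpu_cores = 96  # from the importer's concurrency setting

block_cache_gb = rocksdb_block_cache_mb / 1024
bucket_count = 2 ** vertex_cache_bucket_exp
wal_mib = wal_buffer_size / MIB
write_mib = write_buffer_size / MIB

print(f"block cache: {block_cache_gb:.0f} GB")
print(f"io threads: {num_io_threads} (half of {cpu_cores} cores)")
print(f"vertex cache buckets: {bucket_count}")
print(f"WAL buffer: {wal_mib:.0f} MiB, write buffer: {write_mib:.0f} MiB")
```

If the block cache follows the 1/3 recommendation, 100 GB implies roughly 300 GB of memory on each storage machine.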

I referred to some posts on the forum and read the source code of NebulaGraph before changing these parameters. The values came from my own experiments and are not necessarily optimal. I did not modify any other parameters.

The NebulaGraph team does not recommend that users modify other parameters. If you are interested in them, read the source code.

Conclusion

In summary, my test was not a professional one. I ran it for specific business scenarios, and NebulaGraph performed very well.

I have not fully understood how to tune the parameter configuration yet, and I will keep studying it. If you have more ideas for performance tuning, please share them with me.
