NebulaGraph 1.0 Benchmark Report based on the LDBC Dataset
Disclaimer: We used the LDBC SNB benchmark as a starting point. However, the test results aren’t audited, so we want to be clear that this is not an LDBC Benchmark test run, and these numbers are not LDBC Benchmark Results.
Testing Environment
Specs
CPU: Intel® Xeon® CPU E5-2697 v3 @ 2.60GHz, 2(sockets) * 14(cores) * 2(threads)
Memory: DDR4, 64GB * 4
Storage: HP MT0800KEXUU, NVMe, 800GB * 2
Network: Mellanox MT27500 10Gb/s
Five servers were used for this test:
- One for graphd (the query engine process).
- Three for storaged (the storage engine process). The meta service is deployed on the same hosts as the storage service.
- One for the client, written in Go. It runs as a single process with multiple goroutines.
- OS: CentOS 7.5
- NebulaGraph Version: V1.0.0 GA 1
- Graph Space partition: 24
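The client described above is one process that issues queries from many concurrent workers. The sketch below illustrates that pattern in Python with a thread pool; it is not the Go client used in the test. The names `fake_execute`, `timed_call`, and `run_workers` are hypothetical, and `fake_execute` is a stub that sleeps instead of contacting graphd.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_execute(query):
    """Hypothetical stand-in for sending one query to graphd."""
    time.sleep(0.001)  # pretend the round trip takes about 1 ms
    return True


def timed_call(execute, query):
    """Run one query and return its latency in seconds."""
    start = time.perf_counter()
    execute(query)
    return time.perf_counter() - start


def run_workers(execute, queries, concurrency):
    """Issue all queries from one process using `concurrency` threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda q: timed_call(execute, q), queries))


if __name__ == "__main__":
    queries = ["GO 1 STEP FROM 1 OVER knows"] * 100
    latencies = run_workers(fake_execute, queries, concurrency=8)
    print(len(latencies), "queries completed")
```

Swapping `fake_execute` for a real client call would turn this into a simple load driver; the concurrency level is the knob that the throughput and latency results depend on.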
Key Configs in NebulaGraph
Storaged
# One RocksDB instance per disk
# The default reserved bytes for one batch operation
--rocksdb_batch_size=4096
# The default block cache size used in BlockBasedTable.
# The unit is MB.
--rocksdb_block_cache=102400
--num_io_threads=24
--num_worker_threads=18
--max_handlers_per_req=256
--min_vertices_per_bucket=100
--reader_handlers=28
--vertex_cache_bucket_exp=8
--rocksdb_disable_wal=true
--rocksdb_column_family_options={"write_buffer_size":"67108864",
"max_write_buffer_number":"4","max_bytes_for_level_base":"268435456"}
--rocksdb_block_based_table_options={"block_size":"8192"}
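The byte counts in the storaged flags are easier to read once converted to larger units. The following sketch only parses the values listed above and prints them; the variable names are illustrative, and the memtable bound assumes RocksDB keeps at most `max_write_buffer_number` write buffers per column family in memory.

```python
import json

# Values copied from the storaged flags above.
column_family_options = json.loads(
    '{"write_buffer_size":"67108864",'
    '"max_write_buffer_number":"4","max_bytes_for_level_base":"268435456"}'
)
block_based_table_options = json.loads('{"block_size":"8192"}')
rocksdb_block_cache_mb = 102400  # the flag's unit is MB

MIB = 1024 * 1024

write_buffer_mib = int(column_family_options["write_buffer_size"]) // MIB
level_base_mib = int(column_family_options["max_bytes_for_level_base"]) // MIB
block_size_kib = int(block_based_table_options["block_size"]) // 1024
block_cache_gb = rocksdb_block_cache_mb // 1024  # taking 1 GB = 1024 MB

# Rough upper bound on memtable memory per column family.
max_memtable_mib = write_buffer_mib * int(
    column_family_options["max_write_buffer_number"]
)

print(f"write buffer:        {write_buffer_mib} MiB")
print(f"level base:          {level_base_mib} MiB")
print(f"block size:          {block_size_kib} KiB")
print(f"block cache:         {block_cache_gb} GB")
print(f"memtable upper bound: {max_memtable_mib} MiB")
```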
Graphd
# The number of networking IO threads, 0 for # of CPU cores
--num_netio_threads=20
# The number of threads to execute user queries, 0 for # of CPU cores
--num_worker_threads=32
--storage_client_timeout_ms=600000
--filter_pushdown=false
Intro to the Dataset
Data source
LDBC Social Network Benchmark Dataset
The LDBC dataset is designed to be a plausible look-alike of the data behind a social network site. For a detailed introduction to the dataset, see https://github.com/ldbc
Data scale
- Scale Factor is 1000
- Data size: 632GB
- Disk size occupied: ~500GB (* number of replicas)
- Number of Vertices: 1,243,792,996
- Number of Edges: 8,397,443,896
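As a quick sanity check on these figures, the average number of edges per vertex can be derived directly from the two counts. The sketch below also estimates raw bytes per graph element, under the assumption that "632GB" means 632 * 10^9 bytes; both numbers are averages over all vertex and edge types, not just the `knows` edges used in the queries.

```python
# Counts copied from the data scale list above.
num_vertices = 1_243_792_996
num_edges = 8_397_443_896
data_size_bytes = 632 * 10**9  # assumption: GB taken as 10^9 bytes

avg_edges_per_vertex = num_edges / num_vertices
bytes_per_element = data_size_bytes / (num_vertices + num_edges)

print(f"average edges per vertex: {avg_edges_per_vertex:.2f}")
print(f"raw bytes per vertex or edge: {bytes_per_element:.1f}")
```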
Schema
The K-Hop Out-degree Distribution in LDBC
One-hop out-degree distribution
Most vertices have fewer than 10 outgoing edges. Some super vertices have 700 outgoing edges.
Two-hop out-degree distribution
Most vertices have fewer than 2,000 two-hop adjacency nodes, but some super vertices have 40,000.
Three-hop out-degree distribution
Most vertices have fewer than 100,000 three-hop adjacency nodes, but some super vertices have 2,000,000.
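To make the notion of k-hop adjacency nodes concrete, here is a small sketch on a toy graph. It assumes that the k-hop adjacency nodes of a vertex are the distinct vertices reached by following exactly k outgoing edges; the report does not state how the counts were produced, and `k_hop_frontier` and `toy_graph` are hypothetical names.

```python
def k_hop_frontier(adjacency, start, k):
    """Return the distinct vertices reached by following exactly k outgoing edges."""
    frontier = {start}
    for _ in range(k):
        # Expand every vertex in the current frontier by one hop.
        frontier = {dst for src in frontier for dst in adjacency.get(src, ())}
    return frontier


# Toy directed graph: vertex -> list of out-neighbours.
toy_graph = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["d"],
    "d": [],
}

for hops in (1, 2, 3):
    reached = k_hop_frontier(toy_graph, "a", hops)
    print(hops, "hop(s):", sorted(reached))
```

The frontier can grow quickly with each hop on a real social graph, which is why the three-hop queries touch far more data than the one-hop queries.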
Query Samples
K-hop without Retrieving Properties
GO 1 STEP FROM $ID$ OVER knows
GO 2 STEP FROM $ID$ OVER knows
GO 3 STEP FROM $ID$ OVER knows
K-hop with Retrieving Properties
GO 1 STEPS FROM $ID$ OVER knows YIELD knows.time, $$.person.first_name,\
$$.person.last_name, $$.person.birthday
GO 2 STEPS FROM $ID$ OVER knows YIELD knows.time, $$.person.first_name,\
$$.person.last_name, $$.person.birthday
GO 3 STEPS FROM $ID$ OVER knows YIELD knows.time, $$.person.first_name,\
$$.person.last_name, $$.person.birthday
Note: The $ID$ in the statements is a placeholder for the starting vertex of the graph traversal. It is replaced by a random vertex ID when the query is executed.
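The placeholder substitution can be sketched as a plain string replacement. This is an illustration only, not the code from the benchmark repo; `render_query` and the sample vertex IDs are hypothetical.

```python
import random

QUERY_TEMPLATE = "GO 2 STEP FROM $ID$ OVER knows"


def render_query(template, vertex_ids, rng=random):
    """Replace the $ID$ placeholder with a randomly chosen vertex ID."""
    vid = rng.choice(vertex_ids)
    return template.replace("$ID$", str(vid))


if __name__ == "__main__":
    sample_ids = [101, 202, 303]  # made-up IDs for illustration
    print(render_query(QUERY_TEMPLATE, sample_ids))
```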
Testing Results
The results include the throughput and latency for each query.
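For readers who want to reproduce such numbers, throughput and latency can be derived from per-query timings roughly as follows. This is a sketch under assumptions: the report does not say which latency statistics were collected, so the average and nearest-rank percentiles here are illustrative choices, and `percentile` and `summarize` are hypothetical helpers.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a list sorted in ascending order."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(latencies_ms, wall_clock_s):
    """Compute throughput (queries per second) and latency statistics."""
    ordered = sorted(latencies_ms)
    return {
        "qps": len(ordered) / wall_clock_s,
        "avg_ms": sum(ordered) / len(ordered),
        "p95_ms": percentile(ordered, 95),
        "p99_ms": percentile(ordered, 99),
    }


if __name__ == "__main__":
    # Synthetic latencies of 1..100 ms collected over a 10 second run.
    print(summarize(list(range(1, 101)), wall_clock_s=10))
```

Throughput is computed from wall-clock time rather than from the sum of latencies, since concurrent workers overlap their requests.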
One-hop Results without Properties
Two-hop Results without Properties
Three-hop Results without Properties
One-hop Results with Properties
Two-hop Results with Properties
Three-hop Results with Properties
You can find the testing code and the nGQL queries in this repo: https://github.com/vesoft-inc/nebula-bench
For batch write performance, refer to the Spark Writer doc.