NebulaGraph 1.0 Benchmark Report based on the LDBC Dataset
Testing Environment
Specs
CPU: Intel® Xeon® CPU E5-2697 v3 @ 2.60GHz, 2 sockets * 14 cores * 2 threads
Memory: DDR4, 64GB * 4
Storage: HP MT0800KEXUU, NVMe, 800GB * 2
Network: Mellanox MT27500 10Gb/s
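The per-server totals implied by these specs are useful context for the thread and cache settings further down. A minimal sketch of the arithmetic, using only the numbers listed above (the variable names are illustrative):

```python
# Per-server figures copied from the spec list above.
SOCKETS, CORES_PER_SOCKET, THREADS_PER_CORE = 2, 14, 2
DIMM_GB, DIMM_COUNT = 64, 4
DISK_GB, DISK_COUNT = 800, 2

logical_cpus = SOCKETS * CORES_PER_SOCKET * THREADS_PER_CORE
memory_gb = DIMM_GB * DIMM_COUNT
storage_gb = DISK_GB * DISK_COUNT

print(f"logical CPUs per server: {logical_cpus}")
print(f"memory per server:       {memory_gb} GB")
print(f"NVMe storage per server: {storage_gb} GB")
```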
Five servers were used for this test:
- One for graphd (the query engine process)
- Three for storaged (the storage engine process). The meta service is deployed on the same hosts as the storage service.
- One for the Golang client, which runs as a single process with multiple goroutines.
- OS: CentOS 7.5
- NebulaGraph Version: V1.0.0 GA 1
- Graph Space partition: 24
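With 24 partitions and three storaged hosts, an even balance would put 8 partitions on each host. The sketch below illustrates that arithmetic. It assumes integer vertex IDs mapped to 1-based partitions by a simple modulo; the host names and the round-robin placement are hypothetical, since the real placement is decided by the meta service and is not described in this report.

```python
from collections import Counter

NUM_PARTS = 24  # "Graph Space partition: 24" from the setup above
STORAGE_HOSTS = ["storaged-0", "storaged-1", "storaged-2"]  # hypothetical names


def part_id(vid: int, num_parts: int = NUM_PARTS) -> int:
    """Map an integer vertex ID to a 1-based partition ID (assumed modulo scheme)."""
    return vid % num_parts + 1


def host_for_part(pid: int, hosts=STORAGE_HOSTS) -> str:
    """Hypothetical round-robin placement of partitions onto storage hosts."""
    return hosts[(pid - 1) % len(hosts)]


# Count how many partitions land on each host under the assumed placement.
parts_per_host = Counter(host_for_part(p) for p in range(1, NUM_PARTS + 1))
print(dict(parts_per_host))
```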
Key Configs in NebulaGraph
Storaged
# One RocksDB instance per disk
# The default reserved bytes for one batch operation
--rocksdb_batch_size=4096
# The default block cache size used in BlockBasedTable.
# The unit is MB.
--rocksdb_block_cache=102400
--num_io_threads=24
--num_worker_threads=18
--max_handlers_per_req=256
--min_vertices_per_bucket=100
--reader_handlers=28
--vertex_cache_bucket_exp=8
--rocksdb_disable_wal=true
--rocksdb_column_family_options={"write_buffer_size":"67108864",
"max_write_buffer_number":"4","max_bytes_for_level_base":"268435456"}
--rocksdb_block_based_table_options={"block_size":"8192"}
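The byte values in these flags are easier to compare against the 256GB of RAM per server once converted to larger units. A minimal sketch of the conversions, using only the values shown above; reading `max_write_buffer_number` as the upper bound on in-memory write buffers per column family follows the usual RocksDB meaning of that option.

```python
MIB = 1024 * 1024

# Values copied from the storaged flags above.
rocksdb_block_cache_mb = 102400
write_buffer_size = 67108864
max_write_buffer_number = 4
max_bytes_for_level_base = 268435456
block_size = 8192

block_cache_gib = rocksdb_block_cache_mb / 1024
write_buffer_mib = write_buffer_size / MIB
# Upper bound on memtable memory for one column family.
memtable_budget_mib = write_buffer_mib * max_write_buffer_number
level_base_mib = max_bytes_for_level_base / MIB
block_size_kib = block_size / 1024

print(f"block cache:        {block_cache_gib:.0f} GiB")
print(f"write buffer:       {write_buffer_mib:.0f} MiB")
print(f"memtable budget:    {memtable_budget_mib:.0f} MiB per column family")
print(f"level base size:    {level_base_mib:.0f} MiB")
print(f"SST block size:     {block_size_kib:.0f} KiB")
```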
Graphd
# The number of networking IO threads, 0 for # of CPU cores
--num_netio_threads=20
# The number of threads to execute user queries, 0 for # of CPU cores
--num_worker_threads=32
--storage_client_timeout_ms=600000
--filter_pushdown=false
Intro to the Dataset
Data source
LDBC Social Network Benchmark Dataset
The LDBC Social Network Benchmark dataset is designed to be a plausible look-alike of a social network site. For a detailed introduction to the dataset, see https://github.com/ldbc
Data scale
- Scale Factor is 1000
- Data size: 632GB
- Disk space occupied: ~500GB per replica (multiply by the number of replicas)
- Number of Vertices: 1,243,792,996
- Number of Edges: 8,397,443,896
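A couple of derived figures follow directly from these numbers. The sketch below computes the average out-degree over the whole graph and the total disk footprint for a given replica count. The replica count used in the test is not stated in this report, so it is left as a parameter. Note that the average covers every edge type, whereas the queries below traverse only `knows` edges.

```python
# Figures copied from the data scale list above.
num_vertices = 1_243_792_996
num_edges = 8_397_443_896
raw_size_gb = 632
disk_size_gb_per_replica = 500  # approximate, as reported

avg_out_degree = num_edges / num_vertices
disk_to_raw_ratio = disk_size_gb_per_replica / raw_size_gb


def total_disk_gb(replicas: int) -> int:
    """Approximate total disk usage for a given replica count."""
    return disk_size_gb_per_replica * replicas


print(f"average out-degree (all edge types): {avg_out_degree:.2f}")
print(f"on-disk size relative to raw data:   {disk_to_raw_ratio:.2f}")
```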
Schema
The K-Hop Out-degree Distribution in LDBC
One-hop out-degree distribution
Most vertices have fewer than 10 outgoing edges. Some super vertices have 700 outgoing edges.
Two-hop out-degree distribution
Most vertices have fewer than 2,000 two-hop adjacency nodes, though some super vertices have 40,000.
Three-hop out-degree distribution
Most vertices have fewer than 100,000 three-hop adjacency nodes, though some super vertices have 2,000,000.
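To make the notion of a k-hop adjacency node concrete, here is a minimal sketch on a toy graph. It takes one plausible reading of the term: the distinct vertices reached after exactly k steps along outgoing edges. The report does not say how its distributions were computed, so the function and the toy data are illustrative only.

```python
def k_hop_neighbors(adj, start, k):
    """Return the distinct vertices reached after exactly k outgoing steps."""
    frontier = {start}
    for _ in range(k):
        # Expand every vertex in the current frontier by one outgoing hop.
        frontier = {dst for src in frontier for dst in adj.get(src, ())}
    return frontier


# Toy adjacency list (vertex -> outgoing neighbors), not taken from LDBC.
toy_graph = {1: [2, 3], 2: [4], 3: [4, 5], 4: [6], 5: []}

for k in (1, 2, 3):
    print(k, sorted(k_hop_neighbors(toy_graph, 1, k)))
```

The frontier can grow multiplicatively with each hop, which is consistent with the jump from tens to thousands to hundreds of thousands of nodes described above.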
Query Samples
K-hop without Retrieving Properties
GO 1 STEPS FROM $ID$ OVER knows
GO 2 STEPS FROM $ID$ OVER knows
GO 3 STEPS FROM $ID$ OVER knows
K-hop with Retrieving Properties
GO 1 STEPS FROM $ID$ OVER knows YIELD knows.time, $$.person.first_name,\
$$.person.last_name, $$.person.birthday
GO 2 STEPS FROM $ID$ OVER knows YIELD knows.time, $$.person.first_name,\
$$.person.last_name, $$.person.birthday
GO 3 STEPS FROM $ID$ OVER knows YIELD knows.time, $$.person.first_name,\
$$.person.last_name, $$.person.birthday
Note: The $ID$ in the statements is a placeholder for the starting vertex of a graph traversal. It is replaced with a random vertex ID when the query is executed.
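The placeholder substitution can be sketched as follows. This is not the benchmark client itself (which is written in Go); the helper names and the sample vertex IDs are hypothetical and only show the idea of filling a query template with a randomly chosen starting vertex.

```python
import random

# One of the query templates from this section, joined onto a single line.
TEMPLATE = (
    "GO 1 STEPS FROM $ID$ OVER knows YIELD knows.time, "
    "$$.person.first_name, $$.person.last_name, $$.person.birthday"
)


def render(template: str, vid: int) -> str:
    """Replace the $ID$ placeholder with a concrete vertex ID."""
    return template.replace("$ID$", str(vid))


def random_statement(template: str, vertex_ids, rng: random.Random) -> str:
    """Pick a random starting vertex and build the statement to execute."""
    return render(template, rng.choice(vertex_ids))


sample_ids = [101, 202, 303]  # hypothetical person vertex IDs
rng = random.Random(0)        # seeded so that runs are repeatable
print(random_statement(TEMPLATE, sample_ids, rng))
```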
Testing Results
The results include the throughput and latency for each query.
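For readers who want to reproduce such numbers, throughput and latency can be derived from per-query timings roughly as sketched below. The report does not state which latency statistics it uses, so the average, p95, and p99 here are illustrative. The `execute` callable is a stand-in for a real client call, and Python threads stand in for the goroutines of the actual Go client.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def percentile(sorted_vals, p):
    """Nearest-rank percentile of an ascending list of values."""
    rank = max(1, math.ceil(p / 100 * len(sorted_vals)))
    return sorted_vals[rank - 1]


def summarize(latencies_ms, wall_seconds):
    """Compute throughput (queries per second) and latency statistics."""
    ordered = sorted(latencies_ms)
    return {
        "qps": len(ordered) / wall_seconds,
        "avg_ms": sum(ordered) / len(ordered),
        "p95_ms": percentile(ordered, 95),
        "p99_ms": percentile(ordered, 99),
    }


def run_benchmark(execute, statements, concurrency):
    """Run all statements with a pool of workers and time each one."""
    def timed(stmt):
        start = time.perf_counter()
        execute(stmt)
        return (time.perf_counter() - start) * 1000.0

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, statements))
    wall = time.perf_counter() - start
    return summarize(latencies, wall)


# Stub executor so the sketch runs without a database connection.
stats = run_benchmark(lambda stmt: None, ["GO 1 STEPS FROM 1 OVER knows"] * 100, 8)
print(stats)
```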
One-hop Results without Properties
Two-hop Results without Properties
Three-hop Results without Properties
One-hop Results with Properties
Two-hop Results with Properties
Three-hop Results with Properties
You can find the testing code and the nGQL queries in this repo: https://github.com/vesoft-inc/nebula-bench
For batch write performance, refer to the Spark Writer doc.