NebulaGraph Source Code Explained: An Overview

In NebulaGraph Source Code Explained: Preface, we explained why we are publishing this series of articles. This article introduces the architecture of NebulaGraph, its source code repositories, the hierarchy of its code, and the development plan for its modules.

1. Architecture

NebulaGraph is an open-source, distributed graph database solution. In NebulaGraph, compute is separated from storage. Besides the graph database kernel, we also provide many tools for importing data, monitoring, deployment, visualization, graph compute, and so on.

For more information about the design of NebulaGraph, see NebulaGraph Architecture — A Bird's Eye View.

[Figure: Architecture of NebulaGraph]

For more information about the query engine, see An Introduction to NebulaGraph's Query Engine and An Introduction to NebulaGraph 2.0 Query Engine.

The query engine adopts a stateless design, which makes it easy to scale horizontally in and out. The engine is composed of four components: the Parser, the Validator, the Optimizer, and the Executor.
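
To make the flow through these components concrete, here is a minimal sketch of a query passing through four stages in order. It is a toy illustration only: the types `Ast` and `Plan`, the function names, and the "drop consecutive duplicates" rewrite rule are all hypothetical and do not come from the NebulaGraph code base.

```cpp
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy types; these are not NebulaGraph classes.
struct Ast { std::vector<std::string> tokens; };
struct Plan { std::vector<std::string> operators; };

// Parser stage: split the query text into whitespace-separated tokens.
Ast parse(const std::string& query) {
    Ast ast;
    std::istringstream in(query);
    for (std::string tok; in >> tok;) ast.tokens.push_back(tok);
    return ast;
}

// Validator stage: reject an empty statement (a stand-in for real semantic checks).
bool validate(const Ast& ast) { return !ast.tokens.empty(); }

// Optimizer stage: turn tokens into operators and apply one toy rewrite rule
// that drops consecutive duplicate operators.
Plan optimize(const Ast& ast) {
    Plan plan;
    for (const auto& tok : ast.tokens) {
        if (plan.operators.empty() || plan.operators.back() != tok) {
            plan.operators.push_back(tok);
        }
    }
    return plan;
}

// Executor stage: "run" the plan; this toy just reports how many operators ran.
std::size_t execute(const Plan& plan) { return plan.operators.size(); }

// Stateless entry point: everything it needs arrives in the argument, so any
// instance of the service could handle any request.
std::optional<std::size_t> runQuery(const std::string& query) {
    Ast ast = parse(query);
    if (!validate(ast)) return std::nullopt;
    return execute(optimize(ast));
}
```

Because `runQuery` keeps nothing between calls, adding or removing instances does not require moving any state, which is the property that makes horizontal scaling straightforward.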

[Figure: Architecture of the query engine]

In NebulaGraph, two storage services are provided. One is for storing metadata, which is called the Meta Service, and the other is for storing business data, which is called the Storage Service.

The Storage Service has three layers. The bottom layer is the Store Engine. Above it is the layer that implements the multi-group Raft consensus algorithm. The top layer is the storage interface, that is, a set of graph-related APIs.
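
The layering can be pictured roughly as follows. This is a sketch under stated assumptions, not the actual implementation: the class names `KVEngine`, `Partition`, and `GraphStore`, the `"v:"` key prefix, and the hash-based routing are hypothetical, and the middle layer only records writes in a local log instead of running a real consensus protocol.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Bottom layer: a local key-value engine (an in-memory map stands in for it).
class KVEngine {
public:
    void put(const std::string& k, const std::string& v) { data_[k] = v; }
    std::optional<std::string> get(const std::string& k) const {
        auto it = data_.find(k);
        if (it == data_.end()) return std::nullopt;
        return it->second;
    }
private:
    std::map<std::string, std::string> data_;
};

// Middle layer: one replication group per partition. A real Raft group would
// replicate each log entry to a majority of peers before applying it; this toy
// only appends to a local log and applies immediately.
class Partition {
public:
    void write(const std::string& k, const std::string& v) {
        log_.emplace_back(k, v);  // 1. append the entry to the log
        engine_.put(k, v);        // 2. apply it to the engine
    }
    std::optional<std::string> read(const std::string& k) const { return engine_.get(k); }
    std::size_t logSize() const { return log_.size(); }
private:
    std::vector<std::pair<std::string, std::string>> log_;
    KVEngine engine_;
};

// Top layer: a graph-flavoured API that routes each vertex to one partition.
// Assumes numParts > 0; hash routing is an assumption made for this sketch.
class GraphStore {
public:
    explicit GraphStore(std::size_t numParts) : parts_(numParts) {}
    void addVertex(const std::string& vid, const std::string& props) {
        parts_[indexFor(vid)].write("v:" + vid, props);
    }
    std::optional<std::string> getVertex(const std::string& vid) const {
        return parts_[indexFor(vid)].read("v:" + vid);
    }
    std::size_t totalLogEntries() const {
        std::size_t n = 0;
        for (const auto& p : parts_) n += p.logSize();
        return n;
    }
private:
    std::size_t indexFor(const std::string& vid) const {
        return std::hash<std::string>{}(vid) % parts_.size();
    }
    std::vector<Partition> parts_;
};
```

The point of the sketch is the direction of the calls: the graph API never touches the engine directly, and every write goes through the partition's log first.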

For more information about the storage engine, see An Introduction to NebulaGraph's Storage Engine.

[Figure: Architecture of the Storage Service]

2. Introduction to Repositories

Welcome to the vesoft-inc project repositories. vesoft Inc. is the vendor of NebulaGraph, a distributed graph database solution.

So far, the vesoft-inc project repositories cover the kernel of NebulaGraph, clients, tools, a testing framework, compiling tools, visualization tools, monitoring products, and so on.

This article focuses on the hierarchy of the major NebulaGraph repositories and the basic functionality of each module. Articles covering the design details will follow. We hope this helps you understand NebulaGraph better and contribute more to the NebulaGraph community, for example by submitting features, fixing bugs, and improving the documentation.

The following is a list of major repositories under the vesoft-inc account on GitHub:

nebula: NebulaGraph kernel
Nebula Clients
- nebula-java: Client in Java
- nebula-cpp: Client in C++
- nebula-go: Client in Go
- nebula-python: Client in Python
Nebula Tools
- nebula-importer: A high-performance data importer based on the client in Go
- nebula-spark-utils: Spark applications, such as Nebula Spark Connector, Nebula Exchange, and Nebula Algorithm
- nebula-br: The backup and restore tool of NebulaGraph
- nebula-ansible, nebula-operator: Tools for automating the deployment of NebulaGraph clusters
Nebula Testing
- nebula-bench: Stress testing and performance testing of NebulaGraph
- nebula-chaos: Chaos framework for NebulaGraph
Compiling
- nebula-third-party: All the third-party libraries that are necessary for compiling NebulaGraph
- nebula-gears: Gears for NebulaGraph
nebula-graph-studio: The visualization tool of NebulaGraph

3. Structure of Source Files and Modules

3.1 NebulaGraph

The address of the nebula-graph repository is https://github.com/vesoft-inc/nebula-graph.

├── cmake
├── conf
├── LICENSES
├── package
├── resources
├── scripts
├── src
│   ├── context
│   ├── daemons
│   ├── executor
│   ├── optimizer
│   ├── parser
│   ├── planner
│   ├── scheduler
│   ├── service
│   ├── session
│   ├── stats
│   ├── util
│   ├── validator
│   └── visitor
└── tests
    ├── admin
    ├── bench
    ├── common
    ├── data
    ├── job
    ├── maintain
    ├── mutate
    ├── query
    └── tck

conf/: Contains the configuration files of the query engine.
package/: Contains the packaging script of the Graph Service.
resources/: Contains the resource files.
scripts/: Contains the startup scripts.
src/: Contains the source code of the query engine.
- src/context/: The context of a query, including AST (Abstract Syntax Tree), Execution Plan, execution result, and other resources for computing
- src/daemons/: The main process of the query engine
- src/executor/: The executor, implementing all the operators
- src/optimizer/: Implementing RBO (Rule Based Optimization) and providing rules for optimization
- src/parser/: Lexical analysis and parsing. Defining the structure of AST
- src/planner/: Defining the operators and generating execution plans
- src/scheduler/: Scheduler of the execution plans
- src/service/: The service layer of the query engine, providing interfaces for authentication and executing queries
- src/session/: Managing sessions
- src/stats/: Collecting statistics, such as P99 and slow-query statistics
- src/util/: Utility functions
- src/validator/: Implementing semantic analysis, for validating statements and performing simple optimizations
- src/visitor/: Expression visitor, for extracting information from the expressions and optimizing the expressions
tests/: BDD-based integration testing framework, for testing all the features of NebulaGraph.
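
As an illustration of what an expression visitor does, the sketch below walks a tiny expression tree with two visitors: one extracts information (the property names an expression refers to) and one supports optimization (constant folding). It is a generic visitor-pattern example, not the code in src/visitor/; every type name here (`Expr`, `Constant`, `Property`, `Add`, `PropertyCollector`, `ConstantFolder`) is hypothetical.

```cpp
#include <memory>
#include <set>
#include <string>
#include <utility>

struct Constant;
struct Property;
struct Add;

// Visitor interface: one overload per expression kind.
struct ExprVisitor {
    virtual ~ExprVisitor() = default;
    virtual void visit(const Constant&) = 0;
    virtual void visit(const Property&) = 0;
    virtual void visit(const Add&) = 0;
};

struct Expr {
    virtual ~Expr() = default;
    virtual void accept(ExprVisitor& v) const = 0;
};

struct Constant : Expr {
    explicit Constant(long v) : value(v) {}
    void accept(ExprVisitor& v) const override { v.visit(*this); }
    long value;
};

struct Property : Expr {
    explicit Property(std::string n) : name(std::move(n)) {}
    void accept(ExprVisitor& v) const override { v.visit(*this); }
    std::string name;
};

struct Add : Expr {
    Add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : lhs(std::move(l)), rhs(std::move(r)) {}
    void accept(ExprVisitor& v) const override { v.visit(*this); }
    std::unique_ptr<Expr> lhs, rhs;
};

// Extracting information: collect every property name used in the expression.
struct PropertyCollector : ExprVisitor {
    void visit(const Constant&) override {}
    void visit(const Property& p) override { names.insert(p.name); }
    void visit(const Add& a) override { a.lhs->accept(*this); a.rhs->accept(*this); }
    std::set<std::string> names;
};

// Supporting optimization: if the expression contains only constants, compute
// its value. `value` is meaningful only while `foldable` is still true.
struct ConstantFolder : ExprVisitor {
    void visit(const Constant& c) override { value = c.value; }
    void visit(const Property&) override { foldable = false; }
    void visit(const Add& a) override {
        a.lhs->accept(*this);
        long left = value;
        a.rhs->accept(*this);
        value = left + value;
    }
    bool foldable = true;
    long value = 0;
};
```

Keeping each analysis in its own visitor means new passes can be added without touching the expression classes themselves.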

3.2 Nebula Storage

├── cmake
├── conf
├── docker
├── docs
├── LICENSES
├── package
├── scripts
└── src
    ├── codec
    ├── daemons
    ├── kvstore
    ├── meta
    ├── mock
    ├── storage
    ├── tools
    ├── utils
    └── version

conf/: Contains the configuration files of the storage engine.
package/: Contains the packaging script of the storage services.
scripts/: Contains the startup scripts.
src/: Contains the source code of the storage engine.
- src/codec/: Tools for serialization and deserialization
- src/daemons/: The main processes of the storage engine and the metadata engine
- src/kvstore/: Implementing a distributed KV store based on the Raft algorithm
- src/meta/: Implementing the metadata management service based on KVStore, for managing the metadata, the clusters, and long-running tasks
- src/storage/: Implementing the storage engine based on KVStore
- src/tools/: Implementing some small tools
- src/utils/: Utility functions
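
To show what serialization and deserialization mean for data that ends up in a KV store, here is a minimal sketch that encodes a two-field row into bytes and decodes it again. The byte layout (8-byte little-endian integers and a length-prefixed string) and the names `Row`, `encode`, and `decode` are assumptions made for this example; they are not the row format used by src/codec/.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical row with one integer field and one string field.
struct Row {
    std::int64_t id;
    std::string name;
};

// Append an unsigned 64-bit value as 8 little-endian bytes.
void putU64(std::string& out, std::uint64_t v) {
    for (int i = 0; i < 8; ++i) {
        out.push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
    }
}

// Read 8 little-endian bytes starting at pos and advance pos.
// Returns nullopt if fewer than 8 bytes remain.
std::optional<std::uint64_t> getU64(const std::string& in, std::size_t& pos) {
    if (pos > in.size() || in.size() - pos < 8) return std::nullopt;
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i) {
        v |= static_cast<std::uint64_t>(static_cast<unsigned char>(in[pos + i])) << (8 * i);
    }
    pos += 8;
    return v;
}

// Serialize: [id: 8 bytes][name length: 8 bytes][name bytes].
std::string encode(const Row& row) {
    std::string out;
    putU64(out, static_cast<std::uint64_t>(row.id));
    putU64(out, row.name.size());
    out += row.name;
    return out;
}

// Deserialize: returns nullopt when the buffer is truncated.
// Bytes after the encoded row are ignored.
std::optional<Row> decode(const std::string& bytes) {
    std::size_t pos = 0;
    auto id = getU64(bytes, pos);
    auto len = getU64(bytes, pos);
    if (!id || !len || bytes.size() - pos < *len) return std::nullopt;
    return Row{static_cast<std::int64_t>(*id),
               bytes.substr(pos, static_cast<std::size_t>(*len))};
}
```

A round trip through `encode` and `decode` should return the original row, and a truncated buffer should be rejected instead of being read past its end.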

3.3 Nebula Common

├── cmake
│   └── nebula
├── LICENSES
├── src
│   └── common
│       ├── algorithm
│       ├── base
│       ├── charset
│       ├── clients
│       ├── concurrent
│       ├── conf
│       ├── context
│       ├── cpp
│       ├── datatypes
│       ├── encryption
│       ├── expression
│       ├── fs
│       ├── function
│       ├── graph
│       ├── hdfs
│       ├── http
│       ├── interface
│       ├── meta
│       ├── network
│       ├── plugin
│       ├── process
│       ├── session
│       ├── stats
│       ├── test
│       ├── thread
│       ├── thrift
│       ├── time
│       ├── version
│       └── webservice
└── third-party

The nebula-common repository holds the toolkit shared by the NebulaGraph kernel code. These tools are commonly used and implemented with efficiency in mind, and developers may already be familiar with some of them. This section lists only the subdirectories that are closely related to graph databases:

src/common/clients/: Implementing the meta and the storage clients in C++
src/common/datatypes/: Defining the data type and computation in NebulaGraph, such as string, int, bool, float, Vertex, and Edge
src/common/expression/: Defining the expressions in nGQL
src/common/function/: Defining functions in nGQL
src/common/interface/: Defining the interfaces for the Nebula Graph, Nebula Meta, and Nebula Storage services
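
To give a feel for what "defining the data types and computation" can look like, the sketch below models a value that may hold any of the types listed for src/common/datatypes/ and defines one operation on it. It is only an illustration built on `std::variant`: the `Value` alias, the fields of `Vertex` and `Edge`, and the `typeName` and `add` functions are hypothetical and are not the classes in nebula-common.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <variant>

// Hypothetical, heavily simplified graph types.
struct Vertex { std::string vid; std::string tag; };
struct Edge { std::string src; std::string dst; std::string type; };

// A value that can hold any one of the supported types.
using Value = std::variant<bool, std::int64_t, double, std::string, Vertex, Edge>;

// Report the name of the type currently stored in the value.
std::string typeName(const Value& v) {
    struct Namer {
        std::string operator()(bool) const { return "bool"; }
        std::string operator()(std::int64_t) const { return "int"; }
        std::string operator()(double) const { return "float"; }
        std::string operator()(const std::string&) const { return "string"; }
        std::string operator()(const Vertex&) const { return "Vertex"; }
        std::string operator()(const Edge&) const { return "Edge"; }
    };
    return std::visit(Namer{}, v);
}

// Computation on values: int + int and string + string are defined in this
// sketch; every other combination yields nullopt.
std::optional<Value> add(const Value& a, const Value& b) {
    if (const auto* x = std::get_if<std::int64_t>(&a)) {
        if (const auto* y = std::get_if<std::int64_t>(&b)) return Value{*x + *y};
    }
    if (const auto* x = std::get_if<std::string>(&a)) {
        if (const auto* y = std::get_if<std::string>(&b)) return Value{*x + *y};
    }
    return std::nullopt;
}
```

A single value type of this kind lets the query engine and the storage engine exchange results without each needing its own representation of vertices, edges, and primitives.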

Stay tuned for the next piece of the source code reading series.

Join our Slack channel if you want to discuss with the rest of the NebulaGraph community!