Full-Text Indexing in NebulaGraph 2.0
1 Introduction
NebulaGraph 2.0 supports full-text indexing by using an external full-text search engine. To understand this new feature, let's review the architecture and storage model of NebulaGraph 2.0.
1.1 Architecture of NebulaGraph
As shown in the preceding figure, the Storage Service is composed of three layers. The bottom one is the Store Engine. It is a standalone local store engine, supporting `get`, `put`, `scan`, and `delete` operations on local data. The associated interfaces are in the kvstore/KVEngine.h file. Users can customize the local store plugins to meet their own needs. Currently, NebulaGraph provides a RocksDB-based store engine.
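To make the shape of that interface concrete, here is a minimal sketch in Python of the four operations named above. It is an illustration only; the real interface is declared in kvstore/KVEngine.h and is C++, so the signatures below are assumptions.

```python
from abc import ABC, abstractmethod
from typing import Iterator, Optional, Tuple

class KVEngine(ABC):
    """Illustrative stand-in for the local store engine interface
    (the real one lives in kvstore/KVEngine.h)."""

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]:
        """Return the value for a key, or None if it is absent."""

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None:
        """Write a key-value pair to the local store."""

    @abstractmethod
    def scan(self, prefix: bytes) -> Iterator[Tuple[bytes, bytes]]:
        """Iterate over all key-value pairs whose keys share a prefix."""

    @abstractmethod
    def delete(self, key: bytes) -> None:
        """Remove a key from the local store."""
```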
Above the local store engine, the consensus algorithm of multi-group Raft is implemented. With this implementation, each partition corresponds to one Raft group, where a partition is a data shard in NebulaGraph. Hash-based sharding is used in NebulaGraph. For more information about how the hash function works, see Section 1.2.1, Data Storage in NebulaGraph. To create a graph space in NebulaGraph, the number of partitions is required and it cannot be changed after creation, so the number of partitions must meet your needs for business expansion.
The top layer is the storage interfaces. A set of graph-related APIs are implemented in this layer. The API requests are translated into a set of KV operations on the corresponding partitions. This layer is what makes our storage service a real graph store; without it, the Storage Service of NebulaGraph would be just a KV storage solution. In NebulaGraph, the KV storage is not provided as a separate service. The main reason is that a lot of computation is required to execute a `WHERE` clause and the schema of a graph is needed for the computation, but the schema is not implemented in the KV store layer. The design implemented in NebulaGraph makes computation pushdown easier.
1.2 Storage of NebulaGraph
In NebulaGraph 2.0, the storage structure containing vertices, edges, and indexes is improved. Now let's review the storage structure of NebulaGraph 2.0, which could help you understand the implementation of data scanning and index scanning in NebulaGraph 2.0.
1.2.1 Data Storage in NebulaGraph
NebulaGraph stores vertices and edges based on the key-value storage model. In this section, the storage structure of the keys is introduced. The keys are composed of the following items:
- `Type`: One byte. It represents the key type, such as vertex, edge, index, or system.
- `PartID`: Three bytes. It represents a partition. This field makes it easy to scan the entire partition data based on the prefix when the partition is re-balanced.
- `VertexID`: n bytes. For an outgoing edge, it represents the ID of the source vertex. For an incoming edge, it represents the ID of the destination vertex.
- `Edge Type`: Four bytes. It represents the type of an edge. If it is greater than 0, the edge is outgoing. If it is less than 0, the edge is incoming.
- `Rank`: Eight bytes. It is used to distinguish edges of the same edge type with the same source and destination vertices. Users can use it to represent their own business attributes, such as transaction time, transaction serial number, or a sorting weight.
- `PlaceHolder`: One byte. It is invisible to users now. In the future, it will be used when we implement distributed transactions.
- `TagID`: Four bytes. It represents the type of a tag.
1.2.1.1 Vertex Key Format
| Type (1 byte) | PartID (3 bytes) | VertexID (n bytes) | TagID (4 bytes) |
| --- | --- | --- | --- |
1.2.1.2 Edge Key Format
| Type (1 byte) | PartID (3 bytes) | VertexID (n bytes) | EdgeType (4 bytes) | Rank (8 bytes) | VertexID (n bytes) | PlaceHolder (1 byte) |
| --- | --- | --- | --- | --- | --- | --- |
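For illustration, the following sketch packs the two key layouts above in Python. The byte order, the concrete `Type` constants, and the fixed VertexID length are assumptions made for the example, not the kernel's actual encoding.

```python
VID_LEN = 8  # n is fixed per graph space; 8 here is an assumed example value

def vertex_key(key_type: int, part_id: int, vid: bytes, tag_id: int) -> bytes:
    """Type (1 byte) | PartID (3 bytes) | VertexID (n bytes) | TagID (4 bytes)"""
    return (key_type.to_bytes(1, 'big')
            + part_id.to_bytes(3, 'big')
            + vid.ljust(VID_LEN, b'\x00')[:VID_LEN]
            + tag_id.to_bytes(4, 'big'))

def edge_key(key_type: int, part_id: int, src: bytes, edge_type: int,
             rank: int, dst: bytes) -> bytes:
    """Type (1) | PartID (3) | VertexID (n) | EdgeType (4, signed: < 0 means
    incoming) | Rank (8) | VertexID (n) | PlaceHolder (1)"""
    return (key_type.to_bytes(1, 'big')
            + part_id.to_bytes(3, 'big')
            + src.ljust(VID_LEN, b'\x00')[:VID_LEN]
            + edge_type.to_bytes(4, 'big', signed=True)
            + rank.to_bytes(8, 'big')
            + dst.ljust(VID_LEN, b'\x00')[:VID_LEN]
            + b'\x00')

# All keys of one partition share the Type + PartID prefix, which is what
# makes prefix scans of an entire partition cheap.
assert vertex_key(1, 7, b"Tim Duncan"[:8], 5).startswith(
    (1).to_bytes(1, 'big') + (7).to_bytes(3, 'big'))
```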
1.2.2 Index Storage in NebulaGraph
- props binary (n bytes): Represents the values of the properties of a tag or an edge type. If a property value is `NULL`, `0xFF` is used to represent it.
- nullable bitset (2 bytes): Indicates whether the value of a property is `NULL`. It is two bytes long, which means that an index can contain a maximum of 16 properties.
1.2.2.1 Tag Index Key Format
| Type (1 byte) | PartID (3 bytes) | IndexID (4 bytes) | props binary (n bytes) | nullable bitset (2 bytes) | VertexID (n bytes) |
| --- | --- | --- | --- | --- | --- |
1.2.2.2 Edge Index Key Format
| Type (1 byte) | PartID (3 bytes) | IndexID (4 bytes) | props binary (n bytes) | nullable bitset (2 bytes) | VertexID (n bytes) | Rank (8 bytes) | VertexID (n bytes) |
| --- | --- | --- | --- | --- | --- | --- | --- |
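The interplay of the props binary and the nullable bitset can be sketched as follows. The bit order within the bitset and the width of the `0xFF` placeholder are assumptions for illustration.

```python
from typing import List, Optional, Tuple

def index_null_parts(values: List[Optional[bytes]]) -> Tuple[List[bytes], bytes]:
    """Sketch of NULL handling in an index key: a NULL property value is
    stored as 0xFF in the props binary, and bit i of the 2-byte nullable
    bitset is set when property i is NULL (hence at most 16 properties)."""
    props = [b'\xff' if v is None else v for v in values]
    bits = 0
    for i, v in enumerate(values[:16]):
        if v is None:
            bits |= 1 << i
    return props, bits.to_bytes(2, 'big')

# Example: the second of three indexed properties is NULL.
props, bitset = index_null_parts([b'Tim', None, b'42'])
assert props[1] == b'\xff' and bitset == (0b10).to_bytes(2, 'big')
```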
1.3 Why Is an External Full-Text Search Engine Used?
From the preceding figure, you can see that to perform a fuzzy text query on a property, a full table scan or a full index scan is required, after which the data is filtered row by row, which compromises the query performance. If the amount of data is large, an out-of-memory error may occur before the scanning is done. Besides, implementing inverted indexes would go against the initial design principle of indexing in NebulaGraph, so they are not implemented for text search. After some research and discussion, we decided to introduce a third-party full-text search engine to make full-text search work well. It ensures the query performance and reduces the development cost of the NebulaGraph kernel.
2 Objective
2.1 Functionalities
In NebulaGraph 2.0, only `LOOKUP` supports text search. It means that when an external full-text search engine is available, users can run a `LOOKUP` statement to perform text search. On the external full-text search engine, only some basic functionalities, such as inserting data and querying data, are used. To support complex, plain-text queries, NebulaGraph needs to be polished further. Any suggestions from the NebulaGraph community are welcome. The following are the text search expressions supported by NebulaGraph 2.0; a sketch of how each might map to an Elasticsearch query follows the list:
- Fuzzy search
- Prefix search
- Wildcard search
- Regular expression search
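These four expression types correspond naturally to Elasticsearch's term-level queries. The sketch below shows the kind of query body each might translate into against the `value` field defined by the template in Section 5.1; the kernel's actual translation is not documented here, so treat these bodies as assumptions.

```python
# Illustrative Elasticsearch query bodies for each supported search type.
# The target field "value" comes from the index template in Section 5.1;
# note that in the fuzzy query, the outer "value" is the field name and
# the inner "value" is the search term.
es_query_bodies = {
    "fuzzy":    {"query": {"fuzzy":    {"value": {"value": "Tim Dunkan",
                                                  "fuzziness": "AUTO"}}}},
    "prefix":   {"query": {"prefix":   {"value": "B"}}},
    "wildcard": {"query": {"wildcard": {"value": "*West*"}}},
    "regexp":   {"query": {"regexp":   {"value": "A[0-9]+"}}},
}
```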
2.2 Performance
In this article, I will discuss the data synchronization performance and query performance.
- Data synchronization performance: Because an external full-text search engine is used, it is necessary to store a copy of the data in it. It has been verified that the import performance of an external full-text search engine is lower than that of NebulaGraph. Therefore, in order not to decrease the data import performance of NebulaGraph, we decided on an asynchronous synchronization solution to import data into the external full-text search engine. For more information, see the following sections.
- Query performance: As mentioned above, if no external full-text search engine were adopted, full-text search would be a nightmare for NebulaGraph. At present, with an external full-text search engine, `LOOKUP` supports text search, but the performance is inevitably lower than that of the native index scan of NebulaGraph, and sometimes the query performance of the external full-text search engine itself is low. To solve this problem, a timeliness mechanism, `LIMIT` and `TIMEOUT`, is needed to ensure the query performance. For more information, see the following sections.
3 Glossary
Term | Meaning |
---|---|
Tag | Defines the property structure of vertices. Tags are identified by `tagId`. Multiple tags can be attached to one vertex. |
Edge | Defines the property structure of edges. Edge types are identified by `edgetype`. |
Property | Defines the properties of a tag or an edge type. Its data type is defined in a tag or an edge type. |
Partition | Represents the smallest logical store unit in NebulaGraph. A Storage Engine contains multiple partitions. The Leader or Follower role can be assigned to a partition. Raftex ensures the data consistency between Leaders and Followers. |
Graph space | Each graph space is an independent business graph unit. Each graph space has its own independent tag and edge type set. A NebulaGraph cluster can have multiple graph spaces. |
Index | In the following sections, index refers to an index on the properties of vertices and edges in NebulaGraph. Its data type is determined by the tag or edge type definition. |
TagIndex | Represents an index on a tag. A tag can have more than one index. Indexes across multiple tags are not supported yet. |
EdgeIndex | Represents an index on an edge type. An edge type can have more than one index. Indexes across multiple edge types are not supported yet. |
Scan Policy | Defines the index scan policy. Generally, a query statement can use multiple index scan policies, and Scan Policy decides which policy is used. |
Optimizer | Optimizes the query conditions to improve the query efficiency. For example, sorting, splitting, and merging sub-expression nodes on the expression tree of the WHERE clause. |
4 Implementation
Elasticsearch is the external full-text search engine that is supported by NebulaGraph. In this section, I will introduce how Elasticsearch works with NebulaGraph 2.0.
4.1 Storage Structure
4.1.1 DocID
| partId (10 bytes) | schemaId (10 bytes) | encoded_columnName (32 bytes) | encoded_val (max 344 bytes) |
| --- | --- | --- | --- |
- `partId`: Corresponds to the partition ID of NebulaGraph. It is not used in NebulaGraph 2.0; in the future, it will be used for query pushdown and the routing feature of Elasticsearch.
- `schemaId`: Corresponds to the `tagId` or `edgetype` in NebulaGraph.
- `encoded_columnName`: Corresponds to the property name of a tag or an edge type. The MD5 algorithm is used for encoding to avoid characters that are incompatible with the Elasticsearch docID.
- `encoded_val`: The maximum length is 344 bytes. Because some visible characters in property values are not supported by the Elasticsearch docID, the Base64 algorithm is used to encode the property values, which is why the maximum length of `encoded_val` is 344 bytes. However, the actual raw value is limited to 256 bytes. Why 256 bytes? In the beginning, we just wanted to enable `LOOKUP` to perform text search. Similar to MySQL, the length of an index in NebulaGraph is limited, and the recommended maximum length is 256 bytes. Therefore, the 256-byte limit is also applied to the external search engine. So far, full-text search for long texts is not supported.
- The maximum length of an Elasticsearch docID is 512 bytes, so about 100 bytes are still reserved.
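Putting the four fields together, a docID can be assembled as in the sketch below. The zero-padded decimal formatting of `partId` and `schemaId` is an assumption; the MD5 and Base64 steps and the 256-byte cap follow the description above.

```python
import base64
import hashlib

def doc_id(part_id: int, schema_id: int, column_name: str, value: str) -> str:
    """partId (10 bytes) | schemaId (10 bytes) | encoded_columnName (32 bytes)
    | encoded_val (max 344 bytes)"""
    raw = value.encode('utf-8')[:256]  # property values are capped at 256 bytes
    encoded_col = hashlib.md5(column_name.encode('utf-8')).hexdigest()  # 32 hex chars
    encoded_val = base64.b64encode(raw).decode('ascii')  # 256 bytes -> at most 344 chars
    return f"{part_id:010d}{schema_id:010d}{encoded_col}{encoded_val}"

# 10 + 10 + 32 + 344 = 396 bytes at most, comfortably under the
# 512-byte docID limit of Elasticsearch.
print(doc_id(1, 5, "name", "Tim Duncan"))
```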
4.1.2 Doc Fields
- schema_id: Corresponds to the `tagId` or `edgetype` in NebulaGraph.
- column_id: Corresponds to the property code of a tag or an edge type in NebulaGraph.
- value: Corresponds to the property value of the native index in NebulaGraph.
4.2 Synchronizing Data
Leader & Listener
In this section, I will introduce the details of synchronizing data asynchronously. Understanding Leader and Listener in NebulaGraph will help you understand the synchronization mechanism.
- Leader: NebulaGraph is a horizontally scalable distributed system and its distributed consensus protocol is Raft. In NebulaGraph, different roles, such as Leader, Follower, and Learner, can be assigned to a partition. To write a new record to NebulaGraph, the Leader initiates a WAL synchronization event and synchronizes the WAL with the Followers and the Learners. When network or disk abnormalities occur, the partition roles are switched accordingly. Such a mechanism ensures the data safety of the distributed database. Leaders, Followers, and Learners are controlled by the nebula-storaged process and their parameters are specified in `nebula-storaged.conf`.
- Listener: Unlike Leaders, Followers, and Learners, Listeners are controlled by a separate process, and their configuration parameters are specified in `nebula-storaged-listener.conf`. As its name implies, a Listener passively receives the WAL sent by the Leader, parses the WAL regularly, and calls the data insertion API of the external full-text search engine to synchronize the data with the external engine. For Elasticsearch, NebulaGraph 2.0 supports its `PUT` and `BULK` interfaces.
Now, let's see how the data is synchronized:
1. Vertices or edges are inserted via Client or Console.
2. On the Graph Service layer, the related partition is computed based on the Vertex ID.
3. On the Graph Service layer, the `INSERT` request is sent to the Leader of the related partition via storageClient.
4. The Leader parses the `INSERT` request and then synchronizes the WAL with the Listener.
5. The Listener processes the newly synchronized WAL regularly, parses the WAL, and obtains the `STRING` property values of the tags or edge types.
6. The metadata and the property values of the tags and the edge types are assembled into a data structure that is compatible with Elasticsearch.
7. The data is written into Elasticsearch via the `PUT` or `BULK` interface.
8. If the writing fails, go back to Step 5 and retry the failed WAL until the writing succeeds.
9. When the writing succeeds, the successful Log ID and Term ID are recorded as the starting point for the synchronization of the next WAL.
10. Go back to Step 5 to process the new WAL.
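A condensed sketch of Steps 5 through 10 looks like the following. The `poll_wal` and `save_checkpoint` callbacks, the index name, and the document shape are hypothetical; only the retry-until-success loop and the Log ID/Term ID checkpointing reflect the steps above.

```python
import json
import time

import requests

ES = "http://127.0.0.1:9200"

def bulk_write(index: str, docs: list) -> bool:
    """Step 7: push parsed WAL records to Elasticsearch via the BULK interface."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["_id"]}}))
        lines.append(json.dumps({k: v for k, v in doc.items() if k != "_id"}))
    resp = requests.post(f"{ES}/_bulk", data="\n".join(lines) + "\n",
                         headers={"Content-Type": "application/x-ndjson"})
    return resp.ok and not resp.json().get("errors", True)

def listener_loop(poll_wal, save_checkpoint):
    """Steps 5-10: poll new WAL entries, write them to Elasticsearch,
    retry on failure, and checkpoint the last applied Log ID / Term ID."""
    while True:
        batch = poll_wal()                      # step 5: newly synchronized WAL
        if batch is None:
            time.sleep(1)                       # nothing new yet
            continue
        while not bulk_write("nebula_index", batch["docs"]):  # steps 7-8
            time.sleep(1)                       # retry until the write succeeds
        save_checkpoint(batch["log_id"], batch["term_id"])    # step 9
        # loop back to step 5 for the next WAL (step 10)
```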
In the preceding steps, if the Elasticsearch cluster or the Listener process crashes, the synchronization of the WAL stops. When the system is restored, the data synchronization continues from the last successful Log ID. We recommend that DBAs monitor the state of the Elasticsearch cluster in real time by using an external monitoring tool. If the Elasticsearch cluster is inactive for a long time, a large number of logs accumulate on the Listener and queries cannot be performed normally.
4.3 Querying Data
From the preceding figure, we can see the key steps in text search as follows:
- Send Fulltext Scan Request: Generates a full-text index search request based on the query conditions, Schema ID, and Property ID; that is, the CURL command for Elasticsearch is encapsulated.
- Fulltext Cluster: Sends the query request to Elasticsearch and obtains the result.
- Collect Constant Values: Uses the returned result as constant values to generate an internal query expression of NebulaGraph. For example, if the original request is to query the values of property C1 starting with "A" and the returned result contains both "A1" and "A2", the expression `C1 == "A1" OR C1 == "A2"` is generated.
- IndexScan Optimizer: According to the newly generated expression, finds the optimal internal index based on RBO and generates the optimal execution plan.
- Fulltext Cluster: In this step, the query may be slow or massive data may be returned. Therefore, `LIMIT` and `TIMEOUT` are adopted to interrupt the query on the Elasticsearch side in real time.
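The Collect Constant Values step is simple to picture: the hits returned by Elasticsearch are folded into a disjunction that the native index scan can evaluate. A minimal sketch:

```python
def collect_constant_values(prop: str, hits: list) -> str:
    """Fold Elasticsearch hits into an equivalent native filter expression."""
    return " OR ".join(f'{prop} == "{value}"' for value in hits)

# For the prefix query "A" on property C1 from the example above:
assert collect_constant_values("C1", ["A1", "A2"]) == 'C1 == "A1" OR C1 == "A2"'
```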
5 Demonstration
5.1 Deploying External Elasticsearch Cluster
I assume that you are already familiar with the deployment of an Elasticsearch cluster, so I won't describe it in detail. It should be noted that once the Elasticsearch cluster has started successfully, it is necessary to create a general template as follows:
{
  "template": "nebula*",
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 1
    }
  },
  "mappings": {
    "properties": {
      "tag_id": { "type": "long" },
      "column_id": { "type": "text" },
      "value": { "type": "keyword" }
    }
  }
}
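One way to install the template is through Elasticsearch's legacy `_template` API, as sketched below. The template name `nebula` and the cluster address are assumptions, and on Elasticsearch 7.x you may need `index_patterns` in place of `template`.

```python
import requests

template = {
    "template": "nebula*",
    "settings": {"index": {"number_of_shards": 3, "number_of_replicas": 1}},
    "mappings": {
        "properties": {
            "tag_id": {"type": "long"},
            "column_id": {"type": "text"},
            "value": {"type": "keyword"},
        }
    },
}

resp = requests.put("http://127.0.0.1:9200/_template/nebula", json=template)
resp.raise_for_status()
print(resp.json())  # expect {"acknowledged": true}
```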
5.2 Deploying Nebula Listener
- According to the actual environment, modify the configuration parameters in `nebula-storaged-listener.conf`.
- Run this command to start the Listener:
./bin/nebula-storaged --flagfile ${listener_config_path}/nebula-storaged-listener.conf
5.3 Signing In to Text Search Clients
nebula> SIGN IN TEXT SERVICE (127.0.0.1:9200);
nebula> SHOW TEXT SEARCH CLIENTS;
+-------------+------+
| Host | Port |
+-------------+------+
| "127.0.0.1" | 9200 |
+-------------+------+
| "127.0.0.1" | 9200 |
+-------------+------+
| "127.0.0.1" | 9200 |
+-------------+------+
5.4 Creating a Graph Space of NebulaGraph
CREATE SPACE basketballplayer (partition_num=3,replica_factor=1, vid_type=fixed_string(30));
USE basketballplayer;
5.5 Adding Listeners
nebula> ADD LISTENER ELASTICSEARCH 192.168.8.5:46780,192.168.8.6:46780;
nebula> SHOW LISTENER;
+--------+-----------------+-----------------------+----------+
| PartId | Type | Host | Status |
+--------+-----------------+-----------------------+----------+
| 1 | "ELASTICSEARCH" | "[192.168.8.5:46780]" | "ONLINE" |
+--------+-----------------+-----------------------+----------+
| 2 | "ELASTICSEARCH" | "[192.168.8.5:46780]" | "ONLINE" |
+--------+-----------------+-----------------------+----------+
| 3 | "ELASTICSEARCH" | "[192.168.8.5:46780]" | "ONLINE" |
+--------+-----------------+-----------------------+----------+
5.6 Creating Tags, Edge Types, and Indexes
The `name` property should be shorter than 256 bytes. If the business permits, the `name` property of the player tag should be of the `fixed_string` type and its length should be less than 256 bytes.
nebula> CREATE TAG player(name string, age int);
nebula> CREATE TAG INDEX name ON player(name(20));
5.7 Inserting Data
nebula> INSERT VERTEX player(name, age) VALUES \
"Russell Westbrook": ("Russell Westbrook", 30), \
"Chris Paul": ("Chris Paul", 33),\
"Boris Diaw": ("Boris Diaw", 36),\
"David West": ("David West", 38),\
"Danny Green": ("Danny Green", 31),\
"Tim Duncan": ("Tim Duncan", 42),\
"James Harden": ("James Harden", 29),\
"Tony Parker": ("Tony Parker", 36),\
"Aron Baynes": ("Aron Baynes", 32),\
"Ben Simmons": ("Ben Simmons", 22),\
"Blake Griffin": ("Blake Griffin", 30);
5.8 Querying Data
nebula> LOOKUP ON player WHERE PREFIX(player.name, "B");
+-----------------+
| _vid |
+-----------------+
| "Boris Diaw" |
+-----------------+
| "Ben Simmons" |
+-----------------+
| "Blake Griffin" |
+-----------------+
6 Tracking and Solving Problems
In the process of setting up the system environment, an error in any step may prevent the functionalities from working normally. Based on user feedback, I have summarized three possible error types. Here is how to analyze and solve these problems:
- Problem: The Listeners cannot be started or cannot work after startup.
  - Check the Listener configuration file, making sure that the `IP:Port` configuration of the Listeners does not conflict with that of the existing nebula-storaged process.
  - Check the Listener configuration file, making sure that the `IP:Port` configuration of Meta is consistent with that of the nebula-storaged process.
  - Check the Listener configuration file, making sure that the PIDs directory and the logs directory are independent and do not conflict with those of the nebula-storaged process.
  - If the configuration is modified because of errors after the Listeners were started successfully and the Listeners cannot work normally after a restart, the related metadata in Meta needs to be cleared. For more information about the commands, see the NebulaGraph Database Manual.
- Problem: The data cannot be synchronized with the Elasticsearch cluster.
  - Make sure that the Listeners have received the WAL from the Leader by checking whether there are any files in the directory specified by `listener_path` in the `nebula-storaged-listener.conf` file.
  - Enable vlog by running `UPDATE CONFIGS storage:v=3` and make sure that the CURL command is executed successfully. If the execution fails, check the Elasticsearch configuration or the compatibility between versions.
- Problem: There is data in the Elasticsearch cluster but no correct result is returned.
  - Enable vlog by running `UPDATE CONFIGS graph:v=3` and check the graph logs to confirm the reason for the CURL command failure.
  - Only lowercase characters, not uppercase ones, can be matched during the query. This may be caused by template errors of Elasticsearch. For more information, see the NebulaGraph Database Manual.
- Open vlog by running
7 TODO
- Creating full-text indexes on specified tags or edge types.
- Rebuilding full-text indexes (REBUILD)
Would you like to know more about NebulaGraph? Join the Slack channel!