NebulaGraph Source Code Explained: Validator

Architecture

NebulaGraph Query Engine consists of four major modules: Parser, Validator, Optimizer, and Executor.

NebulaGraph Query Engine Architecture

Parser is responsible for performing lexical and syntax analysis on a statement and generating an AST (Abstract Syntax Tree). Validator aims to convert an AST into an execution plan. Optimizer optimizes the execution plan. And Executor is designed to perform computing on the data.

In this article, we will introduce how Validator is implemented.

Structure of Source Files

The directories for the source code of Validator are src/validator and src/planner.

The src/validator directory contains the code for implementing the validators for the clauses, such as OrderByValidator, LimitValidator, and GoValidator.

validator/
├── ACLValidator.h
├── AdminJobValidator.h
├── AdminValidator.h
├── AssignmentValidator.h
├── BalanceValidator.h
├── DownloadValidator.h
├── ExplainValidator.h
├── FetchEdgesValidator.h
├── FetchVerticesValidator.h
├── FindPathValidator.h
├── GetSubgraphValidator.h
├── GoValidator.h
├── GroupByValidator.h
├── IngestValidator.h
├── LimitValidator.h
├── LookupValidator.h
├── MaintainValidator.h
├── MatchValidator.h
├── MutateValidator.h
├── OrderByValidator.h
├── PipeValidator.h
├── ReportError.h
├── SequentialValidator.h
├── SetValidator.h
├── TraversalValidator.h
├── UseValidator.h
├── Validator.h
└── YieldValidator.h

The src/planner/plan directory defines the data structure of all the PlanNodes, aiming to generate the execution plans. For example, if a query statement contains aggregate functions, an Aggregate node will be generated in the execution plan, and the Aggregate class, defined in Query.h, specifies all the information necessary for the computation of the aggregate functions, including grouping keys and aggregate expressions. In NebulaGraph, more than 100 PlanNodes have been defined. For more information, see PlanNode∷kind in PlanNode.h.

planner/plan/
├── Admin.cpp          
├── Admin.h             // administration related  nodes
├── Algo.cpp
├── Algo.h              // graph algorithm related nodes
├── ExecutionPlan.cpp
├── ExecutionPlan.h     // explain and profile nodes
├── Logic.cpp
├── Logic.h             // nodes introduced by the implementation layer
├── Maintain.cpp
├── Maintain.h          // schema related nodes
├── Mutate.cpp
├── Mutate.h            // DML related nodes
├── PlanNode.cpp
├── PlanNode.h          // plan node base classes
├── Query.cpp
├── Query.h             // DQL related nodes
└── Scan.h              // index related nodes

Additionally, the src/planner directory contains the code for implementing the planners of nGQL and MATCH statements. These planners are used to generate the execution plans of nGQL and MATCH statements.

Explaining Source Code

The entry function of Validator is Validator::validate(Sentence*, QueryContext*). It is responsible for converting the AST, generated by Parser, into an execution plan. QueryContext stores the root node of the execution plan. Here is the source code of this function.

```cpp Status Validator::validate(Sentence* sentence, QueryContext* qctx) { DCHECK(sentence != nullptr); DCHECK(qctx != nullptr);

// Check if space chosen from session. if chosen, add it to context.
auto session = qctx->rctx()->session();
if (session->space().id > kInvalidSpaceID) {
    auto spaceInfo = session->space();
    qctx->vctx()->switchToSpace(std::move(spaceInfo));
}

auto validator = makeValidator(sentence, qctx);
NG_RETURN_IF_ERROR(validator->validate());

auto root = validator->root();
if (!root) {
    return Status::SemanticError("Get null plan from sequential validator");
}
qctx->plan()->setRoot(root);
return Status::OK();

}

This function retrieves the graph space information of the current session, stores the information in ValidateContext, and then calls `Validator::makeValidator()` and `Validator::validate()`. 

`Validator::makeValidator()` aims to generate validators for clauses. It generates SequentialValidator, the entry of Validator. For each statement, SequentialValidator is generated first, and then a validator is recursively generated.

`SequentialValidator::validateImpl()` will call `Validator::makeValidator()` to generate an appropriate validator for each clause. Here is the source code of this function.

cpp Status SequentialValidator::validateImpl() { Status status; if (sentence_->kind() != Sentence::Kind::kSequential) { return Status::SemanticError( "Sequential validator validates a SequentialSentences, but %ld is given.", static_cast(sentence_->kind())); } auto seqSentence = static_cast(sentence_); auto sentences = seqSentence->sentences();

seqAstCtx_->startNode = StartNode::make(seqAstCtx_->qctx);
for (auto* sentence : sentences) {
    auto validator = makeValidator(sentence, qctx_);
    NG_RETURN_IF_ERROR(validator->validate());
    seqAstCtx_->validators.emplace_back(std::move(validator));
}

return Status::OK();

}

Similarly, PipeValidator, AssignmentValidator, and SetValidator will generate appropriate validators for the corresponding clauses.

`Validator::validate()` is responsible for generating an execution plan. The source code of this function is as follows.

cpp Status Validator::validate() { auto vidType = space_.spaceDesc.vid_type_ref().value().type_ref().value(); vidType_ = SchemaUtil::propTypeToValueType(vidType);

NG_RETURN_IF_ERROR(validateImpl());

// Check for duplicate reference column names in pipe or var statement
NG_RETURN_IF_ERROR(checkDuplicateColName());

// Execute after validateImpl because need field from it
if (FLAGS_enable_authorize) {
    NG_RETURN_IF_ERROR(checkPermission());
}

NG_RETURN_IF_ERROR(toPlan());

return Status::OK();

}

This function does a check of the information of the graph space and authentication first, and then calls `Validator:validateImpl()` to validate the clauses. The `validateImpl()` function is a pure virtual function of the Validator class. It uses polymorphism to call different `validatorImpl()` functions for different clauses. Finally, the `Validator::toPlan()` function is called to generate an execution plan. The `toPlan()` function will generate subplans for the clauses and these subplans will be connected to form a complete execution plan for the statement. For example, for a MATCH statement, `MatchPlanner::connectSegments()` is called to connect all the subplans, but for an nGQL statement, `Validator::appendPlan()` is called for connection.

# An Example

In this section, we will take an nGQL statement as an example to show the procedure.

Here is the example statement.

GO 3 STEPS FROM "vid" OVER edge WHERE $$.tag.prop > 30 YIELD edge._dst AS dst | ORDER BY $-.dst

In the Validator phase of this example statement, all these three steps are necessary:

## Generating validators for the clauses

Firstly, `Validator::makeValidator()` is called to generate SequentialValidator. In the `SequentialValidator::validateImpl()` function, PipeValidator will be generated, which produces validators for the clauses separated by the pipe symbol (|), that is, GoValidator and OrderByValidator.

## Validating the clauses

In this step, the GO and ORDER BY clauses are validated separately.

Let's take the GO clause as an example. Firstly, it is verified for the semantic errors, such as invalid aggregate functions and type mismatch in expressions. Secondly, the subclauses inside it are validated one by one. During the validation phase, all the intermediate results are stored in GoContext, which will be used as the basis for GoPlanner to generate its subplan. For example, validateWhere() stores the filter condition expressions to generate the Filter node.

cpp NG_RETURN_IF_ERROR(validateStep(goSentence->stepClause(), goCtx_->steps)); // Validate step clause NG_RETURN_IF_ERROR(validateStarts(goSentence->fromClause(), goCtx_->from)); // Validate from clause NG_RETURN_IF_ERROR(validateOver(goSentence->overClause(), goCtx_->over)); // Validate over clause NG_RETURN_IF_ERROR(validateWhere(goSentence->whereClause())); // Validate where clause NG_RETURN_IF_ERROR(validateYield(goSentence->yieldClause())); // Validate yield clause

## Generating the execution plan
  
The subplan of the GO statement is generated by GoPlanner∷transform(Astcontext*). Its source code is as follows.

cpp StatusOr GoPlanner::transform(AstContext* astCtx) { goCtx_ = static_cast(astCtx); auto qctx = goCtx_->qctx; goCtx_->joinInput = goCtx_->from.fromType != FromType::kInstantExpr; goCtx_->joinDst = !goCtx_->exprProps.dstTagProps().empty();

SubPlan startPlan = QueryUtil::buildStart(qctx, goCtx_->from, goCtx_->vidsVar);

auto& steps = goCtx_->steps;
if (steps.isMToN()) {
    return mToNStepsPlan(startPlan);
}

if (steps.steps() == 0) {
    auto* pt = PassThroughNode::make(qctx, nullptr);
    pt->setColNames(std::move(goCtx_->colNames));
    SubPlan subPlan;
    subPlan.root = subPlan.tail = pt;
    return subPlan;
}

if (steps.steps() == 1) {
    return oneStepPlan(startPlan);
}
return nStepsPlan(startPlan);

}

This function calls `QueryUtil::buildStart()` to construct the Start node first. And then appropriate methods are used to generate plans for the four different steps. In this example statement, the nStepPlan strategy is used.

Here is the source code of `GoPlanner::nStepsPlan()`.

cpp SubPlan GoPlanner::nStepsPlan(SubPlan& startVidPlan) { auto qctx = goCtx_->qctx;

auto* start = StartNode::make(qctx);
auto* gn = GetNeighbors::make(qctx, start, goCtx_->space.id);
gn->setSrc(goCtx_->from.src);
gn->setEdgeProps(buildEdgeProps(true));
gn->setInputVar(goCtx_->vidsVar);

auto* getDst = QueryUtil::extractDstFromGN(qctx, gn, goCtx_->vidsVar);

PlanNode* loopBody = getDst;
PlanNode* loopDep = nullptr;
if (goCtx_->joinInput) {
    auto* joinLeft = extractVidFromRuntimeInput(startVidPlan.root);
    auto* joinRight = extractSrcDstFromGN(getDst, gn->outputVar());
    loopBody = trackStartVid(joinLeft, joinRight);
    loopDep = joinLeft;
}

auto* condition = loopCondition(goCtx_->steps.steps() - 1, gn->outputVar());
auto* loop = Loop::make(qctx, loopDep, loopBody, condition);

auto* root = lastStep(loop, loopBody == getDst ? nullptr : loopBody);
SubPlan subPlan;
subPlan.root = root;
subPlan.tail = startVidPlan.tail == nullptr ? loop : startVidPlan.tail;

return subPlan;

}

Here is the subplan of the GO statement.

Start -> GetNeighbors -> Project -> Dedup -> Loop -> GetNeighbors -> Project -> GetVertices -> Project -> LeftJoin -> Filter -> Project

A GO statement aims to expand querying of a graph. GetNeighbors is the most important node in the execution plan. The GetNeighbors operator will access the storage service during the execution to retrieve the VID of the destination vertex according to the source vertex and the specified edge type. An expansion across multiple steps is implemented through the Loop node. Between the Start node and the Loop node, it is the subplans of Loop. When the condition is satisfied, the execution of these subplans will be looped. And then the last step of an expansion is executed outside Loop. The Project node aims to retrieve the VID of the destination vertex of such an expansion. The Dedup node is responsible for deduplicating the VIDs of the destination vertices, which will be used as the starting vertices of the next expansion. GetVertices is responsible for retrieving the tag properties of the destination vertices. The Filter node aims to filter vertices based on the conditions. The LeftJoin node works to connect the results of GetNeightbors and GetVertices.

The ORDER BY clause is designed to sort data. Its subplan generates the Sort node. After the plans for the clauses on the left and right sides of the pipe symbol (|) are generated, `PipeValidator::toPlan()` will call `Validator::appendPlan()` to connect these subplans to obtain the complete execution plan, which is shown as follows:

Start -> GetNeighbors -> Project -> Dedup -> Loop -> GetNeighbors -> Project -> GetVertices -> Project -> LeftJoin -> Filter -> Project -> Sort -> DataCollect ```

Relevant Questions

Q: How can I get the parser/GraphParser.hpp file?

A: A .h file will be generated after compiling.

Stay tuned for the next piece of the source code reading series.

Join our Slack channel if you want to discuss with the rest of the NebulaGraph community!