Pick of the Week at Nebula Graph - Schema design in Nebula Graph

Steam
2020-12-04

Pick of the Week

Normally the weekly issue covers Nebula Graph Updates and Community Q&As. If something major happens, it will also be covered in the additional Events of the Week section.

Events of the Week

  1. Nebula Graph v2.0.0-beta release note

Nebula Graph v2.0.0-beta has been released. This release adds full-text indexing, data statistics, and other new features. Nebula Graph Studio and Go Importer now also support Nebula Graph 2.x.

To get more information, please read the release note.

  2. DB-Engines Ranking Has Been Updated

Nebula Graph has risen 3 places and jumped to #15 in the latest DB-Engines ranking.

DB-Engines Ranking

Nebula Graph Updates

Updates to Nebula Graph in the last week:

  • Supports the DeleteRange operation in RocksDB, which greatly improves the efficiency of edge deletion. Tags: Version 1.x, Optimization. For more information, see PR#2404.

  • Fixed an issue where running FETCH PROP ON on timestamp properties returned int64 results. Tags: Version 1.x, Bug fix. For more information, see PR#2389.

Community Q&A

This week’s topic, schema design in Nebula Graph, comes from community user @panda.

Spark Writer Configuration Suggestions

Before using Spark Writer to import data, we need to configure application.conf.

@panda: I am not familiar with graph databases. I used to use MySQL and MongoDB. Now I am having some trouble designing schemas in Nebula Graph. Suppose we have the following schema (written in GraphQL for convenience):

// User list
type User {
    name: String,
    followings: [User],
    followers: [User],
    posts: [Post],
    topics: [Topic]
}
// Topic list
type Topic {
    name: String,
    description: String,
    user: User,
    members: [Member],
    posts: [Post]
}
// Post list
type Post {
    text: String,
    member: Member,
    topic: Topic
}
// Topic member list
type Member {
    user: User,
    topic: Topic,
    name: String,
    level: Int,
    join_date: DateTime,
    posts: [Post],
}

Nebula Graph has no concept of table association, so we cannot declare associated fields. What is the right way to design a schema in Nebula Graph for the preceding data?

Can we retrieve the required scalar and associated fields all at once? Please advise.

Now we are using Nebula Graph v2.x.

Nebula Graph: You can set associated fields as edges in Nebula Graph. For example, the followings field indicates that a user follows another user, i.e., User—following—>User. So you can set the following field as an edge type, and insert edges (relationships) of this type to connect the vertices representing the users.

@panda: I know these concepts, but this is not what I’m asking. Here is what really puzzles me:

  1. Tags and edges are separate and have no strong correlation, so how should we query them? Take the preceding data for example: to find all the data in the post list, should we search for the tags first and then search for the edges one by one? Do we need to do this for every query? Is the same true for writing data? It feels super troublesome.
  2. Some edges are shared by multiple tags. For example, the user: User edge mentioned earlier is used by both Topic and Member. Wouldn’t this mess up the data? How do we distinguish them? We don’t want to create new edges under other names.
  3. Associated fields usually have a one-to-one association, such as the user: User edge mentioned earlier, one-to-many association, such as the followings: [User] edge, and many-to-many association. How are these associations represented in Nebula Graph?
  4. For associated lists, we usually need to count the quantity, normally by aggregating the queries or setting a count field alone to hold the quantity. In Nebula Graph, is there a proper way to handle this? Do we set a count property on edges? For example CREATE EDGE followings(count int default 0);?

The official documents are too term-based. They are not based on actual scenes and cases, and many things are not explained in detail, which makes them difficult to understand.

If possible, please make a best practice document based on the preceding schema and introduce how to query and write data in such a case.

Nebula Graph: First of all, thank you for the scenario you provided. You can request the document from our technical writers.

Let’s summarize your scenario as follows:

schema design

Tags and edges are separate and have no strong correlation, so how should we query them? Take the preceding data for example: to find all the data in the post list, should we search for the tags first and then search for the edges one by one? Do we need to do this for every query?

Is the same true for writing data? It feels super troublesome.

Scanning by tag is not available yet, but it is planned for a future release through MATCH. For example, MATCH(p:post) RETURN p. Scanning by tag relies on indexes and can consume a lot of memory. If the data volume is large, out-of-memory (OOM) errors may happen frequently.

To write data, simply set the properties according to the post, and no other operation is needed.

Some edges are shared by multiple tags. For example, the user: User edge mentioned earlier is used by both Topic and Member. Wouldn’t this mess up the data? How do we distinguish them? We don’t want to create new edges under other names.

You might have confused tags with vertices. I suggest that you read Nebula Concepts first. In fact, there is no such thing as an edge shared by different vertices. An edge is identified by its source vertex ID, its edge type, its rank, and its destination vertex ID, and a difference in any one of these produces a different edge. Different edges may have the same edge type, but their vertices are usually different. Even if two edges have the same vertices and the same edge type, they are different edges as long as their ranks differ.

For example, the edge type between user and topic may be focus, and that between user and member may be is.
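The identity rule above can be sketched in a few lines of Python. This is an illustrative model only, not Nebula Graph code: the EdgeKey class, the insert_edge helper, the vertex IDs, and the member-to-user direction of the is edge are all hypothetical, and the sketch assumes that writing an edge whose key already exists replaces the earlier one.

```python
from typing import Dict, NamedTuple


class EdgeKey(NamedTuple):
    """Hypothetical model of the fields that identify one edge."""
    src: str        # source vertex ID
    edge_type: str  # for example "focus" or "is"
    rank: int       # assumed to default to 0 when not given
    dst: str        # destination vertex ID


# A plain dict stands in for the edge store: one entry per distinct key.
edges: Dict[EdgeKey, dict] = {}


def insert_edge(src: str, edge_type: str, dst: str, rank: int = 0, **props) -> None:
    """Store an edge; writing the same key again replaces its properties."""
    edges[EdgeKey(src, edge_type, rank, dst)] = props


insert_edge("user_1", "focus", "topic_1")
insert_edge("user_1", "focus", "topic_2")          # different destination
insert_edge("user_2", "focus", "topic_1")          # different source
insert_edge("member_1", "is", "user_1")            # different edge type
insert_edge("user_1", "focus", "topic_1", rank=1)  # same endpoints, different rank
insert_edge("user_1", "focus", "topic_1")          # same key as the first call

edge_count = len(edges)
print(edge_count)
```

Six calls leave five stored edges, because only the last call repeats an existing key. Reusing the edge type focus for many users and topics does not mix up the data, since each edge still has its own key.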

Associated fields usually have a one-to-one association, such as the user: User edge mentioned earlier, one-to-many association, such as the followings: [User] edge, and many-to-many association. How are these associations represented in Nebula Graph?

Please refer to the preceding reply. In Nebula Graph, tags and edges don’t have direct relationships, but vertices and edges do. Tags are attached to vertices, and vertices are connected by edges. So whether an association is one-to-one, one-to-many, or N-to-N, it is represented by edges.
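As a rough illustration of this point, the Python sketch below stores every association as plain edges and reads the cardinality back from how many edges touch a vertex. It is not Nebula Graph code; the vertex IDs, the edge direction chosen for is, and the targets and sources helpers are assumptions made for the example.

```python
# Each edge is (source vertex ID, edge type, destination vertex ID).
# All IDs are made up for illustration.
edge_list = [
    ("member_1", "is", "user_1"),       # one-to-one: a member is one user
    ("user_1", "following", "user_2"),  # one-to-many: a user follows many users
    ("user_1", "following", "user_3"),
    ("user_1", "focus", "topic_1"),     # many-to-many: users and topics
    ("user_1", "focus", "topic_2"),
    ("user_2", "focus", "topic_1"),
]


def targets(src, edge_type):
    """Destinations reached from src over edges of the given type."""
    return [d for s, t, d in edge_list if s == src and t == edge_type]


def sources(edge_type, dst):
    """Sources that point at dst over edges of the given type."""
    return [s for s, t, d in edge_list if t == edge_type and d == dst]


member_user = targets("member_1", "is")           # a single destination
followings = targets("user_1", "following")       # several destinations
topics_of_user = targets("user_1", "focus")       # one user, many topics
users_of_topic = sources("focus", "topic_1")      # one topic, many users
print(member_user, followings, topics_of_user, users_of_topic)
```

Nothing in the edge itself says "one-to-one" or "many-to-many". The cardinality is simply a matter of how many edges of a given type are inserted for a vertex.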

For associated lists, we usually need to count the quantity, normally by aggregating the queries or setting a count field alone to hold the quantity. In Nebula Graph, is there a proper way to handle this? Do we set a count property on edges? For example CREATE EDGE followings(count int default 0);?

nGQL supports aggregation operations such as COUNT, so you don’t need to set a dedicated count property for counting.
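The idea of counting by aggregation rather than by a stored counter can be sketched in Python as follows. This is a stand-in for what an aggregate such as COUNT computes, not nGQL; the edge data and the count_edges helper are hypothetical.

```python
# Each edge is (source vertex ID, edge type, destination vertex ID).
# All IDs are made up for illustration.
edge_list = [
    ("user_2", "following", "user_1"),
    ("user_3", "following", "user_1"),
    ("user_1", "following", "user_2"),
    ("user_1", "focus", "topic_1"),
]


def count_edges(edges, edge_type, *, src=None, dst=None):
    """Count edges of one type, optionally filtered by source or destination."""
    return sum(
        1
        for s, t, d in edges
        if t == edge_type
        and (src is None or s == src)
        and (dst is None or d == dst)
    )


# Followers of user_1: edges of type "following" that end at user_1.
followers_of_user_1 = count_edges(edge_list, "following", dst="user_1")
# Followings of user_1: edges of type "following" that start at user_1.
followings_of_user_1 = count_edges(edge_list, "following", src="user_1")
print(followers_of_user_1, followings_of_user_1)
```

Because the count is derived from the edges at query time, it cannot drift out of sync with them the way a separately maintained count property could.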

Previous Pick of the Week

  1. Compiling Nebula Graph with ARM64
  2. Nebula Graph Studio V1.2.1-beta Has Been Released
Like what we do? Star us on GitHub: https://github.com/vesoft-inc/nebula