Unveiling ISO-GQL: The Future of Graph Query Languages
GQL, a property graph query language released this year by the ISO, is the second database query language to become an international standard, following SQL's debut over 40 years ago. It integrates the strengths of Cypher, SQL, and general-purpose programming languages, making it a powerful tool for analyzing complex graph data.
Building on Cypher: How GQL Enhances Graph Queries
Enhanced Graph Pattern Matching
Powerful Label Matching Syntax
One of the standout features of GQL is its advanced label matching syntax. This allows for more complex queries using logical operators such as &, |, and !.
In Cypher, a typical query might look like this:
MATCH (n:Person)
WHERE n.age > 30
RETURN n
With GQL, you can perform more sophisticated queries:
MATCH (n:Manager & FullTime | !Intern)
WHERE n.age > 30
RETURN n.name AS managerName, n.department AS dept
This GQL query finds nodes that are both Manager and FullTime, or not Intern, and have an age greater than 30.
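To make the operator precedence in that label expression concrete, here is a minimal Python sketch that models it on plain label sets. It assumes ! binds tightest, then &, then |, so the expression reads as (Manager & FullTime) | (!Intern). The node names and label sets are invented for illustration, and the code models the semantics only; it is not GQL.

```python
# Model of the label expression  Manager & FullTime | !Intern
# Assumed precedence: ! first, then &, then |.

def matches(labels: set) -> bool:
    """Return True if a node carrying these labels satisfies the expression."""
    return ("Manager" in labels and "FullTime" in labels) or ("Intern" not in labels)

# Hypothetical nodes and their labels.
nodes = {
    "alice": {"Manager", "FullTime"},
    "bob": {"Contractor"},
    "carol": {"Intern"},
    "dave": {"Manager", "FullTime", "Intern"},
}

matched = sorted(name for name, labels in nodes.items() if matches(labels))
print(matched)
```

Note that bob matches simply because he is not an Intern. If the intent were "a Manager who is FullTime or not an Intern", the expression would need parentheses around the | part.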
Regex-like Variable-Length Path Pattern
GQL introduces a regex-like syntax for variable-length patterns, enhancing flexibility in path matching. Examples include -[e]->{1,3}, -[e]->+, -[e]->*, and -[e]->?. This allows for more complex path pattern matching compared to Cypher's syntax.
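As a rough guide to what each suffix means, the following Python sketch maps a quantifier to the minimum and maximum number of hops it permits. The function names quantifier_bounds and hop_count_allowed are hypothetical helpers written for this illustration, and the sketch covers hop counts only, not variable binding.

```python
import math

def quantifier_bounds(q: str) -> tuple:
    """Map a quantifier suffix to (min_hops, max_hops); math.inf means unbounded."""
    shorthand = {"*": (0, math.inf), "+": (1, math.inf), "?": (0, 1)}
    if q in shorthand:
        return shorthand[q]
    if q.startswith("{") and q.endswith("}"):
        body = q[1:-1]
        if "," not in body:            # {n}: exactly n hops
            n = int(body)
            return (n, n)
        lo, hi = body.split(",", 1)    # {m,n}: either bound may be omitted
        return (int(lo) if lo.strip() else 0,
                int(hi) if hi.strip() else math.inf)
    raise ValueError(f"unsupported quantifier: {q!r}")

def hop_count_allowed(q: str, hops: int) -> bool:
    """Check whether a path with this many hops satisfies the quantifier."""
    lo, hi = quantifier_bounds(q)
    return lo <= hops <= hi

print(quantifier_bounds("{1,3}"))
print(hop_count_allowed("+", 0), hop_count_allowed("*", 0))
```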
Path Restrictors
GQL provides four path restrictors (WALK, TRAIL, ACYCLIC, and SIMPLE) to control whether paths can have repeated nodes or edges.
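The difference between the four restrictors is easiest to see on a concrete path. The Python sketch below assumes the usual definitions: WALK places no restriction, TRAIL forbids repeated edges, ACYCLIC forbids repeated nodes, and SIMPLE forbids repeated nodes except that the first and last node may coincide. The satisfies function and the node and edge identifiers are made up for the example.

```python
def satisfies(restrictor: str, nodes: list, edges: list) -> bool:
    """Check a path, given as its node sequence and edge sequence, against a restrictor."""
    no_repeated_edges = len(set(edges)) == len(edges)
    no_repeated_nodes = len(set(nodes)) == len(nodes)
    if restrictor == "WALK":
        return True                      # anything goes
    if restrictor == "TRAIL":
        return no_repeated_edges
    if restrictor == "ACYCLIC":
        return no_repeated_nodes
    if restrictor == "SIMPLE":
        # A closed path may revisit its start node once, at the very end.
        if len(nodes) > 1 and nodes[0] == nodes[-1]:
            return len(set(nodes[:-1])) == len(nodes) - 1
        return no_repeated_nodes
    raise ValueError(f"unknown restrictor: {restrictor!r}")

# A triangle a -> b -> c -> a: a cycle that reuses no edge.
cycle_nodes, cycle_edges = ["a", "b", "c", "a"], ["e1", "e2", "e3"]
cycle_result = {r: satisfies(r, cycle_nodes, cycle_edges)
                for r in ("WALK", "TRAIL", "ACYCLIC", "SIMPLE")}
print(cycle_result)
```

A path such as a -> b -> a -> b that traverses edge e1 twice would pass only WALK.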
Path Selectors
GQL offers versatile shortest path queries, including options for finding all shortest paths, any shortest path, and the top k shortest paths. This flexibility allows for detailed and efficient path retrieval in various scenarios.
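The three selector flavours can be illustrated with a small Python sketch over a toy graph. It is a simplified model, not how an engine would implement it: it enumerates every acyclic path between one pair of endpoints by brute force and then selects from the sorted list. The graph, the acyclic_paths helper, and the variable names are assumptions made for this example.

```python
def acyclic_paths(graph, src, dst, prefix=()):
    """Yield every path from src to dst that visits no node twice."""
    path = prefix + (src,)
    if src == dst:
        yield path
        return
    for nxt in graph.get(src, ()):
        if nxt not in path:
            yield from acyclic_paths(graph, nxt, dst, path)

# Hypothetical directed graph as an adjacency list.
graph = {"a": ["b", "c"], "b": ["d", "c"], "c": ["d"], "d": []}

# Sort candidates by length; sorted() is stable, so ties keep discovery order.
candidates = sorted(acyclic_paths(graph, "a", "d"), key=len)

all_shortest = [p for p in candidates if len(p) == len(candidates[0])]
any_shortest = candidates[0]      # any one of the shortest paths will do
top_2_shortest = candidates[:2]   # the k shortest paths, here k = 2

print(all_shortest)
print(any_shortest)
print(top_2_shortest)
```

A production engine would typically use breadth-first search or a similar strategy instead of full enumeration, and would apply the selection separately for each pair of endpoints.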
Modern Variable Definition
Local Variable Definition
GQL introduces a dedicated syntax for defining local variables at the beginning of a procedure, similar to early C language practices. These variables are visible throughout the entire procedure scope. The introduction of local variables allows GQL to express nonlinear data flows, enabling data to be reused multiple times.
For instance, when users need to perform multiple analyses on the same dataset, GQL's local variables provide an efficient way to store and reuse intermediate results. This is particularly useful for complex calculations or when the same base data needs to be processed in different ways.
Consider a scenario where we want to analyze user interactions in a social network:
// Base dataset: one row per user action.
TABLE userActivity = {
  MATCH (u:User)-[a:ACTION]->()
  RETURN u.id AS userId, a.timestamp AS actionTime, a.type AS actionType
}
// Users with at least one action after 2024-01-01.
VALUE activeUsers = VALUE {
  FOR i IN userActivity
  WHERE i.actionTime > datetime('2024-01-01')
  RETURN collect_list(DISTINCT i.userId)
}
// Number of distinct action types performed by the active users.
VALUE uniqueActionTypes = VALUE {
  FOR i IN userActivity
  WHERE i.userId IN activeUsers
  RETURN count(DISTINCT i.actionType) AS uniqueActionTypes
}
In this example, userActivity captures the base dataset of user actions, which is then reused to identify active users and calculate action statistics. This demonstrates how GQL's local variables enable efficient data reuse and modular query construction.
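For readers more at home in a general-purpose language, the same nonlinear data flow can be sketched in Python: the base table is built once and then read by two independent computations. The sample rows are invented, and the snake_case names simply mirror the GQL variables above.

```python
from datetime import datetime

# Stand-in for the userActivity table; the rows are made-up sample data.
user_activity = [
    {"userId": 1, "actionTime": datetime(2024, 3, 1), "actionType": "post"},
    {"userId": 1, "actionTime": datetime(2024, 4, 2), "actionType": "like"},
    {"userId": 2, "actionTime": datetime(2023, 12, 5), "actionType": "post"},
    {"userId": 2, "actionTime": datetime(2023, 11, 1), "actionType": "share"},
]

cutoff = datetime(2024, 1, 1)

# First reuse: distinct users with an action after the cutoff.
active_users = sorted({r["userId"] for r in user_activity if r["actionTime"] > cutoff})

# Second reuse: distinct action types among those users, over all their rows.
unique_action_types = len(
    {r["actionType"] for r in user_activity if r["userId"] in active_users}
)

print(active_users, unique_action_types)
```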
The LET Statement: Another Way to Define Variables
The LET statement in GQL defines a variable by adding a new column to the current output table, whereas VALUE creates a variable that is visible throughout the procedure scope. LET leaves all existing variables in scope, so a new variable can be introduced without re-listing the others. Cypher's WITH, by contrast, drops any variable that is not explicitly carried forward.
In Cypher, you must explicitly pass all existing variables when defining a new one:
MATCH (v)-[e]->(v2)
WITH v, e, v2, 1 AS a
Or use WITH * to retain all:
MATCH (v)-[e]->(v2)
WITH *, 1 AS a
In GQL, LET simplifies this process:
MATCH (v)-[e]->(v2)
LET a = 1
Here, v, e, and v2 remain accessible, and a is seamlessly added. This makes LET a more efficient and intuitive choice for variable definition.
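The contrast between the two statements can be modelled on a table of row dictionaries. The Python sketch below is an illustration of the scoping behaviour only; with_clause and let_clause are hypothetical helpers, not functions of any Cypher or GQL library.

```python
def with_clause(rows, keep, new):
    """Cypher-style WITH: only the listed variables survive, plus the new ones."""
    return [
        {**{k: r[k] for k in keep}, **{name: fn(r) for name, fn in new.items()}}
        for r in rows
    ]

def let_clause(rows, new):
    """GQL-style LET: every existing column survives and the new ones are appended."""
    return [{**r, **{name: fn(r) for name, fn in new.items()}} for r in rows]

# One binding row, as produced by MATCH (v)-[e]->(v2); the values are placeholders.
rows = [{"v": "n1", "e": "e1", "v2": "n2"}]
new_vars = {"a": lambda row: 1}

with_rows = with_clause(rows, ["v"], new_vars)   # e and v2 were not listed, so they are gone
let_rows = let_clause(rows, new_vars)            # v, e, and v2 are all still there

print(with_rows)
print(let_rows)
```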
Complex Nested Queries
GQL's CALL statement allows you to encapsulate a set of statements into a subquery block, treating them as a single unit that can be referenced by outer statements. This is particularly useful when you need to apply operations on the combined results of multiple queries.
For example, to sort the combined results of two different queries:
CALL {
  MATCH (v:Person) RETURN v.name AS name
  UNION
  MATCH (v:City) RETURN v.name AS name
}
ORDER BY name
The CALL block enables the ORDER BY to be applied to the unioned results, rather than just the last query.
Comparison between GQL and SQL/PGQ
SQL/PGQ adds graph query capabilities on top of SQL and shares its read-only graph pattern matching syntax with GQL. GQL, however, provides a complete read-write syntax, which better covers the range of operations a graph database needs.
In addition, although SQL is powerful, its syntax design has some well-known issues. For example, SQL's clause order is unintuitive and its clauses are hard to compose, so users often resort to workarounds for complex requirements. Google recently published a paper titled "SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL," which points out these design issues and proposes a pipe operator that makes clauses more linear and composable, thereby improving the flexibility and intuitiveness of SQL.
GQL, on the other hand, is inherently linear and composable. It features orthogonal statements like MATCH for graph pattern matching, FILTER for filtering results, FOR for iterating over lists, LET for variable definitions, and CALL for complex nested queries.
In addition, GQL includes INSERT, SET, and other DML statements, which can also be combined elegantly with DQL statements in a linear manner.
For example, you can match nodes and then insert an edge between them:
MATCH (a:player{id:1, name:"Tim"}), (b:player{id:2, name:"Jerry"})
INSERT (a)-[:follow{followness:90}]->(b)
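The idea of linear composition, where each statement consumes the table produced by the previous one, can be sketched in Python as a chain of table-to-table functions. All names here (filter_, let, ret, run) are hypothetical stand-ins for the corresponding GQL statements, and the input rows are invented.

```python
from functools import reduce

def filter_(pred):
    """Stand-in for FILTER: keep the rows that satisfy the predicate."""
    return lambda rows: [r for r in rows if pred(r)]

def let(name, fn):
    """Stand-in for LET: append a computed column, keeping the existing ones."""
    return lambda rows: [{**r, name: fn(r)} for r in rows]

def ret(*cols):
    """Stand-in for RETURN: project the listed columns."""
    return lambda rows: [{c: r[c] for c in cols} for r in rows]

def run(source, *statements):
    """Feed the table through each statement in order, top to bottom."""
    return reduce(lambda table, stmt: stmt(table), statements, source)

# Bindings that a MATCH might have produced (sample data).
bindings = [{"name": "Tim", "age": 41}, {"name": "Jerry", "age": 25}]

result = run(
    bindings,
    filter_(lambda r: r["age"] > 30),
    let("decade", lambda r: r["age"] // 10 * 10),
    ret("name", "decade"),
)
print(result)
```

Because every step has the same shape, steps can be added, removed, or reordered without restructuring the rest of the query, which is the property the linear style is meant to provide.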
The following two figures intuitively demonstrate the differences in syntax between SQL and GQL.
Figure 1: SQL's Rigid Clause Structure
Figure 2: GQL's Linear Statement Composition
Looking Forward: The Road Ahead for GQL
The first version of GQL already includes numerous features, showcasing the outstanding contributions of the standard committee members. Personally, I look forward to possible future developments such as subgraph extraction, diverse table construction methods, window functions, and advanced programming constructs like conditional statements, loop statements, and lambda expressions. These enhancements would further solidify GQL's role as a powerful and user-friendly query language.
As a modern, advanced data query language, GQL excels in functionality and ease of use, making it essential for industries managing vast networks, such as telecommunications and social media. Its capability for real-time analytics provides immediate insights, which is crucial for decision-making in dynamic environments. Looking ahead, GQL's ongoing development promises to shape how we interact with data, paving the way for innovations in AI, healthcare, logistics, and beyond.
In conclusion, GQL is set to transform data management, and we eagerly anticipate its future advancements.