Graph databases are a relatively new and exciting type of NoSQL database that has been increasingly used for sophisticated applications over the past decade. Graph databases model their data using nodes and edges that form a “graph” (hence the name).
Graphs are mathematical abstractions of networks: for example, a social network where each person (or organization) is a node, and the connections between them are edges. This allows intuitive queries like: "Find me all people I went to school with", or "Get me all friends of my friends who work in advertising".
How graph databases work
A graph database is a kind of NoSQL database that focuses on storing data as entities and the relationships between them.
Graphs are a natural way to model complex relationships in data, such as user profiles, social connections and recommendations.
The term "graph" might seem a bit abstract, but it actually refers to a type of data structure that's used to model relationships between different pieces of information. For example, if you have a list of friends and their contact information, you can use a graph database to store this data in the form of nodes (pieces of information) and edges (connections between them).
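To make the friends-and-contacts example concrete, here is a minimal in-memory sketch in Python. It is not how any particular graph database stores data; the `Node`, `Edge`, and `Graph` names, the `FRIEND` label, and the sample people are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A piece of information, e.g. a person and their contact details."""
    node_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, labeled connection between two nodes."""
    source: str
    target: str
    label: str


class Graph:
    """Hypothetical toy graph: a dict of nodes plus a list of edges."""

    def __init__(self):
        self.nodes = {}  # node_id -> Node
        self.edges = []  # list of Edge

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, source, target, label):
        self.edges.append(Edge(source, target, label))

    def neighbors(self, node_id, label):
        """Return ids of nodes reachable from node_id over edges with this label."""
        return [e.target for e in self.edges
                if e.source == node_id and e.label == label]


graph = Graph()
graph.add_node("alice", email="alice@example.com")
graph.add_node("bob", email="bob@example.com")
graph.add_node("carol", email="carol@example.com")
graph.add_edge("alice", "bob", "FRIEND")
graph.add_edge("alice", "carol", "FRIEND")

# Follow the FRIEND edges from alice and read each friend's contact info.
friend_emails = [graph.nodes[n].properties["email"]
                 for n in graph.neighbors("alice", "FRIEND")]
print(friend_emails)
```

The contact details live on the nodes as properties, while the friendships live on the edges, so a question like "what are my friends' email addresses?" becomes a walk along edges rather than a lookup across separate tables.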
Graph databases are different from other types of NoSQL databases because they store data as nodes (vertices) and edges, rather than as documents, key-value pairs, or rows and columns. This makes them ideal for storing highly connected data, such as social networks or recommendation systems.
They are used for a variety of applications, including social networking, recommendation systems, fraud detection and analysis, and logistics.
What are the benefits of a graph database?
Graph databases are a relatively new concept in the database world. They are best known for their ability to handle highly connected data, such as relationships between people, places and things. Graphs are a good match for complex relationships between entities, and graph databases can be used to model complex data in many domains.
This makes them a good fit for many real-world use cases, from social networks to supply chains, where multiple entities are connected in complex ways.
The power of graph databases in a nutshell: The best way to understand the power of using a graph database is to compare it with its traditional counterpart — the relational database.
Some of the benefits of using a graph database include:
Entities and their relationships in the real world can be represented as a graph. This means you can model the same data as in a traditional relational database without having to manage foreign keys or join tables. Also, the modeling of real-world data will be more intuitive because you are essentially abstracting the real world into a graph of entities and relations.
Graph databases support structured or unstructured data. You can store any type of data in a graph database without worrying about restructuring it first. You can also combine different types of data sources into one query if they're all represented as nodes and edges.
Simple querying for multi-hop queries. Graph databases allow you to perform multi-hop queries, for example to find out how far away something is from another point, often in a single graph query statement. In a traditional relational database, the same question typically requires multiple self-joins or recursive queries.
Supports both structured data and semi-structured data. Graph databases support both structured data, like tabular data, and semi-structured data, like JSON, making them well suited to storing data such as sensor readings collected by IoT devices.
Graph databases are also highly performant because they let you run complex queries over the network of connections between objects rather than just over the individual objects themselves. For example: find all people who know each other through their friend groups; find all people who like each other's posts on Facebook; or find all people who live in the same city, even if they don't know each other directly but may have passed each other during their daily commute.
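The multi-hop idea above can be sketched in plain Python as a breadth-first search. This is only an illustration of what a graph query engine does conceptually, not the implementation of any specific product; the `friends` data and the `hop_distances` helper are hypothetical.

```python
from collections import deque

# Hypothetical friendship graph stored as an adjacency list.
friends = {
    "me": ["ann", "ben"],
    "ann": ["me", "cat"],
    "ben": ["me", "dan"],
    "cat": ["ann", "eve"],
    "dan": ["ben"],
    "eve": ["cat"],
}


def hop_distances(graph, start, max_hops):
    """Breadth-first search returning {person: hops} within max_hops of start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distances[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in graph.get(person, []):
            if friend not in distances:
                distances[friend] = distances[person] + 1
                queue.append(friend)
    return distances


distances = hop_distances(friends, "me", max_hops=2)
# "Friends of my friends" are exactly the people two hops away.
friends_of_friends = sorted(p for p, d in distances.items() if d == 2)
print(friends_of_friends)
```

Because each hop only follows the edges of the nodes already reached, the cost depends on the size of the neighborhood explored rather than on the size of the whole data set.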
Graph database use cases
Graph databases can be used to support various graph-based scenarios. These include social networking, recommendation systems, fraud detection, and chatbot systems.
Fraud detection is important to any organization, be it a financial service or simply a web service. As technology advances and information is shared more easily between parties, traditional methods of fraud detection are struggling to keep up.
Graph technology is among the state-of-the-art approaches to fraud detection. The reason is that a graph links a massive amount of data about each user, so even if one piece of information is incorrect or missing, the remaining connections can still help identify the user as fraudulent. Graph technology is used by financial institutions to detect fake identities (e.g., people trying to open accounts with fake ID cards), credit card fraud (e.g., someone making purchases with a stolen credit card), and money laundering (e.g., someone moving money through a chain of accounts to hide its origin).
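One common way to picture graph-based fraud detection is to connect accounts through the identifiers they share and then look for unexpectedly large connected groups. The sketch below assumes that approach; the account records, identifier formats, and the `linked_groups` helper are hypothetical, and a real system would combine many more signals.

```python
from collections import defaultdict

# Hypothetical account records: account id -> identifiers used at sign-up.
accounts = {
    "acct1": {"phone:555-0100", "device:A"},
    "acct2": {"phone:555-0100", "device:B"},
    "acct3": {"device:B"},
    "acct4": {"phone:555-0199", "device:C"},
}


def linked_groups(accounts):
    """Group accounts that are connected through any chain of shared identifiers."""
    # Index the account <-> identifier graph by identifier.
    by_identifier = defaultdict(set)
    for account, identifiers in accounts.items():
        for identifier in identifiers:
            by_identifier[identifier].add(account)

    seen, groups = set(), []
    for start in accounts:
        if start in seen:
            continue
        # Depth-first walk over accounts reachable via shared identifiers.
        group, stack = set(), [start]
        while stack:
            account = stack.pop()
            if account in seen:
                continue
            seen.add(account)
            group.add(account)
            for identifier in accounts[account]:
                stack.extend(by_identifier[identifier] - seen)
        groups.append(sorted(group))
    return groups


groups = linked_groups(accounts)
# Accounts that share identifiers with other accounts deserve a closer look.
suspicious = [g for g in groups if len(g) > 1]
print(suspicious)
```

Note that `acct1` and `acct3` share no identifier directly; they are linked only through `acct2`, which is the kind of indirect connection that is hard to spot when each record is examined in isolation.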
Recommendation engines are widely used by websites such as Amazon, Netflix, Spotify and others for generating personalized recommendations for products or content based on a user's prior interests or history with the website.
For example, if you have bought X before, the engine might recommend Y. The most common type of recommendation engine is collaborative filtering, which makes recommendations by analyzing user behavior data, such as previous purchases.
Graph databases are well suited for this task because they can leverage users' relationships with friends or colleagues to recommend relevant content.
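The "bought X, so recommend Y" idea can be sketched as a walk over a user-item graph: from a user to the items they bought, to other users who bought the same items, to what those users bought. The following is a simplified illustration of collaborative filtering, not a production algorithm; the `purchases` data and the `recommend` function are hypothetical.

```python
from collections import Counter

# Hypothetical purchase edges: user -> set of purchased items.
purchases = {
    "you": {"X"},
    "u1": {"X", "Y"},
    "u2": {"X", "Y", "Z"},
    "u3": {"W"},
}


def recommend(purchases, user, top_n=2):
    """Rank items bought by users who share at least one purchase with `user`."""
    owned = purchases[user]
    scores = Counter()
    for other, items in purchases.items():
        if other == user or not (owned & items):
            continue  # skip the user themselves and users with nothing in common
        for item in items - owned:
            scores[item] += 1
    # Sort by score descending, then by item name for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


recommendations = recommend(purchases, "you")
print(recommendations)
```

Here `Y` ranks above `Z` because two similar users bought it, while `W` is never suggested because its only buyer has nothing in common with `you`.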
Chatbots and question-answering systems are another use case. Natural language text can be transformed into knowledge graphs and stored in a graph database. In an intelligent question-answering system, a question posed in natural language is parsed by a semantic parser and restructured into a query. Possible answers are then retrieved from the knowledge graph and returned to the person who asked the question.
Social networking is a very popular use case for graph databases. Graph databases such as Nebula Graph and Neo4j are well suited to supporting social networks. A social network is a way of representing relationships between people or things, and it is often represented visually as a graph that shows how things are connected. Graphs can represent both online and offline relationships between people; for example, Facebook uses them to represent friendships between users and LinkedIn uses them to represent professional connections between users.
Latest trends in graph databases
Graph database technology is still fairly young and constantly evolving. The field was once dominated by the traditional powerhouse Neo4j, but the market is now much more dynamic, with newer players such as Nebula Graph, Amazon Neptune, and JanusGraph.
Graph databases have been around for more than 20 years. They were first used by computer scientists working on artificial intelligence (AI) projects, who needed a way to model complex systems that can't be represented as tables or lists of records.
Today, businesses are increasingly using graph databases for their own needs. The graph database market is expected to grow from $1.59 billion in 2020 to $11.25 billion by 2030, according to Emergen Research.
The popularity of graph databases has been driven by several factors:
Data grows rapidly: Every day we generate more than 2.5 quintillion bytes of data globally. This is expected to grow exponentially over time as sensors become more prevalent, mobile devices become more ubiquitous and machine-to-machine communication increases.
Data types vary widely: From unstructured text documents and images to structured data such as financial transaction records or sensor readings, many different types of data exist today.
Data distribution varies widely: Data can be distributed across multiple locations at different scales (local vs global) and in different formats (JSON vs XML).
Because of the explosion of global data volume and the distribution of data, graph databases are becoming more important and are adapting themselves to these trends.
Here are some of the latest trends in the graph database field.
Distributed graph databases
Distributed graph databases are a recent trend in the field as global data grows rapidly. A distributed database is a database that is split across multiple servers, so that the throughput and storage capacity of the entire system exceed what any single server can provide.
A distributed graph database stores portions of the graph in multiple physical locations, either on the same network or on entirely different networks, and distributes processing among multiple database nodes.
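As a rough illustration of how a graph might be split across servers, the sketch below assigns each vertex to a shard by hashing its id and stores every edge with its source vertex. This is one generic partitioning strategy shown under simplifying assumptions, not a description of how Nebula Graph or any other product partitions data; `shard_for`, `partition`, and the sample edges are hypothetical.

```python
import hashlib


def shard_for(vertex_id, num_shards):
    """Map a vertex id to a shard using a stable hash."""
    # hashlib is used because Python's built-in hash() of strings
    # changes between interpreter runs.
    digest = hashlib.sha256(vertex_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(edges, num_shards):
    """Place each edge on the shard that owns its source vertex."""
    shards = {i: [] for i in range(num_shards)}
    for source, target in edges:
        shards[shard_for(source, num_shards)].append((source, target))
    return shards


edges = [("alice", "bob"), ("alice", "carol"), ("bob", "dave"), ("erin", "alice")]
shards = partition(edges, num_shards=3)
for shard_id, shard_edges in shards.items():
    print(shard_id, shard_edges)
```

Keeping all outgoing edges of a vertex on one shard means a one-hop traversal from that vertex touches a single server, while multi-hop traversals may need to cross shards, which is where network bandwidth starts to matter.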
Distributed databases provide high availability and fault tolerance because they maintain multiple copies of data, kept in sync through processes known as replication and shadowing. The cost of maintaining such systems has traditionally been considered too high for general-purpose applications like spreadsheets or word processors, but with the advent of inexpensive disk storage, many applications can now afford, and increasingly require, such features.
Distributed databases are ideal for large-scale applications with complex data requirements, such as e-commerce systems and other internet-scale services. They can also be useful when there are constraints on the location of the data: for example, if you need to keep certain records in a certain jurisdiction (e.g. the EU) because they contain sensitive information, or if you have regional offices that need to share data across their networks.
As data volumes and computation grow, the technical and cost advantages of distributed systems (like Nebula Graph) over single-machine systems (e.g. a standalone Neo4j instance) or small machines become more obvious. Distributed systems allow applications to access a cluster of many machines as if it were a local system, without the need for much modification at the code level.
Nebula Graph is a distributed, easily scalable, and native graph database. It is capable of hosting graphs with hundreds of billions of vertices and trillions of edges, and serving queries with millisecond latency.
With a shared-nothing distributed architecture, Nebula Graph offers linear scalability, meaning that capacity and throughput grow roughly in proportion to the nodes or services you add to the cluster. It also means that if you want to horizontally scale out Nebula Graph, you don’t need to change the configuration of the existing nodes. As long as the network bandwidth is sufficient, you can add more nodes without changing anything else.
GQL - The graph query language standard
One of the disadvantages of graph databases has been the lack of a standard graph query language like SQL for relational databases.
Until now, each vendor has had its own language: Cypher, a declarative graph query language invented by Neo4j; Gremlin, a graph traversal language developed by Apache TinkerPop; and nGQL, Nebula Graph’s SQL-like graph query language.
The lack of a standard in query languages made it difficult to build applications that used different graph databases from different vendors.
However, in the last few years, there has been a lot of research activity in this area and some standardization efforts have been made by the industry.
In September 2019, members of ISO/IEC Joint Technical Committee 1 (JTC 1), which is responsible for international information technology standards, proposed a project to create a new standard graph query language (ISO/IEC 39075 Information Technology — Database Languages — GQL). GQL is intended to be a declarative database query language, like SQL.
The GQL project proposal states:
"Using graph as a fundamental representation for data modeling is an emerging approach in data management. In this approach, the data set is modeled as a graph, representing each data entity as a vertex (also called a node) of the graph and each relationship between two entities as an edge between corresponding vertices.
The graph data model has been drawing attention for its unique advantages. Firstly, the graph model can be a natural fit for data sets that have hierarchical, complex, or even arbitrary structures. Such structures can be easily encoded into the graph model as edges. This can be more convenient than the relational model, which requires the normalization of the data set into a set of tables with fixed row types. Secondly, the graph model enables efficient execution of expensive queries or data analytic functions that need to observe multi-hop relationships among data entities, such as reachability queries, shortest or cheapest path queries, or centrality analysis.
There are two graph models in current use: the Resource Description Framework (RDF) model and the Property Graph model. The RDF model has been standardized by W3C in a number of specifications. The Property Graph model, on the other hand, has a multitude of implementations in graph databases, graph algorithms, and graph processing facilities.
However, a common, standardized query language for property graphs (like SQL for relational database systems) is missing. GQL is proposed to fill this void."
The GQL standard is expected to become available by the end of this year (2022). In June 2022, the GQL Standard website said in an update that “It turns out that writing a database language standards is a lot of work, but we are making progress.”
Just as SQL provided a boost to the popularity of relational databases, the availability of the GQL standard is expected to further promote the use of graph databases.
Takeaway: Graph databases are the future of business intelligence
Graph databases have been around for decades. In recent years, however, they’ve become increasingly popular because they offer new ways of visualizing and analyzing data that can help companies make better decisions about their business processes.
Graph databases are not just a buzzword; companies like Facebook, LinkedIn, Google and Twitter are using them to store their data and make sense of it.
Graph databases are a carrier of big data: they make it possible to process what you need in an intuitive and effective way, taking both relational and non-relational data as input. The graph database market is still young, and the future looks very promising.
For now, the future of business intelligence appears to be a distinctive blend of "old and new". Relational databases will continue to be widely used but increasingly supplemented with or replaced by graph databases – with the caveat that this switch is not an "either/or" affair. Nor does it represent a radical change in attitude toward data storage: the key lies in adopting an enterprise-wide understanding of data, rather than one based on individual silos.