How data visualization is implemented in NebulaGraph
This article is transcribed from NebulaGraph's recent webinar on data visualization, where three NebulaGraph engineers discussed how data visualization is implemented in NebulaGraph, the world's most powerful graph database management system.
What is visualization?
As I understand it, visualization takes data or text that we usually see in a relatively abstract form and presents it again as icons or charts, so that users can understand it and interact with it more easily. Simply put, the common buttons and icons on web pages, such as the blue "Google" button you see after opening https://www.google.com/ and the helper icons in the lower right corner, are all simple, basic visual displays. Without these small icons, you might have to open a terminal and run a command to perform the same operation. Visualization greatly reduces the cost of an operation and makes it easier for users to understand what they are doing.
The example above is about UI visualization. Speaking of UI, let's look back more than ten years. In the early days, computer technology was still immature, and we all worked at the command line. Why, then, did computers make their way into ordinary people's homes? Because computers gained an interface, a UI, something that matches human intuition. So anything visual that improves efficiency can be called visualization. If we had to define it, visualization would have two aspects.
- The first is what the example above and the GUI show. This kind of visualization is visible to everyone, but in my opinion it is not what best reflects the real highlight of visualization.
- The second is design, or art, which is the soul of visualization, because it is what brings visualization to life. In layman's terms, you often cannot tell whether something is good just because you managed to build it; but if it offers good interaction, a good experience, and good visual effects, matches human intuition, and helps dig out information, then it counts as good visualization. So I think visualization needs two parts: one in the physical sense, that is, the rendering described in the first point, and the other its design and experience, which involves both software and hardware.
In fact, the concept of visualization can be viewed from two perspectives: broad and narrow.
In the broad sense, visualization means processing complex information so that it can be presented to the user more simply and intuitively, making it easier to take in. The GUI falls into this category: it abstracts the underlying commands in a visual way so that users can interact with the machine more easily.
In the narrow sense, visualization refers to specific application scenarios. For example, as the Internet develops, the amount of data we deal with keeps growing, and we have to present this data in some way; the better the presentation, the more easily the data can be understood. Displaying more data more smoothly on today's hardware, which has its own performance bottlenecks, may be what visualization means at present. In this narrow sense, visualization covers research directions such as charts, Canvas animations, the rising VR technology, browser rendering, and 3D rendering in the metaverse.
Specifically, because Nebula is a distributed graph database, the visualization we talk about is mostly data visualization. For example, Nebula Dashboard is a visual operations and maintenance (O&M) monitoring product. Visualization practice here is not only about icons and animations, but also about simplifying complex O&M information so that users can control the cluster more easily.
Visualization technologies
At different stages and in different scenarios, we choose different visualization technologies to better serve our products and business.
As far as visualization technologies are concerned, the common rendering technologies for front-end display are SVG, Canvas, CSS, and WebGL; which tool to use for drawing SVG and Canvas is a separate topic. D3, which the Nebula community often mentions, is not a rendering technology but essentially a toolkit of visualization algorithms that can be used on the web or in a terminal application. D3 comes with some peripheral tools that provide simple SVG wrappers, and higher up in the application layer sit charting libraries such as ECharts.
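To make the split between "algorithm" and "rendering" concrete, here is a minimal TypeScript sketch written without D3. The hypothetical `linearScale` helper plays a role similar to D3's `d3.scaleLinear` (but is not D3's API): it only maps data values to pixel coordinates. `toSvgPath` then turns those coordinates into an SVG path string; the same coordinates could just as well be drawn with Canvas.

```typescript
// Hypothetical helper in the spirit of a D3 scale (not D3's actual API):
// maps a value from the data domain [d0, d1] to the pixel range [r0, r1].
function linearScale(
  domain: [number, number],
  range: [number, number],
): (value: number) => number {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  return (value: number) => r0 + ((value - d0) * (r1 - r0)) / (d1 - d0);
}

// Turns a series of values into an SVG path "d" attribute.
// Pure computation: no rendering technology is involved yet.
// Assumes at least two values and a positive maximum.
function toSvgPath(values: number[], width: number, height: number): string {
  const x = linearScale([0, values.length - 1], [0, width]);
  const maxValue = Math.max(...values);
  // SVG's y axis points down, so the largest value maps to y = 0.
  const y = linearScale([0, maxValue], [height, 0]);
  return values
    .map((v, i) => `${i === 0 ? "M" : "L"}${x(i)},${y(v)}`)
    .join(" ");
}

const cpuUsage = [10, 40, 25, 50];
console.log(toSvgPath(cpuUsage, 300, 100)); // prints "M0,80 L100,20 L200,50 L300,0"
```

The resulting string can be placed in the `d` attribute of an SVG `<path>` element; swapping the last step for Canvas `lineTo` calls would not change the scale logic at all.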
Wang Yang, the product leader of Nebula Dashboard, said that the common rendering technologies mentioned above, such as CSS and SVG, get customized according to the application scenario and the computing resources available. For example, to make third-party integration easier, designers of visualization libraries abstract the syntax and make their syntactic sugar more rigorous and standardized; for scenarios with high performance requirements, they encapsulate the tools to squeeze out more performance. Here he mentioned G2 (https://github.com/antvis/g2/), the open-source visualization engine from Ant Group. Inspired by the book "The Grammar of Graphics", G2 is an underlying visualization engine built on the theory of the grammar of graphics, and its grammar implementation is widely recognized in the industry as rigorous and standardized. Compared with other visualization libraries, G2's syntax design feels more like a complete product.
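The core idea of a grammar of graphics is that a chart is declared as data plus a mapping from fields to visual channels, rather than drawn imperatively. The TypeScript below is a toy illustration of that idea only: `ChartSpec`, `Mark`, and `compile` are hypothetical names and are not G2's API, and the host names and QPS numbers are made up.

```typescript
// Toy illustration of the grammar-of-graphics idea.
// ChartSpec, Mark, and compile are hypothetical and NOT G2's API.
type Row = Record<string, number | string>;

interface ChartSpec {
  mark: "point" | "bar"; // geometry used to draw each row
  data: Row[]; // the raw data
  encode: { x: string; y: string; color?: string }; // field -> visual channel
}

interface Mark {
  type: "point" | "bar";
  x: number | string;
  y: number | string;
  color?: number | string;
}

// "Compiles" a declarative spec into concrete marks by looking up each
// encoded field in every row. A real engine would also apply scales,
// coordinate systems, and layout at this stage.
function compile(spec: ChartSpec): Mark[] {
  return spec.data.map((row) => {
    const mark: Mark = {
      type: spec.mark,
      x: row[spec.encode.x],
      y: row[spec.encode.y],
    };
    if (spec.encode.color !== undefined) {
      mark.color = row[spec.encode.color];
    }
    return mark;
  });
}

// Made-up monitoring data for illustration.
const marks = compile({
  mark: "bar",
  data: [
    { host: "storaged-0", qps: 120 },
    { host: "storaged-1", qps: 95 },
  ],
  encode: { x: "host", y: "qps", color: "host" },
});
console.log(marks);
```

Changing `mark` from `"bar"` to `"point"`, or remapping `color` to another field, changes the chart without touching any drawing code, which is the property that makes this style of API attractive and also why it takes some theory to use well.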
On the subject of G2, Miao Zhuang, core developer of the visual graph exploration tool Nebula Explorer, said that although G2's syntax is standardized and rigorous, it is not a very smooth tool if you really want to do data visualization: because it is based on the theory of the grammar of graphics, it requires some expertise and has a learning curve. More often than not, G2 ends up being used as a charting library rather than a general visualization library.
Performance issues of visualization
Nebula Explorer is a graph exploration tool, so it faces the problem of presenting node data. The community often asks how many nodes a single canvas page can render. On visualization performance, Miao Zhuang, the core product developer, said: visualization performance is not absolute; it always goes hand in hand with user experience. Smoothness and the amount of information presented pull in opposite directions: the less data you render, the better the performance and the smoother the experience, while presenting more data consumes more hardware resources. It is a balancing act.
There are two ways to improve visualization performance. The first is to upgrade the hardware, in the spirit of a once-popular saying about distributed systems: not enough concurrency? Add a machine; there is nothing that adding a machine cannot solve. But simply throwing machines at the problem is not a good strategy. It removes the physical limitation but adds cost: the machines themselves plus the corresponding system maintenance. Adding one or more machines also inevitably increases system complexity. Moreover, adding machines is not a universal solution, since some users cannot add machines at will. The second way is to optimize the algorithm. This is harder than adding machines: current algorithms are already very good, and further optimization yields diminishing returns.
Nebula visualization tool optimization practice
Nebula Explorer's performance optimization also starts from these two directions, hardware and software. On the hardware side, users' hardware varies, so during development Explorer uses better hardware to achieve cool effects, while for users with weaker hardware it provides two modes, high quality and normal quality, adapting to the user's hardware to get a better rendering result. Although Explorer's 3D graph exploration mode looks cool and seems to "eat" computing resources, its hardware requirements are not as high as you might think: the GPU in any phone can run Explorer's 3D mode.
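The article does not say how Explorer decides between its two quality modes. As a purely hypothetical sketch, the TypeScript below derives render settings from the node count and a couple of device hints, and lets an explicit user choice override the heuristic. The thresholds, field names, and the override behavior are all assumptions for illustration, not Explorer's logic.

```typescript
type Quality = "high" | "normal";

// Hints a browser can provide, e.g. navigator.hardwareConcurrency and
// window.devicePixelRatio. Passed in explicitly to keep this testable.
interface DeviceHints {
  cpuCores: number;
  devicePixelRatio: number;
}

interface RenderSettings {
  quality: Quality;
  antialias: boolean; // smoother edges at extra GPU cost
  pixelRatio: number; // canvas backing-store scale factor
  showLabels: boolean; // text labels are expensive to draw in bulk
}

// Hypothetical heuristic: the thresholds are illustrative guesses,
// not values taken from Nebula Explorer.
function chooseRenderSettings(
  nodeCount: number,
  hints: DeviceHints,
  userChoice?: Quality,
): RenderSettings {
  const quality: Quality =
    userChoice ?? (hints.cpuCores >= 8 && nodeCount <= 5000 ? "high" : "normal");
  if (quality === "high") {
    return { quality, antialias: true, pixelRatio: hints.devicePixelRatio, showLabels: true };
  }
  // Normal quality: render at 1x and drop labels for large graphs.
  return { quality, antialias: false, pixelRatio: 1, showLabels: nodeCount <= 500 };
}
```

Keeping the user's explicit choice above the heuristic matters in practice: a guess based on core count can be wrong, and a manual switch lets users on weak machines trade visual polish for smoothness.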
In fact, besides hardware and software optimization, there is also strategic optimization through product design: some hard performance problems can simply be sidestepped. Performance optimization has a time-honored rule: render 60 frames per second. Even in special scenarios where 60 frames per second is not achievable, staying above 40 frames per second keeps the experience smooth. One way to get there resembles an optimization in React, which splits large tasks into many small ones and runs them when the system is idle. Another way of thinking: if a problem cannot be solved at the code level, change the corresponding interaction. For example, when a long-running task is executing and the page cannot be clicked, it feels very laggy, right? Adding a loading animation at that moment greatly reduces the user's waiting anxiety and makes the page feel less laggy.
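The frame-rate targets translate directly into time budgets: 1000 ms / 60 ≈ 16.7 ms per frame, and 1000 ms / 40 = 25 ms. The TypeScript below is a minimal sketch of time slicing under that budget: it runs queued small tasks until the budget for the current frame is spent, then stops so the browser can paint. It illustrates the general technique only; it is not React's scheduler. `runSlice` is a hypothetical name, and the clock is injected (`performance.now` in a browser) so the logic can be tested without a DOM.

```typescript
// Frame budgets implied by the frame-rate targets in the text.
const FRAME_BUDGET_60FPS_MS = 1000 / 60; // about 16.7 ms per frame
const FRAME_BUDGET_40FPS_MS = 1000 / 40; // 25 ms per frame

// Runs queued small tasks until this frame's budget is used up, then
// returns so the browser can paint and handle input. A task that starts
// inside the budget is allowed to finish, so a slice can overrun slightly.
// Returns the number of tasks executed in this slice.
function runSlice(
  queue: Array<() => void>,
  budgetMs: number,
  now: () => number,
): number {
  const start = now();
  let done = 0;
  while (queue.length > 0 && now() - start < budgetMs) {
    const task = queue.shift()!;
    task();
    done++;
  }
  return done;
}

// In a browser, each slice could be scheduled with requestAnimationFrame
// (or requestIdleCallback) until the queue is empty, for example:
//   function loop(): void {
//     runSlice(queue, FRAME_BUDGET_60FPS_MS, () => performance.now());
//     if (queue.length > 0) requestAnimationFrame(loop);
//   }
```

The design choice here is to check the clock between tasks rather than interrupting them, which is why the individual tasks must be kept small: one 200 ms task would still block a whole series of frames.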
Here is a personal experience: when Nebula Dashboard first launched its visualization big screen, we fell into some performance pitfalls. At first, I thought it was a simple screen that would not involve many performance problems. Then the CPU suddenly started spinning like crazy, and I was confused: the Nebula Dashboard visualization screen had only 7 or 8 charts. Although the interface looked cool, there were only about a dozen network requests, so we ruled out request processing as the cause of the CPU load. In addition, there was not much concurrent access to the big screen after it went live… After a round of investigation, we finally traced the problem to SVG consuming too many resources. So we changed how the animations were implemented: CPU-heavy animations were downgraded, with fewer renders and a lower rendering frequency. This is a bit like image compression: you upload a 1 MB image to a compression site, it may come back at 200 KB, and the user's naked eye cannot tell the difference.
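One simple way to lower rendering frequency, sketched below in TypeScript under our own assumptions (this is not the Dashboard's code), is a frame throttle: the animation loop still runs on every display refresh, but a frame is only drawn when enough time has passed since the last drawn one. `createFrameThrottle` is a hypothetical name; in a browser its input would be the timestamp that `requestAnimationFrame` passes to its callback.

```typescript
// Returns a predicate that lets a frame through only if at least
// 1000 / targetFps ms have passed since the last frame it allowed.
function createFrameThrottle(targetFps: number): (timestampMs: number) => boolean {
  const intervalMs = 1000 / targetFps;
  // Frame timestamps jitter slightly; a small tolerance avoids skipping
  // an extra frame when a timestamp lands just short of the interval.
  const toleranceMs = 1;
  let lastRenderedMs = -Infinity;
  return (timestampMs: number): boolean => {
    if (timestampMs - lastRenderedMs >= intervalMs - toleranceMs) {
      lastRenderedMs = timestampMs;
      return true;
    }
    return false;
  };
}

// Usage in a browser (drawChart is a hypothetical render function):
//   const shouldRender = createFrameThrottle(20);
//   function onFrame(ts: number): void {
//     if (shouldRender(ts)) drawChart();
//     requestAnimationFrame(onFrame);
//   }
```

Dropping a decorative animation from 60 to, say, 20 frames per second cuts its rendering work by roughly two thirds, and for slow-moving effects the difference is hard to notice, much like the image-compression analogy above.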
Visualization technology selection
As mentioned above, hardware and software can improve the performance of visualization, but a good choice of technology can also affect the performance of your system.
Before Nebula Explorer v2.2, because of the technology choice and how the business logic was organized, displaying a few hundred nodes caused lag and visibly choppy animation. After that, Nebula Explorer was refactored to replace G6, the graph visualization engine from the same AntV family as G2. G6 performed poorly in Nebula Explorer, probably because it focuses heavily on business scenarios and lacks validation in high-performance scenarios. After switching to Force Graph, performance improved by about 10 times. Of course, this does not mean G6 performs poorly; rather, Nebula Explorer's visualization requirements simply do not match what G6 is designed for.
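Force Graph, as the name suggests, lays nodes out with a force-directed simulation. As a hedged sketch of that general technique (not Force Graph's or Explorer's code), the TypeScript below runs one tick of a naive simulation: every pair of nodes repels, every link acts as a spring, and velocities are damped. The parameter values are arbitrary assumptions. This naive version is O(n²) per tick, which is exactly why large graphs get slow; d3-force's many-body force, for example, uses the Barnes-Hut approximation to reduce that cost.

```typescript
interface GraphNode { id: string; x: number; y: number; vx: number; vy: number; }
interface GraphLink { source: string; target: string; }

// One tick of a naive force-directed layout. Parameter defaults are
// arbitrary illustration values.
function tick(
  nodes: GraphNode[],
  links: GraphLink[],
  opts = { repulsion: 1000, springLength: 50, springK: 0.05, damping: 0.85 },
): void {
  // Repulsion between every pair of nodes: O(n^2).
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      const a = nodes[i];
      const b = nodes[j];
      let dx = b.x - a.x;
      const dy = b.y - a.y;
      if (dx === 0 && dy === 0) dx = 1; // nudge coincident nodes apart
      const dist = Math.sqrt(dx * dx + dy * dy);
      const force = opts.repulsion / Math.max(dist * dist, 1); // soften very close pairs
      const fx = (dx / dist) * force;
      const fy = (dy / dist) * force;
      a.vx -= fx; a.vy -= fy; // push a away from b
      b.vx += fx; b.vy += fy; // push b away from a
    }
  }
  // Springs along links pull endpoints toward the rest length.
  const byId = new Map<string, GraphNode>(nodes.map((n): [string, GraphNode] => [n.id, n]));
  for (const link of links) {
    const s = byId.get(link.source);
    const t = byId.get(link.target);
    if (!s || !t) continue;
    const dx = t.x - s.x;
    const dy = t.y - s.y;
    const dist = Math.sqrt(dx * dx + dy * dy) || 0.01;
    const force = opts.springK * (dist - opts.springLength); // > 0 when stretched
    const fx = (dx / dist) * force;
    const fy = (dy / dist) * force;
    s.vx += fx; s.vy += fy;
    t.vx -= fx; t.vy -= fy;
  }
  // Damp velocities and move the nodes.
  for (const n of nodes) {
    n.vx *= opts.damping;
    n.vy *= opts.damping;
    n.x += n.vx;
    n.y += n.vy;
  }
}
```

Calling `tick` repeatedly, and redrawing after each call, produces the familiar "settling" animation; the per-tick cost, not the drawing alone, is what grows quickly as the node count rises.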
In contrast to Nebula Explorer, Nebula Dashboard does not display thousands of points; it shows a relatively fixed number of charts together with the corresponding operations. For monitoring charts, Grafana is currently the most recognized product on the market, and it is open source. So we studied Grafana's implementation and found that it uses the lightweight uPlot (https://github.com/leeoniya/uPlot). uPlot is under 40 KB, specializes in time-series charts, and is easy to customize. Besides monitoring, Nebula Dashboard also plays an important role in cluster management and O&M operations, which departs from Grafana's focus on monitoring. Our three criteria for choosing Nebula Dashboard's visualization library were: first, easy to get started and develop with; second, an active open-source community; third, plenty of base libraries to improve development efficiency. In the end, we chose G2 as the underlying visualization tool for Nebula Dashboard to improve development efficiency.
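Part of what keeps a time-series library like uPlot small and fast is a column-oriented data layout: one array of x values (timestamps), followed by one array per series, aligned by index, with `null` marking gaps. The TypeScript below is a hedged sketch that converts row-based monitoring samples into that kind of layout without calling uPlot itself. `MetricSample` is a hypothetical shape, not a real Nebula API, and the choice of Unix seconds for the x values is an assumption meant to match uPlot's default time handling; check the uPlot documentation before relying on it.

```typescript
// Hypothetical shape of one monitoring sample (not a real Nebula API).
interface MetricSample {
  timestampMs: number;
  cpu: number;
  memory: number;
}

// Column-oriented layout: the first array holds x values (Unix seconds),
// each following array holds one series, aligned by index.
type ColumnarData = [number[], ...(number | null)[][]];

function toColumnar(samples: MetricSample[]): ColumnarData {
  // Time-series charts expect x values in ascending order.
  const sorted = [...samples].sort((a, b) => a.timestampMs - b.timestampMs);
  const xs: number[] = [];
  const cpu: (number | null)[] = [];
  const memory: (number | null)[] = [];
  for (const s of sorted) {
    xs.push(s.timestampMs / 1000);
    // Represent missing or invalid readings as null so they show as gaps.
    cpu.push(Number.isFinite(s.cpu) ? s.cpu : null);
    memory.push(Number.isFinite(s.memory) ? s.memory : null);
  }
  return [xs, cpu, memory];
}
```

Storing each series as a flat numeric array avoids allocating one object per point, which keeps memory use low when a dashboard polls many metrics at short intervals.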
Visual product design
Nebula Explorer has two graph exploration modes, 2D and 3D, which affect how the page is laid out. In Nebula Explorer's design, algorithms are used to aggregate and scatter specific groups of points, and the full-graph bird's-eye view is displayed in 3D mode. A mainstream 1920×1080 screen has only about two million pixels (1920 × 1080 = 2,073,600) on a flat plane, so by changing the viewing angle in 3D mode we can present more data than in 2D.
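The article does not describe Explorer's aggregation algorithm. As a hedged sketch of the general idea, the TypeScript below collapses vertices that share a hypothetical `group` key into a single cluster node placed at their centroid, which is one common way to reduce the number of shapes the renderer must draw. The types, field names, and sample data are assumptions for illustration.

```typescript
// Hypothetical vertex shape: a position plus a grouping key
// (for example a tag name or a community ID).
interface VertexLike { id: string; group: string; x: number; y: number; }

interface Cluster {
  group: string;
  count: number; // how many vertices this cluster stands for
  x: number; // centroid of the member positions
  y: number;
  memberIds: string[]; // kept so the cluster can be expanded again
}

// Collapses vertices sharing a group key into one cluster at their centroid.
function aggregateByGroup(vertices: VertexLike[]): Cluster[] {
  const clusters = new Map<string, Cluster>();
  for (const v of vertices) {
    let c = clusters.get(v.group);
    if (!c) {
      c = { group: v.group, count: 0, x: 0, y: 0, memberIds: [] };
      clusters.set(v.group, c);
    }
    c.count++;
    c.x += v.x; // accumulate sums; divided into averages below
    c.y += v.y;
    c.memberIds.push(v.id);
  }
  clusters.forEach((c) => {
    c.x /= c.count;
    c.y /= c.count;
  });
  // Map preserves insertion order, so clusters appear in first-seen order.
  return Array.from(clusters.values());
}
```

Keeping `memberIds` on each cluster means "scattering" is just the reverse step: when a user zooms in or clicks a cluster, its members can be restored and laid out again.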