
Nebula Graph: An open source distributed graph database

Min Wu [email protected] · Xinglu Yi [email protected] · Hui Yu [email protected] · Yu Liu [email protected] · Yujue Wang [email protected]

2022
Abstract

This paper introduces the recent work on Nebula Graph, an open-source, distributed, scalable, and native graph database. We present the system design trade-offs and a comprehensive overview of Nebula Graph internals, including graph data models, partitioning strategies, secondary indexes, optimizer rules, storage-side transactions, graph query languages, observability, graph processing frameworks, and visualization toolkits. In addition, three sets of large-scale graph benchmark tests are conducted.

keywords:
graph database, graph processing, graph query language, graph benchmark

1 Introduction

In the last decade, new technological applications and developments, such as social media networks, banking fraud and money-laundering networks, coronavirus contact tracing, and knowledge graphs, have generated a huge amount of highly connected data. Traditional relational databases (RDBMS) cannot process such deeply connected data with real-time responses. Thus, graph databases (GDBMS), wherein entities of interest are represented by vertices (nodes) and relationships between them by edges, have recently regained high prevalence [1]. Furthermore, as the amount of data produced is enormous and ever-increasing, distributed graph databases, rather than single-server solutions, are inevitable for addressing such data volumes.

This paper introduces the recent work on Nebula Graph (v3.0), an open-source (Apache License, Version 2.0), distributed, scalable, and native graph database. It is capable of hosting graphs with hundreds of billions of vertices and trillions of edges, and of serving queries with millisecond latency. It can scale both vertically and horizontally, automatically sharding a large graph into multiple subsets stored on multiple servers. It also offers high availability backed by the Raft protocol [2].

2 Nebula Graph Architecture

Nebula Graph consists of three major components: Nebula Core, Nebula Algorithms & Analytics, and Clients, Tools & Visualization. In this paper, we mainly focus on the internals of Nebula Core, with some explanations of the other two components.

Figure 1: Major components of Nebula Graph.

2.1 Nebula Core

Nebula Core serves as the OLTP (online transaction processing) part of Nebula Graph and is responsible for answering a large volume of queries within a local scope of the graph, such as neighborhoods, restricted traversals, paths, sub-graphs, lookups, inserts, deletes, and updates of a set of vertices and edges. The goal of Nebula Core, which is written in C++, is to achieve high throughput (TPS/QPS), with scalability as the top design priority.

To achieve the best performance in a large cluster, Nebula Core adopts an architecture that separates storage from compute. It consists of three services: the Meta Service, the Storage Service, and the Graph Service.

Each service has its own executable binary that can be hosted on a Kubernetes cluster [3]. Users can therefore deploy a Nebula Core cluster on a single server or across multiple servers.

Fig. 2 shows the architecture of a typical Nebula Core.

Figure 2: The Architecture of Nebula Core.

The Meta Service is responsible for metadata management, such as schema operations, cluster administration, user privileges, and license management.

All the meta processes form a Raft group, which elects a leader through quorum-based voting in periodic terms. The group has four roles: the (Raft) leader, followers, listeners, and drainers. The leader is unique and is elected from the candidates, i.e., the leader and the followers; listeners and drainers do not participate in elections. Only the leader can serve clients or other components of Nebula Graph. The followers run in standby mode, each holding a replica of the leader's data. Once the leader fails, usually detected by a heartbeat timeout, one of the followers is elected as the new leader. Listeners and drainers are explained in Sec. 2.4 and 2.7.
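These roles can be summarized compactly; the following C++ enumeration is a sketch for illustration only, not Nebula's actual type:

enum class RaftRole {
    kLeader,    // unique; the only role serving clients and other components
    kFollower,  // standby replica; eligible to become leader on heartbeat timeout
    kCandidate, // transient role during a leader election
    kListener,  // subscribes to the WAL; never votes (Sec. 2.4, 2.7)
    kDrainer    // replays listener output into another cluster (Sec. 2.7)
};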

The Storage Service stores all the graph data. It has three layers: the storage interface, which defines a set of APIs for graph operations (e.g., get-neighbors, lookup, insert, delete); the consensus layer, which implements the Raft protocol to ensure strong consistency between data replicas and high availability; and a key-value library, currently backed by RocksDB [4].

The Graph Service is stateless. It supports two SQL-like declarative languages: openCypher [5] (DQL) and nGQL (a Nebula-native query language implementing DML, DCL, etc.). A query is executed by a single process with multiple threads, and more graph processes can be added to serve more queries. The workflow of the Graph Service is discussed in Sec. 2.5.

2.2 Data Model

Nebula Graph follows the most popular graph data model in industry, the property graph, with some extensions and variants.

  • Graph spaces: A graph space is used to isolate data from different teams or projects. It defines property data types, storage replications, privileges, data partitions, etc.

  • Vertex: Vertices are identified by a unique identifier scoped to the graph space, a.k.a. the VID (Vertex ID). VIDs are provided by users.

  • Tag: Tags are used to categorize vertices. A tag defines a given set of property types. A vertex can have zero or more tags. Tags are similar to labels with constraints in Neo4j [6].

  • Edge: An edge is a connection between two vertices. An edge must have a direction.

  • Edge type: An edge type defines a given set of property types and gives a name to this type of edge. An edge must have one and only one edge type. Multigraphs are supported, i.e., multiple edges are allowed between two vertices.

  • Edge rank: The rank is an immutable, user-assigned 64-bit signed integer that distinguishes edges of the same edge type between two given vertices. An edge must have one and only one rank.

  • Property: A property is a key-value pair. Properties are defined on tags or edge types. SQL-like data types, such as numeric, boolean, string, text, date-time, NULL, and geospatial, are supported.

2.3 Graph Partitioning and Data Format

Since a very large graph cannot be fully stored and processed on a single server, graph partitioning is inevitable. Although there is plenty of research on optimal (e.g., min-cut) graph partitioning, work on streaming/dynamic partitioning is scarce. Moreover, optimal graph partitioning is well known to be NP-hard, so such algorithms are seldom used in industrial OLTP graph databases.

Nebula Core therefore chooses a straightforward approach: it models vertices and edges as key-value pairs, so that it can build on the mature features of sophisticated distributed key-value systems.

The number of partitions is fixed when a graph space is created. Every server can host multiple partitions (usually 2-20), and partitions can migrate between servers for data and workload balancing. Key-value pairs, however, cannot migrate between partitions, since each pair is located by a static VID hashing strategy.
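A minimal sketch of such a static hashing strategy is given below; the function name and the use of std::hash are illustrative assumptions, not Nebula's actual implementation:

#include <cstdint>
#include <functional>
#include <string>

// Map a VID to a 1-based partition ID. The mapping is fixed once the
// graph space (and thus numPartitions) is created, which is why
// key-value pairs cannot migrate between partitions.
uint32_t partitionOf(const std::string& vid, uint32_t numPartitions) {
    return static_cast<uint32_t>(
        std::hash<std::string>{}(vid) % numPartitions) + 1;
}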

Figure 3: Vertex key format

Specifically, a vertex is modeled as key-value pairs (Fig. 3). One key (Type+PartID+VertexID) represents the vertex itself, encoding the vertex's partition ID and its VID; its value is empty. Optionally, sharing the same prefix, another key (Type+PartID+VertexID+TagID) represents the properties of a specific tag, with the TagID denoting one tag of the vertex; all the properties defined by this TagID are serialized as the value part. If the vertex has additional tags (and therefore properties), one more key and serialized value are added per tag. See also Table 1.

Figure 4: Edge key format

In the edge format shown in Fig. 4, the PartID is the same as that of the source vertex and is followed by the source vertex's VID, the EdgeType, the edge rank, the destination vertex's VID, and a reserved placeholder for MVCC. The properties are serialized in the value part. To facilitate reverse traversal, from a destination vertex back to a source vertex, a reversed edge is inserted automatically, with the positions of the source and destination VIDs swapped. See Fig. 5.

Figure 5: An illustrative example of a source vertex, a destination vertex, and a directed edge, with the corresponding key-value pairs on two partitions. They are partitioned by a static VID hashing strategy.
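The two key layouts can be sketched in C++ as follows. Field widths, byte order, and the type tags are simplified assumptions for illustration and do not match Nebula's exact on-disk format:

#include <cstdint>
#include <string>

// Vertex key: Type + PartID + VertexID (+ TagID for a tag's properties).
std::string encodeVertexKey(uint32_t partId, const std::string& vid,
                            int32_t tagId) {
    std::string key(1, '\x01');  // key type: vertex
    key.append(reinterpret_cast<const char*>(&partId), sizeof(partId));
    key.append(vid);             // fixed-length VID
    key.append(reinterpret_cast<const char*>(&tagId), sizeof(tagId));
    return key;                  // this tag's properties go in the value
}

// Edge key: Type + PartID + SrcVID + EdgeType + Rank + DstVID + MVCC.
std::string encodeEdgeKey(uint32_t partId, const std::string& srcVid,
                          int32_t edgeType, int64_t rank,
                          const std::string& dstVid) {
    std::string key(1, '\x02');  // key type: edge
    key.append(reinterpret_cast<const char*>(&partId), sizeof(partId));
    key.append(srcVid);          // shares the source vertex key's prefix,
                                 // so a vertex and its out-edges sort together
    key.append(reinterpret_cast<const char*>(&edgeType), sizeof(edgeType));
    key.append(reinterpret_cast<const char*>(&rank), sizeof(rank));
    key.append(dstVid);
    key.push_back('\0');         // reserved placeholder for MVCC
    return key;
}

// The automatically inserted reversed edge swaps srcVid and dstVid and is
// stored in the destination vertex's partition.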

It is worth noting that a vertex and its neighboring edge keys share the same prefix, so in most key-value systems this layout is equivalent to a partitioned adjacency list. Storage and processing locality across servers is therefore well preserved. The layout also suits most breadth-first search algorithms, which can be executed concurrently as the traversal depth increases.

Moreover, the graph structure can be entirely represented by the edges' key parts, without the value parts that hold edge properties. Graph-structure-based traversal is the most important query pattern [7] for a graph database. Therefore, key-value separation techniques [8] can be introduced to reduce read latency: all the keys (both vertex and edge keys) can be cached in memory for better performance, while the serialized values are kept on SSDs (and partially cached in main memory). See the benchmarks in Sec. 3.

2.4 Secondary Indexing on Property

Nebula Graph is designed to be a primary GDBMS for applications. Vertex and edge keys are primarily indexed as a classic log-structured merge (LSM) tree by the key-value library. Properties can either be fetched directly through VIDs (keys) or accessed indirectly through secondary indices.

There are three types of secondary indices in Nebula Graph: native indexes, full-text indexes, and geography indexes. Native indexes rely on the same key-value library (RocksDB) as explained in Sec. 2.1, while full-text indexes rely on a Raft listener with Elasticsearch [9]. Geospatial indexes use Google's S2 library [10], which is straightforward and not illustrated further in this paper. Note that unique indexes and constraints are not implemented in Nebula Graph.

Basic property data types (i.e., numeric, boolean, string, date-time) can be accessed through native indexes. An illustrative example of vertices and their indices is given in Tables 1 and 2. We insert two example vertices (VIDs 50 and 60) with two tags; the corresponding key-value pairs are shown in Table 1.

// tag-a defines three properties: pa-1, pa-2, pa-3;
// the example inserted values are ta-1, ta-2, ta-3
INSERT VERTEX tag-a (pa-1, pa-2, pa-3)
VALUES 50: (ta-1, ta-2, ta-3)

// insert two vertices, 50 and 60, with the same tag-b,
// which defines two properties, pb-1 and pb-2,
// and the same values tb-1, tb-2
INSERT VERTEX tag-b (pb-1, pb-2) VALUES 50: (tb-1, tb-2)
INSERT VERTEX tag-b (pb-1, pb-2) VALUES 60: (tb-1, tb-2)

// create a composite index on pa-1 and pa-2, but not on pa-3
CREATE TAG INDEX i-a ON tag-a (pa-1, pa-2);
// create a single-property index on pb-2
CREATE TAG INDEX i-b ON tag-b (pb-2);
Table 1: The example vertices

Key                           Value
PartId  VertexId  TagId       Serialized properties
100     50        (none)      (empty)
100     50        tag-a       ta-1 ta-2 ta-3
100     50        tag-b       tb-1 tb-2
101     60        (none)      (empty)
101     60        tag-b       tb-1 tb-2
Table 2: The vertices' indices (the value part is empty)

Key
PartId  Index-Id  Index-Binary  Index-Length                VertexId
100     i-a       ta-1 ta-2     length(ta-1) length(ta-2)   50
100     i-b       tb-2          length(tb-2)                50
101     i-b       tb-2          length(tb-2)                60

The index format is shown in Table 2. Vertex 50 and all its indices are stored in the same partition for locality. Index-Id identifies the related tag and properties. Index-Binary is the serialized binary form of the corresponding properties, and Index-Length segments the Index-Binary by each property's length (e.g., distinguishing "ab"+"cd" from "abc"+"d"). The VertexId is placed in the key part rather than the value part, because two vertices may happen to have the same tag, the same property values, and the same partition. An index lookup therefore seeks by the prefix PartId + Index-Id + Index-Binary + Index-Length and reads the VertexId off the matching keys; the value part of an index entry is empty.
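A prefix seek over this layout can be sketched with RocksDB's iterator API as follows; the function and the fixed-length-VID assumption are illustrative, not Nebula's actual code:

#include <memory>
#include <string>
#include <vector>
#include <rocksdb/db.h>

// The prefix covers PartId + Index-Id + Index-Binary + Index-Length;
// the VertexId occupies the trailing vidLen bytes of each matching key.
std::vector<std::string> lookupVidsByIndex(rocksdb::DB* db,
                                           const std::string& prefix,
                                           size_t vidLen) {
    std::vector<std::string> vids;
    std::unique_ptr<rocksdb::Iterator> it(
        db->NewIterator(rocksdb::ReadOptions()));
    for (it->Seek(prefix); it->Valid() && it->key().starts_with(prefix);
         it->Next()) {
        const rocksdb::Slice key = it->key();
        // Index values are empty; the VID is read from the key itself.
        vids.emplace_back(key.data() + key.size() - vidLen, vidLen);
    }
    return vids;
}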

Because index keys are immutable, updating a vertex property triggers a long sequence of I/O operations: read the old index, delete the old index, write the new index, and write/update the vertex. Write performance suffers because the index read is a random disk operation. Two special operations are therefore supported for massive initial loading of a graph space: IGNORE_EXISTED_INDEX inserts vertices without indexing them, and a later REBUILD INDEX brings all indexes up to date. Both are sequential disk operations and therefore much faster.

Figure 6: The Graph Service redirects text index searches to Elasticsearch.

For text-based searches, the native index is not suitable, since text can be much longer and fuzzy matching is required. Therefore, a third-party tool, Elasticsearch, is introduced. To synchronize data from Nebula Core to Elasticsearch, a new role called the text listener is implemented. In the Raft protocol, a listener works similarly to a follower, but it never takes part in quorum elections. The text listener only subscribes to the write-ahead log (WAL) from the leader and asynchronously sends data to an Elasticsearch cluster. Text-based index searches are then redirected to the Elasticsearch cluster by the Graph Service, as in Fig. 6.

For a read request, the index-choosing strategy is implemented in the Optimizer (Sec. 2.5) in the Graph Service. It is a set of rules: single-property/single-index comparison (=, <, etc.), single-property prefix comparison (equals, less-than, starts-with, etc.), multi-property comparison (composite index), and text fuzzy comparison. If no index is available, a full vertex/edge scan is conducted. The matching rule is chosen in the above order. More details about these optimization rules are discussed in Sec. 2.5.

2.5 Computation Pushing Down and Optimizer Rules

In the early version of Nebula Graph (v1.0), all nGQL clauses were directly mapped to physical executors. This made it hard for the Graph Service to rewrite and generate an optimized execution plan, especially for long and complex queries (e.g., the LDBC-SNB tests [11]), so performance depended heavily on how the user wrote the query. Furthermore, once openCypher compatibility was added in v2.0, the two languages could generate different plans with different executors, making the system overly complex and queries hard to profile. Therefore, the classical parser-optimizer-executor workflow of DBMSs was introduced in Nebula Graph v2.0 (Fig. 7). The two front-end languages share the same logical optimizer rules and a unified execution back-end.

Figure 7: Workflow of Graph Service

When a query statement is sent to the Graph Service, it is parsed and validated to generate an abstract syntax tree (AST), and then the initial plan. The initial plan is optimized by the Optimizer into a physical plan, which is executed by a group of executors. In addition to the basic executors of relational algebra (e.g., selection, projection, Cartesian product, union, and difference), graph-aware executors (113 in total) are introduced (see https://github.com/vesoft-inc/nebula/tree/release-3.1/src/graph/planner/plan). For example, GetNeighbors gets the neighboring VIDs of a given VID, GetProps gets the properties of a vertex or an edge, and Loop matches variable-length paths.

Because Nebula Graph separates storage and compute, minimizing remote data transfer between the Storage and Graph Services is a major performance concern in the optimizer's design. The other goal is to search for and choose an optimized sequence of executors that replaces the initial, less efficient ones according to predefined rules, i.e., a rule-based optimizer (RBO).

The Graph Service implements the Cascades framework [12] for query optimization, which has two steps: exploration and transformation. In the former, all rules are applied to the initial plan, expanding it into a larger forest of plan trees. Two utility classes support this step, OptGroup and OptGroupNode. An OptGroup contains a group of OptGroupNodes, and all OptGroupNodes in the same OptGroup are considered equivalently exchangeable, so the Optimizer can choose a better one among them. Every OptGroupNode depends on other OptGroups, not on OptGroupNodes. The arrows in Fig. 8 show these dependencies, which together form the plan forest.

Figure 8: An example: the filter push-down rule is applied to generate a new sub-plan.

In the exploration step, starting from the leaves of this forest, every rule's pattern is applied to all fragments (sub-plans) of the forest. If a sub-plan matches the pattern, it is replaced with a new sub-plan; both the match and the replacement are predefined in the optimization rule, and the new sub-plan is considered better than the original. Every rule defines three interfaces:

struct OptRule {
  // Whether this plan node matches the rule's pattern
  bool match(OptGroupNode);

  // Transform the matched sub-plan fragments to generate a new one
  Result transform(MatchedResult);

  // The rule's pattern definition
  Pattern pattern();
};
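As a concrete illustration, the filter push-down rule of Fig. 8 could look like the following self-contained toy; all types here are simplified stand-ins for the OptRule machinery, not Nebula's actual classes:

#include <memory>
#include <string>

struct PlanNode {
    std::string kind;                 // e.g. "Filter", "GetNeighbors"
    std::string predicate;            // filter condition, if any
    std::shared_ptr<PlanNode> input;  // single dependency, for simplicity
};

// match(): a Filter node whose dependency is a GetNeighbors node.
bool matchFilterPushDown(const PlanNode& node) {
    return node.kind == "Filter" && node.input != nullptr &&
           node.input->kind == "GetNeighbors";
}

// transform(): fuse the predicate into GetNeighbors, so the Storage
// Service can drop non-matching rows before they cross the network.
std::shared_ptr<PlanNode> transformFilterPushDown(const PlanNode& node) {
    auto fused = std::make_shared<PlanNode>(*node.input);
    fused->predicate = node.predicate;
    return fused;  // the new sub-plan replacing Filter -> GetNeighbors
}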

After all rules have been applied in the exploration step, the optimized physical plan is generated.

The comprehensive rule list is available online (https://github.com/vesoft-inc/nebula/tree/release-3.1/src/graph/optimizer/rule). Furthermore, to implement the popular cost-based optimizer (CBO), some preliminary statistics must be collected or sampled, including hardware metrics (memory, CPU, network traffic, and I/O; see Sec. 2.11) and metadata (in-degree, out-degree, distributions, index cardinality, etc.). The latter is still a work in progress and not yet available.

2.6 Transaction on Storage Side (TOSS)

Nebula Graph follows the style of NoSQL databases: full DBMS-style ACID is not provided, owing to read/write performance considerations in a distributed system; in particular, the isolation level is not guaranteed. However, as shown in Fig. 5, a single insertion of one edge is amplified into two key-value writes (out-edge and in-edge). To guarantee the atomicity of this remote pair of writes, a lightweight technique called Transaction on Storage Side (TOSS), inspired by [13] and two-phase locking (2PL), is introduced.

Usually, two partitions are engaged in the writing procedure: one (the out-edge partition) serves the out-edge key-value pair, and the other (the in-edge partition) serves the in-edge key-value pair. Every partition can have multiple replicas (a Raft leader and followers) to tolerate crashes. The two partition leaders are usually located on two different servers.

The out-edge partition receives an edge write request from the client and acquires a spinning lock keyed by SRC_VID+EdgeType+Rank. It then sends a message to the in-edge partition asking it to write the in-edge key-value pair. After the latter's success callback is acknowledged, the out-edge partition unlocks. Failures or crashes are possible throughout this procedure, so the Raft protocol is used here as well: the out-edge partition's leader commits a special item for the lock to all followers when locking and unlocking. Even if a crash occurs, all engaged roles can replay their states via Raft and continue the procedure until success. Unlike a DBMS transaction, no rollback takes place. The workflow is shown in Fig. 9.

Figure 9: Workflow of TOSS.
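The write path can be sketched as follows; every type is a simplified stand-in (e.g., a std::mutex stands in for the replicated spinning lock), and the Raft replication of the lock record is elided into comments:

#include <cstdint>
#include <mutex>
#include <string>

struct Edge { std::string src, dst; int32_t type; int64_t rank; };

struct Partition {
    std::mutex lock;  // stands in for the spinning lock on (src, type, rank)
    bool writeInEdge(const Edge&)  { /* persist reversed key-value */ return true; }
    void writeOutEdge(const Edge&) { /* persist forward key-value */ }
};

void writeEdgeAtomically(Partition& outPart, Partition& inPart, const Edge& e) {
    // 1. Lock on the out-edge side; the real leader also commits a special
    //    lock item to its followers via Raft, enabling replay after a crash.
    std::lock_guard<std::mutex> guard(outPart.lock);
    // 2. Ask the remote in-edge partition to persist the reversed pair;
    //    keep retrying until acknowledged -- TOSS never rolls back.
    while (!inPart.writeInEdge(e)) {}
    // 3. Persist the forward pair locally; unlocking (the guard's
    //    destructor) is likewise replicated via Raft in the real system.
    outPart.writeOutEdge(e);
}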

2.7 High Availability Crossing Data Centers

Figure 10: Data flows between two clusters.

Usually, one Nebula Graph cluster should be deployed inside a single data center because of network fluctuations and heartbeat timeout constraints. To replicate persistent metadata and data across multiple data centers, three roles are implemented: a data listener, a meta listener, and a drainer (Fig. 10).

For example, a meta listener is deployed to subscribe to the Raft WAL of the Meta Service, and a storage listener to the data of the Storage Service. These two listeners are usually deployed together with the primary cluster (the main cluster for the application's writes). They write asynchronously to a remote drainer deployed in the same data center as the secondary cluster (the read-only cluster). The drainer basically works as a client of the secondary cluster, so the two clusters operate independently and their configurations and deployments can be heterogeneous.

Both listeners and drainers are stateful; each keeps its own WAL to survive failures. To deploy more read-only secondary clusters, a chained-populate method is suggested, i.e., 1-to-1 cluster synchronization. Registration operations on each role establish the upstream-downstream dependency. Listeners or drainers of the same kind share the overall workload through M-to-N round-robin scheduling (partition-to-listener, listener-to-drainer).

2.8 Query Languages

As mentioned above, Nebula Graph supports two declarative languages: openCypher 9 (the DQL part) and nGQL, a Nebula-native query language (v3.0 [14]). Neither Apache Gremlin [15] nor W3C SPARQL [16] is supported. Two groups (ISO/IEC JTC 1 N 14279 and JTC 1/SC 32 N 3228) are currently working on international standards for graph query languages, planned for release in 2023, and Nebula Graph is planned to be adapted to the upcoming standards accordingly.

2.8.1 Data Definition Language (DDL)

Nebula Graph is strongly typed and not entirely schema-free like Neo4j: the types and properties of vertices and edges must be defined before use. A simplified syntax for creating an edge type (an edge's type and its property definitions) is as follows:

CREATE EDGE [IF NOT EXISTS] <edge_type_name>
    (<prop_name> <data_type> [NULL | NOT NULL])

E.g., an edge type called write_paper is created with one property (wtime):

CREATE EDGE write_paper (wtime DATE);

Besides creation (CREATE EDGE), statements for dropping (DROP EDGE), altering (ALTER EDGE), listing (SHOW EDGES), and describing (DESCRIBE EDGE) are also supported; they can be inferred directly from SQL's DDL for tables. The syntax of operations on vertex types (TAG) is similar and omitted.

2.8.2 Data Manipulation Language (DML)

After running DDL clauses, data can be written into Nebula Core through DML, and then optionally extracted, transformed, and loaded (ETL) into the Nebula Algorithms & Analytics Framework (Sec. 2.9). A simplified syntax for writing an edge is as follows:

INSERT EDGE <edge_type>(<prop_name_list>)
VALUES <src_vid> -> <dst_vid>[@<rank>]
: (<prop_value_list>)

Operations for creation (INSERT EDGE) and deletion (DELETE EDGE) are implemented. E.g., to insert an edge representing that an author wrote a paper:

INSERT EDGE write_paper (wtime) VALUES
    author_abc -> paper_123
    : (DATE("2022-04-20"));

When an insertion is acknowledged as successful, a majority of the Raft replicas in the Storage Service have been written synchronously; the full-text indexes in Elasticsearch and cross-data-center replications (if any) are written asynchronously by the corresponding listeners. If an input parameter does not match the defined data type (e.g., trying to write "hello" as wtime), an error is raised and the write request fails with no side effects.

2.8.3 Security and Data Control Language (DCL)

Access control in Nebula Graph is role-based, with OpenLDAP supported for authentication. The clauses CREATE, DESCRIBE, ALTER, DROP, and SHOW on USER, together with GRANT/REVOKE ROLE, control graph-space-level privileges of different roles. In addition to authentication, SSL-encrypted data transmission is supported.

There are also commands to maintain the system: ADD/REMOVE HOST scales the cluster out/in; BALANCE DATA re-balances data placement according to the workload; and CREATE SNAPSHOT creates a snapshot backup of the database, which can be uploaded to the cloud (e.g., S3) and restored by the Backup & Restore tools.

2.8.4 Data types, Functions, and the Data Query Language (DQL)

Besides basic data types (numeric, boolean, string, text, datetime, NULL, and geospatial), composite data types (List, Set, Map, Path, and Subgraph) are also supported. Data types and functions follow the definitions of openCypher. The frequently used (though not all) openCypher DQL clauses (MATCH, OPTIONAL MATCH, WITH, GROUP BY, ORDER BY, LIMIT, WHERE) are implemented.

There are also some extensions: three well-tuned clauses are provided as syntactic sugar, namely GO (breadth-first search), FETCH (fetching vertex/edge properties), and LOOKUP (directly using secondary indexes).

All of these clauses share a common set of physical operators. Compatibility and performance are tested with the openCypher Technology Compatibility Kit (TCK) [17] and the LDBC Social Network Benchmark (LDBC-SNB), both of which are among the most widely used test suites in the graph database industry.

2.9 Nebula Algorithms & Analytics Framework

The Nebula Algorithms & Analytics Framework is the OLAP (online analytical processing) part of Nebula Graph. OLAP requests are not processed at interactive speed; the computation is mostly complex and global in scope. Such computations mostly adopt the typical "Gather-Apply-Scatter" (GAS) abstraction [18].
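To make the abstraction concrete, below is a toy single-machine C++ sketch of a GAS-style PageRank iteration; the distributed frameworks cited here shard these loops across workers:

#include <cstddef>
#include <vector>

std::vector<double> pageRank(const std::vector<std::vector<size_t>>& outAdj,
                             int iterations, double damping = 0.85) {
    const size_t n = outAdj.size();
    std::vector<double> rank(n, 1.0 / n);
    for (int it = 0; it < iterations; ++it) {
        std::vector<double> gathered(n, 0.0);
        // Scatter + Gather: each vertex sends rank/out-degree along its
        // out-edges; each destination sums the incoming contributions.
        for (size_t u = 0; u < n; ++u) {
            if (outAdj[u].empty()) continue;
            const double share = rank[u] / outAdj[u].size();
            for (size_t v : outAdj[u]) gathered[v] += share;
        }
        // Apply: every vertex updates its rank from the gathered sum.
        for (size_t v = 0; v < n; ++v)
            gathered[v] = (1.0 - damping) / n + damping * gathered[v];
        rank.swap(gathered);
    }
    return rank;
}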

For now, two popular open-source frameworks are supported natively in Nebula Graph: Spark/GraphX [19] and Tencent/Plato [20]. An ETL process, implemented as a Spark job, transfers subgraph data from Nebula Core directly into these frameworks. The ETL process is aware of the source graph's schema and privileges, as well as the target framework's format (e.g., DataFrame in Spark, or the pre-sorted IDs in Plato); it therefore writes directly in the framework's data format rather than storing the source graph in an intermediate CSR or CSV file. About 30 graph algorithms, such as Jaccard similarity, triangle counting, and Node2Vec, are implemented as supplements to the frameworks' original algorithms.

2.10 Clients, Tools & Visualization

Nebula Graph uses FBThrift [21] as its Remote Procedure Call (RPC) framework, so all available SDKs, including Java, C++, Python, and Golang, must be compiled accordingly.

Connectors for big data systems are provided as well, including Apache Spark, Flink, and Kafka. Users can choose between two built-in tools, Nebula Importer and Nebula Exchange, to import/export data from/to multiple sources, including CSV, JSON, MySQL, Neo4j, Hive, HBase, Kafka, and JanusGraph.

Nebula Graph uses D3 for visualization, and three visualization toolkits are provided. Nebula Studio offers a graphical user interface to manipulate graph schemas, import and navigate graph data, and run query statements. Nebula Explorer is designed for no-code visual interaction and graph exploration. Nebula Dashboard deploys and monitors the status of Nebula Graph on premises or in the cloud.

2.11 Observability

Internal states of Nebula Graph are recorded and sent to a Prometheus service. The collected metrics cover performance (insert/read/delete QPS and latency) and hardware usage (CPU, memory, disk, and network), with different aggregations (SUM, COUNT, AVG, and p-quantiles) over time windows from 5 seconds to 1 hour. Google glog (runtime logs), folly xlog (audit logs), and Linux logrotate are used to print and manage log files.

As in most DBMSs, the execution plan is a frequently used and important tool for profiling slow queries. The commands EXPLAIN and PROFILE print the optimized execution plan of a specific query statement and show the time cost of each plan node (see the example in Fig. 12). Slow queries are sent by the Graph Service to the Meta Service for persistence, and running slow queries can be killed by commands. All logs and slow queries can be exported to ELK stacks for further analysis.

3 Benchmarking

We present three benchmarks: two read tests and one write test. (Most graph databases are publicly available only in their single-server versions, with distributed versions being enterprise-only, and the LDBC Gremlin implementation is not available; we therefore benchmark Nebula Graph without competitor comparisons.)

3.1 Environment Setup

All tests run on three virtual machines, each with 8 vCPUs (Intel Xeon Platinum 8260 CPU @ 2.30GHz), 32 GB memory, a 1 TB SATA SSD, and 10 Gigabit Ethernet. The operating system is CentOS 7.9.

On every server, we deploy all of the Meta, Storage, and Graph services (v3.1). All configurations are left as default, except that KV-separation is enabled (rocksdb_enable_kv_separation=true) and the number of partitions is set to 120.

The data set is generated by LDBC-SNB v0.3.3 with Scale Factor 300, i.e., about 300 GB of raw input. There are 817,316,105 vertices and 5,269,286,554 edges in total, with eight tags and 19 edge types.

The stress client runs on a remote server. Every test contains some query sentences (a group-query) and input query parameters (seeds). All query sentences are written as parameterized openCypher or nGQL sentences without pre-compilation. All requests are ad hoc and answered synchronously. Client threads pick seeds exclusively and run the same group-query, so the total time is expected to shrink as the number of threads increases. Every test is run three times to ensure consistent behavior.

3.2 LDBC-SNB Short Reads

There are seven Interactive Short (IS) reads in LDBC-SNB Short Reads, IS-1 to IS-7, written as parameterized openCypher sentences (available at https://github.com/CPWstatic/ldbc_snb_interactive_implementations/tree/nebula0.3.3/nebula/queries/, interactive-short-1.ngql to interactive-short-7.ngql). At the beginning of every group query, a seed is chosen; the client sends the IS-1 query to Nebula Graph and receives the response, then IS-2 is sent and answered, and so on. We sum the latency of all seven synchronous requests. A new seed is then chosen and a new sequence of IS-1 to IS-7 queries is sent, until all 10k seeds are done. The test runs with 5, 10, 20, 30, 40, or 50 concurrent client threads to measure short-depth read capability.

The QPS and latency are shown in Fig. 13(a) and the hardware metrics in Fig. 13(d). All requests completed successfully. Performance becomes CPU-bound at about 30 clients (30_vu); beyond that point, latency rises linearly, while below it, QPS increases linearly with the load. Besides, all servers' workloads are nearly equivalent.

3.3 2-hop Breadth-First Search

Breadth-first search is widely used in graph analytics for data collection and recall. We test the dedicated GO clause by starting from a seed vertex, traversing two hops over all edge types, and counting the neighbors. The seed number is 10k, and the concurrent client threads are 5, 10, 20, 30, 40, or 50. An example query is shown below.

GO 2 STEPS FROM "p-10995116795962" OVER *
YIELD DST(EDGE) | YIELD COUNT(*);
+----------+
| COUNT(*) |
+----------+
| 148742   |
+----------+
Got 1 rows (time spent 328233/329623 us)

The test results are shown in Figs. 13(b) and 13(e). Notice that the 2-hop search only requires the graph structure and no properties, whereas the Short Reads above require both. KV-separation is therefore much more effective in this case after warm-up, visible as the disk-read spikes in Fig. 13(e). QPS scales linearly with the load until about 30_vu, after which the test is CPU-bound.

Further profiling shows the most time-consuming steps (Fig. 12): GetNeighbors_6 (120796 us), which fetches vertices and edges from the three Storage Services into one Graph Service through RPC, and Project_7 (191431 us), which projects the DST(VID) column for every row, since the Graph Service uses row-based rather than columnar processing.

3.4 LDBC-SNB Insert and Update

The two tests above stress read performance; this one tests write capability. Each sentence inserts a vertex or an edge under a given condition. E.g.,

INSERT EDGE IF NOT EXISTS LIKES_POST(creationDate)
VALUES "p-933"->"s-105553159254332":
    (datetime("2012-07-26T22:45:50"))
Execution succeeded (time spent 841/985 us)

There are eight group-writes in LDBC-SNB, IU-1 to IU-8 (available at https://github.com/CPWstatic/ldbc_snb_interactive_implementations/tree/nebula0.3.3/nebula/queries/, interactive-update-1-add-person-companies.ngql to interactive-update-8.ngql). We choose five million seeds and run with 5, 10, 20, 30, 40, or 50 concurrent clients. The results are shown in Figs. 13(c) and 13(f). The execution plan of an insertion is much simpler than that of the reads, and the latency is much lower (Fig. 11). The fluctuation in disk I/O is caused by RocksDB memtable flushes.

4 Related Work

In the literature, there are numerous surveys of graph databases and related technologies, including graph languages, graph processing systems, and visualizations. Many evaluations and benchmarks have been conducted as well.

Graph databases, designed mainly for managing graph-like data, follow the principles of database systems, i.e., persistent storage, data consistency, execution plans, and query languages. The study of graph databases has a long history, dating back at least to the 1970s [22]. Many native graph databases are available: Neo4j [23, 24], TigerGraph [25, 26], JanusGraph [27], RedisGraph [28], GeaBase [29], GalaxyBase [30], and DGraph [31]. Some products are built directly on other DBMSs via graph views or graph indexes, such as ArangoDB [32], OrientDB [33], and Oracle Graph [34].

Graph Languages. Currently, there are three commonly used graph languages, Neo4j Cypher, Apache Gremlin, and W3C SPARQL, along with some variants: Oracle PGQL [35], TigerGraph GSQL [26], and LDBC G-CORE [36]. Research on graph languages has bloomed in the last decade [37, 7, 38, 5, 39, 40] with the emergence of graph databases, but this diversity fragments the graph market, hindering the porting of applications and skill sets from one platform to another. The ISO/IEC GQL standards have made much progress, and after their release they are expected to help vendors and clients adopt and converge on one language.

Graph Processing Frameworks

In addition to graph databases, a number of graph processing frameworks have been proposed for large-scale graphs. These frameworks are characterized by in-memory batch processing and by distributed, parallel processing strategies. They iterate batch computations over the entire graph until a fixed point or stopping criterion is satisfied, e.g., for PageRank, clustering, and machine learning and data mining algorithms. Typical graph-specific platforms are Pregel [41], Apache Giraph [42], GraphLab [43], PowerGraph [18], GraphX [19], GraphScope [44], and Plato [20].

Visualizations

Graphs are popular in data analysis and business intelligence because of the intuitive, heuristic thinking they inspire. Most graph databases have their own visualization tools, like Neo4j Bloom [45] and TigerGraph GraphStudio [46], and some third-party libraries are available, e.g., D3 [47], G6 [48], Cytoscape [49], Gephi [50], and Zeppelin [51].

Graph Reachability and Structure Index

Reachability problems on graph structures have been well studied [52, 53] and applied in some graph databases. Beyond the key-value separation of Sec. 2.3 and the secondary property indexes of Sec. 2.4, various types of indices have been investigated to speed up structure-based queries [54, 55, 56, 57, 58].

Comparison and Benchmarking

Plenty of surveys and comparisons have been conducted among different graph databases [59, 60, 61, 62, 63, 64, 65, 66].

In the DBMS world, TPC standards are widely accepted performance benchmarks, but there is no globally accepted benchmark for graph databases. Some research work has been done [67, 68, 11, 69], among which the best-known graph database benchmark is LDBC-SNB. Some third-party experiments are publicly available [70, 28, 71].

5 Conclusions & Future Work

In this paper, we present Nebula Graph, an open-source graph database. Nebula Graph already serves in production at hundreds of leading companies, such as Walmart (WMT.NYSE), China Mobile (0941.HK), Tencent (0700.HK), Huawei, Meituan (3690.HK), JD (9618.HK), and NetEase (NTES.NYSE). The largest single cluster in production comprises about a hundred servers and stores more than 100 terabytes of data.

In the future, three major works are planned. The first is compatibility with the upcoming ISO/IEC GQL standards, which requires a major redesign of the Graph Service, especially the optimizer. The second is support for running GQL in the ETL process between Nebula Core and Nebula Algorithms, which offers the flexibility of fetching data by queries instead of programming Scala code in Spark. The third is hosting Nebula Graph in the cloud, which requires a substantial refactoring of the Storage Service.

References

  • (1) Solid IT gmbh: Complete trend, starting with January 2013 (2022). https://db-engines.com/en/ranking_categories
  • (2) Ongaro, D., Ousterhout, J.: The raft consensus algorithm (2015)
  • (3) Sayfan, G.: Mastering Kubernetes. Packt Publishing Ltd, Birmingham (2017)
  • (4) Cao, Z., Dong, S., Vemuri, S., Du, D.H.: Characterizing, modeling, and benchmarking rocksdb key-value workloads at facebook. In: 18th USENIX Conference on File and Storage Technologies (FAST 20), pp. 209–223 (2020)
  • (5) Francis, N., Green, A., Guagliardo, P., Libkin, L., Lindaaker, T., Marsault, V., Plantikow, S., Rydberg, M., Selmer, P., Taylor, A.: Cypher: An evolving query language for property graphs. In: Proceedings of the 2018 International Conference on Management of Data. SIGMOD ’18, pp. 1433–1445. Association for Computing Machinery, New York, NY, USA (2018). https://doi.org/10.1145/3183713.3190657
  • (6) Neo4j, Inc.: NEO4J GRAPH DATA PLATFORM. Accessed: March 28, 2022 (2022). https://neo4j.com/
  • (7) Angles, R., Arenas, M., Barceló, P., Hogan, A., Reutter, J., Vrgoč, D.: Foundations of modern query languages for graph databases. ACM Comput. Surv. 50(5) (2017). https://doi.org/10.1145/3104031
  • (8) Lu, L., Pillai, T.S., Gopalakrishnan, H., Arpaci-Dusseau, A.C., Arpaci-Dusseau, R.H.: Wisckey: Separating keys from values in ssd-conscious storage. ACM Transactions on Storage (TOS) 13(1), 1–28 (2017)
  • (9) Gormley, C., Tong, Z.: Elasticsearch: the Definitive Guide: a Distributed Real-time Search and Analytics Engine. O'Reilly Media, Inc., New York (2015)
  • (10) Google: S2 Geometry (2022). https://s2geometry.io/
  • (11) Erling, O., Averbuch, A., Larriba-Pey, J., Chafi, H., Gubichev, A., Prat, A., Pham, M.-D., Boncz, P.: The ldbc social network benchmark: Interactive workload. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. SIGMOD ’15, pp. 619–630. Association for Computing Machinery, New York, NY, USA (2015). https://doi.org/10.1145/2723372.2742786
  • (12) Graefe, G.: The cascades framework for query optimization. IEEE Data Eng. Bull. 18(3), 19–29 (1995)
  • (13) Escriva, R., Wong, B., Sirer, E.G.: Warp: Lightweight multi-key transactions for key-value stores. arXiv preprint arXiv:1509.07815 (2015)
  • (14) Wu, M., Zhou, Y., Liang, C., Yang, F., Huang, A.: Nebula Graph Database Manual (2022). https://docs.nebula-graph.io/3.0.2/pdf/NebulaGraph-EN.pdf
  • (15) Rodriguez, M.A.: The gremlin graph traversal machine and language (invited talk). In: Proceedings of the 15th Symposium on Database Programming Languages, pp. 1–10 (2015)
  • (16) Consortium, W.W.W., et al.: Sparql 1.1 overview (2013)
  • (17) Neo4j Inc.: Technology Compatibility Kit (TCK), openCypher Resources (2021). https://s3.amazonaws.com/artifacts.opencypher.org/M18/tck-M18.zip
  • (18) Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: PowerGraph: Distributed graph-parallel computation on natural graphs. In: 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12), pp. 17–30 (2012)
  • (19) Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: GraphX: Graph processing in a distributed dataflow framework. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pp. 599–613 (2014)
  • (20) Tencent: Tencent Graph Computing (TGraph) Officially Open Sourced High-Performance Graph Computing Framework: Plato. 2022 (2022). https://github.com/Tencent/plato/blob/master/doc/introduction_en.md
  • (21) Watson, D.: Under the hood: Building and open-sourcing fbthrift (2014)
  • (22) Kleppmann, M.: Designing Data-intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. O’Reilly Media, Inc., Sebastopol, CA (2017)
  • (23) Guia, J., Soares, V.G., Bernardino, J.: Graph databases: Neo4j analysis. In: ICEIS (1), pp. 351–356 (2017)
  • (24) Miller, J.J.: Graph database applications and concepts with neo4j. In: Proceedings of the Southern Association for Information Systems Conference, Atlanta, GA, USA, vol. 2324 (2013)
  • (25) Deutsch, A., Xu, Y., Wu, M., Lee, V.: TigerGraph: A Native MPP Graph Database. arXiv (2019). https://doi.org/10.48550/ARXIV.1901.08248
  • (26) Deutsch, A., Xu, Y., Wu, M., Lee, V.E.: Aggregation support for modern graph analytics in tigergraph. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 377–392 (2020)
  • (27) JanusGraph Authors: JanusGraph: Distributed, open source, massively scalable graph database. Accessed: March 28, 2022 (2022). https://janusgraph.org/
  • (28) Cailliau, P., Davis, T., Gadepally, V., Kepner, J., Lipman, R., Lovitz, J., Ouaknine, K.: Redisgraph graphblas enabled graph database. In: 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 285–286 (2019). IEEE
  • (29) Fu, Z., Wu, Z., Li, H., Li, Y., Wu, M., Chen, X., Ye, X., Yu, B., Hu, X.: Geabase: A high-performance distributed graph database for industry-scale applications. In: 2017 Fifth International Conference on Advanced Cloud and Big Data (CBD), pp. 170–175 (2017). https://doi.org/10.1109/CBD.2017.37
  • (30) Zhang, C., Wu, J.: Platform, system, process for distributed graph databases and computing. Google Patents. US Patent 10,409,782 (2019)
  • (31) Jain, M.: Dgraph: Synchronously replicated, transactional and distributed graph database (2021)
  • (32) ArangoDB Inc: Arangodb (2022). https://www.arangodb.com/
  • (33) Tesoriero, C.: Getting Started with OrientDB. Packt Publishing Ltd, Birmingham (2013)
  • (34) Corporation, O.: Graph Database and Graph Analytics (2022). https://www.oracle.com/database/graph/
  • (35) Oracle: Property Graph Query Language (PGQL). Accessed: March 28, 2022 (2022). https://pgql-lang.org/
  • (36) Angles, R., Arenas, M., Barceló, P., Boncz, P., Fletcher, G., Gutierrez, C., Lindaaker, T., Paradies, M., Plantikow, S., Sequeda, J., et al.: G-core: A core for future graph query languages. In: Proceedings of the 2018 International Conference on Management of Data, pp. 1421–1432 (2018)
  • (37) Lindaaker, T.: An overview of the recent history of graph query languages (2018)
  • (38) Angles, R., Bonifati, A., Dumbrava, S., Fletcher, G., Hare, K.W., Hidders, J., Lee, V.E., Li, B., Libkin, L., Martens, W., Murlak, F., Perryman, J., Savković, O., Schmidt, M., Sequeda, J., Staworko, S., Tomaszuk, D.: Pg-keys: Keys for property graphs. In: Proceedings of the 2021 International Conference on Management of Data. SIGMOD/PODS ’21, pp. 2423–2436. Association for Computing Machinery, New York, NY, USA (2021). https://doi.org/10.1145/3448016.3457561
  • (39) Deutsch, A., Francis, N., Green, A., Hare, K., Li, B., Libkin, L., Lindaaker, T., Marsault, V., Martens, W., Michels, J., et al.: Graph pattern matching in gql and sql/pgq. arXiv preprint arXiv:2112.06217 (2021)
  • (40) Marton, J., Szárnyas, G., Varró, D.: Formalising opencypher graph queries in relational algebra. In: European Conference on Advances in Databases and Information Systems, pp. 182–196 (2017). Springer
  • (41) Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pp. 135–146 (2010)
  • (42) Sakr, S., Orakzai, F.M., Abdelaziz, I., Khayyat, Z.: Large-scale Graph Processing Using Apache Giraph. Springer, New York (2016)
  • (43) Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.M.: Graphlab: A new parallel framework for machine learning. In: Conference on Uncertainty in Artificial Intelligence (UAI), vol. 20 (2010)
  • (44) Fan, W., He, T., Lai, L., Li, X., Li, Y., Li, Z., Qian, Z., Tian, C., Wang, L., Xu, J., et al.: Graphscope: a unified engine for big graph processing. Proceedings of the VLDB Endowment 14(12), 2879–2892 (2021)
  • (45) Neo4j, Inc.: Neo4j Bloom (2022). https://neo4j.com/product/bloom/
  • (46) TigerGraph: GraphStudio User Interface (2022). https://www.tigergraph.com/graphstudio/
  • (47) Vancák, V.: Graph data visualizations with d3. js libraries (2017)
  • (48) Wang, Y., Bai, Z., Lin, Z., Dong, X., Feng, Y., Pan, J., Chen, W.: G6: A web-based library for graph visualization. Visual Informatics 5(4), 49–55 (2021)
  • (49) Franz, M., Lopes, C.T., Huck, G., Dong, Y., Sumer, O., Bader, G.D.: Cytoscape. js: a graph theory library for visualisation and analysis. Bioinformatics 32(2), 309–311 (2016)
  • (50) Cherven, K.: Network Graph Analysis and Visualization with Gephi. Packt Publishing Ltd, Birmingham (2013)
  • (51) Cheng, Y., Liu, F.C., Jing, S., Xu, W., Chau, D.H.: Building big data processing and visualization pipeline through apache zeppelin. In: Proceedings of the Practice and Experience on Advanced Research Computing, pp. 1–7 (2018)
  • (52) Wei, H., Yu, J.X., Lu, C., Jin, R.: Reachability querying: An independent permutation labeling approach. Proc. VLDB Endow. 7(12), 1191–1202 (2014). https://doi.org/10.14778/2732977.2732992
  • (53) Jin, R., Ruan, N.: Shortest path computation in large networks. Google Patents. US Patent 10,521,473 (2019)
  • (54) Mhedhbi, A., Gupta, P., Khaliq, S., Salihoglu, S.: A+ indexes: Tunable and space-efficient adjacency lists in graph database management systems. In: 2021 IEEE 37th International Conference on Data Engineering (ICDE), pp. 1464–1475 (2021). IEEE
  • (55) Pokorný, J., Valenta, M., Ramba, J.: Graph patterns indexes: Their storage and retrieval. In: Proceedings of the 20th International Conference on Information Integration and Web-Based Applications Services. iiWAS2018, pp. 221–225. Association for Computing Machinery, New York, NY, USA (2018). https://doi.org/10.1145/3282373.3282374
  • (56) Sumrall, J.M., Fletcher, G.H., Poulovassilis, A., Svensson, J., Vejlstrup, M., Vest, C., Webber, J.: Investigations on path indexing for graph databases. In: European Conference on Parallel Processing, pp. 532–544 (2016). Springer
  • (57) Sakr, S., Al-Naymat, G.: An overview of graph indexing and querying techniques. Graph Data Management: Techniques and Applications, 71–88 (2012)
  • (58) Williams, D.W., Huan, J., Wang, W.: Graph database indexing using structured graph decomposition. In: 2007 IEEE 23rd International Conference on Data Engineering, pp. 976–985 (2007). IEEE
  • (59) Besta, M., Peter, E., Gerstenberger, R., Fischer, M., Podstawski, M., Barthels, C., Alonso, G., Hoefler, T.: Demystifying graph databases: Analysis and taxonomy of data organization, system designs, and graph queries. arXiv preprint arXiv:1910.09017 (2019)
  • (60) Patil, N., Kiran, P., Kiran, N., KM, N.P.: A survey on graph database management techniques for huge unstructured data. International Journal of Electrical and Computer Engineering 8(2), 1140 (2018)
  • (61) Besta, M., Hoefler, T.: Survey and taxonomy of lossless graph compression and space-efficient graph representations. arXiv preprint arXiv:1806.01799 (2018)
  • (62) Angles, R., Gutierrez, C.: In: Fletcher, G., Hidders, J., Larriba-Pey, J.L. (eds.) An Introduction to Graph Data Management, pp. 1–32. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96193-4_1
  • (63) Sakr, S., Bonifati, A., Voigt, H., Iosup, A., Ammar, K., Angles, R., Yoneki: The future is big graphs: a community view on graph processing systems. Communications of the ACM 64(9), 62–71 (2021)
  • (64) Sahu, S., Mhedhbi, A., Salihoglu, S., Lin, J., Özsu, M.T.: The ubiquity of large graphs and surprising challenges of graph processing: extended survey. VLDB J. 29(2-3), 595–618 (2020)
  • (65) Fernandes, D., Bernardino, J.: Graph databases comparison: Allegrograph, arangodb, infinitegraph, neo4j, and orientdb. In: Data, pp. 373–380 (2018)
  • (66) Meiling, L., et al.: Benchmarking multi-model databases with arangodb and orientdb (2017)
  • (67) Lissandrini, M., Brugnara, M., Velegrakis, Y.: Beyond macrobenchmarks: Microbenchmark-based graph database evaluation. Proc. VLDB Endow. 12(4), 390–403 (2018). https://doi.org/10.14778/3297753.3297759
  • (68) Angles, R., Antal, J.B., Averbuch, A., Boncz, P., Erling, O., Gubichev, A., Haprian, V., Kaufmann, M., Pey, J.L.L., Martínez, N., Marton, J., Paradies, M., Pham, M.-D., Prat-Pérez, A., Spasić, M., Steer, B.A., Szárnyas, G., Waudby, J.: The LDBC Social Network Benchmark. arXiv (2020). https://doi.org/10.48550/ARXIV.2001.02299
  • (69) Mhedhbi, A., Lissandrini, M., Kuiper, L., Waudby, J., Szárnyas, G.: Lsqb: a large-scale subgraph query benchmark. In: Proceedings of the 4th ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), pp. 1–11 (2021)
  • (70) Rusu, F., Huang, Z.: In-depth benchmarking of graph database systems with the linked data benchmark council (ldbc) social network benchmark (snb). arXiv preprint arXiv:1907.07405 (2019)
  • (71) Wang, R., Yang, Z., Zhang, W., Lin, X.: An empirical study on recent graph database systems. In: Li, G., Shen, H.T., Yuan, Y., Wang, X., Liu, H., Zhao, X. (eds.) Knowledge Science, Engineering and Management, pp. 328–340. Springer, Cham (2020)

6 Appendices

Figure 11: Profiling of inserting an edge.
Figure 12: Profiling of a sample 2-hop query.
Figure 13:
(a) QPS and latency of LDBC-SNB Short Reads.
(b) QPS and latency of 2-hop BFS.
(c) QPS and latency of Insert and Update.
(d) CPU, disk, and network metrics of LDBC-SNB Short Reads.
(e) CPU, disk, and network metrics of 2-hop BFS. The fluctuation on the left side is the warm-up round.
(f) CPU, disk, and network metrics of Insert and Update.