
¹ Xidian University, Xi'an, China
  Email: [email protected], [email protected], {hli,xygao,cuijt}@xidian.edu.cn
² Shanghai Yunxi Technology Co., Ltd., Shanghai, China
  Email: [email protected]

SLSM: An Efficient Strategy for Lazy Schema Migration on Shared-Nothing Databases

Zhilin Zeng¹    Hui Li¹    Xiyue Gao¹    Hui Zhang²    Huiquan Zhang¹    Jiangtao Cui¹
Abstract

By introducing intermediate states for metadata changes and ensuring that at most two versions of metadata exist in the cluster at the same time, shared-nothing databases are capable of making online, asynchronous schema changes. However, this method leads to delays in the deployment of new schemas since it requires waiting for massive data backfill. To shorten the service vacuum period before the new schema is available, this paper proposes a strategy named SLSM for zero-downtime schema migration on shared-nothing databases. Based on the lazy migration of stand-alone databases, SLSM keeps the old and new schemas with the same data distribution, reducing the node communication overhead of executing migration transactions for shared-nothing databases. Further, SLSM combines migration transactions with user transactions by extending the distributed execution plan to allow the data involved in migration transactions to directly serve user transactions, greatly reducing the waiting time of user transactions. Experiments demonstrate that our strategy can greatly reduce the latency of user transactions and improve the efficiency of data migration compared to existing schemes.

Keywords:
Shared-nothing databases · Lazy migration · Schema evolution.

1 Introduction

Due to their high scalability and availability, shared-nothing NewSQL systems like CockroachDB [24] and TiDB [13] have been widely adopted in practice. As applications are constantly updated and iterated, the database schema may need to evolve correspondingly. In fact, some studies have shown that schema changes occur on average once a week [18], and some cloud applications change their schemas even more frequently [16]. Schema changes, a.k.a. schema migrations, are typically expressed with Data Definition Language (DDL) statements, and the process of schema migration can incur massive data movement that may block concurrent queries and transactions. Early systems commonly required service downtime or maintenance windows to perform schema migration [9]. To avoid this, standard RDBMSs today use online schema change techniques or external migration tools [1, 2, 3, 4, 5, 7, 26]. These solutions follow a similar strategy that usually involves snapshot replication and incremental data capture. First, the system receives the DDL statements for the schema change and processes them to register a new schema and its corresponding table, a.k.a. a shadow table [1, 2, 3, 4, 5]; the shadow table is not yet available for service. Some approaches use materialized views instead of shadow tables [7, 26]. After that, the system copies the source table in the background (snapshot replication), i.e., migrates the data from the old schema to the shadow table. Since the source table is not yet offline and is still serving requests, any data modified in the source table must be synchronized with the shadow table (incremental capture) to ensure consistency, which is achieved by triggers bound to the source table [1, 2, 3, 5, 7, 26] or by log propagation [4]. Once all data replication is complete, the system locks the source table and renames the shadow table to complete the atomic switch between the old and new schemas. The techniques described above allow the new schema to serve only after all the data in the source table have been synchronized, so schema changes take longer when the table under the old schema holds more data.

Online schema change techniques in shared-nothing NewSQL databases follow the idea of mainstream RDBMS schemes and tend to introduce metadata transition states to ensure availability and consistency, as proposed in Google F1 [19]. F1 introduces two additional intermediate metadata states, delete_only and write_only, in addition to the two non-intermediate states absent and public. After all F1 servers have reached the write_only state, the database allows double writes to both the old and new schemas and simultaneously turns on backfill of the new schema's data. The data present in the delete_only and earlier phases and the data written in the write_only phase are generally referred to as stock and incremental data, respectively. This mirrors the snapshot replication and incremental capture of mainstream RDBMS solutions, and, unfortunately, their flaws are inherited as well. However, as continuous deployment (CD) and integration have become the norm [15], developers expect to simply deploy updates to the front-end code and the back-end database together. Waiting for the changes to the back-end database schema to complete before deploying updates to the front-end code undoubtedly delays the service of the new schema. This prevents developers from realizing the benefits of CD, as the effects need to be predicted and estimated before performing each schema change, reserving enough time to perform the migration and to rewrite outdated queries in the source code. The cycle time to bring new schemas online is significantly lengthened.

Although some schemes adopting the lazy migration idea can avoid the delayed deployment of new schemas [6, 20, 22], this idea has not yet been practiced on shared-nothing NewSQL databases. Inspired by the schemes in [20, 22], in this paper, we propose a zero-downtime Strategy for Lazy Schema Migration (SLSM) on shared-nothing databases. SLSM inherits the advantages of existing lazy migration schemes. More importantly, it is tailored to the architecture and data distribution characteristics of shared-nothing databases, greatly reducing the system's overhead of executing migration transactions and user transactions. We conducted extensive experiments on a popular shared-nothing database using a standard TPC-C benchmark extended with schema migration transactions. According to our experimental results, our approach achieves a performance improvement of more than 40% over existing solutions.

In summary, this paper makes the following contributions.

  • 1)

    We propose and implement SLSM, a lazy schema migration method for shared-nothing databases that supports single-step online schema migration.

  • 2)

    We reduce the overhead of SLSM in executing migration transactions and user transactions and optimize the background migration algorithm.

  • 3)

    We apply and integrate our approach to a popular shared-nothing database system and perform extensive experiments to demonstrate the advantages of our approach.

The rest of the paper is organized as follows. Section 2 summarizes and reviews related work. Preliminaries are described in Section 3. Section 4 details the basic methodology and optimization principles of SLSM. The experimental results and evaluations are given in Section 5. Finally, Section 6 summarizes this work.

2 Related Works

The idea of using lazy migration for nonblocking schema migration has been discussed in standalone RDBMS [6, 12, 22] and NoSQL [20] database systems.

Sheng [22] proposed a multi-stage lazy migration approach for adding columns, where the system logically adds a new schema when a schema change is performed without physically executing any data migration. Data from the same table may be physically stored in multiple tables under different schemas. The system handles queries by interpreting the data according to the most recent schema, moving the data from the old schema to the new one only when necessary. However, the method is strictly limited to one kind of schema change, i.e., adding columns. Moreover, the characteristic of maintaining multiple physical tables increases the complexity and overhead of the system. Finally, the approach is discussed only in a standalone database system named Terrier and cannot be adapted to shared-nothing database systems.

BullFrog [6] is an RDBMS that adopts a similar idea for schema change through lazy migration, with the difference that the system only needs to maintain two schema versions and the new schema begins working as soon as it is registered. When the system receives a request on the new schema, it first migrates the relevant data from the old schema and then processes the request on the new schema. This scheme supports a wider variety of schema change operations but is currently only available on PostgreSQL, a standalone RDBMS. The approach in [20] starts from the single-step migration requirement and uses lazy migration in the context of migrating an application on top of a NoSQL database (Redis). NoSQL databases are widely used by continuously deployed applications since they typically do not enforce schema constraints. Shared-nothing NewSQL databases, however, need both the features of an RDBMS and the scalability and high availability of a NoSQL system, and our work on SLSM demonstrates that lazy migration can also be applied well to distributed shared-nothing databases.

Tesseract [12] is a recent method for online and transactional schema evolution. By modeling schema evolution as data modification and leveraging the concurrency control protocol, Tesseract can provide online, transactional schema evolution with no downtime and high application performance during the ongoing evolution process. Experiments with Tesseract have demonstrated that it can work with existing lazy migration approaches to support the immediate deployment of schema changes. Our SLSM is compatible with Tesseract, which can provide a high-performance MVCC (Multi-Version Concurrency Control) scheme for non-blocking schema migrations on shared-nothing databases.

In contrast to previous work, SLSM provides a common solution for shared-nothing databases. It utilizes generic data partitioning rules and a simple modification of the INSERT operator for migration optimization without relying on a complex or ad-hoc design.

3 Preliminaries

In this section, we introduce some preliminary knowledge related to shared-nothing databases.

3.1 Architecture for shared-nothing databases

In the classic shared-nothing [23] architecture, a database cluster consists of an arbitrary number of nodes, located in the same data center or not. Clients are assumed to be able to connect to any node in the cluster. Individual nodes typically have a tiered architecture designed to optimize distributed data storage and transaction processing. Depending on the specific database implementation, tiering is not strictly defined or limited, but usually includes an SQL layer and a distributed storage layer.

The SQL layer follows the concepts of traditional relational databases and serves as the interface through which users interact with the database. It is responsible for parsing the user's high-level SQL statements, generating distributed query plans, and converting them into low-level read and write requests to the underlying key-value store.

The distributed storage layer slices the data using a range partitioning policy and is responsible for routing and addressing the slices, providing a uniform KV storage abstraction. The data are ordered and automatically partitioned into small slices (called ranges or regions), with each slice containing a segment of contiguous key-value pairs. Tuples of a table are mapped to one or several key-value pairs; typically, the key consists of the table ID and the primary key columns, while the value is the actual row data. For example, a tuple containing four columns might be encoded as follows (where col0 is the primary key column):

Key: {tableID/col0}; Value: {col1/col2/col3}

Each slice has multiple replicas, which are replicated and maintained via a consensus protocol (e.g., Raft [17] or Paxos [14]). The data of a table may be distributed across different nodes of the cluster according to range, and the system automatically performs load balancing and failover of the replicas to ensure high availability of the cluster.

3.2 Raft Consensus Protocol

For ease of understanding, we also describe the Raft protocol, which is popular in many shared-nothing databases. In a cluster, all the replicas of a slice form a Raft group, where one replica is the long-lived leader that coordinates all read and write operations sent to the Raft group, and the other replicas are followers. The Raft protocol is usually used in conjunction with a lease mechanism: only the lease-holding replica can provide consistent KV reads/writes for the slice. In addition, the replica holding the lease is usually the leader of the Raft group, and its node is called the leaseholder; a node typically acts as the leaseholder for multiple slices. Gateway nodes are responsible for parsing SQL requests, acting as transaction coordinators, and routing KV operations to the correct leaseholder.

4 SLSM Basic Methods and Optimization Principles

In this section, we propose the basic SLSM scheme and further explore the optimization strategies. At the very beginning, we present a pair of definitions that are fundamental for the following discussion.

Definition 1

User Transaction. Obsolete queries on the old schema are rewritten as the new schema comes online. User transactions are those that contain requests on the new schema.

Definition 2

Migration Transaction. Migration transactions are those transactions generated by and executed before user transactions. Migration transactions are responsible for migrating the old schema data involved in the user transactions to the new schema.

4.1 Basic Approach

We first describe the basic approach of SLSM, in which we modify the SQL engine layer of the shared-nothing database in two phases so that schema changes can be accomplished immediately, without waiting for massive data to be migrated in place. The overall framework of SLSM is shown in Fig. 1. The first phase of SLSM mainly handles schema change requests and performs the required initialization. The second phase mainly handles user requests on the new schema, where SLSM first migrates the data involved in the user query from the old schema and then processes the user request on the new schema. Logically, a given tuple starts in the old schema and eventually migrates to the new schema, but never exists in both schemas simultaneously. We elaborate on each step below.

Figure 1: Basic SLSM Overview

4.1.1 Initialization

A schema migration request appears in the form of one or more DDL statements, e.g., CREATE TABLE new_table … AS SELECT…FROM old_table. Once the request is submitted, SLSM starts a transaction that creates an empty table corresponding to the new schema in the migration request, together with a predicate filter. For most shared-nothing databases that support view disambiguation techniques, the predicate filter can be realized as a temporary view containing the contents of the migration request, e.g., CREATE VIEW new_table_view … AS SELECT…FROM old_table; otherwise, a query rewriting tool may be required. SLSM then associates a background migration process with the new table to complete the initialization.
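As a minimal sketch, the initialization for the user-to-user_rights split discussed in Section 4.2 could issue statements of roughly the following form (the column types and the view name are assumptions for illustration; "user" is quoted because it is a reserved word in most SQL dialects):

  -- empty table for the new schema
  CREATE TABLE user_rights (id INT PRIMARY KEY, name TEXT, rights TEXT);
  -- predicate filter realized as a temporary view over the old schema
  CREATE VIEW user_rights_view AS SELECT id, name, rights FROM "user";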

4.1.2 Migration and User Transactions

The initialization cost is trivial because no physical data migration is involved. Once initialization is complete, the new schema is ready for service. When the system receives a transaction request on the new schema from the user, e.g., SELECT, DELETE, or UPDATE, instead of executing the user transaction immediately, SLSM first generates and executes a migration transaction. The migration transaction is responsible for migrating any relevant data from the old schema: it uses the predicate filter to convert the filter predicates on the new schema in the user's transaction (usually located in the WHERE clause) into predicates on the old schema, forming a conditional SELECT statement on the old schema. This SELECT in fact acts as a subquery of the INSERT statement used by the migration transaction, such as INSERT INTO new_table (…) (SELECT … from old_table WHERE …). After executing the migration transaction, the new schema has the set of tuples needed to fully process the user transaction, and SLSM can then process the original user transaction request. It is worth mentioning that if the user transaction is an INSERT request, there is no need to perform a migration transaction, and the request can be handled directly on the new schema.
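To make the rewriting concrete, the following sketch shows a hypothetical user UPDATE on the new schema and the migration transaction that SLSM would generate and execute first (the table and column names are illustrative; the actual conversion is driven by the predicate filter):

  -- user transaction issued against the new schema
  UPDATE new_table SET val = val + 1 WHERE k = 42;

  -- migration transaction generated by SLSM and executed beforehand;
  -- the WHERE predicate has been translated onto the old schema
  INSERT INTO new_table (k, val) (SELECT k, val FROM old_table WHERE k = 42);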

4.1.3 Background Migrations

SLSM starts a background migration process that slowly injects simulated user transactions which cumulatively cover the entire old table, migrating the tuples that have not yet been moved. The termination of the background migration process indicates that the migration is complete, and the tables of the old schema can then be deleted.
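A minimal sketch of one background migration step, again for the user-to-user_rights example, might look as follows (the batch size and the anti-join used to restrict the step to not-yet-migrated tuples are assumptions, not the exact implementation):

  INSERT INTO user_rights (id, name, rights)
    SELECT id, name, rights FROM "user"
    WHERE id NOT IN (SELECT id FROM user_rights)
    LIMIT 1000;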

4.2 Further Optimization for SLSM

With the basic approach of SLSM, shared-nothing databases can easily perform online schema migrations, avoiding the need to wait for massive data migrations. SLSM is essentially a combination of delayed transaction processing [11] and transaction decomposition [10, 21, 25], where a large migration transaction is decomposed into separate smaller migration transactions that are processed lazily. However, this undoubtedly increases the latency of user transactions on the new schema, as the system must generate and execute a migration transaction before responding to a user transaction. Consider the table-split schema migration from the old user table to the new user_rights table shown in Figure 2. The old user table contains three slices, each with three replicas, distributed over a three-node cluster, as shown in Figure 3. The ID of the old user table is 51, and that of the new user_rights table is 71. When processing the user transaction SELECT id, rights FROM user_rights WHERE id == 1001, gateway node3 first needs to execute the migration transaction INSERT INTO user_rights (id, name, rights) (SELECT id, name, rights from user WHERE id == 1001). The relevant data in the old user table are on slice Range2, whose leaseholder is node1 at this time. The system routes to node1, obtains the related data, and then migrates them to node2, the leaseholder of the slice where the corresponding data of the new user_rights table are located. Finally, node3 executes the user transaction. The dashed and solid arrows in Figure 3 represent the routing paths of KV operation requests/data for the migration transaction and the user transaction, respectively.

Figure 2: Table Split Migration
Figure 3: Migration and User Transaction Execution

Obviously, the additional overhead of migration transactions causes the latency of user transactions to at least double. We now discuss how to improve the efficiency of migration transactions and user transactions.

4.2.1 Optimization of Migration Transactions.

We observe that the routing paths for the KV operation requests/data of migration transactions are excessive, because the leaseholder of the slice where the source data reside usually differs from the leaseholder of the slice into which the data are inserted, resulting in additional network round-trip communication overhead. The reason for this phenomenon is two-fold. First, the old and new schemas have different table IDs and primary keys, so the slices in which they reside are different. Second, the leaseholder of each slice is dynamically adjusted according to network latency, node load, replica health, and other factors.

In fact, most schema migration requests are closely related to practical business upgrades. For example, the original table may need to add some new columns to model expanded objects and attributes. As another example, two tables that are often joined together and rarely updated independently may be merged into one by the developer to improve query performance. In these cases, the old and new schemas share most of their structure and data. We can leverage this linkage to modify the metadata of the new schema during the initialization phase so that the leaseholder of the slice where the source data reside is the same as that of the destination slice in the migration transaction.

When the new schema contains the primary key of the old schema, SLSM constructs the metadata of the new table in a different way, helping the database system encode and decode the tuples. As described in Section 3.1, at the storage layer, the key of a key-value pair for the new table is encoded from the new table ID and the primary key columns of the new table; SLSM adds a "prefix" to this key, which consists of the old table ID and the primary key columns of the old table. In other words, the data of the new table are stored with the key of the old table as a prefix. SLSM ensures that the system adds and strips the key prefix for subsequent read and write operations on the new table. Consider the schema migration in Figure 2, where the user_rights table contains the primary key columns of the user table. As shown in Figure 4, before processing the user transaction SELECT id, rights FROM user_rights WHERE id == 1001, gateway node3 executes the migration transaction first. The relevant data in the old user table are on slice Range2, whose leaseholder is node1 at this time. The system routes to node1 and obtains the relevant data, which can then be migrated directly on node1, which is also the leaseholder of the corresponding slice of the user_rights table, reducing the communication overhead compared to Figure 3. It can be seen that the data of the old and new tables are interleaved, and that the new schema table has the same data distribution as the old one after all the data have been migrated.
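For instance, for the migration in Figure 2 (old table ID 51, new table ID 71) and the tuple with id = 1001, the prefixed key of the new table could take roughly the following form, so that it sorts next to the corresponding old-table key and falls into the same slice (the exact encoding is an assumption for illustration):

old user table: Key: {51/1001}; Value: {name/rights}
new user_rights table: Key: {51/1001/71/1001}; Value: {name/rights}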

Figure 4: Optimization of Migration Transactions

4.2.2 Optimization of User Transactions.

Although we achieve some performance improvement through the migration transaction optimization, the gain is limited because user transactions still need to wait for the migration transaction to complete before executing. We observe that the relevant data migrated by a migration transaction are exactly the data needed by the corresponding user transaction. The migration transaction is generated from the filter predicate in the user transaction request to ensure that the new schema table holds the set of tuples needed to fully process the user transaction. However, due to the concurrency of user transactions, the migration transaction executed before the current user transaction may actually migrate only part of the data, or even none at all, since the migration transactions of other user transactions may already have migrated some or all of the data. As shown in Fig. 5, the migration transaction (marked in green) is executed before the user transaction (marked in red), and since half of the data has already been migrated (marked in black), the migration transaction only needs to migrate the other half. Therefore, although the data involved in migration transactions are related to user transactions, they may not fully meet the requirements for processing the user transactions. Nevertheless, we can still take advantage of this feature to optimize user transactions, letting the data involved in migration transactions directly serve user transactions and reducing the waiting time of user transactions.

Figure 5: Migrating process of migration transaction
Figure 6: Migration, User, and Fusion Transaction for A SELECT Request

Instead of strictly generating and executing a migration transaction before a user transaction, SLSM actually executes a user transaction that incorporates the migration transaction (referred to as a fusion transaction). This is achieved by modifying the INSERT operator so that it can output the post-insert data in the execution plan's operator stream. Figure 6 shows the abbreviated execution plans for a migration transaction, a user transaction, and a fusion transaction for a SELECT request, respectively. For SELECT requests, the fusion transaction filters and reads the data from both the old and new schemas. Since the modified INSERT operator returns the inserted data, these data can be merged with the filtered data from the new schema and returned as the result. UPDATE and DELETE requests follow a similar idea: both filter and read the relevant data on the old and new schemas at the same time, and then merge and process them. An INSERT request is processed directly on the new schema, without the need for a fusion transaction. Figure 7 shows the advantage of our fusion transaction. When processing the user transaction SELECT id, rights FROM user_rights WHERE id == 1001, the system scans the relevant data of the user and user_rights tables at the same time on node1, the leaseholder of Range2, migrates the data read from the user table directly into the user_rights table, and then merges them with the data read from the user_rights table (no relevant data in this case) to return the result.
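A rough SQL analogue of this fusion transaction is sketched below, assuming support for data-modifying common table expressions; SLSM achieves the same effect inside the distributed execution plan by letting the modified INSERT operator stream its post-insert rows, rather than by rewriting the query text:

  WITH migrated AS (
    -- migrate the relevant tuples and return them to the enclosing query
    INSERT INTO user_rights (id, name, rights)
      SELECT id, name, rights FROM "user" WHERE id = 1001
    RETURNING id, rights
  )
  SELECT id, rights FROM user_rights WHERE id = 1001
  UNION ALL
  SELECT id, rights FROM migrated;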

Figure 7: Optimization of User Transactions

4.2.3 Complexity Analysis.

Finally, we analyze the time complexity of SLSM. We denote the overhead of a migration transaction, the overhead of a user transaction, and the average communication overhead between nodes as cost_{mig}, cost_{usr}, and cost_{com}, respectively. We categorize the analysis according to whether the gateway, the slice of the old table, and the slice of the new table are on the same node. The complexity of the basic SLSM, of SLSM with migration transaction optimization only, and of the final SLSM are shown in Table 1.

Table 1: Comparison of complexity of different SLSMs

Categories | Basic | Basic-with-mig-trans-optim | Basic-with-all
{gateway, old, new} | cost_{mig} + cost_{usr} | cost_{mig} + cost_{usr} | cost_{mig} + cost_{usr}
{gateway, old} | cost_{mig} + cost_{usr} + 2*cost_{com} | - | -
{gateway, new} | cost_{mig} + cost_{usr} + 2*cost_{com} | - | -
{old, new} | cost_{mig} + cost_{usr} + 2*cost_{com} | cost_{mig} + cost_{usr} + 2*cost_{com} | max{cost_{mig}, cost_{usr}} + 2*cost_{com}
∅ (all on different nodes) | cost_{mig} + cost_{usr} + 6*cost_{com} | - | -
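For example, in the {old, new} case, if cost_{mig} = cost_{usr} = c and cost_{com} = r, then the basic scheme (with or without the migration transaction optimization) costs 2c + 2r, whereas the full SLSM costs only c + 2r, since the fusion transaction lets the data read for migration directly serve the user request.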

5 Experiments

5.1 Experimental Setup

We conducted our experiments on three AliCloud server instances, each with 4 vCPUs at 2.70 GHz on Intel(R) Xeon(R) Platinum and 8 GB of RAM.

5.1.1 Implementation.

We implemented a prototype of SLSM on top of CockroachDB 21.1.21. CockroachDB is an open-source distributed NewSQL database engine that follows the shared-nothing architecture. We built a three-node CockroachDB cluster. By default, the average round-trip time between nodes in the cluster is around 11.7 ms (set via the "tc" command under Ubuntu).

5.1.2 Baselines.

We compare the performance using the following approaches:

  • i)

    Upperbound: Vanilla CockroachDB cluster without any schema migration. We use it to show the upper bound.

  • ii)

    OSC: CockroachDB’s own online schema change schemes [24] built upon work originated by the F1 team at Google. It backfills (or deletes) the underlying table data without locking them and thus without any downtime.

  • iii)

    Bullfrog: The state-of-the-art lazy schema migration method [6] for standalone databases, implemented in PostgreSQL 11.0. We port it to CockroachDB for experimental comparison.

  • iv)

    SLSM: Our lazy schema migration solution specifically designed for shared-nothing databases.

5.1.3 Benchmarks.

To evaluate SLSM under realistic OLTP workloads, we follow previous work [6] and use a variation of TPC-C that includes schema migrations. By default, we use a scale factor of 50. We start the benchmark by running the original TPC-C mix, and after a few seconds, we start a schema migration that performs one of the following operations:

  • 1)

    SplitTable: Split the customer table into two: one containing private customer information such as credit, payment, and balance, and the other containing public customer information like state, city, and street. The two new tables after the split have the same primary key as the original customer table (a sketch of this migration is given after this list).

  • 2)

    JoinTable: Join the stock and order_line tables. This optimizes the StockLevel transaction, which reads stock after scanning the order_line table to get out-of-stock items.

  • 3)

    Preaggregate: Sum up the values in order_line table where ol_w_id = o_w_id, ol_d_id = o_d_id and ol_o_id = o_id. The results are maintained as a separate table.
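As an illustration, the SplitTable migration could be expressed with statements of roughly the following form (the new table names and the exact column lists are assumptions; the full TPC-C customer schema contains more columns than shown, and primary key and constraint definitions are omitted):

  CREATE TABLE customer_private AS
    SELECT c_w_id, c_d_id, c_id, c_credit, c_credit_lim, c_balance, c_ytd_payment
    FROM customer;

  CREATE TABLE customer_public AS
    SELECT c_w_id, c_d_id, c_id, c_street_1, c_city, c_state, c_zip
    FROM customer;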

5.1.4 Experimental Platform and Metrics.

We use BenchBase [8] to set up and run our experiments. BenchBase supports tight control of transaction mixtures, request rates, and access distributions over time. We measure throughput as transactions per second (TPS) and end-to-end latency as the time from when the client issues a transaction request until the response is received. The measurements for all of our experiments are averaged over 5 runs; the variance across runs was negligible.

5.2 Performance under Schema Migration

Our first set of experiments focuses on how the throughput and latency of transaction processing vary during the different phases of a schema migration. Once the schema migration request starts, for Bullfrog and SLSM, transactions containing requests on the old schema are immediately replaced by transactions containing requests on the new schema (we have modified them to be compatible); for OSC, the old transactions are still used until the migration is complete. Fig. 8 illustrates our experimental results. The migration begins for all implementations at the red circle and ends for each system at the later corresponding circle marked in the figure.

Figure 8: Throughput and latency during migration

Consider the upper two subplots: OSC takes approximately 60 seconds to complete, and transactions experience significant jitter in both throughput and latency during the migration. Although access to the old schema is not blocked during the migration, data backfilling and double-writing of the old and new schemas have some negative impact on system performance. No noticeable fluctuations in throughput or latency are observed for Bullfrog or SLSM, and their total time to complete the migration is longer because the migration takes place on demand while user transactions are being processed. Compared to the background migration algorithm in Bullfrog, SLSM limits simulated user transactions to tuples that have not yet been migrated, rather than all tuples of the entire table, and therefore has a shorter migration time window. Although SLSM has better latency, the advantage over Bullfrog is not significant here. Since each transaction in the TPC-C benchmark involves requests on many different tables, there may be only two or three requests on the new schema. Therefore, we simplify each type of transaction in TPC-C by evaluating only the latency of the requests on the new schema. As shown in the lower left subfigure, the latency of SLSM improves by about 40% overall compared to Bullfrog, and we illustrate the latency improvement for each specific type of transaction in the lower right subfigure.

5.3 Ablation Experiments

We then evaluate the optimization of migration transactions and user transactions in SLSM (see Section 4.2). For a clear comparison, we start the schema migration directly and stop the background migration process, using TPC-C transactions (evaluating requests on the new schema only). As shown in Fig. 9, we label the basic SLSM, the basic SLSM with migration transaction optimization only, the basic SLSM with user transaction optimization only, and the final SLSM as "Basic", "Basic-with-mig-trans-optim", "Basic-with-user-trans-optim", and "Basic-with-all", respectively. Each of our optimization methods performs as expected: either optimization alone reduces the latency of user transactions relative to the basic SLSM, while combining the two allows SLSM to achieve the best performance.

Figure 9: Latency during migration under different SLSMs

5.4 Impact of Network Round-Trip Time

Now we examine how the network round-trip time (RTT) affects schema migration performance. Network RTT is an important metric for distributed databases, as data are often distributed across multiple nodes or locations, which means that data operations (e.g., reads, writes, and transactions) involve network communication. Our experimental setup is similar to the above, with the addition of measurements for Upperbound and Bullfrog. As shown in Fig. 10, the average RTTs in the experimental results are, from left to right, 1.15 ms, 11.78 ms, and 22.33 ms. We observe that as the RTT grows, the latency of SLSM converges towards the Upperbound. As network communication gradually becomes the bottleneck of transaction execution, SLSM, whose migration transaction and user transaction optimizations both reduce the network communication overhead to a certain extent, exhibits a latency curve closer to the Upperbound than Bullfrog does. At low RTT, the reduction in network communication latency brought by the migration transaction optimization is not significant compared to the extra overhead of the longer keys. However, the user transaction optimization still gives SLSM an advantage.

Figure 10: Latency during migration with different network round-trip time

5.5 Stand-alone and Clustered

Although SLSM is designed for distributed shared-nothing databases, it is also applicable to stand-alone databases. In the final experiment, we analyze SLSM in a single-node cluster; the results are shown in Fig. 11. In addition to Bullfrog and SLSM, we also compare the basic SLSM with user transaction optimization only, since the migration transaction optimization, which is designed specifically for network communication overhead, no longer helps. It turns out that SLSM still has a performance advantage over the current state-of-the-art standalone lazy schema migration scheme, and it performs even better with the migration transaction optimization removed. The migration transaction optimization is thus narrower in applicability than the user transaction optimization: the former only reduces network communication overhead, while the latter, more importantly, effectively reduces the time that user transactions wait for execution. For stand-alone databases, which do not require network communication between nodes, the user transaction optimization in SLSM still pays off.

Figure 11: Latency during migration in a single-node cluster

6 Conclusion

Schema migrations on shared-nothing databases typically last a long time, since they are accompanied by massive data movement, resulting in a service vacuum before the new schema is available. In this paper, we propose SLSM, a lazy migration strategy that performs online schema changes so that the new schema is immediately ready for access, even when the physical data have not yet been migrated to the new schema. SLSM improves the performance of user transactions during migration by decreasing the network communication overhead and the waiting time of user transactions. Experimental results indicate that SLSM can accomplish schema migration on shared-nothing databases with high quality and low impact on user transaction latency. The solution not only works on shared-nothing databases, but is also applicable to stand-alone database systems.

6.0.1 Acknowledgements

This work is supported by the Natural Science Basic Research Program of Shaanxi under Grant No. 2023-JC-QN-0648 and the National Natural Science Foundation of China under Grant No. 62302370.

References

  • [1] Facebook online schema change. https://www.facebook.com/notes/mysql-at-facebook/online-schema-change-for-mysql/430801045932/ (2010)
  • [2] Oak online alter table. https://shlomi-noach.github.io/openarkkit/oak-online-alter-table.html (2010)
  • [3] Large hadron migrator. https://github.com/soundcloud/lhm (2012)
  • [4] Github online schema change. https://github.com/github/gh-ost (2016)
  • [5] Percona online schema change. https://www.percona.com/doc/percona-toolkit/2.2/pt-online-schema-change.html (2016)
  • [6] Bhattacherjee, S., Liao, G., Hicks, M., Abadi, D.J.: Bullfrog: Online schema evolution via lazy evaluation. In: Proceedings of the 2021 International Conference on Management of Data. pp. 194–206 (2021)
  • [7] De Jong, M., van Deursen, A., Cleve, A.: Zero-downtime sql database schema evolution for continuous deployment. In: 2017 IEEE/ACM 39th International Conference on Software Engineering: Software Engineering in Practice Track (ICSE-SEIP). pp. 143–152. IEEE (2017)
  • [8] Difallah, D.E., Pavlo, A., Curino, C., Cudre-Mauroux, P.: Oltp-bench: An extensible testbed for benchmarking relational databases. Proceedings of the VLDB Endowment 7(4), 277–288 (2013)
  • [9] Dumitras, T., Narasimhan, P.: No downtime for data conversions: Rethinking hot upgrades. Parallel Data Laboratory pp. 1–8 (2009)
  • [10] Faleiro, J.M., Abadi, D.J., Hellerstein, J.M.: High performance transactions via early write visibility. Proceedings of the VLDB Endowment 10(5) (2017)
  • [11] Faleiro, J.M., Thomson, A., Abadi, D.J.: Lazy evaluation of transactions in database systems. In: Proceedings of the 2014 ACM SIGMOD international conference on Management of data. pp. 15–26 (2014)
  • [12] Hu, T., Wang, T., Zhou, Q.: Online schema evolution is (almost) free for snapshot databases. arXiv preprint arXiv:2210.03958 (2022)
  • [13] Huang, D., Liu, Q., Cui, Q., Fang, Z., Ma, X., Xu, F., Shen, L., Tang, L., Zhou, Y., Huang, M., et al.: Tidb: a raft-based htap database. Proceedings of the VLDB Endowment 13(12), 3072–3084 (2020)
  • [14] Lamport, L.: Paxos made simple. ACM SIGACT News (Distributed Computing Column) 32, 4 (Whole Number 121, December 2001) pp. 51–58 (2001)
  • [15] Laukkanen, E., Itkonen, J., Lassenius, C.: Problems, causes and solutions when adopting continuous delivery—a systematic literature review. Information and Software Technology 82, 55–79 (2017)
  • [16] Neamtiu, I., Dumitraş, T.: Cloud software upgrades: Challenges and opportunities. In: 2011 International Workshop on the Maintenance and Evolution of Service-Oriented and Cloud-Based Systems. pp. 1–10. IEEE (2011)
  • [17] Ongaro, D., Ousterhout, J.: In search of an understandable consensus algorithm. In: 2014 USENIX annual technical conference (USENIX ATC 14). pp. 305–319 (2014)
  • [18] Qiu, D., Li, B., Su, Z.: An empirical analysis of the co-evolution of schema and code in database applications. In: Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering. pp. 125–135 (2013)
  • [19] Rae, I., Rollins, E., Shute, J., Sodhi, S., Vingralek, R.: Online, asynchronous schema change in f1. Proceedings of the VLDB Endowment 6(11), 1045–1056 (2013)
  • [20] Saur, K., Dumitraş, T., Hicks, M.: Evolving nosql databases without downtime. In: 2016 IEEE International Conference on Software Maintenance and Evolution (ICSME). pp. 166–176. IEEE (2016)
  • [21] Shasha, D., Llirbat, F., Simon, E., Valduriez, P.: Transaction chopping: Algorithms and performance studies. ACM Transactions on Database Systems (TODS) 20(3), 325–363 (1995)
  • [22] Sheng, Y.: Non-blocking lazy schema changes in multi-version database management systems. CMU MS Thesis (2019)
  • [23] Stonebraker, M.: The case for shared nothing. IEEE Database Eng. Bull. 9(1),  4–9 (1986)
  • [24] Taft, R., Sharif, I., Matei, A., VanBenschoten, N., Lewis, J., Grieger, T., Niemi, K., Woods, A., Birzin, A., Poss, R., et al.: Cockroachdb: The resilient geo-distributed sql database. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. pp. 1493–1509 (2020)
  • [25] Zhang, Y., Power, R., Zhou, S., Sovran, Y., Aguilera, M.K., Li, J.: Transaction chains: achieving serializability with low latency in geo-distributed storage systems. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. pp. 276–291 (2013)
  • [26] Zhu, Y.: Towards Automated Online Schema Evolution. University of California, Berkeley (2017)