An Empirical Study of Protocols in Smart Contracts
Abstract.
Smart contracts are programs that are executed on a blockhain. They have been used for applications in voting, decentralized finance, and supply chain management. However, vulnerabilities in smart contracts have been abused by hackers, leading to financial losses. Understanding state machine protocols in smart contracts has been identified as important to catching common bugs, improving documentation, and optimizing smart contracts. We analyze Solidity smart contracts deployed on the Ethereum blockchain and study the prevalence of protocols and protocol-based bugs, as well as opportunities for gas optimizations.
1. Introduction
Smart contracts are general-purpose programs that are run on a blockchain and maintain state stored on a blockchain’s ledger. They have been used in a variety of applications, including voting, decentralized finance, and supply chain management (IBM, 2019; Elsden et al., 2018). However, vulnerabilities in smart contracts can be exploited and have serious consequences, such as the well-known DAO hack in which over $50 million worth of cryptocurrency was stolen (Popper, 2016). Smart contracts on Ethereum cannot be modified once they are deployed, making bugs difficult to fix retroactively. In addition, on Ethereum, the most common platform for smart contracts, users must pay gas in order to run transactions, which is used to reward miners and prevent non-terminating transactions. High gas prices can make interacting with smart contracts prohibitively expensive, limiting their usefulness (Rozen, 2021).
Smart contracts have been observed to often follow protocols that can be represented by state machines (Foundation, 2021). Protocols specify that a contract’s functions must be called in a particular order. A contract may have multiple abstract states that it transitions between, where each abstract state determines which functions can be called in that particular state. For instance, consider a wallet contract whose public functions are shown in Figure 1. Its purpose is to transfer its token balance to a beneficiary after a fixed amount of time. The wallet begins in an initial state, and to initiate the transfer, the owner of the wallet must call lock() to lock the wallet for a set period of time. Only after lock() has been called and the time has elapsed can this state can release() be called to transfer the wallet’s balance to the beneficiary.
The concept of typestate (Strom and Yemini, 1986) allows one to specify the set of valid operations which can be called on a contract, which may change as the contract transitions to different states. New programming languages for smart contracts, including Obsidian (Coblenz et al., 2020) and Flint (Schrans et al., 2018), have proposed that typestate can be used to express protocols more explicitly and statically check adherence to protocols. One benefit of such systems is that they allow developers to specify functions and fields that are only in scope in certain states. Failure to change or reset fields between state transitions can lead to bugs that are difficult to track down. In addition, accessing storage on the blockchain is costly, and using typestate may allow contracts to lower their gas usage by clearing fields or avoiding unnecessary writes to storage.
We conducted an empirical study of Solidity smart contracts to demonstrate that protocols are commonly used in smart contracts and that typestate can help contracts avoid bugs and use less gas. We collected a sample of 100 smart contracts and determined that 37 of them defined protocols. We wrote a static analysis tool, StateOptimizationDetector, to find opportunities to lower the cost of transactions by taking advantage of Ethereum’s gas refund policy. We ran StateOptimizationDetector on a sample of 1,000 contracts and found that 80 (0.8%) of them could use this optimization, with a false positive rate of 28.6%.
We investigated the prevalence of one access control bug, which allows previous owners of a contract to unexpectedly reclaim ownership, to show that the incorrect implementation of protocols can create vulnerabilities. We wrote a static analysis tool, AccessLockDetector, to identify contracts which could have bugs in the implementation of their access control mechanisms. We ran AccessLockDetector on a sample of 4,500 contracts and identified 36 (0.8%) of them via manual examination as vulnerable to this exploit, with a false positive rate of 34.5%. We observed a high level of similarity among the identified vulnerable contracts, suggesting that many of them originated from a common source. We hypothesize that language features that allow protocols to be expressed more directly can help contract writers avoid these bugs or catch them before deployment.
2. Methodology
To inform our understanding of how smart contracts define and use protocols, we first tried to identify contracts with protocols by looking for patterns in contracts’ source code. We built a detector using the Slither static analysis tool (Feist et al., 2019), named ProtocolDetector, that identifies contracts with protocols by searching for reverting execution paths in functions that resulted from reading the contract’s fields. The motivation behind this is that when a function in a protocol-defining contract is called, the contract will often first check that it is in an appropriate abstract state, which is typically represented by one or more fields. Reverting paths that result from reading fields often indicate that the protocol has been violated. Similar patterns have been used to search for evidence of object protocols in Java (Beckman et al., 2011). For example, in Figure 1, the function release will revert if the values of the fields isLocked and isReleased are not true and false, respectively. This indicates that these fields represent the contract’s abstract state, and that this contract defines a protocol.
We incorporated several heuristics specific to Solidity to help the detector find common patterns indicative of protocols. For example, Solidity allows the definition of modifiers, which can be attached to function declarations to insert code that is run before or after the main body of the function. When checking a function for state patterns, the detector will also check the body of any of its modifiers, since they are commonly used to ensure that the contract is in a valid state before executing the function.
2.1. Finding protocol-based optimizations
We then identified a method to lower the gas cost of some transactions using Ethereum’s refund mechanism. When a storage value is reset (set to zero from non-zero), a refund of (up to) 15,000 gas is given back to the sender of the transaction (Wood et al., 2014). Contracts with certain types of protocols may be able to use the refund to lower the cost of some of their transactions. When a contract transitions to a new state, some of its fields may no longer be used for the remaining lifetime of the contract. These fields could therefore be reset during this state transition to take advantage of the refund, lowering the overall gas cost. This optimization encourages the clearing of unused fields, reducing the amount of unneeded state that is kept on the Ethereum blockchain (Moosavi and Clark, 2021). See Figure 2 for an example case of this optimization. Here, the function addToX is “deactivated” by calling stop(). Since the field amountToAdd is only used in the function addToX, after stop() is called, amountToAdd can safely be reset without affecting any observable behavior of the contract. It should be noted that a recent proposal to Ethereum that is included in the London hard fork, EIP-3529 (Buterin and Swende, 2021), reduces the amount of gas that is refunded by clearing storage, making this optimization no longer viable after the proposal is implemented. However, this optimization is still applicable to earlier versions of Ethereum.
To investigate how useful this optimization was, we identified two categories of protocols (described in Section 2) that indicated that this optimization was applicable. Then we built a detector, named StateOptimizationDetector, to find contracts with these protocols. This detector infers the contract’s transition graph between different abstract states by identifying the set of states a contract can be in before and directly after calling a function. The detector uses the state transition graph to identify fields that will never be used after a state transition, meaning that it will never be read from or written to in any function that can be called in the current state or any possible future state.
We manually examined each candidate contract reported by the StateOptimizationDetector. During this process, we checked that each candidate contract had one of the two protocol patterns, and that each identified field could safely be reset without affecting the contract’s expected behavior. One issue we ran into was that some fields are declared public, meaning that a public getter function for the field is automatically generated and added to the contract’s interface. Thus, resetting this field could affect the contract’s clients that access this getter function after the state transition, even if it does not affect the behavior of the contract itself. However, the fact that this field is no longer used may also indicate that this field is conceptually no longer in scope during this state. Whether to count such a field depends on the specifics of the contract, and when manually examining the results, we only counted the fields that we deemed safe to reset. For example, in some contracts, we noted that fields would not be used within the contract, but were simply used to maintain some status that would be useful to other contracts. An example is shown in Figure 3. Such contracts were discarded during the manual review process.
2.2. Finding bugs in access control contracts
Access control mechanisms are used in smart contracts to restrict access to certain administrative functions. An example of an access control contract is the Ownable contract shown in Fig. 4, which defines an onlyOwner modifier that can be attached to functions, which allows only the owner of a contract to call them. Contracts can inherit from the Ownable contract and then use the onlyOwner modifier to restrict access to certain functions.
Access control contracts also commonly have the ability to transfer ownership to another address and renounce ownership (preventing anyone from calling onlyOwner functions). Some access control contracts also have a temporary lock feature, which allows the owner to temporarily relinquish ownership, preventing themselves and anyone else from calling onlyOwner functions for a set period of time. During this time, users of a contract can be assured that onlyOwner functions cannot be called. A contract with both a temporary lock and the ability to renounce ownership can be modeled with a protocol with three abstract states, Locked, Unlocked, and Renounced.
However, we observed that many implementations of this temporary lock feature violate this protocol and create a vulnerability in the expected behavior of renouncing and transferring ownership. As shown in Table 1, after the function unlock is called once, it can continue to be called even after the owner has renounced or transferred ownership, since the value of _previousOwner is never reset. This interaction may affect users of a contract who make decisions based on the assumption that onlyOwner functions cannot be called after a contract is renounced. For instance, a malicious owner of a token contract could renounce ownership to convince prosepective buyers that the price of the token is fixed, then use this exploit to reclaim ownership and raise the price of the token.
_owner | _previousOwner | Contract state | |
---|---|---|---|
Contract is initialized by an address addr. _owner is set to addr. | addr | 0x0 | Unlocked |
Owner calls lock(0). | 0x0 | addr | Locked |
Owner calls unlock(). | addr | addr | Unlocked |
Owner calls renounceOwnership(). | 0x0 | addr | Renounced |
Owner calls unlock(). | addr | addr | Unlocked |
To investigate how commonly this bug occurs, we built a detector, AccessLockDetector, which is designed to identify access control contracts which define temporary lock mechanisms. To find access control contracts, it looks for contracts that have an “owner” address field, which represents the only address that can call restricted functions, and define an onlyOwner modifier, which modifies functions so that they can only be called by the contract’s owner. Then, it looks for a “locking” function in which the owner address is first assigned to another field and then reset. This pattern indicates that owner’s access has been disabled, and that the owner address has been saved to another field so access can be re-enabled later. We observed that contracts that implemented this temporary lock mechanism frequently contain a bug that breaks the expected behavior of renouncing and transferring ownership.
3. Results
3.1. Prevalence of protocol-defining contracts
We ran the ProtocolDetector on a random sample of 100 contracts from the SmartBugs Wild dataset (Durieux et al., 2020), a collection of over 47,398 unique Solidity contracts. ProtocolDetector identified 71 protocols in 37 contracts. We manually reviewed each reported protocol and categorized them based on each protocol’s intended use and method of transitioning between states. We determined that 2 (2.8%) of the protocols found were false positives. However, in both cases, the contract defined other protocols, so in total, 37 of the contracts defined protocols. The categorization of protocols, which is based of a categorization of Java protocols in (Beckman et al., 2011), is shown in Table 2.
Category | Count | Description |
---|---|---|
Toggle | 21 | The owner of a contract can enable/disable functionality at will. |
Activation | 12 | Some functions become eventually enabled. |
Redundant Operation | 12 | Prevents a function from being called twice. |
Deactivation | 10 | Some functions become eventually disabled. |
Threshold | 6 | A function is enabled once a counter reaches a certain value. |
Interval | 3 | A function is enabled during a specified interval of time. |
Lifecycle | 3 | The contract transitions through various states modeling the lifetime of the contract. |
Other | 2 |
3.2. Protocol-based optimizations
From the categories of protocols that we had observed, two of them indicated contracts which could possibly be optimized by resetting fields. Deactivation protocols have two states that are represented by a boolean field and have a set of functions that are initially enabled. Once the boolean value changes, the functions are permanently disabled. Lifecycle protocols typically have a set of states through which the contract monotonically transitions through, eventually ending in some terminal state. Contracts which have either Deactivation or Lifecycle protocols permanently lose functionality as the contract transitions, which may lead to fields which are no longer used and can be safely reset.
We ran the StateOptimizationDetector on a random sample of 1,000 contracts from the SmartBugs Wild dataset. The StateOptimizationDetector reported 112 of these as candidates for optimization, 80 of which we determined were true positives.
Category | Candidates | Optimizable |
---|---|---|
Deactivation | 96 | 73 |
Lifecycle | 16 | 7 |
Total | 112 (11.2%) | 80 (8.0%) |
3.3. Protocol-based access control bugs
We ran the AccessLockDetector on a random sample of 4,500 contracts from the Smart Contract Sanctuary (Ortner and Eskandari, 2021), a collection of over 99,582 smart contracts deployed on the main Ethereum blockchain, of which the AccessLockDetector flagged 55 of these contracts. We used the Smart Contract Sanctuary dataset because it is updated daily, and we observed that the access control bug we were looking for only appeared in contracts that were deployed within the past year. We chose a sample of 4,500 contracts so that all of the contracts flagged by the detector could be reviewed by hand. We then manually examined each candidate contract, checking if it was an access control contract that defined a temporary lock as well as functions for renouncing ownership. Then we checked if its implementation allowed the unlock function to be called when the contract was already unlocked, creating the vulnerability where ownership could be reclaimed after renouncing ownership. After examining the positive results, we determined that 36 of them were access control contracts that defined a temporary lock feature and had this vulnerability.
All of the 36 true positives followed the same structure as in Figure 4 in defining the lock and unlock functions. 34 of these contracts contained a misleading error message in the unlock function (line 53 of Figure 4), and 33 of them contained the same typo in a function name (line 38 of Figure 4). In addition, all of the contracts passed an incorrect argument to the OwnershipTransferred event in the lock function (line 47 of Figure 4; the first argument should be _previousOwner, since _owner has already been reset). Although events are not used by other smart contracts, this could provide incorrect information to applications outside the blockchain that are subscribed to this event.
4. Related Work
Previous studies have employed static analysis and symbolic execution tools to analyze bugs and security vulnerabilities in smart contracts. We built our detectors using Slither (Feist et al., 2019), an extensible static analysis tool which analyses Solidity source code and converts it into a convenient intermediate representation. Oyente (Luu et al., 2016) is a symbolic execution tool that finds common bugs involving issues with transaction-ordering dependence and reentrancy attacks. Smartcheck (Tikhomirov et al., 2018) runs a syntactical analysis on Solidity source code to detect a number of code issues and poor practices.
A high level of code cloning and duplication among smart contracts has been shown through various studies and smart contract datasets. In the SmartBugs Wild dataset (Durieux et al., 2020), only 47,398 out of 972,855 (4.8%) verified contracts (those with source code available) were unique. In a dataset of 22,493 smart contracts used for (Kalra et al., 2018), only 1,524 (6.8%) of contracts were unique. A study on code cloning in smart contracts found that only 20.8% of verified contracts were not a clone of any other verified contract (Kondo et al., 2020).
Gasper (Chen et al., 2017) is a tool that locates gas-costly patterns by analyzing contracts’ bytecodes. It reduces the cost of transactions by eliminating dead code and rewriting expensive loop operations. We are not aware of any tools which optimize gas usage by taking advantage of Ethereum’s gas refund policy.
5. Discussion and Future Work
We have shown that protocols appear relatively frequently in Solidity smart contracts, with 37% of contracts defining protocols. For comparison, it is estimated that 7.2% of types in Java programs define protocols (Beckman et al., 2011). Therefore, we believe the implementation of protocols is an important consideration for the design of languages for smart contracts. Moreover, a number of categories of protocols allow for gas optimizations, where fields can safely be reset to reduce the cost of transactions.
We have demonstrated that approximately 0.8% of deployed Ethereum contracts contain a bug in their access control mechanisms that stems from a failure to accurately model a contract’s protocol. We believe that introducing language features such as typestate could help developers avoid writing these bugs like this in the first place. Analyzing the Ownable contract from Figure 4, we notice that the vulnerability arises from the fact that the unlock function can be called even when the contract is not locked because the value of _previousOwner is never reset. In a language such as Obsidian, one can describe a contract as having multiple abstract states, and functions and fields can be specified as being in scope only in certain states. This allows the expected behavior of the Ownable to be expressed more easily: the Ownable contract has three states, Locked, Unlocked, and Renounced, and the unlock function is only in scope in the Locked state. In addition, one can avoid having to define a _previousOwner field altogether simply by specifying the onlyOwner modifier to only be in scope in the Unlocked state, preventing onlyOwner functions from being called if the contract is in the Locked or Renounced states.
We found that many of the vulnerable contracts flagged by the AccessLockDetector had nearly identical code in their access control functions and contained the same mistakes in function names and error messages. This supports previous results about the high level of code cloning and duplication between smart contracts. We hypothesize that these mistakes indicate that these contracts were copied from a single source, rather than created independently. This suggests that the practice of copying code from other contracts may introduce vulnerabilities and propagate the spread of vulnerable contracts.
In future work, we would like to investigate other ways protocols can be used to reduce gas usage. For instance, if two fields are used in different states, it may be possible to use the same storage for both fields, eliminating the cost of reserving an extra storage spot. We would also be interested in studying protocol-related asset bugs, where a contract’s assets are lost or can no longer be accessed due to a state transition. We would also like to reduce the false positive rates on our StateOptimizationDetector and AccessLockDetector to make them more useful to Solidity developers. We hope this work can inform and motivative future research into the design of languages and type systems for smart contracts.
References
- (1)
- Beckman et al. (2011) Nels E. Beckman, Duri Kim, and Jonathan Aldrich. 2011. An Empirical Study of Object Protocols in the Wild. In ECOOP 2011 – Object-Oriented Programming, Mira Mezini (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 2–26.
- Buterin and Swende (2021) Vitalik Buterin and Martin Swende. 2021. EIP-3529: Reduction in refunds. https://eips.ethereum.org/EIPS/eip-3529
- Chen et al. (2017) Ting Chen, Xiaoqi Li, Xiapu Luo, and Xiaosong Zhang. 2017. Under-optimized smart contracts devour your money. In 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 442–446.
- Coblenz et al. (2020) Michael Coblenz, Reed Oei, Tyler Etzel, Paulette Koronkevich, Miles Baker, Yannick Bloem, Brad A. Myers, Joshua Sunshine, and Jonathan Aldrich. 2020. Obsidian: Typestate and Assets for Safer Blockchain Programming. ACM Trans. Program. Lang. Syst. 42, 3, Article 14 (Nov. 2020), 82 pages. https://doi.org/10.1145/3417516
- Durieux et al. (2020) Thomas Durieux, João F. Ferreira, Rui Abreu, and Pedro Cruz. 2020. Empirical Review of Automated Analysis Tools on 47,587 Ethereum Smart Contracts. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering (Seoul, South Korea) (ICSE ’20). Association for Computing Machinery, New York, NY, USA, 530–541. https://doi.org/10.1145/3377811.3380364
- Elsden et al. (2018) Chris Elsden, Arthi Manohar, Jo Briggs, Mike Harding, Chris Speed, and John Vines. 2018. Making Sense of Blockchain Applications: A Typology for HCI. Association for Computing Machinery, New York, NY, USA, 1–14. https://doi.org/10.1145/3173574.3174032
- Feist et al. (2019) Josselin Feist, Gustavo Grieco, and Alex Groce. 2019. Slither: A Static Analysis Framework for Smart Contracts. 2019 IEEE/ACM 2nd International Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB) (May 2019). https://doi.org/10.1109/wetseb.2019.00008
- Foundation (2021) Ethereum Foundation. 2021. Common Patterns. https://docs.soliditylang.org/en/v0.8.6/common-patterns.html#state-machine
- IBM (2019) IBM. 2019. Blockchain for Supply Chain - IBM Blockchain. https://www.ibm.com/blockchain/industries/supply-chain
- Kalra et al. (2018) Sukrit Kalra, Seep Goel, Mohan Dhawan, and Subodh Sharma. 2018. Zeus: Analyzing safety of smart contracts.. In NDSS. 1–12.
- Kondo et al. (2020) Masanari Kondo, Gustavo Oliva, Zhen Jiang, Ahmed E. Hassan, and Osamu Mizuno. 2020. Code Cloning in Smart Contracts: A Case Study on Verified Contracts from the Ethereum Blockchain Platform. Empirical Software Engineering 25 (11 2020). https://doi.org/10.1007/s10664-020-09852-5
- Luu et al. (2016) Loi Luu, Duc-Hiep Chu, Hrishi Olickel, Prateek Saxena, and Aquinas Hobor. 2016. Making Smart Contracts Smarter. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (Vienna, Austria) (CCS ’16). Association for Computing Machinery, New York, NY, USA, 254–269. https://doi.org/10.1145/2976749.2978309
- Moosavi and Clark (2021) Mahsa Moosavi and Jeremy Clark. 2021. Lissy: Experimenting with on-chain order books. arXiv:2101.06291 [cs.CR]
- Ortner and Eskandari (2021) Martin Ortner and Shayan Eskandari. 2021. Smart Contract Sanctuary. (July 2021). https://github.com/tintinweb/smart-contract-sanctuary
- Popper (2016) Nathaniel Popper. 2016. A Hacking of More Than $50 Million Dashes Hopes in the World of Virtual Currency. https://www.nytimes.com/2016/06/18/business/dealbook/hacker-may-have-removed-more-than-50-million-from-experimental-cybercurrency-project.html
- Rozen (2021) Jacob Rozen. 2021. The ridiculously high cost of Gas on Ethereum. https://coingeek.com/the-ridiculously-high-cost-of-gas-on-ethereum/
- Schrans et al. (2018) Franklin Schrans, Susan Eisenbach, and Sophia Drossopoulou. 2018. Writing Safe Smart Contracts in Flint. In Conference Companion of the 2nd International Conference on Art, Science, and Engineering of Programming (Nice, France) (Programming’18 Companion). Association for Computing Machinery, New York, NY, USA, 218–219. https://doi.org/10.1145/3191697.3213790
- Strom and Yemini (1986) Robert E. Strom and Shaula Yemini. 1986. Typestate: A programming language concept for enhancing software reliability. IEEE Transactions on Software Engineering SE-12, 1 (1986), 157–171. https://doi.org/10.1109/TSE.1986.6312929
- Tikhomirov et al. (2018) Sergei Tikhomirov, Ekaterina Voskresenskaya, Ivan Ivanitskiy, Ramil Takhaviev, Evgeny Marchenko, and Yaroslav Alexandrov. 2018. SmartCheck: Static Analysis of Ethereum Smart Contracts. In 2018 IEEE/ACM 1st International Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB). 9–16.
- Wood et al. (2014) Gavin Wood et al. 2014. Ethereum: A secure decentralised generalised transaction ledger. Ethereum project yellow paper 151, 2014 (2014), 1–32.