Abstract

The security of blockchain smart contracts is an emerging issue of great interest to researchers. This article presents Lolisa, an intermediate specification language for the formal verification of Ethereum-based smart contracts in Coq. The formal syntax and semantics of Lolisa cover a large subset of the Solidity programming language developed for the Ethereum blockchain platform. To enhance type safety, the formal syntax of Lolisa adopts a stronger static type system than Solidity. In addition, Lolisa includes a large subset of Solidity syntax components as well as general-purpose programming language features. Therefore, Solidity programs can be directly translated into Lolisa with line-by-line correspondence. Lolisa is inherently generalizable and can be extended to express other programming languages. Finally, the syntax and semantics of Lolisa have been encapsulated as an interpreter in the Coq proof assistant. Hence, smart contracts written in Lolisa can be symbolically executed and verified in Coq.

1. Introduction

The blockchain platform [1] is an emerging technology developed to address a wide range of disparate problems, such as those associated with cryptocurrency [2] and distributed storage [3]. This technology has recently gained interest from the finance sector [4]. Ethereum is one of the most widely adopted blockchain systems. One of its most important features is that it implements a very flexible, general-purpose, Turing-complete programming language denoted as Solidity [5]. This allows for the development of arbitrary applications and scripts that can be executed in a virtual runtime environment denoted as the Ethereum Virtual Machine (EVM) to conduct blockchain transactions automatically. These applications and scripts (i.e., programs) are collectively denoted as smart contracts, which have been widely used in many critical fields, such as the medical [6] and financial fields. The growing use of smart contracts has led to increased scrutiny of their security. Smart contracts can contain bugs that make them susceptible to deliberate attacks resulting in direct economic loss. Some of the largest attacks on smart contracts are well known, such as the attacks on the decentralized autonomous organization (DAO) and Parity wallet [7] contracts. In fact, many classes of subtle bugs, ranging from transaction-ordering dependencies to mishandled exceptions, exist in smart contracts [8].

The present article builds upon our past work by defining the formal syntax and operational semantics for a large subset of Solidity version 0.4. This subset is denoted herein as Lolisa and has the following features.

Consistency. Lolisa formalizes most of the types, operators, and mechanisms of Solidity according to the Solidity documentation. As such, programs written in Solidity can be translated into Lolisa, and vice versa, with a line-by-line correspondence without rebuilding or abstracting, which are operations that can negatively impact consistency.

Static Type System. The formal syntax in Lolisa is defined using generalized algebraic datatypes (GADTs) [9], which impart static type annotation to all the values and expressions of Lolisa. In this way, Lolisa has a stronger static type system than Solidity for checking the construction of programs.

Executable and Provable. In contrast to similar efforts focused on building formal syntax and semantics for high-level programming languages, the formal semantics of Lolisa are defined based on the GERM framework in conjunction with EVI. Therefore, it is theoretically possible for Ethereum-based smart contracts written in Lolisa to be symbolically executed, and to have their properties verified automatically at the same time, directly in higher-order logic theorem-proving assistants when used in conjunction with a formal interpreter developed on the GERM framework.

Mechanized and Validated. The syntax and semantics of Lolisa are mechanized using the Coq proof assistant [10]. We also develop a formally verified interpreter in Coq to validate whether Lolisa satisfies the above Executable and Provable feature and the meta-properties of the semantics. The details of the implementation of our formal interpreter are presented in another paper [11].

The remainder of this paper is structured as follows. Section 2 introduces related work on programming language formalization. Section 3 introduces the overall structure of the specification language framework and provides predefinitions of Lolisa syntax and semantics. Section 4 elaborates on the formal abstract syntax of Lolisa and compares this with the formal abstract syntax of Solidity. Section 5 presents the formal dynamic semantics of Lolisa, including the program execution semantics and the formal standard library for the built-in data structures and functions of EVM. Section 6 describes the integration of the Lolisa programming language and its semantics within the formally verified interpreter FEther. Section 7 discusses the contributions and limitations of our current work. Finally, Section 8 presents the conclusions of our work.

2. Related Work

Software engineering techniques employing such static and dynamic analysis tools as Manticore [12] and Mythril [13] have not yet been proven to be effective at increasing the reliability of smart contracts.

KEVM [14] is a formal semantics for the EVM written using the K-framework, similar to the formalization conducted in Lem [15]. KEVM is executable and can therefore run the validation test suite provided by the Ethereum Foundation. Symbolic reasoning about KEVM programs involves specifying properties in Reachability Logic and verifying them with a separate analysis tool. While these represent currently available mechanized formalizations of operational semantics, axiomatic semantics, and formal low-level program verification tools for EVM and Solidity bytecode [16], they are not well suited to high-level programming languages such as Solidity. In response, the Ethereum community has placed open calls for formal verification proposals [17] as part of a concerted effort to develop formal verification strategies [18].

Fuzz testing is an efficient and effective testing technique, and numerous projects, such as ReGuard [19], now apply fuzzing to smart contracts to analyze vulnerabilities. Securify [20] is a security analyzer for Ethereum-based smart contracts based on static analysis. It verifies the behavior of target smart contracts against given security properties at the EVM bytecode level. Securify provides a domain-specific language in which security properties can be written according to attack reports and basic practices. MadMax [21] is a static program analysis framework that takes Ethereum bytecode as its input and automatically analyzes common vulnerabilities such as integer and memory overflows. It is also the first tool that allows loop specifications to be defined by a dynamic property, which lets it avoid loop explosion during the verification process. Similar to OYENTE, EthIR [22] is a rule-based static analyzer for the bytecode of Ethereum smart contracts. This tool can produce control flow graphs that include all possible execution addresses.

VeriSolid [23] is a formal verification framework that can be accessed directly through the web. Its foundational concept is FSolidM [24]. In brief, VeriSolid provides an approach for semiautomatically developing correct formal specifications of smart contracts. Abdellatif and Brousmiche [25] present a new approach that models the execution behaviors of target smart contracts in a formal model-checking language. This technique can be applied to verify the execution behavior and authority of target smart contracts by using model-checking methods.

In other fields of computer science, a number of interesting studies have focused on developing mechanized formalizations of operational semantics for different high-level programming languages. The Park project [26] presents completely formalized denotational semantics and the corresponding syntax for the JavaScript language. The CompCert project [27] is another influential verification work for C and GCC that developed a formal semantics for a subset of C denoted as Clight. This work formed the basis for VST [28] and CompCertX [29], and a number of interesting formal verification studies have been conducted for operating systems based on the CompCert project. The operational semantics of JavaScript have also been investigated [30], which is of particular importance to the present study because Solidity is a JavaScript-like programming language. However, few of the frameworks defined in these related works can be symbolically executed or analyzed in higher-order logic theorem-proving assistants directly.

3. Foundational Concepts

The overall architecture of Lolisa is shown in Figure 1. Table 1 summarizes the helper functions used in the dynamic semantic definitions. Table 2 lists the state functions used to calculate commonly needed values from the current state of the program. All of these state functions will be encountered in the following discussion. Components of specific states will be denoted using the appropriate Greek letter subscripted by the state of interest. As shown in Table 2, the context of the formal memory space is denoted as M, where σ is employed to denote a specific memory state; the context of the execution environment is represented as ε; and we assign Λ to denote a set of memory addresses, where the meta-variable α is employed to represent an arbitrary address. Similarly, we define the function return address Λfun. In addition, struct is an important data structure in Lolisa. Therefore, we adopt Σ to represent the Lolisa struct information context, and Θ is employed to represent the set of pointers of the struct types. Also, the following typing assignments may include variables, so our types will include references to variable-typing contexts, which we will denote as Γ, Γ1, etc. Such contexts are finite mappings from variable names to types. Because programs may also contain references to the declared functions of a Solidity program, another mapping is needed from function identifiers to types. This mapping will be succinctly denoted as Φ, Φ1, etc. Furthermore, we assign Ω as the native value set of the basic logic system. For brevity in the following discussion, we will assign to represent the overall formal system combination of Σ, Γ, Θ, Ω, Φ, and Λ. Owing to space limitations, the details of Lolisa’s formalization are presented in our online report (https://arxiv.org/abs/1803.09885).

4. Formal Syntax of Lolisa

4.1. Types

The formal abstract syntax of Lolisa types is given in Figure 2. Supported types include arithmetic types (integers of various sizes and signedness), byte types, array types, mapping types, as well as function types and struct types. Although Solidity is a JavaScript-like language, it supports pointer references. Therefore, Lolisa also includes pointer types (including pointers to functions) based on label address specification. Furthermore, these type annotations and the relevant components can be easily formalized by inductive enumeration in Coq or other higher-order logic theorem-proving assistants. Lolisa does not support any of the type qualifiers such as const, volatile, and restrict, and these qualifiers are simply erased during parsing.

The types fill two roles in Lolisa. Firstly, they serve as type declarations of identifiers in statements and, secondly, they serve as signatures to specify the GADTs-style constructor of values and expressions for transmitting type information, which will be explained in the following sections. In Coq formalization, the term is declared as type according to rule 1, as follows:

Note that many types are defined in Figure 2 as parameterized types recursively. In this way, a specific type is dependent on the specified parameters and can abstract and express many different Solidity types.

One of the most important data types of Solidity is the mapping type. In the Solidity documentation [4], mapping types are declared as mapping (_KeyType => _ValueType). Here, _KeyType can be nearly any type except for a mapping, a dynamically sized array, a contract, and a struct. As shown in Figure 2, the mapping type is defined as Tmap (, ), where represents the _KeyType and represents the _ValueType. The best way to keep the terms in Lolisa well-typed and to ensure type safety is to maintain type isolation rather than adding corollary conditions. Therefore, we define a coordinate type for the _KeyType employed in mappings. In particular, the address types in Lolisa are treated as a special struct type, so that _KeyType is allowed to be a struct type in Lolisa. In the Coq formalization, shares the same constructor with that of type except for Tmap, and a term with type is recorded as according to rule 2, as follows:

In Solidity, array types, which are defined according to an array index as Tarray (, ) in Coq, can be classified as fixed-size arrays and dynamic-size arrays. For fixed-size arrays, the size and index number are allowed to be declared by different data structures including constants, variables, struct, mapping, and field access values. These are respectively formalized as Array Index in Figure 2. Because the size of array types in Solidity can be dynamic, the dynamic-size array type in Lolisa is treated as a special mapping type of (Iint Signed I64).

As shown in Figure 3, n-dimensional mapping types, as well as array types, are widely defined in smart contracts. Due to the recursive inductive definition, Lolisa can express n-dimensional array types and n-dimensional mapping types easily, which is illustrated below by rules 3 and 4, respectively:

We classify and into normal form types and nonnormal form types. The normal form types refer to types whose typing rules disallow recursive definition, whereas recursive definition is allowed for nonnormal form types. For example, the normal form of should be . In Figure 2, the normal types are defined separately as Normal type.
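
The recursive, parameterized style of type definition described in this section can be sketched in Coq as follows. This is a minimal illustration under stated assumptions, not the definition in Figure 2: the constructor set is reduced, the array length is a plain nat rather than an array index term, and the names key_type, Ibool, Iaddress, Tdynarray, uint256, map2, grid, and normalize are introduced here for illustration. The function normalize encodes one assumed reading of "normal form", namely stripping array and mapping layers.

```coq
(* Hypothetical, reduced sketch of Lolisa-style types; not the paper's Figure 2. *)
Inductive signedness : Set := Signed | Unsigned.
Inductive intsize : Set := I8 | I64 | I256.

(* Key types are isolated in their own inductive type, so a mapping
   can never be used as the key of another mapping. *)
Inductive key_type : Set :=
  | Ibool : key_type
  | Iint : signedness -> intsize -> key_type
  | Iaddress : key_type.

Inductive type : Set :=
  | Tbool : type
  | Tint : signedness -> intsize -> type
  | Tarray : nat -> type -> type          (* assumed: length, element type *)
  | Tmap : key_type -> type -> type.      (* key type, value type *)

(* A dynamic-size array viewed as a mapping indexed by a signed 64-bit integer. *)
Definition Tdynarray (elem : type) : type := Tmap (Iint Signed I64) elem.

(* Nesting yields n-dimensional mappings and arrays. *)
Definition uint256 : type := Tint Unsigned I256.
Definition map2 : type := Tmap Iaddress (Tmap Iaddress uint256).
Definition grid : type := Tarray 3 (Tarray 4 Tbool).

(* Assumed reading of "normal form": strip array and mapping layers. *)
Fixpoint normalize (t : type) : type :=
  match t with
  | Tarray _ elem => normalize elem
  | Tmap _ val => normalize val
  | _ => t
  end.

Example map2_normal : normalize map2 = uint256.
Proof. reflexivity. Qed.
```

Because key_type is a separate inductive type, a term such as Tmap (Tmap ...) ... is rejected by Coq's type checker without any side condition, which is the kind of type isolation the text describes.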

4.2. Expressions

Having formally specified all the possible forms of values that may be declared and manipulated in Solidity programs, we now discuss the expressions used in programs to encapsulate values. As introduced in Section 4.1, all expressions and their subexpressions are defined with GADTs, which are annotated by two types of signatures according to rule 5, as follows:

Here, refers to the current expression type and refers to the normal form type after evaluation. For instance, we would define the type of an integer variable expression e as . In this way, the formal syntax of expressions becomes clearer and more abstract, and the type safety of Lolisa expressions can be maintained strictly. In addition, employing the combination of the two types of annotations facilitates the definition of a very large number of different expressions based on equivalent constructors. Of course, the use of and may be subject to different limitations depending on the situation.

Constant expressions are used to denote the native values of the basic formal system, which are transformed from the respective Lolisa values. Therefore, and should satisfy rule 6 given below:

To satisfy the limitation TYPE-FORM, the array types and mapping types should be analyzed and simplified according to the type definitions given by Figure 2 into , which can be formulated as . We denote this process as .

In addition, as mentioned previously, the type information of the value level is successfully transmitted into a constant expression. For example, a value has type , and the constant expression Econst has type . Therefore, τ in Econst (v) is determined by . For example, has type , where τ is specified by the Tbool of . The type information of the expression level can also be transmitted to the statement level in the same way, which will be described specifically in the next section.
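
The way type information flows from values into expressions can be sketched with indexed inductive types in Coq. The following is a reduced illustration, not the Lolisa definition: only Econst and Tbool are names taken from the text, while Tint, value, Vbool, Vint, binop, Oadd, Olt, Oand, Ebop, e_true, and e_cmp are assumptions made for this example, and both type signatures coincide here because the sketch has no array or mapping types.

```coq
(* Hypothetical, reduced sketch; most constructor names are assumptions. *)
Inductive type : Set := Tbool | Tint.

(* Values carry their type in the index. *)
Inductive value : type -> Set :=
  | Vbool : bool -> value Tbool
  | Vint : nat -> value Tint.

(* An operator records its operand type and its result type. *)
Inductive binop : type -> type -> Set :=
  | Oadd : binop Tint Tint
  | Olt : binop Tint Tbool
  | Oand : binop Tbool Tbool.

(* expr t0 t1: t0 is the current expression type, t1 the normal form type. *)
Inductive expr : type -> type -> Set :=
  | Econst : forall t, value t -> expr t t
  | Ebop : forall t r, binop t r -> expr t t -> expr t t -> expr r r.

Arguments Econst {t} _.
Arguments Ebop {t r} _ _ _.

(* The index of the value fixes the index of the constant expression. *)
Definition e_true : expr Tbool Tbool := Econst (Vbool true).
Definition e_cmp : expr Tbool Tbool :=
  Ebop Olt (Econst (Vint 1)) (Econst (Vint 2)).

(* An ill-typed term is rejected at construction time. *)
Fail Check (Ebop Oand (Econst (Vint 1)) (Econst (Vbool true))).
```

The Fail command succeeds precisely because the enclosed term does not type-check, so no separate well-typedness predicate over expressions is needed in this sketch.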

For operator expressions, Lolisa supports nearly all binary and unary operators and we adopt to simplify the formal abstract syntax. In Coq formalization, binary and unary operators are abstracted as an inductive type op that is also defined by GADTs, and specific operators serve as their constructors. In this way, operator expressions are made more clear and concise, and can be extended more easily than when employing a weaker static-type system. The binary and unary operators are annotated by two type signatures, as respectively given in rule 7, as follows:

4.3. Statements

Figure 4 defines the syntax of Lolisa statements. Here, nearly all the structured control statements of Solidity (i.e., conditional statements, loops, structure declarations, modifier definitions, contracts, returns, multivalue returns, and function calls) are supported, but Lolisa does not support unstructured statements such as goto and unstructured switches like the infamous “Duff’s device”. In addition, anonymous functions are forbidden in Lolisa because all functions must have a binding identifier to ensure that they are well formed. As previously discussed, the assignment of a right-value (r-value) to a left-value (l-value) , and modifier declarations, as well as function calls and structure declarations are treated as statements. Statements are also classified into normal form and nonnormal form categories, where the normal form statement, given as , represents a statement that halts after being evaluated. Although Solidity is a Turing-complete language, smart contract programs written in Solidity always halt in practice because program execution is limited by gas, which we have defined in ε for Lolisa.

As defined in Figure 4, we still inductively classify statement definitions into a normal form , whose typing assignments must be conducted without recursive definition, and non-normal form statements. The normal form statements of Lolisa are defined as . The remaining statements are nonnormal form statements.

4.4. Macro Definition of Formal Abstract Syntax

The Lolisa formal syntax is too complex to be adopted by general users. Lolisa syntax includes the same components as those employed in Solidity; however, it has stricter formal typing rules. Therefore, Lolisa syntax must include some additional components not supported in Solidity, such as type annotations and a monad-type option. Moreover, Lolisa syntax is formally defined in the Coq formalization as inductive predicates. Thus, Lolisa code looks much more complicated than the corresponding Solidity code, even though the two correspond line by line. An example of this difficulty is illustrated in the code segments shown in Figures 5 and 6. The formal Lolisa version of the conditional statement in the pledge function in Figure 6 is much more complicated than that in the original Solidity version in Figure 5.

This degree of complexity makes it challenging for general users to write Lolisa code manually and to develop a translator between Lolisa and Solidity or another language. This is a common issue in nearly all similar higher-level language formalization studies.

Fortunately, Coq and other higher-order theorem-proving assistants provide a special macro-mechanism. In Coq, this mechanism is referred to as the notation mechanism. Here, a notation is a symbolic abbreviation denoting a term or term pattern automatically parsed by Coq. For example, the symbols in Lolisa can be encapsulated as shown in Figure 7.

The new formal version of this example yields the notation in Figure 8, which demonstrates that the notation is nearly equivalent to the original Solidity syntax.
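
To give a sense of how Coq notations can hide constructor syntax, the following sketch declares two notations over a toy statement type. It is not the content of Figures 7 and 8: the types expr and stmt, their constructors, the concrete notation symbols (IF ... THEN ... ELSE ... FI and ::=), and the chosen precedence levels are all assumptions made for illustration.

```coq
(* Hypothetical toy syntax; names and notation symbols are assumptions. *)
Inductive expr : Set :=
  | Evar : nat -> expr                 (* variable, identified by a number *)
  | Econst : nat -> expr
  | Elt : expr -> expr -> expr.        (* less-than comparison *)

Inductive stmt : Set :=
  | Sskip : stmt
  | Sthrow : stmt
  | Sassign : nat -> expr -> stmt
  | Sif : expr -> stmt -> stmt -> stmt.

(* Notations are abbreviations: Coq parses them back into constructors. *)
Notation "x '::=' e" := (Sassign x e) (at level 65).
Notation "'IF' c 'THEN' s1 'ELSE' s2 'FI'" :=
  (Sif c s1 s2) (at level 80, right associativity).

(* The sugared statement reads close to source-level code. *)
Definition guard : stmt :=
  IF Elt (Evar 0) (Econst 10) THEN Sthrow ELSE 1 ::= Econst 0 FI.

(* It is definitionally the same term as the raw constructor form. *)
Example guard_unfolds :
  guard = Sif (Elt (Evar 0) (Econst 10)) Sthrow (Sassign 1 (Econst 0)).
Proof. reflexivity. Qed.
```

Since a notation is only a parsing and printing rule, proofs about the sugared program are proofs about the underlying abstract syntax tree, so the simpler surface syntax costs nothing in verification.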

Through this mechanism, we can hide the fixed formal syntax components used in verification and thereby provide users with a simpler syntax. Moreover, this mechanism makes the equivalence between real-world languages and Lolisa far more intuitive and user friendly. In addition, this mechanism improves verification automation. Similar to the conversion shown in Figures 5–8, we develop a translator, consisting of a lexical analyzer and a parser, to automatically convert a Solidity program into the macro definitions of the Lolisa abstract syntax tree. The translation process is given in Figure 9. The textual scripts of Ethereum smart contracts are analyzed by the lexical analyzer of the translator, which generates the Solidity token stream. According to the syntactic sugar of Lolisa, the lexical analyzer then generates the respective Lolisa token stream. Next, the parser takes the Solidity token stream as input and generates the parse tree of the smart contracts. Finally, the tokens of the parse tree are replaced by the Lolisa token stream, and the parser rebuilds the Lolisa parse tree and outputs the respective formal smart contracts rewritten in Lolisa. In this manner, the translation process can be guaranteed to be completed mechanically.

5. Formal Semantics

5.1. Evaluation of Expressions

The semantics of expression evaluation are the rules governing the evaluation of Lolisa expressions into the memory address values of the GERM framework, and this process includes two parts: the l-value position evaluation and the r-value position evaluation. In contrast, modifier expressions are a special case that cannot be evaluated according to these expression evaluation semantics, but their evaluation is conducted according to rule 8:

Here, represents the process of evaluating a modifier expression both in the l-value position and the r-value position. Example semantics are summarized in Figure 10.

5.1.1. Evaluating Expressions in the L-Value Position

In the following, we assign to denote the evaluation of expressions in the l-value position to yield respective memory addresses. First, most expressions constructed by Econst obviously cannot be employed as the l-value because most of these represent a Lolisa constant value at the expression level directly. For brevity, we assign to denote the recursive processes of array and map employed for searching the indexed addresses. Note that struct and field expressions are forbidden in the l-value position to ensure that Lolisa is well-formed and well-behaved. The only means allowed in Lolisa of altering the fields of a structure is to use Estruct, either to change all fields or to declare a new field. Although this limitation may not be friendly for programmers or verifiers, it avoids potential risks.

In the previous section, we defined the semantics of array values. Accordingly, we can define the address searching process based on the semantics of arrays as rule 9, which takes name, , , and as parameters. Similarly, we can define rule 10 below for mapping values:

5.1.2. Evaluating Expressions in the R-Value Position

In the following, we assign to denote the evaluation of expressions in the r-value position to yield the respective memory addresses.

As shown in Figure 10, the rules EVAL-REXP-CONS define the evaluation of constant expressions. Here, we note that, because constant expressions store Lolisa values directly, the results can be obtained by applying directly. At the expression level, the r-value position may be specified with a struct type. This is also the only means of initializing or changing the value of a struct-type term. The rules EVAL-REXP-STR define this process. Here, if the evaluation of Estruct fails, the process of evaluating a member’s value yields an error message. Otherwise, the member’s value set is obtained and the respective struct memory value is returned. Finally, the semantics of binary and unary operations are defined according to the rules EVAL-REXP-BOP and EVAL-REXP-UOP.

Due to the static type limitations in the formal abstract syntax definition based on GADTs, the expressions, subexpressions, and operations are all guaranteed to be well-formed, and the type dependence relations need not be checked using, e.g., informal assistant functions, as required by other formal semantics such as Clight. The functions and take the results of expression evaluations and required operations as arguments, and combine them together to generate new memory values.
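
The claim that typed syntax removes the need for runtime type checks can be illustrated with a small evaluator in Coq. The sketch below is an assumption-laden simplification: it evaluates into native Coq values rather than GERM memory values, and the names denote, eval_binop, eval, and all constructors other than Econst are hypothetical.

```coq
(* Hypothetical, reduced sketch; evaluates to Coq values, not memory values. *)
Inductive type : Set := Tbool | Tint.

(* Interpretation of each object-level type as a Coq type. *)
Definition denote (t : type) : Set :=
  match t with
  | Tbool => bool
  | Tint => nat
  end.

Inductive binop : type -> type -> Set :=
  | Oadd : binop Tint Tint
  | Olt : binop Tint Tbool
  | Oand : binop Tbool Tbool.

Inductive expr : type -> type -> Set :=
  | Econst : forall t, denote t -> expr t t
  | Ebop : forall t r, binop t r -> expr t t -> expr t t -> expr r r.

Arguments Ebop {t r} _ _ _.

(* Each operator maps directly to a Coq function of the matching type. *)
Definition eval_binop {t r} (op : binop t r)
  : denote t -> denote t -> denote r :=
  match op with
  | Oadd => Nat.add
  | Olt => Nat.ltb
  | Oand => andb
  end.

(* No branch handles a type mismatch, because none can be constructed. *)
Fixpoint eval {t r} (e : expr t r) : denote r :=
  match e with
  | Econst _ v => v
  | Ebop op a b => eval_binop op (eval a) (eval b)
  end.

Example eval_cmp :
  eval (Ebop Olt (Econst Tint 1)
                 (Ebop Oadd (Econst Tint 1) (Econst Tint 2))) = true.
Proof. reflexivity. Qed.
```

The evaluator is a total function with no error case for ill-typed operands, which mirrors the observation above that type dependence relations need not be rechecked during evaluation.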

5.2. Evaluation of Statements

In the following, we assign to denote the evaluation process of statements, and parts of the necessary operational semantics are summarized in Figure 11. Most evaluations employ the helper functions and . The helper function takes the current environment env and the super-environment fenv as arguments, and checks conditions such as gas limitations and the congruence of execution levels. Contract declarations are one of the most important statements of Solidity. In Lolisa, contract declaration involves two operations. First, the consistency of inheritance information is checked using the helper function , which takes the inheritance relations in module context and the source code as arguments. Second, the initial contract information, including all member identifiers, is written into a designated memory block. As defined in Figure 11, the formal semantics of contract declaration are defined as EVAL-STT-CON below.

According to rule EVAL-STT-STRUCT, the address is the new struct-type identifier, and the struct-type information is written into the respective memory block directly.

In Lolisa, a function call statement is used to apply the function body indexed by the call statement. The process of applying an indexed function is defined by the rules EVAL-STT-FUN-CALL below.

Modifier declarations are a special kind of function declaration that requires three steps and includes a single limitation. The parameter values are set by the predicate. As defined by the rule EVAL-STT-MODI in Figure 11, the first step (denoted as ) initializes and sets the parameters. The second step (denoted as ) stores the modifier body into the respective memory block. The third step (denoted as ) attempts to initialize the return address Λfun. Due to the multiple return values, takes a return type list as an argument. In particular, the modifier body can only yield an initial memory state, and therefore cannot change memory states. The difference between modifier semantics and function semantics is that function semantics include checking the modifier limitations restricting the function. Specifically, taking EVAL-STT-FUN as an example, before a function is invoked, the modifier restricting the function is executed. If the result of a modifier evaluation is , the limitation check of the modifier has failed and the function invocation is thrown out. Otherwise, the function is executed.

5.3. Development of Standard Library and Evaluation of Programs

As discussed previously, we have developed a small standard library in Lolisa that incorporates the built-in data structures and functions of EVM to facilitate execution and verification of Solidity programs rewritten in Lolisa using higher-order logic theorem-proving assistants. Here, we discuss the standard library in detail. Then, based on the syntax, semantics, and standard library formalization, we define the semantics governing the evaluation (i.e., execution) of programs written in Lolisa.

5.3.1. Development of the Standard Library

Note that we assume the built-in data structures and functions of EVM are correct. This is reasonable because, first, the present focus is on verification of high-level smart contract applications rather than the correctness of EVM. Second, Lolisa is sufficiently powerful to implement any data structure or function employed by EVM. Thus, we only need to implement the logic of these built-in EVM features using Lolisa based on the Solidity documentation [4] to ensure that these features are well formed. For example, an address is a special compound type in Solidity that has the balance, send, and call members. However, we can treat an address as a special struct type in Lolisa and define it using the Lolisa syntax, as shown in Figure 12. All other built-in data structures and functions of EVM are defined in a similar manner. Typically, requires is a special standard function that does not need a special address and, according to the Solidity documentation, is defined in Lolisa as rule 11:

Next, we pack these data structures and functions together as a standard library in Lolisa, which is executed prior to executing user programs. Thus, all built-in functions and data structures of EVM can be formalized in Lolisa, which allows the low-level behavior of EVM to be effectively simulated rather than building a formal EVM. Currently, this standard library is a small subset that only includes msg, address, block, send, call, and requires.

5.3.2. Program Evaluation

The semantics governing the execution of a Lolisa program (denoted as ) is defined by rules 12 and 13, where ∞ refers to infinite execution and T represents the set of termination conditions for finite execution.

These rules represent two conditions of execution. Under the first condition governed by rule 12, terminates after a finite number of steps owing to a returned stop, exit, or error. Under the second condition governed by rule 13, cannot terminate via its internal logic and would undergo an infinite number of steps. Therefore, is deliberately stopped via the gas limitation checking mechanism. Here, represents a list of optional arguments. In addition, as discussed in Section 5.1, the initial environment and super-environment are equivalent, except for their gas values, which are initialized by the helper function , and the initial gas value of is set by . Finally, the initial memory state is set by , considering and the standard library as arguments.
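
The distinction between termination by the program's own logic and termination forced by the gas check can be sketched with a fuel-bounded interpreter in Coq. This is a minimal sketch under the assumption that gas behaves as a simple recursion budget; actual gas accounting charges per operation and is not modeled here, and the toy language (stmt, outcome, exec, and their constructors) is entirely hypothetical.

```coq
(* Hypothetical toy language with one natural-number counter as its state. *)
Inductive stmt : Set :=
  | Sskip : stmt
  | Sdec : stmt                    (* decrement the counter *)
  | Sseq : stmt -> stmt -> stmt
  | Swhile : stmt -> stmt.         (* repeat body while counter is nonzero *)

Inductive outcome : Set :=
  | Done : nat -> outcome          (* finished with this counter value *)
  | OutOfGas : outcome.            (* stopped by the gas check *)

(* Recursion is structural on gas, so Coq accepts exec as total
   even though the object language contains unbounded loops. *)
Fixpoint exec (gas : nat) (s : stmt) (n : nat) : outcome :=
  match gas with
  | O => OutOfGas
  | S g =>
      match s with
      | Sskip => Done n
      | Sdec => Done (pred n)
      | Sseq a b =>
          match exec g a n with
          | Done n' => exec g b n'
          | OutOfGas => OutOfGas
          end
      | Swhile body =>
          match n with
          | O => Done n
          | S _ =>
              match exec g body n with
              | Done n' => exec g s n'
              | OutOfGas => OutOfGas
              end
          end
      end
  end.

(* Finite execution: the loop ends through its own logic. *)
Example terminates : exec 10 (Swhile Sdec) 3 = Done 0.
Proof. reflexivity. Qed.

(* Non-terminating logic: the run is cut off by the gas budget. *)
Example stopped : exec 10 (Swhile Sskip) 1 = OutOfGas.
Proof. reflexivity. Qed.
```

The two examples correspond loosely to the two conditions described above: the first finishes within its budget, while the second would loop forever and is stopped only because the budget runs out.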

6. Formal Verification of Smart Contract Using FEther

As introduced in Section 1, we have implemented a formal interpreter for Lolisa in Coq, denoted as FEther [11], which comprises about 7000 lines of Coq code (not including proofs and comments). This interpreter was developed strictly following the formal syntax and semantics of Lolisa based on the GERM framework. Specifically, FEther is implemented by computational functions (considered as the mechanized computational semantics), which are equivalent to the natural semantics of Lolisa given in this paper. The implementation follows the details presented in our previous study [11] and uses Gallina, the functional programming language provided by Coq. Accordingly, FEther can parse the syntax of Lolisa and symbolically execute formal programs written in Lolisa. While efforts are ongoing to prove the consistency between the semantics of FEther and Lolisa, FEther can already be employed to prove the properties of real-world programs. This process is effective at exposing errors not only in the test suites that exemplify expected behaviors but also in normal smart contracts.

A simple case study demonstrates the symbolic execution and verification process based on Lolisa and FEther. Its source code is presented in Appendix A, and the respective formal version written in Lolisa is presented in Appendix B. Here, it is clear that the program throws if the message sender is in the index mapping list and the current time now is less than privilegeOpen or greater than privilegeClose. This is easily proven manually with the inductive predicate semantics defined previously. Meanwhile, we can verify this property directly by symbolically executing the program with FEther in Coq, as shown in Figure 13. The formal intermediate memory states obtained during the execution and verification of this Lolisa program using FEther are shown in Figure 14. We can then compare the mechanized verification results with the manually obtained results to validate the semantics of Lolisa. In addition, the application of FEther based on Lolisa and the GERM framework demonstrates that our proposed EVI theory is feasible.
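
To make the shape of the verified property concrete, the following is a minimal standalone Coq sketch of the time-window check from the case study. It does not use Lolisa or FEther: the names result, time_check, t, open_t, and close_t are hypothetical, and timestamps are modeled as natural numbers.

```coq
Require Import Coq.Arith.PeanoNat.

(* Hypothetical outcome of the guard; not a FEther memory state. *)
Inductive result : Type := Thrown | Continued.

(* Mirrors "if (now < open || now > close) throw" from Appendix A,
   with t standing for now. *)
Definition time_check (t open_t close_t : nat) : result :=
  if orb (Nat.ltb t open_t) (Nat.ltb close_t t) then Thrown else Continued.

(* Outside the window, the check always throws. *)
Lemma outside_window_throws :
  forall t open_t close_t,
    t < open_t \/ close_t < t ->
    time_check t open_t close_t = Thrown.
Proof.
  intros t open_t close_t H.
  unfold time_check.
  destruct H as [H | H].
  - rewrite (proj2 (Nat.ltb_lt _ _) H). reflexivity.
  - rewrite (proj2 (Nat.ltb_lt _ _) H).
    destruct (Nat.ltb t open_t); reflexivity.
Qed.

(* A concrete instance inside the window is evaluated by computation. *)
Example inside_window : time_check 5 3 8 = Continued.
Proof. reflexivity. Qed.
```

The lemma is proved for all values of the time and window bounds, whereas the example is checked by evaluating one concrete input; this is the difference between verifying a property and testing it.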

7. Discussion

7.1. Contributions

First, Lolisa formalizes most of the types, operators, and mechanisms of Solidity, and it includes most of the Solidity syntax. In addition, a standard library was built based on Lolisa to represent the built-in data structures and functions of EVM, such as msg, block, and send. As such, programs written in Solidity can be translated into Lolisa, and vice versa, with a line-by-line correspondence without rebuilding or abstracting, which are operations that can negatively impact consistency.

Second, the formal syntax of Lolisa is defined using generalized algebraic datatypes (GADTs), which attach static type annotations to all the values and expressions of Lolisa. In this way, Lolisa has a stronger static type system than Solidity for checking the construction of programs. As such, it is impossible to construct ill-typed terms in Lolisa, which also assists in discovering ill-typed terms in Solidity source code. Moreover, the formal syntax ensures that all expressions and values in Lolisa are deterministic.
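
The effect of a GADT-style syntax can be sketched in a few lines of Coq. The definitions below are a heavily simplified illustration and are not the Lolisa definitions: the names ty, denote_ty, expr, and eval, and the constructors other than the type tags Tuint and Tbool, are assumptions of this sketch.

```coq
(* Type tags for the toy language. *)
Inductive ty : Type := Tuint | Tbool.

(* Coq type used to interpret each tag. *)
Definition denote_ty (t : ty) : Type :=
  match t with
  | Tuint => nat
  | Tbool => bool
  end.

(* Expressions are indexed by their type, so every constructor states
   the types of its operands and of its result. *)
Inductive expr : ty -> Type :=
  | Econst_uint : nat -> expr Tuint
  | Econst_bool : bool -> expr Tbool
  | Eadd : expr Tuint -> expr Tuint -> expr Tuint
  | Elt  : expr Tuint -> expr Tuint -> expr Tbool
  | Eor  : expr Tbool -> expr Tbool -> expr Tbool.

(* The evaluator needs no runtime type checks or error cases. *)
Fixpoint eval {t : ty} (e : expr t) : denote_ty t :=
  match e in expr t' return denote_ty t' with
  | Econst_uint n => n
  | Econst_bool b => b
  | Eadd a b => Nat.add (eval a) (eval b)
  | Elt a b => Nat.ltb (eval a) (eval b)
  | Eor a b => orb (eval a) (eval b)
  end.

Example well_typed :
  eval (Eor (Elt (Econst_uint 1) (Econst_uint 2)) (Econst_bool false)) = true.
Proof. reflexivity. Qed.

(* Adding an integer to a Boolean is rejected when the term is built. *)
Fail Check (Eadd (Econst_uint 1) (Econst_bool true)).
```

Because the ill-typed term is rejected by Coq's type checker, no separate type-checking pass over the syntax tree is needed in this sketch.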

Finally, the syntax and semantics of Lolisa are mechanized using the Coq proof assistant. In addition, a formal interpreter, FEther, was developed in Coq to validate whether Lolisa satisfies the above Executable and Provable feature and the meta-properties of the semantics. In contrast to similar efforts focused on building formal syntax and semantics for high-level programming languages, the formal semantics of Lolisa are defined based on the FSPVM-E framework. As such, when used in conjunction with FEther, programs written in Lolisa can be symbolically executed in the Coq proof assistant in the same way as they would execute in the real world, and their properties can be verified automatically at the same time.

7.2. Limitations

Although the novel features in the current version of the Lolisa specification language confer a number of advantages, some limitations remain.

First, because Lolisa is a large subset of Solidity, some Solidity characteristics, such as inline assembly, have been omitted. Hence, some complicated Ethereum smart contracts are not supported by the current version of Lolisa. These characteristics will be supported in an updated version of Lolisa.

Second, Lolisa is formalized at the Solidity source-code level. Although it can be used to analyze vulnerabilities before compilation, it cannot guarantee the correctness of the corresponding bytecode when the compiler is untrusted. One possible solution is to develop a low-level version of Lolisa that executes the bytecode generated by the compiler, and then to prove the equivalence between the Solidity execution results and the respective execution results of the bytecode.
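
The kind of equivalence statement involved can be illustrated on a toy scale. The sketch below relates a tiny source language of additions to a tiny stack machine; it is unrelated to Solidity, the EVM, or any planned low-level Lolisa, and all names in it (sexpr, seval, instr, exec, compile) are hypothetical.

```coq
Require Import Coq.Lists.List.
Import ListNotations.

(* Toy source language and its evaluator. *)
Inductive sexpr : Type := Num (n : nat) | Plus (a b : sexpr).

Fixpoint seval (e : sexpr) : nat :=
  match e with
  | Num n => n
  | Plus a b => seval a + seval b
  end.

(* Toy "bytecode" and its stack-machine semantics. *)
Inductive instr : Type := Push (n : nat) | Add.

Fixpoint exec (p : list instr) (s : list nat) : list nat :=
  match p with
  | [] => s
  | Push n :: p' => exec p' (n :: s)
  | Add :: p' =>
      match s with
      | y :: x :: s' => exec p' (x + y :: s')
      | _ => s  (* stack underflow: stop and return the stack as is *)
      end
  end.

Fixpoint compile (e : sexpr) : list instr :=
  match e with
  | Num n => [Push n]
  | Plus a b => compile a ++ compile b ++ [Add]
  end.

(* Generalized over the remaining code and the current stack. *)
Lemma compile_correct_gen :
  forall e p s, exec (compile e ++ p) s = exec p (seval e :: s).
Proof.
  induction e as [n | a IHa b IHb]; intros p s; simpl.
  - reflexivity.
  - rewrite <- app_assoc. rewrite IHa.
    rewrite <- app_assoc. rewrite IHb.
    simpl. reflexivity.
Qed.

(* Source-level and bytecode-level results agree. *)
Theorem compile_correct : forall e, exec (compile e) [] = [seval e].
Proof.
  intros e.
  rewrite <- (app_nil_r (compile e)).
  rewrite compile_correct_gen.
  reflexivity.
Qed.
```

For an untrusted compiler, the compile function would not be available as a Coq definition, so the corresponding statement would instead have to relate the execution of each given bytecode program to the execution of its source.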

Finally, although programs written in the current version of Lolisa can be verified symbolically in FEther, this process is not yet fully automated. In occasional situations, programmers must analyze the current proof goal and choose suitable verification tactics. Fortunately, full automation can be achieved by optimizing the design of the tactic evaluation strategies.

8. Conclusion and Future Work

In this paper, we defined the formal syntax and semantics for a large subset of Solidity, which we denoted as Lolisa. The formal syntax of Lolisa is strongly typed according to GADTs. The syntax of Lolisa includes nearly all the syntax of Solidity, and the two languages are therefore largely equivalent to each other. As such, Solidity programs can be translated to Lolisa line-by-line without rebuilding or abstracting, which are operations that are too complex to be conducted by general programmers and may introduce inconsistencies. Moreover, we have completely mechanized Lolisa in Coq and have developed a formal interpreter, FEther, in the Coq proof assistant based on Lolisa, which was employed to validate the semantics of Lolisa. Because the formal semantics of Lolisa are based on our FSPVM-E framework [31], programs written in Lolisa can be symbolically and automatically executed in Coq, thereby verifying the corresponding Solidity programs at the same time. As a result of the present work, we can now directly verify smart contracts written in Solidity using Lolisa.

The source files containing the formalization of Lolisa abstract syntax tree are accessible at https://gitee.com/UESTC_EOS_FV/LolisaAST/tree/master/SPEC

Presently, we are working toward verifying the correctness of FEther, and developing a proof of the equivalence between computable semantics and inductive semantics. Subsequently, we will implement our proposed preliminary scheme based on the notation mechanism of Coq to extend Lolisa along two important avenues.

Our ongoing project is the extension of FSPVM-E to support the EOS blockchain platform [32], and we will then verify our new framework in Coq. In addition, we will develop a general formal verification toolchain for blockchain smart contracts using HOL proof technology, with the goal of automatic smart contract verification.

Appendix

A. Source Code of the Case Study

Algorithm 1 gives the partial source code of the case study contract.

Solidity 0.4.8;
function example () public payable {
 uint index = indexes[msg.sender];
 uint open;
 uint close; …
 if (privileges[msg.sender]) {
  open = privilegeOpen;
  close = privilegeClose;
 …} else {
  open = ordinaryOpen;
  close = ordinaryClose;…}
 if (now < open || now > close) {
  throw; }
 if (subscription + rate > TOKEN_TARGET_AMOUNT) {
  throw; }
 …
 if (msg.value <= finalLimit) {
  safe.transfer(msg.value);
  deposits[index] += msg.value;
  subscription += msg.value / 1000000000000000000 * rate;
  Transfer(msg.sender, msg.value); } else {
  safe.transfer(finalLimit);
  deposits[index] += finalLimit;
  subscription += finalLimit / 1000000000000000000 * rate;
  Transfer(msg.sender, finalLimit);
  msg.sender.transfer(msg.value - finalLimit);
 }
}

B. Formal Version of the Case Study

Algorithm 2 gives the formal version of Algorithm 1 written in Lolisa.

Coq 8.8;
Definition Example :=
 (Fun public payable (Efun (Some example) Tundef) pnil nil);;
  (Var (Some public) (Evar (Some index) Tuint));;
  (Assignv (Evar (Some index) Tuint)
  (Econst (@Vmap Iaddress Tuint indexes (Mstr_id Iaddress msg (sender ∼>>\\\)) None));;
   (Var (Some public) (Evar (Some open) Tuint));;
   (Var (Some public) (Evar (Some close) Tuint));;
   (Var (Some public) (Evar (Some quota) Tuint));;
   …
   (If (Econst (@Vmap Iaddress Tbool privileges
    (Mstr_id Iaddress msg (sender ∼>>\\\)) None))
   ((Assignv (Evar (Some open) Tuint) (Evar (Some privilegeOpen) Tuint));;
    (Assignv (Evar (Some close) Tuint) (Evar (Some privilegeClose) Tuint));;
   …;; nil)
   ((Assignv (Evar (Some open) Tuint) (Evar (Some ordinaryOpen) Tuint));;
    (Assignv (Evar (Some close) Tuint) (Evar (Some ordinaryClose) Tuint));;
   …;; nil)
   (If ((Evar (Some now) Tuint) (<) (Evar (Some open) Tuint) (||)
   (Evar (Some now) Tuint) (>) (Evar (Some close) Tuint))
   (Throw;; nil) (Snil;; nil));;
   (If ((Evar (Some subscription) Tuint) (+) (Evar (Some rate) Tuint) (>)
    TOKEN_TARGET_AMOUNT)
   (Throw;; nil) (Snil;; nil));;
   …
   (If ((Econst (Vfield Tuint (Fstruct _0xmsg msg) (values ∼> \\) None))
   (<=) (Evar (Some finalLimit) Tuint))
 ((Fun_call (Econst (Vfield (Tfid (Some safe)) (Fstruct _0xaddress safe) (send ∼> \\) None))
  (pccons (Econst (Vfield Tuint (Fstruct _0xmsg msg) (values ∼> \\) None)) pcnil));;
   (Assignv (Econst (@Vmap Iuint Tuint deposits (Mvar_id Iuint index) None))
    ((Econst (Vfield Tuint (Fstruct _0xmsg msg) (values ∼> \\) None)) (+)
    ((Econst (@Vmap Iuint Tuint deposits (Mvar_id Iuint index) None))));;
  (Assignv (Evar (Some subscription) Tuint) ((Econst (Vfield Tuint (Fstruct _0xmsg msg)
   (values ∼> \\) None))) (+) (Evar (Some finalLimit) Tuint) (/)
   (Econst (Vint (INT I64 Unsigned 1000000000000000000))) (x)
   (Evar (Some rate) Tuint))));; nil) …;; nil);; nil.

Data Availability

The source files containing the formalization of Lolisa abstract syntax tree data used to support the findings of this study have been deposited in the Gitee repository (https://gitee.com/UESTC_EOS_FV/LolisaAST/tree/master/SPEC).

Conflicts of Interest

The authors declare no conflict of interest.