Abstract

Self-sovereign identity (SSI) is a new distributed method for identity management, commonly used to address the problem that users lack control over their identities. However, the excessive pursuit of self-sovereignty in most existing SSI schemes hinders sanctions against attackers. To deal with malicious behavior, a few SSI schemes introduce accountability mechanisms, but they sacrifice users’ privacy. In addition, the digital identities (static strings or updatable chains) in the existing SSI schemes serve as inputs to a third-party executable program (mobile app, smart contract, etc.) to achieve identity reading, storing, and proving, which weakens users’ self-sovereignty. To solve these problems, we present a new self-sovereign identity scheme that strikes a balance between privacy and accountability and removes the dependence on third-party programs. In our scheme, a single individual-specific executable code is generated as a digital avatar-i for each human to interact with others in cyberspace without a third-party program, and the embedded biometrics enhance uniqueness and user control over the identity. In addition, a joint accountability mechanism, based on the Shamir (t, n) threshold algorithm and a consortium blockchain, is designed to restrict the power of each regulatory authority and protect users’ privacy. Finally, we analyze the security and SSI properties of the scheme and conduct detailed experiments on the cost of computation, storage, and blockchain gas. The analysis indicates that our scheme resists the known attacks and fulfills all six SSI properties. Compared with the state-of-the-art schemes, the experimental results show that the cost is larger in server storage, blockchain storage, and blockchain gas, but is still low enough for practical situations.

1. Introduction

Identity management (IdM) has experienced increased interest due to the ever-growing demand for digital identities, as people become overly dependent on online services [1]. However, each traditional IdM system usually adopts centralized authorization and authentication and maintains identity data independently [2]. As a result, the enormous number of online IdM services forces people to manage a large number of digital identities, which leads to the problem of identity fragmentation [3] and exposes users to identity attacks, such as identity impersonation, privacy leakage, and identity fraud [4]. Even worse, users lack control and ownership over their digital identities in the traditional IdM [5, 6]. Therefore, a distributed method for identity management called self-sovereign identity (SSI) has been proposed [7], in which the users are central to the administration of identities. Fortunately, the rise of distributed ledger technology (DLT), such as blockchain, has also made it possible to construct self-sovereign identities [8–10]. In comparison to the centralized management used by the traditional IdM, SSI schemes shift decision authority to users through secured DLT [11] and allow them to possess full control over their identities and data [12–16].

According to their goals, existing SSI schemes can be divided into the following three categories: junior SSI schemes [17–21], SSI schemes with sybil-resistance [22–26], and SSI schemes with accountability [27, 28]. To give users control over their identities and data, junior SSI schemes adopt DID standards [17], smart contracts [18, 19], or credential chains [21]. Static strings, such as DIDs and addresses of smart contracts, or updatable chains are employed to identify the users. However, the fact that users can hold as many identities as they want facilitates sybil attacks. Therefore, many scholars introduced an additional certificate authority [22] or biometrics [23–26] to ensure that each user has one and only one DID-based digital identity in their SSI schemes with sybil-resistance. Unfortunately, the above schemes cannot reveal the identities of malicious users. To deal with this problem, SSI schemes with accountability [27, 28] have been proposed. Since users are represented by credential chains in [27], a regulatory authority checks the malicious credentials against the personal information in a central registry to identify the malicious users. However, an audit of malicious users initiated by a single regulatory authority may lead to serious problems of inadequate regulation or injustice. Unlike the scheme in [27], the scheme in [28] overcomes the problems caused by a single regulator. Specifically, sanctions lists and a fuzzy matching method based on secure multiparty computation are applied to identify the credentials of suspicious users. However, both the central registry [27] and the sanctions lists [28] inevitably leak user privacy.

On the other hand, the metaverse, as the evolving paradigm of the next generation of the Internet [29], will contain an enormous number of applications and bring new challenges to SSI. The metaverse is considered a massive virtual environment parallel to the physical world, in which users interact through digital avatars [30]. That is, digital avatars are executable programs that own and control identities on behalf of the user’s physical self [29, 30]. However, the digital identities (static strings or updatable chains) in the existing SSI schemes are all used as inputs to a third-party executable program (mobile app, smart contract, etc.) to achieve identity reading, storing, and proving. Thus, the existing SSI schemes do not fit the metaverse well and also weaken users’ self-sovereignty. In detail, the dependence on a third-party executable program during the usage of SSI inevitably leads to the problems of a single point of failure and privacy leakage.

Inspired by the digital avatars in the metaverse, and taking the above problems in the existing SSI schemes into account, we propose a new self-sovereign identity scheme with accountability. The contributions of the proposed scheme are summarized as follows:
(i) We propose a new self-sovereign identity scheme with accountability (NSSIA), in which executable code is introduced to allow users to control their identities completely, and a balance between privacy and accountability is achieved.
(ii) To get rid of the dependence on third-party programs, a single individual-specific executable code is distributed to each user, in which the user’s biometrics are embedded to enhance uniqueness and user control. The hash of the executable code is used as an identifier, and each user can use his/her own local executable code to store, read, and prove identities with network servers. For simplicity, the term “digital avatar” in the metaverse is borrowed and adapted to “digital avatar-i” to denote the executable code focusing on digital identity.
(iii) In order to regulate malicious users fairly without violating privacy, a joint accountability mechanism is introduced to decentralize the power of regulatory authorities and hide users’ real-world information through the Shamir (t, n) threshold secret sharing algorithm, while the impartial audit is further guaranteed by a consortium blockchain.
(iv) We analyze the proposed scheme in detail in terms of security and SSI properties in the generation phase, and conduct extensive experiments on the cost of computation, storage, and blockchain gas.

The rest of this article is organized as follows. Section 2 introduces the related work on SSI schemes. In Section 3, the system model, security model, and design goals are introduced, and Section 4 describes our scheme in detail. We analyze our proposed scheme in terms of security and performance in Sections 5 and 6, respectively. Finally, the conclusion and future work are given in Section 7. This paper has been published as an arXiv preprint [31].

2.1. Junior SSI Schemes

Junior SSI schemes [17–21] were first proposed to allow users to control their own identities. To enable users to have full control over their identities, Takemiya and Vanieiev [17] designed a security protocol for storing encrypted personal information based on Hyperledger Iroha. The decentralized identifier (DID) [32] was used as the unique identifier of each user, while entries that characterized a user’s identity were represented in the form of verifiable claims [33]. For self-sovereignty, all the claims were stored locally on the user’s phone in encrypted form. Different from Takemiya and Vanieiev [17], smart contracts were used to represent the user’s identity in [18, 19]. Concretely, both designed a kind of smart contract, with its address as the identifier, specifically for managing identities. Once published, these contracts were owned by the corresponding users. For self-sovereignty, the user’s identity information was stored in IPFS [18] or on the user’s device in the form of a Merkle tree [19]. However, both DIDs and smart contract addresses are machine-readable static strings, which are difficult for users to understand, leading to the dilemma of managing digital identities.

Then, a decentralized service architecture for self-sovereign social communication, proposed by Westerkamp et al. [20], solved the above problem. In this scheme, the user’s identifier was represented as a human-readable name generated by the smart contract-based Ethereum Name Service (ENS). Besides, the user’s data was stored in his/her own API server, and the Uniform Resource Identifier (URI) of the server was stored and linked to the human-readable name on the blockchain. However, such a human-readable identifier is still inherently static and is easily impersonated by malicious users during use.

Fortunately, this problem can be alleviated by a general provable claim model proposed in the paper [21]. With reference to the structure of the blockchain, the self-sovereign identity was designed as a growing chain of user’s claims. And, the user’s identity could be used only after the authentication of the verifier on the existing claims, thus alleviating the risk of identity being impersonated. But, due to the lack of necessary authentication before identity registration, users can create as many identities as they want, which facilitates the implementation of sybil attacks.

2.2. SSI Schemes with Sybil-Resistance

To ensure that each user has one and only one digital identity, SSI schemes with sybil-resistance [22–26] have been proposed. A commitment scheme combined with zk-SNARK was introduced in [22] to provide integrity and privacy of user information simultaneously. In this scheme, to ensure integrity and avoid reuse, a certificate would be issued to the user only after the user’s information and the corresponding commitment were confirmed by the CA. During usage, the user’s data was protected by zk-SNARK to prevent privacy leakage. However, the verification of the commitment by the CA can only guarantee the integrity of the information from the user, not its authenticity, which means the CA can be deceived by false information.

For the authenticity and reliability of user identity, biometric identification has been introduced into SSI schemes [23–26]. In 2018, Othman and Callahan [25] designed a novel method for decentralized biometric-based self-sovereign identity. In this scheme, the user’s identity was created based on the DID specification. Also, in order to associate each user with their own identity, biometrics (fingerprint, face, voice, etc.) were encrypted and stored in the corresponding DID document. Unfortunately, biometric information is only collected through the user’s mobile app, which presents an opportunity for adversaries to commit identity fraud.

Then, in 2019, Hamer et al. [24] proposed a unique self-sovereign identity management scheme to deal with the above problem. A user’s biometrics was authenticated by the trusted organization to make sure that the user did have this biometrics. Besides, the collected biometrics were encrypted with a homomorphic signature algorithm to ensure that a user could not enroll twice in the system. But all the user’s behaviors can be linked to the same digital identity, which leaks the user’s privacy.

A blockchain-based privacy protection unified identity authentication scheme is proposed in [23]. In this scheme, the server would authenticate user information by online face verification using photos from a central database. In addition, a set of key derivation algorithms were designed to ensure the unlinkability between identity attribute information. However, the way a central database stores user information weakens users’ control over their identities.

In 2021, Bandara et al. [26] proposed a blockchain and self-sovereign identity empowered digital identity platform. For full control over the data, the information required for registration (name, address, photo, etc.) submitted by the user was stored locally on the mobile device. Only with the user’s consent would this information be sent to the service provider (SP). Additionally, verification of the information is achieved by comparison with physical documents, eliminating the need for SPs to store information.

In a word, strict identity authentication, especially the introduction of biometrics, ensures that each user has one and only digital identity, effectively resisting sybil attacks. However, none of these schemes design accountability mechanisms to regulate malicious users who disrupt the order of the network.

2.3. SSI Schemes with Accountability

For the purpose of maintaining order in cyberspace, SSI schemes with accountability [27, 28] are proposed. In 2021, Stokkink et al. [27] designed a truly self-sovereign identity system based on Pedersen commitments, where the digital identities were implemented as data structures that held a list of credentials. And, for self-sovereignty, these data structures were stored on the users’ devices. In terms of accountability, credential verifiers were required to keep audit logs, which were actually composed of credentials presented by users. Then, malicious users could be identified by a single regulatory authority through analyzing audit logs and comparing them with the personal information in a central registry. But several problems such as inadequate regulation and injustice may arise due to the reliance on a single regulatory authority.

Also in 2021, Maram et al. [28] presented a decentralized identity management system with legacy compatibility, sybil-resistance, and accountability. Instead of an additional credential issuer, all credentials characterizing the user’s identity in this scheme were imported from existing web service providers. A deduplication protocol based on secure multiparty computation (MPC) was designed to prevent the reuse of these credentials. Besides, in order to address the drawbacks of the single regulatory authority, an MPC-based fuzzy matching method was proposed, which can find the digital identities of the corresponding malicious users according to the sanctions lists. However, legacy compatibility does not change the status quo of data stored by existing web service providers, which remains out of the user’s control. In addition, both the central registry and the sanctions lists introduced in the above schemes to regulate malicious behavior inevitably sacrifice users’ privacy.

3. Models and Design Goals

3.1. System Model

Our system model consists of seven entities, as Figure 1 shows: a natural person (NP), a digital avatar-i (DA), two blockchains, namely an identity information chain (IIC) and a digital avatar-i behavior chain (DABC), and three groups, namely an information collection and verification group (ICVG), a digital avatar-i generation group (DAGG), and a regulatory authority group (RAG).
(1) NP, Natural Person, refers to a person living in the physical world. He/She can digitize himself/herself through the ICVG and apply to the DAGG for a DA.
(2) DA, Digital Avatar-i, is an individual-specific executable program focusing on the identity dimension of the digital avatar. It stands for a living person when interacting with others in cyberspace and has a one-to-one relationship with the NP.
(3) ICVG, Information Collection and Verification Group, validates that the requestor is a unique living person in the physical world and provides the digitization service for him/her. It contains two types of entities, namely, the metadata verifier (MV) and the biometric collector (BC). The MV proves the requestor’s existence in physical space through metadata, such as name, identity number, and address. The BC collects two distinct types of biometric data, where one serves as a permanent proof and the other is used for activating the DA.
(4) IIC, Identity Information Chain, is a consortium blockchain. It is mainly responsible for recording the proof of physical identity information (metadata and biometric data) and the hash of the DA, and for making sure each NP has only one proof.
(5) DAGG, Digital Avatar-i Generation Group, generates a unique DA for each NP. It contains two types of entities, namely, the digital avatar-i generator (DAG) and the secure storages (SSs). At first, the DAG verifies the identity of the applicant with the data in the IIC, and then generates the sole DA for him/her. The SSs, a set of individual storage nodes, use the Shamir (t, n) threshold algorithm to safely store the metadata of the NP and the hash of the DA.
(6) DABC, Digital Avatar-i Behavior Chains, is an infrastructure composed of multiple blockchains that supports all kinds of decentralized applications (Dapps). These Dapps provide services for DAs in cyberspace, and the DABC keeps their historical records for accountability.
(7) RAG, Regulatory Authority Group, is responsible for regulating the NP by monitoring the DA’s activities in the DABC. It is composed of n regulatory authorities (RA1, ..., RAn), where at least three of them together can hold suspicious users accountable.

In order to securely and privately take part in various activities such as work, study, and entertainment in cyberspace, especially the metaverse, a NP needs to map himself/herself to one DA. In detail, there are four steps to achieve the mapping. First, the NP needs to digitize himself/herself through the ICVG, where the MV verifies the descriptive metadata of the NP and the BC collects the NP’s biometric data. Second, the ICVG uploads the proof information to the IIC and replies to the NP with a certificate. Third, the NP applies to the DAGG for a DA with the certificate. At last, the DAGG verifies the authenticity of the NP by checking whether the live biometric data matches the proof in the IIC. If the verification is passed, the DAGG generates the DA and uploads the hash of the DA to the IIC. Afterward, the NP uses the corresponding DA to act in cyberspace without a third-party program.

It is worth noting that, once there is a malicious DA, the RAG can map him/her to the corresponding NP through inquiring the IIC and finding the metadata in DAGG. For constructing a safe and orderly cyberspace, a DA is supposed to interact with the Dapps based on DABC. In this way, the RAG can regulate a malicious NP through monitoring the DA’s historical and future behaviors.

3.2. Security Model

In NSSIA, we have the following security assumptions.
(1) An adversary can monitor, intercept, modify, and insert messages on the public channel [34]. He/she can breach no more than half of the entities in each of the ICVG, DAGG, and RAG within a certain period of time.
(2) A NP and a DA are considered malicious entities. A NP may submit false information or fraudulently use someone else’s information, and a DA can be modified or illegally used by an adversary in cyberspace.
(3) Entities in the ICVG, DAGG, and RAG are regarded as semi-honest. They will perform the protocol strictly and comply with the consensus algorithm of the IIC but are curious about the information.
(4) The DABC is a semi-honest entity. It is a public chain secured by consensus algorithms, and the miners perform the protocol strictly but are curious about the information. The IIC is a trusted consortium chain that is jointly managed by the ICVG, DAGG, and RAG.
(5) We assume that the standard cryptographic algorithms used in our scheme are secure and unbreakable.

3.3. Design Goals

According to the aforementioned system model and security model, the design goals of our scheme are as follows:
(i) User friendly: a user (NP) accesses a service in cyberspace with a digital avatar-i (DA) in a convenient way, and the user owns and controls it.
(ii) One-to-one: in order to build an orderly cyberspace, a user (NP) has one and only one digital avatar-i (DA), and all the behaviors of the DA belong to that NP alone.
(iii) Linkability with condition: for the security of cyberspace and the privacy of a user (NP), the identity mapping (between the NP and the DA) is encrypted and stored in a distributed way (different pieces of it in each SS). Only three or more of the RAs can jointly decrypt it and recover the details of the identity.

4. Proposed NSSIA

The NSSIA generates a unique digital avatar-i for a user to ensure his/her conditional identity privacy when interacting with each other in cyberspace, especially the metaverse. Concretely, the NSSIA is mainly divided into five phases. The first phase initializes the entities in the IIC, ICVG, DAGG, and RAG to generate their keys. The second phase lets a NP digitize himself/herself through the ICVG, where the ICVG verifies the authenticity of NP’s metadata, collects NP’s biometric data, and writes the proof information to the IIC, as shown in steps ①-② of Figure 1. And in the third phase, the NP applies to the DAGG for a DA in which the DAGG checks the metadata of the NP, generates a DA and records the DA generation transaction to the IIC, as shown in steps ③-④ of Figure 1. The NP can use his/her own DA to interact with the Dapps built on DABC in the fourth phase, as shown in step ⑤ of Figure 1. Lastly, the RAG regulates a malicious NP with the mapping DA’s behaviors in the DABC and the data in the IIC and the DAGG, as shown in steps ⑥-⑧ of Figure 1. To elaborate the NSSIA clearly, we give the notations used in our scheme in Table 1.

4.1. Initialization

The IIC performs initialization to generate the public parameters, the master key (MK), and the corresponding subkeys (SubKs). In addition, the entities in RAG, DAGG, and ICVG generate their public and private keys.

4.1.1. IIC Initialization

The IIC performs initialization to generate the public parameters, the MK, and the SubKs, where each SubK is the subkey of one entity. In detail, it first selects a large prime, an elliptic curve over the corresponding finite field, and a base point on the curve together with its order. Then, it publishes the public parameters to the genesis block. Afterward, it randomly selects a 128-bit AES key as the MK. Lastly, the IIC uses the Shamir (t, n) threshold secret sharing algorithm [35] to generate a SubK for each SS in the DAGG and each RA in the RAG. The SSs and the RAs each have their own group size and their own threshold. The details are as follows.

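The IIC first chooses two polynomials, one for the SSs and one for the RAs, each of degree one less than the corresponding threshold, shown as (1) and (2), where the non-constant coefficients are random numbers and the two moduli are primes bigger than each coefficient.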

Next, the IIC chooses random numbers as evaluation points, one per entity, and substitutes them into (1) and (2) to calculate the SubKs of the SSs and of the RAs, respectively.
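The subkey generation described above can be illustrated with a minimal sketch of Shamir (t, n) secret sharing. The prime modulus, the use of the evaluation points 1..n instead of random ones, the group sizes, and all function names are assumptions made for illustration; they are not taken from the scheme itself.

```python
import secrets

# Assumption: a prime modulus larger than any 128-bit secret (2**521 - 1 is a Mersenne prime).
PRIME = 2**521 - 1


def make_shares(secret, threshold, num_shares, prime=PRIME):
    """Split `secret` into `num_shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < prime:
        raise ValueError("secret must be smaller than the prime modulus")
    # Polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):  # distinct, non-zero evaluation points
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % prime
        shares.append((x, y))
    return shares


def recover_secret(shares, prime=PRIME):
    """Recover the constant term by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret


# Hypothetical usage: one master key, shared separately among SSs and RAs.
mk = secrets.randbits(128)
ss_subkeys = make_shares(mk, threshold=3, num_shares=5)
ra_subkeys = make_shares(mk, threshold=3, num_shares=5)
```

In this sketch the same master key is the constant term of both polynomials, which matches the later steps in which either enough SSs or enough RAs suffice to restore it.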

4.1.2. SS and RA Initialization

Each SS and each RA uses the public parameters published by the IIC to generate its own public and private key pair and publishes the public key. Then, each SS and RA obtains from the IIC its subkey encrypted under its public key. Next, it decrypts the encrypted subkey with its private key to recover its SubK.

4.1.3. MV, BC, and DAG Initialization

The MV, BC, and DAG use the public parameters published by the IIC to generate their own public and private key pairs.

4.2. Digitization

To prepare for the generation of a DA, the NP sends the metadata to the ICVG to digitize himself/herself. Here, the MV verifies the metadata and records the proof of the metadata to the IIC, while the BC collects the NP’s biometric data, writes the proof of the biometric data to the IIC, and sends the NP a digitization credential. The whole process is shown in Figure 2.

STEP D1. A NP presents the certificates, such as an ID card and a passport, and provides the metadata (including name, ID number, address, and gender) to the MV face to face, as shown in step ①.

STEP D2. The MV verifies the authenticity of the metadata with the certificates. If it is confirmed, the MV calculates the proof of the metadata and sends a metadata verification transaction (TM, as shown in Equation (3)) to the IIC, without storing any information (neither the metadata nor its proof) locally, as shown in step ②. In (3), according to the paper [36], Tid represents the transaction number of the TM. The input array Tin[] consists of three parts: the input address, the previous transaction, and the input script. The input address is the initiator’s public key; since the TM is the original transaction, both the previous transaction of the TM and the input script are left empty. The output array Tout[] is composed of three parts: the accepter’s address, the data to be recorded in the IIC, and an out-script used to sign the TM.

STEP D3. The MV sends the transaction number (TNum) of the TM to the NP, as shown in step ③.

STEP D4. The NP sends the metadata and TNum to the BC face to face for authentication, as shown in step ④.

STEP D5. The BC calculates the proof of the submitted metadata, uses TNum to find the proof recorded in the TM on the IIC, and checks whether the two are equal. If not, the BC aborts, as shown in step ⑤.

STEP D6. The BC collects two kinds of biometric characteristics, where one serves as a permanent proof of the NP’s existence in cyberspace and the other is used to activate the DA. Specifically, the permanent one should be unbreakable and need not be collected frequently; therefore, we choose iris data. For frequent use of the DA to access network services, an easy-to-collect face biometric is introduced. Then, the BC calculates the proof of the iris data and sends an iris verification transaction (TI, as shown in (4)) to the IIC, with no information (such as biometrics and metadata) stored locally, as shown in step ⑥. In (4), the Tid is the transaction number of the TI, the input address is the creator’s address of the TI, the TM is the previous transaction, the output address is the accepter’s address of the TI, and the output data is the data to be recorded in the IIC.

STEP D7. As Equation (5) shows, the BC encrypts the face data and then signs the result with its private key to generate the digitization credential. Finally, the BC sends the digitization credential to the NP, as shown in step ⑦.
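Steps D2 to D5 can be pictured with the following minimal sketch, in which the MV records only a hash-based proof of the metadata and the BC later recomputes and compares it. The choice of SHA-256 over canonical JSON, the `Transaction` layout, the in-memory `ToyLedger` that stands in for the IIC, and the placeholder signature string are all illustrative assumptions, not the paper's actual formats.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    tid: str      # transaction number
    tin: tuple    # (input address, previous transaction, input script)
    tout: tuple   # (accepter's address, recorded data, out-script)


def metadata_proof(metadata):
    """Hypothetical proof: SHA-256 over a canonical JSON encoding of the metadata."""
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ToyLedger:
    """In-memory stand-in for the IIC (assumption, no consensus or signatures)."""

    def __init__(self):
        self._txs = {}

    def record(self, tx):
        self._txs[tx.tid] = tx
        return tx.tid

    def recorded_data(self, tid):
        return self._txs[tid].tout[1]


def mv_submit(ledger, metadata, mv_address, iic_address):
    """Step D2: record only the proof; previous transaction and input script are empty."""
    proof = metadata_proof(metadata)
    tid = hashlib.sha256(("TM" + proof).encode("utf-8")).hexdigest()[:16]
    tx = Transaction(tid, (mv_address, None, None), (iic_address, proof, "sig-placeholder"))
    return ledger.record(tx)  # returned value plays the role of TNum


def bc_verify(ledger, tnum, metadata):
    """Step D5: recompute the proof and compare it with the recorded one."""
    return ledger.recorded_data(tnum) == metadata_proof(metadata)
```

A caller would run `mv_submit` once per NP and hand the returned transaction number to the NP, who presents it together with the metadata to the BC.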

4.3. Generation

After the digitization, the DAGG can generate a DA for the NP, and the process consists of seven steps. At first, the NP applies to the DAGG for a DA. Second, the DAG verifies the authenticity of the NP’s identity with the proof information in the IIC. Third, the DAG generates a DA and requests the SSs’ subkeys. Then, the SSs send the encrypted subkeys to the DAG. Next, the DAG restores the MK to generate the SecInfo and splits it into multiple pieces of backup information. Afterward, the DAG records the proof of the DA in the IIC, and lastly sends the DA to the NP, as shown in Figure 3.

STEP G1. The NP sends the physical identity proof to the DAG, as shown in step ①.

STEP G2. The DAG calculates the verification values, including M2, by Equation (6). Afterward, the DAG obtains the recorded proofs from the IIC with TNum and checks whether they match the calculated values. At last, the DAG recovers the enrolled face data and verifies the living face biometric of the NP against it, as shown in step ②.

STEP G3. If the NP’s identity is confirmed, according to Algorithm 1, the DAG selects the corresponding code modules (dynamic verification, file transfer, etc.) from the code library to build the DA, using the digital avatar-i seed produced by the algorithm in [37]. The DA is divided into several modules, and the selected code modules are combined in order. Then, the DAG calculates the identifier DAI and requests all SSs to send their respective SubKs, as shown in step ③.

STEP G4. Each SS encrypts its subkey by Equation (7) and sends the ciphertext to the DAG, as shown in step ④.

STEP G5. The DAG decrypts at least the threshold number of subkeys and constructs the Lagrange interpolation formula (as shown in Equation (8)) with these SubKs to restore the MK. Afterward, the DAG generates the SecInfo and expands it to a fixed number of bytes by filling the high bits with zero. The DAG then constructs several polynomials, as shown in (9). In (9), the SecInfo is divided into coefficients in order, and each coefficient has a fixed length in bytes. The modulus is a prime number bigger than any coefficient and also has a fixed length in bytes. The DAG substitutes the chosen x-values into each polynomial in (9) to calculate points (since the x-value undergoes multiple exponentiation operations, its length is set to one byte to reduce computational overhead, while the length of the y-value is set to five bytes to avoid collisions between these points; if an x-value or a y-value is too short, the high bits are filled with zero). Further, the DAG divides all the points into groups such that the points in each group come from different polynomials. It is worth mentioning that the x-values of the points in a group are the same, and these points are combined to form a set of identity restoration information (IRI), as shown in Figure 4. At last, the DAG transmits the sets of IRI and the storage index (SI) to all the SSs, as shown in step ⑤.

STEP G6. When an SS receives the IRI and SI, it sends a response to the DAG. When the DAG confirms that more than half of the SSs have received the IRI and SI, it writes a DA generation transaction (TDA, as shown in (10)) in the IIC, as shown in step ⑥. In Equation (10), the Tid is the transaction number of the TDA, the TI is the previous transaction, a single address serves as both the creator’s and the accepter’s address of the TDA, and the output data is the data to be recorded in the IIC.

STEP G7. The DAG calculates the DA’s proof and sends it with the DA to the NP, as shown in step ⑦.
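The construction of the IRI in step G5, and its later recovery from a threshold number of IRIs, can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact parameters: coefficients are taken to be 4 bytes long, the modulus is assumed to be the prime 2^32 + 15 (so every y-value fits in five bytes), the x-values are small integers that fit in one byte, and the function names are hypothetical.

```python
COEFF_LEN = 4        # bytes per coefficient (assumption)
PRIME = 2**32 + 15   # assumed prime modulus, bigger than any 4-byte coefficient


def evaluate(poly, x):
    """Evaluate a polynomial (lowest-degree coefficient first) at x modulo PRIME."""
    y = 0
    for c in reversed(poly):
        y = (y * x + c) % PRIME
    return y


def build_iri(sec_info, threshold, xs):
    """Pack SecInfo into polynomial coefficients and return {x: [y per polynomial]}."""
    block = COEFF_LEN * threshold
    padded_len = -(-len(sec_info) // block) * block
    padded = sec_info.rjust(padded_len, b"\x00")  # fill the high bytes with zero
    coeffs = [int.from_bytes(padded[i:i + COEFF_LEN], "big")
              for i in range(0, padded_len, COEFF_LEN)]
    polys = [coeffs[i:i + threshold] for i in range(0, len(coeffs), threshold)]
    return {x: [evaluate(p, x) for p in polys] for x in xs}


def interpolate_coeffs(points):
    """Recover all coefficients of the polynomial through `points` (Lagrange form)."""
    coeffs = [0] * len(points)
    for i, (xi, yi) in enumerate(points):
        basis, denom = [1], 1
        for j, (xj, _) in enumerate(points):
            if i == j:
                continue
            nxt = [0] * (len(basis) + 1)  # multiply the basis polynomial by (x - xj)
            for k, b in enumerate(basis):
                nxt[k] = (nxt[k] - b * xj) % PRIME
                nxt[k + 1] = (nxt[k + 1] + b) % PRIME
            basis = nxt
            denom = (denom * (xi - xj)) % PRIME
        scale = yi * pow(denom, -1, PRIME) % PRIME
        for k, b in enumerate(basis):
            coeffs[k] = (coeffs[k] + b * scale) % PRIME
    return coeffs


def recover_sec_info(iris, threshold):
    """Rebuild the zero-padded SecInfo from at least `threshold` IRIs."""
    xs = sorted(iris)[:threshold]
    if len(xs) < threshold:
        raise ValueError("not enough IRIs to reach the threshold")
    out = bytearray()
    for k in range(len(iris[xs[0]])):
        points = [(x, iris[x][k]) for x in xs]
        for c in interpolate_coeffs(points):
            out += c.to_bytes(COEFF_LEN, "big")
    return bytes(out)
```

Unlike the subkey sharing, here every coefficient carries part of the payload, so the recovery interpolates the whole polynomial instead of only its constant term. Each dictionary entry corresponds to one IRI: a shared x-value followed by one y-value per polynomial.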

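Algorithm 1: Generation of the digital avatar-i.
Input: the digital avatar-i seed; the code module templates.
Output: the digital avatar-i.
(1)-(2) Initialize the variables.
(3) For each module do
(4)-(5) Select a code module template for the current module according to the seed. Decimal is a function that converts a string to a decimal; num[i] is the number of code module templates available for the i-th module.
(6) Add the selected code module to the digital avatar-i. Combine is a function that splices code modules in order.
(7) End for
(8) Return the digital avatar-i.
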
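One way to picture the seed-driven module selection of Algorithm 1 is the sketch below. It is only an illustration under assumptions: the per-module digest derived with SHA-256, the reduction modulo the number of templates, the newline-based splicing, and the use of SHA-256 for the identifier are all hypothetical choices and not the paper's exact procedure.

```python
import hashlib


def generate_avatar(seed, module_templates):
    """Pick one template per module, driven by the seed, and splice them in order.

    seed: bytes, the digital avatar-i seed.
    module_templates: list of lists; module_templates[i] holds the code
    templates available for the i-th module.
    """
    selected = []
    for i, templates in enumerate(module_templates):
        # Assumption: derive a per-module value from the seed and the module index.
        digest = hashlib.sha256(seed + i.to_bytes(4, "big")).hexdigest()
        # Plays the role of "Decimal" followed by a reduction modulo num[i].
        index = int(digest, 16) % len(templates)
        selected.append(templates[index])
    # Plays the role of "Combine": splice the code modules in order.
    return "\n".join(selected)


def avatar_identifier(avatar_code):
    """Hash of the executable code, used as the identifier (hash choice assumed)."""
    return hashlib.sha256(avatar_code.encode("utf-8")).hexdigest()
```

Because the selection depends only on the seed, the same seed always yields the same combination of modules, while different seeds are likely to yield different ones when enough templates are available.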
4.4. Interaction

After receiving the DA, the NP can access various services provided by Dapps built on DABC through it. At first, the NP activates the DA through live face recognition. And then, the DAI or a random string can be selected by the activated DA as the identifier for the NP to participate in activities in cyberspace. It is worth mentioning that all behaviors of the NP accessing network services will be recorded in the DABC for future audit.

In a word, the main work of this phase is to use DA for authentication and authorization, which requires unlinkable identity, informed consent and the right to be forgotten, etc. However, limited by space, details such as the protocol process, algorithms, and data format will be given in our future work.

4.5. Accountability

When a malicious behavior of a DA occurs, the RAG can discover the mapping NP by inquiring the IIC and finding the metadata of the NP in the DAGG with the joint participation of multiple RAs. Then, the RAG can regulate all the historical behaviors of the malicious NP, as shown in Figure 5.STEP S1All the RAs are monitoring the DAs’ behavior in DABCs, as shown in step ①.STEP S2When the finds a suspicious behavior of a DA, he/she can start the accountability mechanism. First, inquires SI from the IIC with DAI and writes an audit transaction (TA, as shown in (11)) in the IIC, as shown in step ②.In the Equation (11), the Tid is the transaction number of the TA, the TA in the Tin[] is the previous audit transaction, the is the creater and the accepters’ address of this TA, and the is the data to be recorded in the IIC.STEP S3Then, the finds at least IRIs stored in SSs with the SI. And all the obtained IRIs are processed as follows: ① The decomposes each into points, where the structure of the is shown in Figure 4 and the decomposed point set is shown in Equation (12):② Then, substituting the points , , into the Lagrangian interpolation formula (13) to obtain the polynomial ;③ Similar to step ②, the obtains the polynomials , , .After the polynomials are obtained, the is restored by splicing the coefficients of these polynomials in order, as shown in step ③.STEP S4. The initializes an audit request to all other s. Each encrypts his/her by the (14) and sends the to the .

When the required number of RAs have responded, the requesting RA decrypts the received ciphertexts one by one using Equation (15).

Then, the RA constructs the Lagrangian interpolation formula with the recovered subkeys, as shown in Equation (16), to calculate the master key.

After that, the RA decrypts the restored encrypted identity information and recovers the NP’s metadata. At this point, the RA can identify the malicious NP and regulate him/her through the historical behaviors recorded in the DABC, as shown in step ④.
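The threshold recovery used in STEP S4 can be illustrated with a minimal sketch of classic Shamir (t, n) sharing over a prime field. The names `split_secret`, `recover_secret`, and `PRIME` are hypothetical, and the field modulus is an assumption chosen only for illustration; the paper does not state these details. The sketch recovers only the constant term of the polynomial, which mirrors the recovery of the master key from subkeys; the coefficient-splicing variant used for the IRIs in STEP S3 is not shown.

```python
import secrets

# Hypothetical field modulus (assumption): the Mersenne prime 2**127 - 1.
PRIME = 2**127 - 1


def split_secret(secret, n, t, prime=PRIME):
    """Split `secret` into n shares so that any t of them recover it."""
    if not 0 <= secret < prime:
        raise ValueError("secret must lie in [0, prime)")
    # Random polynomial of degree t - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo prime
            y = (y * x + c) % prime
        shares.append((x, y))
    return shares


def recover_secret(shares, prime=PRIME):
    """Lagrange interpolation at x = 0 over GF(prime)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        # pow(den, -1, prime) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret


# Example with the simulation parameters n = 5 and t = 3.
master_key = secrets.randbelow(PRIME)
subkeys = split_secret(master_key, n=5, t=3)
assert recover_secret(subkeys[:3]) == master_key
```

With fewer than t shares the interpolation still returns a value, but that value is unrelated to the secret, which is what limits the power of any single regulatory authority.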

5. Security Analysis

In this section, we discuss the security of the proposed scheme.

5.1. Conditional Anonymity

In this scheme, the metadata of an NP is hidden in the by the and , and entities other than the RAs cannot recover it without the and . The is recorded on the IIC, while the Shamir (t, n) threshold algorithm protects the . Therefore, the scheme realizes anonymity toward all entities other than the RAs. On the other hand, we allow at least RAs to restore the by the Lagrangian interpolation formula in order to reveal the . In short, conditional anonymity is achieved in our scheme.

5.2. Anti-Sybil Attack

In this scheme, each NP needs to digitize himself/herself through the ICVG before applying for a DA: the authenticity of the NP’s metadata is verified by the MV against the NP’s certificates, and the NP’s biometric data is collected by the BC as a proof of unique identity. In addition, the hashes of the metadata and the biometric data (iris) are permanently recorded on the IIC. In this way, each NP is ensured to have one and only one DA, and the sybil attack is avoided.

5.3. Tamper-Proof

During the digitization phase, the NP is required to provide the metadata face-to-face; therefore, tampered metadata submitted by an adversary cannot pass the MV’s verification against the NP’s certificates. Because the consortium blockchain records the proofs of the metadata and biometric data, no one can easily erase the NP’s information. In addition, the DAG calculates a signature to prevent an adversary from tampering with the DA.

5.4. Nonrepudiation

For each DA, the RAG can find the corresponding through the data in the IIC and SSs. Then, the RAG can calculate the with the participation of multiple RAs, and get . Using the known in the IIC, the RAG can obtain the , and track the corresponding NP with it. That is, an NP cannot deny his/her malicious historical behavior.

5.5. Impersonation-Resistance

When accessing a DA, the NP’s biometric data needs to be verified by the DA in advance. In addition, the DA includes a dynamic verification module that will issue dynamic verification requests to the NP from time to time. Once the NP fails to pass the verification, the DA will be locked. Therefore, even if an adversary obtains a DA that does not belong to him/her, he/she cannot use it to participate in network activities.
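The locking behavior described above can be sketched as a small state machine. This is only an illustration under stated assumptions: the class `AvatarSession`, its methods, and the `verify` callback are hypothetical names, the biometric matcher is abstracted as a boolean function, and the choice to lock only on a failed dynamic challenge (rather than on a failed activation) is an assumption, since the text does not specify it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AvatarSession:
    """Hypothetical model of a DA's activation and lock state."""

    verify: Callable[[bytes], bool]  # assumed biometric matcher inside the DA
    active: bool = False
    locked: bool = False

    def activate(self, sample: bytes) -> bool:
        """Initial live verification before the DA can be used."""
        if self.locked:
            return False
        self.active = self.verify(sample)
        return self.active

    def challenge(self, sample: bytes) -> bool:
        """Dynamic re-verification issued from time to time during use."""
        if self.locked or not self.active:
            return False
        if not self.verify(sample):
            # A failed dynamic verification locks the DA.
            self.active = False
            self.locked = True
            return False
        return True

    def act(self, request: str) -> str:
        """Perform a network activity; refused unless activated and unlocked."""
        if self.locked or not self.active:
            raise PermissionError("DA is locked or not activated")
        return f"performed: {request}"
```

For example, a session activated by its owner keeps working while challenges succeed, and any later challenge answered by someone else locks it, after which `act` raises `PermissionError`. How a locked DA is unlocked is not described in the text and is therefore not modeled.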

5.6. Data Security

The metadata of an NP and the hash of a DA are first encrypted, then divided into parts, and finally converted into points by the Shamir (t, n) threshold algorithm. In this way, the cost for an adversary to obtain an NP’s information is greatly increased. Further, the information transmitted between different entities, such as subkeys, is protected by an asymmetric encryption algorithm. Therefore, the scheme guarantees the security of the data.

5.7. Provable Security

The proposed scheme is based on the advanced encryption standard (denoted as AES), elliptic curve cryptography (denoted as ECC), and the Shamir (t, n) threshold algorithm (denoted as SHAMIR). Based on the security properties of each module, we show that our scheme can resist sybil attacks by adversaries and prevent malicious users from evading sanctions.

Theorem 1. If the ECC algorithm satisfies the basic security properties, then the scheme in this paper is sybil-resistant.

Proof 1. Define as an adversary attacking the security of the ECC algorithm. Assuming successfully carried out a sybil attack, a polynomial-time algorithm is defined, where the has the ability to attack the ECC algorithm. Through the interaction between and in the simulated sybil attack game, is optimized repeatedly to successfully attack the ECC algorithm. That is, if the adversary creates a sybil identity successfully in this scheme, it means successfully attacks the security of the ECC algorithm with a certain probability. According to the steps defined above, the interactions between the algorithm and adversary are as follows:

STEP 1. Initialization phase: Through the public parameters generated by in the IIC Initialization phase, the algorithm generates the , , and , and sends them to the adversary .

STEP 2. Challenge phase: The adversary first digitizes the identity information (including metadata and biometrics) as described in Section 4.2. Then, the algorithm executes algorithm to calculate the encrypted biometrics containing a randomly chosen {}, and generates the corresponding signature . Finally, sends the signature to the adversary .

STEP 3. Verification phase: The adversary verifies the signature with the and outputs the . If the equation is satisfied, it indicates that the adversary successfully implemented the sybil attack. The probability of success for the adversary is:

If an attacker can successfully attack the ECC algorithm, can carry out the sybil attack successfully. However, the probability of successfully attacking the ECC algorithm is almost , then wins in the sybil attack game of the NSSIA scheme with a probability of . But, according to the assumption about the security of the ECC algorithm, the probability of successfully attacking can be ignored. Therefore, the scheme can resist the sybil attack.

Theorem 2. If the AES and SHAMIR algorithms satisfy the basic security features, then the NSSIA can prevent malicious users from evading sanctions.

Proof 2. Define as an adversary who attacks the security of the AES algorithm, as an adversary attacking the security of the SHAMIR algorithm. Assuming the successfully hampered joint regulation of the malicious NP, a polynomial-time algorithm is defined, where the has the ability to attack the AES and SHAMIR algorithms. Through the query of and ’s interaction in the sanctions evasion game, is optimized repeatedly to successfully attack the AES and SHAMIR algorithms. That is, if the adversary gets rid of sanctions successfully in the scheme, it means successfully attacks the security of the AES and SHAMIR algorithms with a certain probability. According to the steps defined above, the interactions between the algorithm and adversary are as follows:

STEP 1. Initialization phase: Through the public parameters generated by and in the IIC Initialization phase, the algorithm generates the master key and subkeys (, ). Then, sends the public parameters to the adversary .

STEP 2. Inquiry phase: The adversary can query the algorithm for polynomial time:
(1) Generate the encrypted identity information : generates the encrypted identity information which contains a randomly selected {} by the AES algorithm.
(2) Generate multiple pieces of identity restoration information (IRIs): Based on the , generates multiple IRIs by the SHAMIR algorithm, and sends these IRIs to the adversary .

STEP 3. Verification phase: The adversary decrypts the restored by IRIs and outputs the using the AES and SHAMIR algorithms. If exists, it represents that the adversary successfully evades sanctions. The probability of success for the adversary is:

If an attacker successfully attacks the AES algorithm, and an attacker can successfully attack the SHAMIR algorithm, the adversary can successfully hamper joint regulation of the malicious NP. However, the probability of and successfully attacking the AES and SHAMIR algorithms is almost respectively, then wins in the sanctions evasion game of NSSIA scheme with a probability of . But, according to the assumptions that AES algorithm and SHAMIR algorithm satisfy the basic security properties, it is concluded that the probability of successfully attacking can be ignored. As a result, the scheme can prevent malicious users from evading sanctions through joint accountability.

6. Performance Analysis

This section analyzes the cost of our proposed NSSIA and compares it with the above schemes [17, 22, 23, 26, 28] in terms of the SSI property in identity generation, computation cost, storage cost, and blockchain Gas cost.

6.1. Property Analysis in Identity Generation

Christopher Allen [7] proposed ten principles to define an SSI model: Existence, Control, Access, Transparency, Persistence, Portability, Interoperability, Consent, Minimalization, and Protection. Allen’s insights on SSI lay the foundation for later research.

As the shortcomings of the centralized identity model have been revealed in recent years, more and more scholars have devoted themselves to research on SSI; representative work can be found in [1, 38]. Notably, before this article was written, Ferdous et al. [38] had introduced in detail the insights of various scholars on SSI and put forward their own views on its properties. They divided the properties of self-sovereign identity into five categories, with a total of seventeen properties. Mühle et al. [1] analyzed the work of Christopher Allen and then studied four basic components to gain a deeper understanding of the concept of SSI.

As identity generation is the crucial step for users to enter cyberspace, and in order to effectively guarantee the users’ self-sovereignty over identities and maintain order in cyberspace, we believe that the first step in the generation phase is to ensure that each DA has a corresponding NP, so as to avoid false identities. In addition, it is also crucial to ensure that NPs fully control their own DAs and to protect the privacy of users. Finally, the DA should be user friendly; for example, while complying with regulations, the DA should remain usable for as long as possible, and the NP should be able to migrate DA-related data between different devices. Therefore, we select the six applicable properties among the seventeen proposed by Ferdous et al. [38], as shown in Figure 6, and conduct the analysis. These properties are described next.

(1) Existence. Digital identities should be strictly verified before registration to ensure that each digital identity has a corresponding physical entity.
(2) Ownership. Digital identities can only be held and controlled by users.
(3) Protection. The registration of digital identities should protect user privacy and avoid linking identities between the physical world and cyberspace. At the same time, the design of the SSI model should prevent fraudulent use of an identity by others.
(4) Persistence. The digital identity should exist forever unless the user takes the initiative to revoke it.
(5) Portability. When the user’s device is replaced or the system’s infrastructure is updated, the user’s data can be easily transferred to the new device.
(6) Standard. The SSI model should comply with the laws and regulations of various countries and with international standards, such as GDPR and DID.

Next, we compare our scheme with the previously mentioned schemes on these properties, as shown in Table 2.

From Table 2, the scheme in [17] satisfies the “Ownership” and “Persistence” properties and partially meets the “Protection,” “Portability,” and “Standard” properties; however, the “Existence” property is unsatisfied. The scheme in [22] fulfills the “Ownership” and “Persistence” properties and partially fulfills the “Protection” and “Standard” properties, while the “Existence” property is unsatisfied and the rest are uncertain. The scheme in [23] satisfies the “Existence” and “Protection” properties and partially satisfies the “Ownership” property, but the “Standard” property is unsatisfied and the others are uncertain. The scheme in [26] satisfies the “Existence,” “Ownership,” and “Persistence” properties and partially meets the “Protection,” “Portability,” and “Standard” properties. Maram et al. [28] fulfill the “Ownership” and “Persistence” properties and partially fulfill the “Existence,” “Protection,” and “Standard” properties, but the rest are uncertain. In contrast, all six properties are realized in our scheme.

6.2. Computation Cost
6.2.1. The NSSIA

Our scheme includes five phases, namely, initialization, digitization, generation, interaction, and accountability. Since the initialization phase is executed only once and the interaction phase is not the focus of this paper, we do not evaluate these two phases and concentrate on the other three. Our scheme is run on a PC with Windows 10, an Intel(R) Core(TM) i5-1035G1 CPU @ 1.00 GHz, and 16 GB of RAM. We use Java 8.0 and Python 3.9 to evaluate the computation cost, where we choose the SHA-1 algorithm, the AES-128 algorithm, and the Secp256k1 elliptic curve. In order to balance security and computational cost, the and , which are the numbers of SSs and RAs, respectively, are set to 5 in this simulation, and the corresponding and are 3. In addition, the length of the is 17 B. The , , and are 256 B, 1 KB, 20, and 5 B, respectively. That is, the length of is B. According to the data in [39], the lengths of the iris and face data are 25 KB and 30 KB, respectively.

To elaborate the computation cost clearly, the unit costs of the operations are defined as follows:
(1) SHA-1 operation: 0.0169 ms, 0.0353 ms, and 0.0748 ms when the parameter sizes are 256 B, 1 KB, and 25 KB, respectively.
(2) AES-128 encryption/decryption algorithm: 0.4186 ms when the parameter size is 256 B.
(3) ECC encryption algorithm: 0.0015 ms and 0.0024 ms when the parameter sizes are 17 B and 30 KB, respectively.
(4) ECC decryption algorithm: 0.034 ms and 0.0401 ms when the parameter sizes are 17 B and 30 KB, respectively.
(5) ECDSA signature algorithm: 0.0014 ms and 0.0283 ms when the parameter sizes are 1 KB and 30 KB, respectively.
(6) ECDSA verification algorithm: 0.0007 ms and 0.001 ms when the parameter sizes are 1 KB and 30 KB, respectively.
(7) Lagrangian interpolation algorithm: 0.0101 ms when the threshold value is 3.

Since the time cost of XOR operation, split operation, and splicing operation is negligible, it is not taken into consideration.

The computation cost of different phases in our scheme is shown in Table 3.

In Table 3, three SHA-1 operations, one ECDSA signature algorithm, and one ECC encryption algorithm are performed in the Digitization phase and the time cost is ms. Two SHA-1 operations, two ECC encryption algorithms, one ECDSA verification algorithm, one ECDSA signature algorithm, one Lagrangian interpolation algorithm, one AES encryption, and three ECC decryption algorithms are performed in the Generation phase and the time cost is ms. The Accountability phase consists of two ECC encryption algorithms, two ECC decryption algorithms, twenty-one Lagrangian interpolation algorithms, and one AES decryption algorithm and the time cost is ms. To generate a unique DA for each NP, the total ms is used.
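As a rough cross-check of how the per-phase totals are composed, the following sketch combines the operation counts stated above with the measured unit costs. Because the text does not say which parameter size applies to each individual operation, the sketch computes only a lower and an upper bound per phase, under the assumption that every operation runs at one of the listed sizes. The names `UNIT_COST_MS`, `PHASES`, and `cost_bounds` are hypothetical, and the resulting bounds are derived here for illustration; they are not the values reported in Table 3.

```python
# Unit costs in ms as listed in Section 6.2.1, one entry per measured size.
UNIT_COST_MS = {
    "sha1": (0.0169, 0.0353, 0.0748),
    "aes": (0.4186,),
    "ecc_enc": (0.0015, 0.0024),
    "ecc_dec": (0.034, 0.0401),
    "ecdsa_sign": (0.0014, 0.0283),
    "ecdsa_verify": (0.0007, 0.001),
    "lagrange": (0.0101,),
}

# Operation counts per phase, as stated in the text.
PHASES = {
    "digitization": {"sha1": 3, "ecdsa_sign": 1, "ecc_enc": 1},
    "generation": {
        "sha1": 2, "ecc_enc": 2, "ecdsa_verify": 1, "ecdsa_sign": 1,
        "lagrange": 1, "aes": 1, "ecc_dec": 3,
    },
    "accountability": {"ecc_enc": 2, "ecc_dec": 2, "lagrange": 21, "aes": 1},
}


def cost_bounds(counts):
    """Return (lower, upper) bounds in ms for a phase.

    Assumption: each operation costs between the cheapest and the most
    expensive measurement listed for it.
    """
    lower = sum(n * min(UNIT_COST_MS[op]) for op, n in counts.items())
    upper = sum(n * max(UNIT_COST_MS[op]) for op, n in counts.items())
    return lower, upper


for phase, counts in PHASES.items():
    lower, upper = cost_bounds(counts)
    print(f"{phase}: between {lower:.4f} ms and {upper:.4f} ms")
```

The bounds show that, under this assumption, every phase stays below one millisecond, with the single AES operation dominating the generation and accountability phases.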

6.2.2. Computation Cost Comparison

We compare with other schemes in terms of the generation phase and the accountability phase, as shown in Table 4.

In Table 4, one PBKDF2 algorithm, one SHA-256 operation, and two AES-256 encryption algorithms are performed in the generation phase in [17], and the time cost is ms. In [22], one SHA-1 operation, one ECDSA signature algorithm, and one zk-SNARK algorithm are performed in the generation phase, and the time cost is ms. Zheng et al. [23] perform two AES-128 encryption algorithms, three ECC encryption algorithms, three ECC decryption algorithms, three ECDSA signature algorithms, and three ECDSA verification algorithms in the generation phase, and the time cost is ms. In [26], one base58 encoding algorithm and two RSA signature algorithms are performed in the generation phase, and the time cost is ms. One oracle operation and one ZKP operation are performed in the generation phase in [28], and the time cost is ms. As shown in Section 6.2.1, the computation cost of the generation phase in our scheme is ms.

For the accountability phase, since no corresponding mechanism is designed in [17, 22, 23, 26], their audit cost cannot be given. In [28], one secure multiparty computation operation is performed in the accountability phase, and the audit cost is ms. As shown in Section 6.2.1, the accountability cost in our scheme is ms. From Table 4, our scheme has the lowest time overhead in both the generation phase and the accountability phase.

6.3. Storage Cost

We compare the storage cost with other schemes from the perspective of users, servers, and blockchain. The comparison result is shown in Table 5.

As shown in Table 5, users in [17] need to store a master key and a corresponding derived key, as well as the encrypted personal identity information, which total 168 bytes. Users in [22] need to store a random value key for the hash algorithm, and the storage cost is estimated to be bytes. In [23], three pairs of public and private keys, as well as a password used for the symmetric encryption algorithm, are stored locally by the user, which total 304 bytes. Users in [26] need to store a private key for the signature algorithm and personal information (name, DID, photo, etc.) locally, which exceeds 10670 bytes. In [28], a credential containing the user’s information is stored locally, and the storage cost is estimated to be over 150 bytes. In our scheme, by contrast, with the help of biometrics, users do not need to store any data such as keys locally.

For the server, an estimated cost cannot be given for [26, 28], because they provide no detailed description. In [17], a pair of public and private keys encrypted by the AES-256 algorithm needs to be stored on the server, and the storage cost is 108 bytes. In [22], the user’s identity information and related certificates are stored on the server, which total 198 bytes. The server in [23] needs to store the user’s information and the certificate containing the user’s phone number, photo, and so on, and the storage cost exceeds 10240 bytes. In our scheme, the storage cost is 505 bytes, because we divide the encrypted user information into multiple pieces based on the Shamir (t, n) threshold algorithm and store them on different servers, so that malicious users can be audited without revealing user privacy.

The last is the storage cost of each scheme on the blockchain. Since there is no evidence that a blockchain is deployed in [28], an estimated overhead cannot be given. In [17], a public part of the claim used to prove the identity is written into the blockchain, and the cost exceeds 200 bytes. In [22], the hash value of the user’s identity information and the corresponding certificate are recorded in the blockchain, which is 102 bytes. In [23], the hash value of the user’s identity information is written into the blockchain, and the storage cost is 20 bytes. A DID proof with the DID, name, signature, etc., is recorded in the blockchain in [26], which is 800 bytes. In our scheme, the hash values of the user’s metadata, biometrics, and digital identity, as well as the timestamp, are recorded on the blockchain, which is 94 bytes. From Table 5, we achieve a lower storage cost on the blockchain than the schemes in [17, 22, 26]. Although Zheng et al. [23] have a lower overhead than ours, the data recorded on the blockchain in our scheme shows the entire process of identity generation and accountability more intuitively. In addition, the data is written to the blockchain by different entities, which decentralizes the power of regulatory authorities and reduces the risk of information leakage compared to [23]. Besides, with a storage requirement of 94 bytes per person, even if the identity information of about seven billion people around the world needs to be stored, the required storage space does not exceed 0.7 TB, which is within the acceptable range for practical use. In short, we free users from local storage, while the servers and the blockchain bear the storage needed to balance privacy and accountability, which is low enough for practical scenarios.
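The 0.7 TB estimate follows directly from the two figures given in the text, as the short calculation below shows. The variable names are hypothetical; both the decimal (TB) and binary (TiB) conversions are shown because the text does not say which one is meant.

```python
BYTES_PER_PERSON = 94        # on-chain storage per identity, from the text
POPULATION = 7_000_000_000   # "about seven billion people"

total_bytes = BYTES_PER_PERSON * POPULATION
total_tb = total_bytes / 10**12   # decimal terabytes
total_tib = total_bytes / 2**40   # binary tebibytes

print(f"{total_bytes} bytes = {total_tb:.3f} TB = {total_tib:.3f} TiB")
```

Under either convention the total stays below 0.7 TB, which is consistent with the bound stated above.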

6.4. Blockchain Gas Cost

Since Gas cost is an important measure of performance, we conduct detailed experiments in this regard. To evaluate the execution cost of our smart contract, we measure its practical performance on a public Ethereum testnet (Rinkeby). We used the MetaMask plugin in the Chrome v100.0 browser to access the Rinkeby testnet, and Remix, a browser-based IDE, to compile and deploy our smart contract. Rinkeby was built in April 2017 by the Ethereum Foundation and uses the proof-of-authority consensus mechanism. Since the ether supply is controlled by several trusted parties and only they can write transactions on the blockchain, it can be considered a consortium blockchain. Hence, the waiting time for a transaction to be confirmed is short enough to be ignored.

Writing and reading are the main interactions between entities and the consortium blockchain. Therefore, we record the proof data by deploying a smart contract on Rinkeby and count the Gas spent on contract deployment and invocation. Since Maram et al. [28] have no blockchain deployed, we compare our scheme with the SSI schemes in [17, 22, 23, 26] based on the data in Section 6.3, as shown in Figure 7.

As we can see from Figure 7, during the generation phase, more than 200 bytes of data are recorded on the blockchain in [17], which costs approximately 25643 Gas. In [22], a total of 102 bytes of data are written to the blockchain, and the cost is 24065 Gas. Zheng et al. [23] record 20 bytes of certificates on the blockchain, costing 22679 Gas. The cost of calling the contract to write 800 bytes of identity proof in [26] is 35268 Gas. In our NSSIA, the contract is deployed in the initialization phase (Section 4.1) at a cost of 176335 Gas. Unlike the above schemes, where only one entity interacts with the blockchain, in our scheme different entities are responsible for interacting with the blockchain at the different stages of identity generation described in Section 4. Concretely, in the digitization stage, the metadata verifier (MV) and the biometric collector (BC) record the 20-byte metadata proof and the 20-byte biometric proof, respectively, into the blockchain, each with an overhead of 22679 Gas. In the generation stage, the proof of the generated digital avatar-i (DA) is written into the blockchain by the digital avatar-i generator (DAG), which also costs 22679 Gas. Although the total cost is Gas, which is greater than that of the other schemes, the Gas cost borne by each individual entity (22679 Gas) matches the lowest cost among the compared schemes, because the total is spread over different entities. In addition, we greatly reduce the risk of centralization of power compared to the other schemes.

In terms of accountability, the overhead of the above schemes is 0 because they lack an accountability mechanism. In our scheme, 34 bytes of log information are recorded by the RA on the blockchain, and the cost is 22981 Gas, which is close to the lowest per-transaction overhead observed in the generation phase. All in all, in both the generation stage and the accountability stage, the Gas cost of our scheme is not prohibitive for practical use. Furthermore, the decentralization of the regulatory authorities’ power guarantees fair audits and protects user privacy.

7. Conclusion and Future Work

A new self-sovereign identity scheme with accountability is proposed in this paper, in which executable code, referred to as the digital avatar-i (DA), is introduced to allow each user to independently control his/her own identity, and malicious users can be fairly regulated without violating the privacy of legitimate ones. Concretely, one and only one individual-specific executable code is generated for each user to interact with others in the metaverse without a third-party program, and biometrics are integrated into the code to enhance uniqueness and user control. The hash of the individual-specific executable code is used as an identifier, and each user can store, read, and prove identities with service providers through his/her own local executable code. Furthermore, a joint accountability mechanism is introduced to balance privacy and accountability, in which the Shamir (t, n) threshold algorithm is used to decentralize the power of each regulatory authority and hide users’ real-world information, and impartial audits are further guaranteed by a consortium blockchain. The security analysis shows that our NSSIA can resist multiple security threats, such as sybil attacks and impersonation attacks. The analysis of SSI properties shows that our scheme satisfies all six SSI properties in the identity generation phase. Compared with the state-of-the-art schemes, the extensive experimental results indicate that the overhead of our NSSIA is acceptable for practical use.

For future work, we will address the difficulties that arise in the use of the DA, such as unlinkability and the right to be forgotten, and present the full design of Section 4.4. Meanwhile, striking a balance between privacy and accountability when using the DA to interact with others in cyberspace is also a focus of our research.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest in this work.

Acknowledgments

This work was supported by the Natural Science Foundation of Zhejiang Province (Grant no. LQ20F020019) and the Foundation of Science and Technology on Communication Security Laboratory (Grant no. 6142103190105).