Abstract

Online advertising, which depends on consumers’ clicks, creates revenue for media sites, publishers, and advertisers. However, click fraud by criminals, in which an ad is clicked either by malicious machines or by hired people, threatens this advertising system. To solve the problem, many schemes have been proposed, mainly based on machine learning or statistical analysis. Although these schemes mitigate click fraud, several problems remain. First, some fraudulent clicks still go undetected, since these schemes identify a fraudulent click only with a probability that approaches but never reaches 100%. Second, the detection process is executed by a single publisher, which gives the publisher a chance to obtain illegal income by deceiving advertisers and media sites. Third, the identity privacy of consumers is exposed because the schemes process consumers’ real identities in plaintext. Therefore, in this paper, a blockchain-based click fraud detection and prevention scheme (BCFDPS) for online advertising is proposed to deal with the above problems. Specifically, BCFDPS introduces bilinear pairing to implicitly verify whether a consumer’s real digital identity is contained in a click message, which significantly reduces click fraud, and employs a consortium blockchain to ensure the transparency of the detection and prevention process. In our scheme, clicks by machines and fraudulent clicks by humans can be accurately detected and prevented by media sites, publishers, and advertisers. Furthermore, ciphertext-policy attribute-based encryption is adopted to protect the identity privacy of consumers.
The implementation and evaluation results show that, compared with the existing click fraud detection and prevention schemes based on machine learning and statistical analysis, BCFDPS detects each fraudulent click with a probability of 100% and incurs a lower computation cost. Furthermore, compared with the existing blockchain-based online advertising scheme, BCFDPS adds consumer privacy protection and click fraud detection and prevention while introducing only limited communication cost ( bytes) at a lower storage cost.

1. Introduction

Nowadays, cost-per-click (CPC) is by far the most popular model used in online advertising [1]. An online advertising system mainly includes four entities [24], namely, consumers (Us), advertisers (ADEs), publishers (PUBs), and media sites (MSs). An ad promotion process includes seven steps, such as publishing, clicking, and paying, as shown in Figure 1. The ADE’s ad is published by PUB to U on the website of MS, as shown in steps 1–3 in Figure 1. A click is counted when a U clicks on the ad, as shown in steps 4-5 in Figure 1. Then, ADE pays advertising promotion fees to PUB for these clicks, and PUB in turn pays advertising click fees to MS, as shown in steps 6-7 in Figure 1. There are mainly two types of implementations of online advertising systems to publish ads. The first type is the traditional online advertising system (Google, Twitter, etc.), which mainly relies on centralized servers. The second type, inspired by the tamper-proof and decentralized characteristics of blockchain, is the blockchain-based advertising system [5, 6], which is implemented to achieve the transparency of the advertising business.

For higher revenue in both types of online advertising systems, an ad is published to a targeted U who is a potential consumer, which is called ad precision targeting. Attackers have designed two main types of click fraud in these systems to gain extra illegal revenue. The first type generates repeated click messages by machines. In detail, malicious advertisers use web crawlers, botnets, proxy servers, etc. to click an ad by machines [7–9] in order to exhaust their competitors’ budgets. The second type is click fraud by humans. Specifically, malicious publishers and media sites recruit many people to click the same ad frame [10] or abuse the click history of legitimate users [11] to charge advertisers more ad promotion fees [4, 12]. Thus, these fake clicks consume additional advertiser budget but do not create any revenue [13, 14], which undoubtedly disrupts the order of the advertising system.

Therefore, many click fraud detection and prevention schemes have been proposed to predict the authenticity of each click and to maintain the stability of the advertising system. According to the technology adopted, these schemes can be classified into two categories: machine learning-based schemes and statistical analysis-based schemes. Machine learning-based schemes [4, 7–9, 13, 15–22] utilize machine learning algorithms to train models that judge, from massive click traffic, whether a new click is fraudulent. For example, in the scheme of [7], a machine learning algorithm based on convolutional neural networks and a decision tree is designed to construct a classifier that distinguishes whether a click message is generated by a machine or a human being according to the sensors of the mobile device. However, the training dataset in these schemes is easily contaminated with fraudulent clicks, which makes the training process susceptible to adversarial attacks [23]. Hence, statistical analysis-based schemes [1, 10, 24–32] aim to mitigate the adversarial attacks. For instance, the schemes in [10, 25] predict malicious crowdsourcing platforms by clustering algorithms. Xu and Li [25] used the DP-means clustering method to predict malicious groups, while Tian et al. [10], inspired by the DP-means clustering method, proposed a non-parametric method to solve the problem of malicious coalition fraud. Although they prevent the fraud of short-term malicious crowdsourcing platforms, their approaches are insufficient for multiple traffic flows with long fraud intervals.

Apart from this, in both categories of schemes, some click fraud still goes undetected, since they predict click fraud only with a probability of less than 100%. Also, the click fraud detection and prevention process is not transparent, since these algorithms are implemented only within a single central agency (the publisher). That is, the publisher could gain illegal income by misreporting the number of real clicks. Moreover, U’s privacy is leaked, since these schemes analyze some of U’s original identity information, such as the username and phone number.

Recently, as a tamper-proof and distributed technology, blockchain has attracted the attention of online advertising systems to significantly increase the trust between consumers and advertisers without additional costs and intermediaries. Specifically, by using a distributed ledger, the data related to the delivery of ads, clicks, and the analysis result of the real click number are all stored in the blockchain, which can be audited and verified by everyone [33]. On the other hand, people’s activities in physical space have been transferred to cyberspace increasingly. To build a better cyberspace, a digital identity that maps one-to-one with a physical identity in cyberspace is becoming the focus of the future. To this end, many digital identity infrastructures [34–38] have emerged to better manage user behavior in cyberspace.

Therefore, taking the above merits and problems into account, we introduce the blockchain and the existing digital identity infrastructures to detect and prevent fraudulent clicks for an online advertising system. The main contributions of this paper are summarized as follows.
(1) We propose a blockchain-based click fraud detection and prevention scheme (BCFDPS) for online advertising, which significantly reduces clicks by machines and increases the cost of fraudulent clicks by humans.
(2) Whether a click is fraudulent can be confirmed directly in our scheme rather than predicted with a probability of less than 100%. A consumer’s digital identity, which maps uniquely to one person in the physical world, is embedded in a click message. That is, a fraudulent click does not contain a legitimate digital identity, and many duplicate clicks contain only the same digital identity.
(3) When negotiating the ad billing fee between entities, the problem of media sites and publishers tampering with the real number of clicks is solved, because the click count in the click fraud detection and prevention process is made transparent by introducing a consortium blockchain. Specifically, the analysis result of the clicks is periodically recorded by publishers. The media sites and the advertisers can also verify the result independently through the data in the blockchain.
(4) The risk of adversaries learning consumers’ identity privacy is alleviated by the bilinear pairing and ciphertext-policy attribute-based encryption, without excessively affecting the publisher’s ability to target ads accurately to consumers.

The rest of this article is organized as follows. Related works are discussed in Section 2. Section 3 reviews some preliminaries. Section 4 formulates the problem being addressed. Section 5 describes the proposed BCFDPS in detail. Security is analyzed in Section 6, and an experiment is designed and implemented in Section 7, followed by discussion and conclusion in Sections 8 and 9, respectively.

2.1. Machine Learning-Based Schemes

Machine learning-based schemes are widely used in advertising fraud detection scenarios with large amounts of click data. User’s click features are first extracted, then these features are used to train a model with the training dataset, and further the trained model is evaluated with the test dataset [13]. Oentaryo et al. [15] and Kanei et al. [4] mainly utilized random forest to detect click fraud in online advertising systems. But they are unable to catch coalition attacks involving multiple fraudulent approaches. Then, Wang et al. [16] presented CLUE in 2017, a novel recurrent neural network (RNN)-based online e-commerce transaction fraud detection system, and they deployed the CLUE on JD.com, serving over 220 million active users, to achieve real-time detection of fraudulent transactions. However, the CLUE in [16] will face gradient vanishing or gradient exploding problems when the click traffic is too complicated, leading to a poor fraud detection model. In 2018, Haider et al. [18] used two ensemble learning techniques, bagging and boosting algorithms, to train a model to detect and prevent click fraud. In 2019, support vector machine (SVM), K-nearest neighbor (KNN), AdaBoost, decision tree, and bagging were evaluated to detect a click by Almahmoud et al. [8]. In 2020, gradient tree boosting (GTB) algorithm was used to address the challenges encountered in effectively classifying fraudulent publishers [19]. Nevertheless, the user’s identity privacy is exposed in [8, 18, 19] since they used the original identity information of consumers, such as the real username and the address of the visitor, to train models. Then, in 2021, two XGBoost-based schemes [21, 22] were proposed for click fraud detection, but both of them require manual classification of a large amount of click traffic in advance, which is time consuming. 
Apart from the problems mentioned above, according to the paper of Mikhailov and Trusov [23], all the schemes in this category are prone to adversarial attacks, for which they need a large number of samples as input to train the model. Also, these machine learning-based schemes can only determine whether the click is fraudulent with a probability approaching 100%.

2.2. Statistical Analysis-Based Schemes

Graph-based propagation approaches were first proposed in [24, 27, 29] to analyze the advertising traffic. The main idea of Stitelman et al. [24] is to use the co-visitation network between websites to identify media sites with a large amount of fraudulent traffic, but this approach relies on the fact that the experts have informed views about which websites look reasonable and which do not. Since it is difficult to collect all the users’ data in a co-visitation network, Hu et al. [27] analyzed the behavior characteristic of individual mobile advertising user and then reduced malicious user clicks. As a specific deployment of the idea in [27], Dong et al. [29] proposed FraudDroid, a novel hybrid approach, to detect ad frauds in mobile Android apps. It dynamically analyzes applications to build UI state transition graphs, collects their associated runtime network traffic, and then uses it to identify advertising fraud.

Additionally, a pattern-based click fraud detection scheme for mobile applications [32] was designed, and it mainly has two components: offline pattern extractor and online fraud detector. The extractor is responsible for extracting traffic patterns for ad and non-ad traffics, and the detector is in charge of monitoring network traffic and detecting click fraud with the traffic patterns. But the schemes in [29, 32] may fail to handle subsequent variant click fraud [39].

Different from the above graph-based and pattern-based analysis schemes, three contextual-based attributes concerning interarrival time (IAT), diurnal activity (DA), and eigenscore (ES) were analyzed in comparison-shopping services to calculate a click’s credible score for detecting whether it is fake in [26]. Moreover, Meghanath et al. [28] proposed a new contextual outlier detection technology (ConOut) and applied it to the advertising domain to identify fraudulent publishers. Besides, the work in [30] presents Bag-of-Words algorithm to assess clicks in online advertising system, which is based on the concept of text search methods. However, the outlier detection technology in [28] and the Bag-of-Words algorithm in [30] involve users’ real username and address of the visitor, which reveals user’s identity privacy.

In 2019, a novel inference technique (Clicktok) was developed to isolate click fraud attacks in [31]. Clicktok analyzes the traffic matrix, including matrix decomposition and construction, to propose two defenses, mimicry and bait-click. The mimicry isolates click spam by observing the reuse pattern of legitimate click traffic, and the ad network uses bait-clicks to watermark the channel periodically, which sets off watermark detectors when an attacker harvests and reuses a legitimate clickstream in the channel. But the Clicktok does not have a good ability to prevent the new types of click fraud whose traffic matrix is similar to the one of a normal click. On the other hand, these statistical analysis-based methods are deployed in publishers’ devices and they do not achieve the transparency of the click fraud detection and prevention process for each entity in the advertising system. In addition, the statistical analysis-based schemes leak user’s identity privacy since they analyze the original data which includes user’s real username, address of the user, etc.

2.3. Blockchain-Based Advertising System

Recently, blockchain has been widely adopted in many disciplines owing to its trusted computing model and open nature. Therefore, blockchain-based advertising systems [5, 6] are proposed to provide trust between entities in advertising business. The scheme of Liu et al. [5] develops transparent and accountable vehicular local advertising (TAVLA) by utilizing the message digest, multi-party verification, and smart contract of blockchain. Specifically, the hash of the advertising information database is stored in the blockchain, and the code of the advertising query functions is stored off-chain. A vehicle user first requests the ad off-chain, and then the off-chain query result will be verified and assembled in the blockchain smart contract, and finally, the smart contract sends the result to the user. However, the communication data in this scheme is in plaintext, which is not secure for online advertising systems. Moreover, to improve the low trust caused by the click fraud in the online advertising system, Ding et al. [6] designed and implemented a blockchain-based digital advertising media system (B2DAM) and deployed the business logic based on smart contracts and Hyperledger SDK. But the communication cost between multiple blockchains in their scheme is expensive, as they read and write messages too many times on the blockchains.

All in all, in the above schemes, all real-time interactions between entities are settled on the blockchain, and each ad delivery and click is recorded on the blockchain, so the throughput can hardly meet the high concurrency of an advertising system. For this reason, our scheme records only the analysis results of the clicks on the blockchain, and does so periodically.

3. Preliminaries

3.1. Blockchain

Blockchain records all the transactions which are generated in a peer-to-peer network, and it is actually a decentralized ledger system. In the system, all the blocks include the hash of the previous block; in this way, they are linked together by the hash, and a blockchain is formed. According to the decreasing order of decentralization, the blockchain consists of three categories: public blockchain, consortium blockchain, and private blockchain [40, 41]. The public blockchain is open to all nodes, and everyone can read and write data on it. The consortium blockchain is partially open since it is managed by several organizations and only the authenticated members can access and record data on it. Also, the private blockchain is considered to be centralized as it is fully controlled by a single enterprise or organization. Considering that several enterprises and organizations are included in the advertising system, the consortium blockchain is adopted in our scheme.
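The hash linking described above can be illustrated with a minimal sketch. It is not the ledger used in the scheme: the block fields (`index`, `prev_hash`, `transactions`) and the helper names are assumptions chosen for illustration, and consensus, signatures, and membership control are omitted.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's contents deterministically with SHA-256."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: list, transactions: list) -> dict:
    """Append a block that stores the hash of the previous block."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    block = {"index": len(chain), "prev_hash": prev_hash, "transactions": transactions}
    chain.append(block)
    return block


def is_valid(chain: list) -> bool:
    """Check that every block references the hash of its predecessor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1]) for i in range(1, len(chain))
    )


chain = []
append_block(chain, ["genesis"])
append_block(chain, ["click summary, period 1"])  # hypothetical payloads
append_block(chain, ["click summary, period 2"])
```

Because each block embeds the hash of its predecessor, changing the contents of an earlier block invalidates every later link, which is what makes recorded results auditable.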

3.2. Shamir (t, n) Threshold Secret Sharing Algorithm

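A threshold secret sharing algorithm [42] was proposed by Shamir in 1979 to share a master secret in a safe way. In this algorithm, a trusted center (TC) splits the master secret $s$ into $n$ sub-secrets and then distributes them to $n$ participants $P_1, \ldots, P_n$. The master secret cannot be reconstructed with fewer than $t$ sub-secrets, and the specific steps are described as follows. Firstly, a random $(t-1)$-th degree polynomial $f(x) = a_0 + a_1 x + \cdots + a_{t-1} x^{t-1} \bmod p$ is generated by the TC, where $p$ is a large prime and the master secret is $s = a_0 = f(0)$. Then, the TC calculates the sub-secrets $s_i = f(x_i)$ for distinct nonzero public values $x_i$ and allocates $s_i$ to $P_i$ secretly ($i = 1, \ldots, n$). Next, when $P_i$ receives $s_i$, he saves it safely. Finally, the master key can be recovered by the TC from any $t$ sub-secrets using the Lagrange interpolation formula: $f(x) = \sum_{i=1}^{t} s_i \prod_{j=1, j \neq i}^{t} \frac{x - x_j}{x_i - x_j} \bmod p$, and the master key is $s = f(0)$.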
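The splitting and recovery steps can be sketched as follows. This is a minimal illustration under stated assumptions: the field modulus `PRIME`, the share points x = 1, ..., n, and the function names are choices made here, not parameters taken from the scheme.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime used as the field modulus (illustrative choice)


def split_secret(secret: int, t: int, n: int) -> list:
    """Split `secret` into n shares such that any t of them recover it."""
    # Random polynomial of degree t-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]

    def f(x: int) -> int:
        # Horner evaluation of the polynomial modulo PRIME.
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % PRIME
        return acc

    # Share i is the point (i, f(i)); x = 0 is never handed out.
    return [(i, f(i)) for i in range(1, n + 1)]


def recover_secret(shares: list) -> int:
    """Recover f(0) from t shares by Lagrange interpolation modulo PRIME."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = split_secret(123456789, t=3, n=5)
recovered = recover_secret(shares[:3])
```

Any subset of t shares yields the same value at x = 0, so the choice of which participants contribute does not matter.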

3.3. Bilinear Mapping

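Assume $G_1$, $G_2$, and $G_T$ denote three additive and multiplicative cyclic groups of the same order $q$, where $q$ is a large prime and $g_i$ is the generator of $G_i$, $i = 1, 2$. Besides, $\psi: G_2 \to G_1$ is an isomorphism, and $G_1$, $G_2$, $G_T$ are equipped with a pairing $e: G_1 \times G_2 \to G_T$. The bilinear pairing mapping $e$ satisfies the following properties [43, 44].
(1) Bilinear: for all $u \in G_1$, $v \in G_2$ and $a, b \in Z_q^*$, where $Z_q^* = \{1, \ldots, q-1\}$; if $e(u^a, v^b) = e(u, v)^{ab}$, the mapping is said to be bilinear.
(2) Non-degenerate: there exist $u \in G_1$, $v \in G_2$ such that $e(u, v) \neq 1$.
(3) Computability: for all $u \in G_1$, $v \in G_2$, there is an efficient algorithm to compute $e(u, v)$.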

The group that possesses such a map is called a bilinear group.
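The bilinearity and non-degeneracy properties can be checked numerically on a toy map. The sketch below is only an illustration and is not cryptographically secure: it assumes both source groups are the additive group of integers modulo a small prime `ORDER`, and the target group is the order-`ORDER` subgroup of the integers modulo `MODULUS`. All constants and names are hypothetical, and a real deployment would use an elliptic-curve pairing library instead.

```python
ORDER = 1019             # prime order shared by all three toy groups
MODULUS = 2 * ORDER + 1  # 2039, also prime, so Z_p^* has a subgroup of order ORDER
G_T = 4                  # generates the order-ORDER subgroup of Z_p^*


def scalar_mul(a: int, point: int) -> int:
    """Scalar multiplication in the additive source group (Z_ORDER, +)."""
    return (a * point) % ORDER


def pairing(x: int, y: int) -> int:
    """Toy bilinear map e(x, y) = G_T^(x*y) into the target group."""
    return pow(G_T, (x * y) % ORDER, MODULUS)


P, Q = 5, 7      # arbitrary source-group elements
a, b = 123, 456  # arbitrary scalars

lhs = pairing(scalar_mul(a, P), scalar_mul(b, Q))  # e(aP, bQ)
rhs = pow(pairing(P, Q), a * b, MODULUS)           # e(P, Q)^(ab)
```

Here `lhs` and `rhs` agree because the exponent of `G_T` is multiplied by a and b on either side, which is exactly the bilinear property; the map is non-degenerate because `pairing(1, 1)` equals `G_T`, which is not the identity.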

3.4. Ciphertext-Policy Attribute-Based Encryption (CP-ABE)

CP-ABE schemes [45, 46] are designed to realize complex access control on encrypted data. In CP-ABE, a party wishing to encrypt a message specifies a policy by an access tree $\mathcal{T}$, and the private key must meet the policy to decrypt it, where the access tree $\mathcal{T}$ is constructed by the party and the private key is generated by a set of descriptive attributes of the decryptors. In the access tree $\mathcal{T}$, each non-leaf node $x$ represents a threshold gate, described by its children and a threshold value $k_x$, and each leaf node of the tree is described by an attribute and a threshold value $k_x = 1$. To facilitate working with the access trees, the parent of the node $x$ is described by $\mathrm{parent}(x)$, and the function $\mathrm{att}(x)$ is defined only if $x$ is a leaf node and denotes the attribute associated with the leaf node $x$. The access tree $\mathcal{T}$ also defines an ordering between the children of every node, that is, the children of a node are numbered from 1 to $num$. The function $\mathrm{index}(x)$ returns such a number associated with the node $x$.
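The threshold-gate semantics of an access tree can be sketched without any cryptography. The class and function names below, and the attribute strings, are hypothetical; the sketch only decides whether an attribute set satisfies a policy, which is the condition under which CP-ABE decryption succeeds.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Access-tree node: a leaf holds an attribute, an inner node a threshold gate."""
    threshold: int = 1
    attribute: Optional[str] = None  # set only for leaf nodes
    children: List["Node"] = field(default_factory=list)


def satisfies(node: Node, attributes: Set[str]) -> bool:
    """Return True if the attribute set satisfies the subtree rooted at `node`."""
    if node.attribute is not None:  # leaf: the attribute must be held
        return node.attribute in attributes
    satisfied = sum(satisfies(child, attributes) for child in node.children)
    return satisfied >= node.threshold  # gate: at least `threshold` children satisfied


# A "1/2" gate: at least one of two (hypothetical) attributes is required.
one_of_two = Node(threshold=1, children=[Node(attribute="sports"), Node(attribute="travel")])
# A "2/2" gate behaves like AND.
two_of_two = Node(threshold=2, children=[Node(attribute="sports"), Node(attribute="travel")])
```

A threshold of 1 over k children acts as OR, and a threshold of k acts as AND, so nested gates can express general monotone policies.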

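According to [47], the four algorithms of the bilinear mapping-based CP-ABE scheme are as follows:
(1) Setup: this algorithm gives the public parameters $PK$ and master key $MK$. It chooses a bilinear group $G_0$ of prime order $p$ with generator $g$. Next, it chooses two random exponents $\alpha, \beta \in Z_p$. Then, the public key is published as
$$PK = \{G_0,\ g,\ h,\ f,\ e(g, g)^{\alpha}\},$$
where $h = g^{\beta}$, $f = g^{1/\beta}$, and the master key MK is $(\beta, g^{\alpha})$.
(2) Encrypt (PK, $M$, $\mathcal{T}$): this algorithm encrypts message $M$ to get ciphertext $CT$ using the public parameters $PK$ and the access tree $\mathcal{T}$. In detail, it first chooses a polynomial $q_x$ for each node $x$ (including the leaves) in the tree $\mathcal{T}$, in which the degree $d_x$ of the polynomial $q_x$ is one less than the threshold value $k_x$, that is, $d_x = k_x - 1$. Starting with the root node $R$ in $\mathcal{T}$, the algorithm chooses a random $s \in Z_p$ and sets $q_R(0) = s$. Then, for any other node $x$, it sets $q_x(0) = q_{\mathrm{parent}(x)}(\mathrm{index}(x))$. Finally, it lets $Y$ be the set of leaf nodes in $\mathcal{T}$, and the ciphertext is
$$CT = \{\mathcal{T},\ \tilde{C} = M e(g, g)^{\alpha s},\ C = h^{s},\ \forall y \in Y: C_y = g^{q_y(0)},\ C'_y = H(\mathrm{att}(y))^{q_y(0)}\},$$
where $H$ is a hash function that maps an attribute to an element of $G_0$.
(3) KeyGen (MK, $S$): the KeyGen algorithm outputs the private key $SK$ using the master key $MK$ and the attribute set $S$. Firstly, it chooses $r \in Z_p$ and $r_j \in Z_p$ for each attribute $j \in S$. Then, it computes the private key as
$$SK = \{D = g^{(\alpha + r)/\beta},\ \forall j \in S: D_j = g^{r} H(j)^{r_j},\ D'_j = g^{r_j}\}.$$
(4) Decrypt (CT, SK): this algorithm decrypts the ciphertext $CT$ with the private key $SK$ for people who satisfy the attribute set $S$. The decryption procedure is a recursive algorithm in which, for a leaf node $x$ with $i = \mathrm{att}(x) \in S$,
$$\mathrm{DecryptNode}(CT, SK, x) = \frac{e(D_i, C_x)}{e(D'_i, C'_x)} = e(g, g)^{r q_x(0)}.$$
When $x$ is the root node $R$ in the tree $\mathcal{T}$, it can be concluded that $A = \mathrm{DecryptNode}(CT, SK, R) = e(g, g)^{r q_R(0)} = e(g, g)^{rs}$. Then, the message can be computed by
$$M = \frac{\tilde{C}}{e(C, D)/A}.$$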
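Why decryption returns the message can be verified in a few lines. The derivation below assumes the usual notation of the standard tree-based CP-ABE construction, which is an assumption about the intended symbols: public key elements $h = g^{\beta}$ and $e(g,g)^{\alpha}$, ciphertext components $\tilde{C} = M e(g,g)^{\alpha s}$, $C = h^{s}$, $C_x = g^{q_x(0)}$, $C'_x = H(\mathrm{att}(x))^{q_x(0)}$, and key components $D = g^{(\alpha + r)/\beta}$, $D_i = g^{r} H(i)^{r_i}$, $D'_i = g^{r_i}$, with a symmetric pairing $e$.

```latex
\begin{align*}
\mathrm{DecryptNode}(CT, SK, x)
  &= \frac{e(D_i, C_x)}{e(D'_i, C'_x)}
   = \frac{e\big(g^{r} H(i)^{r_i},\, g^{q_x(0)}\big)}{e\big(g^{r_i},\, H(i)^{q_x(0)}\big)}
   = e(g, g)^{r q_x(0)}, \qquad i = \mathrm{att}(x), \\
A &= e(g, g)^{r q_R(0)} = e(g, g)^{rs}, \\
\frac{\tilde{C}}{e(C, D)/A}
  &= \frac{M\, e(g, g)^{\alpha s}}{e\big(h^{s},\, g^{(\alpha + r)/\beta}\big) / e(g, g)^{rs}}
   = \frac{M\, e(g, g)^{\alpha s}}{e(g, g)^{(\alpha + r)s} / e(g, g)^{rs}}
   = M.
\end{align*}
```

The blinding factor $e(g,g)^{rs}$ is obtainable only by interpolating up the tree from leaves whose attributes the key holds, which is why a key that does not satisfy the policy cannot remove it.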

4. Problem Statement

Many digital identity infrastructures [34–38] have emerged to better manage user’s behavior in cyberspace, among which an identity management agency is responsible for generating and maintaining one-to-one mappings between digital identities and physical identities. The one-to-one mapping of digital identity infrastructures can prevent identity-based attacks (Sybil, whitewashing, etc.) in cyberspace. Therefore, based on the existing infrastructures, we designed our blockchain-based click fraud detection and prevention scheme (BCFDPS) for online advertising system. To elaborate our scheme clearly, the main entities and procedures of existing digital identity infrastructures are also included.

4.1. System Model

The proposed BCFDPS consists of seven entities: identity management agency (IMA), entity identity blockchain (EIB), consumer (U), access behavior blockchain (ABB), media site (MS), publisher (PUB), and advertiser (ADE), where the IMA and EIB are the entities of the existing digital identity infrastructures, as shown in Figure 2.
(i) IMA generates identities for U, PUB, and ADE, issues their identity licenses, and provides U with a signature on U’s masked identity during the registration phase. IMA records the real identities of U, PUB, and ADE in EIB. Note: the IMA belongs to the existing digital identity infrastructures.
(ii) EIB is responsible for recording the hash of the digital identity of every entity in the advertising system other than MS. It is a consortium blockchain maintained by several IMAs. Note: the EIB belongs to the existing digital identity infrastructures.
(iii) U sends the encrypted masked identity and ad click messages to MS whenever he visits MS’s website and clicks the ad.
(iv) ABB is in charge of recording the information of PUB’s advertising bidding, MS’s forwarding results of U’s click message, and PUB’s analysis result of U’s access behavior. The ABB is a consortium blockchain, which is controlled by many MSs and PUBs.
(v) MS represents the media site between U and PUB. It is responsible for displaying PUB’s ad to U and forwarding all of U’s click messages to PUB. MS summarizes the result of ad bidding and the forwarding information and periodically records them in the ABB. MS can verify the click number independently for detecting and preventing click fraud.
(vi) PUB publishes ADE’s ad, joins the ad bidding process of MS, analyzes the effective ad clicks generated by U, and records the analysis results in the ABB. PUB can verify the click number independently for detecting and preventing click fraud.
(vii) ADE sends an ad to PUB for publishing, and he can also detect and prevent click fraud on his own by verifying the number of clicks in the ABB.

4.2. Security Model

In BCFDPS, we have the following security assumptions.
(i) IMA and PUB are semi-honest: they strictly follow the protocol but are curious about the information.
(ii) U is considered a malicious entity who may intentionally click on the same ad many times out of profit or curiosity.
(iii) MS is regarded as dishonest. It may deploy click fraud methods and even directly tamper with the statistical results of clicks to try to obtain extra illegal revenue from the PUB.
(iv) ADE is also seen as a malicious entity. He may attempt to deliberately falsify his statistics to reduce the ad expenses from the PUB.
(v) Two consortium blockchains, maintained by multiple IMAs, MSs, and PUBs, respectively, are fast and secure enough in recording transactions. Also, we assume that the standard cryptographic algorithms used in our scheme are secure and unbreakable.
(vi) The scheme is built upon the Canetti–Krawczyk (CK) threat model [48], in which any two parties could communicate via an unauthenticated network. Specifically, an adversary can fully control the communication in a probabilistic polynomial time and try to reveal, track, or even imitate U through sniffing and tampering with messages between U and MS.
(vii) Corresponding to the physical identity, a person in cyberspace has one and only one digital identity. U’s digital identity in each ad click message is protected by a masked identity and the random numbers, and the click message is generated by U’s browser plugin, where the plugin is assumed to be integrated in the browser in advance to protect U’s privacy.

4.3. Design Goals

According to the above system model and security model, the design goals of our scheme are as follows.
(1) No impact on ad precision targeting: a PUB can still accurately target an ad to a U although the U’s digital identity is masked. In other words, only the PUB can link the U’s masked identity from different click messages.
(2) Acceptability of ad response speed: the ad response speed in our scheme is acceptable for a U, even though the cryptographic algorithms are used to protect the U’s identity privacy in the process of publishing an ad.
(3) Transparency of ad billing fee: MS, PUB, and ADE can count the real number of clicks on the same ad in an independent way. That is, the process of verifying the ad billing fee is transparent between MS, PUB, and ADE.
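The idea behind independent counting can be sketched as a deduplication over recorded clicks: a click without a legitimate identity is discarded, and repeated clicks carrying the same identity count once per ad. This is a simplified illustration, not the verification procedure of the scheme. The record fields (`ad_id`, `identity`, `identity_verified`) are hypothetical, and the boolean `identity_verified` stands in for the pairing-based identity check.

```python
from collections import defaultdict


def count_effective_clicks(click_records: list) -> dict:
    """Count one click per (ad, identity) pair; unverified records are dropped."""
    identities_per_ad = defaultdict(set)
    for record in click_records:
        if not record["identity_verified"]:
            continue  # no legitimate digital identity: treated as fraudulent
        identities_per_ad[record["ad_id"]].add(record["identity"])
    return {ad_id: len(ids) for ad_id, ids in identities_per_ad.items()}


records = [
    {"ad_id": "ad-1", "identity": "u1", "identity_verified": True},
    {"ad_id": "ad-1", "identity": "u1", "identity_verified": True},    # duplicate click
    {"ad_id": "ad-1", "identity": "u2", "identity_verified": True},
    {"ad_id": "ad-1", "identity": "bot", "identity_verified": False},  # machine click
    {"ad_id": "ad-2", "identity": "u1", "identity_verified": True},
]
effective = count_effective_clicks(records)
```

Since the function depends only on the recorded data, any party holding the same records arrives at the same count, which is the property the transparency goal relies on.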

5. Proposed BCFDPS

The BCFDPS is proposed to detect and prevent click fraud, and it is mainly divided into three phases, as shown in Figure 2. The first phase allows U, PUB, and ADE to register with the IMA and obtain their digital identities and identity licenses. Meanwhile, IMA stores the hash value of their identities on the EIB, as shown in steps ①–④ (note: the four steps belong to the existing digital identity infrastructure). The second phase permits MS and PUB to work together to recommend ADE’s ad to U. U clicks the ad that he is interested in and MS records the hash value of the data that it forwarded in the ABB, as shown in steps ⑤–⑬. The last phase lets both PUB and ADE detect and prevent click fraud independently using the data in the ABB, which is shown in steps ⑭ and ⑮.

The detailed process of BCFDPS includes four phases: initialization, registration, ad publishing, and click fraud detection and prevention. To elaborate our scheme clearly, the notations and descriptions of BCFDPS are shown in Table 1.

5.1. Initialization

The digital identity is the cornerstone of cyberspace which is provided and validated whenever a user accesses the network services. The initialization of this section is not exclusive to our scheme. In other words, in order to describe our scheme clearly, the pivotal initialization of the existing digital identity infrastructure is described in this section. Specifically, the IMA performs initialization to generate its public and private keys, and the EIB generates the system public parameters . In addition, U, PUB, and ADE generate their public and private keys.

5.1.1. IMA Initialization

IMA initializes its public and private keys . Then, IMA publishes in the system. In addition, IMA defines PUB’s attribute set including but not limited to these attributes .

5.1.2. EIB Initialization

The EIB performs initialization to generate the system public parameters . Firstly, it selects a large prime , an elliptic curve , and a base point with order under the finite field . Then, it chooses a bilinear group with generator and two random numbers . Next, it calculates parameters: , , , and , and publishes in the system so that IMA can get . Lastly, EIB generates a shared private key denoting the master key described in Section 3.2, and it uses the Shamir (t, n) threshold secret sharing algorithm [42] to distribute the sub-secrets of to each IMA.

5.1.3. U, PUB, and ADE Initialization

U, PUB, and ADE also generate their own public and private keys, referred to as , , .

5.2. Registration

Similar to Section 5.1, the registrations of U, PUB, and ADE are not exclusive to our scheme. In other words, in order to describe our scheme clearly, the pivotal registration of the existing digital identity infrastructure is described in this section. Specifically, U registers with IMA to obtain his real identity , identity license , and the signature . Similar to U, PUB registers with IMA to get its digital identity , identity license , and an attribute set as an ad publisher. Also, ADE receives his digital identity and identity license from IMA.

5.2.1. U Registration (UR)

STEP UR1. IMA collects U’s biometric data, e.g., fingerprint, digitalizes the fingerprint to obtain the digitized data, and selects and assembles a set of unique code segments from the code library according to the data, and at the same time, the hash value of the code segments is calculated. The hash value is U’s real identity . Note that if a U is disguised by a machine or has already registered, the IMA would not generate an identity for the U. In order to issue an identity license to U, the IMA gathers other’s sub-secrets and uses the Shamir (t, n) threshold secret recovering algorithm [42] to recover the shared private key for calculating and U’s masked identity . Then, IMA sends to U’s browser plugin, where is the signature of U from IMA, shown in (6), and refer to the public parameters in EIB. This process is shown in step ① in Figure 2.
STEP UR2. At last, IMA records in EIB, which is used for accountability when a click fraud happens. This process is shown in step ④ in Figure 2.

5.2.2. PUB Registration (PUBR)

STEP PUBR1. IMA generates PUB’s identity and an attribute set according to the business scope of PUB. Hereafter, similar to STEP UR1, IMA generates PUB’s identity license and sends to PUB. This process is shown in step ② in Figure 2.
STEP PUBR2. At last, IMA records the in EIB for supervision when a click fraud appears. This process is shown in step ④ in Figure 2.

5.2.3. ADE Registration (ADER)

STEP ADER1. IMA generates ADE’s identity and identity license and sends to ADE. This process is shown in step ③ in Figure 2.
STEP ADER2. At last, IMA records in EIB to supervise when a click fraud arises. This process is shown in step ④ in Figure 2.

5.3. Ad Publishing

To obtain higher revenue, PUB often publishes ads to a targeted U through MS’s ad bidding. Then, U clicks the ad that he is interested in.

5.3.1. Publisher Publishes an Ad (PPA)

This phase deals with the process that a browser plugin sends U’s masked identity in ciphertext to PUB and PUB displays the related ads to the targeted U, as shown in steps ①–⑥ in Figure 3.

STEP PPA1. ADE sends an ad to PUB for publication. Then, ADE and PUB reach a consensus on ADE’s ad and create the ad’s identity , as shown in step ① in Figure 3.

STEP PPA2. U sends his masked identity in ciphertext (instead of a real identity in the real world) to PUB for getting the ad that he is interested in, protecting his privacy. Firstly, U visits MS’s website, and U’s web browser plugin encrypts the secret through the CP-ABE algorithm to prevent entities other than the collection of PUBs from obtaining U’s identity privacy. Then, U sends the ciphertext to the MS. After that, the MS directly broadcasts the to the PUBs cooperating with the MS. Here is the specific process.

U uses the public parameters and defines an access tree according to the attributes of ads that he is interested in. The format of is shown in Figure 4, where “1/2” means that PUB must satisfy at least one of the two attributes . Then, U encrypts the secret to get the ciphertext .

where is the timestamp, , , is a random number, is a polynomial for each node in , and and are the attributes associated with the leaf node .

After U visits MS’s website, U’s browser plugin sends to MS, which is then directly broadcast to different PUBs by MS. This process is as in steps ② and ③ in Figure 3.

STEP PPA3. Next, PUB decrypts in to get the secret . Further, PUB gets U’s masked identity from the and decides whether to bid for an ad according to the , as shown in step ④ in Figure 3. The specific process of getting the is as follows.

According to the attribute set obtained as described in Section 5.2.2, PUB uses the master key in the public parameters to generate the decryption key according to

where are random numbers, is an attribute, and belong to the public parameter .

Then, PUB uses (9) to decrypt the leaf nodes of the access tree with and when PUB’s satisfies the attributes which U requires.

where is a leaf node in . After PUB obtains all leaf nodes, it uses the Lagrange interpolation formula to obtain the parent node, and this process is recursive until ’s root node is obtained. ’s root node is .

Next, PUB can get the secret by

At last, PUB decrypts with IMA’s public key and obtains the masked identity of U. PUB searches its local database to get the portrait of and decides whether to bid for the ad. If a PUB joins in the bidding process, it sends the and the price to the MS. This bidding process will be executed by many PUBs.

STEP PPA4. MS displays the ad of the bid winner and sends the , , , , and ad frame to the U, as shown in step ⑤ in Figure 3.

STEP PPA5. MS periodically (e.g., once a day) records the results of the ad bidding in the ABB sorted by periods and s. The format of the results is , where “” is the price that PUB should pay to MS after an ad is clicked once by U, as shown in step ⑥ in Figure 3.
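The recursive Lagrange interpolation in STEP PPA3 can be illustrated with a minimal Python sketch. It is an assumption-laden toy: CP-ABE performs this interpolation in the exponent of a pairing group, whereas the sketch works directly over a prime field, and the prime `P`, the function names, and the 2-of-3 threshold node are hypothetical choices made only for illustration.

```python
# Toy prime field (assumption: a Mersenne prime chosen only for illustration).
P = 2**127 - 1


def eval_poly(coeffs, x, p=P):
    """Evaluate the polynomial a0 + a1*x + ... at x modulo p (Horner's rule)."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % p
    return result


def lagrange_at_zero(shares, p=P):
    """Recover q(0) from points (x_i, q(x_i)) by Lagrange interpolation mod p."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = (num * (-xj)) % p
                den = (den * (xi - xj)) % p
        # pow(den, -1, p) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, p)) % p
    return secret


# A hypothetical 2-of-3 threshold node: degree-1 polynomial with q(0) = node secret.
node_secret = 123456789
coeffs = [node_secret, 987654321]
children = [(x, eval_poly(coeffs, x)) for x in (1, 2, 3)]

# Any two children are enough to recover the parent node's value.
recovered = lagrange_at_zero(children[:2])
```

Applying `lagrange_at_zero` at each internal node, from the leaves upward, mirrors the recursion that ends at the root of the access tree.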

5.3.2. U Clicks the Ad (UCA)

U clicks the ad that he is interested in after MS displays it on the website. This section is shown in steps ⑦–⑨ in Figure 3.

STEP UCA1. U’s browser plugin gets , , , and from the ad frame and calculates and . It then embeds the timestamp to calculate and . Next, the plugin sends the click message about the ad to MS. The click message is given in equation 6 and shown in step ⑦ in Figure 3.

STEP UCA2. MS forwards to the PUB who won the bidding and stores in its local database, as shown in step ⑧ in Figure 3.

STEP UCA3. Finally, the data are classified by periods and s and periodically (e.g., once a day) recorded in the ABB by the MS, as shown in step ⑨ in Figure 3.

5.4. Click Fraud Detection and Prevention

This phase achieves click fraud detection and prevention between entities in an advertising system based on ABB.

5.4.1. PUB Detects and Prevents Click Fraud (PUBD)

To prevent MS from forging the data and to ensure the transparency of this ad click analysis process, PUB can detect and prevent fraudulent clicks, as shown in Figure 5.

STEP PUBD1. PUB uses its private key to decrypt from MS to obtain the secret and from . The PUB verifies the timeliness of the to prevent replay attacks, as shown in step ① in Figure 5.

STEP PUBD2. PUB uses IMA’s public key to restore and from the and then calculates by (12), as shown in step ② in Figure 5.

STEP PUBD3. If (12) holds, PUB counts the number of different s in s, denoted as , which is the number of valid advertising clicks in a certain period (e.g., one day). This means that in this period, PUB only pays for clicks to MS. In this way, the PUB can detect all the fraudulent clicks in the message forwarded by MS. Also, the PUB pays nothing to MS for the repeated s, so the click fraud by a malicious MS can be prevented. Simultaneously, PUB records U’s access behavior information like locally, as shown in step ③ in Figure 5.

STEP PUBD4. Finally, PUB periodically (e.g., once a day) records the result in ABB sorted by periods and s, as shown in step ④ in Figure 5.
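The counting rule in STEP PUBD3 (repeated clicks by the same consumer are billed once per period) can be sketched as follows. This is only an illustration under stated assumptions: `ClickRecord`, `consumer_tag`, and `passed_check` are hypothetical names, the pairing verification in (12) is replaced by a precomputed boolean, and which identifier is deduplicated is assumed here to be a per-consumer identity value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClickRecord:
    consumer_tag: str    # hypothetical stand-in for the consumer's identity value
    ad_id: str           # identity of the clicked ad
    timestamp: float     # seconds since the start of the observation window
    passed_check: bool   # stand-in for the outcome of the pairing check in (12)


def count_valid_clicks(records, ad_id, period_start, period_end):
    """Count distinct consumers with a verified click on ad_id within the period."""
    distinct = set()
    for r in records:
        in_period = period_start <= r.timestamp < period_end
        if r.ad_id == ad_id and in_period and r.passed_check:
            distinct.add(r.consumer_tag)
    return len(distinct)


DAY = 86_400.0
records = [
    ClickRecord("u1", "ad42", 10.0, True),
    ClickRecord("u1", "ad42", 20.0, True),       # repeat by the same consumer
    ClickRecord("u2", "ad42", 30.0, True),
    ClickRecord("bot", "ad42", 40.0, False),     # fails the verification
    ClickRecord("u3", "ad42", DAY + 5.0, True),  # belongs to the next period
    ClickRecord("u4", "ad7", 50.0, True),        # click on a different ad
]
billable = count_valid_clicks(records, "ad42", 0.0, DAY)
print(billable)  # 2
```

Using a set makes the billing independent of how many times a verified consumer clicks, which is the property the step relies on.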

5.4.2. ADE Detects and Prevents Click Fraud (ADED)

Similarly, ADE also verifies the results recorded by PUB in the ABB to detect and prevent click fraud, as shown in Figure 6.

STEP ADED1. ADE communicates with PUB to obtain the original access information of U in the PUB local database. ADE then uses to encrypt the and compares it with the data on the ABB to prevent PUB’s cheating, as shown in step ① in Figure 6.

STEP ADED2. ADE decrypts with private key , obtains , and verifies the timeliness of to prevent replay attacks, as shown in step ② in Figure 6.

STEP ADED3. Similar to STEP PUBD2, ADE also restores and from , and then calculates by (13). If (13) holds, ADE also counts the number of different s in s in a certain period (e.g., one day), as shown in step ③ in Figure 6.

STEP ADED4. ADE reads the data recorded in ABB by PUB and compares with the . If the equation holds, ADE pays the fee to PUB according to the , as shown in step ④ in Figure 6. Therefore, the ADE can detect all the fraudulent clicks in the original access information from PUB. Also, the ADE pays nothing to PUB for the repeated s, so the fraudulent click by a malicious PUB can be prevented.

5.4.3. MS Detects and Prevents Click Fraud (MSD)

MS obtains all the from PUB and uses to encrypt them successively to get the encrypted result . Then, MS compares the with the in from MS’s local database one by one; if they match, the data from the PUB are valid. Finally, MS counts the number of the different and verifies whether holds. If it holds, MS charges PUB fees according to the . As a result, the MS can detect all the fraudulent clicks in the data from PUB. Also, MS cannot charge PUB more for the repeated s, so the fraudulent click by a malicious MS can be prevented.

6. Security Analysis

In this section, we first analyze the security of our scheme at three levels: the process level, the data level, and the infrastructure level, which can be called PDI model-based security [49–52]. Then, we give an informal analysis of security under the security assumptions in Section 4.2. Lastly, we demonstrate that the BCFDPS scheme is provably secure.

6.1. PDI Model-Based Security Analysis

As one of the latest and most mature blockchain security analysis frameworks for Industry 4.0, the PDI model [49] conducts a comprehensive and detailed analysis of security issues. In the PDI model, blockchain security is divided into three levels: the process level, the data level, and the infrastructure level [51]. Similarly, we analyze the security of our blockchain-based click fraud detection and prevention scheme according to these three aspects.

6.1.1. The Process Security

(1) Off-blockchain data processing security: a large number of data processing operations are run off-blockchain in our scheme, since the statistical data analysis ability of existing blockchain applications is weak [50]. In our scheme, a U’s masked identity and his ad click message are encrypted (denoted as ) and sent to the MS by U’s browser plugin locally. Then, is forwarded to a PUB by an MS off-blockchain. Next, the MS, PUB, and ADE can independently count the real click number from with ECC and bilinear pairing algorithms. Also, since is ciphertext and is processed off-blockchain, it is difficult for an attacker to gather, crack, and modify it. That is, the security of off-blockchain data processing is guaranteed in these entities.

(2) Data processing security in the blockchain: to implement our scheme in a real-time online advertising scenario, the data processing in the blockchain of our scheme is to periodically read and write content in the access behavior blockchain (ABB) through smart contracts. The ABB is a consortium blockchain that only allows authorized MSs and PUBs to write data, which avoids unauthorized access. Also, the consensus protocol in ABB guarantees the correctness and consistency of the data when it is written to the ABB, largely eliminating exceptions in data processing and ensuring the security of data processing in the blockchain.

6.1.2. The Data Security

(1) Data tamper-proof: in our scheme, all original business data are stored in the local servers of MS and PUB, and the aggregated results of the original data are regularly recorded in the consortium blockchain in the form of hash values. In this way, even if attackers obtain the data in the blockchain, they cannot get the original data in the local servers of MS and PUB, so they cannot view or tamper with the original data. On the other hand, blockchain can ensure data consistency in distributed ledgers. Therefore, business data security is achieved whether the data are in the blockchain or not.

(2) Consumer’s identity privacy: similar to the digital twin in [53–55], a U in our scheme can only obtain his unique digital identity to visit MS’s websites and click on PUB’s ads. Also, a masked identity , CP-ABE algorithm, and ECC algorithm are utilized by the U to hide his identity, while preserving the ad precision targeting. In addition, nobody except the PUB can mark the U, and no one can reveal U’s real digital identity . That is, our scheme protects the privacy of consumer’s identity.
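The tamper-evidence argument in item (1) can be sketched in a few lines of Python: the aggregated result stays off-chain, only its digest is recorded, and anyone holding the original result can later recompute the digest and compare. The sketch rests on assumptions that are not in the scheme's description: SHA-256 is chosen because the evaluation assumes a 256-bit hash output (the concrete hash function is not named), the JSON serialization is a hypothetical canonical form, and the Python list `ledger` merely stands in for the ABB.

```python
import hashlib
import json

ledger = []  # stand-in for the consortium blockchain (assumption)


def digest_of(result: dict) -> str:
    """Hash a canonical JSON encoding of an aggregated result with SHA-256."""
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def record(result: dict) -> int:
    """Append the digest of a result to the ledger and return its position."""
    ledger.append(digest_of(result))
    return len(ledger) - 1


def matches_ledger(result: dict, index: int) -> bool:
    """Check an off-chain result against the digest recorded at the given position."""
    return digest_of(result) == ledger[index]


# Hypothetical aggregated result for one ad and one period.
daily_result = {"ad_id": "ad42", "period": "day-1", "valid_clicks": 2}
position = record(daily_result)

tampered = dict(daily_result, valid_clicks=200)
print(matches_ledger(daily_result, position))  # True
print(matches_ledger(tampered, position))      # False
```

Because only the digest is public, an observer of the ledger learns nothing about the underlying records, while any later change to the off-chain result is detectable.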

6.1.3. The Infrastructure Security

(1) System structure security: the two-level mutual verification between MS, PUB, and ADE maintains the stability of our system structure. For one thing, PUB counts the real and effective clicks from a large number of users’ ad click messages s which are forwarded by the MS. Once the s are tampered with or forged by the MS, they cannot pass the verification of PUB. At the same time, the MS can use the ECC algorithm to count the real clicks from the data stored in its local database to prevent PUB from forging the number of clicks. For another, since the raw data are generated by U, the ADE can find anomalies once the PUB adds entries to the raw data. Thus, our scheme has system structure security.

(2) Cryptographic facilities security: we use the standard cryptographic facilities to build our system. Specifically, the CP-ABE algorithm, bilinear pairing algorithm, and ECC algorithm are used by a U to protect his identity. The bilinear pairing algorithm and ECC algorithm are adopted by a PUB and an ADE to detect fraudulent clicks, while an MS utilizes the ECC algorithm to detect click fraud. The security of our scheme relies on these standard cryptographic facilities, and we assume that the standard cryptographic facilities used in our scheme are secure and unbreakable.

6.2. Informal Analysis of Security

In this section, we analyze the security of our scheme under the security assumptions in Section 4.2 in an informal way.

6.2.1. Prevention of a False

In Section 5.2.1, a machine cannot obtain a valid since it has no way to pass the IMA biometric authentication. Even if it forges a false , it still cannot generate a valid without IMA’s private key . That is, the click message, containing an invalid , generated by the machine in phase 5.3.2 will be discarded. Therefore, in our scheme, the number of false s is not included in the number of valid clicks.

6.2.2. Transparency of Clicks between Entities

PUB decrypts the in to get using , then verifies the , and counts the number of different . Similarly, ADE restores the in from PUB’s local database using to verify the authenticity of , and then ADE counts the number of different . Although of comes from PUB, in is encrypted by , and only can decrypt it. Therefore, PUB cannot tamper with ; furthermore, ADE ensures the validity of . MS encrypts the original data from PUB using and compares the encrypted result with in from MS’s local database to verify whether the PUB is honest. In this way, PUB, ADE, and MS can verify the number of clicks about the same ad in an independent way.

6.2.3. U’s Conditional Unlinkability

First of all, U sends his masked identity in ciphertext to MS and MS broadcasts it to PUBs in phase 5.3.1. Then, only PUBs can decrypt U’s from since is calculated using the CP-ABE algorithm and only attributes owned by PUBs can generate a decryption key . Secondly, U sends his click message to MS and MS forwards it to PUB in phase 5.3.2. Next, only PUB can reveal U’s masked identity from using its private key for advertising precision marketing. In the entire communication of U, neither the attacker in the channel nor the MS can directly link U’s masked identity because both and are encrypted by the CP-ABE algorithm or an asymmetric cryptographic algorithm and only PUBs can decrypt them. However, even PUB cannot link U’s masked identity to U’s real identity in ABB since PUB does not have the right to write and read in EIB. Hence, the scheme achieves U’s conditional unlinkability.

6.2.4. Data Security and Integrity

Firstly, in this scheme, all the commercial contract data, e.g., , are encrypted and only the data owners PUB and MS can decrypt these ciphertexts. In addition, all the commercial contract data and the hash value of the click result are recorded in the ABB (a consortium blockchain), as shown in steps ⑩, ⑬, and ⑭, and no adversary can tamper with these data in the consortium blockchain.

6.2.5. Resistance to Replay Attacks

In phases 5.3.1 and 5.3.2, the timestamp and are included in the message and , and PUB first checks their timeliness to avoid replay attacks. Further, in phases 5.4.2 and 5.4.3, ADE and MS can avoid replay attacks by the timestamp . Consequently, our scheme resists replay attacks with high probability.
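A timeliness check of the kind described above can be sketched as a simple freshness window. The window length `MAX_AGE_SECONDS` and the function name are hypothetical; the scheme does not state how old a timestamp may be before the message is rejected.

```python
import time

MAX_AGE_SECONDS = 60.0  # hypothetical acceptance window, not taken from the scheme


def is_fresh(message_timestamp, now=None, max_age=MAX_AGE_SECONDS):
    """Accept a message only if its timestamp is neither too old nor in the future."""
    if now is None:
        now = time.time()
    age = now - message_timestamp
    return 0.0 <= age <= max_age


# A message replayed after the window has elapsed is rejected.
print(is_fresh(100.0, now=130.0))  # True
print(is_fresh(100.0, now=200.0))  # False
```

A window-based check alone still admits a replay inside the window, which is one reason the resistance is stated as holding with high probability rather than absolutely.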

6.2.6. Resistance to Forgery

For one thing, in phase 5.3.1, MS stores U’s while it has no ability to construct U’s click message in phase 5.3.2 without a . For another thing, in phase 5.4.2, PUB records and , but it cannot forge a because it also does not have a . In other words, the click message containing can only be generated by U. That is, the BCFDPS can resist forgery attacks.

6.3. Provable Security

The proposed scheme is based on bilinear pairing cryptosystem on elliptic curves (denoted as BPCEC), ciphertext-policy attribute-based encryption (denoted as CP-ABE), and elliptic curve cryptography (denoted as ECC). According to the security characteristics of each module, we show that our scheme meets click fraud detection and prevention and U’s conditional unlinkability.

6.3.1. Theorem 1

If the BPCEC, CP-ABE, and ECC algorithms satisfy the basic security properties, then the scheme in this paper can detect and prevent click fraud.

Proof. Define as an adversary who attacks the security of BPCEC algorithm, as an adversary attacking the security of CP-ABE algorithm, and as an opponent attacking the security of ECC algorithm. Assuming clicks fraud successfully, a polynomial time algorithm is defined, which has the ability to attack the algorithms of BPCEC, CP-ABE, and ECC. Through the query of and the ’s interaction in the click fraud game, is optimized repeatedly to successfully attack the BPCEC, CP-ABE, and ECC algorithms. That is, if the adversary clicks fraud successfully in the scheme, it means successfully attacks the security of algorithms of BPCEC, CP-ABE, and ECC with a certain probability.
According to the steps defined above, here are the interactions between algorithm and the adversary :

STEP 1. Registration phase: through the identity generated in U’s registration phase, algorithm obtains U’s digital identity and receives U’s identity , U’s masked identity , U’s identity license , and IMA’s signature . At last, sends to .

STEP 2. Inquiry phase: the adversary can query the algorithm a polynomial number of times:

(1) Generate the ciphertext: visits MS’s website, generates the ciphertext by the CP-ABE algorithm, and sends to MS.

(2) Generate the click message: generates the click message which contains a randomly selected by through the BPCEC and ECC algorithms and then clicks PUB’s ad to send .

STEP 3. Verification phase: PUB verifies U’s click message and outputs using the ECC and BPCEC algorithms. If exists, it indicates that the adversary successfully carried out the click fraud attack. The success probability of the adversary is

If an attacker successfully attacks the BPCEC algorithm, an attacker successfully attacks the CP-ABE algorithm, and an attacker successfully attacks the ECC algorithm, can carry out the click fraud attack successfully. However, the probability of , and successfully attacking the BPCEC, CP-ABE, and ECC algorithms is almost , respectively; then, wins in the click fraud attack game of the BCFDPS scheme with a probability of . But, according to the assumption that the BPCEC, CP-ABE, and ECC algorithms satisfy the basic security properties, it is concluded that the probability of successfully attacking is negligible, so the scheme can detect and prevent click fraud.

6.3.2. Theorem 2

If all the crypto-algorithms such as BPCEC, CP-ABE, and ECC satisfy the basic security features, then U’s conditional unlinkability can be achieved in the BCFDPS.

Proof. Define as an adversary who attacks the linkability of of BPCEC algorithm, as an adversary attacking the linkability of of CP-ABE algorithm, and as an opponent attacking the linkability of of ECC algorithm. Assuming (except PUB) links U’s masked identity successfully, a polynomial time algorithm is defined, which has the ability to attack the algorithms of BPCEC, CP-ABE, and ECC. During the communication process of U, MS, and PUB, two messages and are encrypted by the algorithms BPCEC, CP-ABE, and ECC. Therefore, for the adversary , the probability of successfully linking many different messages to the same U is

Therefore, if the attacker successfully attacks the BPCEC algorithm, the attacker successfully attacks the CP-ABE algorithm, and the attacker successfully attacks the ECC algorithm, then wins in the conditional unlinkability simulation attack game. However, according to the assumptions about these security features, the probability of successfully attacking is negligible. As a result, the scheme accomplishes U’s conditional unlinkability.

7. Implementation and Evaluation

We evaluate our scheme in terms of computation, communication, storage, and Ethereum gas cost based on JPBC library [56] and Ethereum.

In the proposed scheme, four phases of initialization, registration, ad publishing, and click fraud detection and prevention are involved. Because the first two phases happen rarely, they are not implemented in this section and we mainly focus on the phases of ad publishing and click fraud detection and prevention in which an ad is published and the click fraud is detected and prevented.

7.1. Computation Cost
7.1.1. Evaluation of Our Scheme

We mainly focus on the phases of ad publishing and click fraud detection and prevention in this section. We run evaluation tests to obtain the time cost of the meta-operations on a PC (Intel Core i5-9400F CPU @ 2.90 GHz, 16 GB RAM @ 2667 MHz, and Windows 10 x64). We use JDK 1.8 and the JPBC library [56] to support efficient bilinear pairing operations.

To achieve persuasive expression of computation comparison, the symbols and parameters are introduced: is the encryption algorithm in CP-ABE scheme, denotes the key generation algorithm in CP-ABE scheme, means the decryption algorithm in CP-ABE scheme, expresses the encryption algorithm in ECC, signifies the decryption algorithm in ECC, and represents the bilinear pairing operation. Their time cost is as follows: ms, ms, ms, ms, ms, and ms. In addition, the time cost of hash function and concatenate operation is small, and we do not take this into account in computation cost. The detailed computation costs for each phase are illustrated in Table 2.

In phase 5.3.1, U is required to perform one encryption algorithm in CP-ABE scheme and PUB needs to execute one key generation algorithm in CP-ABE scheme, one decryption algorithm in CP-ABE scheme, and one decryption algorithm in ECC, that is, the running time is ms. According to Ma et al. [57], the response speed of publishing an ad in our scheme is within the acceptable threshold ( ms) and is lower than the one in [6], which is close to 400 ms. In phase 5.3.2, U computes three encryption algorithms in ECC and two bilinear pairing operations. Therefore, the execution time to generate a click message is ms, and it has no effect on the user experience. Further, in phase 5.4.1, PUB is required to run three decryption algorithms in ECC and one bilinear pairing operation, that is, ms. In summary, the computation cost from publishing an ad for U (phase 5.3.1) to verifying the effective clicks by PUB (phase 5.4.1) is ms, where the time cost of one click fraud detection and prevention is only 7.87 ms. After PUB counts the effective clicks, ADE and MS will also verify the clicks to ensure their profit. Similar to PUB, ADE performs three decryption algorithms in ECC and one bilinear pairing operation, that is, ms. For the MS, it executes one encryption algorithm in ECC to detect a click fraud, which is ms. From Table 2, it can be seen that the CP-ABE algorithm increases the run time in phase 5.3.1, but it protects U’s privacy from MS and from sniffers in the channel. In addition, it should be noted that the computation overhead of PUB in phases 5.3.1 and 5.4.1 can be reduced at a publisher with powerful computing clusters. Moreover, distributed and parallel optimization techniques for verifiable computations can also be adopted to further enhance the publisher’s performance in publishing the ad to a U who is the potential consumer of the ad.

On the other hand, blockchain is introduced in our scheme. To demonstrate the practical performance of our blockchain-based scheme, we evaluate the execution cost of our smart contract on a public Ethereum testnet (Rinkeby). We used the Chrome v89.0 browser with the MetaMask plugin and Remix, a browser-based IDE, to connect the contract on Ethereum with the simulated program. The Rinkeby testnet was started by the Ethereum team in April 2017 and uses the Clique PoA (Proof of Authority) consensus protocol. Importantly, it is immune to spam attacks, as the Ether supply is controlled by several trusted parties and only they can write transactions in the blockchain, which makes it similar to a consortium blockchain; hence, the waiting time for transaction confirmation is short enough to be ignored.

We deploy smart contracts on Rinkeby to record the transaction data and measure the gas cost of deploying and calling the smart contracts. The gas cost of our scheme is shown in Table 2. In our scheme, a smart contract is only deployed once in phase 5.3.1 and the gas cost of deploying the contract is . Additionally, in phases 5.3.1, 5.3.2, and 5.4.1, the cost of calling the contract to write 256, 32, and 128 bytes of analysis results on the blockchain is 27,054, 23,470, and 25,006 gas, respectively. All in all, judging from the evaluation results, our scheme is feasible in practice.
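As a worked check of the reported write costs, the following Python sketch totals the three calls and derives the average and marginal gas per byte. Only the three (bytes, gas) pairs come from the text; the split into a fixed part and a per-byte part is an interpretation of those three points, not a statement about how the contract is implemented.

```python
# (bytes written, gas used) per phase, as reported in the text.
writes = {
    "5.3.1": (256, 27_054),
    "5.3.2": (32, 23_470),
    "5.4.1": (128, 25_006),
}

# Total gas for one call in each of the three phases.
total_gas = sum(gas for _, gas in writes.values())

# Average gas per byte for each call.
gas_per_byte = {phase: gas / size for phase, (size, gas) in writes.items()}

# Marginal gas per extra byte between the smallest and the largest payload.
small_size, small_gas = writes["5.3.2"]
large_size, large_gas = writes["5.3.1"]
marginal = (large_gas - small_gas) / (large_size - small_size)

# Remaining fixed part of each call if the marginal rate applied to every byte.
fixed = {phase: gas - marginal * size for phase, (size, gas) in writes.items()}

print(total_gas)  # 75530
print(marginal)   # 16.0
print(fixed)
```

Under this reading, the three reported points lie on one line (about 22,958 gas of fixed overhead plus 16 gas per byte), so the per-byte average falls sharply as more bytes are written per call.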

7.1.2. Comparison of the Computation Cost in Click Fraud Detection and Prevention Process

As far as we know, click fraud detection and prevention schemes that use blockchain are rare. Therefore, we choose the click fraud detection schemes [8, 16, 18, 29, 31], which do not use blockchain, and compare their computation costs with ours in the publisher’s click fraud detection and prevention process (phase 5.4.1). The comparison result is shown in Table 3.

It can be seen from Table 3 that Almahmoud et al. [8] utilized SVM, KNN, etc. to detect a fraudulent click by machines, and the time taken to build the model of the generated 500 instances is 10 ms, while the time taken to classify a single instance whether legitimate or illegitimate is 50 ms with a precision of 95.10%. The scheme in [16] uses recurrent neural network to train a model with more than 1.6 million sessions so that the typical training duration is 12 hours (roughly 6–8 epochs), but the precision is 33.80%. The dataset of the scheme in [18] contains 393,708 deliveries (243,650 ok deliveries and 150,058 fraud deliveries), and the time required to train classifier with 10 features is about 800 seconds with a precision rate of 96.29%. Dong et al. [29] utilized 12,000 ad-supported apps, and an average of 216.7 seconds was spent to construct the UI transition graphs and an average of 400 ms was spent to detect the ad frauds. The dataset in [31] is from a university campus network between June 2015 and November 2017 with total of 217,334,190 unique clicks. After training, the precision is 89.34%. Table 3 shows that the preparation times of schemes in [16, 18, 29] are longer than ours because their schemes are based on machine learning and statistical analysis, and they need to spend more time training machine models and analyzing the pattern of the click traffic, while the preparation time is not included in our scheme. The verification time of a click fraud in the schemes in [16, 18, 31] is not explained, but in the scheme in [29], it is 400 ms, which is obviously higher than ours. In summary, our scheme is the best one for publishers to detect and prevent a click fraud.

7.2. Communication and Storage Cost
7.2.1. Evaluation of Our Scheme

Our scheme is embedded in the advertising system, and many entities in the system need to send data to publish an ad and store data as evidence for paying fees. To evaluate the feasibility of our scheme in practice, we simulate the scheme in terms of ad publishing and click fraud detection and prevention, and the results of communication and storage cost are shown in Table 4. Specifically, we assume that the output size of the general hash function is 256 bits, the size of an element of the elliptic curve is 256 bits, the size of an element in a bilinear group is 1,024 bits, the length of identities is 256 bits, and the timestamp size is 112 bits.
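The size assumptions above translate into byte counts with simple arithmetic, sketched below. The field sizes are those stated in the text; the composition of `example_message` is purely hypothetical and is not the scheme's actual click message format.

```python
# Field sizes in bits, as assumed in the evaluation.
SIZE_BITS = {
    "hash": 256,
    "ec_element": 256,
    "pairing_element": 1024,
    "identity": 256,
    "timestamp": 112,
}


def size_in_bytes(fields):
    """Return the total size in bytes of a message made of the named fields."""
    total_bits = sum(SIZE_BITS[name] for name in fields)
    return total_bits // 8


# Hypothetical message layout, used only to show the arithmetic.
example_message = ["identity", "ec_element", "pairing_element", "timestamp"]
print(size_in_bytes(example_message))  # 206
```

With these sizes, an identity or hash takes 32 bytes, a bilinear group element 128 bytes, and a timestamp 14 bytes, so pairing-group elements dominate the size of any message that carries them.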

In phase 5.3.1, an ADE first sends an ad’s identity to a PUB, then a U transmits a ciphertext containing his own to a MS, the MS further forwards the ciphertext to the PUB, and after the PUB decrypts and obtains the , the PUB sends the and bidding fee to the MS; next, the MS displays the for the U. The communication cost of U, MS, PUB, and ADE is , , , and bytes. Also, U stores 259-byte parameters to compute ad click messages faster. To make it easier to publish the ad, the MS stores the that are 65 bytes, the PUB reserves , which are bytes, and the ADE keeps his 32 bytes . Similarly, in phase 5.3.2, the contents of the communication of U, MS, PUB and ADE are bytes, bytes, 0, and 0, respectively. The storage cost of them is 0, bytes, bytes, and 0 separately. The click fraud is detected and prevented by the MS, PUB, and ADE in an independent way in phases 5.4.1, 5.4.2, and 5.4.3, and the processed results are also stored. Specifically, the PUB writes a total of bytes of data in the ABB and it consumes 175 bytes to store . The ADE receives bytes to verify the click messages, and the ADE stores bytes of data. Moreover, bytes of message are obtained by the MS to detect the click fraud, and it stores bytes of result.

For the data presented in Table 4, the communication and storage cost in our scheme is mainly consumed in phases 5.3.1 and 5.3.2. A total of about bytes are used, which is negligible in today’s common online advertising systems.

7.2.2. Comparison of the Communication Cost in Publishing and Clicking an Ad

We did our best to search for current blockchain-based online advertising click fraud detection and prevention schemes but only found two blockchain-based online advertising schemes [5, 6] which do not realize the detection and prevention of click fraud. Additionally, Ding et al. [6] were mainly concerned about the throughput of the blockchain transactions, and they did not give details of sending the advertising messages. Therefore, from the perspective of scheme similarity, we only make a comparison in the processes of “Publisher publishes an ad (phase 5.3.1)” and “U clicks the ad (phase 5.3.2)” with a vehicular local advertising system of Liu et al. [5]. Table 5 visually describes the communication cost in the processes of publishing an ad and clicking an ad.

In the scheme of Liu et al. [5], a PUB directly sends an ad to a U, and the U then clicks on the ad. ADE and MS are not included in the process of publishing and clicking on the ad, so the cost of ADE and MS is 0. To obtain an ad in the scheme of Liu et al. [5], a U needs to send his local position of bytes, five attributes of bytes, and a number of 1 byte to a PUB, in which the communication of a U is bytes. Also, the PUB returns two positions of bytes and forty attributes of bytes to the U, in which the communication of a PUB is bytes. However, in their scheme, the click fraud still exists since they did not verify the authenticity of the click. Also, the privacy of U’s locations and interests is leaked to the sniffer in the channel because the communication data are in plaintext. In our scheme, we are able to detect and prevent click fraud while protecting the identity privacy of the U. Specifically, an ADE first sends an ad to a PUB, a U sends a ciphertext to the MS, and the MS forwards the to the PUB for getting an ad. Then, the PUB sends a and a price to the MS, and the MS displays the to the U. Next, the U clicks on an ad and sends a click message to the MS, and the is forwarded to the PUB. In these steps, the U’s communication cost includes a and a , which is bytes, the communication cost of the MS contains a , an , a , an identity , and a , which is bytes, the PUB’s communication cost consists of a and a , which is bytes, and the ADE only sends 32 bytes of . The communication cost of ours is higher than that of Liu et al. [5] since we add some additional authenticity information in the click message to detect and prevent click fraud. Moreover, the communication data are encrypted by the CP-ABE algorithm to protect U’s privacy from the transmission medium.

When we place our scheme and Liu et al.’s scheme [5] at the same level of U’s privacy protection and without regard to click fraud detection and prevention, the ad publishing steps in our scheme can be modified as follows: a U needs to send to the MS, then the MS forwards to the PUB, next, the PUB sends and to the MS, and finally, the MS displays for the U. As a result, during these steps, the total communication content within the system is bytes, which is significantly lower than the 499 bytes of Liu et al.’s scheme. That is, we add bytes of communication overhead for U’s privacy protection and click fraud detection and prevention. Also, the overhead ( bytes) added to our scheme is acceptable given that the mainstream network bandwidth is above 3 MB/s (the average bandwidth of a 4G network is 3 MB/s).

7.2.3. Comparison of the Storage Cost in Publishing and Clicking an Ad

Besides, the comparison of storage cost when a publisher publishes an ad and a consumer clicks the ad is also shown in Figure 7.

In the processes of publishing and clicking an ad, Liu et al.’s scheme [5] does not involve the advertiser and the media site, that is, the storage cost of them is 0. Also, to request an ad faster, the U stores his ad query in advance, in which the storage cost of U is 67 bytes. After publishing an ad to the U, the PUB records the result of the ad query, and according to the experimental result, the total length is bytes. For our scheme, the access tree , the signature from IMA , the masked identity , and the identity license of U are stored in U’s browser plugin in advance, which is a total of bytes. The MS is responsible for forwarding messages and retaining the forwarding results , so its storage cost is bytes. Additionally, the PUB stores forwarded by the MS, which is a total of bytes. Also, the ADE only stores 32 bytes of ad information . In a word, the total storage cost of our scheme is significantly lower than that of Liu et al. [5] because they need to store all the similarity results between multiple ads and one consumer.

8. Discussion

Our scheme addresses the challenging problems encountered in online advertising click fraud detection and prevention, namely, incompletely reliable detection results, tampering with the number of real clicks by the PUB itself (the PUB can count the real click number), and leakage of consumer’s identity privacy. However, it still has some shortcomings that need to be solved.

First of all, although an entity identity blockchain (EIB) exists in our scheme, fraudulent adversaries are not yet held accountable in our current scheme. Specifically, the EIB is designed as a consortium blockchain that records the digital identity hash of entities, which can serve as evidence to hold malicious entities accountable when a click fraud occurs. To restrain the malicious entities, an accountability system needs to be designed in the future.

Secondly, the time the MS spends detecting and preventing click fraud is somewhat high. In detail, when an MS detects click fraud, it needs to use the to encrypt the successively and then compare the encrypted result with the in its local database one by one. As a result, the time the MS needs to identify real clicks, and thus to reach an agreement with the PUB on ad billing fees, may be high in a certain period. Therefore, our future research will focus on reducing the time cost of the MS in its detection process.
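
The cost pattern described above can be illustrated with a minimal sketch. It is not the scheme's actual construction: HMAC-SHA256 stands in for the encryption step, and the names `keyed_tag`, `find_matching_record`, `candidates`, and `stored_tags` are hypothetical. It only shows that transforming values one after another and comparing each result against every stored entry needs a number of comparisons that grows with the size of the MS's local database.

```python
import hashlib
import hmac
from typing import List, Optional, Tuple


def keyed_tag(key: bytes, value: bytes) -> bytes:
    """Stand-in for the scheme's encryption step (HMAC-SHA256 is an assumption)."""
    return hmac.new(key, value, hashlib.sha256).digest()


def find_matching_record(
    key: bytes, candidates: List[bytes], stored_tags: List[bytes]
) -> Tuple[Optional[int], Optional[int], int]:
    """Transform each candidate in turn and compare it with every stored tag.

    Returns (candidate_index, stored_index, comparisons) for the first match,
    or (None, None, comparisons) when nothing matches.
    """
    comparisons = 0
    for i, candidate in enumerate(candidates):
        tag = keyed_tag(key, candidate)  # one keyed operation per candidate
        for j, stored in enumerate(stored_tags):
            comparisons += 1  # one comparison per stored entry
            if hmac.compare_digest(tag, stored):
                return i, j, comparisons
    return None, None, comparisons
```

In the worst case the function performs `len(candidates) * len(stored_tags)` comparisons, which is the kind of growth that an index keyed on the transformed value might avoid; whether such an index is compatible with the scheme is an open question here.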

Lastly, the problem of partial loss of consumers’ data may still exist. In our scheme, we assume that the parameters obtained at registration, such as the user’s identity license, are stored secretly in the user’s browser plugin, so how to prevent these parameters from leaking from the plugin also needs further study.

9. Conclusion

In this paper, we proposed a blockchain-based click fraud detection and prevention scheme (BCFDPS) for online advertising that prevents clicks by machines and increases the cost of fraudulent clicks by humans. Specifically, click fraud by a malicious machine is largely avoided because a consumer’s immutable digital identity is embedded in the click message with the bilinear pairing algorithm, and a machine does not have a digital identity with which to generate a valid click message. Also, the cost of click fraud by a human increases because many valid clicks by the same recruited person can only be counted once. Additionally, the introduced consortium blockchain maintains the hash values of all analysis results of consumers’ click messages, which makes the click fraud detection and prevention process transparent to each entity in the advertising system. Further, the identity privacy of consumers is protected from media sites, advertisers, and sniffers in the channel by ciphertext-policy attribute-based encryption. Our implementation and evaluation demonstrate the advantages of BCFDPS in computation and storage cost, and the Ethereum gas cost is limited. The communication cost increases moderately as the price of protecting the user’s identity privacy.
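
The count-once rule for repeated clicks by the same person can be sketched as follows. This is an illustrative assumption rather than the scheme's implementation: each click is reduced to a hypothetical pair (`ad_id`, `identity_tag`), where `identity_tag` stands for whatever value ties a click to one consumer's digital identity, and counting is assumed to be per ad.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def count_billable_clicks(clicks: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count each (ad_id, identity_tag) pair at most once per ad."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for ad_id, identity_tag in clicks:
        # Repeated clicks by the same identity collapse into one set entry.
        seen[ad_id].add(identity_tag)
    return {ad_id: len(tags) for ad_id, tags in seen.items()}


# Hypothetical click log: consumer "u1" clicks ad "ad1" three times.
example_clicks = [("ad1", "u1"), ("ad1", "u1"), ("ad1", "u1"), ("ad1", "u2")]
billable = count_billable_clicks(example_clicks)
```

Under this rule, a recruited person who clicks the same ad many times adds only one billable click, so the fraudster has to recruit more distinct identities to inflate the count.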

Data Availability

Data sharing is not applicable to this article, as no datasets were generated or analysed during the current study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Qiuyun Lyu was responsible for conceptualization, methodology, analysis, and writing. Hao Li was responsible for conceptualization, methodology, software, and writing. Renjie Zhou was responsible for revision and funding acquisition. Jilin Zhang was responsible for methodology and validation. Nailiang Zhao was responsible for validation and analysis. Yan Liu was responsible for review and editing.

Acknowledgments

This study was partially supported by the Zhejiang Provincial Key Technology Research and Development Program under grant no. 2019C03134 and the National Key Technology Research and Development Program of China under grant no. 2019YFB2102100.