Abstract

By deploying resources in the vicinity of users, edge caching can substantially reduce the latency of content retrieval and relieve the pressure on the backbone network. Due to the limited caching capacity and the dynamic nature of user requests, caching resources must be allocated judiciously. Some edge caching studies improve network performance by predicting content popularity and proactively caching the most popular content, but they ignore the privacy and security issues raised by collecting user information at a central unit. To this end, a collaborative caching strategy based on federated learning is proposed. First, federated learning is used to make distributed predictions of the preferences of the users at each node and thereby develop an effective content caching policy. Then, the problem of allocating caching resources to optimize the cost of video providers is formulated as a Markov decision process, and a reinforcement learning method is used to optimize the caching decisions. Compared with several baseline caching strategies in terms of cache hit rate, transmission delay, and cost, the simulation results show that the proposed content caching strategy reduces the cost of video providers and achieves a higher cache hit rate and lower average transmission delay.

1. Introduction

The explosive growth of mobile devices, including cell phones, wearable devices, connected cars, and Internet-of-Things (IoT) devices, has led to exponential growth in data traffic. As a result, great pressure has been exerted on the backhaul network, with resultant increases in network latency. Resource-constrained mobile devices also face considerable challenges in supporting computation-intensive and time-critical applications, such as video services, voice control, gesture recognition, 3D modeling, natural language processing, and online interactive games. At the same time, video data are growing explosively. With substantial storage space and powerful computing ability, cloud data centers are the natural content repository for video vendors such as YouTube and TikTok. However, mobile devices suffer performance degradation when they must retrieve all data from the cloud.

To address these issues, edge computing (EC) has been introduced as a promising paradigm that provides a service environment with computing and caching capacity. By renting computing and storage resources on edge servers, mobile apps and content vendors (referred to as vendors hereafter) can host their apps and content on edge servers to provide low-latency, high-quality services for users [1]. In this way, EC can greatly alleviate the congestion of the backhaul network, improve the quality of experience (QoE) by meeting strict response delay requirements, and enhance location awareness.

Compared with cloud computing, mobile edge computing (MEC) is still constrained by limited storage capacity [2]. Edge servers can cache only part of the content, and video users can retrieve data from nearby edge servers instead of remote cloud servers if the data are already cached there [3]. Edge servers tend to store popular content to obtain higher hit ratios. Thus, content popularity prediction [4] and cache cooperation between edge servers are important for improving caching performance.

In recent years, traditional caching strategies, such as the least recently used (LRU) and least frequently used (LFU) policies, have been extensively studied [5, 6]. However, these methods are not very efficient because they do not take content popularity into account. Most existing work on edge caching assumes that content popularity is known in advance, which is impractical. Therefore, it is imperative to improve caching efficiency by properly predicting content popularity [7]. Centralized machine learning can learn from available data and make accurate decisions or predictions on unseen data [8, 9]. In literature [10], a neural network derives an optimal cache resource allocation scheme for the next update period from the content request histories collected from all users. In literature [11], a deep learning-based content popularity prediction scheme is proposed. In literature [12], the authors propose a popularity prediction model for each content category based on historical popularity by training a simplified bidirectional long short-term memory (Bi-LSTM) network. In literature [13], the authors propose an evolutionary learning-based content caching strategy that adaptively learns content popularity over time. In literature [14], a user preference model is proposed to predict content popularity and track popularity changes based on user preferences and the features of the requested content.

However, most existing caching schemes require users to share their private preference data with a central server, which may pose privacy and security risks [15]. Federated learning (FL) is a distributed framework that learns a global model on a server while protecting participants' privacy: each participant shares locally trained model parameters with the server instead of its local data. A mechanism to protect user privacy is introduced in literature [16], where the content popularity obtained from each user's local model is weighted by user preferences. The authors of literature [17] propose an FL-based model aggregation method that divides all users into multiple subspaces based on their contextual information.

In addition, most of the above articles take the user's perspective, optimizing request latency and improving the QoE. For content providers, however, increasing profit and reducing cost are additional factors to consider. In literature [18], the multicell collaborative caching problem is studied to minimize the total cost for content providers. Literature [19] considers the caching resource allocation problem in a scenario where a network operator coexists with multiple content providers; the problem is modeled as a multileader multifollower Stackelberg game, and the optimal caching strategy for the content providers is obtained.

In this paper, we propose a cooperative edge caching strategy that jointly considers the cost of content providers and the privacy of users. To protect user privacy, we adopt FL to predict user preferences at different edge nodes in a distributed manner and apply the predictions to the design of the caching policy. To address the content providers' cost problem, we use a deep reinforcement learning (DRL) based collaborative caching algorithm (Dueling DDQN), which combines DDQN and Dueling DQN to make caching decisions for edge nodes. The contributions of this paper are as follows:

(i) Due to the dynamic changes in video popularity in the network, the FL framework is used to accurately predict the content popularity in a given region. The proposed model can learn the influence of multidimensional content features to formulate an effective caching strategy.

(ii) To reduce the cost of video providers, a video collaborative caching strategy based on deep reinforcement learning is proposed. We construct a collaborative edge cache model that transforms storage and transmission energy consumption into costs. By combining DDQN and Dueling DQN, the proposed algorithm effectively reduces the overestimation of DQN and accelerates convergence. The experimental results verify the validity of our approach.

(iii) Experimental verification and comparative analysis are conducted on the proposed caching strategy against several baseline algorithms. The experimental results show that our strategy optimizes the cost of video providers and achieves a high cache hit ratio.

The rest of this paper is organized as follows: Section 2 summarizes the relevant work; Section 3 defines the system model and problem description; Section 4 details the proposed cache collaboration scheme; Section 5 uses simulation results to evaluate the cache performance; finally, Section 6 summarizes this paper.

2. Related Work

Many different strategies and algorithms have been proposed for edge caching. In [20, 21], noncooperative content caching schemes are proposed in which a user can obtain the requested content only from the local base station or the cloud data center, not from nonlocal base stations. Since the cache capacity of a single base station is limited, collaborative caching strategies have been adopted to alleviate the problem that limited cache node capacity reduces caching efficiency and degrades system performance. Literature [22] proposes a cooperative cache strategy for heterogeneous cellular networks by transforming the optimal design of content caching into an integer linear programming problem that is solved by the subgradient method. Literature [23] formulates the optimal collaborative content caching problem as a 0-1 integer program to minimize the average download delay and solves it with a greedy algorithm based on content popularity. Compared with a popularity-only caching strategy, this strategy can substantially improve the content cache hit ratio and reduce the average content delivery delay.

Due to the dynamic nature of the network and the complexity of the environment, an efficient caching strategy must take user requirements into account; however, accurate user requirements are difficult to obtain. Literature [10–14] designs caching strategies based on machine learning, estimating users' future needs from content popularity and user preference datasets with time-series dynamics. Literature [24] proposes an online active caching scheme that uses a bidirectional deep recurrent neural network (BRNN) model to predict time-series content requests and update the edge cache accordingly. Literature [25] proposes a collaborative caching strategy for edge servers based on software-defined networking (SDN), in which multilayer perceptron neural networks predict the probability that mobile users request video content, and an objective function is constructed to maximize the utilization of edge server resources; a branch-and-bound algorithm is adopted to determine the global optimum. Literature [26] proposes a deep reinforcement learning approach and an improved branch-and-bound strategy to jointly optimize cooperative edge caching and wireless resource allocation in IoT networks. However, centralized access to user data may expose user privacy. Federated learning is a distributed machine learning framework that can effectively address this problem [16, 17]. Thus, it is necessary to extract, analyze, and identify content popularity, user preferences, and user movement patterns while respecting user privacy. Literature [27] proposes a privacy-preserving federated k-means scheme for active caching in next-generation cellular networks; it protects user privacy by means of two privacy-protection technologies, FL and secret sharing. Literature [28, 29] proposes an intelligent F-RAN framework based on FL that can accurately predict the content popularity distribution in a network by applying FL to user demand prediction.

Due to the limited storage capacity of edge servers, not all content provided by a content provider (CP) can be cached at the edge, and users' requests directly affect the CP's income. Therefore, the study of collaborative caching strategies must also consider maximizing the CP's income [18, 19]. To reduce delay and cost in the cloud-edge collaboration environment, literature [30] proposes a content collaboration caching strategy that considers the delay gain and cache cost gain brought by cached content to the cooperative domain and selects cached content according to these gains. Because of the considerable request delays and high operating costs of current video-service caching strategies, literature [31] proposes a mobile edge computing video caching strategy that jointly optimizes delay and energy consumption; the method adopts a branch-and-bound algorithm to solve the optimization problem, aiming to reduce request delays for users and lower costs for providers. This paper proposes a reinforcement learning-based edge caching algorithm that considers cache placement and privacy security, protecting user privacy while reducing the costs of video providers. A summary of recent work is given in Table 1.

3. System Architecture

In this section, we introduce a collaborative edge caching system architecture that supports FL frameworks. The architecture consists of a network model, content request model, local popularity model, cooperative cache model, and request cost model. The main parameter symbols are listed in Table 2.

3.1. Network Model

Figure 1 shows the system scenario. This scenario contains a set of BSs $\mathcal{B} = \{1, 2, \dots, B\}$ and a set of UEs $\mathcal{U} = \{1, 2, \dots, U\}$. Each BS serves multiple UEs, and the coverage areas of the BSs are disjoint. A mobile edge server is deployed at each BS to provide edge caching services for UEs and to make wireless resource allocation decisions. Let $c_b$ be the caching capacity of BS $b$; then, the capacity set of all BSs can be defined as $C = \{c_1, c_2, \dots, c_B\}$. All the MEC servers can exchange cache information and share data through backhaul links and the cache manager (CM). Mobile devices send content requests at the beginning of each time slot $t \in \{1, 2, \dots, T\}$.

3.2. Content Request Model

The content repository located in the cloud server is denoted as $\mathcal{F} = \{1, 2, \dots, F\}$, and the size of content $f \in \mathcal{F}$ is denoted as $s_f$. The set of users requesting content at BS $b$ in time slot $t$ is defined as $\mathcal{U}_b^t$, and the number of UEs requesting content $f$ from BS $b$ during time slot $t$ is defined as $n_{b,f}^t$. Each UE is assumed to be associated with only one BS in each time slot, i.e., each UE can be served by only one BS in a given time slot. The total number of requests from all users can then be defined as $N^t = \sum_{b \in \mathcal{B}} \sum_{f \in \mathcal{F}} n_{b,f}^t$.

3.3. Local Popularity Model

Due to the diversity of content preferences across BSs, the local content popularity of all content in BS $b$ at time slot $t$ is defined as a content popularity vector $\mathbf{p}_b^t = [p_{b,1}^t, p_{b,2}^t, \dots, p_{b,F}^t]$. In addition, considering the privacy and security of users, FL is applied to accurately predict content popularity without requiring UEs to upload individual user preference data to the BS.

3.4. Cooperative Cache Model

The content requests of mobile devices are first received by the BSs. If the requested content has been cached in the local BS, it is pushed to the user immediately. A binary local content delivery variable $x_{u,f}^{l,t}$ indicates whether the local BS serves the UE: $x_{u,f}^{l,t} = 1$ if the local BS responds to the request and $x_{u,f}^{l,t} = 0$ otherwise. If the requested content is not cached in the local BS, the BS obtains it from other BSs through the CM. Let $x_{u,f}^{nl,t}$ denote whether a nonlocal BS serves the UE: $x_{u,f}^{nl,t} = 1$ if another BS responds to the request and $x_{u,f}^{nl,t} = 0$ otherwise. If the CM cannot find the requested content in any BS, the local BS obtains the requested content from the cloud server and delivers it to the UE. Let $x_{u,f}^{c,t}$ denote whether the UE obtains content $f$ from the cloud server in time slot $t$: $x_{u,f}^{c,t} = 1$ if the UE obtains the content from the cloud server and $x_{u,f}^{c,t} = 0$ otherwise.
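As an illustration, the following Python sketch implements the three-tier lookup described above; the function name, the dict-based cache representation, and the returned tier labels are our own illustrative choices, not part of the paper's model.

```python
# Illustrative three-tier content lookup: local BS first, then peer BSs via
# the CM, then the cloud server. The returned tier corresponds to setting
# exactly one of the binary delivery variables to 1.

def serve_request(user_bs: int, content: int, caches: dict) -> str:
    """Resolve one request and report which tier served it."""
    if content in caches[user_bs]:
        return "local"            # x^l = 1: served by the local BS cache
    for bs, cached in caches.items():
        if bs != user_bs and content in cached:
            return "nonlocal"     # x^nl = 1: fetched from a peer BS via the CM
    return "cloud"                # x^c = 1: fetched from the cloud server
```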

3.5. Request Cost Model

The cost of the system is composed of two main parts: the storage energy consumption and the transmission energy consumption of the content on the MEC servers. If content is cached in a BS, additional storage energy is consumed. Assuming that all BS servers have the same performance, the storage cost of BS $b$ caching content within period $T$ [31] can be expressed as

$$E_b^{sto}(t) = \sum_{f \in \mathcal{F}} a_{b,f}^t \, s_f \, w_{ca} \, T, \quad (1)$$

where the caching decision $a_{b,f}^t \in \{0, 1\}$ indicates whether content $f$ is cached in BS $b$ and $w_{ca}$ is the energy consumption of the MEC server to cache each bit of data. The wireless transmission rate between UE $u$ and BS $b$ can be obtained by the Shannon formula:

$$r_{u,b} = W \log_2\!\left(1 + \frac{P_b h_{u,b}}{\sigma^2}\right), \quad (2)$$

where $W$ represents the bandwidth of the base station, $P_b$ denotes the transmission power from BS $b$ to UE $u$, $h_{u,b}$ is the channel gain between $u$ and $b$, and $\sigma^2$ is the variance of the additive white Gaussian noise. Then, the transmission energy consumption of UE $u$ downloading content $f$ from its local BS $b$ is

$$E_{u,f}^{l}(t) = P_b \, t_{u,b}^f = P_b \frac{s_f}{r_{u,b}}, \quad (3)$$

where $t_{u,b}^f$ denotes the time taken for file $f$ to be transmitted between the local base station and the user. Let $P_{bs}$ be the transmission power between BSs. We define the transmission rate between a BS and the CM as $r_{bm}$ and that between the CM and the cloud server as $r_{mc}$. The transmission energy consumption of UE $u$ acquiring content $f$ from a nonlocal BS is expressed as

$$E_{u,f}^{nl}(t) = P_b \frac{s_f}{r_{u,b}} + P_{bs} \, t_{m,b}^f = P_b \frac{s_f}{r_{u,b}} + P_{bs} \frac{s_f}{r_{bm}}, \quad (4)$$

where $t_{m,b}^f$ denotes the time taken for file $f$ to be transmitted from the CM to the nonlocal base station. The energy consumption of UE $u$ downloading content $f$ from the cloud server can be expressed as

$$E_{u,f}^{c}(t) = P_b \frac{s_f}{r_{u,b}} + P_{bs} \frac{s_f}{r_{bm}} + P_c \frac{s_f}{r_{mc}}, \quad (5)$$

where $P_c$ is the transmission power of the cloud server. As can be seen from the transmission rates, fetching content from farther tiers incurs higher transmission energy consumption, i.e., $E_{u,f}^{l}(t) < E_{u,f}^{nl}(t) < E_{u,f}^{c}(t)$. Therefore, the transmission cost can be expressed as follows:

$$E^{tra}(t) = \sum_{b \in \mathcal{B}} \sum_{u \in \mathcal{U}_b^t} \left[ x_{u,f}^{l,t} E_{u,f}^{l}(t) + x_{u,f}^{nl,t} E_{u,f}^{nl}(t) + x_{u,f}^{c,t} E_{u,f}^{c}(t) \right]. \quad (6)$$

Thus, the total cost can be expressed as

$$Cost(t) = \sum_{b \in \mathcal{B}} E_b^{sto}(t) + E^{tra}(t). \quad (7)$$

3.6. Formalization

In this paper, the provider-cost-based edge cooperative caching approach aims to minimize the total cost by efficiently caching content on edge servers. This problem can be expressed mathematically as

$$\begin{aligned} \min_{\mathbf{a}, \mathbf{x}} \ & \sum_{t=1}^{T} Cost(t) \\ \text{s.t.} \ \ & \text{C1: } a_{b,f}^t \in \{0, 1\}, \ \forall b \in \mathcal{B}, f \in \mathcal{F}, \\ & \text{C2: } x_{u,f}^{l,t}, x_{u,f}^{nl,t}, x_{u,f}^{c,t} \in \{0, 1\}, \ \forall u \in \mathcal{U}, f \in \mathcal{F}, \\ & \text{C3: } \sum_{f \in \mathcal{F}} a_{b,f}^t s_f \le c_b, \ \forall b \in \mathcal{B}, \end{aligned} \quad (8)$$

where constraints C1 and C2 indicate that the cache decision and content delivery variables are binary, and C3 indicates that the data cached in each BS must not exceed its storage capacity.
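To make the cost model concrete, the following Python sketch evaluates Equations (1)-(7) for one time slot under the simplifying assumption of identical channel conditions for all UEs; the tier labels follow the earlier lookup sketch, and all identifiers and parameter-dictionary keys are our own naming.

```python
import math

def shannon_rate(W, P, h, sigma2):
    """Equation (2): wireless rate between a UE and its BS (bits/s)."""
    return W * math.log2(1 + P * h / sigma2)

def total_cost(placements, requests, sizes, p):
    """Equations (1) and (3)-(7): storage plus transmission cost for one slot.
    placements[b] is the set of contents cached at BS b; requests is a list of
    (bs, content, tier) tuples with tier in {'local', 'nonlocal', 'cloud'}."""
    # Equation (1): storage energy of all cached bits over the period T.
    e_sto = sum(sizes[f] * p["w_ca"] * p["T"]
                for cached in placements.values() for f in cached)
    r_ub = shannon_rate(p["W"], p["P_b"], p["h"], p["sigma2"])
    e_tra = 0.0
    for bs, f, tier in requests:
        e_tra += p["P_b"] * sizes[f] / r_ub          # BS -> UE hop, Eq. (3)
        if tier in ("nonlocal", "cloud"):            # extra CM hop, Eq. (4)
            e_tra += p["P_bs"] * sizes[f] / p["r_bm"]
        if tier == "cloud":                          # cloud hop, Eq. (5)
            e_tra += p["P_c"] * sizes[f] / p["r_mc"]
    return e_sto + e_tra                             # Equation (7)
```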

4. Problem Solution

In this section, we predict users' request behavior via the factorization machine (FM) algorithm [32] to account for different users' personalized preferences in different scenarios. The FM algorithm can solve the feature combination problem under sparse data conditions. In addition, considering users' privacy, we accurately predict content popularity by applying FL to the popularity prediction algorithm, which does not require UEs to upload individual user preference data to the BS.

4.1. FL-Based Content Popularity Prediction Model
4.1.1. Creation of the Local Model

For each content $f$, define $\mathbf{x}_f = (x_{f,1}, x_{f,2}, \dots, x_{f,n})$ as its feature vector, and let $y_f$ be the category tag: if the content is requested, $y_f = 1$; otherwise, $y_f = 0$. We define $p_{u,f}^t$ as the probability that UE $u$ requests content $f$ in time slot $t$. The correspondence between the feature vector of the requested content and the category label is approximated by the sigmoid function, and the FM model represents the user preferences. The formulas are expressed as follows:

$$p_{u,f}^t = \frac{1}{1 + e^{-\phi_u(\mathbf{x}_f)}}, \quad (9)$$

$$\phi_u(\mathbf{x}_f) = w_0 + \sum_{i=1}^{n} w_i x_{f,i} + \sum_{i=1}^{n} \sum_{j=i+1}^{n} w_{ij} x_{f,i} x_{f,j}, \quad (10)$$

where $n$ denotes the sample feature dimension and $x_{f,i}$ is the value of the $i$-th feature of content $f$. $w_0$, $w_i$, and $w_{ij}$ are model parameters. In the case of sparse data, very few samples satisfy $x_{f,i} x_{f,j} \neq 0$ in the cross term. When the number of training samples is insufficient, the parameters are likely to be trained insufficiently and inaccurately, which affects the model's effectiveness. Therefore, an $e$-dimensional auxiliary vector $\mathbf{v}_i = (v_{i,1}, \dots, v_{i,e})$ is introduced for each feature, and the second-order parameter can be expressed as $w_{ij} = \langle \mathbf{v}_i, \mathbf{v}_j \rangle$. Equation (10) can then be converted to

$$\phi_u(\mathbf{x}_f) = w_0 + \sum_{i=1}^{n} w_i x_{f,i} + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle x_{f,i} x_{f,j}, \quad (11)$$

where $\langle \mathbf{v}_i, \mathbf{v}_j \rangle = \sum_{k=1}^{e} v_{i,k} v_{j,k}$. The first-order training parameters are integrated into $\mathbf{w} = (w_0, w_1, \dots, w_n)$, and the quadratic-term learning parameters are $\mathbf{V} = (\mathbf{v}_1, \dots, \mathbf{v}_n)$.
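The converted model in Equation (11) can be evaluated in $O(n \cdot e)$ time using the standard FM identity $\sum_{i<j} \langle \mathbf{v}_i, \mathbf{v}_j \rangle x_i x_j = \frac{1}{2} \sum_{k} \big[ (\sum_i v_{i,k} x_i)^2 - \sum_i v_{i,k}^2 x_i^2 \big]$. A minimal NumPy sketch (function and variable names are ours):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Equations (9)-(11): FM request probability for one feature vector x.
    x has shape (n,), w shape (n,), V shape (n, e)."""
    linear = w0 + w @ x
    # Pairwise term via the O(n*e) identity:
    # 0.5 * sum_k [ (sum_i v_ik x_i)^2 - sum_i v_ik^2 x_i^2 ]
    pair = 0.5 * np.sum((V.T @ x) ** 2 - (V ** 2).T @ (x ** 2))
    phi = linear + pair
    return 1.0 / (1.0 + np.exp(-phi))   # sigmoid, Equation (9)
```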

4.1.2. Training of the Local Model

To track the changes in user preferences and protect user privacy, we design a local model training process that uses each user's request records as the input data for model training. To measure the learning performance of the model, we use the cross-entropy loss function to represent the loss of UE $u$ for the binary classification problem. The formula is as follows:

$$\ell\left(\mathbf{w}; \mathbf{x}_f, y_f\right) = -\left[ y_f \ln p_{u,f}^t + (1 - y_f) \ln\left(1 - p_{u,f}^t\right) \right]. \quad (12)$$

When the user preference model update starts, we assume that UE $u$ has received $m$ request samples. Based on the collected samples, we iteratively learn the user preference model parameters by minimizing the logistic loss over the samples, denoted as

$$L_m(\mathbf{w}) = \sum_{s=1}^{m} \ell\left(\mathbf{w}; \mathbf{x}_s, y_s\right), \quad (13)$$

$$\mathbf{w}_{m+1} = \arg\min_{\mathbf{w}} L_m(\mathbf{w}), \quad (14)$$

where $\mathbf{w}_m$ is defined as the model parameters learned by UE $u$ in the $m$-th iteration.

Due to the constructed dataset's extensive feature dimensionality and sparsity, overfitting may occur. Follow-the-regularized-leader (FTRL) [14] is used to solve Equation (14). FTRL is an online optimization method based on the online gradient descent (OGD) method, whose iterative strategy is

$$\mathbf{w}_{m+1} = \mathbf{w}_m - \eta_m \mathbf{g}_m. \quad (15)$$

FTRL introduces both L1 and L2 mixed regularization terms into the optimization process. The L1 regularization term increases the sparsity of the model solution, and the L2 regularization term helps prevent the model from overfitting. The update strategy of FTRL is

$$\mathbf{w}_{m+1} = \arg\min_{\mathbf{w}} \left( \mathbf{g}_{1:m} \cdot \mathbf{w} + \frac{1}{2} \sum_{s=1}^{m} \sigma_s \left\| \mathbf{w} - \mathbf{w}_s \right\|_2^2 + \lambda_1 \|\mathbf{w}\|_1 + \frac{\lambda_2}{2} \|\mathbf{w}\|_2^2 \right), \quad (16)$$

where $\eta_m$ denotes a nonincreasing learning rate, $\sigma_s$ is a parameter related to $\eta_m$ that satisfies $\sum_{s=1}^{m} \sigma_s = 1/\eta_m$, $\lambda_1$ and $\lambda_2$ denote regularization parameters with positive values, and $\mathbf{g}_{1:m} = \sum_{s=1}^{m} \mathbf{g}_s$ is the sum of the gradient vectors of the first $m$ samples. The gradient vector of the $m$-th sample is represented as follows:

$$\mathbf{g}_m = \left( p_{u,f}^t - y_f \right) \mathbf{x}_f. \quad (17)$$

By continuing the expansion of Equation (16), we can obtain

$$\mathbf{w}_{m+1} = \arg\min_{\mathbf{w}} \left\{ \left( \mathbf{g}_{1:m} - \sum_{s=1}^{m} \sigma_s \mathbf{w}_s \right) \cdot \mathbf{w} + \frac{1}{2} \left( \frac{1}{\eta_m} + \lambda_2 \right) \|\mathbf{w}\|_2^2 + \lambda_1 \|\mathbf{w}\|_1 + const \right\}, \quad (18)$$

where $const$ is a constant that does not affect the optimization problem. Let $\mathbf{z}_m = \mathbf{g}_{1:m} - \sum_{s=1}^{m} \sigma_s \mathbf{w}_s$; we can then obtain the iteration relationship as follows:

$$\mathbf{z}_m = \mathbf{z}_{m-1} + \mathbf{g}_m - \sigma_m \mathbf{w}_m. \quad (19)$$

For the requested content, the weight of each feature dimension changes at a different rate, and the gradient value of each dimension reflects this rate of change. Therefore, different learning rates are used for the different feature dimensions. The learning rate of the $i$-th feature can be denoted as $\eta_{m,i} = \alpha / \left( \beta + \sqrt{\sum_{s=1}^{m} g_{s,i}^2} \right)$, where $\alpha$ and $\beta$ are FTRL tuning parameters chosen to yield good learning performance. Equation (18) can then be split into a subproblem for each feature dimension $i$:

$$w_{m+1,i} = \arg\min_{w_i} \left\{ z_{m,i} w_i + \frac{1}{2} \left( \frac{1}{\eta_{m,i}} + \lambda_2 \right) w_i^2 + \lambda_1 |w_i| \right\},$$

where the L1 norm is nondifferentiable at $w_i = 0$. Define $\partial |w_i| \in [-1, 1]$ as the subgradient of the L1 norm at $w_i = 0$. The optimal solution should satisfy that the (sub)derivative is 0, and solving the above formula accordingly yields the closed-form solution

$$w_{m+1,i} = \begin{cases} 0, & |z_{m,i}| \le \lambda_1, \\ -\left( \dfrac{1}{\eta_{m,i}} + \lambda_2 \right)^{-1} \left( z_{m,i} - \lambda_1 \operatorname{sgn}(z_{m,i}) \right), & \text{otherwise.} \end{cases} \quad (20)$$
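The per-coordinate FTRL recursion of Equations (16)-(20) can be implemented compactly. The following NumPy sketch separates the closed-form weight computation from the state accumulation; the function and variable names are our own choices.

```python
import numpy as np

def ftrl_weights(z, q, alpha, beta, l1, l2):
    """Closed-form per-coordinate solution, Equation (20).
    z is the accumulated z-vector; q is the sum of squared gradients."""
    return np.where(np.abs(z) <= l1, 0.0,
                    -(z - l1 * np.sign(z)) / ((beta + np.sqrt(q)) / alpha + l2))

def ftrl_accumulate(z, q, g, w, alpha):
    """Fold one sample gradient (Eq. (17)) into the state, per Eq. (19).
    sigma_m = 1/eta_m - 1/eta_{m-1}; with eta_{m,i} = alpha/(beta + sqrt(q_i)),
    the beta terms cancel in the difference."""
    sigma = (np.sqrt(q + g ** 2) - np.sqrt(q)) / alpha
    return z + g - sigma * w, q + g ** 2
```

In each iteration, the current weights come from ftrl_weights, the gradient of Equation (17) is evaluated at those weights, and ftrl_accumulate updates the state, mirroring the inner loop of Algorithm 1.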

4.1.3. Model Aggregation

After each UE completes the local model training process, BS $b$ must aggregate the models trained by all UEs to obtain the overall content popularity, using the federated averaging method. The parameters of the global model are formulated as:

$$\mathbf{w}_b^{global} = \sum_{u \in \mathcal{U}_b^t} \frac{m_u}{\sum_{u' \in \mathcal{U}_b^t} m_{u'}} \, \mathbf{w}_u, \quad (21)$$

where $m_u$ is the number of request samples collected by UE $u$.
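A minimal sketch of the aggregation step, assuming the standard sample-count weighting of federated averaging (identifier names are ours):

```python
import numpy as np

def federated_average(local_params, sample_counts):
    """Sample-weighted federated averaging, Equation (21).
    local_params: list of per-UE parameter arrays; sample_counts: list of m_u."""
    total = sum(sample_counts)
    return sum((m_u / total) * np.asarray(w_u)
               for w_u, m_u in zip(local_params, sample_counts))
```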

At this time, the regional popularity of BS $b$ can be obtained by substituting the aggregated parameters into Formula (10):

$$p_{b,f}^t = \frac{1}{1 + e^{-\phi_b(\mathbf{x}_f)}}, \quad (22)$$

where $\phi_b$ is evaluated with the global parameters $\mathbf{w}_b^{global}$. The time complexity of the proposed prediction algorithm is linear in the number of time slots, users, samples, and feature dimensions; the specific procedure is shown in Algorithm 1.

1: Input: request records of each UE $u$; FTRL parameters $\alpha$, $\beta$, $\lambda_1$, $\lambda_2$
2: Initialization: $\mathbf{w} \leftarrow \mathbf{0}$, $\mathbf{z} \leftarrow \mathbf{0}$, $\mathbf{q} \leftarrow \mathbf{0}$ for each UE
3: For each time slot $t = 1, 2, \dots, T$ do
4:  For each UE $u \in \mathcal{U}_b^t$ do
5:   For each local sample $s = 1, 2, \dots, m_u$ do
6:    For each feature dimension $i = 1, 2, \dots, n$ do
7:     Calculate the gradient $g_{s,i}$ by (17)
8:     $\sigma_{s,i} \leftarrow \left( \sqrt{q_i + g_{s,i}^2} - \sqrt{q_i} \right) / \alpha$
9:     $z_i \leftarrow z_i + g_{s,i} - \sigma_{s,i} w_{s,i}$
10:     $q_i \leftarrow q_i + g_{s,i}^2$
11:     Update $w_{s+1,i}$ by (20)
12:    End for
13:   End for
14:  End for
15: End for
16: Calculate the global model $\mathbf{w}_b^{global}$ by (21)
17: Calculate the local popularity $p_{b,f}^t$ by (22)
18: Output the content popularity vector $\mathbf{p}_b^t$
4.2. Deep Reinforcement Learning-Based Content Caching Decision

After obtaining the content collection, it is necessary to determine which content should be placed on the edge servers to minimize the cost to the video provider. Since edge caching environments usually have huge high-dimensional state spaces, it is difficult to manually extract all valuable features from the environment. Deep reinforcement learning can automatically obtain the optimal policy from the raw high-dimensional state input to solve such problems. DQN is a general DRL framework, but it often overestimates the values of the possible actions in a given state. Additionally, DQN estimates values for all actions of each state, which is unnecessary for states where actions have no effect on the environment or values. Therefore, we propose a content placement method based on Dueling-DDQN, which combines double DQN and Dueling DQN to effectively reduce the overestimation of DQN and accelerate the learning process. The main purpose of double DQN is to mitigate the overestimation problem. Dueling DQN decomposes the action-value function into a state-value function and an advantage function to speed up convergence without estimating the value of each action in each state.

The method has the following three steps. (1) First, the cooperative content caching problem is formulated as a constrained Markov decision process (CMDP). (2) Second, the cache placement process is analyzed, and the reward function for the cache decision is constructed. (3) Finally, deep reinforcement learning is used to obtain the optimal content placement policy. The CM is considered the agent in this scenario, making caching decisions for all MEC servers. The CMDP can be represented as a four-tuple $(S, A, R, C)$, where $S$ is the state space, $A$ is the action space, $R$ is the reward, and $C$ is the constraint. The detailed definitions are as follows (a minimal environment sketch follows this list):

(1) State space: $S$ denotes the set of edge cache node states, and $s_t \in S$ denotes the specific state of the edge cache nodes in time slot $t$. The CM collects information such as the content popularity vector $\mathbf{p}_b^t$ and the remaining cache capacity of each edge base station.

(2) Action space: the action space is defined as $A$, where $a_t \in A$ represents the caching decisions $\{a_{b,f}^t\}$ of the BSs in time slot $t$. The CM selects $a_t$ from the action space as the caching decision of the BSs according to the information received from each base station.

(3) Reward: $r_t = R(s_t, a_t)$ represents the reward obtained for performing action $a_t$ in state $s_t$. From Formula (8), the optimization goal of this paper is to minimize the cost of the video provider. Therefore, the reward, i.e., the profit gained from caching, can be defined as:

$$r_t = -Cost(t). \quad (23)$$

(4) Constraints: in making cache decisions, it is necessary to ensure that the files cached by each BS do not exceed its capacity. The capacity constraint is defined as follows:

$$\sum_{f \in \mathcal{F}} a_{b,f}^t s_f \le c_b, \ \forall b \in \mathcal{B}. \quad (24)$$
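The sketch below frames this CMDP as an environment object, assuming a reward equal to the negative slot cost as in Equation (23); the class and method names are illustrative, and cost_fn stands in for Equation (7).

```python
class EdgeCacheEnv:
    """Minimal CMDP sketch: the CM observes per-BS popularity and residual
    capacity, an action is a cache placement, and the reward is the negative
    slot cost."""

    def __init__(self, capacities, sizes, cost_fn):
        self.capacities = capacities   # c_b per BS
        self.sizes = sizes             # s_f per content
        self.cost_fn = cost_fn         # implements Cost(t), Equation (7)

    def step(self, placements, requests):
        # Constraint C3 / Equation (24): reject over-capacity placements.
        for bs, cached in placements.items():
            if sum(self.sizes[f] for f in cached) > self.capacities[bs]:
                raise ValueError(f"BS {bs} placement exceeds its capacity")
        return -self.cost_fn(placements, requests)   # reward, Equation (23)
```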

The Dueling-DDQN algorithm uses a deep neural network to predict the $Q$ value, which can be understood as the state-action value, i.e., the expected return the agent obtains by acting in a certain state. The algorithm divides the network into two parts: a state-value function and an advantage function. The state-value function estimates the value of a state, while the advantage function estimates the advantage of each action taken in that state. The $Q$ value function can be expressed as:

$$Q(s, a; \theta, \alpha, \beta) = V(s; \theta, \beta) + A(s, a; \theta, \alpha), \quad (25)$$

where $\theta$ denotes the parameters of the shared network layers, and $\beta$ and $\alpha$ denote the parameters of the value and advantage streams, respectively.

Given a value of $Q$, it is impossible to uniquely determine $V$ and $A$; therefore, Equation (25) is not identifiable. To solve this problem, the advantage function part is centered:

$$Q(s, a; \theta, \alpha, \beta) = V(s; \theta, \beta) + \left( A(s, a; \theta, \alpha) - \frac{1}{|A|} \sum_{a'} A(s, a'; \theta, \alpha) \right), \quad (26)$$

where $|A|$ denotes the dimensionality of the action space. The weight parameters $\theta^-$ of the target network are updated once per period during the training process, and the parameters $\theta$ of the online network are updated by stochastic gradient descent (SGD) to minimize the loss function.
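Equations (25)-(26) correspond to a network with a shared trunk and two output streams. A minimal PyTorch sketch, with layer sizes and names of our own choosing:

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Shared trunk with separate V(s) and A(s, a) heads; the mean-centered
    advantage (Equation (26)) makes the decomposition identifiable."""

    def __init__(self, state_dim, num_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)                # V(s; theta, beta)
        self.advantage = nn.Linear(hidden, num_actions)  # A(s, a; theta, alpha)

    def forward(self, s):
        h = self.trunk(s)
        a = self.advantage(h)
        # Equation (26): Q = V + (A - mean_a A)
        return self.value(h) + a - a.mean(dim=1, keepdim=True)
```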

The whole training process approximates the $Q$ value to the target value, which is expressed as:

$$y_t = r_t + \gamma \, Q\!\left( s_{t+1}, \arg\max_{a'} Q(s_{t+1}, a'; \theta); \theta^- \right), \quad (27)$$

where $\arg\max_{a'} Q(s_{t+1}, a'; \theta)$ represents the action with the maximum $Q$ value in the current (online) network. The selected action is then evaluated by the target network to calculate the target value. The detailed process of the caching strategy based on Dueling-DDQN is shown in Algorithm 2.
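Before the full procedure, Equation (27) can be illustrated in a few lines: the online network selects the action and the target network evaluates it. A PyTorch sketch with assumed tensor shapes (batched states, float done flags):

```python
import torch

@torch.no_grad()
def ddqn_target(reward, next_state, done, q_net, target_net, gamma):
    """Double-DQN target, Equation (27): action selected by the online
    network, evaluated by the target network, which suppresses the
    overestimation of vanilla DQN."""
    best_action = q_net(next_state).argmax(dim=1, keepdim=True)
    next_q = target_net(next_state).gather(1, best_action).squeeze(1)
    return reward + gamma * (1.0 - done) * next_q
```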

1: Input: capacity of the experience replay pool $N$, training start step $t_s$, minibatch size $K$, discount factor $\gamma$, $\epsilon$-greedy exploration rate $\epsilon$, learning rate, number of episodes $E$, target network update period $T_u$
2: Initialization: online network parameters $\theta$, target network parameters $\theta^- \leftarrow \theta$, replay pool $D \leftarrow \emptyset$
3: For episode $ep = 1, 2, \dots, E$ do
4:  Input the initial state $s_1$.
5:  For $t = 1, 2, \dots, T$ do
6:   Choose an action $a_t$ via the $\epsilon$-greedy policy
7:   Execute action $a_t$, obtain the next state $s_{t+1}$ and reward $r_t$, and judge whether $s_{t+1}$ is a terminal state.
8:   Put the sample $(s_t, a_t, r_t, s_{t+1})$ into the experience replay pool $D$.
9:   If $t > t_s$ then
10:    Randomly sample a minibatch of $K$ training samples from the experience replay pool $D$.
11:    Calculate the target value by Equation (27)
12:    Apply the SGD method to minimize the loss function and update the weights $\theta$
13:   End if
14:   If $t \bmod T_u = 0$ then
15:    Update the target network parameters $\theta^- \leftarrow \theta$
16:   End if
17:   $s_t \leftarrow s_{t+1}$
18:  End for
19: End for

5. Simulation Results

In this section, the experimental results of the proposed algorithm are investigated, and four other algorithms, namely, a content caching algorithm based on marginal gain, an FL-based caching strategy, a popularity-based caching algorithm, and a noncooperative caching strategy, are used as references.

5.1. Simulation Parameters

In the experiment, the number of base stations varies within [3, 6], and the capacity of each base station is set identically within [100, 300] MB. The number of users is 30, the users are randomly distributed among the BSs, and the content size is set within [5, 10] MB. The data rate between each BS and the CM is set to 128 MB/s, and the data rate between the CM and the cloud server is set to 32 MB/s. The parameters used in the simulation experiments are shown in Table 3.

5.2. Datasets

To evaluate the performance of the proposed edge caching strategy, we use a real-world dataset, MovieLens [33] (https://grouplens.org/datasets/movielens/). The dataset contains rating data for multiple movies by multiple users, movie metadata, and user attribute information. The MovieLens 100K dataset contains 100,000 ratings of 1682 movies by 943 users; each user has rated at least 20 movies on a 5-star scale from 1 to 5. This paper uses the ratings to simulate the process of users requesting content: we treat each movie rating event as a content request, i.e., each rating corresponds to one content download. Literature [14, 28] takes a similar approach to simulate user content requests.
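As an illustration of this setup, the following pandas sketch converts the MovieLens 100K ratings file (u.data in that release) into a time-slotted request trace; the 100-slot binning is our own illustrative choice, not the paper's setting.

```python
import pandas as pd

# Each rating event is treated as one content download, per the text above.
# u.data columns in the 100K release: user id, item id, rating, timestamp.
ratings = pd.read_csv("u.data", sep="\t",
                      names=["user", "movie", "rating", "ts"])
ratings["slot"] = pd.cut(ratings["ts"], bins=100, labels=False)  # time slots
requests = ratings.sort_values("ts")[["slot", "user", "movie"]]
popularity = requests.groupby(["slot", "movie"]).size()  # ground-truth counts
```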

5.3. Performance Metrics

This paper considers three performance metrics: cache hit ratio, average transmission delay, and cost. The cache hit ratio represents the ratio of requests satisfied at the edge nodes to the total number of requests. It is defined as:

$$hit = \frac{R_{total} - R_{miss}}{R_{total}},$$

where $R_{total}$ is the total number of requests received by the edge servers in each time slot and $R_{miss}$ is the number of missed requests. The average transmission delay represents the average delay of transmitting content from the edge server or the cloud server to the user. It is expressed as:

$$time = \frac{\sum_{req} d_{req}}{R_{total}},$$

i.e., the ratio of the total transmission delay of all requests to the number of all requests within a time slot, where $d_{req}$ denotes the transmission delay of a single request. The cost is the optimization objective defined in Formula (8).
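For completeness, both metrics can be computed directly from the request counters; a trivial Python sketch with identifier names of our own choosing:

```python
def cache_hit_ratio(total_requests: int, missed_requests: int) -> float:
    """hit = (R_total - R_miss) / R_total, per the definition above."""
    return (total_requests - missed_requests) / total_requests

def average_delay(request_delays: list) -> float:
    """Mean transmission delay over all requests in one time slot."""
    return sum(request_delays) / len(request_delays)
```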

5.4. Results

To evaluate the performance, the proposed algorithm is compared with the following four algorithms:

(1) Content caching strategy based on marginal gain (CCBG) [30]: this strategy analyzes the marginal gains in latency and cache cost to decide which content to cache on the edge servers.

(2) FL-based caching policy (FLBC) [28]: this algorithm applies FL to user demand prediction and formulates the caching problem as an integer linear programming (ILP) problem.

(3) Popularity-based caching algorithm (PBC) [34]: this is a cache update scheme based on content popularity that replaces low-popularity content with high-popularity content.

(4) Noncooperative caching strategy (no collaboration): this algorithm does not consider cooperation between base stations for content storage.

Figure 2 compares the popularity predicted by the proposed request prediction strategy with the actual popularity. The predicted popularity is very close to the actual popularity, which indicates that the proposed popularity prediction strategy can effectively predict user requests.

Figure 3 shows the impact of the number of BSs on the cache hit rate, average transmission delay, and cost. The capacity of each BS is set to 100 MB, and the number of BSs is increased from 3 to 6. From Figures 3(a) and 3(b), it can be seen that as the number of BSs increases, the cache hit ratio increases and the average transmission latency decreases for all algorithms, and the proposed algorithm consistently outperforms the four baselines. The reason is that as the number of BSs increases, the amount of content that can be cached in the collaborative domain increases; however, the number of users in the collaborative range also increases, and the users' content requests become more diverse. Consequently, the improvement in the hit rate gradually slows, and the reduction in delay gradually diminishes; because the growing number of users also increases the number of requests, the hit rate and latency still show an improving trend. From Figure 3(c), the comparison shows that the proposed placement method outperforms the other cache placement methods: the overall cost of the comparison algorithms increases, while the total cost of the proposed algorithm decreases. The transmission cost trends downward because most requests can be served locally; however, as the number of BSs increases, the storage cost increases. The proposed algorithm optimizes the transmission cost more effectively.

Figure 4 shows the impact of the BS capacity on the cache hit ratio, average transmission delay, and cost. The number of BSs is set to 3, and the capacity of each BS is varied from 100 MB to 300 MB. As shown in Figure 4(a), as the storage capacity of the BSs increases, the cache hit rate gradually increases, because a BS with more cache capacity can cache more files. As shown in Figure 4(b), as the storage capacity increases, the delay gradually decreases, because the number of times files must be obtained from the cloud server decreases. As shown in Figure 4(c), as the BS storage capacity increases, the video provider's cost decreases, because more content can be served locally without going through the cloud center, and the content with the most significant profit gain is cached first as the capacity increases. Hence, the hit rate, latency, and cost curves change faster initially and then more slowly.

Figure 5 shows the impact of the number of UEs on the cache hit rate, average transmission delay, and cost. The number of BSs is set to 3, the capacity of each BS is set to 200 MB, and the number of served UEs is increased from 30 to 60. As shown in Figure 5(a), as the number of UEs increases, the cache hit rate shows a downward trend: the content requested by users becomes more diverse and the requests become more scattered, so the hit rate decreases accordingly. As shown in Figure 5(b), the delay increases gradually as the number of UEs increases; as the cache hit rate of the BSs decreases, the number of times files must be obtained from the cloud server increases, so the delay also increases. As shown in Figure 5(c), the video provider's cost increases with the number of UEs, because the number of requests served by the cloud server increases.

6. Conclusion

In this paper, we propose a deep reinforcement learning-based collaborative content caching approach to optimize video providers' costs. First, the content caching problem is formulated as a CMDP. Then, the content caching process is analyzed to construct a caching reward function. Finally, deep Q-learning is used to obtain the optimal content caching strategy. In addition, considering users' content requests and privacy security, federated learning is used within the caching strategy to make distributed predictions for the users at each node. Simulation results based on real datasets show that the proposed algorithm optimizes the cost of the video provider while achieving a high cache hit rate.

Although the cooperative caching strategy proposed in this paper achieves good results, it still has some shortcomings. The proposed algorithm is centralized and may not be suitable for network scenarios with a large number of base stations. Future work will aim at designing a multiagent DRL scheme that can learn the optimal caching policy, in which each BS acts as an agent and makes its own caching decisions.

Data Availability

The data that support the findings of this study are openly available in the MovieLens repository at https://grouplens.org/datasets/movielens/.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants 61902112 and 62072159.