Machine Learning in Mobile Computing: Methods and Applications
Network Traffic Statistics Method for Resource-Constrained Industrial Project Group Scheduling under Big Data
With the advent of the Internet era, the demand for networks in various fields keeps growing, and network applications are increasingly rich, which brings new challenges to network traffic statistics. How to carry out network traffic statistics efficiently and accurately has therefore become a focus of research, but existing results remain far from ideal. Against the background of big data and machine learning, this paper uses the ant colony algorithm to solve the typical resource-constrained project scheduling problem and to find the optimal solution of the network traffic resource allocation problem. Firstly, the objective function and mathematical model of the resource-constrained project scheduling problem are established, and the ant colony algorithm is used for optimization. Then, a project scheduling problem from PSPLIB containing 10 tasks and 1 renewable resource is introduced, and the mathematical model and the ant colony algorithm are used to solve it. Finally, the data volume and frequency of a PCU with a busy-hour IP of 188.8.131.52 are analyzed and counted. The experimental results show that the algorithm obtains the unique optimal solution after the 94th generation, which indicates that the parameters set in the solution method are appropriate and the optimal solution can be obtained. The schedule of each task in the optimal scheduling scheme is compact and reasonable. The peak periods of network traffic usually occur at around 9:00 and between 19:00 and 21:00, and network resources can be scheduled reasonably according to these periods. Therefore, the network traffic statistics method based on solving the resource-constrained industrial project group scheduling problem under big data can effectively carry out network traffic statistics and trend analysis.
1.1. Background Significance
With the increasing coverage of the Internet, it has been widely applied in various fields, and the demand for network construction keeps growing. Network connections are becoming more and more complex, and the cost of network interconnection is also gradually increasing. The statistical analysis of network traffic is therefore very important, and it is imperative to explore a new efficient and convenient network traffic statistics method. In the era of big data, there is a new way to solve the resource-constrained project scheduling problem. On this basis, it is innovative to treat network traffic as a renewable resource and solve for the optimal project schedule, which provides new ideas for the statistics and analysis of network traffic.
1.2. Related Work
With the increasing importance of network traffic statistics, many research results have appeared. Rocha proposed statistical and model multiplier equations for scaling coefficients in order to develop a new algorithm for network traffic statistics. Solomon took the network traffic of the Botswana International University of Science and Technology (BIUST) as an example and proposed a mathematical model of this real-world problem. Their algorithms, however, are not stable enough to calculate the peak value effectively. The resource-constrained project scheduling problem (RCPSP) has always been a focus of attention, and research on it has never stopped. Kreter studied the RCPSP with general time constraints and calendar constraints, developing six different constraint programming (CP) models as well as a special propagator that accounts for cumulative resource constraints under calendars. In order to optimize the production period and total delay time, Xiao extended the electromagnetism-like mechanism (EM) and integrated it into three well-known advanced multiobjective evolutionary algorithms (MOEAs), namely, the nondominated sorting genetic algorithm II (NSGA-II), the strength Pareto evolutionary algorithm 2 (SPEA2), and the decomposition-based multiobjective evolutionary algorithm (MOEA/D). Both approaches, however, involve a large amount of computation and are not suitable for big data calculation.
1.3. Innovative Points in This Paper
In order to explore a new efficient and convenient statistical method of network traffic, this study, based on the ant colony algorithm solving process of resource-constrained project scheduling problem, takes network traffic as a renewable resource and solves the optimal solution of project resource balance problem. The innovation points of this paper are (1) the objective function and mathematical model of resource-constrained project scheduling problem are successfully established; (2) the mathematical model of resource-constrained project scheduling problem is optimized by using ant colony algorithm; (3) the optimal scheduling scheme for a project is formulated successfully, and the peak value of network traffic is counted.
2. Statistical Method for Resource-Constrained Industrial Project Group Scheduling Network Flow
2.1. Resource-Constrained Project Scheduling Problem
2.1.1. Classification of Resource-Constrained Project Scheduling Problems
When project resources are limited, the project scheduling problem is to arrange the jobs of the project, subject to their sequence constraints and the resource limits, so that an optimal solution is obtained.
According to the properties of resources, they can be divided into renewable resources, nonrenewable resources, and doubly constrained resources. The available quantity of a renewable resource is restricted in each period of the total project duration: resources are obtained and used period by period, the amount in each period is limited, and it is automatically renewed in the next period after use; ordinary machinery, equipment, and human resources are examples. The available quantity of a nonrenewable resource is limited within the total project duration but not in each period: such resources are obtained and used on the basis of the total construction period and cannot be renewed once they are used up. The availability of doubly constrained resources, such as funds, is limited both over the total project duration and in each period.
According to the project scheduling objective, the problem can be divided into the project duration minimization problem, the resource balance problem, and the maximum net present value problem. Project duration minimization is a common basic problem and the most common goal in project management. It is a combinatorial optimization problem under time and resource constraints: the start and end times of all work must be arranged reasonably so that the total project duration reaches its minimum. Resource balance is also a very important research objective. If resources are limited, the progress of the whole project will be affected; therefore, in the process of project implementation, the work schedule must be arranged so that resources are used in a balanced way. Otherwise, fluctuations in resource demand will cause resource overstock or conflicts between supply and demand. The maximum net present value (MNPV) problem establishes a mathematical model with a discount rate, since net present value is not only a main index of production and R&D but also an embodiment of enterprise vitality and competitiveness.
According to the mode of project execution, the problem can be divided into the single-execution-mode assumption and the multiexecution-mode assumption. Single execution mode means that every job in the project has only one way to be executed; resources are constrained only by renewable resources, and the shortest construction period is sought. Multiexecution mode means that jobs have multiple execution modes and resources are not limited to renewable ones. Each execution mode corresponds to a different resource combination and execution time, and multiple optimal solutions can be found. In practical projects, the multiexecution mode is usually used.
2.1.2. Resource-Constrained Project Scheduling Problem Model
The single-execution-mode resource-constrained project scheduling problem is the classical resource-constrained project scheduling problem and needs to meet the following conditions: the minimum total project duration is the optimization goal; once started, every operation must be completed without interruption, and the number of operations is fixed; each job corresponds to one execution mode, and no preemption is allowed; there is no delayed start or feedback in the logical relationships between jobs; and the execution of tasks is constrained only by renewable resources, each of which has a fixed amount in any time period of the project.
The solution of the multimode resource-constrained project scheduling problem is to generate the best scheduling scheme with respect to the objective under precedence relationships and resource constraints. It is a complex production scheduling problem that needs to meet the following conditions: the project is composed of a finite number of jobs, and the jobs have time sequence relationships; once started, a job cannot be interrupted, and its execution mode cannot be changed; after the execution mode is selected, the duration and resource demand are fixed; and resource types include both renewable and nonrenewable resources.
2.1.3. Solution Method
At present, the algorithms for solving resource-constrained project scheduling problem mainly include accurate algorithm, heuristic algorithm, and intelligent optimization algorithm.
The exact algorithms include many methods, such as the exhaustive method, the branch and bound method, and the dynamic programming method. These algorithms are practical and effective and are suitable for small-scale problems. The exhaustive method determines the approximate range of feasible solutions according to the conditions of the problem and verifies all feasible solutions; those that pass verification are the solutions. Since all feasible solutions must be enumerated, the accuracy of this method is high but its efficiency is very low. The dynamic programming method transforms a multistage problem into multiple single-stage problems and solves them one by one, which can obtain the global optimal solution and improve efficiency; however, it lacks a unified standard model and involves a large amount of calculation. The branch and bound method selects subproblems and branching variables, excludes impossible regions, narrows the search scope, and approaches the optimal solution with high efficiency and effectiveness, but it cannot solve large-scale complex problems.
The heuristic algorithm is composed of a schedule generation mechanism and priority rules. The core of the algorithm is the schedule generation mechanism, which gradually expands a partial schedule from nothing into a complete project schedule. It can be divided into the serial schedule generation mechanism, with the task as the variable, and the parallel schedule generation mechanism, with time as the variable. Compared with exact algorithms, heuristic algorithms have great advantages in performance and efficiency and can obtain an acceptable feasible scheme in a short time. However, such an algorithm requires specific analysis of the specific problem and depends on the characteristics of the problem itself; in addition, it easily obtains only locally optimal solutions.
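The serial schedule generation mechanism mentioned above can be sketched as follows for a single renewable resource; the task data (durations, demands, precedence) and the capacity are illustrative, and `serial_sgs` is a hypothetical helper name, not the paper's implementation.

```python
# A minimal sketch of a serial schedule generation scheme (SSGS): tasks are
# scheduled in priority order at the earliest start that respects both the
# precedence relations and the single renewable resource capacity.

def serial_sgs(durations, demands, preds, capacity, priority_order):
    horizon = sum(durations.values())
    usage = [0] * (2 * horizon + 1)      # resource usage per time period
    start = {}
    for task in priority_order:
        # earliest start respecting precedence
        est = max((start[p] + durations[p] for p in preds[task]), default=0)
        # shift right until the resource profile can absorb the demand
        t = est
        while any(usage[u] + demands[task] > capacity
                  for u in range(t, t + durations[task])):
            t += 1
        start[task] = t
        for u in range(t, t + durations[task]):
            usage[u] += demands[task]
    return start

durations = {1: 3, 2: 2, 3: 2}
demands = {1: 2, 2: 3, 3: 2}
preds = {1: [], 2: [1], 3: [1]}
schedule = serial_sgs(durations, demands, preds, capacity=4,
                      priority_order=[1, 2, 3])
print(schedule)
```

Here task 3 cannot overlap task 2 (their combined demand exceeds the capacity of 4), so it is shifted right until task 2 finishes.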
The intelligent optimization algorithm is the focus of this paper, and its content will be detailed in the next section.
2.2. The Intelligent Optimization Algorithm
2.2.1. The Ant Colony Algorithm
The ant colony algorithm simulates the foraging behavior of an ant colony to find the shortest path and exploits the characteristics of that behavior: ants can recognize the pheromone concentration in their neighborhood and continuously release pheromone, but the pheromone volatilizes over time. Following the double bridge experiment, artificial ants were designed to move through a double bridge system and find the shortest path. Artificial ants have effective pheromone updating and volatilization mechanisms and the ability to evaluate and compare; moreover, by using the ants' pathfinding ability, repeated search and premature convergence can be avoided. Artificial ants, like real ants, cooperate with each other to complete tasks. They use pheromone as the communication medium, subject to volatilization, complete tasks through local probabilistic decisions, and are entirely self-organized.
The ant colony algorithm is based on the ant system, and its earliest application was the traveling salesman problem (TSP). The TSP over n cities is to find the shortest tour that visits each city exactly once. The parameters of the basic ant colony algorithm and their meanings are as follows: m is the number of ants in the colony; b_i(t) is the number of ants in city i at time t; d_{ij} is the distance between city i and city j; \eta_{ij} = 1/d_{ij} is the visibility of edge (i, j), which remains unchanged; \tau_{ij}(t) is the pheromone trail intensity on edge (i, j); \Delta\tau_{ij}^{k} is the amount of pheromone per unit length left by ant k on edge (i, j); and p_{ij}^{k}(t) is the transfer probability of ant k, where i is the current city of ant k and j is a city that has not yet been visited. Therefore, at time t, the transfer probability of ant k from city i to city j is calculated as follows:

p_{ij}^{k}(t) = \frac{[\tau_{ij}(t)]^{\alpha} [\eta_{ij}]^{\beta}}{\sum_{s \in \mathrm{allowed}_k} [\tau_{is}(t)]^{\alpha} [\eta_{is}]^{\beta}}, \quad j \in \mathrm{allowed}_k \tag{1}

Among them, \mathrm{allowed}_k indicates the cities that ant k is allowed to select in the next step, which changes as ant k moves. \alpha and \beta, respectively, reflect the relative importance of the accumulated pheromone and the heuristic information in the ant's path selection.
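The transfer probability rule can be sketched as follows; the pheromone and distance tables and the values of alpha and beta are illustrative, and `transfer_probabilities` is a hypothetical helper name.

```python
# Transfer probability of ant k at city i under the standard ant-system
# rule: pheromone tau weighted by alpha, visibility 1/d_ij weighted by beta,
# normalized over the cities still allowed.

def transfer_probabilities(i, allowed, tau, dist, alpha=1.0, beta=2.0):
    weights = {j: (tau[i][j] ** alpha) * ((1.0 / dist[i][j]) ** beta)
               for j in allowed}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

tau = {0: {1: 1.0, 2: 1.0}}       # equal pheromone on both edges
dist = {0: {1: 1.0, 2: 2.0}}      # city 1 is closer than city 2
p = transfer_probabilities(0, [1, 2], tau, dist)
print(p)
```

With equal pheromone, the closer city receives the larger probability, reflecting the heuristic (visibility) term.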
The advantages of the ant colony algorithm include the following: as a population-based evolutionary algorithm, it is easy to parallelize; it has high reliability and stability, is not easily disturbed, and has a wide range of applications; and it has strong adaptability and can be combined with other heuristic algorithms to further improve performance. In short, the ant colony algorithm is very capable of finding near-optimal solutions.
2.2.2. The Bacterial Foraging Algorithm
The bacterial foraging algorithm imitates the foraging behavior of E. coli in the human intestinal tract. In the search process, a nutrient distribution function is used to evaluate the quality of candidate solutions, and the optimal value is found through chemotaxis, aggregation, reproduction, and dispersal operations. In the implementation of the algorithm, the foraging behavior of bacteria is first realized through the chemotaxis and aggregation operations. The chemotaxis and aggregation of bacteria constitute the life cycle of a bacterium; at the end of the cycle, the bacteria enter the reproduction stage, which is completed by replication. In order to simulate the actual situation of bacteria in the environment, the algorithm requires bacteria to move to other places in the search area with a certain probability after reproduction.
The bacterial foraging algorithm needs to encode the bacteria, then design a fitness function and generate the initial population, and finally use information exchange between groups for optimization. The specific steps are as follows: initialize the parameters of the algorithm, including the numbers of chemotaxis, reproduction, and dispersal operations, the size of the bacterial population, and the dispersal probability; initialize the positions of the bacteria, calculate their initial fitness values, and modify them by the aggregation operation; perform chemotaxis on the bacterial community, update the bacterial positions, and recalculate the fitness values; in the reproduction operation, first calculate the health degree of each bacterium, eliminate the half with low health, and duplicate the remaining healthy bacteria; after the dispersal operation, update the population according to the dispersal probability; finally, judge whether the maximum number of iterations has been reached, and if so, terminate, otherwise return to the loop.
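A single chemotaxis step (a random tumble followed by swimming while the cost keeps improving) can be sketched as follows; the 1-D cost function, step size, and swim limit are illustrative assumptions, not the paper's settings.

```python
import random

# One chemotaxis step of a bacterium on a 1-D cost landscape: tumble to a
# random direction, then swim in that direction as long as the cost improves,
# up to a fixed number of swim steps.

def chemotaxis_step(x, cost, step_size=0.1, max_swim=4, rng=random):
    direction = rng.choice([-1.0, 1.0])    # tumble: pick a random direction
    best = cost(x)
    for _ in range(max_swim):              # swim while fitness improves
        nxt = x + step_size * direction
        if cost(nxt) < best:
            x, best = nxt, cost(nxt)
        else:
            break
    return x

cost = lambda x: (x - 1.0) ** 2            # minimum at x = 1.0
rng = random.Random(0)
x = 0.0
for _ in range(50):
    x = chemotaxis_step(x, cost, rng=rng)
print(round(x, 2))
```

Because a swim in the wrong direction stops immediately, repeated chemotaxis steps drift the bacterium toward the nutrient-rich point at x = 1.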
Many factors affect the efficiency of the bacterial foraging algorithm, especially the population size, the numbers of chemotaxis, reproduction, and dispersal operations, the parameters of the aggregation operation, and the dispersal probability, because these parameters directly affect the algorithm's search ability. A large population increases diversity and helps to find the optimal solution, but it makes the calculation more complex and reduces the convergence rate. When the number of chemotaxis steps is small, the algorithm is more prone to local optima; increasing it enhances the search ability but also increases the complexity of the algorithm. The more reproduction and dispersal operations there are, the more chances bacteria have to search in nutrient-rich areas, which can improve the convergence speed, but too many make the algorithm overly complex.
2.2.3. The Genetic Algorithm
The genetic algorithm is a probabilistic search based on biological natural selection and the natural genetic mechanism. It does not rely on gradient information; instead, it uses a coding technique to act on strings, simulates the evolution of a population of strings, exchanges and recombines information in an organized yet random way, and finally generates the optimal solution.
The search process of the genetic algorithm starts from a set of initial solutions. Each individual in the population is a candidate solution called a chromosome; in the genetic algorithm, a chromosome is usually a string of data serving as the code of a solution. The quality of the chromosomes is improved by iterating through various operations. In the iteration process, the new chromosomes produced by the genetic operators are called offspring, and some individuals are selected to form the new generation of the population. The size of the population is a fixed constant because the chromosomes with low fitness are excluded. Since the genetic algorithm performs probabilistic selection, chromosomes with high fitness are more likely to be selected. After several iterations, the algorithm converges to the best chromosome or a near-best chromosome.
In the basic genetic algorithm, some individuals are copied according to their fitness in the current generation, and after crossover and mutation a new generation of individuals is formed. Individuals with high fitness are likely to be copied, and the best individual of the previous generation is selected to enter the next generation directly. This not only ensures that the coding of the optimal individual is not destroyed but also promotes the propagation of excellent characteristics and guides the offspring.
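The copy, crossover, and mutation loop with elitism described above can be sketched as follows, assuming a simple bit-string encoding with a bit-counting fitness; the truncation-style selection and all parameter values are illustrative choices, not the paper's.

```python
import random

# Minimal generational GA: the best individual is copied directly into the
# next generation (elitism); the rest are produced by one-point crossover of
# parents drawn from the fitter half, followed by per-bit mutation.

def evolve(pop, fitness, rng, mut_rate=0.01, generations=60):
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        nxt = [ranked[0]]                                   # elitism
        while len(nxt) < len(pop):
            a, b = rng.sample(ranked[: len(pop) // 2], 2)   # fitter half
            cut = rng.randrange(1, len(a))                  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [g ^ (rng.random() < mut_rate) for g in child]  # mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

rng = random.Random(1)
pop = [[rng.randint(0, 1) for _ in range(20)] for _ in range(30)]
best = evolve(pop, fitness=sum, rng=rng)
print(sum(best))
```

With a bit-counting fitness the population rapidly concentrates near the all-ones string, and elitism guarantees the best fitness never decreases between generations.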
2.3. Statistical Methods of Network Traffic
2.3.1. Network Traffic Identification and Classification
Network traffic identification and classification technology is the basis of network traffic statistics: according to the characteristics of data samples, features are found and extracted, and the samples are divided into identified categories. At present, the commonly used network traffic identification and classification methods are based on well-known port numbers, signatures, and flow statistical characteristics.
A well-known port number is a public service port predefined in the registry of service names and transport layer protocol port numbers published by the Internet Assigned Numbers Authority (IANA), and such port numbers generally lie between 0 and 1023. This method has high recognition accuracy, simple operation, and low complexity. However, its limitations are also obvious: the IANA port assignments do not cover all applications and cannot correspond to applications one to one, and some common protocols do not use their default port numbers for data transmission.
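Port-based classification can be sketched as follows; the lookup table is a small, hand-picked subset of the IANA registry chosen for illustration, and `classify_by_port` is a hypothetical helper name.

```python
# Toy port-based traffic classifier: map a destination port in the
# well-known range (0-1023) to a service name; anything outside that range
# cannot be labeled by port number alone.

WELL_KNOWN_PORTS = {
    20: "ftp-data", 21: "ftp", 22: "ssh", 25: "smtp",
    53: "dns", 80: "http", 110: "pop3", 443: "https",
}

def classify_by_port(dst_port):
    if 0 <= dst_port <= 1023:
        return WELL_KNOWN_PORTS.get(dst_port, "unknown-well-known")
    return "unregistered"   # registered/ephemeral range: port alone is not enough

print(classify_by_port(443))
print(classify_by_port(8080))
```

The two failure modes named in the text show up directly: an unmapped well-known port and a common protocol running on a nonstandard port both defeat the lookup.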
A signature is unique to a certain application protocol; in the actual application interaction environment, the attribute with the highest frequency can be used to distinguish different network applications. Application-layer signature methods identify and classify traffic by analyzing whether the payload of a data packet is consistent with a signature database, for example, whether the payload contains regular expressions matching the recognition rules; the network traffic application-layer signature database must therefore be built in advance. If the match succeeds, the flow is labeled with the corresponding application. Although this method has a high classification accuracy, its storage and processing costs are high; it cannot parse the traffic of encrypted applications and may even pose security risks to user privacy.
Identification and classification methods based on flow statistical characteristics use features generated by various network applications, such as the transport layer protocol, the duration of a network flow, and the average number of data packets in a flow. Because this classification method needs to deal with many complex big data problems, machine learning algorithms are introduced to solve the identification and classification of network traffic. Machine learning algorithms can compute over and learn from large-scale data and mine the relationships and rules between data.
2.3.2. Data Mining
Association rule mining is a classical method and an important research direction in data mining. Its essence is to find meaningful associations between data items in a data set. The first step is to find frequent itemsets, that is, itemsets whose support is not less than the minimum support; this step is very important because frequent itemsets are the basis for forming association rules. The second step is to generate association rules: all strong association rules are generated from the frequent itemsets using the minimum confidence.
When mining frequent itemsets, the Apriori algorithm is a classic algorithm based on the iterative idea of layer-by-layer search. Firstly, the database is scanned, the occurrences of each itemset are counted, and the count is compared with the minimum support: if the support is greater than or equal to the minimum support, the itemset is classified as a frequent 1-itemset. Frequent 1-itemsets are then joined to find frequent 2-itemsets, frequent 2-itemsets to find frequent 3-itemsets, and so on. The algorithm needs to scan the database many times and generates a large number of candidate itemsets, which greatly increases the storage cost.
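The layer-by-layer search can be sketched as follows; the transactions and the minimum support are illustrative, and `apriori` is a hypothetical helper name.

```python
from itertools import combinations

# Level-wise Apriori sketch: count the support of each candidate, keep the
# frequent k-itemsets, then join pairs of them into (k+1)-candidates until
# no candidates remain.

def apriori(transactions, min_support):
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    while candidates:
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = [c for c, n in counts.items() if n >= min_support]
        frequent.update({c: counts[c] for c in level})
        # join step: unions of frequent k-itemsets that have size k+1
        k = len(level[0]) + 1 if level else 0
        candidates = list({a | b for a, b in combinations(level, 2)
                           if len(a | b) == k})
    return frequent

transactions = [frozenset(t) for t in
                [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b", "c"}]]
freq = apriori(transactions, min_support=2)
print(sorted("".join(sorted(s)) for s in freq))
```

On these four transactions, every single item and every pair is frequent at support 2, but the triple {a, b, c} occurs only once and is pruned.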
Therefore, several improved algorithms based on Apriori exist. The AprioriTid algorithm reduces the amount of scanned data, thus shrinking the candidate itemsets and improving efficiency. The FP-Tree algorithm uses a prefix tree to improve the search speed; it does not need to produce a large number of candidate sets and has advantages in both space and time. The DHP algorithm uses hashing to reduce the number of candidate sets, thus improving the algorithm. The Partition algorithm divides the database into disjoint partitions, thus reducing the number of database scans.
2.3.3. Demand Analysis of Network Traffic Statistics
Demand analysis is a very important link in the system design and implementation. The functions of the system must be defined according to the business objectives, and the business problems should be converted into technical problems. Therefore, before designing and implementing the network traffic statistical analysis system, it is necessary to analyze its requirements and define the functions and indicators required by the system.
The management system is divided into the acquisition layer, the analysis layer, and the presentation layer. The acquisition layer collects the corresponding data from different devices; the data must be comprehensive and accurate, and a corresponding database must store them. The analysis layer analyzes the flow data, processes the corresponding data in an orderly manner, and extracts the key information. The presentation layer feeds the statistical results back to users; the platform and interface must be friendly and able to handle abnormal operations in time.
Performance is reflected in response time, stability, and friendliness. In order to improve the efficiency of the system, every user request must be responded to and answered in time during interaction. The excellence of a program must be based on stability: once an unpredictable error occurs, the system must still maintain a certain stability. A good application interface is also very important, always keeping the interaction with users friendly.
3. Experiments on Model Building and Algorithm Optimization
3.1.1. Objective Function
The objective functions of a project are mainly divided into the time class, resource class, financial class, and quality class. In this paper, with network traffic as the resource, the objective function of the resource class is considered. In the project plan, if the arrangement is unreasonable, the resource demand will be uneven during the planned construction period, and the resource restriction will then lead to the delay of tasks or the overstock of resources. Therefore, this paper mainly studies the objective function of resource balance. The objective function of the resource balance problem is not unique but is determined by the evaluation index of resource balance. When there is only one resource in the project, the objective function can be expressed as Formula (2):

\min f = \sum_{t=1}^{T} \left( r_t - \bar{r} \right)^2 \tag{2}

Among them, r_t represents the total demand for the resource in period t of the project, and \bar{r} represents the average resource demand level of the resource over the whole project duration. Therefore, the optimization goal of resource-constrained project scheduling is to minimize the total project duration and balance the resources.
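A resource balance objective of this kind, the sum of squared deviations of per-period demand from the average level, can be computed as follows; the squared-deviation form and the demand profiles are illustrative assumptions.

```python
# Resource-balance objective for a single resource: sum over all periods of
# the squared deviation of the period demand r_t from the average level r_bar.

def balance_objective(per_period_demand):
    T = len(per_period_demand)
    r_bar = sum(per_period_demand) / T
    return sum((r_t - r_bar) ** 2 for r_t in per_period_demand)

flat = [4, 4, 4, 4]     # perfectly leveled demand
spiky = [8, 0, 8, 0]    # same total demand, badly leveled
print(balance_objective(flat), balance_objective(spiky))
```

Both profiles consume the same total resource, but the spiky schedule is heavily penalized, which is exactly the behavior the balance objective is meant to enforce.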
3.1.2. Mathematical Model
Suppose that there are n tasks and Z kinds of renewable resources in the project; for task i, r_{i,z} represents its demand for resource z, and w_z represents the weight of the z-th resource. The mathematical model of resource-constrained project scheduling is as follows:

R_z(t) = \sum_{i \in A(t)} r_{i,z} \tag{3}

\bar{R}_z = \frac{1}{T} \sum_{t=1}^{T} R_z(t) \tag{4}

s_1 = 0 \tag{5}

s_j \ge s_i + d_i, \quad \forall (i, j) \in P \tag{6}

where A(t) is the set of tasks in progress at time t, s_i and d_i are the start time and duration of task i, and P is the set of precedence relations. Among them, Formula (3) represents the total demand for resource z at time t, Formula (4) represents the average demand level of resource z within the project duration, Formula (5) fixes the start time of the project, and Formula (6) represents the logical constraints between tasks.
3.2. Ant Colony Optimization
In order to solve the optimization problem with the ant colony algorithm, a suitable algorithm must be constructed that both reflects the characteristics of the problem and selects appropriate pheromone and heuristic information. To construct a path, a feasible task satisfying the precedence relationships must be chosen and given a planned start time that meets the resource constraints. Therefore, the ants' traversal starts from the same starting point and ends at the same end point. The pheromone expresses the construction period and is initialized and updated, while the heuristic information considers that work with a smaller latest start time and more successors is more important and should be executed earlier.
Artificial ants start from the same starting point and schedule the next task in the earliest time period that meets the resource constraints by repeatedly applying the state transition rules until the end of the project. During the search process, pheromone is updated twice. The specific workflow chart of the algorithm is shown in Figure 1:
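The paper states that pheromone is updated twice during the search without giving the rules; a common ACS-style pair of updates, a local update applied when an edge is chosen and a global update applied on the best solution found, can be sketched as follows, with rho, Q, and tau0 as illustrative assumed parameters.

```python
# Sketch of two pheromone updates: a local evaporation-style update toward a
# baseline tau0 when an edge is selected, and a global update that evaporates
# all trails and then deposits pheromone along the best path found.

def local_update(tau, i, j, rho=0.1, tau0=1.0):
    tau[i][j] = (1 - rho) * tau[i][j] + rho * tau0

def global_update(tau, best_path, best_length, rho=0.1, Q=100.0):
    for row in tau.values():                   # evaporation on every edge
        for j in row:
            row[j] *= (1 - rho)
    for i, j in zip(best_path, best_path[1:]): # deposit on the best path
        tau[i][j] += Q / best_length

tau = {0: {1: 2.0, 2: 2.0}, 1: {2: 2.0}}
local_update(tau, 0, 1)                        # edge (0, 1) was just chosen
global_update(tau, [0, 1, 2], best_length=10.0)
print(tau)
```

The local update keeps chosen edges from dominating too early, while the global update reinforces only the best path, which is what drives convergence toward a single schedule.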
3.3. Simulation Analysis
The benchmark problem in PSPLIB is introduced, and the mathematical model of resource-constrained project scheduling problem and the ant colony algorithm are used to solve the problem. The data source is a project scheduling problem with 10 tasks and 1 renewable resource in PSPLIB. The network diagram of each project is shown in Figure 2:
4. Discussion of Optimization Model Algorithm
4.1. Algorithm Performance Test
After counting all the solutions and calculating the number of optimal solutions, we can get the relationship between the number of iterations and the number of optimal solutions. The results were as follows:
As shown in Figure 3, the number of optimal solutions peaks around the 30th generation but does not remain stable afterwards, continuing to fluctuate, which indicates that the algorithm has a certain randomness. Convergence becomes obvious after the 75th generation, and the unique optimal solution is obtained after the 94th generation, which indicates that with appropriately set parameters the solution method can obtain the optimal solution.
4.2. Project Scheduling Results
The algorithm parameters are set as follows: the initial population size is 10, and the number of iterations is 30. The optimal scheduling scheme is shown in Figure 4:
Based on the above plan, the starting time and completion time of each task are planned, and the specific arrangement is as follows.
As shown in Table 1, the total construction period of the project is 32 days, the schedule of each task is very compact, and in the first half of the construction period several tasks are carried out at the same time. First, tasks No. 1 and No. 8 start simultaneously. After task No. 8 finishes on day 4 and task No. 1 finishes on day 6, task No. 2 starts; after task No. 2 finishes on day 10, task No. 6 starts; task No. 3 finishes on day 12; and tasks No. 9 and No. 5 start at the same time as task No. 6. On day 16, after task No. 5 ends, the whole project enters the single-task stage, and tasks No. 4, No. 7, and No. 10 are carried out in sequence until the whole project is completed.
4.3. Flow Statistics
The system can analyze and count the amount and frequency of data sent and received by a PCU IP in a busy hour and then analyze the change trend of the PCU's sent and received data over a period of time from the results. The PCU with IP 184.108.40.206 on December 10 and 11, 2019, is statistically analyzed, and the change trend chart of the PCU is obtained as follows:
As shown in Figure 5, the change trends of the data volume of the PCU with IP 220.127.116.11 on December 10 and December 11 are roughly the same, but the peak times differ. The data volume at 9:00, 13:00, and 21:00 on December 10 is relatively large, while on December 11 the data volume rises gradually after 9:00 and peaks between 19:00 and 21:00. This shows that more users use data services during these periods; we can pay attention to them and schedule the network resources reasonably to prevent network failures.
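Busy-period detection of this kind can be sketched as follows; the hourly volumes and the 80% threshold are illustrative assumptions, and `peak_hours` is a hypothetical helper name.

```python
# Toy busy-hour statistic: given hourly traffic volumes for one day, report
# the hours whose volume reaches a fixed fraction of the daily peak.

def peak_hours(hourly_volume, threshold=0.8):
    peak = max(hourly_volume.values())
    return sorted(h for h, v in hourly_volume.items() if v >= threshold * peak)

hourly = {h: 10 for h in range(24)}                       # quiet baseline
hourly.update({9: 95, 13: 90, 19: 80, 20: 88, 21: 100})   # busy periods
print(peak_hours(hourly))
```

The reported hours mirror the pattern in the text: a morning spike around 9:00, a midday spike, and a sustained evening peak from 19:00 to 21:00, which are the periods to prioritize when scheduling network resources.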
With the development of the mobile Internet, the statistical analysis of network traffic becomes more and more important. An effective and accurate network traffic statistics method can determine whether equipment is likely to fail and can plan the market according to the distribution and type of data traffic. The advent of the big data era makes network traffic statistics more complicated. Applying the ant colony algorithm to the resource-constrained project scheduling problem can effectively obtain the optimal scheduling scheme, on the basis of which the optimal solution of the network traffic scheduling problem can be calculated.
The ant colony algorithm is established to optimize the resource-constrained project scheduling problem, which makes it possible to count the usage trend of the network traffic of a PCU with a specific IP over a certain period, obtain the peak periods so as to carry out reasonable resource scheduling and avoid network failures, analyze the distribution of network traffic by IP, and make overall plans for the market.
Due to limited time and knowledge, although the ant colony algorithm has been applied to optimize the mathematical model of the resource-constrained project scheduling problem, its performance testing is far from sufficient, and there may be undetected hidden problems. In future work, we need to test it more comprehensively in order to improve the algorithm.
Data Availability
No data were used to support this study.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the Natural Science Foundation of Gansu Province, China (Grant No. 18CX1ZA017).