Abstract

Option pricing in financial markets can be transformed into the calculation of high-dimensional integrals. To obtain actual prices, an option pricing system must rely on modern numerical methods; at the same time, improvements in computational methods and technology have shifted the focus of the design and application of option pricing systems from strictly closed-form models to less sophisticated but computationally intensive calculations. High-performance computing can establish a reasonable analysis model, realistically simulate the effects of designed performance characteristics and parameters, and provide technical support for large numbers of fast numerical calculations and simulations. Deep reinforcement learning obtains a description of empirical knowledge by learning from historical information and uses the acquired experience to deal with future problems, which makes it well suited to classification, regression, and optimization problems. Therefore, on the basis of summarizing and analyzing previous research results, this paper expounds the current research status and significance of the design and application of financial market option pricing systems, elaborates the development background, current status, and future challenges of high-performance computing and deep reinforcement learning, introduces the methods and principles of the high-performance computing platform and the deep reinforcement learning algorithm, designs the no-arbitrage pricing model and the risk-neutral pricing model, discusses the design of a financial market option pricing system based on high-performance computing and deep reinforcement learning, analyzes financial market option pricing considering stochastic interest rates, implements financial market option pricing considering transaction costs, explores the application of the system, and finally carries out an empirical experiment and analyzes its results. The study results show that the financial market option pricing system based on high-performance computing and deep reinforcement learning records the basic information of high-performance computer system users and the option pricing information as relational data in a relational database. The system describes the state of the controller and the upcoming option pricing situation. Its initialization uses multiple arrays from a stochastic strategy to initialize the option pricing and evaluation buffers, and the strategy uniformly selects an option pricing from the initial option pricing set in each cycle. The results of this paper provide a reference for further research on the design and application of financial market option pricing systems based on high-performance computing and deep reinforcement learning.

1. Introduction

Financial options offer a means of risk aversion in the financial market and can provide a certain degree of security for investors' assets. Research on option pricing is an important part of current financial market research [1]. In an option transaction, the party paying the premium holds the initiative, and its purchase has two possible outcomes: it may earn a profit or lose the premium. The design of an option pricing system in the financial market can be based on a fair price [2]. In most cases, the price of a derivative financial instrument can only be expressed as an implicit solution of a partial differential equation under certain boundary and initial value conditions. Mathematically speaking, the price estimation of financial market options can be transformed into the calculation of high-dimensional integrals [3]. A system built on high-performance computing and deep reinforcement learning provides query and display functions for the cluster system (such as unfinished jobs, queues, and nodes), cluster billing parameter configuration management, column information release, online communication, and other functions. At the same time, it can give full play to the supporting role of the high-performance computing platform in scientific research and make the platform more efficient and convenient to use. Meanwhile, improvements in computational methods and technology have shifted the focus of valuation methods from strictly closed-form models to methods that are less sophisticated but computationally intensive [4].

High-performance computing is a branch of computer science that mainly concerns the research and development of high-performance computers in terms of architecture, parallel algorithms, and software development [5]. High-performance computing can establish a reasonable analysis model, realistically simulate the performance of a designed product on the computer, and obtain the influence of various parameters; it uses numerical simulation to optimize the design, reducing research and development costs and shortening the research and development cycle, and thus provides a powerful technical means for breaking through key technologies [6]. Deep reinforcement learning is a machine learning method that takes environmental feedback as input and adapts to the environment. It refers to learning a mapping from environment states to behaviors so as to maximize the cumulative reward that the behavior obtains from the environment [7]. Because the price estimation of financial market options reduces to the calculation of high-dimensional integrals, an option pricing system can only resort to modern numerical methods, such as computer simulation methods and finite difference methods, to obtain the actual price. Many control and optimization problems can be converted into regression problems or optimization problems. Using deep reinforcement learning methods to solve technical problems is one of the current research hotspots in the storage field [8].

On the basis of summarizing and analyzing previous research results, this paper expounds the current research status and significance of the design and application of financial market option pricing systems; elaborates the development background, current status, and future challenges of high-performance computing and deep reinforcement learning; introduces the methods and principles of the high-performance computing platform and the deep reinforcement learning algorithm; designs the no-arbitrage pricing model and the risk-neutral pricing model; discusses the design of a financial market option pricing system based on high-performance computing and deep reinforcement learning; analyzes financial market option pricing considering stochastic interest rates; implements financial market option pricing considering transaction costs; explores the application of the system; and finally carries out an empirical experiment and analyzes its results. The paper is organized as follows: Section 2 introduces the methods and principles of the high-performance computing platform and the deep reinforcement learning algorithm; Section 3 discusses the design of the financial market option pricing system; Section 4 explores the application of the financial market option pricing system; Section 5 carries out the empirical experiment and analyzes its results; Section 6 is the conclusion.

2. Methods and Principles

2.1. High-Performance Computing Platform

After proper initial value and boundary value conditions are given, the corresponding flow field results can be obtained by solving the equations numerically. The finite volume method is the most common in commercial software, but its calculation accuracy is low, whereas the finite difference method can construct higher-precision difference schemes. In the coordinate system, the three-dimensional conservative equation system involves the following quantities: Ai is the conserved flux; ai is the stock price at the time of the option expiration date; bi is the exercise price of the option; ci is the company value at the option expiration date; di is a fixed default boundary; x is the unnecessary cost related to bankruptcy.

Based on the risk-neutral measure, the system first gives a stochastic model of the value of the underlying stock and of the company. Corporate value is closely related to the default boundary in fragile (vulnerable) option pricing, and, unlike in a pure diffusion model, a jump in corporate value may cause it to decline sharply. Assume that the underlying stock price et and the company value ft at time t satisfy differential equations with the following terms: Bi is the drift interest rate; Bj is the jump term, and ei is the magnitude of the jump; ej is the stock price volatility; fi is the jump intensity; fj is the stock price; ei(t) is the percentage of the company value jump; fj(t) is the average value of the jump amplitude.
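The differential equations themselves are not reproduced in the text above, so the following is only a minimal sketch under an assumed Merton-type jump-diffusion with log-normal jumps, not the paper's own model. It prices a plain European call by Monte Carlo under the risk-neutral measure. All names and parameters (`lam` for the jump intensity, `jump_mu` and `jump_sigma` for the mean and standard deviation of the log jump size) are illustrative assumptions.

```python
import math
import random

def poisson_sample(rng, mean):
    """Draw a Poisson variate with Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate_terminal_price(s0, r, sigma, lam, jump_mu, jump_sigma, T, rng):
    """One terminal price under an ASSUMED Merton-type jump-diffusion (risk-neutral drift)."""
    # Compensator: keeps the discounted price a martingale when jumps are present.
    kappa = math.exp(jump_mu + 0.5 * jump_sigma ** 2) - 1.0
    n_jumps = poisson_sample(rng, lam * T)
    jump_sum = sum(rng.gauss(jump_mu, jump_sigma) for _ in range(n_jumps))
    drift = (r - lam * kappa - 0.5 * sigma ** 2) * T
    diffusion = sigma * math.sqrt(T) * rng.gauss(0.0, 1.0)
    return s0 * math.exp(drift + diffusion + jump_sum)

def mc_call_price(s0, strike, r, sigma, lam, jump_mu, jump_sigma, T,
                  n_paths=20000, seed=0):
    """Discounted average call payoff over independent simulated paths."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        s_T = simulate_terminal_price(s0, r, sigma, lam, jump_mu, jump_sigma, T, rng)
        total += max(s_T - strike, 0.0)
    return math.exp(-r * T) * total / n_paths

# Hypothetical parameters, chosen only for illustration.
no_jump = mc_call_price(100.0, 100.0, 0.05, 0.2, 0.0, -0.1, 0.2, 1.0)
with_jump = mc_call_price(100.0, 100.0, 0.05, 0.2, 1.0, -0.1, 0.2, 1.0)
```

With the jump intensity set to zero the estimate should fall close to the Black-Scholes value for the same inputs, while a positive intensity adds variance to the terminal price and raises the call estimate, which is the effect of rare jump events described above.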

Assume that the risky asset follows a jump-diffusion process and that the stock pays dividends at random times, with the size of each dividend depending on the moment of payment. By constructing a change of measure and seeking an appropriate equivalent martingale measure in the incomplete market, the pricing formula of the episodic option is obtained, in which Ci is the call option price; hi is the strike price; li is the price of the risky asset at time i; mi is the risk-free interest rate; ni is the expected rate of return; the formula also involves the stock price on the expiry date.

A high-performance computing user logs into the parallel cluster through the main control node or a login node; the cluster issues commands to each node through the operating system and executes user-specified programs on the computing nodes. The calculation on each computing node is relatively independent, and information exchange, pace coordination, and execution control between the computing nodes are carried out by message passing. Message passing is realized through a high-speed local area network, and its cost relative to the calculation on the computing nodes is small and often negligible, so that a task can run on multiple computing nodes at the same time, realizing high-performance computing. The functional decomposition method decomposes a problem composed of different functions according to those functions [9]. Each computing node works on its assigned subproblem, and the purpose is to solve the problems of the different functions one by one so as to obtain the solution of the whole problem. Through the network and its switches, multiple computers with independent functions in different geographical locations, together with their external devices, can be connected, effectively expanding the scale and capability of parallel computing. To improve the efficiency of parallel computing and reduce the communication time between computers, high-performance parallel computing generally uses an optical network.
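The pattern of relatively independent computation on each node followed by a small exchange of results can be illustrated on a single machine. The sketch below is an assumption-laden stand-in, not the paper's platform: threads play the role of computing nodes, and a real cluster would use a message-passing library instead. Each worker simulates its own paths of a geometric Brownian motion with its own seed, and only one partial sum per worker is communicated back. The constants are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical contract and market parameters (illustration only).
S0, STRIKE, RATE, SIGMA, T_EXP = 100.0, 100.0, 0.05, 0.2, 1.0

def partial_payoff_sum(seed, n_paths):
    """Work of one 'node': sum of call payoffs over its own independent paths."""
    rng = random.Random(seed)  # separate generator per worker, no shared state
    total = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        s_T = S0 * math.exp((RATE - 0.5 * SIGMA ** 2) * T_EXP
                            + SIGMA * math.sqrt(T_EXP) * z)
        total += max(s_T - STRIKE, 0.0)
    return total

def parallel_call_price(n_workers=4, paths_per_worker=5000):
    """Split the paths across workers, then reduce the partial sums."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        sums = list(pool.map(partial_payoff_sum,
                             range(n_workers),
                             [paths_per_worker] * n_workers))
    n_total = n_workers * paths_per_worker
    return math.exp(-RATE * T_EXP) * sum(sums) / n_total

price = parallel_call_price()
```

Because each worker returns a single number, the communication volume is tiny compared with the simulation work, which mirrors the observation above that message cost is often negligible. In CPython the threads will not actually speed up this loop; the sketch only shows the decomposition and reduction structure.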

2.2. Deep Reinforcement Learning Algorithm

The utility maximization framework can be used for the option pricing problem in an incomplete market, in which potential option buyers have a particular risk attitude. More precisely, the goal is to maximize the expected utility Di of wealth at expiration time i, where oi is the set of all available strategies; pi is the fair price at the expiration date of the option; ri is the amount of wealth transferred to the option; si is the option stock price at the expiration date.

The subfractional Brownian motion is similar in nature to the fractional Brownian motion, and it is also a zero-mean Gaussian process. It shares self-similarity, long memory, and other properties with the fractional Brownian motion; the main difference lies in the covariance function. Under the subfractional Brownian motion, the covariance function Ei is expressed with the following quantities: u is the expected return rate; y is the average queue length of the local queue; z is the average waiting length of the arrival queue; the expression also involves the average waiting time and the average staying time of the system.

The realization of the financial market option pricing system is a process in which data and models promote each other. The system should have a certain learning ability and be able to improve itself continuously. Each type of task flow is an option price. According to the properties of the distribution, the aggregate task flow is also an option price, and its arrival rate is the sum of the arrival rates of the individual task flows, where Fi is the i-th task type; Gj is the j-th processing unit; Hi is the task arrival rate of the system; Ii is the scheduling service rate of the scheduler; Ji is the probability that the current arrival task type is i; Ki is the probability that the task type is i in the local queue; Li is the average queue length of the arrival queue.
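The additivity of arrival rates can be checked numerically. The sketch below assumes, as the paper does not state explicitly, that each task flow is an independent Poisson arrival process; under that assumption the merged flow is again Poisson with a rate equal to the sum of the individual rates. The rates and horizon are hypothetical.

```python
import random

def arrival_times(rate, horizon, rng):
    """Arrival times of a Poisson process with the given rate on [0, horizon]."""
    times = []
    t = rng.expovariate(rate)  # exponential inter-arrival gaps
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times

def merged_rate_estimate(rates, horizon, seed=0):
    """Merge independent task flows and estimate the aggregate arrival rate."""
    rng = random.Random(seed)
    merged = []
    for rate in rates:
        merged.extend(arrival_times(rate, horizon, rng))
    return len(merged) / horizon

rates = [0.5, 1.5, 3.0]  # hypothetical per-type arrival rates
estimate = merged_rate_estimate(rates, horizon=2000.0)
```

For these inputs the estimate should lie close to `sum(rates)`, that is, close to 5 arrivals per unit time.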

Deep reinforcement learning first randomly generates some neural structures as the initial population and samples a small amount of neural structures for training and testing. Then, it takes the better test results of these neural structures as the parent structure and makes mutations in these parent structures to obtain new child structures. The search strategy based on deep reinforcement learning contains three key elements: state, action, and reward. The state represents the part of the neural structure that the current option pricing system has been determined, the action represents the addition of a new layer to the current option pricing system, and the reward refers to the gain to the predetermined option pricing that each action brings [10]. Subsequently, deep reinforcement learning continues to sample some neural structures from the updated population to continue this cyclical evolution process until the preset number of cycles is reached. Since the size of the discrete search space increases exponentially with the increase of operation options in the structure, the discontinuity of the search space also limits the types of search strategies that can be used. The most direct performance evaluation method is to do a complete training of the current neural structure and test its performance on the test set, but too much training will bring huge computational overhead and make the search process very slow.
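The cyclical search described above can be sketched as a small evolutionary loop. Everything concrete here is an assumption made for illustration: a structure is a list of layer widths, `toy_score` stands in for the expensive train-and-test step, and the oldest member is discarded in each cycle to keep the population size fixed (the text only says that the population is updated).

```python
import random

LAYER_CHOICES = [16, 32, 64, 128]  # hypothetical layer widths
MAX_LAYERS = 6

def toy_score(arch):
    """Stand-in for 'train and test': prefers about 3 layers of width near 64."""
    return -abs(len(arch) - 3) - sum(abs(w - 64) for w in arch) / 64.0

def mutate(arch, rng):
    """Return a child structure that differs from the parent by one small edit."""
    child = list(arch)
    op = rng.choice(["add", "remove", "change"])
    if op == "add" and len(child) < MAX_LAYERS:
        child.insert(rng.randrange(len(child) + 1), rng.choice(LAYER_CHOICES))
    elif op == "remove" and len(child) > 1:
        child.pop(rng.randrange(len(child)))
    else:
        child[rng.randrange(len(child))] = rng.choice(LAYER_CHOICES)
    return child

def evolve(population_size=20, sample_size=5, cycles=200, seed=0):
    rng = random.Random(seed)
    # Random initial population of structures.
    population = [[rng.choice(LAYER_CHOICES) for _ in range(rng.randint(1, MAX_LAYERS))]
                  for _ in range(population_size)]
    for _ in range(cycles):
        sample = rng.sample(population, sample_size)  # evaluate only a small sample
        parent = max(sample, key=toy_score)           # best sampled structure
        population.append(mutate(parent, rng))        # child joins the population
        population.pop(0)                             # assumed: discard the oldest member
    return max(population, key=toy_score)

best = evolve()
```

Evaluating only a small sample per cycle is what keeps the cost manageable; replacing `toy_score` with a full training run would reproduce the computational overhead discussed above.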

3. Design of Financial Market Option Pricing System Based on High-Performance Computing and Deep Reinforcement Learning

3.1. No-Arbitrage Pricing Model Design

The realization of the financial market option pricing system is a process of interaction between data and model. First, the option pricing system introduces data into the rough quantitative model and management model, performs calculations, obtains preliminary results, and then compares the preliminary results with the actual situation. If the results are inaccurate and deviate from the actual situation, the system must adjust the model parameters or modify the model, so that the system is continuously improved and perfected with the participation of people; that is, it should have a certain learning ability and be able to improve itself continuously. In many real scenarios the model is unknown, so the probability distribution of transitions between states cannot be known and the problem cannot be turned into planning or solved in similar ways; the model-based method instead builds a model for the specific problem. The financial market option pricing system is data-intensive, computing-intensive, communication-intensive, and real-time, so high-performance computing is essential. Specific solutions include purchasing a supercomputer, using a public computing network, interconnecting multiple microcomputers or servers through a network to establish a dedicated cluster system, and using existing computers to establish an enterprise internal network. The no-arbitrage pricing model based on high-performance computing and deep reinforcement learning is shown in Figure 1.

The use of high-performance computing memory and virtual memory resources is dynamic, so the maximum physical memory and virtual memory in the task log information are used as calculation references. In addition, network bandwidth belongs to the cluster system environment and performance indicators. In the case of a company with many businesses, the assumption of a continuous process of stock price and company value may underestimate the impact of credit risk on the value of fragile options. It is necessary to use jumps to simulate rare events that occur in stock prices and company values, which will enable companies to more accurately capture fragile option prices [8]. In order to obtain the actual price, the system can only resort to modern numerical methods, such as computer simulation methods and finite difference methods. The development of modern high-performance computers provides technical support for a large number of rapid numerical calculations and simulations. Its option price guarantees the security of option transactions; only in very few cases, the established pricing model has a closed-form analytical solution.

In order to describe the law of stock price changes more accurately, the financial market option pricing system must consider the dependence of stock price volatility on stock prices, the dependence of volatility on other random variables, and possible sudden fluctuations in stock prices. In reality, it is possible for stock prices to jump. Seeking a more accurate description of the stochastic model of stock price behavior is still an important and difficult point in the research. In most cases, the price of derivative financial instruments can only be expressed as an implicit solution of a partial differential equation under certain boundary and initial value conditions [11]. Deep reinforcement learning obtains a description of empirical knowledge through the learning of historical information and uses the acquired experience to deal with future problems, which can well solve classification and regression problems. The development of high-performance computers provides technical support for a large number of rapid numerical calculations and simulations. On the other hand, the improvement of calculation methods and the improvement of calculation technology have also shifted the focus of valuation methods from strictly closed models to methods that are not so sophisticated and computationally intensive.

3.2. Risk-Neutral Pricing Model Design

The financial market option pricing system preallocates the amount of options available to users through the option allocation audit module, audits the user's option usage in real time, and uses the option usage control on the high-performance computer to determine whether the user can submit a running task. The option pricing detailed statistical query module provides high-performance computer system operating information and multilevel user and application option usage statistics and related charts. The financial market option pricing system records the basic information of users of high-performance computer systems and option pricing information in a relational database in the form of relational data. Since the system does not yet have the means to directly understand the type of pricing model to which the user runs the task, the user is required to actively describe the application model to which the task belongs (Figure 2). The system forces users to specify the mode type when submitting jobs, as a statistical basis for the amount of options used by various applications. Only jobs that pass verification can be submitted for operation; otherwise, an error message will be returned to the user. The resource usage control submodule on the high-performance computer will query the user’s remaining option pricing before each task is run to determine whether the user can obtain the resource to run. Once the remaining option price is zero, the task will not be submitted for operation [12].

As shown in Figure 3, there are many basic algorithms for deep reinforcement learning. According to whether the environment model is known, deep reinforcement learning algorithms are divided into model-based and model-free algorithms. Supercomputers offer very powerful computing capability, but their purchase cost is also very high. Accessing supercomputers through a public computing network makes their computing power available for only a small fee. The scheduling method is based on a reward function that evaluates the energy parameters specified in the energy model and the utilization function of each component [13]. The financial market option pricing system describes the state of the controller and the upcoming option pricing situation. Initialization uses multiple arrays from a stochastic strategy to initialize the option pricing and evaluation buffers. This strategy uniformly selects an option pricing from the initial option pricing set in each cycle. In the virtual machine placement process, the system uses the best-fit algorithm to place the virtual machines one after another on the server whose remaining resources are the smallest that still match the request. The best-fit algorithm seems to be the best choice, but since the remainder left after each allocation is always the smallest, many small free areas that are difficult to use accumulate on the computing nodes.
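The best-fit placement rule and the fragmentation it causes can be seen in a few lines. This is a minimal sketch with a single scalar resource per server; the demands and capacities are hypothetical, and a real placement would track several resource dimensions.

```python
def best_fit_place(vm_demands, server_capacities):
    """Place each VM on the server whose remaining capacity is the smallest that still fits."""
    remaining = list(server_capacities)
    placement = []
    for demand in vm_demands:
        # (leftover after placement, server index) for every server that can host the VM
        candidates = [(free - demand, idx)
                      for idx, free in enumerate(remaining) if free >= demand]
        if not candidates:
            placement.append(None)  # no server can host this VM
            continue
        _, best = min(candidates)
        remaining[best] -= demand
        placement.append(best)
    return placement, remaining

placement, remaining = best_fit_place([4, 3, 2, 5], [8, 6, 5])
# placement == [2, 1, 1, 0], remaining == [3, 1, 1]
```

In this run two servers end with a single leftover unit each, which is the kind of small free area that later requests can rarely use.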

The financial market option pricing system usually needs to select the accounting unit to obtain the equivalent martingale measure to give the fair price of the contingent rights and then take the conditional expectation for the contingent rights under the equal martingale measure, or the value process is obtained through the hedging strategy of the contingent rights. Since the backward stochastic differential equation has a unique adaptive solution under certain conditions, the backward stochastic differential equation can be regarded as a pricing mechanism in the financial market. Based on futures contracts, options are a kind of trading options, which belong to rights rather than obligations. After the option buyer purchases an option by paying the premium, if the current market price is higher than the strike price in the agreement, the option owner can give up the option to exercise. When studying the issue of option pricing, sometimes option pricing system considers the process of the discounted price of assets and the process of discounted value of strategies. It needs to be pointed out that the definition of the optimal growth investment strategy is not used here but is obtained through a property of the optimal growth investment strategy; that is, when the market has an equivalent martingale measure, taking the growth optimal investment strategy as the mark unit of account, the process of discounting the value of investment strategy is the martingale process under objective probability measurement.

4. Application of Financial Market Option Pricing System Based on High-Performance Computing and Deep Reinforcement Learning

4.1. Financial Market Option Pricing considering Stochastic Interest Rates

The prices of call options and put options in the financial market increase with the stock price and the value of the company. In addition, as the average interest rate during the window period increases, the valuation of call options increases, while the valuation of put options decreases. The reason is that, other factors being equal, a higher interest rate raises the growth rate of the price of the underlying asset, which increases the value of the call option, while at the same time lowering the present value of the cash flows that may be received in the future [14]. Theory shows that the former factor has a greater impact on the value of call options than the latter. Therefore, the higher the average interest rate during the window period, the higher the value of the call option. Both factors reduce the value of put options, so the higher the average interest rate during the window period, the lower the value of the put option. Figure 4 shows the financial market option pricing considering stochastic interest rates based on high-performance computing and deep reinforcement learning. Overall, the response of financial market option values to interest rate changes is not pronounced, and the impact on pricing is relatively small.
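The direction of the interest rate effect can be illustrated with the standard Black-Scholes formulas. This is a deliberate simplification and not the paper's stochastic interest rate model: the rate is held constant within each valuation and only varied between valuations, and all inputs are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes(s, k, r, sigma, T):
    """European call and put prices under Black-Scholes with a constant rate r."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    call = s * norm_cdf(d1) - k * math.exp(-r * T) * norm_cdf(d2)
    put = k * math.exp(-r * T) * norm_cdf(-d2) - s * norm_cdf(-d1)
    return call, put

# Revalue the same contract at several constant rates.
rates = [0.01, 0.03, 0.05, 0.07]
prices = [black_scholes(100.0, 100.0, r, 0.2, 1.0) for r in rates]
calls = [c for c, _ in prices]
puts = [p for _, p in prices]
```

Across the listed rates the call values rise and the put values fall, in line with the argument above.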

The financial market option pricing system based on high-performance computing and deep reinforcement learning has three nested loops, and each nest corresponds to a different time scale. The innermost loop model is a simulation process based on the underlying control and physical characteristics; the middle loop runs on the time scale of the option pricing cycle; the outermost loop uses the option pricing evaluation parameters to make decision updates. Decisions are determined by option pricing evaluation pairs, and the output of these option pricing evaluation pairs is the output of the deep neural network. The establishment of an enterprise internal network can make full use of existing computers, which can further reduce costs. The option pricing selection is carried out with a certain probability based on the soft-maximization model; this choice is to assign the option pricing a higher evaluation function value. When making option pricing choices, it may be doped with Gaussian noise, which makes noise participate in the output with a certain probability. A model-free algorithm can also be called model-free learning. It does not model the environment and decides the next action based on the actual environment feedback from the entire state from the beginning. In the algorithm solving process, it will order all remaining space from small to large in capacity.
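The selection step, in which candidates with higher evaluation values are chosen with higher probability and Gaussian noise is mixed in with some probability, can be sketched as follows. The evaluation values, temperature, noise probability, and noise scale are hypothetical, and in the system described above the values would come from the deep neural network rather than a fixed list.

```python
import math
import random

def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of evaluation values."""
    m = max(values)
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def select_action(values, rng, temperature=1.0, noise_prob=0.1, noise_std=0.5):
    """Sample an index from the softmax distribution, sometimes perturbing the values."""
    scores = list(values)
    if rng.random() < noise_prob:
        # With probability noise_prob, Gaussian noise takes part in the output.
        scores = [v + rng.gauss(0.0, noise_std) for v in values]
    probs = softmax(scores, temperature)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]

rng = random.Random(0)
values = [1.0, 2.0, 4.0]  # hypothetical evaluation values
counts = [0, 0, 0]
for _ in range(5000):
    counts[select_action(values, rng)] += 1
```

The candidate with the highest evaluation value is selected most often, while the others keep a nonzero chance of being explored; lowering `temperature` makes the choice greedier.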

The on-chip network in high-performance computing has become a hot and difficult problem in the research of microprocessor architecture. The on-chip network is the path for the processor computing core to access the storage components, and it is also the basis for the collaborative work of multiple computing cores. The performance of the on-chip network directly determines system performance. Unlike off-chip networks, on-chip networks compete with computing cores and storage components for valuable chip area and power consumption resources [15]. Due to the good compromise between good energy efficiency ratio, universal programmability, and high acceleration performance, heterogeneous architectures have become popular research problems. In order to solve the task scheduling, data sharing, communication bandwidth, and delay issues between processors of different structures, the heterogeneous fusion high-performance design is an ideal technical solution. Although it has good energy efficiency ratio, acceleration performance, and programmability, once the heterogeneous fusion processor chip is finalized and mass-produced, the proportional relationship between its general computing part and acceleration part cannot be adjusted adaptively according to application requirements. The design of separate acceleration devices has good configuration flexibility and can dynamically adjust the ratio of general computing and acceleration parts during system construction and use according to application requirements to adapt to application requirements. Therefore, in the actual high-performance computing system construction project in terms of adapting to application requirements flexibility, etc., it is more inclined to separate accelerator design.

4.2. Financial Market Option Pricing considering Transaction Costs

High-performance financial market option pricing includes two parts: high-performance computer hardware infrastructure and resource application and management soft environment. The high-performance computer is the hardware support platform for numerical mode operation, and the resource application and management software environment is the software service environment for numerical mode operation. Option pricing in high-performance financial markets is an indispensable technical support for the development of numerical models. As numerical models are more precise in terms of timeliness and scope and more refined in terms of temporal and spatial resolution, the demand for option pricing capabilities is increasing, and the quality demand of option pricing services in the financial market is increasing. The goal of option pricing in financial markets has shifted from focusing solely on capacity building to a stage where both capacity and service quality are equally important (Figure 5). The magnitude of the decline is equivalent to the amount of dividends paid by the stock, so the size of the dividend can be regarded as the decrease in the stock price caused by the payment of the dividend on the ex-rights day. The jump diffusion process can be considered as an ordinary diffusion process plus a process that jumps at any time. How to scientifically and accurately evaluate and analyze the application benefits of option pricing in the active financial market can not only promote the optimal use of active resources, but also have important effects and significance for future resource construction [16].

The advancement of computers from single-core to multicore is due to the limitations of single processors. First, the traditional method of increasing the clock frequency can hardly achieve a breakthrough in performance. At present, the main frequency of the processor has almost reached the limit of the process technology, and as the main frequency increases, the system power consumption continues to rise, which has become a bottleneck for single-core processors. Second, for single-core and dual-core processors with the same main frequency, the waiting time of the single-core processor is many times that of the dual-core processor when processing the same number of tasks, which restricts the computing speed of high-performance computing. Third, the advancement of multicore technology has brought powerful parallel computing capabilities, and as multicore technology matures, its cost performance cannot be matched by single-core processors. A dedicated cluster system can provide computing power similar to that of a supercomputer at a greatly reduced purchase cost [17]. For the management department of a financial enterprise, the business data, the quantitative models adopted, and the model parameters used are important business secrets. Generally, they are not provided to outsiders, and confidentiality is the first priority. Therefore, large financial institutions may choose to purchase supercomputers, while small- and medium-sized financial institutions may choose to build a dedicated cluster system.

In the financial market option pricing system, on-policy learning learns from the interactive experience of the agent's own actions, and off-policy learning learns from the interactive experience generated by the actions of another policy. The advantage of the on-policy algorithm is that it is simple and efficient, directly using experience to optimize the corresponding policy, but it cannot handle the balance between exploration and exploitation well and easily leads policy learning to a local optimum. The off-policy method separates the behavior policy from the update policy, which is conducive to global search and makes it easier to find the global optimum, but the experience gathered under the behavior policy is not necessarily applicable to the update policy. The calculation processes of the coupled estimator and the double estimator are exactly the same, and both use the cross-estimation method; the only difference is that the two estimators in the coupled estimator are related to each other. The coupled estimator integrates the maximum estimator and the double estimator and contains them as special cases at its two extremes. It uses adaptive coefficients to trade off between the maximum estimator and the double estimator, thereby reducing the bias. In the temporal-difference method, whenever the state of the agent changes, the value function is updated immediately according to the obtained return and the previous estimate of the value function; this is called the one-step temporal-difference algorithm.
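Reading the two learning modes above as on-policy and off-policy learning, and the time series difference method as temporal-difference (TD) learning, the one-step update and the two kinds of update target can be sketched in tabular form. The states, actions, rewards, and step sizes are hypothetical, and the SARSA and Q-learning targets are standard textbook examples chosen for illustration rather than the paper's stated algorithms.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """One-step TD update of a tabular state-value function, done in place."""
    td_target = reward + gamma * values[next_state]
    td_error = td_target - values[state]
    values[state] += alpha * td_error
    return td_error

def sarsa_target(q, reward, next_state, next_action, gamma=0.9):
    """On-policy target: uses the action the agent actually takes next."""
    return reward + gamma * q[next_state][next_action]

def q_learning_target(q, reward, next_state, gamma=0.9):
    """Off-policy target: uses the greedy action (the maximum estimator)."""
    return reward + gamma * max(q[next_state].values())

values = {"A": 0.0, "B": 1.0}
err = td0_update(values, "A", reward=0.5, next_state="B")
# err is 1.4 and values["A"] becomes 0.14 (up to rounding)

q = {"B": {"hold": 1.0, "exercise": 2.0}}
on_policy = sarsa_target(q, 0.5, "B", "hold")   # 0.5 + 0.9 * 1.0
off_policy = q_learning_target(q, 0.5, "B")     # 0.5 + 0.9 * 2.0
```

The `max` in the off-policy target is the maximum estimator mentioned above; a double estimator would instead select the action with one value table and evaluate it with a second one, which reduces the overestimation that the maximum introduces.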

5. Empirical Experiment and Result Analysis

5.1. Empirical Experiment Design

In this paper, stocks are used as the underlying assets. The historical data of typical individual stocks are extracted online, the volatility in different periods and development stages is analyzed, and different models are then used for calculation. The core principle of option pricing theory is no-arbitrage equilibrium analysis: an option is priced by constructing an equivalent portfolio consisting of the underlying asset and risk-free borrowing or lending. At present, the main pricing models fall into two categories, namely, analytical methods and numerical methods. Of these, numerical methods are easy to understand and use; the specific models include the binomial tree, the trinomial tree, and finite differences. Among the factors that affect the option price, the volatility of the underlying asset is one of the key factors determining whether the model's results are usable. Volatility is currently obtained mainly from historical data on the price of the underlying asset. One method selects a period of historical data of the same length as the time to exercise of the option and extracts the variance from it. This calculation assumes that history will repeat itself, so the usefulness of the result is limited. The other method selects a period that is considered representative and obtains the variance from it. This requires rich historical data as a prerequisite and provides decision makers with a visual chart of the development trend. Figure 6 gives the relationships between jump amplitude and node number in no-arbitrage pricing, risk-neutral pricing, stochastic interest pricing, and transaction cost pricing.
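As an illustration of the two steps discussed above (extracting volatility from historical prices and pricing with a tree model), the following sketch estimates annualized volatility from log returns and prices a European call on a Cox-Ross-Rubinstein binomial tree. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: the function names, the choice of 252 trading periods per year, and the restriction to a European call without dividends are all assumptions.

```python
import math


def historical_volatility(prices, periods_per_year=252):
    """Annualized volatility from the sample variance of log returns.

    periods_per_year=252 assumes daily closing prices (an assumption).
    """
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    mean = sum(log_returns) / len(log_returns)
    variance = sum((x - mean) ** 2 for x in log_returns) / (len(log_returns) - 1)
    return math.sqrt(variance * periods_per_year)


def binomial_call_price(spot, strike, rate, sigma, maturity, steps):
    """European call price on a Cox-Ross-Rubinstein binomial tree."""
    dt = maturity / steps
    up = math.exp(sigma * math.sqrt(dt))
    down = 1.0 / up
    growth = math.exp(rate * dt)
    p = (growth - down) / (up - down)  # risk-neutral probability of an up move
    discount = 1.0 / growth

    # Payoffs at maturity; index j is the number of up moves.
    values = [
        max(spot * up ** j * down ** (steps - j) - strike, 0.0)
        for j in range(steps + 1)
    ]
    # Backward induction: discounted risk-neutral expectation at each node.
    for _ in range(steps):
        values = [
            discount * (p * values[j + 1] + (1.0 - p) * values[j])
            for j in range(len(values) - 1)
        ]
    return values[0]


if __name__ == "__main__":
    closes = [100.0, 101.5, 99.8, 102.2, 103.0, 101.9]  # made-up prices
    sigma = historical_volatility(closes)
    print(binomial_call_price(100.0, 100.0, 0.05, sigma, 1.0, 200))
```

The prices in the usage example are invented for demonstration only; in practice the input would be the historical series of the chosen underlying stock.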

With the option analysis framework, the project investment manager does not have to inflate the future cash flow of the project without foundation in order to make the project pass the feasibility demonstration. The system can use the option pricing method to connect the future prospects of strategic investment projects with current investments and explain, from the top down, which value creation opportunities are unique to the company and which risks can be avoided [18]. Although real options are derived from financial options, the two have different emphases. Financial options mainly focus on valuation, while real options focus on decision analysis and optimization. Decision-making and optimization also involve valuation, but making correct decisions first requires a sound way of thinking and acting. In actual trading, investors usually receive a certain stock dividend and also pay a certain transaction fee. Assume that the dividend paid during the validity period of the option can be accurately predicted and that the stock price falls on the day the dividend is paid by an amount equal to the dividend. The size of the dividend can then be regarded as the decrease in the stock price caused by the dividend payment on the ex-dividend date.
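The dividend assumption above can be sketched numerically. Under the assumption that dividends are known in advance, one common simplification (assumed here, not confirmed as the approach of this paper) is to subtract the present value of the dividends from the current stock price before feeding it into a pricing model. The function names and the representation of dividends as `(time_in_years, amount)` pairs are hypothetical.

```python
import math


def ex_dividend_price(price_before, dividend):
    """Stock price right after the ex-dividend date, assuming the price
    drops by exactly the dividend amount."""
    return price_before - dividend


def dividend_adjusted_spot(spot, dividends, rate):
    """Spot price net of the present value of known future dividends.

    dividends : list of (time_in_years, amount) pairs (assumed layout)
    rate      : continuously compounded risk-free rate
    """
    present_value = sum(amount * math.exp(-rate * t) for t, amount in dividends)
    return spot - present_value


if __name__ == "__main__":
    print(ex_dividend_price(50.0, 1.2))
    print(dividend_adjusted_spot(50.0, [(0.25, 1.2), (0.75, 1.2)], 0.03))
```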

5.2. Result Analysis

The concept of no-arbitrage is defined relative to arbitrage. Arbitrage means that investors can obtain extra returns without taking any risk. No-arbitrage therefore means that no such risk-free profit opportunity exists. The financial market also conforms to the general law of investment; that is, risk and investment return are positively related. Higher returns mean greater risks, and lower returns mean lower risks. Volatility describes deviation from the mean, and a jump refers to the phenomenon in which the price at a certain point in time deviates sharply from its previous level [19]. The so-called jump diffusion process can be regarded as an ordinary diffusion process plus a process that may jump at any time. This kind of process reflects the path of real price changes better, because stock prices do not always change continuously; they can rise or fall sharply in an instant. In actual operation, the key to enforcing no-arbitrage is the replication strategy, that is, using a combination of several financial products to replicate a given financial product so that the future cash flows of the combination are the same as those of the replicated product (Figure 7).
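The jump diffusion process described above can be illustrated by simulating a single price path. The sketch below is a simplified discretization under stated assumptions: jumps in the log price are normally distributed, at most one jump occurs per time step (a Bernoulli approximation of a Poisson arrival that is reasonable only for small steps), and no drift compensation for the jumps is applied. All names and parameters are hypothetical and are not taken from the models compared in Figure 7.

```python
import math
import random


def simulate_jump_diffusion(spot, drift, sigma, jump_intensity, jump_mean,
                            jump_std, maturity, steps, rng=None):
    """Simulate one price path: ordinary diffusion plus occasional jumps.

    jump_intensity : expected number of jumps per year
    jump_mean/std  : mean and standard deviation of a jump in the log price
    rng            : optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    dt = maturity / steps
    log_price = math.log(spot)
    path = [spot]
    for _ in range(steps):
        # Ordinary diffusion part (geometric Brownian motion increment).
        log_price += (drift - 0.5 * sigma ** 2) * dt
        log_price += sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Jump part: at most one jump per step, with probability intensity * dt.
        if rng.random() < jump_intensity * dt:
            log_price += rng.gauss(jump_mean, jump_std)
        path.append(math.exp(log_price))
    return path


if __name__ == "__main__":
    path = simulate_jump_diffusion(100.0, 0.05, 0.2, 1.0, -0.05, 0.1,
                                   1.0, 252, rng=random.Random(42))
    print(path[-1])
```

Setting `jump_intensity` to zero reduces the simulation to an ordinary diffusion, which makes the contribution of the jump term easy to isolate.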

Different learning rates affect the convergence rate. The impact of the learning rate on convergence is studied mainly by controlling the step size of gradient descent. Because of this characteristic, information entropy can be used to estimate the weight of a single target among multiple targets, thereby providing a basis for multivariate comprehensive evaluation. Before designing the reward principle, it should be noted that a virtual machine has only two states, busy and idle; a virtual machine that is not busy is therefore idle. A successfully trained neural network should be able to evaluate positive conditions. In addition, suppose two objective functions are added to the allocation problem, for example, when tasks arriving in the task queue have different execution lengths. The feedback of each virtual machine then depends on both the length of the arriving task and the machine's processing speed, and the feedback value of each virtual machine becomes almost impossible to calculate. A model that has been trained for a long time has very high prediction accuracy and prediction completeness, and a trained model can be used on different platforms with strong adaptability. The advantage of this method is that it can adjust the strategy in real time according to changes in the environment. It does not need to wait until the model has been fully trained before being used, and it can make decisions and learn in real time [20].
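The effect of the learning rate on convergence can be seen in a toy example. The sketch below runs gradient descent on the one-dimensional function f(x) = x^2 and counts the steps needed to reach a tolerance. The objective, the function name, and the thresholds are assumptions chosen purely for illustration; they do not correspond to the network trained in this paper.

```python
def steps_to_converge(learning_rate, start=1.0, tol=1e-6, max_steps=10_000):
    """Run gradient descent on f(x) = x**2 from `start`.

    Returns the number of steps until |x| < tol, or None if the iteration
    diverges or does not converge within max_steps.
    """
    x = start
    for step in range(1, max_steps + 1):
        gradient = 2.0 * x          # derivative of x**2
        x -= learning_rate * gradient
        if abs(x) < tol:
            return step
        if abs(x) > 1e12:           # step size too large: iterates blow up
            return None
    return None


if __name__ == "__main__":
    for lr in (0.01, 0.1, 0.4, 1.1):
        print(lr, steps_to_converge(lr))
```

For this objective each update multiplies x by (1 - 2 * learning_rate), so small learning rates converge slowly, moderate ones converge quickly, and learning rates above 1 make the iterates grow without bound.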

From a mathematical point of view, the deep reinforcement learning problem can be summarized as an option pricing decision process problem; that is, the decision maker periodically or continuously observes the stochastic dynamic system with option pricing and makes decisions sequentially and coherently. The difficulty in applying deep reinforcement learning lies in how to build an option pricing model for an existing problem, because many problems do not lend themselves naturally to option pricing modeling [21]. Combining deep learning with value-based reinforcement learning methods exploits the strengths of deep neural networks, mainly to solve option pricing in large-scale financial markets, and is suitable for discrete actions. In this process, the goal of the model is to recommend as many items as possible that users are interested in, and the task can be simplified to the problem of maximizing the average cumulative return. The reward function is the feedback that the environment gives after receiving the agent's action in the current state. This feedback indicates whether the agent succeeded or failed in handling the current state. Depending on the definition, the reward may be immediate or delayed. A strategy is a mapping that determines how the agent chooses an action given the current state. Generally speaking, the strategy selects the action that brings the greatest immediate return or long-term benefit.

6. Conclusions

This paper designs a no-arbitrage pricing model and a risk-neutral pricing model, discusses the design of a financial market option pricing system based on high-performance computing and deep reinforcement learning, analyzes financial market option pricing under random interest rates, implements financial market option pricing with transaction costs, explores the application of the financial market option pricing system based on high-performance computing and deep reinforcement learning, and finally carries out an empirical experiment and analyzes its results. Deep reinforcement learning keeps sampling neural structures from the updated population and continues this cyclical evolution process until the preset number of cycles is reached. The goal of the model is to recommend as many items as possible that users are interested in, and the task can be simplified to the problem of maximizing the average cumulative return. The reward function is the feedback that the environment gives after receiving the agent's action in the current state. Because the use of memory and virtual memory resources in high-performance computing is dynamic, the maximum physical memory and virtual memory recorded in the task log are used as the calculation reference. The research results show that the financial market option pricing system based on high-performance computing and deep reinforcement learning records the basic information of high-performance computer system users and the option pricing information as relational data in a relational database. As numerical models become more precise in terms of timeliness and scope, the demand for high-quality option pricing services in the financial market is increasing. The system also provides query and display functions for the cluster system, such as unfinished jobs, queues, and nodes, as well as cluster billing parameter configuration management, online communication, and other functions, which helps make the high-performance computing platform more open and transparent to management and promotes the development of informatization. The results of this paper provide a reference for further research on the design and application of financial market option pricing systems based on high-performance computing and deep reinforcement learning.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.