Abstract

Cancer therapy optimization is an issue that can be solved using the control engineering approach. An optimal therapy generation algorithm is presented and tested using a tractable mouse model of breast cancer. The optimized therapeutic protocol is calculated in a closed-loop manner at fixed time instants, twice in a week. The controller consists of a nonlinear model predictive controller which uses the state estimation of a moving horizon estimator. The estimator also computes parameter estimates of the prediction model such that the time varying nature of tumor evolution can be captured. Results show that remission can be induced in a 28-day interval using the algorithm.

1. Introduction

Control of physiological systems is often challenging due to significant inter and intrapatient variability, large time constants of the controlled processes, and limitations in measurement and control input characteristics. Optimization of cancer chemotherapy is no exception as it has yet been an unsolved problem. There are a number of potential benefits of a robust optimization algorithm that can individualize treatment strategy. For example, by minimizing the dose of the drug, the side effects can be mitigated in theory. Various schemes have been proposed in the literature, which aim to optimize the time instants of the treatment, the amount of drug administered to the patient, or even both [1, 2]. Some recent results involve impulsive systems, reinforcement learning, or other model-based approaches [36]. The proper mathematical treatment of the problem often involves impulsive control actions, which has been researched extensively [7, 8] and has vast importance in cases where the drug is administered by injections. A closely related issue in chemotherapy is that drug resistance can occur during the therapy whose mechanism is not fully understood in the present. Optimization of the therapies could provide a partial remedy in this matter [9]. A specific approach is metronomic therapy, which utilizes small doses more frequently, as opposed to conventional therapeutic schemes.

Our goal is to implement metronomic chemotherapy where we optimize the amount of drug that is given at fixed time instants. Specifically, we utilize a nonlinear model predictive controller (NMPC) which uses a moving horizon estimator (MHE) as a state observer and parameter estimator. Our method is similar to [10], which uses the same control structure in a discrete-time setting. Both the NMPC and the MHE use a simple model for prediction and estimation, with four state variables, developed in [11]. Parameters of the model were identified using the stochastic approximation expectation-maximization (SAEM) algorithm on experimental data, which showed that the model can describe the primary mechanisms of the tumor behavior. Nevertheless, mutations in the tumor during its evolution lead to changes in the model parameters, which must be accounted for. Using the MHE developed in [12], the problem can be effectively tackled and intrapatient variability can be handled. It was shown that the MHE can estimate a subset of the model parameters such that the estimated tumor volume remains in the vicinity of the experimental data.

In separate work, the NMPC was designed using direct multiple shooting and impulsive input action in the prediction model [13]. The impulsive action was realized using a continuous approximation of Dirac impulses leading to continuously differentiable trajectories of the system. While the algorithm was effective in reducing the tumors in silico, numerical errors were still present in the computations.

Our current goal is to connect the two algorithms and investigate their behavior in in vivo mice experiments to prove the effectiveness of the scheme. We assume fixed therapy dates on which we optimize the amount of drug for each subject. The duration of the treatments during the experiments was 28 days in each case, which was enough to induce partial remission in mice. The treatment period was preceded by a period where standard therapy was applied and data were gathered for parametric identification that was used for personalization of the therapy generation. We also assumed that there is a measurement on each administration day so that the fresh state estimations can be supplied to the NMPC.

The main contribution of our work is that we show the viability of model-based metronomic therapy in in vivo using mice experiments. Since the vast majority of current research in model-based chemotherapy optimization mostly demonstrates in silico results [2, 3, 10], an experimental validation showing the feasibility of the approach can bring more attention to this field from other researchers.

In Section 2, we discuss the applied tumor model and the control algorithm including the parameter and state estimation. We also discuss the details of the animal experiment. In Section 3, we show the experimental results and compare them with numerical studies. We also compare them with conventional therapies and show the differences. In Section 4, we summarize our results and also discuss future directions of research.

2. Control Algorithm

2.1. Model

The main component of the control algorithm is the model which we use to forecast the tumor evolution in the NMPC and to correct the measurements in the MHE. In previous numerical studies, we used a model whose states were the living and dead tumor volumes and the drug concentration in the blood [14]. Research showed that by augmenting the model with a fourth state that relates to the drug concentration in the peripheral compartment, the model can describe the pharmacokinetics of the drug more accurately [11]. Hence, here we use the following tumor growth model described by the differential equations.where is the time function of the living tumor volume , is the time function of the dead tumor volume , is the time function of the drug level in the central compartment (mg/kg), and is time function of the drug level in the peripheral compartment (mg/kg). The input is the time function of the injection rate (mg/(kg day)), while the output is given as the total tumor volume. The parameters of the model and their physiological meaning are described in Table 1. The model is solved in the algorithm using the ode45 routine of MATLAB with default settings.

The first possibility is to reintroduce the term in the context of an impulsive control system. Let us denote the time of administrations with with , which restrict the solutions of the differential equations of the model to be defined on the intervals . The dosing is defined by the rulewhere is the amount of drug injected into the subject at time and the impulsive effects are in conjunction with . Equation (2) means that the system is simulated from to with the initial condition, that is, the endpoint of the previous simulation, modified with the input value .

Another approach to model impulsive action is to redefine the input in equation (1) such thatwhere controls the approximation, is the shifting term, is responsible for the scaling, and is the starting time of the impulse. This is a smooth approximation of the Dirac delta distribution where the smoothness property could be favorable in the case of using gradient-based optimizers where the cost function is based on the differential equation model. This means that the formalism essentially places a Dirac impulse in the beginning of each interval . Since the integral of (3) with is one, we can interpret as the amount of drug that is injected into the patient. In order to determine the approximation parameters, we can take into account that one administration takes approximately 15 seconds in mice which yields the choice (day).

2.2. Moving Horizon Estimation

An MHE was designed to provide parameter and state estimations of the process. In model equation (1), the parameters are assumed to be constant, which is not a realistic assumption physiologically, due to intrapatient variability. This entails that during the identification, we get a single set of parameters for each measurement time-series data, as can be seen in [11]. However, by utilizing an MHE, a subset of the parameters can be estimated and updated online, thus obtaining better predictions in the NMPC as opposed to using the fixed parameters. In [12], it was determined that the subset of the original parameters in equation (1) can be identified online with great accuracy, hence we estimate these parameters with the MHE. The identification problem at each measurement time instant can be posed aswhere is the length of the horizon, is a scalar tuning parameter, and is a weighting term where is the measured tumor volume with noise. The term is the estimation error, where is the estimated volume, that is, the output of model (1), obtained from the solution at the measurement time instants . The second term penalizes the deviation of the estimated parameters using the error , where is the -th element of the nominal parameter vector , and is the parameter vector which we optimize. The computation of the nominal parameter vector is detailed in Section 3.1. In the MHE, the model is solved in the impulsive sense, such that the input history is represented in the model using rule (2) instead of (3).

The lower and upper bounds of the parameters were determined to be and , respectively. By using the experimental time series in [15], the parameters of the MHE were determined to be and based on in silico experiments. The reason for the high gain is to track the tumor growth in a delay-free manner, which is important since the tumors can grow aggressively in a number of cases (meaning hundreds of in a week). Once the optimal set of parameters is found, we can use the simulation results to obtain the state estimate as for the NMPC. During the experiment, we used the fmincon routine of MATLAB to solve problem (4) using the standard settings of the solver.

2.3. Model Predictive Controller

In order to optimize the treatment protocols, we use an NMPC to calculate the optimal dosage. We assume that the measurements and the dosages are on the same day, and thus the optimization is performed on each . We also use a direct multiple shooting (DMS) formulation of the optimization problem which can boost the speed of the computation by integrating the model in parallel during the prediction [16]. In contrast to the MHE, where we only have four parameters to optimize, we have potentially a higher number of optimized controls in the NMPC case which is important to ensure the stability of the scheme so that computational speed is not negligible. We discretize the controls on the full prediction horizon such that for each subinterval , a constant is assigned which will be the subject of the optimization. For each constant, we assign model (1) with input definition (3) such thatwhere is an artificial initial value assigned to the starting point of each interval. We define the cost function to be the usual quadratic penalization of the predicted tumor volume and the input action aswhere is a scalar control parameter. The output is defined by the solution of equation (5) (using the estimated parameters ), denoted by , according to equation (5). The constants are the reference tumor volume, the measured tumor volume at the beginning of the treatment (at ), and the maximum injection rate, respectively. Since the scaling of the variables and differs significantly, the normalization term and can improve the conditioning of the functional. The optimization problem at time can be stated aswhere and are the vectors of optimized states and controls. Each element of the control vector is constrained to be positive as which yields the uniform bound (where is an -dimensional column vector containing 1 at each entry). The control and prediction horizon is the same in this algorithm and is denoted by . One can see that the estimation of the MHE explicitly appears in the first constraint. Since keeping the continuity of the state vector is essential for the DMS approach, as the second set of constraints indicates, the use of function (3) is thus justified.

Since each can be interpreted as the amount of drug given to the patient at each administration time instant, their value was limited with an upper bound that was chosen to be 8 (mg/kg). This value corresponds to the maximal tolerable dosage (MTD) of pegylated liposomal doxorubicin [15] that is administered to the mice during the experiments, discussed in Section 3. Using this upper bound, we can determine the normalization factor in equation (6) by computing the maximal value of function (3). Hence, the maximal injection rate is corresponding to the previous choice of for a 15-second injection.

In practice, problem (7) is solved by combining the optimized variables into a single vector as . For the initialization of , we used the MHE estimation such that with 0.1 as initial administration for each optimized control. For the prediction horizon, we used in previous works [13]; however, in order to improve the stability of the scheme, we choose in this article. The prediction intervals were set according to the time between each injection during the experiment. The motivation behind our choice is that more frequent administrations would stress the tail vein of the mice excessively, which must be avoided completely [17]. We use the fmincon routine here as well to solve problem (7) with default settings, except the MaxFunctionEvaluations which was modified to 5000 due to slow convergence. We denote the optimal solution of equation (7) by , from which the first control is injected into each subject at time , according to the NMPC principle.

3. Experimental Validation

We validated the proposed algorithm with mice experiments which we detail here. We describe the experiment timeline and the choice of parameters and show the time series of the measurements. We discuss the limitations of the experiment in conjunction with analysis of the results.

All animal housing and breeding processes and experimental protocols were approved by the Hungarian Animal Health and Animal Welfare Directorate according to the EU’s most recent directives. All surgical and treatment procedures were performed according to the Committee on the Care and Use of Laboratory Animals of the Council on Animal Care at the Institute of Enzymology, Research Centre for Natural Sciences in Budapest, Hungary (001/2574–6/2015).

3.1. Experimental Setup

Optimized treatment protocols were tested on a clinically relevant, genetically engineered mouse model of breast cancer. In this model, Brca1, a DNA repair gene, and p53, a regulator of cell cycle and genome stability, were knocked out in breast epithelial cells. The resulting mammary tumors highly resemble the Brca1-linked, triple-negative, hereditary breast cancer in humans: the molecular, immunohistochemical, morphological, and genetic characteristics are almost indistinguishable from its human counterpart [18]. Moreover, these tumors respond to chemotherapy very similarly, as initial treatment with doxorubicin, docetaxel, or cisplatin significantly reduces tumor size and induces remission. Nevertheless, long-term therapy often fails due to the emergence of drug resistance [19, 20], and most of the novel therapeutic approaches to tackle it are in the early developmental phase [2123]. Although we showed previously that pegylated liposomal doxorubicin (PLD) increases relapse-free and overall survival by 6 and 3-fold, respectively, these tumors cannot be cured using conventional chemotherapy regimens [15]. Findings obtained by using this model are frequently translated to human cancer clinic due to its similarity to human breast cancer.

Using the aforementioned tumor model, we optimized therapy protocols for 6 mice receiving PLD treatment during our experiment. After the measured tumor volumes have exceeded 200 , the mice received a single dose of either 4 mg/kg or 6 mg/kg of PLD. The goal of this initial administration was to observe the controlled tumor dynamics so that the parameters of model (1) can be identified with the SAEM algorithm more precisely as opposed to uncontrolled growth. After the identification phase, we waited for the tumors to enter into a relapse phase, and then we initiated a 30-day-long therapy using the algorithm. The reason why we started the treatment at the same time for each mice is to reduce the logistic expenses of the experiment and to cover a wide range of initial tumor sizes at the beginning of the therapies. After 28 days, we stopped the treatment so that the observed results can be compared on the same time scales. During the 30 days, the tumor in one of the mice was in complete remission, so we omitted it from the discussion here. We denote the remaining five mice with identifiers S1–S5. To obtain volume estimates, we used calipers to measure the width and length of the tumor approximated the tumor volume [24] as

Mice S1–S3 received a single dose of 4 mg/kg PLD on days 2, 3, and 4, respectively. Mice S4 and S5 received a higher dose of 6 mg/kg on days 3 and 8. At day 32, we identified the parameters of model (1) using the SAEM algorithm [11, 14], and the results can be seen in Table 2. Note that since the mice are genetically identical and the source of the tumor is the same and the applied drug is the same, the parameters have similar values. We used these parameters to create the nominal parameter vector p in the MHE. We computed the estimated states for each measurement from day 0 to 32 and used this estimation to initialize the NMPC. Weights of the NMPC were set by varying until we got a stable closed-loop simulation response where the first dose is smaller than the MTD, which is 8 mg/kg for PLD [15]. The value for mice S1–S5 was found to be , and , respectively. The reference volume to be tracked was because the model cannot be steered to zero states, and was the tumor volume measurement on day 32. During the 28-day treatment period, there were 9 fixed administration times for each mice in total, which was weekly injected on Mondays and Thursdays, such that we can comply with the rule that there must be at least 3 days between the injections.

3.2. Limitations

During the experiment, the value of was increased by an order of magnitude in the case of S1, S4, and S5. This is due to the effect of cumulative toxicity, which is related to the total administered drug in the past which we did not account for during the optimization. This means that in the case of S1, we set on day 35. For S4, we changed to on day 38 and on day 56. Finally, the value of S5 was modified to on day 35. We also want to point out that the parameter estimation did not capture the process accurately in the case of S4 and S5. One can see in Table 2 that the obtained parameters have very low values, which directly indicates that the tumor can be effectively treated with small doses. Excluding the MHE and generating an open-loop therapy for example lead to doses in the (mg/kg) regime, which is insufficient for the treatment of the tumor. Nevertheless, using an MHE could significantly alleviate this problem during a closed-loop control, as the results indicate a partial remission in each mice. The last potential issue with the mentioned experiment could be the use of an initial identification phase. It would be beneficial to observe the behavior of the algorithm by itself, using a preidentified average patient parameter set as the nominal parameter vector . By doing so, the MHE could solely individualize the treatment iteratively with each new measurement.

3.3. Results

Results of the experiment can be seen in Figures 15. The first plot depicts the measured tumor volumes (red crosses) linearly interpolated (blue lines) in conjunction with the simulation of the model for the identified parameters and the calculated input (purple line) and the estimation results of the MHE (black dots). The plot in the middle contains the applied dosages (green), and the last plot shows the change in the mass of the mouse (yellow). One can see qualitatively that in each case, the controllers were able to reduce the size of the tumors. We have gathered several statistics of the time series in Table 3. The first column shows the total amount of administered drug during the 30-day period which we denoted by . The second column shows the difference between the starting tumor volume and the end volume . is a similar measure which indicates the deviation between the maximum volume and the end volume, i.e., , and these values are also shown in percentage as .

Since the time series is corrupted by significant noise due to the nature of caliper measurements, we regressed the time series with a second-order polynomial to extract a more accurate measure of the volume decrease. Hence, the fifth column represents the difference between the maximum of the regression polynomial and the endpoint of it , and the same data are presented in percentages as well. Furthermore, we have also calculated a simple linear correlation coefficient between the first difference of the measured volume and the measured mass of the mice which can be seen in the seventh column .

In previous experiments with PLD, the therapy protocol was initiated at with 8 mg/kg MTD given in every 10 days until the tumor volume has reached 50 percent of its original volume [15]. If we assume that a single dose of 8 mg/kg is effective, then comparing with this value indicates that our approach did not use less drug than in a conventional case in general. However, this can be attributed to the fact that in this study, the treatments were initiated at tumor volumes other than . As one can see, in the case of S2 and S4, the total doses were smaller than the MTD which reinforces such a claim. It can also be seen both qualitatively and quantitatively that the generated protocols can induce remission in the subjects, invariant to the size of the tumor on the first day of the treatment. shows that in almost every case, we have a significant reduction in the size of the tumor as well, which is reinforced by (%) (where a larger value indicates better response to the therapy).

Also, there is a slight correlation between the tumor volume and the mass of the mouse. By inspection, one can see similar patterns in both time series (a drop between days 42–46 in the case of S2 for example). By taking the first difference of the daily interpolated time series and computing its correlation coefficients, we obtained the values in the last column in Table 3. While the obtained results only show a strong relationship in S5, a more elaborate statistical analysis might reveal additional correspondence; however, it is out of the scope of the current paper.

4. Conclusion

We have presented a control-theoretic approach of optimal chemotherapy protocol generation which we have validated with animal experiments. The combined MHE-NMPC approach was able to shrink the size of the tumor by using a similar amount of drug as in the case of conventional therapy. A limitation of the experiment was the change in the control parameters during the therapy generation. This entails that in the future, a constraint on the cumulative toxicity must be introduced in the optimization problem, and the bounds on the maximum admissible drug should be also lowered. The estimation of the model parameters could also be improved so that the model has better predictive capabilities. The best solution would be to identify a reliable population average, instead of individualized parameter sets for each mouse, which then could be improved with the MHE. Another issue is the relatively small number of mice used during the experiment, which is inadequate for the full validation of the scheme. Nevertheless, as a proof of concept, the results showed that the approach has potential, and the experiment also revealed the flaws of the algorithm which must be corrected in a future version.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 Research and Innovation Program (grant agreement no. 679681). Project no. 2019-1.3.1-KK-2019-00007 has been implemented with the support provided from the National Research, Development, and Innovation Fund of Hungary, financed under the 2019-1.3.1-KK funding scheme. Bence Czakó and Máté Siket were supported by the ÚNKP-20-3 New National Excellence Program of the Ministry for Innovation and Technology from the source of the National Research, Development, and Innovation Fund and the Eötvös Loránd Research Network Secretariat under grant agreement no. ELKH KÖ-40/2020 (Development of cyber-medical systems based on AI and hybrid cloud methods). The current work was also supported by the Collaboration between the Research Centre for Natural Sciences of the Eötvös Lóránd Research Network and the Szentágothai Research Centre of University of Pécs on internationally recognized medical research projects.