Hyperthermia in oncology remains an experimental treatment with no realistic future in clinical cancer therapy, although a declaration of its undisputed efficacy is commonplace in every hyperthermia paper. We have studied the available randomized trials on hyperthermia from the position of the “null hypothesis” in order to confirm or refute the efficacy and safety of clinical hyperthermia, taking possible biases into account. Unfortunately, careful analysis of 14 randomized clinical trials has not confirmed a clinical benefit of hyperthermia of any type: superficial, deep, or whole-body. We have not found any positive trial free of bias. After correction for the distortions, no trial shows an obvious long-term positive effect of hyperthermia. An effect of hyperthermia could be shown only in an experimentally designed clinical trial or against an inadequate comparator. In a clinical setting, and provided that the study design is correct, hyperthermia is either not effective at all or not effective enough to justify its obvious disadvantages: toxicity and labor intensity. The thermal concept of hyperthermia seems to be irrelevant. Nevertheless, multiple publications of positive trials, reviews, and meta-analyses create an impression of a hyperthermia renaissance.

1. Introduction

Modern hyperthermia started with the first paper on local hyperthermia by Westermark [1], published in 1898, more than 110 years ago. Eighty years ago, in the early 1930s, electromagnetic hyperthermia started with the Whitney Radiotherm. Fifty years ago, the studies of Selawry and Crile launched the modern period of hyperthermia history, and almost 40 years have passed since von Ardenne and LeVeen introduced local electromagnetic hyperthermia. Regardless of the starting point, hyperthermia is one of the oldest known treatment modalities in oncology.

In 2007, Horsman and Overgaard [2] started their meta-analysis with the words: “Hyperthermia is generally regarded as an experimental treatment with no realistic future in clinical cancer therapy …,” and then added “… This is totally wrong.” Thus, the eminent hyperthermicians voiced the general opinion of the medical community on hyperthermia. This opinion had already been articulated by Hornback [3] in 1987, when he wrote “Clinical hyperthermia today is a time-consuming procedure, done with relatively crude tools, and is an inexact treatment method that has many inherent technical problems. Certainly, excellent research work can be accomplished by private radiation oncologists working in the community. If the individual is willing to commit the time and effort required to participate in clinical studies in this interesting, challenging, exasperating, not too scientific field; then he or she should be encouraged to do so. The field is not without its risks and disappointments, but many cancer patients with recurrent or advanced cancers that are refractory to standard methods of medical care can unquestionably be helped by hyperthermia. It is not, as some have suggested, the fourth major method of treating cancer after surgery, radiation and chemotherapy. It may be innovative, but it still is an experimental form of therapy about which we have much to learn.”

Now, in the 2010s, clinical hyperthermia is still a time-consuming procedure, done with relatively crude tools, and is an inexact treatment method that has many inherent technical problems; it is an interesting, challenging, exasperating, not too scientific field; it is no longer innovative at all, but it is still an experimental form of therapy about which we have much to learn. If nothing has changed in 25 years, something is wrong with hyperthermia.

Horsman and Overgaard [2] then wrote “Although the role of hyperthermia alone as a cancer treatment may be limited, there is extensive preclinical data showing that in combination with radiation it is one of the most effective radiation sensitizers known. Moreover, there are a number of large randomized clinical trials in a variety of tumor types that clearly show the potential of hyperthermia to significantly improve both local tumor control and survival after radiation therapy, without a significant increase in side-effects.” A simple question: if this is true, why is hyperthermia still not a standard treatment in oncology?

To answer this question, we studied all the available randomized clinical trials on hyperthermia published after 1990. We did not include nonrandomized clinical trials, because it is well known that such trials usually show a much larger effect. This was clearly demonstrated, for instance, in the famous RTOG trial on thermoradiotherapy of superficial tumors, where a 68% complete response rate was reported in the phase I/II nonrandomized trial [4] and only 32% in the subsequent phase III randomized trial [5]. The editorial of Brizel [6] clearly shows the inconsistency of such nonrandomized trials.

We have reviewed 14 randomized clinical trials: 7 on superficial local hyperthermia (see Table 1), 6 on deep loco-regional hyperthermia (see Table 6), and 1 on whole-body hyperthermia (see Table 14). We approached them from the position of the “null hypothesis,” that is, considering hyperthermia not effective and/or not safe. From this point of view, we analyzed the trials for (1) efficacy by endpoints, (2) toxicity, and (3) biases. Under the “null hypothesis,” a negative trial result does not need explanation. Therefore, only positive trials were subjected to the analysis.

2. Superficial Hyperthermia Clinical Trials

The clinical trial of Perez et al. [5] (RTOG protocol 8104), published in 1991, compared thermoradiotherapy (TRT) with radiotherapy only (RT) in a well-designed and large randomized trial (307 patients with tumors of the chest wall, neck nodes, and melanoma) sponsored by the Radiation Therapy Oncology Group (RTOG). Complete local response (CLR) was reached in 32% of patients in the TRT arm and in 30% in the RT arm; the difference was statistically insignificant. There was no effect on overall survival. Despite the stronger thermal enhancement of RT in tumors <3 cm, the overall result was disappointing.

Three clinical trials of similar design were published nearly simultaneously in 1990–1993, comparing the efficacy of different TRT protocols: Kapp et al. [7] compared the effect of 2 and 6 hyperthermia sessions; Emami et al. [8] and Engin et al. [9] compared the effect of 4 and 8 sessions (see Table 1). The difference between the “short” and “long” protocols was negligible, and Engin et al. even showed lower efficacy of the “long” protocol: CLR was 55% in the 8-session arm and 59% in the 4-session arm (the difference was not significant).

In 1996, the International Collaboration Hyperthermia Group (ICHG) trial (Vernon et al. [10]) was published, showing a significantly better CLR rate in the TRT arm (59%) than in the RT only arm (41%), without an effect on survival. Unfortunately, despite the sufficiently large sample, this result cannot be considered relevant because of incorrect trial design and multiple biases. The trial was a combination of 5 different European and Canadian clinical trials. Officially, they were merged because it had become clear that the enrollment rate was too low to reach the planned figures. This explanation is rather strange, because all but one of the trials (the ESHO trial (ESHO arm), two trials of the Medical Research Council at Hammersmith Hospital, London (MRC-Brl and MRC-Brr arms), and the trial of the Dutch Hyperthermia Group (DHG arm)) were launched between May 1988 and October 1989. Only the Princess Margaret Hospital trial (PMH arm) was launched later, in July 1991. The decision to merge the MRC and ESHO trials was taken in October 1990, that is, about a year after their launch. The PMH trial was included in the collaboration in 1992, also about a year after its launch. Given the typical 6–9-year duration of hyperthermia trials, these decisions seem premature. Internal analysis reveals that only two arms of the entire trial were favorable for the TRT group (Table 2): MRC-Brr (CLR 57% versus 29%) and ESHO (CLR 78% versus 38%). In the DHG arm, the effect was equal (74%), and the two remaining arms showed a better effect of RT only: PMH (31% for RT and 29% for TRT) and MRC-Brl (67% for RT and 56% for TRT). Without the merger, the outcome could have been 3 negative trials against only 2 positive ones. Therefore, an intention bias in the merger decision is suspected.
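
The arm-by-arm comparison can be tabulated with a short sketch. The CLR percentages are those quoted in the text; the dictionary layout, the function name `classify_arm`, and the labels are illustrative assumptions, and the classification looks only at the sign of the difference, not at its statistical significance.

```python
# CLR rates (%) per arm as quoted in the text: (RT only, TRT).
# The data structure and all names are illustrative assumptions.
ICHG_ARMS = {
    "MRC-Brr": (29, 57),
    "ESHO": (38, 78),
    "DHG": (74, 74),
    "PMH": (31, 29),
    "MRC-Brl": (67, 56),
}


def classify_arm(rt_clr, trt_clr):
    """Label an arm by the sign of the CLR difference (TRT minus RT)."""
    diff = trt_clr - rt_clr
    if diff > 0:
        return "favors TRT"
    if diff < 0:
        return "favors RT"
    return "equal"


summary = {arm: classify_arm(rt, trt) for arm, (rt, trt) in ICHG_ARMS.items()}
favoring_trt = [arm for arm, label in summary.items() if label == "favors TRT"]
not_favoring_trt = [arm for arm, label in summary.items() if label != "favors TRT"]

for arm, (rt, trt) in ICHG_ARMS.items():
    print(f"{arm}: RT {rt}%, TRT {trt}%, difference {trt - rt:+d} points ({summary[arm]})")
```

Under these assumptions, two arms favor TRT and three do not, which is the 2-versus-3 split discussed above.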

Further, randomization defects and patient selection in the positive arms are suspected. For example, 29 patients in the RT group (CLR = 38%) and 29 in the TRT group (CLR = 78%) were reported in the ESHO arm. For the MRC-Brr arm, 59 patients in the RT group (CLR = 29%) and 90 in the TRT group (CLR = 57%) were reported, in accordance with the planned 40% : 60% randomization in favor of the TRT group. In 2010, however, van der Zee et al. [13] published an overview of the experience of the Erasmus Hyperthermia Center. It cites data of the ESHO and MRC-Brr arms of the Vernon et al. trial with the numbers of patients and the CLR percentages. A simple recalculation yields entirely different patient numbers. In the ESHO arm, there were 76 patients in the RT group and only 35 patients in the TRT group, and the MRC-Brr arm counted 203 patients in the RT group and 158 patients in the TRT group. Surprisingly, despite such a great difference in patient numbers, the CLR rates reported by van der Zee et al. [13] were the same as in the Vernon et al. study. By contrast, the van der Zee et al. patient numbers for the “TRT-negative” PMH arm coincide fully with the Vernon et al. data. We suppose that van der Zee et al., being members of the ICHG, simply presented the raw data. Therefore, patient selection and incorrect randomization in the “positive” groups of the Vernon et al. study are highly probable, which makes these results dubious.
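
How far the two sets of patient numbers differ can be seen from the share of patients allocated to TRT in each. The following is a minimal sketch using only the counts quoted in the text; the function name `trt_share` and the dictionary names are illustrative assumptions.

```python
def trt_share(n_rt, n_trt):
    """Fraction of an arm's patients who were allocated to the TRT group."""
    return n_trt / (n_rt + n_trt)


# Patient counts as quoted in the text: (RT group, TRT group).
reported_by_vernon = {"ESHO": (29, 29), "MRC-Brr": (59, 90)}
recalculated_from_van_der_zee = {"ESHO": (76, 35), "MRC-Brr": (203, 158)}

for arm in reported_by_vernon:
    reported = trt_share(*reported_by_vernon[arm])
    recalculated = trt_share(*recalculated_from_van_der_zee[arm])
    print(f"{arm}: TRT share {reported:.2f} as reported, {recalculated:.2f} after recalculation")
```

With the reported counts, the MRC-Brr allocation matches the planned 60% for TRT; with the recalculated counts, the TRT share falls below one half in both arms.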

Interpretational bias is also obvious. In the abstract, the authors did not mention that 3 of the 5 arms of the trial were negative for HT, giving only the overall statistics of the trial. In fact, the ICHG simply diluted the results of the 3 “negative” arms in the dubious results of the 2 “positive” arms. Moreover, it is obvious that the 18% gain (1.44 times) in CR rate in the TRT arm was obtained at the cost of a twofold rise in generalization of the disease (progress elsewhere 44% in the TRT group versus 23% in the RT group) and a 2.5-fold increase in mortality (30% versus 12%, respectively) (Figure 2). Even the overall statistics cannot hide that dissemination (+4%) and mortality (+7%) were worse in the TRT group. Survival was worse in the TRT group throughout the trial, with a clear divergence toward its end (Figure 1). In our opinion, these results cannot be considered positive in any case. Nevertheless, the trial was presented as positive in its abstract and in subsequent meta-analyses [2, 12]. Therefore, we consider this trial “semirandomized” and its result dubious because of low reliability.
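
The fold changes in this paragraph follow directly from the quoted percentages. A minimal sketch, in which the function name `fold_change` is an illustrative assumption and the inputs are the figures given in the text:

```python
def fold_change(trt_pct, rt_pct):
    """Ratio of the TRT percentage to the RT percentage."""
    return trt_pct / rt_pct


clr_points = 59 - 41                       # absolute CLR gain, percentage points
clr_fold = fold_change(59, 41)             # overall CLR, TRT versus RT only
progress_fold = fold_change(44, 23)        # progress elsewhere
mortality_fold = fold_change(30, 12)       # mortality

print(f"CLR: +{clr_points} points, {clr_fold:.2f}-fold")
print(f"Progress elsewhere: {progress_fold:.2f}-fold")
print(f"Mortality: {mortality_fold:.2f}-fold")
```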

In the same year, the clinical trial of Overgaard et al. [11] was published. It was a multicenter (11 centers in 6 countries) randomized controlled trial on 70 patients with metastatic or recurrent skin melanoma. One hundred twenty-eight lesions were evaluated (63% ≤ 4 cm, 37% > 4 cm). RT was given in 3 large fractions (8 or 9 Gy), each followed immediately by hyperthermia (43°C, 60 min). The immediate CLR rate was 62% in the TRT arm versus 35% in the RT only arm (gain 77%), and the 2-year local control rate (LCR) was 46% in the TRT arm versus 28% in the RT only arm (gain 64%).

Although it looks good at first glance, the Overgaard et al. trial raises many questions. The sample is too small, especially considering the multicenter design: 11 European cancer centers enrolled only 70 patients in 6.5 years, that is, less than 1 patient per center annually. Given that melanoma is quite a frequent tumor, this creates ideal conditions for preselection of patients and for special attention to the treatment of the hyperthermia arm, which usually leads to much better clinical results. And, surely, such a small sample is not representative. Additionally, tumors, not patients, were randomized in this trial, which is typical of a rather experimental trial. As a result, the trial looks like an in vivo radiobiology experiment in the shell of a clinical trial.
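
The enrollment rate quoted above can be checked with a one-line calculation. This is a minimal sketch; the function name is an illustrative assumption, and it assumes that all 11 centers recruited over the whole 6.5-year period.

```python
def patients_per_center_per_year(patients, centers, years):
    """Average annual enrollment per center, assuming all centers recruit throughout."""
    return patients / (centers * years)


rate = patients_per_center_per_year(patients=70, centers=11, years=6.5)
print(f"{rate:.2f} patients per center per year")
```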

The main bias of the study is the incorrect comparator, a typical bias in clinical trials. The best, or at least the standard, control treatment is an implied requirement for clinical trials. The usual RT dose for the treatment of skin melanoma, as well as of other superficial lesions, is 40–50 Gy per site [5, 9] with a cumulative dose of not more than 100 Gy, and it is commonly known that lower doses significantly reduce the effect of RT [14]. The total doses (TDs) of 24 and 27 Gy used in this trial are certainly low, especially considering the well-known radioresistance of melanoma. The median number of tumors per patient was 2; therefore, a high cumulative dose was not a reason to lower the dose per site. Also, the usual fractionation for skin melanoma is 10–20 fractions of 2–5 Gy each. The hypofractionation used in this trial (3 fractions of 8-9 Gy each) is rare. Such a choice of comparator has only one logical explanation: this protocol is ideal for demonstrating thermal modification. With only three fractions, each fraction is modified and HT and RT are simpler to coordinate, and the larger the single RT dose, the stronger the modification effect. The low cumulative dose allows the hyperthermia effect to be shown, because standard high-dose radiotherapy usually makes the hyperthermia effect insignificant [15]. This demonstrates once again that this is not a clinical trial but an in vivo radiobiology experiment without clinical significance.

This impression is reinforced by the lack of a proper survival analysis. Survival analysis is, of course, the core of any clinical trial, but not of a radiobiological experiment. All that is known from this trial is that immediate local control in the hyperthermia arm was better and remained better after 2 years; overall survival in the two groups after 2 years is not reported at all. Overall 5-year survival was 19%, which is no better than the average for metastatic skin melanoma, but there is no answer to the main question: what was the survival in the TRT and RT arms, and which group survived better? There is a very detailed survival analysis by local response, number of tumors, and sex, and even by overall control of all disease; everything is reported except the primary goal of the trial, survival by groups, and this looks like masking of a negative result. There is another reason to suppose that negative results in this trial are reported incompletely: for instance, there is not a word about burns, although these are reported in other trials, usually in more than 30% of patients.

It is also not clear why 4 cm was used as the upper limit for small tumors. All the other randomized superficial studies used 3 cm as the limit, which is entirely correct, because superficial tumors are generally considered such if they are less than 3 cm deep. In the RTOG 8104 trial [5], 77% of tumors were >3 cm. In the Overgaard et al. trial, 63% of tumors were <4 cm, and this distribution cannot be compared with other trials because of the different size criterion. Therefore, it is impossible to say exactly whether small tumors were preselected in this trial. By that time, it was already known that TRT is significantly more effective in small tumors. The authors try to prove that the impact of tumor size was statistically insignificant, but this does not seem to be correct. As is obvious from Figure 3, the impact of tumor volume is much stronger in the TRT arm, and the only reason why it is not statistically significant is the 4 cm limit. With a 3 cm limit, this difference would be larger and probably statistically significant, as it was in the other trials.

Further, 2 different RT protocols, with TDs of 24 and 27 Gy, were used in the trial. There was no reason to include two RT protocols in order to examine HT efficacy: for that purpose, all other factors should be equal. The authors obviously intended to show that thermal enhancement rises with increasing TD (Figure 4), with subsequent extrapolation of the conclusion to higher (normal) RT doses. This approach is incorrect. The results of other trials show that with normal or high TD, the effect of thermal modification becomes insignificant or disappears [5, 79] or even reverses [15]; therefore, the extrapolation is not valid. The impact of RT dose in this trial was much stronger than that of HT: the CLR rate was 56% for 27 Gy versus 25% for 24 Gy (gain 124%), that is, twice as strong as the HT effect (gain 64%). This also suggests that, as the TD of RT rises, the relative thermal enhancement would soon diminish. The displayed thermal enhancement was an effect of the difference in single dose (9 Gy versus 8 Gy) rather than of TD, because the stronger thermal enhancement at a higher single dose is well known in radiobiology. Finally, the statistics do not look correct, because the authors report an odds ratio of only 1.17 for RT versus 1.73 for HT.
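
The two gains compared in this paragraph can be reproduced from the quoted rates. A minimal sketch, assuming that “gain” means the relative increase over the control rate; the 64% hyperthermia gain is taken to correspond to the 2-year local control rates of 46% versus 28% reported for this trial, and the function name `relative_gain` is an illustrative assumption.

```python
def relative_gain(treated_pct, control_pct):
    """Relative gain in percent: (treated / control - 1) * 100."""
    return (treated_pct / control_pct - 1) * 100


rt_dose_gain = relative_gain(56, 25)  # 27 Gy versus 24 Gy
ht_gain = relative_gain(46, 28)       # TRT versus RT only, 2-year local control

print(f"RT dose gain: {rt_dose_gain:.0f}%")
print(f"HT gain: {ht_gain:.0f}%")
print(f"RT dose gain is {rt_dose_gain / ht_gain:.1f} times the HT gain")
```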

The above is sufficient for the following conclusions:
(i) the trial was in fact an in vivo radiobiological study without clinical significance;
(ii) the trial seems to have been designed specifically to demonstrate the efficacy of hyperthermia, to the detriment of practical value;
(iii) the trial uses an incorrect comparator;
(iv) the actual survival outcome of the study is hidden;
(v) negative data seem to be reported incompletely.

Apparently, this is the reason why the study had no consequences: there were no further studies on TRT of malignant melanoma, and there is no clinical application. That is why we consider the result of this trial dubious.

In 2005, the most famous and most cited superficial hyperthermia study, that of Jones et al. [12] from Duke University, was published. This was a prospective, randomized, controlled, single-center study on 108 patients with superficial tumors of the chest wall, neck nodes, and melanoma. TRT at a prescribed CEM 43°C thermal dose was compared with fractionated RT alone (single dose 1.8–2 Gy, total dose 30–70 Gy). CLR was the main endpoint, and it was significantly higher in the TRT arm: 66.1% versus 42.3% in the RT alone arm.

Even a first look at the patient characteristics (Table 3) reveals biases. The median age in the TRT arm was 7 years lower than in the RT only arm (52.4 versus 59.3 years). Such a difference is very unlikely with proper randomization of a sample of more than 100 persons. Incorrect randomization is a well-known defect of randomized trials. Some other points also suggest improper randomization: for example, the radiation dose in the TRT arm was 10% higher. As shown above, a 10% increase in RT dose in the Overgaard et al. [11] trial led to a 124% gain in 2-year local control rate. This improper randomization was further compounded by preselection of “heatable” patients: after test heating, 13 of 122 patients (11%) were considered “nonheatable” and did not enter the trial. This preselection would not be a defect if the trial conclusion referred to “heatable” patients only, but the conclusion contains no such qualification.

The trial reports no tumor size data, although tumor size analysis is normally present in clinical trials as one of the major predictors of RT success. Given the obvious defect of randomization, the lack of tumor size data, the preselection of “heatable” patients, and slow enrollment (122 patients in 7 years, i.e., 1.5 patients per month), selection of patients with small tumors is highly probable. One more distorting factor is the high percentage of RT-pretreated patients (36%). These patients were radioresistant: whereas in the TRT arm the CLR rate was virtually the same (68.2% in pretreated and 65% in not pretreated patients), the CLR rate in the RT only arm was significantly lower in the preirradiated group (23.5% versus 51%). Simple analysis shows that the 36% share of RT-pretreated patients adds a 10% difference in favor of the TRT arm. This is an obvious defect, because well-designed trials usually exclude such known confounding factors by enrolling either pretreated or not pretreated patients only.
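
The contribution of the RT-pretreated patients can be estimated as a share-weighted average of the two strata. This is a minimal sketch under a loudly stated assumption: the 36% share is taken to apply equally to both arms, which the text does not confirm. The function and variable names are illustrative.

```python
def pooled_clr(pretreated_share, clr_pretreated, clr_not_pretreated):
    """CLR (%) of an arm as a share-weighted average of its two strata."""
    return pretreated_share * clr_pretreated + (1 - pretreated_share) * clr_not_pretreated


SHARE = 0.36  # assumption: identical share of pretreated patients in both arms

rt_pooled = pooled_clr(SHARE, 23.5, 51.0)   # RT only arm
trt_pooled = pooled_clr(SHARE, 68.2, 65.0)  # TRT arm

gap_with_pretreated = trt_pooled - rt_pooled  # difference between arms as enrolled
gap_without_pretreated = 65.0 - 51.0          # difference in not pretreated patients only
added_by_pretreated = gap_with_pretreated - gap_without_pretreated

print(f"RT only: {rt_pooled:.1f}%, TRT: {trt_pooled:.1f}%")
print(f"Added by pretreated patients: {added_by_pretreated:.1f} percentage points")
```

Under this assumption the pooled rates come out close to the reported 42.3% and 66.1%, and the pretreated stratum accounts for roughly 11 percentage points of the difference, in line with the approximately 10% stated above.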

We have tried to analyze the possible impact of all the above-mentioned biases on the CLR rate (see Table 4). The result shows that the accountable factors alone (preselection of “heatable” and RT-resistant patients and the RT dose bias) could add at least 60% to the effect in the TRT arm, which explains all of the registered CLR gain. Given the known younger age of the TRT arm and the possible tumor size bias, the total impact of the biases could be even stronger. In other words, it is possible that hyperthermia actually did not improve the effect of radiotherapy but, on the contrary, worsened it. Taking into account the results of the previously reviewed trials, this conclusion does not look implausible.

Local control rate (LCR) was the only positive (+57%) and statistically significant effect of the study (see Figure 5(a)). The long-term LCR is fully explained by the immediate LCR gain, because the hazard of progression had become equal in both arms by the 1st year (see Figure 5(c)). Overall survival (see Figure 5(b)) was the most disappointing endpoint: it was worse in the TRT arm from the 1st year to the end of the trial, although the difference was statistically insignificant. In view of the known significant biases in favor of the TRT arm, these results are alarming. This negative impression is further aggravated by attempts to hide the negative course of the trial. Table 5 is illustrative in this respect. In fact, more patients died in the TRT arm, but with perfect local control (see Figures 5(a) and 5(b)). Table 5 shows a very favorable picture of better local control in the TRT arm, but without the full death information that could spoil the impression. This is an obvious example of data manipulation.

This calls into question the very parameter of “local disease- (relapse-) free survival” (LDFS), which is the main endpoint of all the hyperthermia studies and usually the only significant one. This parameter is misleading because it is in fact not a survival measure. By survival we would understand the share of patients who are alive at a given moment without local relapse, but this is not the case for LDFS, which is in fact the share of patients without local relapse, whether they are alive or have died. The correct name of this parameter would be “(Intravital) Local Relapse Incidence,” and it is not connected with survival in any way. For instance, if all patients in a sample die without local relapse, LDFS equals 100% with OS = 0%. If in another sample OS for the same period is 50% but with 60% local relapses, LDFS equals only 40%, and this arm looks significantly worse than the first arm in terms of LDFS. Given that virtually all the “successful” hyperthermia trials are based on better LDFS, the clinical insignificance of this success is obvious.
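
The two hypothetical samples in this example can be written out explicitly. The following is a minimal sketch that follows the definition of LDFS given in the text (the share of patients without local relapse, alive or dead); the ten-patient samples, the `Patient` record, and the function names are illustrative assumptions, not trial data.

```python
from collections import namedtuple

# One record per patient; both fields are booleans.
Patient = namedtuple("Patient", ["alive", "local_relapse"])


def overall_survival(patients):
    """Share (%) of patients who are alive."""
    return 100 * sum(p.alive for p in patients) / len(patients)


def ldfs_as_described(patients):
    """Share (%) of patients without local relapse, counting the living and the dead."""
    return 100 * sum(not p.local_relapse for p in patients) / len(patients)


# Sample 1: every patient has died, none with local relapse.
sample_1 = [Patient(alive=False, local_relapse=False) for _ in range(10)]

# Sample 2: 5 of 10 patients alive, 6 of 10 with local relapse.
sample_2 = [Patient(alive=(i < 5), local_relapse=(i >= 4)) for i in range(10)]

for name, sample in (("Sample 1", sample_1), ("Sample 2", sample_2)):
    print(f"{name}: OS {overall_survival(sample):.0f}%, LDFS {ldfs_as_described(sample):.0f}%")
```

By this measure the first sample scores 100% despite zero survival, while the second scores 40% with half of the patients alive.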

Finally, safety in this trial was the worst among all the trials reviewed so far: 46% burns, including 3% third-degree; 11% complications of catheterization, including 3% grade 3 toxicity. Sixteen percent of patients had to pause treatment because of toxicity.

The authors’ conclusion, “Adjuvant hyperthermia with a thermal dose more than 10 CEM 43°C confers a significant local control benefit in patients with superficial tumors receiving radiation therapy,” seems unfounded. We consider the result of the trial dubious. The observed local control benefit could be fully explained by the reported biases, and, once the biases are taken into account, the survival effect in the TRT arm seems to be negative.

In 2007, a paper by Jones et al. [16] was published advocating the use of hyperthermia as a radiotherapy sensitizer for treatment of chest wall recurrences: “Data from several randomized trials suggest that the addition of hyperthermia to radiation can increase the response rate for such local recurrences.” The same year, the National Comprehensive Cancer Network (NCCN) included in its guidelines consideration of the addition of hyperthermia for women with recurrent locoregional advanced breast cancer after first-line surgery or radiation had failed. The NCCN guidelines stated that “while there is heterogeneity among the study results, a recent series with strict quality assurance demonstrated a statistically significant increase in local tumor response and greater duration of local control with the addition of hyperthermia to radiation compared to radiation alone (Jones et al., 2005 [12]).” The NCCN guidelines noted that the addition of hyperthermia generated substantial discussion and controversy among the NCCN panel members and is a category 3 recommendation (the recommendation is based upon any level of evidence but reflects major disagreement). The counterpoint was voiced by McCormick [17] of the Department of Radiation Oncology of Memorial Sloan-Kettering Cancer Center, who said “Although HT in chest wall recurrences has been used for several decades, recent reports are few. Unresolved issues of radiation dose, optimal temperature and timing of HT, and quality assurance problems with thermometry are apparent from these studies. Although clearly an effective treatment option in this clinical scenario, more research on HT and radiation is needed before this treatment combination can be considered standard care.”

Thus, of the 7 reviewed randomized clinical trials on superficial hyperthermia, 4 were considered negative by the authors themselves (Perez et al. [5], Emami et al. [8], Kapp et al. [7], and Engin et al. [9]). Of the 3 remaining trials, which were considered positive by their authors, the Jones et al. trial was biased and dubious, the Vernon et al. trial had an incorrect design and controversial data, and the Overgaard et al. trial was not representative, biased, and clinically insignificant.

These trials showed that superficial TRT is effective:
(i) for small tumors only (≤3 cm, thermal enhancement ratio (TER) = 1.2–2), with no effect for big tumors (≥3 cm, TER = 0.9–1.1);
(ii) only for tumors that can be heated adequately (20′ ≤ 42.5°C);
(iii) for “heatable” tumors only;
(iv) only with effective thermal control;
(v) with large RT fractions, being much less effective or ineffective with typical hyperfractionated protocols;
(vi) only in a special setting: HT shortly after RT.

Even in this setting, HT improves with statistical significance only the CLR rate (+30–60%) and the short-term local control rate (1-2 years). Improvements in the total local control rate (complete + partial local remission) and in the long-term local control rate (>2 years) are generally statistically insignificant. The major prognostic factors for the duration of local control were, in order, tumor histology, RT dose, tumor size, and minimum temperature in the tumor (much less significant). The recent retrospective study of De Bruijne et al. [18] showed that, once tumor volume was taken into account, thermal dose was not associated with any clinical endpoint. There was no influence on overall survival; sometimes it tended to be worse with HT [12].

Even these small and partial successes of superficial hyperthermia look clinically insignificant, because small tumors make up the smaller part (25–35%) of superficial tumors and can easily be ablated or removed by surgery (the methods of choice). It is the big superficial tumors for which hyperthermia would be of interest, but it is ineffective in them. Most of these tumors are hard to heat because of localization, body shape, sensitivity, and so forth. The rigorous thermal control used in “positive” clinical trials is impossible in clinical practice (e.g., 24-channel thermometry is routinely used in the Erasmus University HT center); poor thermal control significantly reduces both efficacy and safety, up to reversal of the ratio. Hypofractionated RT protocols, which are optimal for thermal modification, are much less used in practice. The optimal sequence of RT and HT is hard or impossible to manage in real practice; a suboptimal sequence makes the combination much less effective or ineffective. The level of toxicity (≥30% burns) that is tolerated in clinical trials is unacceptable in clinical practice.

Conclusion on superficial hyperthermia is as follows:
(i) there is no clear evidence of overall efficacy of hyperthermic radiotherapy modification of superficial tumors so far;
(ii) existing positive results are biased and/or clinically insignificant;
(iii) superficial hyperthermia is still an experimental treatment with limited applicability in clinical practice.

The conclusion of hyperthermia society opinion leaders was vague: “In a select group of patients, the addition of hyperthermia to radiotherapy increases the eradication of local tumor, with a modest increase in largely self-limited toxicity. While attainment of CR is a worthwhile study endpoint, one must also consider the need to address palliation of symptoms, in that the majority of these patients will ultimately succumb to their distant disease. In the modern era of “targeted” therapy, the issue of local control will increasingly become more important. Future applications of hyperthermia combined with radiotherapy should include the addition of targeted biological agents in the hopes of increasing the CR rate and hopefully translating into prolonged disease-free survival. Liposomal doxorubicin has been combined with radiotherapy and hyperthermia by one group and warrants further evaluation in the future. Efforts must be taken to provide reproducible, efficacious heating of tumors so that the synergistic effect of combining radiotherapy and hyperthermia can be optimized. With rigorous thermal dosimetry and careful treatment technique, the addition of heat to radiotherapy can result in long-term local control of breast cancer chest wall recurrences” [19].

Translated from Aesopian language, this means that hyperthermic modification of radiotherapy is effective only in a selected group of patients; that it mainly provides palliation of symptoms through improved local control, without an effect on survival, because the metastatic process is not affected by this treatment; that this local effect can be achieved only with effective heating, rigorous thermal dosimetry, and careful treatment technique; that hyperthermia increases the toxicity of treatment; and that the future application of TRT depends on targeted biological agents that could increase its effect. Thus, this conclusion also contains a hidden admission of the insufficient efficacy of superficial TRT of breast cancer and chest wall recurrences, and these limitations will keep hyperthermia far from clinical practice.

3. Hyperthermia of Deep-Seated Tumors

The phase III RTOG clinical trial on deep hyperthermia by Emami et al. was published in 1996 [20]. This was a prospective, randomized, controlled, multicenter trial. One hundred eighty-four heavily pretreated patients with deep-seated tumors of the head and neck and pelvis were enrolled. TRT, with HT at 42.5°C for 30–60′ applied after RT, was tested against RT alone (cumulative dose ≤ 100 Gy). The CLR rate was 55% in the TRT arm and 53% in the RT only arm. Two-year overall survival was 34% in the TRT arm and 33% in the RT only arm. Acute grade 3-4 toxicity was 22% versus 12%, and late toxicity 20% versus 12%, in the TRT and RT arms, respectively (Figure 6). Thus, the increase in complete response rate was negligible and statistically insignificant; the increase in toxicity, both acute and late, was substantial but also not statistically significant.

The authors concluded that “Interstitial hyperthermia did not show any additional beneficial effects over interstitial RT alone. Delivery of HT remains a major obstacle. The benefit of HT in addition to RT still remains to be proven in properly randomized prospective clinical trials after substantial technical improvements in heat delivery and dosimetry are achieved” [20].

In 2000, the prospective, randomized, controlled, multicenter phase III trial of the Dutch Deep Hyperthermia Group (DDHG) by van der Zee et al. [21] was published. Three hundred fifty-eight not pretreated patients were enrolled in 11 Dutch centers and randomized to the TRT (182 patients) and RT only (176 patients) groups. RT was applied as external beam RT (EBRT) + brachytherapy (BT) with a total dose of 65 Gy. Five sessions of deep HT (42°C, 60–90′ total time) were administered weekly, 1–4 hrs after RT. CLR rate and local disease-free survival (LDFS) were the endpoints.

The trial included three subgroups (see Figure 7):
(i) advanced cervical cancer (114 patients),
(ii) advanced rectal cancer (143 patients),
(iii) advanced bladder cancer (101 patients).

Though overall CLR rate was statistically significantly increased in TRT arm (55% versus 39%, ) and duration of local control in TRT arm was also significantly longer ( ), there were great differences between subgroups. There was no statistically significant effect in rectal cancer group, and OS in the TRT arm was worse there, though being statistically insignificant. In general, the result in rectum cancer group was negative. Bladder cancer result was better but improved local control had disappeared during follow-up, and there was no effect to OS. In general, this result was dubious.

Cervix cancer group was the only one with statistically significant improvement of both CLR (83% versus 57%, ), LDFS (3 y LDFS 61% versus 41%, ), and OS (3-year OS 51% versus 27% in RT only arm, ). Therefore, only cervix cancer results were further reported [25]. In 2008, Franckena et al. [26] published impressive result of the long-time follow-up: 12-year local control rate was 56% in TRT arm versus 37% in RT arm ( ); 12-year overall survival in TRT arm was 37% versus 20% in RT arm ( ). Median overall survival was 2.64 years in TRT arm versus 1.78 years in RT arm. Local recurrence rate was 25% in TRT arm versus 31% in RT arm. Distant metastases rates were the same in both arms (31% and 32%).

First of all, interpretation of the trial result provokes disagreement. Statements like “in this trial, a beneficial effect from adding hyperthermia to standard radiotherapy was demonstrated, particularly for patients with cervical cancer” [27] or “the overall result showed a substantial benefit for whole group but only 114 patients with cervical cancer were included in the published reports of this trial” [28] are incorrect. In fact, a clear beneficial effect was demonstrated only in the cervical cancer subgroup. Results in the two other subgroups were negative (rectal cancer) or dubious (bladder cancer) [29]. Therefore, for a correct analysis of the trial results, we treat it as three subtrials, of which only one was successful.

Second, it seems that the trial used an incorrect comparator: RT with a total dose of 67 Gy versus 75–95 Gy in successful RT trials. It is impossible to say which part of the TD was targeted to the tumor mass in this trial because this is not specified. It is known only that “para-aortal nodes were routinely included in the external radiotherapy field” [21]; therefore TD to the tumor mass was less than 67 Gy (estimated at not more than 60 Gy). This point was widely criticized, and the authors’ attempts to justify the comparator look weak. Their position that such a dose “is considered adequate treatment” [27] is unsatisfactory, because the control treatment in a phase III trial is required by default to be not merely adequate but the best available or standard treatment. The inadequacy of low-dose RT was clearly shown by the trial of Perez et al. [14]: in Stage III unilateral lesions, the 10-year pelvic failure rate was about 50% with ≤70 Gy to the tumor mass versus 35% with higher doses, and in bilateral or bulky tumors it was 60% with doses ≤70 Gy and 50% with higher doses. Therefore, a higher RT dose could add 25%–30% or more to the long-term local control rate, and there are no grounds to consider a total dose of less than 70 Gy adequate, especially for the control group in a clinical trial. A combination of external-beam RT (EBRT) and brachytherapy (BT) with a total dose of 75–85 Gy to the tumor mass has been widely accepted since the mid-70s [14, 30], whereas enrollment in the DDHG trial started in 1990. The argument that the low dose was a consequence of not all patients receiving full RT is disproved by the study protocol. According to the protocol, EBRT was applied to the whole pelvis in 23–28 fractions of 1.8–2.0 Gy to a TD of 46–50.4 Gy; then HDR BT of 17 Gy in 42 patients or LDR BT of 20–30 Gy in 49 patients was applied [21, 25]. It follows that in at least 42 patients TD could not exceed 67 Gy, and in the other 49 patients it could vary in the range of 66–80 Gy.
Therefore it seems that the actually achieved TD of 67-68 Gy was the planned target TD of the trial and not a result of incomplete RT. Another attempt is to shift the focus from the problem of insufficient RT dose to the general change of the cervical cancer paradigm to chemoradiotherapy after the start of the DDHG trial [29]. This is indeed so, but it does not in any way answer the question of RT dose inadequacy. As obviously follows from Table 7, the clinical results in the DDHG trial control (RT) group were 1.5–2 times worse than the best results available, and even much worse than the old results of Fletcher obtained in 1954–1963 with the very first megavoltage linear accelerators with TD = 90 Gy for stage IIIB. That is, it is obvious that the DDHG trial used an incorrect comparator, which is considered a serious bias.
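The dose arithmetic behind this argument can be checked with a minimal sketch. It assumes that the nominal total dose is a plain sum of the EBRT and BT components quoted from the protocol, with no radiobiological correction for dose rate or fractionation; the function and variable names are illustrative only.

```python
# Protocol dose components quoted in the text (all values in Gy).
EBRT_RANGE = (46.0, 50.4)     # whole-pelvis external beam RT
HDR_BT = (17.0, 17.0)         # high-dose-rate brachytherapy (42 patients)
LDR_BT_RANGE = (20.0, 30.0)   # low-dose-rate brachytherapy (49 patients)


def total_dose_range(ebrt, bt):
    """Return (min, max) nominal total dose as a plain sum of components.

    Assumption: doses are simply added; no biological equivalence
    (dose rate, fractionation) is taken into account.
    """
    return (ebrt[0] + bt[0], ebrt[1] + bt[1])


hdr_total = total_dose_range(EBRT_RANGE, HDR_BT)
ldr_total = total_dose_range(EBRT_RANGE, LDR_BT_RANGE)

print(f"EBRT + HDR BT: {hdr_total[0]:.1f}-{hdr_total[1]:.1f} Gy")
print(f"EBRT + LDR BT: {ldr_total[0]:.1f}-{ldr_total[1]:.1f} Gy")
```

Under these assumptions the planned nominal dose stays at or below about 67 Gy for the HDR subgroup and within about 66–80 Gy for the LDR subgroup, in line with the ranges given above.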

The authors explain the worse clinical results by relatively young age, bulky tumors, and nodal involvement. The first reason is not convincing. The median age of 50-51 is equal to the age at first diagnosis of cervical cancer in Northern Europe (50–52) and is necessarily close to that of any other North European study enrolling untreated patients. Also, though in this trial the immediate CLR rate was better for older patients [25], other studies show that younger age is associated with better long-term results and survival [32, 41]. The two other reasons look sound but are not sufficiently supported by evidence. Though the average tumor size in the DDHG trial is indeed large, in terms of survival this is a significant factor for stage I but not for more advanced stages, where parametrial involvement and nodal status are significant [14, 32]. Nodal involvement in the DDHG trial, though seemingly more extensive than in other trials (70% versus 30%–40%), was assessed in only 44% of patients [25] and is therefore not established. In summary, there are some grounds to consider the DDHG sample more severe than in other clinical trials, but this is not established. In any case, stage of disease is a valid and correct basis for comparison (Table 6). And the question remains: why was such a gentle RT schedule used, which is obviously inadequate to the severity of the sample?

Clinical results in the TRT arm of the DDHG trial were worse than the best results reported with RT only (see Table 7) with a total dose to the tumor mass of 75–90 Gy. As discussed above for the Overgaard et al. trial, use of a low RT dose is convenient for radiobiological demonstration of the hyperthermia effect but makes a clinical trial clinically insignificant. This is what we see in the DDHG trial: it is impressive as a demonstration of low-dose radiotherapy modification but clinically insignificant because of the low overall effect. As is obvious from the other hyperthermia trials, the effect of hyperthermic RT modification becomes statistically insignificant or disappears altogether in comparison with standard high-dose RT [5, 20].

An inadequate comparator is not the only problem of the DDHG trial. There are also huge heterogeneity in RT and HT coupling, differences in the HT equipment used, poor analysis, and incomplete safety analysis. The trial combines data of two independent studies based at Amsterdam Medical Center (AMC) and University Hospital Rotterdam (UHR). Whereas the AMC trial was monocenter, UHR also collected patients from 9 other RT centers. As a result, while in AMC HT followed RT within 1 hour, in UHR the usual delay was 3-4 hours because of logistics. It is well known that the RT-modification time interval lasts no longer than 1.5 hours. Thus, there was RT-modifying coupling in AMC but not in UHR, where concomitant rather than combined treatment was applied. It seems that the efficacy of such different applications should be quite different. The authors indirectly admit the inapplicability of classic RT-modification criteria in this case: “Probably the main gain of hyperthermia is a direct effect on the hypoxic tumor cells. This extra cell kill will be clinically relevant in a small proportion of patients only, and studies of more patients are required to establish such an improvement” [21]. This coupling difference is further aggravated by the difference in equipment used: the BSD2000 system (BSD Corp., USA) in UHR, a 4-waveguide applicator system in AMC, and a TEM applicator in Utrecht (both being custom-built). There is no comparison of the systems except for the short phrase “for the three systems, similar energy distribution in human pelvic size phantoms has been demonstrated” [21]. Taking into account the significant difference in technologies (e.g., the TEM applicator uses the frequency range 10–80 MHz [42] whereas BSD2000 uses 80–120 MHz; these ranges have very different properties), there is a very low probability that these systems are clinically equivalent. But no publication on the trial contains a separate analysis of efficacy and safety by center or HT unit.
There are no separate data for AMC and UHR; even the number of patients in these two trials is unknown. But such generalized data are useless from a practical point of view because it is unknown which type of application is effective within such a wide range of application modes. It is even unknown which temperatures were used in the trial, because temperature analysis is absent. When Dahl and Mella [28] discuss thermometry data in the DDHG trial, they merely quote the data from another trial, that of Harima et al. [24], and this is an obvious confusion. It is also known from another source in Rotterdam (Fatehi [37]) that intratumoral temperature in cervical carcinoma with the BSD2000 system never reached 40°C; thus the 42°C stated in the trial protocol is misleading. In fact, this trial is a “black box”: we know only the input and output parameters but we do not know at all “how it works.” Thus, we do not know how to use it, and that is why the DDHG trial is useless from a practical point of view.

Additionally, the safety analysis seems to be incomplete and biased. This is the only HT trial which reports lower grade 3-4 toxicity in the TRT arm (2.2%) than in the RT arm (5.9%), which is very dubious. At the same time, the authors report 12% (20/170) of subcutaneous burns, which needed up to 2 weeks for relief; 3% (5/170) of skin burns, including 1 case (0.58%) of grade 2 burn and 2 cases (1.2%) of grade 3 burn, which demand interruption of HT treatment; and 2 cases (1.2%) of severe deep burns of skin and subcutaneous tissue. Additionally, “some” patients suffered from catheter-dependent infections [21]. Therefore, there were at least 18% (30/170) cases of HT-related toxicity which should cause interruption of HT treatment, whereas according to the authors’ information, treatment was delayed for only 7 patients in the TRT arm.

Refusal of treatment is one more source of safety information. It is reported that 41% of patients declined to complete all 5 HT treatments, 25% received only 1–3 treatments, and 9% did not receive any HT session. The main reason given for refusal is that patients had learned about the “experimental nature of this treatment” [25]. This is quite a strange explanation, because patients were recruited “after verbal informed consent had been obtained” [21] and therefore should have been informed about the experimental nature of the treatment from the start; it also does not explain the 9% of patients (16) who did not receive any HT session at all. The most probable reason for not receiving HT treatment is toxicity. After all considerations, we assess HT-dependent toxicity at about 30%, with HT-limiting toxicity of not less than 10%. These data are hidden.

Therefore, our conclusion on the DDHG trial is as follows: of the three DDHG subgroups, rectum results were clearly negative, bladder results were dubious, and only the cervix subgroup showed a statistically significant response. This response was obtained versus an inadequate comparator and was worse than that reported in the best trials with RT only, including long-term local control and survival. The study design does not allow us to speak of TRT, but rather of HT and RT co-treatment. Poor data presentation and analysis do not allow the reasons for the study results to be understood. Toxicity analysis is incomplete. The results of the trial are clinically insignificant and practically inapplicable.

Shortly after the DDHG trial, a small Japanese trial of Harima et al. [24] was published in 2001. It was a prospective, randomized, controlled, monocenter trial. Forty patients with FIGO stage IIIB cervical cancer were enrolled in 1994–1999 and randomly allocated to the TRT and control RT groups, with 20 patients in each group. RT was applied with 6 MV EBRT and iridium-192 HDR BT to a TD of 82.2 Gy. Hyperthermia was applied within 30 minutes after the RT session by a Thermotron RF8 capacitive system with output power of 800–1500 W. The trial showed excellent results in favor of the TRT arm: CLR rate was 80% in the TRT arm versus 50% in the RT-only arm, and 3-year LDFS and OS were better in the TRT arm (80% and 58%, resp.) than in the RT-only arm (49% and 48%, resp.) (Figures 8 and 9).

This trial stands apart from other trials and is unique in many respects. First, the authors calculated the minimum sample size (2 × 20 patients) from the hypothesis that TRT would give 80% CLR versus 50% with RT only. They then obtained exactly the planned result (80% and 50%) with the planned sample size. Such an exact coincidence of trial plan and result is truly unique. Second, the sample of the trial was the oldest of all the mentioned trials: mean age in the TRT group was 64.9 years, and these were previously untreated patients. This is very uncommon because, according to the study of Ioka et al. [41] of 8966 cases of cervical cancer diagnosed in 1975–1996 in residents of Osaka Prefecture, Japan (Harima et al. enrolled patients in 1994–1999), the average age at first diagnosis was 54.6 years. It seems hard to obtain a 10-year-older sample of newly diagnosed patients at random. Thus, preselection of older patients is evident. The reported fact that local control after TRT is significantly better in older patients [21] could be a reason to select such an older sample. At the same time, average tumor volume in this trial was at least 1.5 times smaller than in the DDHG trial, though the stage of disease is the same and both trials enrolled previously untreated patients. Moreover, in the Harima et al. trial, the patients were 14 years older (64.9 versus 51 years) than in the TRT group of the DDHG trial, which implies bulkier tumors. Therefore, preselection of small tumors is suspected. It is well known that the TRT effect is higher for smaller tumors. Third, though a TD of 82.2 Gy seems adequate, in fact it is not. TD to the tumor mass was only 60.6 Gy (30.6 Gy EBRT to the whole pelvis and 30 Gy of BT to point A), while a 21.6 Gy dose was applied to the parametria with central shielding. Therefore, TD to the tumor mass was nearly the same as in the DDHG trial, but OS in the RT group was much better than in the DDHG trial (3 y OS 48% versus 27%, 5 y OS 48% versus 23%, resp.)
and was on the level of the best RT-only trials with TD 75–85 Gy to the tumor mass (see Table 7), which is also amazing. The effect of a low-dose comparator and the clinical significance of such a comparison were discussed above. And, lastly, the aforementioned study of Ioka et al. [41] showed that older age is associated with much lower survival: relative 5-year survival for cervical cancer was 88.6% at <30 years, 78.1% at 30–54 years, 67.7% at 55–64 years, and 54.4% at 65+ years. In the Harima et al. trial, a 65-year-old sample had much higher survival than the 15-years-younger sample in the DDHG trial (see Table 7), which is once again amazing. We did not find any similar trial with respect to its unique features.

Finally, there is a concern about the reliability of the reported results and of the study statistics as a whole because of multiple contradictions and inconsistencies. First, it is stated in paragraph 3.1 that “in the TRT group, 13 patients (65%) are alive and well,” but in paragraph 3.2, only 58.2% 3-year OS is reported. This calls the reliability of the OS data into question. Second, it is reported in paragraph 3.1 that 7 patients in the TRT group showed progression of the disease, namely, 2 had local relapses, 3 had distant metastases, and 2 had both local relapse and distant metastases, but it is then stated that only 6 patients died from recurrent disease whereas 1 patient died from cerebral hemorrhage with no cancer recurrence. This contradiction calls the reported DFS rate into question. Finally, the calculation of all the parameters in paragraph 3.2 is doubtful. Because each group consisted of 20 patients and no patient was censored for any reason, every percentage should be an exact multiple of 5, but there is not a single such number in paragraph 3.2 (OS = 58.2%, DFS = 63.6%, LDFS = 79.7%, and so on). The low clinical value of the LDFS rate was discussed earlier. Figure 9 is demonstrative in this respect.
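The last argument can be illustrated with a minimal sketch. It assumes, as stated above, that each rate is a simple proportion of 20 patients with no censoring; actuarial estimates computed with censoring would not be bound by this rule. The function names and the rounding tolerance are illustrative assumptions.

```python
def achievable_percentages(n):
    """All percentages that a simple proportion k/n can take for n patients."""
    return [100.0 * k / n for k in range(n + 1)]


def is_achievable(reported, n, tol=0.05):
    """True if `reported` (in %) matches some k/n within rounding tolerance.

    Assumption: the rate is a plain proportion of n uncensored patients.
    """
    return any(abs(reported - p) <= tol for p in achievable_percentages(n))


# Values quoted in the text from paragraph 3.2 of the Harima et al. report.
reported = {"OS": 58.2, "DFS": 63.6, "LDFS": 79.7}
for name, value in reported.items():
    print(name, value, is_achievable(value, n=20))
```

With 20 patients per group, the only attainable values are 0%, 5%, 10%, and so on up to 100%, so none of the three quoted figures passes the check, whereas the 65% quoted from paragraph 3.1 (13 of 20 patients) does.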

The trial seems to be specially designed to show the effect of TRT, as was shown earlier for the Overgaard et al. [11] trial: much older patients (+10–15 years) and a low-dose TD to the tumor mass (60.6 Gy) as a comparator with exact RT-HT coupling, and high-dose RT (21.6 Gy) to the parametria. Older age and the low-dose RT comparator could explain the statistical significance of the differences. The large dose to the parametria, on the one hand, masks the inadequacy of the RT comparator, because a total dose of 82.2 Gy looks adequate, and, on the other hand, markedly improves overall survival (it is improved in both the RT and TRT arms compared to the van der Zee trial), which is significant because older age favors better local control but does not contribute to better survival [41].

So, this is a unique, small, unreproduced monocenter trial, carried out on a highly specific preselected sample of older patients (10 years older than expected) with smaller tumors, using an inadequate comparator (a low RT dose to the tumor mass, only 60 Gy), but with an excellent result that is better than that obtained on a 15-years-younger sample (van der Zee et al.) and is statistically significant despite the extremely small sample (20 + 20). This result coincided point-to-point with the study hypothesis, though the calculations are doubtful. This is an alarming result.

Summarizing, the trial with so many amazing features should be made on much larger sample and preferably should be reproduced in independent trials for evidence. Until confirmation, the significance of Harima results should be considered dubious.

It is the reproduction of the effect that is the main problem with the evidence of the Harima et al. trial, because the attempt to reproduce its result was disappointing. In 2005, the clinical trial on cervical cancer of Vasanthan et al. [15] was published. This was a prospective, randomized, controlled, multicenter phase III trial sponsored by the International Atomic Energy Agency (IAEA). 110 patients with FIGO stage IIb-IVa cervical cancer were enrolled in 1998–2002 in 5 centers in 4 countries. The OS at 3 years was 73.2%, and the local control rate was 68.5%. There were no significant differences between the patients treated with RT and TRT, either with regard to OS ( ) or to local control rate ( ). At the same time, OS was significantly worse in patients with stage IIb disease in the TRT arm ( ), although there was no difference in their rate of local control ( ). Acute grade 2-3 toxicity was seen in 18% of patients in the TRT arm and in 4% in the RT arm ( ). The authors concluded that “this study failed to show any benefit from the addition of hyperthermia to radiotherapy in the treatment of locally advanced carcinoma of the uterine cervix.” It is important to note that the Vasanthan et al. study had a design intermediate between the DDHG and Harima trials: HT was performed in patients with stage IIb-IVa disease with an average age of 50 years, on average 5 times (as in the DDHG trial), with 1 HT session per week just after RT, using Thermotron RF8 units (as in the Harima trial).

It is interesting to analyze the results of cervical cancer hyperthermia studies because there are enough such trials to make the analysis possible. It is also interesting because cervical cancer really does look thermosensitive. It was success in cervical cancer treatment that started the interest in hyperthermia in oncology (Westermark, 1898 [1], Gottschalk, 1899 [43]). Thus, it is not surprising that at the end of the 20th century the center of oncologic hyperthermia application returned to cervical cancer.

We have found six randomized trials on TRT of cervical cancer (see Table 8). Among them, two early Indian trials of Datta et al. and Sharma et al. were not assessed because they used intravaginal convectional heating, which is a clinically insignificant method; they were also rather small. Both trials reported better local control without effect on survival. The trial of Chen et al. is published in Chinese, which hampers its assessment. But its result is negative in terms of TRT: the authors reported that, of the 4 subgroups in this trial, only the combination of RT, ChT, and HT showed significant improvement, whereas differences between the other 3 groups (RT only, TRT, and ChRT) were not significant. Because no translation was available, we have not included the Chen et al. trial in the final record (see Table 15).

Design and results of three remaining trials have already been analyzed above and are summarized in Table 9.

The Vasanthan et al. trial, despite the negative results for the TRT arm, had an excellent overall result: CLR rate 80%, 3 y LDFS 69%, and 3 y OS 73%. It could have been even better but for the catastrophic results in the Guangzhou subgroup (see Table 13), the reason for which is unclear. As seen from Figure 10, Vasanthan LDFS was intermediate between Harima and van der Zee, but OS was much better. It is very telling that OS in the TRT arm in all three trials was fairly close but OS in the RT-only arms was very different (79% versus 48% and 27%, resp.), which once again brings us back to the problem of the incorrect comparator.

There were two principal differences between the Vasanthan trial on the one hand and the Harima and van der Zee trials on the other: RT dose and tumor volume. In the Vasanthan trial, the dose to the tumor mass was about 72 Gy (with TD = 84 Gy), that is, 20% more than in both the Harima and van der Zee trials (TD 60 Gy). The pattern of these three trials is typical enough: TRT gives a significant effect versus low-dose RT, but is not effective versus standard high-dose RT.

The second principal point is tumor volume. As seen from Table 9, tumor volume in the Vasanthan et al. trial (50–60 cm³) was half the estimated tumor volume in the Harima et al. trial (107–118 cm³) and one third of the estimated tumor volume in the van der Zee et al. trial (179–183 cm³). This is entirely natural because 50% of patients in the Vasanthan trial had stage IIb disease, whereas the Harima trial included stage IIIb patients only, and the patients in the van der Zee trial could also be considered stage IIIb because IIb and IVa patients counterbalanced each other. As anticipated, smaller tumor size led to better local control in the Vasanthan et al. trial compared to the van der Zee et al. trial (see Figure 10) (local control in the Harima trial seems to be even better, but the above-mentioned specifics of the trial design could easily explain it). Local control rates for stage IIb patients were also better than for stage IIIb patients (see Figure 11). But, unexpectedly, the overall survival rate in stage IIb patients was much and significantly worse compared to both the IIIb subgroup and the RT control ( ) (see Figure 11). Therefore, it seems that smaller size is associated with better local control but much worse survival rates. Vasanthan et al. did not analyze the reasons for the increased mortality in stage IIb patients, saying only that “further analysis is necessary to determine if the difference in survival is due to a greater incidence of distant metastases or some other cause” [15]. A significantly higher incidence of distant metastases after TRT (17.3% (4/23) versus 4.3% (1/23) in the RT group) had already been reported earlier by Sharma et al. [35], and it is also known that that trial included both stage II and III patients. It could therefore be hypothesized that in smaller tumors with relatively higher perfusion, a hyperthermia-induced increase in blood flow could enhance tumor dissemination. On the other hand, neither DDHG [26] nor Harima et al.
[24] reports higher metastases rates in TRT group, but they enrolled predominantly advanced stages of the disease (IIIB–IVA).

In 2007, the prospective, randomized, controlled, multicenter phase III trial of Mitsumori et al. [22], carried out on 80 patients with non-small cell lung cancer (NSCLC), was published. In fact, the Vasanthan and Mitsumori trials were two arms of one IAEA-sponsored trial. The result was the same: the difference in CLR and OS rates between the TRT and RT arms was statistically insignificant ( and , resp.), though LPFS was significantly better in the TRT arm ( ). The authors concluded that “although improvement of LPFS was observed in the RT + HT arm, this study failed to show any substantial benefit from the addition of HT to RT in the treatment of locally advanced NSCLC.”

The most recent and the most fundamental randomized trial on deep hyperthermia was published by Issels et al. [23] in 2010. This prospective, randomized, controlled, multicenter phase III trial was sponsored by the European Society for Hyperthermic Oncology (ESHO), the European Organization for Research and Treatment of Cancer (EORTC), the US National Institutes of Health (NIH), the German Cancer Society, the Helmholtz Association, and private sponsors. 341 patients with localized high-risk soft tissue sarcomas (STS) (≥5 cm, FNCLCC grade 2 or 3, deep to the fascia) were enrolled at 9 centers in Europe and North America over 9.5 years (1997–2006). The trial was designed to study HT efficacy in complex treatment of STS by the most effective protocol: neoadjuvant chemotherapy (NAChT) → definitive surgery → adjuvant RT → adjuvant chemotherapy (AChT). Chemotherapy (ChT) was applied by the EIA protocol (etoposide 125 mg/m² and ifosfamide 1500 mg/m² × 4 days + doxorubicin 50 mg/m² on Day 1) in 8 cycles: 4 before surgery and 4 after RT. 169 patients were randomly assigned to receive thermochemotherapy (TChT) instead of ChT. Regional HT (42°C × 60′) by means of BSD-2000 hyperthermia units was applied on the 1st and 4th day of each ChT cycle. The following results were reported: there was no effect on overall survival (median survival was 79 months in the TChT arm versus 74 months in the ChT arm, ), but the short-term local response rate (CLR + PLR) was twice as high in the TChT arm (34% versus 16%, ), and local progression-free survival (LPFS) was significantly enhanced in the TChT arm (32 months versus 18 months ( ): 76% versus 61% after 2 years ( ) and 66% versus 55% after 4 years ( )).

Unfortunately, careful analysis of the trial gives disappointing result. There is a systematic bias in favor of TChT arm.

Five possible points of distortion were identified: tumor size, grade of disease, surgery, RT, and ChT. All the points were distorted to various extents but unidirectionally in favor of the TChT arm, which forms an obvious systematic bias. We have attempted to estimate the possible distortion that could be caused by this systematic bias (see Table 10). The method of estimation is as follows. “Δ%” is the relative increment of each parameter, calculated as the difference between the percentages of the parameter for the TChT and ChT arms (or between the values of the parameter, if there is no percentage) divided by the smaller of the two percentages (values). The impact of a parameter is considered “direct” if its increase adds to the effect of treatment; otherwise the parameter has a “reverse” impact. The “Weight” of a parameter is calculated as the number of patients involved in the parameter assessment in both arms divided by the total number of patients in the sample (341) and represents the impact of this parameter on the overall sample. The final distortion (“Dist%”) is calculated as the product of “Δ%” and “Weight,” thus representing the parameter increment adjusted for its weight. Distortion is considered positive if it favors the TChT arm. Obviously, each parameter has a different strength of impact on the treatment effect, but we did not make any such adjustment because of its subjectivity. Also, each parameter was estimated by its minimum value. For instance, the impact of tumor size, not tumor volume, was assessed, though this 2.7% difference in tumor size means an 8.4% difference in tumor volume.
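The estimation method described above can be sketched as follows. This is only an illustration of the definitions of “Δ%,” “Weight,” and “Dist%”: the class and function names are assumptions, and the example values are hypothetical, not data from Table 10.

```python
from dataclasses import dataclass

TOTAL_PATIENTS = 341  # total sample size stated in the text


@dataclass
class Parameter:
    name: str
    tcht_value: float  # percentage (or raw value) in the TChT arm
    cht_value: float   # percentage (or raw value) in the ChT arm
    tcht_n: int        # patients assessed for this parameter, TChT arm
    cht_n: int         # patients assessed for this parameter, ChT arm
    direct: bool       # True if a larger value adds to the treatment effect


def delta_pct(p):
    """Relative increment: arm difference divided by the smaller value, in %."""
    smaller = min(p.tcht_value, p.cht_value)
    if smaller <= 0:
        raise ValueError("relative increment is undefined for a non-positive value")
    return 100.0 * abs(p.tcht_value - p.cht_value) / smaller


def weight(p):
    """Share of the whole sample involved in the assessment of the parameter."""
    return (p.tcht_n + p.cht_n) / TOTAL_PATIENTS


def distortion_pct(p):
    """Signed Dist%: positive when the imbalance favors the TChT arm."""
    tcht_larger = p.tcht_value > p.cht_value
    favors_tcht = tcht_larger if p.direct else not tcht_larger
    sign = 1.0 if favors_tcht else -1.0
    return sign * delta_pct(p) * weight(p)


# Hypothetical parameter, for illustration only (not taken from Table 10).
example = Parameter("hypothetical", 50.0, 40.0, 100, 80, direct=True)
print(f"{delta_pct(example):.1f} {weight(example):.3f} {distortion_pct(example):.1f}")
```

For the hypothetical parameter, a 50% versus 40% imbalance assessed in 180 of 341 patients gives a relative increment of 25% and a weighted distortion of about 13%. No adjustment for the clinical strength of each parameter is made, in keeping with the method as described.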

Thus, every parameter of the estimation favors the TChT arm: tumor size (+2.7%), grade of STS (+5.2%), RT (+2.9%), surgery (+23.9%), and ChT (+59%). In surgery, every subparameter was also distorted in favor of the TChT arm: overall number of patients who underwent surgery, including previous surgery (+2.2%), number of definitive surgeries in this trial (+5.9%), number of patients with measurable disease left without surgery (+7.8%), R0 surgery and amputation (+6.4%), and R1 (+0.2%) and R2 (+3.6%) surgeries. It can be assumed that the higher percentage of R0 surgery in the TChT group was caused by the success of neoadjuvant (induction) treatment, but this success could also be attributed to the impact of systematic bias rather than to the effect of HT (see Table 11), because the total weight of induction treatment distortion was higher than the obtained effect (18.5% versus 8.5%). In turn, the impact of surgery is only a smaller part of the total distortion, which exceeds 90% and greatly outweighs the obtained relative increment of LPFS of 20%–25%.

It’s absolutely obvious that with such significant systematic bias, the effect of the trial cannot be attributed to HT, and it’s impossible to exclude that without HT the result in this arm would be even better because HT treatment was associated with high toxicity.

Analysis of toxicity (Table 12) shows that toxicity in the TChT group was increased significantly: the general toxicity rate was about 3.5 times higher (78.5% versus 22.5%) and the severe toxicity rate was 20 times higher (24% versus 1.2%) than in the ChT arm. It is especially important to note that this rise in toxicity was only minimally due to potentiation of ChT toxicity (factor 1.2–1.5). The major part of the toxicity was the intrinsic toxicity of hyperthermia: thermometry complications, burns, tissue necrosis, pain, pressure of the bolus, and others. In this regard, the authors’ conclusion looks unfounded: “Our results indicate that regional hyperthermia combined with the three-drug-regimen EIA can be given safely with moderate toxicity”. The impact of this “moderate toxicity” on the course of the trial can be traced. During induction treatment, full HT treatment (7-8 sessions) was performed in 76% of patients, 20% of patients received 1–6 sessions, and 4% of patients did not receive any session. During adjuvant HT, full HT treatment was performed in 36% of patients, 17% of patients received 1–6 sessions, and 38% of patients did not receive any HT session. The authors declared toxicity to be the only reason for nonreceipt of HT treatment. Therefore, this “moderate” toxicity was HT-limiting in 24% of previously untreated patients (induction phase) and in 55% of patients impaired by prior treatment (adjuvant phase) (factor 2.3). Critical toxicity that forces cancellation of HT treatment rose 9.5-fold (from 4% to 38%) in impaired patients. This level of toxicity is unacceptable for clinical practice.
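The ratios quoted in this paragraph can be reproduced with a short sketch. It uses only the percentages given above; treating every incomplete or omitted HT course as toxicity-limited follows the authors' statement as reported here, and the variable names are illustrative.

```python
# Share of patients (%) by amount of HT actually received, as quoted in the text.
induction = {"full": 76, "partial": 20, "none": 4}
adjuvant = {"full": 36, "partial": 17, "none": 38}  # quoted figures sum to 91%


def limited(phase):
    """Share of patients whose HT course was incomplete or not given at all."""
    return phase["partial"] + phase["none"]


limited_ratio = limited(adjuvant) / limited(induction)   # 55% versus 24%
cancel_ratio = adjuvant["none"] / induction["none"]      # 38% versus 4%
overall_tox_ratio = 78.5 / 22.5                          # TChT versus ChT, any toxicity
severe_tox_ratio = 24 / 1.2                              # TChT versus ChT, severe toxicity

print(f"HT-limiting toxicity: {limited(induction)}% -> {limited(adjuvant)}% "
      f"(factor {limited_ratio:.1f})")
print(f"HT cancelled entirely: factor {cancel_ratio:.1f}")
print(f"Overall toxicity ratio: {overall_tox_ratio:.1f}")
print(f"Severe toxicity ratio: {severe_tox_ratio:.0f}")
```

The sketch gives factors of about 2.3 and 9.5 for the HT-limiting and HT-cancelling toxicity, about 3.5 for overall toxicity, and 20 for severe toxicity.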

Finally, we compared the clinical results of the trial with the data of the Sarcoma Meta-analysis Collaboration (SMAC) [38]. SMAC data are derived from 14 randomized trials conducted in 1973–1990 on 1568 patients with high-grade sarcomas of the extremities and trunk. All the patients had definitive surgery followed by adjuvant RT (47%) and adjuvant doxorubicin-based ChT (100%). Compared to the Issels et al. trial, this sample had 14% more STS of the extremities (58% versus 44%), 10% more surgeries (100% versus 90%), 16% less RT (47% versus 63%), and no neoadjuvant ChT (see Table 13). The overall impact of all the distortions could be considered nearly equal.

Figure 12 demonstrates that the clinical results of the Issels et al. trial are uniformly worse than the SMAC results. The most impressive fact is that even the best results in the TChT arm are worse than the SMAC results in the control arm, even though that arm did not have ChT at all. Therefore, the clinical value of the Issels et al. result is minor. Thus, it can be concluded that, after correction for systematic bias, the long-term effect of the Issels et al. trial is dubious and the trial is clinically insignificant. The toxicity level of the treatment is unacceptable for clinical practice. But in the authors’ opinion, “regional hyperthermia combined with preoperative or postoperative chemotherapy should be considered as an additional standard treatment option for the multidisciplinary treatment of locally advanced high-grade STS” [44]. This is an extremely doubtful conclusion.

It should be emphasized that the systematic bias of the Issels et al. trial was not intended and was not incorporated in the initial design of the study. In fact, the trial has a brilliant design and has been reported excellently. The problem of the study seems rather to be one common to all prospective trials, in which investigators pay excessive attention to the study group and much less attention to the control group. As a result, the volume of treatment in the control group can decrease so much that the groups become incomparable. Given the hard and complex protocol of the trial, its multicenter design, large sample size, and long duration, this defect was virtually inevitable. Probably, the “most effective” treatment protocol as designed proved too hard to fulfill, and the control group was unintentionally sacrificed. In any case, this does not excuse the investigators, who simply failed to notice this major systematic bias when reporting the results (a defect of interpretation).

In summary, hyperthermia of deep-seated tumors could be effective only versus an inadequate comparator. In a correctly designed trial, hyperthermia is not effective at all or not effective enough to justify its obvious disadvantages: toxicity and labor intensity. The clinical efficacy of hyperthermia of deep-seated tumors is still not proven in randomized trials.

4. Whole-Body Hyperthermia

The fact that there is only one phase III randomized trial on WBH is very telling in itself, because WBH has a much longer history of application in oncology than local hyperthermia. The results of the multiple phase II WBH trials usually do not justify initiation of a phase III trial. The Bakhshandeh et al. [39] trial is illustrative in this respect (see Table 14). In this phase II trial, 20% partial remission and 20% 2-year survival were shown after TChT (ICE + WBH) in 27 patients with stage I-II malignant pleural mesothelioma; extensive myelosuppression (75% of grade 3-4) with 3.7% mortality was reported. Meanwhile, it is known that the efficacy of the majority of chemotherapies is also 15%–20%, but in more severe samples, and that the gemcitabine+cisplatin combination demonstrated 48% partial remission with less toxicity (not more than 30%–40% of grade 3-4 toxicity) [45]. Thus, the Bakhshandeh et al. [39] phase II study showed more than dubious clinical efficacy with indisputably higher toxicity, which could hardly be considered a basis for further studies. Nevertheless, the authors considered these results “promising” and initiated a phase III trial. The preliminary results of this predictably negative randomized phase III study exceeded expectations and were sharply negative [40]. WBH did not improve the results of chemotherapy but significantly worsened them in all respects: half the PR rate (15% versus 31% in the ChT-only arm) and a significant decrease in OS (11.5 months versus 15 months) and DFS (5.6 months versus 9.2 months) were reported. It should be mentioned that this phase III trial was carried out on a less severe sample than the previous phase II trial (WHO 0–II instead of I–III in the phase II trial) and with a 10% lower ChT dose. This made it possible to reduce myelotoxicity significantly (36% versus 74% in phase II) and to avoid deaths, but it also led to a reversal of the clinical result: the previously dubious results became clearly negative.
The authors concluded that “this preliminary data from a randomized study show little, if any, beneficial effect mediated through hyperthermia” and that “conclusive judgment has to be postponed until completion of this trial”, though in fact they simply had to stop the trial. Nor did the results prevent the authors from publishing in 2005 a review of the current state of WBH, which reports the intention of the Interdisciplinary Working Group on Hyperthermia to build clinical guidelines on the basis of “promising results of phase II trials” as well as on the basis of this phase III trial [46].
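The size of the deterioration in the phase III trial can be expressed as a relative change against the ChT-only arm. The sketch below is illustrative only: the dictionary `outcomes` and the helper `relative_change` are hypothetical names, and the input values are those quoted in the text.

```python
# Outcomes quoted in the text as (ChT + WBH arm, ChT-only arm).
outcomes = {
    "PR rate, %": (15.0, 31.0),
    "OS, months": (11.5, 15.0),
    "DFS, months": (5.6, 9.2),
}


def relative_change(tcht, cht):
    """Relative change (percent) of the TChT arm against the ChT-only arm."""
    return (tcht - cht) / cht * 100.0


changes = {name: relative_change(*pair) for name, pair in outcomes.items()}

for name, change in changes.items():
    print(f"{name}: {change:+.1f}% relative to ChT only")
```

With these inputs, all three outcomes change in the negative direction: roughly a halving of the PR rate, about a quarter less OS, and about 40% less DFS.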

The general impression is that the combination of ChT with extreme WBH can, in some cases (20–40%), overcome chemoresistance and provide a partial remission, but without any effect on overall survival. Also, clinical efficacy seems to be tightly coupled to toxicity: a clinical benefit is associated with high toxicity, whereas toxicity reduction leads to inefficacy or worsens the effect of ChT. Since the results obtained in TChT studies never exceed the best results without WBH, the expediency of WBH as such is in question, because a similar or better effect can be obtained by applying high-dose ChT or polychemotherapy at a lower level of toxicity.

The WBH guidelines published in 2000 by the Universities of Luebeck and Wisconsin were more than cautious in terms of the efficacy and safety of WBH. In particular, they state that the efficacy of WBH is only presumed and is based on very limited clinical data, and that WBH alone does not make sense because it provides only a minimal increase in overall survival (days, at most weeks) and only in thermosensitive tumors [47]. These guidelines are intended for research only.

The paper of H. I. Robins, the former Head of the WBH program at the University of Wisconsin, which immediately preceded these guidelines, was even more skeptical [48]. It is noteworthy that Robins, who was the Chairman of the International Systemic Hyperthermia Oncological Working Group and had published over 80 articles on WBH since 1983, completely stopped his activity in the hyperthermia field and has not published any paper on the topic since 2003. Given such a sudden and complete cessation of research activity on WBH, one can assume that the true result of this 20-year activity was not encouraging.

5. Biases of Hyperthermia Trials

The most common biases of hyperthermia randomized clinical trials are summarized in Table 16. An inadequate comparator is the most frequent and significant bias in RT-based HT trials [11, 21, 24]. Standard RT has a specific efficacy that falls significantly and disproportionately as the total dose is lowered. If HT is added to such low-dose RT, it produces some gain in local effect, but compared with the effect of standard high-dose RT this HT-added effect is at best no better [5, 7, 8, 20] and often worse [9, 22], sometimes significantly [15]. At the same time, the toxicity of TRT is usually 3–5 times higher than that of RT only. The labor intensity of TRT is much higher than that of RT only, no matter how van der Zee et al. [21] tried to convince us of the opposite. The main problem is that TRT is not effective versus standard high-dose RT because RT itself is a much more potent factor than HT, and the HT effect disappears at high-dose RT. The inadequacy of the comparator in the Issels et al. trial [23] is of another nature and is caused by the smaller volume of treatment in the control arm, as discussed above.

Obvious defects of randomization were revealed in the Jones et al. [12] and Vernon et al. [10] trials. In the Jones trial there was also an open preselection of patients, which is considered a bias because the conclusion of the trial refers to all patients and is not limited to “heatable” patients only. Another, hidden type of preselection, of aged patients, was revealed in the Harima et al. trial [24], where the not pretreated patients in the study group were 10 years older than the expected age at first diagnosis in Japan. Three trials had an incorrect design. The Overgaard et al. trial was in fact a clinical radiobiological trial without clinical significance. The Vernon et al. and van der Zee et al. trials combined several different subtrials with incompatible protocols, different equipment, and so forth.

Also, data in the majority of the trials are presented incompletely, and virtually all the trials suffer from inadequate analysis. This applies not only to positive trials. For example, the negative Vasanthan trial is reported and analyzed very poorly. The authors simply declined to analyze the possible reasons for the significantly increased mortality in the stage IIb group, though this is of great interest. The analysis of the reasons for the negative trials of the 1990s was also incomplete and incorrect, as will be discussed below.

The problem of sponsorship influence deserves special attention. As is known from the literature, clinical trials sponsored by industry are at least 5 times more likely to be successful (positive) than independent trials. As is obvious from Table 15, independently sponsored HT clinical trials always reported no significant effect. By contrast, the trials sponsored by hyperthermia societies were successful in the majority of cases, with only two exceptions. The bladder and rectal cancer groups in the van der Zee trial, with negative and dubious results respectively, were simply obscured by understatement and by describing the entire trial as successful. The extremely negative intermediate result of the Bakhshandeh et al. trial was reported only once, at an ASCO meeting, and was then obscured by understatement. The final result of the trial is absent.

There is also a serious interpretational bias. Namely, the hyperthermia community tends to regard the negative trials of the early 1990s as not significant because of insufficient heating and imperfect technique. This is absolutely incorrect. All the modern hyperthermia technologies were introduced before 1990: microwave superficial heating (433 MHz, 915 MHz, 2.4 GHz, etc.) and capacitive 13.56 MHz heating (LeVeen) have been in use since the late 1970s, the APAS technology of BSD has been in use since 1982, and the 8 MHz capacitive technology of Thermotron has been commercially available since 1985. The Erasmus University Hyperthermia Center has used 433 MHz technology from 1985 to date [13]. All the randomized trials of the early 1990s were executed in leading US universities with the best available equipment. Therefore, the heating technique in these trials was adequate by modern standards. This is confirmed by the high temperatures reached in these trials. For instance, in the Kapp et al. trial [7] the minimum temperature in superficial tumors was 40.2°C, the average 42.5°C, and the maximum 44.8°C. The modern guidelines of Erasmus University [13] for superficial tumors recommend reaching a minimum temperature of 40°C and a maximum of 43–44°C. It should therefore be accepted that, in terms of heating and technique, the negative trials of the early 1990s were absolutely adequate.

Finally, the publication bias is significant. Seven “positive” trials are well reported, frequently cited by the hyperthermia community, and included in all meta-analyses and reviews. Some of them have been published several times [21, 25, 26]. By contrast, the negative trials are rarely cited and often not mentioned in meta-analyses and reviews. This creates a false impression of hyperthermia success.

6. Hyperthermia Problems

Despite more than 100 years of development, hyperthermia still does not have an acceptable explanation. The current hyperthermia concept is based solely on temperature, but clinical results often directly contradict this concept (see Table 17). In particular, the significantly stronger radiotherapy modification effect for smaller tumors [5, 11] (less than 3-4 cm) is unexplainable within the thermal concept of hyperthermia. Perez et al. [5] explained that “they are easier to heat,” and this explanation was commonly accepted, but as early as 1963 Crile Jr. [49] had convincingly demonstrated that, on the contrary, bigger tumors are heated much more easily than smaller ones. This difference is very simple to understand because the main predictor of heating is tumor blood flow, which is high enough in small tumors and significantly reduced in big tumors, which act as a “heat trap.” Also, a small tumor is cooled effectively enough by the high blood flow of the surrounding healthy tissues. The Hiraoka et al. trials confirmed that bigger tumors are heated better than smaller ones [50] and, at the same time, that smaller tumors are cured better with HT [51]. Thus, this phenomenon clearly shows the inconsistency of the thermal concept of HT: the better heated tumors show the worse clinical effect. Instead of initiating a discussion on the validity of the thermal concept of radiomodification, all the authors [5, 8, 9, 11, 20] drew the simplest and presumably wrong conclusion about the better heating of smaller tumors. This wrong conclusion led to the logical consequence that insufficient heating is the reason for the failure of the trials and that improvement of heating technology could correct the situation.

The results of 3 randomized clinical trials published before 1996 (Kapp et al. [7], Emami et al. [8], and Engin et al. [9]) ruled out the only possible thermal explanation for the failure of the Perez et al. [5] trial, namely, the hypothesis that 2 HT sessions are not enough to demonstrate an HT effect. These trials clearly showed that longer protocols with 6 and 8 HT sessions are not more effective and could even worsen the effect [9]. Although Engin et al. [9] found that some temperature parameters (namely, the median minimum tumor temperature and the minimum tumor temperature during the first heat treatment), together with tumor volume, were prognostic factors predictive of the duration of response, the role of temperature is not confirmed in the majority of the trials. As de Bruijne et al. [18] have clearly shown, after correction for tumor volume, there is no correlation of any temperature parameter with any clinical outcome. Kapp et al. [7] also did not find such a dependence: only tumor histology, radiation dose, and tumor volume correlated with the duration of local control. The complete response rate seemed not to be correlated with temperature parameters at all [7, 9].

These results seriously undermined the concept of thermal dose proposed by Oleson and developed by Sapareto and Dewey [52] in the mid-1980s. The explanation offered for the failure of the long protocols was extremely weak: thermotolerance was named as the reason. This seems to be incorrect because the pattern of thermotolerance has been well known since the early 1960s [49]: it largely returns to the initial level within 72 hours. Therefore, HT sessions given twice a week, as in all the trials, should not be affected by thermotolerance. Some subsequent hyperthermia trials of the 2000s [12, 23] also used twice-weekly protocols.

Thus, five negative clinical trials of 1990–1996 (see Table 15) were interpreted incorrectly with regard to the reason for their failure: instead of a revision of the hyperthermia rationale, the concept of “insufficient heating” was offered. It would be incorrect to say that the results of these randomized trials were surprising: as is clear from Hornback's paper quoted above [3], clinical oncologists had made their unambiguous decision about hyperthermia on the basis of previous clinical results as early as the mid-1980s. Together with the failure of another RTOG deep hyperthermia trial, the results of these trials led to the disappointment of the medical community in oncological hyperthermia.

Temperature analysis of the cervical cancer studies also gives contradictory results. The average temperature in the Vasanthan et al. [15] trial was the highest among the three main cervical cancer trials (41.6°C versus 40.6°C in the Harima et al. [24] trial and an estimated <40°C in the van der Zee et al. [21] trial), yet the effect of TRT in the Vasanthan trial was worse than that of RT only, whereas in the other two trials, with lower temperatures, the effect of TRT was significantly better than in the RT control. Also, within the Vasanthan et al. [15] study, an extremely low average temperature was used in the Pusan subgroup (38.1°C), but 2-year local control in this subgroup was the same as in the Chennai subgroup and better than in the Kiev subgroup, where much higher average temperatures were used (41.8°C and 42.0°C, respectively).

As stated above, there is no temperature analysis in the van der Zee et al. [21, 25, 26] trial, but there is the doctoral thesis of Fatehi [37] from Rotterdam University, who was a coauthor of the later DDHG study [53]. His patients were collected in 2000–2002, that is, 4 years after completion of the van der Zee trial. This thesis addresses the technical quality of deep hyperthermia using the BSD-2000 unit in rectal, bladder, and cervical cancers (see Figure 14). It is known from the van der Zee paper that Rotterdam University Hospital, with its BSD-2000 unit, was responsible for the larger part of the patients enrolled in the DDHG study. Therefore, the technical results of Fatehi can be considered relevant. It is easy to see that the temperature in the cervix is much lower than in the rectum and bladder (see Figure 14), but it was cervical cancer that was effectively treated with TRT, whereas TRT of rectal and bladder cancers was not effective [21]. Finally, in 2011 de Bruijne et al. [18] convincingly demonstrated in a retrospective study that, after correction for tumor size, the CEM 43°C thermal dose was not associated with any clinical endpoint (CLR, LDFS, OS).

Thus, even the central point of the hyperthermia concept, the temperature, is beset by contradictions. This means that hyperthermia in fact does not have a theoretical basis. Clinical results show that hyperthermia is at a dead end. Programmatic papers on hyperthermia show that opinion leaders do not understand what to do and where to move, once again proposing only the old thermal solutions [54, 55], which should have been regarded as discredited since the mid-1990s.

Nevertheless, multiple publications of positive trials, reviews, and meta-analyses create an impression of a hyperthermia renaissance. The most striking papers present the history of hyperthermia as a history of uniform success, do not mention negative results at all, and declare heating to be the only, and an exclusively technical, problem of hyperthermia [56]. Such an approach looks somewhat biased.

7. Conclusion

Careful analysis of 14 randomized clinical trials does not confirm a clinical benefit of hyperthermia independently of its type: superficial, deep, or whole-body. We have not found any positive trial not affected by biases. After correction for the distortions, there is no trial with an obvious long-term positive effect of hyperthermia. An effect of hyperthermia could be shown in an experimental setting, in experimentally designed clinical trials, or versus an inadequate comparator. In a clinical setting and with a correct study design, hyperthermia is not effective at all or not effective enough to justify its obvious disadvantages: toxicity and labor intensity. The thermal concept of hyperthermia seems to be irrelevant. Nevertheless, multiple publications of positive trials, reviews, and meta-analyses create an impression of a hyperthermia renaissance.

Conflict of Interests

The author is the General Consultant of OncoTherm Group in Russia and CIS countries and Director of National Medical Corporation Inc. which sells oncothermia devices and, therefore, has a direct financial relation with OncoTherm Group.