How to Measure “Short-Term Hormonal Effects”?
Background. Interest in assessing the short-term benefits or risks of sex-steroid hormone use (oral contraceptives, OC, or hormone replacement therapy, HRT) has existed for years. However, no validated scale is available to evaluate the broad array of reported effects of short-term hormone use. Methods. A raw scale consisting of 43 specific items and 47 general information items was developed. Surveys were performed in Italy, Germany, and Austria, and the data were analyzed by factor analysis. The resulting new scale with 15 items underwent reliability and validity investigations. Results. The new scale consists of 15 items in 5 domains. Internal consistency reliability coefficients were satisfactory, as were test-retest reliability coefficients. Content and concurrent validity were promising. Conclusion. The psychometric properties of the new scale suggest that it is well suited to measure short-term effects of sex-steroid hormones in women. The scale seems to be appropriate, feasible, interpretable, reliable, and valid for application as a patient-reported outcome (PRO) scale.
For years, there has been interest in evaluating the short-term benefits or risks of sex-steroid hormone use in women, and many studies using nonvalidated scales have been published. No validated scale was available that met the methodological guidelines for patient-reported outcome (PRO) scales, that is, the “state-of-the-art” psychometric requirements relevant for health-related quality-of-life (HRQoL) scales.
A variety of shorter or longer simple symptom lists were developed and applied in clinical studies; these merely compared, subjectively, the situation before and after hormone treatment. Such lists covered a broad array of clinically discussed short-term effects of hormone use, such as effects on skin and hair, on the breast, on the menstrual cycle and bleeding pattern, on aspects of sexual life, on psychological problems and symptoms, and on vegetative complaints, which are discussed later.
The aim of this paper is to clarify whether a validated scale for measuring the short-term effects of sex hormones can be developed and, if so, what its measurement properties, such as reliability and validity, are.
2. Material and Methods
Pertinent literature was scrutinized for suggestions about potential self-perceived short-term benefits or adverse effects associated with the use of sex-steroid hormones. Reported effects included sexual dysfunction, particularly during the menopausal transition; problems of menstrual bleeding in general; hair and skin problems; breast tension or pain; and increasing body weight. Positive effects of hormones on premenstrual syndrome complaints were also reported. Relief of menorrhagia, shortening of the duration of bleeding, and treatment of dysfunctional uterine bleeding, including improvement of iron deficiency, are commonly accepted benefits. The acne-controlling effect of combination drugs containing certain progestagens [4, 5] and the improvement of seborrhea [6, 7] have been documented. Hormonal effects on disturbances related to the bleeding cycle were more controversial [8–10].
In addition, standardized HRQoL and other scales formerly used in clinical studies of sex hormone use were reviewed to complete the array of items potentially related to hormone use. These included the Menopause Rating Scale, the Quality of Sexual Function (QSF) scale, the Female Sexual Function Index (FSFI), the Derogatis Interview for Sexual Functioning (DISF), and the Female Sexual Distress Scale (FSDS). Experience gathered in studies with the Aging Males' Symptoms (AMS) scale was also taken into account.
Our own observational (prevalence) surveys in women (patients) with and without hormone use identified many differences in complaints between treated and untreated women, and thereby candidate items for a new scale (unpublished report). A large observational study in 9 countries on 4 continents also contributed to the identification of possible problems or concerns of women (unpublished report).
The preparatory work resulted in a conceptual framework with five arbitrarily defined groups of items that could be relevant for a new validated scale (Table 1): psychological complaints, somato-vegetative complaints, menstrual disorders, sexual items, and complaints related to hormone-sensitive organs. The resulting raw scale consisted of 43 specific items (suspected short-term effects of sex-steroid hormones) and 47 general information items (medical and reproductive history, demographic data) needed for interpretation and item reduction.
The new scale was planned as a paper-based questionnaire. Response categories on a Likert scale from 1 (= no, never) to 5 (= yes, severe) describe the personally perceived severity (or intensity) of complaints (items). All specific items were phrased in a negative direction (complaints), following our own experience with the development of other HRQoL scales [11, 12, 16]. If an item is not relevant, for example, a question about menstrual bleeding problems for a woman without a menstrual cycle, “0” (no, not applicable) can be checked.
Thus, the raw scale consisted of an introduction, two examples of how to answer the questionnaire correctly, and the 90 items to be answered. The English version of the raw scale underwent linguistic and cultural adaptation into Italian and German.
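The response format (1 to 5, with 0 for “not applicable”) can be illustrated with a small scoring sketch. It is hypothetical: the official SHE evaluation procedure is published on the scale's website and is not reproduced here. The sketch assumes that 0 is treated as missing and that a domain score is the mean of the applicable responses; the function name `domain_score` is illustrative only.

```python
# Hypothetical scoring sketch, NOT the official SHE evaluation procedure.
# Assumption: 0 ("not applicable") is treated as missing, and a domain score
# is the mean of the applicable 1-5 Likert responses.
from typing import Optional, Sequence

NOT_APPLICABLE = 0
LIKERT_MIN, LIKERT_MAX = 1, 5


def domain_score(responses: Sequence[int]) -> Optional[float]:
    """Mean of the applicable responses of one domain, or None if none apply."""
    applicable = []
    for r in responses:
        if r == NOT_APPLICABLE:
            continue  # item not relevant for this respondent, e.g. no menstrual cycle
        if not LIKERT_MIN <= r <= LIKERT_MAX:
            raise ValueError(f"invalid response code: {r}")
        applicable.append(r)
    if not applicable:
        return None  # nothing to score for this domain
    return sum(applicable) / len(applicable)


# Three menstrual items answered by a woman without a cycle: nothing to score.
print(domain_score([0, 0, 0]))  # None
# Two applicable items and one not-applicable item.
print(domain_score([2, 4, 0]))  # 3.0
```

Treating 0 as missing rather than as a low score keeps women to whom an item does not apply from appearing symptom-free on that domain.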
The statistical analyses were based on factor analysis (principal component analysis), Cronbach’s alpha coefficient for internal consistency reliability, and test-retest reliability. The statistical package SPSS 10 was used.
3. Results and Discussion
The initial normative survey (Italy) involved 228 women aged 15–65 years. This was a sample of the general female population, that is, not restricted to women using hormones. This approach was chosen to obtain standard (norm) values for the female population.
In several steps of factor analysis, the number of items was reduced and the domains of the final 15-item scale were determined.
Five dimensions (domains) were found, broadly as predicted by the conceptual framework (Table 2): sexual problems (SEX), menstrual problems (MENS), hormonal effects (HORM), psychological problems (PSYCH), and abdominal complaints (ABDOM). Table 2 summarizes the findings; for ease of recognition and interpretation, only factor loadings above 0.5 are displayed.
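How loadings like those in Table 2 arise can be sketched with an unrotated principal component analysis of the item correlation matrix, followed by the 0.5 display threshold. This is an illustration under assumptions, not the authors' SPSS procedure: the rotation used (if any) is not reported here and none is applied, and the data below are simulated, not survey data.

```python
import numpy as np


def pca_loadings(data: np.ndarray, n_components: int) -> np.ndarray:
    """Unrotated principal-component loadings (items x components).

    data: respondents x items matrix of item scores. Loadings are the
    eigenvectors of the item correlation matrix scaled by the square root of
    their eigenvalues, i.e. item-component correlations. Eigenvector signs
    are arbitrary, so only absolute values should be interpreted.
    """
    corr = np.corrcoef(data, rowvar=False)              # items x items
    eigvals, eigvecs = np.linalg.eigh(corr)             # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]    # largest first
    return eigvecs[:, order] * np.sqrt(eigvals[order])


def mask_small_loadings(loadings: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Replace loadings with |value| <= threshold by NaN, as in a display table."""
    return np.where(np.abs(loadings) > threshold, loadings, np.nan)


# Hypothetical simulated data: 500 respondents; items 0-2 are driven by one
# latent factor and items 3-5 by a second, weaker one.
rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 2))
data = np.column_stack(
    [factors[:, 0] + 0.3 * rng.normal(size=500) for _ in range(3)]
    + [factors[:, 1] + 1.0 * rng.normal(size=500) for _ in range(3)]
)
shown = mask_small_loadings(pca_loadings(data, n_components=2))
print(np.round(shown, 2))
```

With well-separated factors, each item keeps exactly one displayed loading, which is the pattern that makes a domain structure easy to read off.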
In an independent population survey in Germany, the same five domains were observed, with factor loadings very similar, if not identical, to those of the Italian sample (data not shown).
This supports the notion that the new scale and its domains have good face or content validity, confirmed in two independent studies.
Internal consistency reliability, measured with Cronbach’s alpha, was good for the total scale and acceptable for four of the five domains: 0.73 for PSYCH, 0.73 for HORM, 0.78 for MENS, and 0.81 for SEX. It was unsatisfactory for ABDOM (0.53), which needs further research. These results were confirmed in the German validation study.
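Cronbach's alpha for a domain with k items is alpha = k/(k − 1) · (1 − Σ var(item_i) / var(sum score)); it rises when items covary strongly relative to their individual variances. A widely used rule of thumb treats values of about 0.7 or higher as acceptable, which matches the reading of the coefficients above. The following is a minimal sketch with made-up scores, not the SPSS routine used in the study.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for one scale or domain.

    items[i][j] is the score of respondent j on item i; all items must cover
    the same respondents. Sample variances are used throughout (the n - 1
    factor cancels in the ratio).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # sum score per respondent
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("sum scores do not vary; alpha is undefined")
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / total_var)


# Two made-up items answered by four respondents.
print(round(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]), 3))  # 0.75
```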
Item-domain correlations provided additional information: all items were strongly associated with their respective domains except the item “cyclic bleeding from guts or bladder,” whose correlation with the domain “abdominal complaints” was only 0.39.
Test-retest reliability, another aspect of reliability, was examined in a validation survey in Germany. Reliability was very good for the total score and satisfactory for four of the five domain scores: 0.83 for PSYCH, 0.85 for HORM, 0.93 for MENS, and 0.72 for SEX; it was unsatisfactory for ABDOM (0.62). The correlation coefficients were also good for almost all individual items of the scale. The test-retest study thus confirmed the findings of the Cronbach’s alpha analyses in the two studies.
A first step of validation is to show that the internal structure (dimensions) of a new scale is reproducible across independent factor analyses and compatible with the conceptual framework. This is indicative of good face or content validity.
Because the SHE scale was also designed as a health-related QoL scale with a specific focus on short-term hormonal effects, we were particularly interested in evidence that it actually measures quality of life. The SHE total score correlated significantly with the generic QoL scale SF-12 (total, physical, and mental health scores) in the German survey, and similarly in an Austrian sample. The correlation coefficients were significant but not high.
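Why a correlation can be “significant but not high” follows from sample size: the same modest r is far more significant in a large survey than in a small one. This sketch uses Fisher's z-transform as a normal approximation for testing rho = 0; the r value and both sample sizes are hypothetical, not the survey figures.

```python
from math import atanh, erfc, sqrt


def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: rho = 0 using Fisher's z-transform."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    z = atanh(r) * sqrt(n - 3)     # approximately N(0, 1) under H0
    return erfc(abs(z) / sqrt(2))  # two-sided standard-normal tail area


for n in (30, 500):
    print(n, correlation_p_value(0.25, n))
```

For r = 0.25, the p-value is roughly 0.18 with 30 respondents but on the order of 1e-8 with 500, so significance alone says little about the strength of the association.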
In the German survey, further significant but low correlations were observed between SHE domains and the anxiety domain of the Hospital Anxiety and Depression Scale (HADS), as well as the psychosomatic QoL domain of the QSF.
Like the total score, the psychological domain of the SHE scale was significantly associated with the mental health and total scores of the SF-12, with anxiety (HADS), and with psychosomatic QoL (QSF). The hormone-related domain correlated with the SF-12 as well as with the QSF. The abdominal domain correlated most strongly with the SF-12 total and physical health scores. The menstrual domain showed only a weak correlation with the SF-12. The sexual domain score correlated with anxiety (HADS) and with psychological QoL (QSF).
Altogether, the total and domain scores of the SHE scale appear to correlate with other scales intended to measure similar content.
3.4. Ability to Detect Changes
Because the SHE scale has not yet been applied in treatment-related observational studies or randomized clinical trials, no data are available to describe its responsiveness or minimal important difference (MID).
The next step will be to analyze the sensitivity of the SHE scale in detecting the effects of hormonal treatment. To this end, the SHE scale should be included in relevant observational treatment studies or randomized clinical trials that also record independent outcome variables for validation.
The complete design and wording of the SHE scale (in German, English, and Italian), the evaluation procedure, and reference (norm) values from the Italian, German, and Austrian population samples are openly accessible on the official website of the SHE scale (http://www.short-term-hormone-effects-scale.info). The scale can be used free of charge.
The newly developed SHE scale could close a gap in clinical research by providing a validated measure of the short-term effects of sex-steroid hormones in women, effects that have been widely used to claim differences between drugs. In the past, simple symptom lists based on the retrospective perception of an “improvement of conditions/complaints after therapy,” or other nonvalidated instruments, were used to argue that specific formulations of sex-steroid hormones are better than others. Such nonvalidated questionnaires lead to unreliable “benefits.” Although no validated scale meeting the FDA requirements for PRO scales has been available until now, industry has great interest in demonstrating “additional short-term benefits” of newly developed drugs containing sex-steroid hormones, because so many such drugs are already on the market.
The assessment of the properties of the SHE scale indicates good characteristics for measuring short-term effects of sex-steroid hormones in women. The scale seems to be appropriate, feasible, interpretable, reliable, and valid for application as a PRO scale. Data on the responsiveness and sensitivity of the scale as an outcome measure of hormone treatment are still lacking.
This validated scale can be recommended for practical use in comparative studies, to avoid misjudging “benefits” derived from nonvalidated symptom lists that record a subjectively perceived “improvement” with drug A over drug B. As a self-administered instrument, the 15-item scale takes less than 7 minutes on average to complete.
No conflict of interest is declared for this methodological study. The first normative survey in Italy was financially cosponsored by a company that produces sex-steroid hormones.
A. O. Arowojolu, M. F. Gallo, D. A. Grimes, and S. E. Garner, “Combined oral contraceptive pills for treatment of acne,” The Cochrane Database of Systematic Reviews, no. 3, 2004.
H. Thorneycroft, H. Gollnick, and I. Schellschmidt, “Superiority of a combined contraceptive containing drospirenone to a triphasic preparation containing norgestimate in acne treatment,” Cutis, vol. 74, no. 2, pp. 123–130, 2004.
J. Huber, J. M. Foidart, W. Wuttke et al., “Efficacy and tolerability of a monophasic oral contraceptive containing ethinylestradiol and drospirenone,” European Journal of Contraception and Reproductive Health Care, vol. 5, no. 1, pp. 25–34, 2000.
K. Heinemann, A. Ruebig, P. Potthoff et al., “The menopause rating scale: a methodological review,” Health and Quality of Life Outcomes, vol. 2, p. 45, 2004.
L. A. J. Heinemann, P. Potthoff, K. Heinemann, A. Pauls, C. J. Ahlers, and F. Saad, “Scale for Quality of Sexual Function (QSF) as an outcome measure for both genders?” Journal of Sexual Medicine, vol. 2, pp. 82–95, 2005.
R. Rosen, C. Brown, J. Heiman et al., “The Female Sexual Function Index (FSFI): a multidimensional self-report instrument for the assessment of female sexual function,” Journal of Sex and Marital Therapy, vol. 26, no. 2, pp. 191–208, 2000.
L. R. Derogatis, “The Derogatis Interview for Sexual Functioning (DISF/DISF-R): an introductory report,” Journal of Sex and Marital Therapy, vol. 23, pp. 291–296, 1997.
R. C. Rosen, “Assessment of female sexual dysfunction: review of validated methods,” Fertility and Sterility, vol. 77, supplement 4, pp. S89–S93, 2002.
I. Daig, L. A. J. Heinemann, S. Kim et al., “The Aging Males' Symptoms (AMS) scale: review of its methodological characteristics,” Health and Quality of Life Outcomes, vol. 1, p. 77, 2003.
K. Heinemann, A. Rübig, A. Strothmann, G. G. Nahum, and L. A. J. Heinemann, “Prevalence and opinions of hormone therapy prior to the women's health initiative: a multinational survey on four continents,” Journal of Women's Health, vol. 17, no. 7, pp. 1151–1166, 2008.
J. E. Ware Jr., M. Kosinski, and S. D. Keller, “A 12-item short-form health survey: construction of scales and preliminary tests of reliability and validity,” Medical Care, vol. 34, no. 3, pp. 220–233, 1996.
A. S. Zigmond and R. P. Snaith, “The hospital anxiety and depression scale,” Acta Psychiatrica Scandinavica, vol. 67, no. 6, pp. 361–370, 1983.