Background. From Ebola and Zika to the latest COVID-19 pandemic, outbreaks of highly infectious diseases continue to reveal severe consequences of social and health inequalities. People from low socioeconomic and educational backgrounds, as well as those with low health literacy, tend to be affected more by the uncertainty, complexity, volatility, and progressiveness of public health crises and emergencies. A key lesson that governments have taken from the ongoing coronavirus pandemic is the importance of developing and disseminating highly accessible, actionable, inclusive, coherent public health advice, which represents a critical tool to help people with diverse cultural and educational backgrounds and varying abilities to effectively implement health policies at the grassroots level. Objective. We aimed to translate the best practices of accessible, inclusive public health advice (purposefully designed for people with low socioeconomic and educational background, low health literacy levels, limited English proficiency, and cognitive/functional impairments) on COVID-19 from health authorities in English-speaking multicultural countries (USA, Australia, and UK) to adaptive tools for the evaluation of the accessibility of public health advice in other languages. Methods. We developed an optimised Bayesian classifier to produce probabilistic predictions of the accessibility of official health advice among vulnerable people, including migrants and foreigners living in China. We also developed an adaptive statistical formula for the rapid evaluation of the accessibility of health advice among vulnerable people in China. Results. Our study provides needed research tools to fill a persistent gap in Chinese public health research on accessible, inclusive communication of the prevention and management of infectious diseases.
For the probabilistic prediction, using the optimised Bayesian machine learning classifier (GNB), the largest positive likelihood ratio (LR+), 16.685 (95% confidence interval: 4.35, 64.04), was identified when the probability threshold was set at 0.2 (sensitivity: 0.98; specificity: 0.94). Conclusion. Effective communication of health risks through accessible, inclusive, actionable public advice represents a powerful tool to reduce health inequalities amidst health crises and emergencies. Our study translated the best-practice public health advice developed during the pandemic into intuitive machine learning classifiers for health authorities to develop evidence-based guidelines of accessible health advice. In addition, we developed adaptive statistical tools for frontline health professionals to assess the accessibility of public health advice for people from non-English speaking backgrounds.

1. Introduction

From Ebola and Zika to the novel coronavirus pandemic, outbreaks of highly infectious diseases continue to reveal severe consequences of social and health inequalities in both developing and developed countries [1, 2]. Vulnerable people are affected more by the uncertainty, complexity, volatility, and progressiveness of public health crises and emergencies [3, 4]. A key lesson that governments have taken from the ongoing coronavirus pandemic is the importance of developing and disseminating highly accessible, inclusive, actionable, coherent public health advice [5–8], which represents a critical tool to help people with diverse cultural and educational backgrounds and varying abilities to effectively implement health policies at the grassroots level. High variability in people’s socioeconomic background, education, health literacy levels, intellectual and cognitive abilities, English proficiency, and religious beliefs can create barriers to accessing and implementing public health advice under health emergencies and crises [9–11]. Increasing the inclusiveness and accessibility of health advice among diverse vulnerable populations has emerged as an important topic in public health education and in domestic and international policy making, as a highly cost-effective measure to reduce health inequalities [12–15].

Currently, there is a lack of national or international guidelines around the development of accessible and inclusive public health advice, especially for health emergencies and crises. However, the recent outbreak of coronavirus has prompted national health authorities to develop highly accessible health resources on COVID-19. Most current public-oriented health resources on infectious diseases are regular health resources (RHR), which require higher levels of education, English proficiency, health knowledge, and literacy [16–19]. Typical RHR are resources published by the World Health Organisation (WHO), which are intended for both general and professional readerships [20–22]. However, the practical accessibility of WHO health resources among the public remains unknown, as the organisation has only recently embarked on surveys to establish evidence of the accessibility of its public health resources [20].

Vulnerable people-oriented (VPO) health resources are known for their significantly improved language understandability, information relevance (minimal distracting details irrelevant to the readers), social inclusiveness (applicability among diverse people), and information actionability [23–25]. They are developed by medical/health professionals with extensive experience of working directly with diverse vulnerable populations, which helps ensure the practical usability of VPO resources. In English-speaking multicultural societies, intralingual health translations and simplified or accessibility-enhanced English resources provide the main sources of public health advice and information for diverse vulnerable populations [8, 26–28], although evidence-based guidelines to inform the development of these materials and associated quality control measures are yet to be established and validated.

Increasing amounts of VPO health resources in English, known as easy-to-read or easy read health materials, provide valuable firsthand materials for the development of assessment instruments and techniques to support best practices in the clinic, as well as global health policy making around accessible health recommendation design and social dissemination. The development of language-adaptive evaluation instruments can facilitate evidence-based health policy making by international health authorities. For health policy makers, we developed supervised Bayesian machine learning classifiers, and for frontline health professionals we developed convenient, language-adaptable statistical analysis tools to evaluate the accessibility of public health advice on top infectious diseases, including COVID-19.

It should be noted that the machine learning classifiers and statistical tools that we developed using Chinese health resources can be conveniently and reliably adapted to other languages using low-cost, relatively easy-to-obtain natural language annotation tools. In doing so, our study provides practical and useful tools for clinical and health professionals working directly with vulnerable people. The intuitive Bayesian probabilistic evaluation tool that we developed will help advance research-based design of accessible health advice and international benchmarking, both for the pandemic, which continues to spread in many developing countries, and for any future public health crises. Such crises require a rapid, effective, and efficient response from governments and health authorities to address the practical needs of diverse vulnerable populations and to help minimise the environmental and health impacts on them.

2. Methods

2.1. Collection and Translation of Regular and Vulnerable People-Oriented Health Resources

We collected two sets of stylistically distinct public health advice on the prevention and self-management of infectious diseases. Regular health resources (RHR) (n = 202) were selected from the website of the World Health Organisation (WHO). Vulnerable people-oriented (VPO) materials (n = 91), especially public health advice on COVID-19, were collected from the websites of health authorities, including the Centers for Disease Control and Prevention (CDC), the Australian Ministry of Health, and Public Health England. These resources were easily recognisable because they were clearly labelled as easy-to-read public health advice and instructions on COVID-19. As an international health authority, WHO provides verified professional translations of its original English resources. Since the aim of our study was to develop assessment tools to evaluate the accessibility of public health advice in Chinese, Chinese translations of regular WHO health advice on infectious diseases were collected. VPO resources collected from the CDC, the Australian Ministry of Health, and Public Health England were translated into Chinese using the forward and backward translation method recommended by the WHO [29–31].

2.2. Morphological-Lexical-Structural (MLS) Features

The Chinese translations of RHR and VPO were annotated using Chinese Readability Index Explorer (CRIE) [32, 33]. CRIE annotation provided 26 morphological-lexical-structural (MLS) features and 46 part-of-speech (POS) features (72 in total) of the annotated Chinese texts for our machine learning classifier development. MLS features included average sentence number per paragraph, type token ratio (TTR), low-stroke characters (1–10 strokes), middle-stroke characters (11–20 strokes), high-stroke characters (21 or above), average strokes per character, 2-character words, 3-character words, average words per sentence, ratio of noun phrases, normalised frequency of noun phrases, average number of idioms per sentence, content words (verbs, nouns, adverbs, and adjectives), adverbs of negation, sentences with complex semantic (polysemous) categories, density of content words, average logarithmic frequency of content words, pronouns, personal pronouns, conjunctions, positive conjunctions, negative conjunctions, and difficult words ratio. These MLS features have been studied extensively in Chinese readability research.

2.3. Parts of Speech (POS) Features

We applied the POS tagging system developed by Academia Sinica, one of the most comprehensive automatic analysers of Chinese. The POS features collected included nonpredicate adjective (A), coordinate conjunction (Caa), conjunctions (Cab), conjunctions (Cba), correlative conjunctions (Cbb), adverbs (D), nominal/adverbial/complement markers (DE), adverbial noun-modifiers (Da), adverbs of degree before verbs (Dfa), adverbs of degree after verbs (Dfb), tense markers (Di), sentential adverbs (Dk), interjections (I), common nouns (Na), proper nouns (Nb), geographical names (Nc), location names (Ncd), time adverbs (Nd), attributive adjuncts (Nep), modifiers of quantitative measures (Nes), quantities (Neu), measure words (Nf), post-positions (Ng), pronouns (Nh), prepositions (P), Be verbs (SHI), auxiliary words (T), intransitive predicates (VA), causative verbs (VAC), transitive verbs placed after objects (VB), transitive verbs (VC), verbs placed before locations (VCL), predicates used with both direct and indirect objects (VD), action verbs as sentence objects (VE), action verbs as predicate objects (VF), classification verbs (VG), modifiers of verbs and nouns (VH), causative modifiers (VHC), adjectives or past participles placed after objects (VI), transitive verbs to describe states (VJ), mental states and processes (VK), causative predicates (VL), and have verbs (V_2).

2.4. Statistical Analysis

Table 1 shows that, among the 72 natural language features (26 MLS, 46 POS), statistically significant differences (nonparametric Mann-Whitney U tests) between regular health resources (RHR) and vulnerable people-oriented (VPO) health resources were present in 88.9% of the entire feature set: 96.15% (25/26) of morphological-lexical-structural (MLS) features and 84.78% (39/46) of part-of-speech (POS) features. In addition to 2-sided p values, we computed Hedges’ g [34] as corrected Cohen’s d (1, 36, and 37) and the 95% confidence interval of the effect size estimates. We also provided common language effect sizes (CLES), also known as probability of superiority [35–37]. In our study, CLES allowed an intuitive interpretation of the likelihood that the value of a certain natural language feature in a text randomly selected from VPO would be higher than the value of that feature in a text randomly selected from RHR on infectious diseases. We calculated Hedges’ g and CLES alongside commonly reported p values for two purposes: first, to help us interpret the result of automatic feature selection using Bayesian machine learning classifiers (Gaussian Naïve Bayes was selected due to the presence of normally distributed continuous variables as features); second, to facilitate the determination of sample sizes for follow-up studies and the comparison of effects across studies.

The following formula shows the corrected effect size or Hedges’ g:

The following formula shows common language effect size (CLES):
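As a hedged illustration of the two effect sizes described above, the sketch below uses the textbook definitions, which are assumptions here and not necessarily the exact variants used in the study: Cohen's d with a pooled standard deviation, the approximate small-sample correction 1 − 3/(4N − 9) for Hedges' g, and CLES computed as Φ(|g|/√2) under a normal model. The function names `hedges_g` and `cles_from_g` are hypothetical. The CLES relation appears consistent with the reported pairs, for example g = 2.588 with CLES = 0.966 and g = −1.311 with CLES = 0.823.

```python
import math


def hedges_g(mean_ref, sd_ref, n_ref, mean_cmp, sd_cmp, n_cmp):
    """Bias-corrected standardised mean difference (reference minus comparison).

    Assumes the pooled-SD form of Cohen's d and the common approximate
    correction factor; both are illustrative assumptions.
    """
    pooled_var = ((n_ref - 1) * sd_ref ** 2 + (n_cmp - 1) * sd_cmp ** 2) / (
        n_ref + n_cmp - 2
    )
    cohen_d = (mean_ref - mean_cmp) / math.sqrt(pooled_var)
    correction = 1 - 3 / (4 * (n_ref + n_cmp) - 9)  # small-sample correction
    return cohen_d * correction


def cles_from_g(g):
    """Probability of superiority under a normal model: Phi(|g| / sqrt(2))."""
    z = abs(g) / math.sqrt(2)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z
```

Because CLES is computed from the absolute effect size, it always lies between 0.5 and 1 and the direction of the difference has to be read from the sign of g.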

When computing effect sizes (Hedges’ g) and associated probabilities of superiority (CLES), we used RHR as the reference class. As a result, positive values indicated that features were statistically higher in the more difficult regular health resources, and negative values suggested that features were prevalent in highly accessible, vulnerable people-oriented public health advice. MLS features with large corrected effect sizes (absolute value of Hedges’ g larger than 1) and CLES included the following: average sentences per paragraph (VPO: M = 0.601, SD = 0.345; RHR: M = 3.316, SD = 1.163, Hedges’ g = 2.588, 95% CI [2.301, 2.874], and CLES = 0.966); type token ratio (TTR) (VPO: M = 0.461, SD = 0.119; RHR: M = 0.623, SD = 0.079, Hedges’ g = 1.828, 95% CI [1.568, 2.088], and CLES = 0.902); frequency of difficult words (VPO: M = 141.813, SD = 96.381; RHR: M = 70.872, SD = 35.986, Hedges’ g = −1.311, 95% CI [−1.558, −1.065], and CLES = 0.823); single sentences (VPO: M = 0.883, SD = 0.114; RHR: M = 0.460, SD = 0.191, Hedges’ g = −2.376, 95% CI [−2.655, −2.098], and CLES = 0.954); pronouns (VPO: M = 41.692, SD = 35.043; RHR: M = 1.469, SD = 1.652, Hedges’ g = −2.530, 95% CI [−2.814, −2.245], and CLES = 0.963); personal pronouns (VPO: M = 37.868, SD = 31.838; RHR: M = 0.705, SD = 1.139, Hedges’ g = −2.577, 95% CI [−2.864, −2.291], and CLES = 0.966); low-stroke characters (VPO: M = 641.593, SD = 444.264; RHR: M = 284.707, SD = 140.930, Hedges’ g = −1.507, 95% CI [−1.758, −1.256], and CLES = 0.857); 2-character words (VPO: M = 256.637, SD = 176.783; RHR: M = 114.341, SD = 56.262, Hedges’ g = −1.509, 95% CI [−1.76, −1.258], and CLES = 0.857); and average logarithmic frequency of content words (VPO: M = 1.738, SD = 0.169; RHR: M = 1.337, SD = 0.183, Hedges’ g = −2.225, 95% CI [−2.498, −1.952], and CLES = 0.942).

Part-of-speech (POS) features with large corrected effect sizes (absolute value of Hedges’ g larger than 1) and CLES included the following: Nh (pronouns) (VPO: M = 41.692, SD = 35.043; RHR: M = 1.469, SD = 1.652, Hedges’ g = −2.530, 95% CI [−2.814, −2.245], and CLES = 0.963); VF (action verbs as predicate objects) (VPO: M = 4.198, SD = 3.964; RHR: M = 0.222, SD = 0.629, Hedges’ g = −2.119, 95% CI [−2.388, −1.849], and CLES = 0.933); VA (intransitive predicates) (VPO: M = 11.659, SD = 10.140; RHR: M = 2.088, SD = 2.214, Hedges’ g = −1.919, 95% CI [−2.181, −1.656], and CLES = 0.913); VK (mental states and processes) (VPO: M = 9.736, SD = 7.688; RHR: M = 2.293, SD = 2.086, Hedges’ g = −1.889, 95% CI [−2.151, −1.627], and CLES = 0.909); D (adverbs) (VPO: M = 44.110, SD = 31.491; RHR: M = 14.489, SD = 8.247, Hedges’ g = −1.849, 95% CI [−2.11, −1.589], and CLES = 0.905); VC (transitive verbs) (VPO: M = 48.791, SD = 35.985; RHR: M = 14.759, SD = 10.551, Hedges’ g = −1.812, 95% CI [−2.071, −1.552], and CLES = 0.900); Ncd (location names) (VPO: M = 6.747, SD = 5.960; RHR: M = 1.869, SD = 1.933, Hedges’ g = −1.526, 95% CI [−1.777, −1.274], and CLES = 0.860); P (prepositions) (VPO: M = 22.571, SD = 16.570; RHR: M = 9.330, SD = 6.188, Hedges’ g = −1.424, 95% CI [−1.672, −1.175], and CLES = 0.843); and V_2 (have) (VPO: M = 4.560, SD = 5.879; RHR: M = 1.128, SD = 1.214, Hedges’ g = −1.197, 95% CI [−1.44, −0.953], and CLES = 0.801).

2.5. Gaussian Naïve Bayes

Gaussian Naïve Bayes (GNB) is a variant of Naïve Bayes, which is a supervised machine learning classification algorithm based on the Bayes theorem [38–42]. The strengths of GNB include its convenience, computation speed (suitability for making real-time predictions), scalability, generalisability with small data (as with most Bayesian machine learning classifiers), and flexibility with continuous and discrete features. In our study, the size of the training (205) and testing data (88) was relatively small. Bayesian machine learning classifiers like GNB, relevance vector machine (RVM), and multinomial Naïve Bayes (MNB) are more suitable in this setting, as they are less likely to overfit small datasets. Furthermore, the two sets of public health resources that we collected, the regular and vulnerable people-oriented sets, contained continuous features, and their distributions in the Chinese translations of regular health resources followed the Gaussian (normal) distribution (Table 2, Figure 1). As a result, GNB was selected as the most suitable machine learning classifier in our study to ensure the generalisability and reliability of the classifiers.
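To make the mechanics of GNB concrete, the following is a minimal pure-Python sketch, not the implementation used in the study. It estimates a per-class mean and variance for each feature, assumes features are conditionally independent given the class, and returns normalised posterior probabilities. The class name `TinyGaussianNB`, the variance floor, and the toy feature values are all hypothetical.

```python
import math
from collections import defaultdict


class TinyGaussianNB:
    """Illustrative Gaussian Naive Bayes for small, continuous feature sets."""

    def fit(self, X, y, var_floor=1e-9):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.stats = {}
        for label, rows in groups.items():
            n = len(rows)
            means = [sum(col) / n for col in zip(*rows)]
            # Per-feature variance; the floor avoids division by zero.
            variances = [
                sum((v - m) ** 2 for v in col) / n + var_floor
                for col, m in zip(zip(*rows), means)
            ]
            self.stats[label] = (math.log(n / len(X)), means, variances)
        return self

    def predict_proba(self, row):
        log_post = {}
        for label, (log_prior, means, variances) in self.stats.items():
            # Sum of per-feature Gaussian log-densities (independence assumption).
            log_lik = sum(
                -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
                for x, m, var in zip(row, means, variances)
            )
            log_post[label] = log_prior + log_lik
        top = max(log_post.values())  # subtract the max for numerical stability
        unnorm = {k: math.exp(v - top) for k, v in log_post.items()}
        total = sum(unnorm.values())
        return {k: v / total for k, v in unnorm.items()}


# Toy, made-up data: two features per text (a frequency score and a pronoun count).
X = [[1.3, 1], [1.4, 2], [1.2, 0], [1.7, 40], [1.8, 35], [1.75, 45]]
y = ["RHR", "RHR", "RHR", "VPO", "VPO", "VPO"]
model = TinyGaussianNB().fit(X, y)
probs = model.predict_proba([1.35, 1])  # posterior probability for each class
```

Working in log space and subtracting the largest log-posterior before exponentiating keeps the normalisation stable even when one class is far more likely than the other.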

2.6. Training and Testing Machine Learning Classifiers

To train machine learning classifiers for automatic information accessibility evaluation, the total number of RHR used was 202 and the total number of VPO used was 91. Next, 73.3% (148) of RHR and 62.6% (57) of VPO were used as the training data, and the remaining texts (54 RHR and 34 VPO) were used as testing data. We applied 5-fold cross-validation with the training data to produce the mean and standard deviation of the area under the curve (AUC) of the GNB classifier. Model performance was then evaluated on the remaining 30% test data in terms of AUC, accuracy, sensitivity, specificity, and macro F1 (Table 3).
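The cross-validation and AUC steps described above can be sketched as follows. This is an illustrative outline only: the helper names `kfold_indices` and `auc` are hypothetical, the folds here are not stratified by class (the study may have used stratification), and AUC is computed through its rank interpretation, that is, the probability that a randomly chosen positive text receives a higher score than a randomly chosen negative one.

```python
import random


def kfold_indices(n_items, k=5, seed=0):
    """Shuffle item indices and deal them into k nearly equal validation folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def auc(scores_pos, scores_neg):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    wins = sum(
        (p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))


folds = kfold_indices(205, k=5)        # 205 training texts, as reported above
fold_sizes = [len(f) for f in folds]   # each fold is held out once for validation
```

In each of the five rounds, one fold serves as validation data and the other four are used for fitting; the mean and standard deviation of the five validation AUC values summarise the classifier.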

2.7. Classifier Optimisation

High dimensional features can reduce the performance of machine learning classifiers due to the forced inclusion of irrelevant parameters in the model. To counter the issue of classifier underperformance caused by the presence of redundant features, we applied different classifier optimisation techniques to reduce the original features collected. First, we applied integral optimisation by selecting the optimised feature set from the combined MLS and POS features (72 in total). This led to a jointly optimised feature set of 6 features (around 8% of the original total features): average sentences per paragraph (ASPP) (Hedges’ g = 2.588, 95% CI [2.301, 2.874], and CLES = 0.966), personal pronouns (Hedges’ g = −2.577, 95% CI [−2.864, −2.291], and CLES = 0.966), Di (tense markers) (Hedges’ g = −1.003, 95% CI [−1.243, −0.763], and CLES = 0.761), Nd (time adverbs) (Hedges’ g = −0.418, 95% CI [−0.65, −0.186], and CLES = 0.616), VF (action verbs as predicate objects) (Hedges’ g = −2.119, 95% CI [−2.388, −1.849], and CLES = 0.933), and V_2 (have verbs) (Hedges’ g = −1.197, 95% CI [−1.44, −0.953], and CLES = 0.801).

These optimised features were also among those with the most significant statistical differences (indicated by p values, corrected effect size Hedges’ g, and common language effect sizes CLES) between regular and vulnerable people-oriented health resources (Table 1). Next, we applied feature optimisation in the two sets of MLS (26) and POS (46) features separately. This led to an optimised MLS feature set of 2 features only (7.7% of the total MLS features): ASPP and the average logarithmic frequency of content words (ALFCW) (Hedges’ g = −2.225, 95% CI [−2.498, −1.952], and CLES = 0.942). The optimised POS feature set contained 8 features (17.4% of the total POS features): A (nonpredicate adjectives) (Hedges’ g = −0.023, 95% CI [−0.254, 0.207], and CLES = 0.507), Cbb (correlative conjunctions) (Hedges’ g = −1.022, 95% CI [−1.263, −0.782], and CLES = 0.765), Dfa (adverbs of degree before verbs) (Hedges’ g = −0.797, 95% CI [−1.033, −0.561], and CLES = 0.713), Nd (time adverbs) (see above), Ng (postpositions) (Hedges’ g = −1.090, 95% CI [−1.331, −0.848], and CLES = 0.780), Nh (pronouns) (Hedges’ g = −2.530, 95% CI [−2.814, −2.245], and CLES = 0.963), VCL (verbs placed before locations) (Hedges’ g = −1.656, 95% CI [−1.911, −1.401], and CLES = 0.879), and VHC (causative modifiers) (Hedges’ g = 0.649, 95% CI [0.415, 0.884], and CLES = 0.677). For both integral and parallel classifier optimisations, we used backward feature elimination, known as recursive feature elimination (RFE), with a support vector machine as the base estimator. Maximal validation accuracy (minimal classification error) was used as the feature optimisation criterion (Table 4 and Figure 2).

3. Results

Table 3 shows the performance of GNB classifiers using different feature sets. Overall, Bayesian classifiers using optimised features outperformed those using the original, larger feature sets. For example, on the testing data, the integrally optimised GNB with 6 features (2 MLS and 4 POS) achieved a higher AUC (0.993), sensitivity (0.963), specificity (0.9118), and accuracy (0.940) than the classifier using the full feature set, which had AUC (0.940), sensitivity (0.944), specificity (0.8824), and accuracy (0.921). The separately optimised POS feature set (8 POS features) achieved a higher AUC (0.968), sensitivity (1.0), specificity (0.8824), and accuracy (0.955) than the original POS feature set (46 POS features) (AUC = 0.852, sensitivity = 0.889, specificity = 0.7941, and accuracy = 0.852). After comparing the 3 optimised feature sets, namely, MLS/POS jointly optimised (6), MLS optimised (2), and POS optimised (8), we combined the 2 separately optimised sets of 2 MLS features (ASPP, ALFCW) and 8 POS features (A, Cbb, Dfa, Nd, Ng, Nh, VCL, and VHC) and further refined the combined set to 2 features only, ALFCW and Nh (pronouns), using the same backward elimination RFE_SVM procedure. This resulted in a highly simplified model (model 8) which achieved largely comparable performance to the best-performing model, the one which integrated the 2 separately optimised feature sets (model 7). Optimised model 8 achieved higher specificity (1.0) than model 7 (0.8824), which indicated better detection of public health resources and advice suitable for use under health emergencies and crises for maximal social accessibility.

3.1. Bayesian Probabilistic Outputs

The outputs of Bayesian machine learning classifiers are in the form of a probability of belonging to the regular public health advice class in the training data. In our study, for the best-performing classifier (model 8: refined from the separately optimised MLS and POS features; 2 features: ALFCW and Nh), the mean output (probability) was 0.9557 (SD = 0.152; range: 0.02, 1; 95% CI: 0.915, 0.996) for regular public health resources and 0.022 (SD = 0.069; range: 0, 0.28; 95% CI: −0.00118, 0.04518) for vulnerable people-oriented public health advice. The differences among public health advice on infectious diseases (translated to Chinese) in terms of public accessibility (high: vulnerable people-oriented resources; low: regular resources) were statistically significant (Hedges’ g = 7.367, 95% CI: 6.197, 8.536, and CLES = 1). Figure 3 is a histogram which shows the number of regular (restricted accessibility) and vulnerable people-oriented (high accessibility) pieces of health advice that fell into each 10% probability bin based on the GNB outputs. One hundred percent of vulnerable people-oriented health advice was assigned a probability equal to or smaller than 50%, that is, was identified as a highly accessible health resource (specificity = 1); and 98.15% of regular public resources were assigned a probability of being public advice of restricted/low accessibility larger than 50% (sensitivity = 98.15%). Around 2% of regular public health resources were misclassified as highly accessible information and advice for the public.

3.2. Thresholds and Positive/Negative Likelihood Ratios (LR+)

Although it is intuitive to use 0.5 as the probability threshold (Figure 3), this is not always appropriate in real-life scenarios, because the criterion for a meaningful cut-off depends on the desired pair of sensitivity and specificity, or the diagnostic utility of the research instrument. In our study, higher classifier sensitivity indicates higher precision in the prediction of regular public health advice (from WHO resources), which has restricted public accessibility; and higher classifier specificity implies increased accuracy in the detection of vulnerable people-oriented health advice (from 3 national health authorities), which was designed by experienced health professionals to maximally enhance the language and cognitive accessibility, informational actionability, and communicative effectiveness of this emergency advice among diverse vulnerable populations with limited education, health literacy, and socioeconomic resources and varying intellectual/cognitive capabilities.

Table 5 shows various probability cut-offs and their associated sensitivity and specificity pairs using the best-performing GNB classifier that we developed using 2 features only (ALFCW: average logarithmic frequency of content words; Nh: pronouns). It shows that when the cut-off was set lower than 0.1, sensitivity was the highest (0.9815) and specificity was 0.9118. This means that if this machine learning system were used to assist with public health advice design and dissemination, less than 2% of public health advice with restricted accessibility would be misclassified as accessible information, and less than 10% of highly accessible materials would be misclassified as ineffective public health advice, potentially increasing the budgetary burden of hiring experts to review health resources or extending the timeframe for releasing information to the public. When the threshold was increased from 0.1 to 0.2, sensitivity remained unchanged, and specificity increased. From 0.2 to 0.5, specificity (cost related) remained the same, but sensitivity decreased from 0.98 to 0.96, suggesting that a further 2% of public health advice with limited accessibility to vulnerable populations would be misclassified as suitable health advice and information. In high-risk scenarios such as outbreaks of highly infectious diseases among some of the most deprived communities, this misjudgement in health advice planning can be very costly, as vulnerable people would be given less effective, accessible, and protective health advice and information.
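The trade-off described above can be explored with a small threshold sweep. The sketch below is illustrative only: the probabilities are made up and do not reproduce Table 5, and the helper name `sens_spec` is hypothetical. Following the convention in the text, regular resources (restricted accessibility) are treated as the positive class, and a text is flagged as regular when its predicted probability exceeds the threshold.

```python
def sens_spec(probs_rhr, probs_vpo, threshold):
    """Sensitivity and specificity with 'regular' as the positive class."""
    true_pos = sum(p > threshold for p in probs_rhr)    # regular flagged as regular
    true_neg = sum(p <= threshold for p in probs_vpo)   # VPO kept as accessible
    return true_pos / len(probs_rhr), true_neg / len(probs_vpo)


# Made-up predicted probabilities of belonging to the regular class.
probs_rhr = [0.99, 0.97, 0.95, 0.60, 0.15]
probs_vpo = [0.00, 0.01, 0.05, 0.12, 0.28]

table = {t: sens_spec(probs_rhr, probs_vpo, t) for t in (0.1, 0.2, 0.5)}
for threshold, (sens, spec) in table.items():
    print(f"threshold={threshold:.1f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Raising the threshold can only lower or preserve sensitivity and raise or preserve specificity, which is why the choice of cut-off depends on which kind of misclassification is more costly in practice.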

Sensitivity continued to decrease when thresholds were raised higher than 0.5, despite the fact that specificity reached 1. The positive likelihood ratio (LR+), the ratio of sensitivity to the false positive rate (1 − specificity), is another measurement of diagnostic utility. An LR+ larger than 10 indicates a very large effect on the posttest probability of disease or, in the case of our study, of a lack of accessibility of public health advice among some of our most vulnerable communities and people, who need accessible, actionable public health advice most. In our study, using the best-performing Bayesian classifier, the highest LR+ of 16.685 (95% CI: 4.35, 64.04) was reached when setting the probability threshold at 0.2 (sensitivity = 0.982, 95% CI: 0.95, 1; specificity = 0.941, 95% CI: 0.86, 1). This represents the safest (highest sensitivity) and the most cost-effective (lowest budget investment in expert hire) model of machine learning-assisted predictive design of accessible public health advice and information for vulnerable people and populations. We hope that this promising result, based on machine learning development using low-cost, relatively easy-to-obtain natural language tools, will provide more support for evidence-based, user-oriented policy making around inclusive, accessible public health advice design and communication under both regular and emergency circumstances.
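As a worked check of the likelihood ratio above, the sketch below recomputes LR+ and a 95% confidence interval. Two things are assumptions rather than statements from the study: the confusion-matrix counts (53 of 54 regular texts and 32 of 34 vulnerable people-oriented texts correctly classified) are inferred from the reported sensitivity, specificity, and test-set sizes, and the interval uses the common log-transformation method. The function name `positive_lr` is hypothetical. Under these assumptions the result appears consistent with the reported 16.685 (4.35, 64.04).

```python
import math


def positive_lr(tp, fn, tn, fp, z=1.96):
    """LR+ = sensitivity / (1 - specificity), with a log-method confidence interval."""
    sensitivity = tp / (tp + fn)
    false_pos_rate = fp / (fp + tn)
    lr = sensitivity / false_pos_rate
    # Standard error of ln(LR+) from the four cell counts.
    se_log = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    lower = math.exp(math.log(lr) - z * se_log)
    upper = math.exp(math.log(lr) + z * se_log)
    return lr, lower, upper


# Counts inferred (assumption) from sensitivity 0.982 on 54 regular texts
# and specificity 0.941 on 34 vulnerable people-oriented texts.
lr, lower, upper = positive_lr(tp=53, fn=1, tn=32, fp=2)
print(f"LR+ = {lr:.3f} (95% CI: {lower:.2f}, {upper:.2f})")
```

The wide interval reflects the small number of false positives (2) in the test set, which dominates the standard error of the log ratio.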

4. Discussion

4.1. Retrospective Assessment of Accessibility of Public Health Advice in Other Languages

A major strength of our study was that we translated the best practices of accessible health advice on infectious diseases developed by health authorities in English-speaking countries during the pandemic into adaptive analytical instruments (low-cost, fast-to-build machine learning algorithms) that can be used for the retrospective assessment of existing health advice in other languages. This was achieved by translating the English materials that we collected into Chinese using the forward and backward translation method recommended by WHO and comprehensively annotating the translations using Chinese natural language processing tools to allow automatic feature engineering and machine learning classifier development. Table 5 and Figure 4 illustrate how probabilistic outputs might assist with decision making among policy makers and health/medical professionals when designing public health advice for vulnerable people and communities speaking Chinese, or for migrants and foreigners living in China or greater China regions. Next, we adapted the machine learning classifier to a convenient statistical tool which can be used in the clinic by health/medical professionals and educators to assess whether a certain piece of health advice on infectious diseases is accessible to vulnerable people (both Chinese and migrants) with limited education, health literacy, or Chinese proficiency who live in Chinese-speaking regions. We fitted a binary logistic regression (LR) model with the 70% training data collected. The model contained two independent variables which were borrowed from the best-performing classifier: ALFCW (average logarithmic frequency of content words) and Nh (number of pronouns). The fitted model with the regression coefficients is shown as follows.

The following is the binary regression formula:

We then used the Sigmoid Function to scale the scoreLR to the region of [0, 1]:

The following formula shows Chinese health advice accessibility assessment tool (CHAAAT):
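The general shape of such a calculator can be sketched as below, under clearly stated assumptions. The coefficients are HYPOTHETICAL placeholders and are not the fitted values from the study; only their negative signs are motivated by the text, since both ALFCW and Nh were higher in vulnerable people-oriented resources. The multiplication by 100 is also an assumption, inferred from the 0 to 100 range of the transformed scores reported for the test data. The names `score_lr`, `sigmoid`, and `chaaat_score` are illustrative.

```python
import math

# HYPOTHETICAL placeholder coefficients (intercept, ALFCW, Nh); NOT the study's values.
INTERCEPT, COEF_ALFCW, COEF_NH = 10.0, -5.0, -0.3


def score_lr(alfcw, nh):
    """Linear predictor of the binary logistic regression (log-odds of 'regular')."""
    return INTERCEPT + COEF_ALFCW * alfcw + COEF_NH * nh


def sigmoid(z):
    """Numerically stable logistic function mapping any real number into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def chaaat_score(alfcw, nh):
    """Transformed score on an assumed 0-100 scale; higher means less accessible."""
    return 100.0 * sigmoid(score_lr(alfcw, nh))


# Group means reported earlier for the two feature values, used only as sample inputs.
regular_like = chaaat_score(alfcw=1.337, nh=1.469)
accessible_like = chaaat_score(alfcw=1.738, nh=41.692)
```

A text would then be judged against a chosen cut-off on this scale, with lower cut-offs favouring sensitivity, in the same way as the probability thresholds of the GNB classifier.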

We examined the performance of the CHAAAT formula on the remaining 30% test data, which contained 34 highly accessible COVID-19 prevention resources translated from 3 national health authorities in the USA, Australia, and UK and 54 regular health resources on infectious diseases developed by the WHO for the public. The mean output (transformed score using the Sigmoid Function) was 5.8958 (SD = 18.128, range: 1.51908E-32, 73.5, and 95% CI: −0.194, 12) for highly accessible, vulnerable people-oriented public health advice and resources and 88.363 (SD = 2.34, range: 12.119, 99.466, and 95% CI: 83.665, 93.062) for regular WHO public health resources, which have limited accessibility among vulnerable people with low education, limited health literacy, or limited Chinese proficiency, such as foreign migrants or people with cognitive impairments.

Differences between the two sets of test data measured by the CHAAAT formula were statistically very significant: Hedges’ g = 7.248, 95% CI: 6.094, 8.401, and CLES = 1. These effect sizes were comparable to those (Hedges’ g = 7.367, 95% CI: 6.197, 8.536, and CLES = 1) of the probabilistic outputs of the best-performing GNB classifier using the same feature pair: ALFCW and Nh. Figure 5 shows the number of regular (restricted accessibility) and vulnerable people-oriented (high accessibility) pieces of health advice that fell into each 10-score bin based on the transformed scores computed using the CHAAAT regression formula. Ninety-four percent of vulnerable people-oriented health advice was assigned a transformed score smaller than 50, indicating a highly accessible health resource (specificity = 94.12%), and 96.3% of regular public resources were assigned a score equal to or larger than 50, indicating public advice of restricted/low accessibility (sensitivity = 96.3%).

Around 4% of regular resources (requiring higher literacy and Chinese proficiency) were misclassified as highly accessible information and advice for vulnerable communities and people living in Chinese-speaking regions. These 4% errors can still result in a costly overestimate of the accessibility or usability of public health advice among vulnerable populations who require highly accessible (simple, actionable, and implementable) public health information and advice. As with the machine learning classifiers, we can therefore adjust the threshold of the CHAAAT formula to obtain the desired sensitivity and specificity pair according to practical needs in the clinic. Table 6 shows that setting the threshold of the transformed score (Formula (5)) to 29.906 allows the regression calculator to achieve the same sensitivity and specificity as the best-performing GNB classifier at its optimal threshold. Depending on practical needs for accessible public health advice under different circumstances, further decreasing the threshold will increase the sensitivity of the assessment tool, which might be needed when the health risks being communicated are complex and the vulnerability of the target populations is high.

The design and assessment of population-oriented public health advice are high-stakes activities which require highly precise, reliable research tools to support informed, evidence-based public health policy planning and delivery. Our study showed that valuable experience gained in the design of accessible public advice, as triggered by the coronavirus pandemic, can be translated into useful, much-needed research instruments and tools to support the design of public health advice for any future health events or crises. Furthermore, we developed assessment tools adaptable to other languages to facilitate international benchmarking and support global public health policy planning around accessible health risk communication and public engagement. A limitation of our study is that we used Chinese, a language distinct from English, as the only illustrative example. However, the underlying techniques and methods that we discussed and demonstrated can be conveniently modified for other languages or writing systems, especially under-resourced languages such as African languages and minority languages, as we developed high-performing Bayesian machine learning classifiers based on small datasets. For illiterate populations, our methods need further adaptation for the assessment of oral public advice and information.

Data Availability

The data that support the findings of this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare that they have no conflicts of interest.