
Assessments and developments in constructing a National Health Index for policy making, in the United Kingdom

Anna Freni-Sterrantino (The Alan Turing Institute, London, UK), Thomas P. Prescott (The Alan Turing Institute, London, UK), Greg Ceely (Office for National Statistics), Myer Glickman (Office for National Statistics), Chris Holmes (The Alan Turing Institute, London, UK)
Abstract

Composite indicators are a useful tool to summarize, measure and compare changes among different communities. The UK Office for National Statistics has created an annual England Health Index (starting from 2015) comprising three main health domains - lives, places and people - to monitor health measures over time and across different geographical areas (149 Upper Tier Local Authorities, 9 regions and an overall national index) and to evaluate the health of the nation. The composite indicator is defined as a weighted average (linear combination) of indicators within subdomains, subdomains within domains, and domains within the overall index. The Health Index was designed to be comparable over time, geographically harmonized and to serve as a tool for policy implementation and assessment.

We evaluated the steps taken in the construction, reviewing the conceptual coherence and statistical requirements on Health Index data for 2015-2018. To assess these, we have focused on three main steps: correlation analysis at different index levels; comparison of the implemented weights derived from factor analysis with two alternative weights from principal components analysis and optimized system weights; a sensitivity and uncertainty analysis to assess to what extent rankings depend on the selected set of methodological choices. Based on the results, we have highlighted features that have improved statistical requirements of the forthcoming UK Health Index.

keywords:
Composite Indicator; Health Index; Weights; Robustness assessment; Sensitivity analysis; Uncertainty

1 Introduction

A composite index (CI) is a way to summarize several indicators in one number and provide a tool for policy-making. Besides well-known health-related indices such as Healthy Life Expectancy vdWPB (96) or Disability-Adjusted Life Years HPM (12); SLTF+ (12), the United Kingdom (UK) has a long tradition of health-related indices; the first ‘Health Index’ was developed in 1943 as a surveillance system for population health at national level, based on annual mortality and morbidity data Sul (66). Kaltenthaler et al. KMB (04), in their systematic review conducted in 2004, evaluated 17 population-level health indexes and found that three were composed for the UK population: the ‘Health and material deprivation in Plymouth’ index ABPS (92), a modification of Townsend’s ‘Overall Health Index’ TPB (88), and the most popular, the ‘Index of Multiple Deprivation’ DotEtR (00). However, none of them, nor any of the other population health indexes, seemed to fulfil the desiderata for a health index: proper coverage of health indicators; routinely collected and updated data; indices at local and national level; and statistical coherence. These findings were later confirmed by Ashraf et al. ANTG (19) in a systematic review. They concluded that most of the indices measured the population’s overall health outcomes, but only a few focused on specific health topics or the health of specific sub-populations. They urged the development of population health indices that can be constructed systematically and rigorously, with robust processes and sound methodology.

Recently, to fill this gap, the Office for National Statistics of the UK (ONS) developed an annual (experimental) composite index to quantify health in England, to track changes in health across the country and to compare health measures across different population subgroups.

The Health Index (HI) expands the WHO definition of health, ‘a state of complete physical, mental and social well-being and not merely the absence of disease or infirmity’ Gra (02), to include health determinants that are known to influence people’s health. Therefore, the HI is characterized by three main domains: Healthy People, Healthy Lives and Healthy Places, split across 17 subdomains, for a total of 58 indicators. For example, life expectancy and the standardized number of avoidable deaths define the subdomain ‘Mortality’, and the prevalence at Upper Tier Local Authority (UTLA) level of dementia, musculoskeletal, respiratory, cardiovascular, cancer and kidney conditions defines the subdomain ‘Physical health conditions’ within the Healthy People domain. Healthy Places is structured over 14 indicators (access to public and private green space, air and noise pollution, road safety, etc.) split into 5 subdomains: Access to green space, Local environment, Access to housing, Access to services and Crime.

The construction of a new composite indicator is a lengthy process that involves several steps and choices. From the wide literature on composite indicators BDWL (19); Fre (03); JSG (04), it emerges that there is no gold standard, with every method having its own drawbacks and advantages GITT (19) relative to the purpose of each CI and its future use in policy making.

In recent years, extensive work was carried out by many institutions, such as Eurostat Eur (17), the Organisation for Economic Co-operation and Development (OECD) C+ (08), the Joint Research Centre (JRC) ST (02) and specific working groups at the European Commission JRC, to provide statistical guidance on CI construction. The cumulative effort has provided a framework to define CI principles NSST (05), outlining the essential steps, introducing sensitivity and uncertainty analysis as a core part of composite indicators SST (05) and advancing composite indicators methodology MN (05).

With no unanimously approved checklist currently available for evaluating composite indicators, we relied on two main sources to guide our assessment of the Health Index. The first is the COIN step-list from the JRC JRC, which includes observations from the OECD handbook C+ (08). These elements provide a framework that guides us on the statistical (quantitative) methodological choices and statistical analysis. The second source is previous work carried out in an audit format by the JRC composite indicators expert group SP (12); CB+ (22), in which they evaluated other composite indicators.

In this paper, in an effort to fulfill transparency requirements, we evaluate the steps taken in the design of the ONS HI and the issues arising from them. Based on our findings, we highlight areas for improvement, or areas which warrant further investigation, aiming for a statistically and conceptually coherent index; these will be integrated in the future HI release. This paper is structured as follows. We start by describing the structure of the beta ONS HI for 2015-2018 and the steps taken in its construction, in section 2. In section 3, we provide an in-depth correlation analysis, which will be useful for the selection of the weights system that we introduce in section 4. The index validity is evaluated by sensitivity and uncertainty analysis in section 5. At the end of each section we conclude with features that could be improved or are worthy of further consideration. Finally, we provide discussion and conclusions in section 6.

2 The ONS Health Index

The ONS Health Index (HI) is a composite index (CI) structured in three main domains: ‘Healthy People’, ‘Healthy Lives’ and ‘Healthy Places’, see Figure 1. These domains are based on 17 subdomains, which are in turn based on 58 indicators, collected for the 149 Upper Tier Local Authorities (UTLA) in England, from 2015 to 2018. See Table 1 for full indicator and subdomain detailed descriptions (see also Table 1 in Supplementary Material). The choice of the indicators, and the definition of the 17 subdomains and three domains, were based on a comprehensive review of the contents of existing indices and frameworks, cross-referenced with existing accepted definitions of health; the proposal was then evaluated in consultation with an expert group with members from central government, local organisations, think tanks and academia Cee (20). The methodology was based on the 10 steps reported in the COIN guidance promoted by the European Joint Research Centre JRC. After collating raw data for the indicators at UTLA level, the steps taken to construct the Health Index were:

  1. data imputation;

  2. data treatment and normalization;

  3. subdomain weights computation by factor analysis;

  4. arithmetic aggregation with equal weights across subdomains and domains.

The index is computed for each UTLA, aggregated geographically to correspond to English regions, and further aggregated into an overall national figure. The index values are calculated for each year from 2015 to 2018 inclusive, with a normalised value anchored at the baseline year 2015. Full details are provided in Supplementary Material (SM).

Figure 1: The Health Index structure.
Table 1: Health Index structure: domains, subdomains and indicators

Healthy People (Pe)
  Pe.1 Mortality: life expectancy, avoidable deaths
  Pe.2 Physical health conditions: dementia, musculoskeletal conditions, respiratory conditions, cardiovascular conditions, cancer, kidney disease
  Pe.3 Difficulties in daily life: disability that impacts daily activities, difficulty completing activities of daily living (ADLs), frailty
  Pe.4 Personal well-being: life satisfaction, life worthwhileness, happiness, anxiety
  Pe.5 Mental health: suicides, depression, self-harm

Healthy Lives (Li)
  Li.1 Physiological risk factors: diabetes, overweight and obesity in adults, hypertension
  Li.2 Behavioural risk factors: alcohol misuse, drug misuse, smoking, physical activity, healthy eating
  Li.3 Unemployment: unemployment
  Li.4 Working conditions: job-related training, low pay, workplace safety
  Li.5 Risk factors for children: infant mortality, children’s social, emotional and mental health, overweight and obesity in children, low birth weight, teenage pregnancy, child poverty, children in state care
  Li.6 Children and young people’s education: young people’s education, employment and training, pupil absence, early years development, General Certificate of Secondary Education achievement
  Li.7 Protective measures: cancer screening, vaccination coverage, sexual health

Healthy Places (Pl)
  Pl.1 Access to green space: public green space, private outdoor space
  Pl.2 Local environment: air pollution, transport noise, neighbourhood noise, road safety, road traffic volume
  Pl.3 Access to housing: household overcrowding, rough sleeping, housing affordability
  Pl.4 Access to services: distance to GP services, distance to pharmacies, distance to sports or leisure facilities
  Pl.5 Crime: personal crime

The Health Index is built starting from a tensor $\mathcal{X}$ of raw data, with elements $x_{cit}$. Here, each $c\in C$ is an upper tier local authority (UTLA), for the set $C$ of $|C|=149$ UTLAs; each $i\in I$ is an indicator, for the set $I$ of $|I|=58$ indicators; and each $t\in T=\{2015,2016,2017,2018\}$ denotes the year. We are also given a partition of the set of UTLAs, $C$, into a set $R$ of $|R|=9$ regions, $r\in R$, which are disjoint subsets $r\subseteq C$ of UTLAs.

2.1 Data Imputation

We first note that $\mathcal{X}$ has missing data, which need to be imputed. Missing data were of two types: either an indicator value for a given year was completely missing for all UTLAs (see Table 2 in SM), or it was missing only in a subset of UTLAs. Briefly, if an indicator value was available for only one year, such as for ‘access to green space’, the values were imputed to be constant across all four years. If an indicator value was missing for a given year but available for the years before and after, the value was imputed as the average of the years either side of the missing year. If an indicator value was missing and only the year before or the year after was available, the value was imputed with that of the closest year. Full details of the data imputation are provided in the supplementary material.
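The imputation rules described above can be sketched for a single indicator in a single UTLA. This is a minimal illustration, not the ONS implementation: the function name `impute_years` is hypothetical, `None` marks a missing year, and the tie-breaking rule (prefer the earlier year when two observed years are equally close) is an assumption, since the full rules are only given in the supplementary material.

```python
from typing import Optional


def impute_years(values: list[Optional[float]]) -> list[float]:
    """Fill missing yearly values (None) for one indicator in one UTLA."""
    observed = [i for i, v in enumerate(values) if v is not None]
    if not observed:
        raise ValueError("no observed year to impute from")
    filled = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(v)
            continue
        before, after = i - 1, i + 1
        has_both_neighbours = (
            before >= 0
            and after < len(values)
            and values[before] is not None
            and values[after] is not None
        )
        if has_both_neighbours:
            # Years either side are observed: use their average.
            filled.append((values[before] + values[after]) / 2)
        else:
            # Otherwise copy the closest observed year
            # (assumption: ties go to the earlier year).
            nearest = min(observed, key=lambda j: (abs(j - i), j))
            filled.append(values[nearest])
    return filled
```

With years ordered 2015 to 2018, `impute_years([None, 5.0, None, None])` holds the single observed value constant, while `impute_years([1.0, None, 3.0, 4.0])` fills 2016 with the average of 2015 and 2017.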

2.2 Data treatment and normalization

Once the missing data has been imputed, the completed tensor $\mathcal{X}=(x_{cit})$ is decomposed into $|I|=58$ flattened data sets, $\boldsymbol{X}_{i}=\{x_{cit}:c\in C,t\in T\}$ for each $i\in I$. Using the data transformations $f_{i}$ listed in Supplementary Table 3 for each indicator, $i$, the raw indicator data is transformed to $\boldsymbol{Y}_{i}=\{y_{cit}=f_{i}(x_{cit}):c\in C,t\in T\}$. The assignment of each transformation, $f_{i}$, to an indicator, $i$, is selected to minimise the absolute values of skewness and kurtosis of $\boldsymbol{Y}_{i}$, aiming for absolute skewness $\leq 2$ and absolute kurtosis $\leq 3.5$. By minimising (absolute) skewness and kurtosis, we aim to ensure that the transformed data $\boldsymbol{Y}_{i}$ is approximately normally distributed. For 18 indicators, the skewness and kurtosis of $\boldsymbol{X}_{i}$ were already optimal; the remaining 40 indicators were transformed, and of these 18 were log-transformed (see Table 3 in SM).
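The selection of a transformation by skewness and kurtosis can be illustrated as follows. This is a sketch under stated assumptions: the candidate set (`none`, `sqrt`, `log`) and the score `|skewness| + |kurtosis|` are hypothetical choices made for illustration (the transformations actually used are listed in Supplementary Table 3), kurtosis is taken as excess kurtosis, and the data are assumed strictly positive.

```python
import math


def skewness_kurtosis(values):
    """Moment-based skewness and excess kurtosis of a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def within_limits(values, max_skew=2.0, max_kurt=3.5):
    """True if |skewness| <= 2 and |kurtosis| <= 3.5 (the targets in the text)."""
    skew, kurt = skewness_kurtosis(values)
    return abs(skew) <= max_skew and abs(kurt) <= max_kurt


# Hypothetical candidate set, for illustration only.
CANDIDATES = {"none": lambda v: v, "sqrt": math.sqrt, "log": math.log}


def choose_transformation(values):
    """Keep the raw data if already within limits; otherwise pick the
    candidate with the smallest |skewness| + |kurtosis| (assumed score)."""
    if within_limits(values):
        return "none"

    def score(name):
        skew, kurt = skewness_kurtosis([CANDIDATES[name](v) for v in values])
        return abs(skew) + abs(kurt)

    return min(CANDIDATES, key=score)
```

For strongly right-skewed toy data such as `[math.exp(k / 2) for k in range(40)]`, this sketch selects the log transformation, whereas evenly spread data are left untransformed.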

The normalization step in the ONS Health Index accounts for time and geography, and allows indicators to be compared on the same scale, weighting by the UTLA populations. The normalization transforms elements $y_{cit}$ of $\mathcal{Y}$ into z-scores,

$$z_{cit}=(-1)^{\delta_{i}}\left[\frac{y_{cit}-\mu_{i}}{\sigma_{i}}\right],$$

which then define the elements of the tensor $\mathcal{Z}=(z_{cit})$. For each indicator, $i$, we specify $\delta_{i}=0$ or $\delta_{i}=1$ to ensure that larger positive values for $z_{cit}$ correspond to improved health, a property which we term as being health directed. Note that the mean and standard deviation $\mu_{i}$ and $\sigma_{i}$ for each indicator, $i$, are taken to be the population-weighted mean and standard deviation of $y_{cit}$ for the chosen baseline year across UTLAs $c\in C$, fixing $t=2015$. Finally, given the $z$-scores $z_{cit}$ forming the tensor $\mathcal{Z}$, the ONS Health Index presents the $z$-scores as Health Index values,

$$h_{cit}=H(z_{cit})=100+10z_{cit},$$

which are translated and rescaled $z$-scores, such that $h_{cit}=100$ means that the transformed value, $y_{cit}$, for indicator $i$ in the UTLA $c$ in year $t$ is equal to the weighted mean, $\mu_{i}$.
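The normalization can be sketched for a single indicator as below. The function names and the data layout (a dictionary of yearly lists, one entry per UTLA) are hypothetical, and the population-weighted standard deviation is computed without any bias correction, which is an assumption not confirmed by the text.

```python
import math


def weighted_mean_sd(values, weights):
    """Population-weighted mean and standard deviation (no bias correction)."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(var)


def health_index_values(y_by_year, population, baseline_year, higher_is_worse):
    """Map transformed values y (one list per year, one entry per UTLA)
    to index values h = 100 + 10 z, anchored at the baseline year."""
    mu, sigma = weighted_mean_sd(y_by_year[baseline_year], population)
    sign = -1.0 if higher_is_worse else 1.0  # plays the role of (-1)^delta_i
    return {
        year: [100.0 + 10.0 * sign * (y - mu) / sigma for y in values]
        for year, values in y_by_year.items()
    }
```

In a toy example with two equally populated UTLAs and baseline values 1 and 3, the baseline mean is 2 and the standard deviation is 1, so the 2015 index values are 90 and 110; setting `higher_is_worse=True` swaps them.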

2.3 Subdomain weights computation: a time-series factor analysis

The ONS has chosen to compute weights using a time-series factor analysis. The fundamental assumption of factor analysis is that there is a latent factor that underpins the variables in a group. This translates to this level of the Health Index: ONS assumed that there is a single unobserved variable that underpins the indicators within each subdomain. Highly correlated indicators within each subdomain could lead to double counting in the index, so factor analysis directly addresses this issue, accounting for the correlation between indicators in their implied weights DL (13).

To maintain the same weights for all the years considered (2015-18), a time-series factor analysis was applied. The rationale was that weights estimated separately for each year would change with each additional year of data, whereas accounting for all the years jointly avoids this. As such, the weights would need to be calculated for a set time period, e.g. 2015 to 2019, and these weights would be held constant until a review date. This ensured that (i) the indicators selected matched the underlying factor (subdomain) over time, and (ii) the factor loadings could then be scaled and used as data-driven weights.

In practice, the normalized data $\mathcal{Z}_{CT}=(z_{ct})$ are collapsed by year and then rescaled to $(0,1)$. Next, for each subdomain $d\in D$ in turn, a factor analysis on the indicators $i\in d$ is carried out, allowing for one factor estimated using a maximum likelihood method. The weights $w_{i}$ for indicators $i\in I$ are then taken as the loadings on this factor, in absolute value, rescaled to sum to one within the subdomain. For example, for a subdomain $d=\{i_{1},i_{2}\}$ comprised of two indicators, suppose the factor loadings are 0.5 and 0.75. We would then set the weights $w_{i_{1}}=0.5/1.25=0.4$ and $w_{i_{2}}=0.75/1.25=0.6$. In the supplementary material, we address the weights constraints, taking into account the different aggregation levels.
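The rescaling of loadings into weights in the worked example can be written as a short function. This sketch covers only the final step (the factor analysis itself is not shown), and the function name `loadings_to_weights` is hypothetical.

```python
def loadings_to_weights(loadings):
    """Turn the loadings on a single factor into weights that sum to one,
    using their absolute values."""
    magnitudes = [abs(value) for value in loadings]
    total = sum(magnitudes)
    return [m / total for m in magnitudes]


# Worked example from the text: loadings 0.5 and 0.75.
weights = loadings_to_weights([0.5, 0.75])
```

Because only absolute values are used, a negative loading receives the same weight as a positive loading of the same size.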

2.4 Arithmetic aggregation with equal weights across subdomains and domains

The final step is the arithmetic aggregation of the index, where there are equal weights for subdomains $w_{s}$ and domains $w_{d}$, while indicator weights are derived from a factor analysis. All the weights have been chosen as positive and summing to one, for all the different aggregation levels. The Health Index, at the hierarchical levels of indicators, subdomains, domains and overall, is then computed for each year at geographical levels of UTLAs, regions and the nation, where the geographical aggregations at the regional and national levels are population-weighted.
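The hierarchical aggregation can be sketched for one UTLA and one year as follows. The nested-dictionary layout, the function names and the toy indicator labels are hypothetical; the sketch assumes indicator weights within a subdomain come from the factor analysis, with equal weights for subdomains and domains, and population weights for regions.

```python
def weighted_average(values, weights):
    """Weighted arithmetic mean; weights need not be pre-normalized."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def utla_index(structure, scores, indicator_weights):
    """structure: {domain: {subdomain: [indicator, ...]}};
    scores and indicator_weights: {indicator: value}."""
    domain_scores = []
    for subdomains in structure.values():
        sub_scores = [
            weighted_average(
                [scores[i] for i in indicators],
                [indicator_weights[i] for i in indicators],
            )
            for indicators in subdomains.values()
        ]
        # Equal weights across subdomains within a domain.
        domain_scores.append(sum(sub_scores) / len(sub_scores))
    # Equal weights across domains.
    return sum(domain_scores) / len(domain_scores)


def regional_index(utla_values, populations):
    """Population-weighted aggregation of UTLA values to a region."""
    return weighted_average(utla_values, populations)


# Toy example with made-up indicator names and values.
structure = {"People": {"Pe.1": ["a", "b"]}, "Places": {"Pl.1": ["c"], "Pl.5": ["d"]}}
scores = {"a": 100.0, "b": 110.0, "c": 90.0, "d": 100.0}
indicator_weights = {"a": 0.4, "b": 0.6, "c": 1.0, "d": 1.0}
overall = utla_index(structure, scores, indicator_weights)
```

In this toy case the People domain scores 106 and the Places domain 95, giving an overall value of 100.5.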

2.5 The Health Index ranking distribution


Figure 2: The 2015 Health Index ordered by UTLA ranking, jointly with Healthy Lives, Healthy People and Healthy Places indexes, and green bars indicating the minimum and maximum value of the domains.

For the year 2015, for each UTLA, we plot each domain’s Health Index values, ordering the UTLAs by the overall Health Index ranking, in Figure 2. It emerges that Lincolnshire, Leeds and Staffordshire have all three domain index values concentrated at the same values. In contrast, Westminster (the UTLA with the largest difference in domain indexes) presents Healthy People at 109, similar to Kensington and Chelsea, but Healthy Places at 82. Westminster and Blackpool present similar values for Healthy Places and Healthy People, but their ranking is significantly different. It is interesting to note that Healthy Lives sits within the range defined by Healthy Places and Healthy People. Similar patterns are observed for the following years, as reported in the SM (see Figures 5-7).

2.6 A modified ONS Health Index

Before investigating the HI and carrying out further analyses (correlation and sensitivity/robustness analysis), we implemented a slight change to the original HI as presented above. As pointed out in C+ (08), a certain coherence in the methods needs to be preserved to create a statistically sound index. This change was made to avoid statistical misinterpretation, because not all the combinations of data transformation and subsequent data operations carried out in the ONS version can be properly interpreted.

Hence, we have computed a modified ONS HI version. We begin from the imputed matrices $\boldsymbol{X}_{i}$ for each indicator, $i$. Then, instead of directly selecting and applying transformations $f_{i}$ to ensure normality, we accounted for kurtosis and skewness using winsorization first and then by transforming. This approach resulted in only 5-7 variables per year that have been log-transformed (Table 4 in SM). We proceed to standardize using a z-score (following the ONS), and then aggregate with an arithmetic mean and equal weights (see Table 5 in SM for comparison).
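One way to winsorize until skewness and kurtosis fall within limits is sketched below. This is an illustration rather than the exact procedure used here: the function name, the cap of five treated points and the use of excess kurtosis are assumptions, and the limits of 2 and 3.5 are the targets quoted earlier in the text.

```python
def skewness_kurtosis(values):
    """Moment-based skewness and excess kurtosis of a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def winsorize(values, max_skew=2.0, max_kurt=3.5, max_points=5):
    """Pull in one extreme point at a time until both limits are met.

    Returns the treated values and a flag that is False when the limits
    are still exceeded after max_points points have been treated (at
    which stage a transformation could be tried instead).
    """
    ordered = sorted(values)
    low_cut, high_cut = 0, 0  # points pulled in on each tail
    treated = list(values)
    while True:
        skew, kurt = skewness_kurtosis(treated)
        if abs(skew) <= max_skew and abs(kurt) <= max_kurt:
            return treated, True
        if low_cut + high_cut == max_points:
            return treated, False
        if skew >= 0:
            high_cut += 1  # right tail is heavier: cap the largest point
        else:
            low_cut += 1   # left tail is heavier: raise the smallest point
        lower, upper = ordered[low_cut], ordered[-(high_cut + 1)]
        treated = [min(max(v, lower), upper) for v in values]
```

For example, applied to the values 1 to 20 plus a single outlier of 1000, the outlier is replaced by 20 and all other values are left unchanged.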

We opted for a less strict data transformation, as this would not have changed the interpretation of the aggregation formula. As it stands at the moment, the ONS data transformations included 40 indicators, with 18 indicators log-transformed. By aggregating all transformed variables using an arithmetic mean, the untransformed variables are effectively aggregated via a mix between a geometric mean (for log-transformed variables) and an arithmetic mean (for other variables). As succinctly summarised by NSST (05), “when the weighted variables in a linear aggregation are expressed in logarithms, this is equivalent to the geometric aggregation of the variables without logarithms. The ratio between two weights indicates the percentage improvement in one indicator that would compensate for a one percentage point decline in another indicator. This transformation leads to attributing higher weight for a one unit improvement starting from a low level of performance, compared to an identical improvement starting from a high level of performance.”
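The equivalence described in the quotation can be written out explicitly. As a sketch, assume positive raw values $x_{i}$ and positive weights $w_{i}$, and ignore the subsequent z-score normalization; the index sets $L$ (log-transformed indicators) and $A$ (all other indicators) are notation introduced here for illustration only. For the log-transformed indicators,

$$\sum_{i\in L}w_{i}\log x_{i}=\log\prod_{i\in L}x_{i}^{w_{i}},$$

so their linear aggregation is the logarithm of a weighted product (a weighted geometric mean when the weights in $L$ sum to one). A linear aggregate over both groups therefore mixes the two forms,

$$\sum_{i\in L}w_{i}\log x_{i}+\sum_{j\in A}w_{j}x_{j}=\log\prod_{i\in L}x_{i}^{w_{i}}+\sum_{j\in A}w_{j}x_{j}.$$

Differentiating the first term gives $w_{i}\,\mathrm{d}x_{i}/x_{i}$, which is why a unit improvement counts for more when $x_{i}$ is low than when it is high.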

We used this modified version as the starting point for the rest of this paper. The comparison of z-scores (see Figure 2 in SM) between this modified version and the original ONS version shows that several indicators have more outliers below the 25th percentile, but overall there are no major discrepancies in values. Nevertheless, this modified version generated a different ranking, which affected the UTLAs in the middle, while the top and bottom UTLAs remained unaffected (see Table 4 in SM). The biggest upward shift in ranking is observed for Barking and Dagenham (which moved up 49 positions), whereas Westminster, Herefordshire and Shropshire shifted down the rankings by, respectively, 50, 48 and 51 positions. Overall, 52% of the UTLAs shifted by at most 10 ranking positions in absolute value, 38% shifted by 20–30 positions and only 9% shifted by more than 31 positions. Only Blackpool, Kingston upon Hull, City of Northampton and Hertfordshire kept the same ranking in comparison to the original ONS HI version. All the analysis was conducted in R version 4.2 R C (22) with the COINr package W (21), by Becker.

2.7 Proposal

  • We suggest adopting winsorization, as it takes care of the outliers and provides a robust approach to ensure that kurtosis and skewness are within the acceptable limits, without heavy mathematical data transformations. This also preserves statistical coherence in the aggregation and its interpretation.

3 Correlation analysis

The core of every composite index is its indicators, which have to be selected carefully to represent the dimensions of the phenomenon that we are trying to summarize. Hence, correlation analysis plays a dual, crucial role in composite indicator construction. First, statistical analyses anchored on the correlation - such as principal components analysis, factor analysis and Cronbach’s alpha - are all suitable to assess whether the selected indicators appropriately represent the statistical dimensions, i.e. whether the theoretical constructs are supported by the data. Second, it is useful to identify highly correlated indicators (subdomains and domains), to highlight data redundancy and potential structural issues.

Ideally each indicator (this is also true for subdomains and domains) should be positively and moderately correlated with the others, while high inter-correlations may indicate a multi-collinearity problem, and collinear terms should be combined or otherwise eliminated. Negative correlations are an undesirable feature in a CI; however, they may occur at different hierarchical levels of the index. For example, if an indicator is negatively correlated, it can be removed. If domains or subdomains show negative correlation, then aggregation by geometric or arithmetic mean should be discarded, as it would insert an element of trade-off where units that perform well in one domain have their overall performance affected by poor performance in another domain. To explain how negative correlations affect the composite index, Saisana et al. SP (12) reviewed the Sustainable Society Index (SSI). The index - similarly composed to the HI - has three main domains: Human, Environmental and Economic wellbeing. Human and Environmental wellbeing show negative correlation, as in many countries Human and Economic wellbeing go hand in hand, at the expense of the Environment. Their review suggested that these correlations are a sign of a trade-off, whereby many countries that perform poorly on the Environment perform well on all other categories and vice versa; therefore each domain should be presented on its own in a scoreboard and not aggregated. A similar pattern appears for Blackpool and Westminster in Figure 5: Westminster has the lowest Places value, whereas Blackpool has the lowest People value but not the lowest Places value. We will explore further the trade-off and correlation and their role in the definition of weights, but first we provide an extended correlation analysis of the HI.

3.1 Health Index Correlation analysis

We used our modified version of the Health Index to carry out a correlation analysis (Pearson) at the different levels of aggregation. The correlation analysis provides insights on the potential redundancy of those indicators with high correlation ($\rho\geq 0.9$); negative correlation ($\rho\leq-0.4$) also indicates some conceptual problems. Acceptable correlation values are weak ($0.3<\rho\leq 0.4$) and moderate ($0.3\leq\rho<0.9$). The ideal situation would be to have indicators positively correlated among themselves (0.3-0.9), and not highly correlated with other subdomains, as this could impact weights and aggregation. In Figure 3, indicators grouped in subdomains show overall positive correlations. However, there are some correlations of concern. For example, public and private green space, which define the subdomain ‘Access to green space (Pl.1)’, show negative correlations, and in the subdomain ‘Access to services (Pl.4)’ distance to the nearest pharmacy and distance to the nearest general practitioner (GP) are highly correlated. We suspected that this could be related to the urban/rural UTLA definition. Cardiovascular and respiratory prevalence are highly correlated in the subdomain ‘Physical health conditions (Pe.2)’. The indicators in ‘Behavioural risk factors (Li.2)’ and ‘Working conditions (Li.4)’ present negative and weak correlations. In this heatmap, we also see correlation among the subdomains, appearing as blocks. For example, ‘Risk factors for children (Li.5)’ and ‘Children and young people’s education (Li.6)’ are correlated, as are ‘Physical health conditions (Pe.2)’ and ‘Difficulties in daily life (Pe.3)’.
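The screening of indicator pairs against the thresholds above can be sketched as follows. The function names and the toy data are hypothetical; only the two thresholds of concern from the text ($\rho\geq 0.9$ and $\rho\leq -0.4$) are applied.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def flag_pairs(indicators, high=0.9, negative=-0.4):
    """Flag indicator pairs that are potentially redundant (rho >= high)
    or negatively correlated (rho <= negative)."""
    names = list(indicators)
    flags = {}
    for position, first in enumerate(names):
        for second in names[position + 1:]:
            rho = pearson(indicators[first], indicators[second])
            if rho >= high:
                flags[(first, second)] = "redundant"
            elif rho <= negative:
                flags[(first, second)] = "negative"
    return flags


# Toy data (made up): one value per unit for each indicator.
toy = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 11],
    "c": [5, 3, 4, 1, 2],
    "d": [1, 3, 2, 5, 4],
}
flags = flag_pairs(toy)
```

Here the pair (a, b) is flagged as potentially redundant and (a, c) as negatively correlated, while (a, d) is moderately correlated and is not flagged.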

From the subdomain correlation map (see Figure 4), we immediately see that the indicator ‘Household overcrowding’ is highly correlated with the subdomain ‘Local environment (Pl.2)’. Finally, we correlated subdomains with domains (see Figure 3 in SM) and found that People subdomains are overall well correlated with the other subdomains within their domain. Lives and Places are similar but present some weak correlations: ‘Access to services (Pl.4)’, ‘Unemployment (Li.3)’ and ‘Difficulties in daily life (Pe.3)’. This confirms what we have observed in the indicators heatmap.


Figure 3: Correlation heatmap for indicators grouped in subdomains


Figure 4: Correlation heatmap for indicators and subdomains, grouped by subdomains mean.


Figure 5: UTLA Domains index scatter plots with fitted linear regression (2015): (A) People vs Places, (B) Lives vs Places, (C) Lives vs People.

The panels in Figure 5 show the scatter plots for the three domains. Healthy Lives and Healthy People have a high Pearson correlation ($\rho=0.65$), while the correlations between Healthy Lives and Healthy Places ($\rho=-0.12$) and between Healthy People and Healthy Places ($\rho=-0.39$) are negative, a situation similar to that described for the SSI by SP (12). Once we removed London’s UTLAs, which are characterized by high values of People and Lives and low values of Places, the correlation between Lives and People increases ($\rho=0.72$), the correlation between Lives and Places is close to null ($\rho=-0.06$), and the negative correlation between People and Places weakens ($\rho=-0.25$).

3.2 Proposal

  • We suggest removing the public and private green space indicators due to their high negative correlation.

  • We suggest revising the indicators defining ‘Behavioural risk factors’. Physical activity is correlated with alcohol misuse and smoking and not with healthy eating, which is correlated with drug misuse. The subdomain should be split and re-organized. Drug misuse and healthy eating point toward unemployment and, more generally, toward some measure of societal inequality.

  • Subdomains ‘Risk factors for children’ and ‘Children and young people’s education’ could be merged into a single block, as the indicators are highly correlated.

  • Cardiovascular, respiratory and hypertension indicators could be combined, as they are highly correlated and therefore bring redundancy.

4 The choice of a weight system

In this section, we introduce the choice of a weights system that could be employed in the linear aggregation formula that generates the composite index. We review the definitions and how the weights can be interpreted. We then evaluate what role this plays for the correlation at different levels (indicators/subdomains/domains) and we describe the optimized method BSPV (17), which generates weights that account for correlations.

We also compare the time-series factor analysis weights currently used in the ONS HI with weights generated by principal component analysis (PCA). We introduce them here because we use the PCA weights and the optimized weights as options in our sensitivity and uncertainty analysis.

4.1 Weights definitions: compensatory versus non-compensatory

In standard practice JRC; MN (05), the composite indicator for time $t$ is defined as:

$$z_{t}=\sum_{c\in C}w_{ct}z_{ct},$$

where $c$ indexes the indicators and $C$ is the set of indicators being composed (which, in the context of the ONS HI, may correspond to subdomains, domains, or the overall index); note that here, unlike in Section 2, $c$ does not denote a UTLA. Thus the composite indicator is a weighted linear aggregation, where weights are (typically) constrained to sum to 1.

In the composite index literature GITT (19), weights methods are often found to be linear, geometric or multi criteria, or classified into compensatory and non-compensatory approaches. However, the major difference in weight systems boils down to defining weights either as coefficients that address the importance of a variable (indicator/subdomains/domains) or as a trade-off coefficient.

Weights that convey ‘importance’ should be used in aggregation formulae that do not allow for compensability, that is, the possibility that poor performance in some indicators is offset by sufficiently high values of other indicators. Aggregation formulae that do allow for it are known as compensatory, because the ‘compensation’ refers to a willingness to allow high performance on one variable (subdomain/domain) to compensate for low performance on another. The weighted mean (arithmetic/geometric) is a classic example of a compensatory approach, where the weight is a de facto trade-off coefficient.

Non-compensatory methods allow the weights to express ‘importance’, where the greatest weight is placed on the most important ‘dimension’ Vin (92); Van (90). These approaches have their roots in social choice theory (also known as multicriteria) and more details can be found in M+ (08). Briefly, in this framework, indicator (subdomain/domain) values rank the countries in different ways and contribute to defining the relative performance of each country/option with respect to each of the other countries/options. This ranking of units by indicator generates an impact matrix, and a voting system must be put in place to define the overall ranking. For example, the ‘plurality vote’ will rank first the unit (UTLA) that is ranked first on the largest number of indicators. However, this approach comes at the price of dealing with preferences and choices on how to select the final ranking given the indicator rankings M+ (08). Two popular approaches, named after their authors, are relevant here: a Condorcet approach is necessary when weights are to be understood as importance coefficients, while a Borda approach is desirable when weights are meaningful in the form of trade-offs.
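To make the voting rules concrete, the sketch below applies a plurality vote and a Borda count to a toy impact matrix. The data, unit names and function names are made up for illustration, each indicator is given an equal vote, and ties are not handled.

```python
def rank_positions(scores):
    """Return {unit: position}, with position 0 for the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {unit: position for position, unit in enumerate(ordered)}


def plurality_winner(impact):
    """impact: {indicator: {unit: score}}.
    The winner is the unit ranked first on the most indicators."""
    firsts = {}
    for scores in impact.values():
        top = max(scores, key=scores.get)
        firsts[top] = firsts.get(top, 0) + 1
    return max(firsts, key=firsts.get)


def borda_ranking(impact):
    """Each indicator awards n-1 points to its best unit, n-2 to the next,
    and so on; units are ordered by total points."""
    units = list(next(iter(impact.values())))
    points = {unit: 0 for unit in units}
    for scores in impact.values():
        positions = rank_positions(scores)
        for unit, position in positions.items():
            points[unit] += len(positions) - 1 - position
    return sorted(points, key=points.get, reverse=True)


# Toy impact matrix: A is best on three indicators but worst on two.
impact = {
    "i1": {"A": 3, "B": 2, "C": 1},
    "i2": {"A": 3, "B": 2, "C": 1},
    "i3": {"A": 3, "B": 2, "C": 1},
    "i4": {"A": 1, "B": 3, "C": 2},
    "i5": {"A": 1, "B": 3, "C": 2},
}
```

In this toy case the plurality vote selects A, which is first on three of five indicators, while the Borda count places B first (7 points against 6 for A and 2 for C), showing how the final ranking depends on the voting rule.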

These methods, while valuable, are harder to implement, as they require an expert panel to grade the indicators in the first place, and they are also less immediately explainable than a weighted mean. The dual notions of weighting as importance versus weighting as trade-off, and their interpretation, require more consideration to ensure that the selected weights are in line with the practitioner’s preferences. In their articles, Munda and Nardo MN (09, 05) provide extensive commentary on an interesting misinterpretation around the weights/aggregation combination that gets buried in the CI construction, but is useful to address here.

4.2 Weights as ‘importance’ coefficients, linear aggregation and correlation

According to the OECD guidelines Fre (03): “Greater weight should be given to components which are considered to be more significant in the context of the particular composite”. As pointed out by Greco et al. GITT (19), in the popular linear aggregation the weights are used as if they were importance coefficients, while they are in fact trade-off coefficients.

Briefly, the authors MN (05, 09) state that in linear and geometric aggregation the weights play the role of a trade-off ratio that depends on the scale of measurement. If a weight is to be interpreted as a measure of importance, then it should be connected with the indicator itself and not with its quantification; that is, it should be invariant to the units of the indicator. This distinction between weights as trade-off ratios and weights as importance does not disappear even when all indicators are on the same scale. For a weight to express ‘importance’, non-compensability should be enforced. This issue becomes relevant when CIs are composed of different data for multicriteria optimization, where improvement in one domain cannot compensate for degradation in another. One way to disentangle this paradox of trade-off weights interpreted as importance weights is proposed by Becker et al. BSPV (17). In order to derive weights as ‘explicit importance’, we need to evaluate the correlation structure and use it to understand the ‘importance’ role of the domains/subdomains in the composite indicator and the influence of each indicator on the index, thereby generating optimized weights.

4.3 Optimized compensatory weights

In a weighting system where weights are meant to represent ‘explicit importance’, differing variances and correlations among indicators (subdomains/domains) prevent the nominal weights from reflecting that importance, as shown above in the correlation analysis.

To find weights that reflect importance and not trade-off ratios, conditioned on the correlations, we follow the methods introduced by Becker et al. BSPV (17). We recall that for the Health Index, domains and subdomains have equal weights, while indicators have data-driven weights derived by FA. If we take the equal weights choice as a way to express equal importance of the three domains, we need to account for each variable’s influence on the output, and for how weights can be assigned to reflect the desired importance, ‘conditioned’ on the information shared among the domains. Knowing the correlation among domains can help to reduce uncertainty, as strong correlations suggest that the domains should be treated jointly rather than individually. This can help in reassessing the weights.

A measure of importance, capturing the dependence between the CI and each domain, starts from analysing the correlation ratio $S_{d}$, also known as the first order sensitivity index or main effect. We split the correlation ratio into two parts, a correlated part $S_{d}^{c}$ and an uncorrelated part $S_{d}^{u}$, such that

$$S_{d}=S_{d}^{c}+S_{d}^{u},$$

where $d=1,2,3$ indexes the three domains.

A large value of $S_{d}$ with a relatively low uncorrelated part $S_{d}^{u}$, such that $S_{d}\approx S_{d}^{c}$, indicates that the domain’s contribution to the index variance is due almost entirely to its correlation with the other domains MT (12). However, if $S_{d}^{c}$ is negative, this implies conceptual problems with one of the domains, and is not a desirable feature in composite indicators.

The optimized weights were introduced by Becker et al. BSPV (17). It is important to note that, while we have applied this approach at the domain level, the same methodology can be applied to other levels of the hierarchical structure of the Health Index, i.e. for aggregating indicators into subdomains and subdomains into domains. Briefly, we first estimate $S_{d}$ and $S_{d}^{u}$ by implementing a series of linear and non-linear regressions (using splines Woo (01)). The steps to compute the two summands of the correlation ratio are the following:

  1. Estimate $S_{d}$ using a nonlinear regression of $y$ on $x_{d}$.

  2. Perform a regression of $x_{d}$ on $x_{\sim d}$. This can be either linear (using multivariate linear regression) or nonlinear (using a multivariate Gaussian process). Denote this fitted regression as $\hat{x}_{d}$.

  3. Get the residuals of this regression, $\hat{z}_{d}=x_{d}-\hat{x}_{d}$.

  4. Estimate $S_{d}^{u}$ by a nonlinear regression of $y$ on $\hat{z}_{d}$, using the same approach as in step 1.

  5. The correlated part is then given by the simple expression $S_{d}^{c}=S_{d}-S_{d}^{u}$.
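The five steps above can be sketched in a few lines of standard-library Python. This is a simplified illustration, not the implementation used for the Health Index: ordinary linear regression is swapped in for the spline and Gaussian process regressions, so $S_{d}$ reduces to a squared Pearson correlation, and the three "domains" are synthetic variables sharing a common factor. The function names are hypothetical.

```python
import random


def mean(v):
    return sum(v) / len(v)


def r_squared(y, x):
    """R^2 of a simple linear regression of y on x (squared Pearson correlation)."""
    my, mx = mean(y), mean(x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_residuals(target, predictors):
    """Residuals of a linear regression of target on predictors, with intercept."""
    n = len(target)
    cols = [[1.0] * n] + predictors
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * t for p, t in zip(ci, target)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * c[k] for b, c in zip(beta, cols)) for k in range(n)]
    return [t - f for t, f in zip(target, fitted)]


def decompose(y, x, d):
    """Return (S_d, S_d^u, S_d^c) for variable d, assuming linear dependence."""
    others = [x[j] for j in range(len(x)) if j != d]
    s_d = r_squared(y, x[d])           # step 1
    z_d = ols_residuals(x[d], others)  # steps 2 and 3
    s_u = r_squared(y, z_d)            # step 4
    return s_d, s_u, s_d - s_u         # step 5


# Synthetic data: three positively correlated variables and their equal-weight mean.
random.seed(0)
n = 500
common = [random.gauss(0, 1) for _ in range(n)]
x = [[c + random.gauss(0, 1) for c in common] for _ in range(3)]
y = [sum(col[k] for col in x) / 3 for k in range(n)]

results = [decompose(y, x, d) for d in range(3)]
for d, (s_d, s_u, s_c) in enumerate(results, start=1):
    print(f"variable {d}: S_d={s_d:.3f}  S_d^u={s_u:.3f}  S_d^c={s_c:.3f}")
```

With positively correlated inputs such as these, most of each $S_{d}$ is carried by the correlated part; a negative $S_{d}^{c}$ would require negatively correlated inputs.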

The weights that yield the desired importance are then estimated numerically, using an optimisation algorithm. If $\tilde{S}_{d}=S_{d}/\sum_{d=1}^{D}S_{d}$ is the normalised correlation ratio of $x_{d}$, then the target normalised correlation ratio is $\tilde{S}_{d}^{\star}$, where it is assumed that $\tilde{S}_{d}^{\star}=w_{d}$, the weight chosen (in our case equal weights) to reflect the importance. Once these quantities have been computed, and given the equal weights (or any other user-provided weighting system), the optimized weights are obtained by minimizing the objective function

$$w_{opt}=\operatorname*{arg\,min}_{w}\sum_{d=1}^{D}\left(\tilde{S}_{d}^{\star}-\tilde{S}_{d}(w)\right)^{2}.$$

Figure 6: Estimates of $S_{d}$ (full bars), broken down into correlated $S_{d}^{c}$ and uncorrelated $S_{d}^{u}$ parts, using linear and non-linear dependence modelling.

The results of this approach are reported in Figure 6. Our first observation is that the linear and non-linear estimates of the correlation ratio $S_{d}$ are fairly similar for all domains, indicating that linear estimates would have been sufficient to capture the dependence among the three domains. Recall that we would like a low $S_{d}^{c}$ and a high $S_{d}^{u}$, both positive. Instead, the correlated part dominates in Healthy Lives, indicating that Healthy Lives on its own has a small impact on the composite index, as its contribution is mostly attributable to the correlation with the other domains. For Healthy People both components contribute equally and positively. Healthy Places has a negative correlated part, similar in magnitude to the uncorrelated part, and this negative $S_{d}^{c}$ value implies potential problems in the composite index. Problematic behaviour for Healthy Places was to be expected, given what we observed in the correlation analysis.

Having unpacked the correlation among the domains, we can use this information to find a new set of weights that truly reflects the importance of each variable in the CI, but that are close to the importance distribution we have specified - in our case equal importance (each domain 0.33 weight). The optimization algorithm finds optimal weights of 0.45 for Healthy People, 0.16 for Healthy Lives, and 0.73 for Healthy Places. These weights will be used subsequently for a sensitivity and uncertainty analysis.
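The weight search can be illustrated with a minimal sketch under simplifying assumptions: dependence is taken to be linear, so that for $y=\sum_{d}w_{d}x_{d}$ the correlation ratio is the squared correlation between $y$ and $x_{d}$, and a coarse grid search over the weight simplex stands in for a proper optimisation routine. The correlation matrix is hypothetical and is not the Health Index domain correlation matrix; the function names are also illustrative.

```python
def normalised_ratios(w, cov):
    """Linear correlation ratios of y = sum_d w_d x_d with each x_d, scaled to sum to 1."""
    dims = range(len(w))
    cov_yx = [sum(cov[d][e] * w[e] for e in dims) for d in dims]
    var_y = sum(w[d] * cov_yx[d] for d in dims)
    s = [cov_yx[d] ** 2 / (cov[d][d] * var_y) for d in dims]
    total = sum(s)
    return [v / total for v in s]


def objective(w, cov, target):
    """Squared distance between target and achieved normalised correlation ratios."""
    achieved = normalised_ratios(w, cov)
    return sum((t - s) ** 2 for t, s in zip(target, achieved))


# Hypothetical correlation matrix of three standardised domains.
cov = [
    [1.0, 0.6, 0.1],
    [0.6, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
target = [1 / 3, 1 / 3, 1 / 3]  # desired equal importance

# Grid search over positive weights summing to one, in steps of 0.01.
best_w, best_obj = None, float("inf")
for i in range(1, 99):
    for j in range(1, 100 - i):
        w = [i / 100, j / 100, (100 - i - j) / 100]
        value = objective(w, cov, target)
        if value < best_obj:
            best_w, best_obj = w, value

print("equal-weight objective:", round(objective(target, cov, target), 5))
print("optimised weights:", best_w, "objective:", round(best_obj, 7))
```

In this toy setting the third variable is only weakly correlated with the other two, so it needs a nominal weight above one third to reach the same influence on the index.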

4.4 Principal Component Analysis derived weights

While there is no objective choice in selecting the weights, we concentrate on a so-called data-driven weighting system, derived from Principal Component Analysis (PCA) or Factor Analysis (FA). In the context of composite indicator construction, these two methods can be applied at different steps thanks to their versatile interpretation: to identify dimensions, to cluster indicators and to define weights. While PCA and FA share several methodological aspects, there is a key difference between the two analyses. PCA is a data reduction method based on the correlation matrix, which defines a new set of uncorrelated variables as linear combinations of the original variables. In contrast, FA is a measurement model of a latent variable, where the latent factor ‘causes’ the observed variables. There is a recommendation in the CI community ST (02) to use the PCA loadings as weights only if the first component accounts for at least 70% of the total variability. We applied this procedure to derive the weighting systems for subdomains. The 58 indicators are split into 17 subdomains (see Table 1), and for each of these subdomains we carried out a PCA for each year.

For most subdomains, over all four years, the first principal component accounted for between 51% and 94% of the total variability. Exceptions falling below the 70% threshold were observed (see Table 6 in SM) for ‘Mental health (Pe.4)’ with variance explained 66-69%, ‘Behavioural risk factors (Li.1)’ 53-55%, ‘Working conditions (Li.4)’ 50-55%, ‘Risk factors for children (Li.5)’ 55%, ‘Children and young people’s education (Li.6)’ 63-69% and ‘Access to housing (Pl.3)’ 65-69%. We then normalized the loading coefficients and compared them over time, jointly with the weights originally derived from FA for all the years collapsed.
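A minimal sketch of the PCA step for a single subdomain is given below. It assumes a hypothetical correlation matrix for three indicators, extracts the first principal component by power iteration, checks the share of variance explained against the 70% rule, and normalises the squared loadings so that they sum to one. The squared-loading normalisation is one common convention and is an assumption here, since the exact normalisation is not spelled out above.

```python
import math


def first_component(corr, n_iter=500):
    """Leading eigenvalue and unit eigenvector of a correlation matrix (power iteration)."""
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    rv = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(a * b for a, b in zip(v, rv))
    return eigenvalue, v


# Hypothetical correlation matrix of three indicators in one subdomain.
corr = [
    [1.00, 0.80, 0.70],
    [0.80, 1.00, 0.75],
    [0.70, 0.75, 1.00],
]

eigenvalue, loadings = first_component(corr)
explained = eigenvalue / len(corr)  # the trace of a correlation matrix equals p
weights = [c ** 2 for c in loadings]  # squared unit loadings already sum to one

print(f"variance explained by first component: {explained:.1%}")
print("use PCA weights:", explained >= 0.70)
print("weights:", [round(w, 3) for w in weights])
```

Power iteration is used only to keep the example dependency-free; any eigen-decomposition routine would give the same first component.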

Figure 7: Weight comparison between the PCA weights for each year and the time-series FA weights.

We investigated the PCA weight values over time and compared them with the time-series FA weights computed for the ONS HI. We found that the PCA weights are very similar over time, which is reassuring in terms of the stability of the index weights (see Figure 7). However, when we compared PCA and FA weights, we found that FA gave higher weights to the following indicators (percentage difference between weights): low pay (12%), self-harm (10%), difficulty completing activities of daily living (5.4%) and drug misuse (6.6%). Conversely, PCA assigned higher weights to job-related training (7.6%), physical activity (7%), suicides (5.9%) and workplace safety (4.4%).

4.5 Proposal

  • Given the negative correlation between the Healthy Places domain and the two other domains, we discourage arithmetic mean aggregation.

  • To interpret the weights as ‘importance’, optimized weights should be adopted.

  • Between FA and PCA derived weights, we recommend using PCA, as the subdomain is not the ‘cause’ of the indicators but a combination of them.

  • The overall index could be obtained by adopting optimized weights and a geometric mean aggregation formula. Optimized weights allow for partial substitution as correlation is accounted for; the geometric mean rewards balance by penalizing uneven performance in the underlying domains.
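The last point can be made concrete with a short sketch comparing weighted arithmetic and geometric means on two hypothetical areas with the same average domain score, one balanced and one uneven. The scores are invented, and the geometric mean requires strictly positive values.

```python
import math


def arithmetic_mean(values, weights):
    """Weighted arithmetic mean; weights are assumed to sum to one."""
    return sum(w * v for w, v in zip(weights, values))


def geometric_mean(values, weights):
    """Weighted geometric mean; requires strictly positive values."""
    return math.exp(sum(w * math.log(v) for w, v in zip(weights, values)))


weights = [1 / 3, 1 / 3, 1 / 3]
balanced = [100.0, 100.0, 100.0]  # hypothetical domain scores
uneven = [130.0, 100.0, 70.0]     # same arithmetic mean, uneven profile

for name, scores in [("balanced", balanced), ("uneven", uneven)]:
    print(
        name,
        round(arithmetic_mean(scores, weights), 2),
        round(geometric_mean(scores, weights), 2),
    )
```

Under the arithmetic mean the two areas are indistinguishable, whereas the geometric mean scores the uneven area lower, which is the sense in which it only allows partial compensation.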

5 Sensitivity and Uncertainty analysis

Following the approach introduced by Saisana et al. SST (05); Sob (93); SAA+ (10), we carried out an analysis of the sensitivity and uncertainty of the Health Index. This analysis is based on a variance-based approach that constructs Monte Carlo estimates of the variability observed due to each step, and due to the interactions between the different steps. For each of the construction steps $\mathbf{q}_{i}$ we select potential alternative methods. Therefore, denoting the model by $m$, we can compute the global variance as

$$V(m)=\sum_{i}V_{i}+\sum_{i}\sum_{j>i}V_{i,j}+\dots+V_{1,2,\dots,k},$$

where

$$\begin{aligned}
V_{i} &= V_{q_{i}}\left[E_{\mathbf{q}_{-i}}(m\mid\mathbf{q}_{i})\right],\\
V_{i,j} &= V_{q_{i}q_{j}}\left[E_{\mathbf{q}_{-ij}}(m\mid\mathbf{q}_{i},\mathbf{q}_{j})\right]-V_{q_{i}}\left[E_{\mathbf{q}_{-i}}(m\mid\mathbf{q}_{i})\right]-V_{q_{j}}\left[E_{\mathbf{q}_{-j}}(m\mid\mathbf{q}_{j})\right].
\end{aligned}$$

In the quantity $V_{q_{i}}[E_{\mathbf{q}_{-i}}(m\mid\mathbf{q}_{i})]$, the expectation $E_{\mathbf{q}_{-i}}$ requires the computation of an integral over all factors except $q_{i}$, including the marginal distributions for these factors. The variance $V_{q_{i}}$ implies a further integral over $q_{i}$ and its marginal distribution.

The sensitivity indices are then $S_{i}=V_{i}/V(m)$. These terms measure the contribution of the input $\mathbf{q}_{i}$ to the total variance, and can be interpreted as a fraction of uncertainty.

The first order sensitivity index, which is the fraction of the output variance caused by each uncertain input assumption alone, averaged over variations in the other input assumptions, is

$$S_{i}=\frac{V[E(m\mid\mathbf{q}_{i})]}{V(m)},$$

and the total order sensitivity index (or interaction) is

$$S_{Ti}=1-\frac{V[E(m\mid\mathbf{q}_{-i})]}{V(m)}=\frac{E[V(m\mid\mathbf{q}_{-i})]}{V(m)},$$

where $\mathbf{q}_{-i}$ is the set of all uncertain inputs except the $i$th, and the quantity $S_{Ti}$ measures the fraction of the output variance caused by $\mathbf{q}_{i}$ and any interactions with other assumptions. In carrying out the sensitivity analysis, we have selected potential steps $\mathbf{q}_{i}$ that are coherent with a final linear aggregation.
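To illustrate the two indices, the sketch below evaluates them exactly on a small discrete design by enumerating every combination of methodological choices with equal probability, rather than by Monte Carlo sampling. The output function `toy_rank` is entirely hypothetical: it stands in for the rank of one area under a given set of choices and includes one interaction term.

```python
from itertools import product
from statistics import mean, pvariance

# Alternative methods for each construction step (equally likely by assumption).
factors = {
    "winsorization": [2, 5, 10],
    "normalization": ["z-score", "min-max"],
    "weights": ["equal", "pca"],
}


def toy_rank(q):
    """Hypothetical model output m(q): rank of one area under the choices q."""
    rank = 50.0
    rank += {2: 0.0, 5: 1.0, 10: 2.0}[q["winsorization"]]
    rank += 4.0 if q["normalization"] == "min-max" else 0.0
    rank += 6.0 if q["weights"] == "pca" else 0.0
    if q["normalization"] == "min-max" and q["weights"] == "pca":
        rank += 3.0  # interaction between normalization and weights
    return rank


names = list(factors)
designs = [dict(zip(names, combo)) for combo in product(*factors.values())]
outputs = [toy_rank(q) for q in designs]
total_var = pvariance(outputs)


def group_by(keys):
    """Group outputs by the values taken by the listed factors."""
    groups = {}
    for q, m in zip(designs, outputs):
        groups.setdefault(tuple(q[k] for k in keys), []).append(m)
    return groups


def first_order(name):
    """S_i = V[E(m | q_i)] / V(m)."""
    conditional_means = [mean(v) for v in group_by([name]).values()]
    return pvariance(conditional_means) / total_var


def total_order(name):
    """S_Ti = E[V(m | q_-i)] / V(m)."""
    others = [k for k in names if k != name]
    conditional_vars = [pvariance(v) for v in group_by(others).values()]
    return mean(conditional_vars) / total_var


for name in names:
    print(f"{name}: S_i={first_order(name):.3f}  S_Ti={total_order(name):.3f}")
```

In this toy model winsorization acts additively, so its main effect and total effect coincide, while normalization and weights have a total effect larger than their main effect because of the interaction term.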

Table 2: Steps and methods used in the sensitivity analysis

Steps                 Alternatives
Data treatment        winsorization (2nd, 5th, 10th points)
Normalization         z-score, min-max
Weights (indicators)  equal weights, principal components weights
Weights (domains)     optimized weights

The steps and the methods to be tested are listed in Table 2. In our analysis we evaluated (for 2015) the following main outcomes: UTLA ranking by overall Index value and UTLA rankings by each domain’s index value.

We opted for winsorization to control data kurtosis and skewness, winsorising at the second, fifth and tenth values. We allowed for two normalization types: z-scores rescaled to a mean of 100 and a standard deviation of 10, and min-max scaling bounded between 1 and 100. For the weights we allowed equal weights, PCA-derived weights and, for domains only, optimized weights, as previously introduced. We ran the computations for 10,000 iterations.
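The data treatment and normalization alternatives can be sketched as follows. This is an illustration under stated assumptions: winsorising "at the k-th value" is read here as clamping observations to the k-th smallest and k-th largest values, the z-score uses the population standard deviation, and the raw values are invented.

```python
from statistics import mean, pstdev


def winsorize(values, k):
    """Clamp values to the k-th smallest and k-th largest observations."""
    ordered = sorted(values)
    low, high = ordered[k - 1], ordered[-k]
    return [min(max(v, low), high) for v in values]


def z_score(values, centre=100.0, spread=10.0):
    """Standardise, then rescale to the given mean and standard deviation."""
    mu, sd = mean(values), pstdev(values)
    return [centre + spread * (v - mu) / sd for v in values]


def min_max(values, lower=1.0, upper=100.0):
    """Rescale linearly so that the minimum maps to lower and the maximum to upper."""
    lo, hi = min(values), max(values)
    return [lower + (upper - lower) * (v - lo) / (hi - lo) for v in values]


# Hypothetical raw indicator values with one extreme observation.
raw = [3.0, 4.0, 5.0, 5.5, 6.0, 6.5, 7.0, 8.0, 9.0, 40.0]

treated = winsorize(raw, k=2)
z = z_score(treated)
mm = min_max(treated)

print("winsorized:", treated)
print("z-score:", [round(v, 1) for v in z])
print("min-max:", [round(v, 1) for v in mm])
```

Both normalizations are linear rescalings of the treated data, so they preserve the ordering of areas within an indicator; they differ in how strongly extreme values stretch the scale, which is why normalization interacts with winsorization in the sensitivity analysis.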

We also studied the mean absolute rank shift caused by removing indicators and subdomains, to evaluate the roles played by the hierarchical elements.
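A minimal sketch of the leave-one-out rank shift is shown below. It assumes a flat, equal-weight index over three hypothetical indicators for four hypothetical areas, rather than the hierarchical Health Index structure, and the function names are illustrative.

```python
def ranks(scores):
    """Rank units from 1 (highest score) downwards; ties keep input order."""
    order = sorted(range(len(scores)), key=lambda u: -scores[u])
    result = [0] * len(scores)
    for position, unit in enumerate(order, start=1):
        result[unit] = position
    return result


def index(data, keep):
    """Equal-weight mean of the indicators listed in keep, for every unit."""
    return [sum(row[j] for j in keep) / len(keep) for row in data]


def mean_abs_rank_shift(data, dropped):
    """Mean absolute change in rank when one indicator is removed."""
    all_indicators = list(range(len(data[0])))
    base = ranks(index(data, all_indicators))
    reduced = ranks(index(data, [j for j in all_indicators if j != dropped]))
    return sum(abs(a - b) for a, b in zip(base, reduced)) / len(base)


# Hypothetical normalised scores: rows are areas, columns are indicators.
data = [
    [90.0, 60.0, 10.0],
    [70.0, 70.0, 70.0],
    [50.0, 80.0, 95.0],
    [40.0, 50.0, 55.0],
]

shifts = [mean_abs_rank_shift(data, j) for j in range(3)]
print("mean absolute rank shift per removed indicator:", shifts)
```

An indicator whose removal leaves the ranking unchanged carries little independent information for ranking purposes, whereas a large shift flags an indicator that drives the ordering.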

5.1 Results for Sensitivity and Uncertainty analysis

We carried out the sensitivity analysis on the modified ONS HI. In Figure 8 we see that, for the overall index, the rankings in the tails are stable, while the middle-ranking UTLAs show the highest variability, with median rankings (green dots) above or below the provided ranking.

We then repeated the analysis for the three domains separately (see Figure 4 in SM). The estimates are more precise, as the intervals between the 5th and 95th centiles are narrower than for the overall index. People rankings are quite precise and concentrated, and they closely follow the Health Index rankings. Lives and Places display higher variability, with Places acting as the ‘wild card’.

The first order and total order sensitivity indices were computed for the overall index and the three domains, and are reported in Table 7 in SM. We then plotted the main effect $S_{i}$ and the interactions $S_{Ti}$ (see Figure 9). The main effect can be interpreted as the uncertainty caused by the $i$th uncertain parameter/assumption on its own, whereas the total order sensitivity index is the uncertainty caused by the $i$th uncertain parameter/assumption including its interactions with the other inputs. This decomposition shows that, at domain level, normalization plays a major role for all three domains, with winsorization additionally being quite relevant for Places and weights being relevant for People. For the overall index, weights are the main cause of variability, with normalization and winsorization playing a minor role at the interaction level.

Figure 8: Results of the UA showing the overall Index for each UTLA, ordered by the modified ranking for 2015 (crosses), with the corresponding 5th and 95th percentiles (bounds) and the median ranking (green dots). For comparison, the original ONS UTLA ranking is also shown (black dots).

Figure 9: Results of the sensitivity analysis: main effect $S_{i}$ and interaction $S_{Ti}$.

5.2 Ranking shifts by removing indicators and subdomains

We assessed the mean absolute shift in overall rank caused by removing indicators and subdomains. At indicator level (see Figure 10), the highest shifts are due to unemployment, access to private and public green space and personal crime. Moderate absolute shifts are observed for job-related training, workplace safety, disability, frailty, suicides, depression and rough sleeping.

At subdomain levels (see Figure 11), the highest impact is for ‘Access to services (Pl.4)’ and ‘Children and young people’s education (Li.6)’, followed by ‘Unemployment (Li.3)’ and ‘Working conditions (Li.4)’. The observation that Healthy Lives shows the most influence on the overall index values confirms what has already been observed in previous sections, where we note the high correlation between Healthy Lives values and the overall index values.

Figure 10: The absolute mean rank shift for the overall index by removing one indicator at a time.

Figure 11: The absolute mean rank shift for the overall index by removing one subdomain at a time.

6 Discussion

We have scrutinized the choices made when constructing the ONS Health Index for England, and have evaluated the issues that emerge while assessing each construction step. The resulting Health Index is easy to explain to wider audiences, and the data collection and the index structure are harmonized to be comparable across time and different geographies. The indicator selection covers the main areas of health, in line with the WHO definition, and gives policy makers access to different combinations of indicators and comparisons.

Our analysis has shown that the weights and normalization steps play a major role in the variability exhibited by the Health Index, in particular for middle-ranking UTLAs. The steps that require the most difficult decisions are the choice of the weighting system and the choice of aggregation formula GITT (19). However, choices made for both steps need to be taken in the context of the preceding steps. Driven by the desideratum of an index that is easy to explain, we decided to explore in the sensitivity and uncertainty analysis only those methods that were compatible with the approach taken by the ONS. In our case, we considered the use of different weighting systems and data treatments, while staying consistent with a final linear aggregation formula (i.e. an arithmetic mean). This coherence was also the reason why we recommended intervening minimally on the data treatment: opting for winsorization first and, only if still needed, following with a transformation to normalize the indicator. The negative correlation exhibited by Healthy Places, the effect on rank shifting of the Healthy Places indicators, and the low rank correlation with the overall index suggest reflecting on the choice of indicators and potentially revising those selected. However, it is accurate to claim that areas with worse Healthy Places indicators, such as London boroughs (comprising 20% of all UTLAs), score higher values on the other two domains. The reverse is also true: more rural UTLAs have, for example, lower pollution and good access to private and public green space, but score lower on other indicators.

By exploring the data derived weights using PCA and comparing them with the initial choices made in the ONS version, we saw some differences, but no major discrepancies. This approach also yielded similar results across time. The fact that PCA and FA return similar results, which are then reflected in weights, could be explained by the fact that, overall, the subdomains are composed of a very limited number of indicators. Indeed, the highest number of indicators in a given subdomain is for Li.5, with 7 indicators. Therefore the PCA correlation matrix closely resembles the off-diagonal FA correlation matrix.

The optimized set of weights allowed us to uncover the relationships among the domains. We could also extend this approach to subdomains. We have found that the correlation among the domains can be explored by decomposing the correlation ratio into two parts, and that these estimates can then be used to derive weights that reflect importance rather than trade-off ratios.

The weights play a major role in the sensitivity and uncertainty analysis, while the ranking uncertainty is smaller only for the Healthy People domain. When we evaluated the overall index, we observed higher variability for the middle-ranking UTLAs, whereas the UTLAs at the top and bottom tend to remain stable. When we compared the original ONS ranking with the range of rankings based on the modified index, we found that these middle-ranking UTLAs are the most likely to become outliers. This pattern could be a result of using arithmetic mean aggregation.

There are a number of aspects of index construction that we have not fully explored in this analysis. For example, despite the uniqueness of the data set and the rich spatial data, we did not explore the effect of spatio-temporal correlations in the different steps of the CI construction, e.g. data imputation. Similarly, with the addition of new years’ data, imputation methods could benefit from the longer time series. The spatial component could potentially also be exploited to construct a ‘spatial composite index’ TC (18); SKVS (16); FVS (18); SCB+ (15). Nevertheless, the experimental Health Index fulfills the criteria advocated by Ashraf et al. ANTG (19) and constitutes a basis for statistical improvements in future releases.

Aftermath

In summary, this analysis, conducted on the 2015-18 beta Health Index, served as a check of its statistical coherence and investigated the choices and issues that arise in the process of building a new composite index. The aim of the ONS Health Index is to become a reliable index, harmonized over time and space, with the inclusion of finer geographical levels and potential stratification of the population by age and sex, and covering Scotland and Wales. The suggestions highlighted in this article are not exhaustive, owing to the evolving nature of the index, but they provide guidance for the upcoming versions.

While we assessed the beta version, the Health Index version from year 2019 already includes the Lower Tier Local Authorities (LTLA) for England (307 LTLAs) and several suggestions have been integrated. We review the propositions made in this paper and how these have been included.

  • Data is winsorized to derive weights by factor analysis, reducing the need for further transformations to normalize it. The final data, which is later aggregated to produce Index scores, is not winsorized, so LTLAs with extreme values can still observe change over time.

  • Public green space indicator - that was negatively correlated with private outdoor space - will be removed in the 2020 Health Index.

  • The move to LTLA level relaxed the correlation seen for Behavioural risk factors, and the drug misuse indicator has been changed to a different source to meet the granularity criteria.

  • Subdomains ‘Risk factors for children’ and ‘Children and young people’s education’ are now joined into a unique subdomain ‘Children and young people’.

  • Hypertension (renamed to ‘High blood pressure’) continues to show high positive correlation with cardiovascular, respiratory and musculoskeletal conditions, even at LTLA level. Currently, the ONS is exploring potential alternatives for these indicators.

  • At LTLA level, the data no longer present a negative correlation between Healthy Places and the other domains: they are now all positively correlated, although Places has a weaker correlation with the others than the Lives–People correlation.

  • The inclusion of a smaller geography as base unit, and the changes observed in pairwise correlations, change the derived weights that account for the correlation. Our analysis highlighted how crucial the weights are, not only in terms of value but also in terms of interpretation. The ONS is going to evaluate the possibility of having non-compensatory weights defined by expert opinion, to better capture what stakeholders rate as important among the domains and subdomains.

  • We argued that, between FA and PCA derived weights, PCA should be preferred. The ONS has not implemented this proposed modification, owing to the evolving nature of the index at this stage. The extension to additional geographical layers and population stratification will have an impact on the correlation matrices, and therefore weights may undergo substantial changes before a stable version is finalized.

  • Finally, our last suggestion promoted a geometric mean aggregation formula, which implies only partial substitution and rewards balance by penalizing uneven performance in the underlying domains. Initially, among the principles followed by the ONS in producing a composite index was the need for simple statistical methods that are easy to explain. However, in the course of this analysis, we showed that the geometric mean offers a valid alternative, used by many other well-known and respected indexes such as the Human Development Index Pro (10). This option will be taken into account in the ONS’s upcoming methodological evaluations.

Conclusion

In conclusion, the ONS Health Index (2015–18) presents a summary of the health of the population of England and fills a gap in policy making and assessment tools. The index is based on a hierarchical geographical structure, starting from the Upper Tier Local Authority level, rising to National level. It provides a detailed and flexible composite measurement, that will allow policy makers to assess changes in population health, and to plan interventions by identifying areas and policy domains where interventions can provide significant, quantifiable impact. Future Health Index editions, with finer geographic granularity and population subgroups, will enrich the understanding of health determinants and guide bespoke interventions and assessments.

Acknowledgments

AFS is grateful to William Becker for his help and for writing the R-package COINr and to Professor Avi Feller for useful comments.

References

  • ABPS [92] Pamela Abbott, Joyce Bernie, Geoff Payne, and Roger Sapsford. Health and material deprivation in Plymouth. op. cit, 1992.
  • ANTG [19] Khalid Ashraf, Chirk Jenn Ng, Chin Hai Teo, and Kim Leng Goh. Population indices measuring health outcomes: A scoping review. Journal of global health, 9(1), 2019.
  • BDWL [19] Matthew Barclay, Mary Dixon-Woods, and Georgios Lyratzopoulos. The problem with composite indicators. BMJ quality & safety, 28(4):338–344, 2019.
  • BSPV [17] William Becker, Michaela Saisana, Paolo Paruolo, and Ine Vandecasteele. Weights and importance in composite indicators: Closing the gap. Ecological indicators, 80:12–22, 2017.
  • C+ [08] Joint Research Centre-European Commission et al. Handbook on constructing composite indicators: methodology and user guide. OECD publishing, 2008.
  • CB+ [22] Giulio Caperna, W Becker, et al. JRC statistical audit of the European Skills Index 2022. JRC Publications, 2022.
  • Cee [20] Greg Ceely. Methods used to develop the health index for England: 2015 to 2018. Technical report, Office for National Statistics, 2020.
  • DL [13] Koen Decancq and María Ana Lugo. Weights in multidimensional indices of wellbeing: An overview. Econometric Reviews, 32(1):7–34, 2013.
  • DotEtR [00] Transport Department of the Environment and the Regions. Indices of deprivation 2000. Department of the Environment, Transport and the Regions, 2000.
  • Eur [17] Eurostat. Part 1: Indicator typologies and terminologies. In Towards a harmonised methodology for statistical indicators. Publications Office of the European Union, 2017.
  • Fre [03] Michael Freudenberg. Composite indicators of country performance, 2003.
  • FVS [18] Elisa Fusco, Francesco Vidoli, and Biresh K Sahoo. Spatial heterogeneity in composite indicator: A methodological proposal. Omega, 77:1–14, 2018.
  • GITT [19] Salvatore Greco, Alessio Ishizaka, Menelaos Tasiou, and Gianpiero Torrisi. On the methodological framework of composite indices: A review of the issues of weighting, aggregation, and robustness. Social indicators research, 141(1):61–94, 2019.
  • Gra [02] Frank P Grad. The preamble of the constitution of the world health organization. Bulletin of the World Health Organization, 80:981–981, 2002.
  • HPM [12] Adnan A Hyder, Prasanthi Puvanachandra, and Richard H Morrow. Measuring the health of populations: explaining composite indicators. Journal of Public Health Research, 1(3):222, 2012.
  • [16] JRC. Supporting policy with scientific evidence.
  • JSG [04] Rowena Jacobs, Peter C Smith, and Maria K Goddard. Measuring performance: an examination of composite performance indicators: a report for the Department of Health. Centre of Health Economics, University of York, 2004.
  • KMB [04] Eva Kaltenthaler, Ravi Maheswaran, and Catherine Beverley. Population-based health indexes: a systematic review. Health Policy, 68(2):245–255, 2004.
  • M+ [08] Giuseppe Munda et al. Social multi-criteria evaluation for a sustainable economy, volume 17. Springer, 2008.
  • MN [05] Giuseppe Munda and Michela Nardo. Constructing consistent composite indicators: the issue of weights, 2005.
  • MN [09] Giuseppe Munda and Michela Nardo. Noncompensatory/nonlinear composite indicators for ranking countries: a defensible setting. Applied Economics, 41(12):1513–1523, 2009.
  • MT [12] Thierry A Mara and Stefano Tarantola. Variance-based sensitivity indices for models with dependent inputs. Reliability Engineering & System Safety, 107:115–121, 2012.
  • NSST [05] Michela Nardo, Michaela Saisana, Andrea Saltelli, and Stefano Tarantola. Tools for composite indicators building. European Commission, Ispra, 15(1):19–20, 2005.
  • Pro [10] UNDP (United Nations Development Programme). Human development report 2010. UNDP (United Nations Development Programme), 2010.
  • R C [22] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2022.
  • SAA+ [10] Andrea Saltelli, Paola Annoni, Ivano Azzini, Francesca Campolongo, Marco Ratto, and Stefano Tarantola. Variance based sensitivity analysis of model output. design and estimator for the total sensitivity index. Computer physics communications, 181(2):259–270, 2010.
  • SCB+ [15] Mahdi-Salim Saib, Julien Caudeville, Maxime Beauchamp, Florence Carré, Olivier Ganry, Alain Trugeon, and Andre Cicolella. Building spatial composite indicators to analyze environmental health inequalities on a regional scale. Environmental Health, 14(1):1–11, 2015.
  • SKVS [16] Martin Siegel, Daniela Koller, Verena Vogt, and Leonie Sundmacher. Developing a composite index of spatial accessibility across different health care sectors: A German example. Health policy, 120(2):205–212, 2016.
  • SLTF+ [12] Isabelle Soerjomataram, Joannie Lortet-Tieulent, Jacques Ferlay, David Forman, Colin Mathers, D Maxwell Parkin, and Freddie Bray. Estimating and validating disability-adjusted life years at the global level: a methodological framework for cancer. BMC medical research methodology, 12(1):1–15, 2012.
  • Sob [93] Ilya M Sobol. Sensitivity analysis for non-linear mathematical models. Mathematical modelling and computational experiment, 1:407–414, 1993.
  • SP [12] Michaela Saisana and Dionisis Philippas. Sustainable society index (ssi): Taking societies’ pulse along social, environmental and economic issues. Environmental Impact Assessment Review, 32:94–106, 2012.
  • SST [05] Michaela Saisana, Andrea Saltelli, and Stefano Tarantola. Uncertainty and sensitivity analysis techniques as tools for the quality assessment of composite indicators. Journal of the Royal Statistical Society: Series A (Statistics in Society), 168(2):307–323, 2005.
  • ST [02] Michaela Saisana and Stefano Tarantola. State-of-the-art report on current methodologies and practices for composite indicator development, volume 214. Citeseer, 2002.
  • Sul [66] Daniel F Sullivan. Conceptual problems in developing an index of health. Number 17 in 1. US Department of Health, Education, and Welfare, Public Health Service, 1966.
  • TC [18] Daniele Trogu and Michele Campagna. Towards spatial composite indicators: A case study on sardinian landscape. Sustainability, 10(5):1369, 2018.
  • TPB [88] P Townsend, P Phillimore, and A Beattie. Indicator of health and deprivation: inequality in the north. London: Croom Helm, pages 30–40, 1988.
  • Van [90] Jean-Claude Vansnick. Measurement theory and decision aid. In Readings in multiple criteria decision aid, pages 81–100. Springer, 1990.
  • vdWPB [96] Harry PA van de Water, Rom JM Perenboom, and Hendriek C Boshuizen. Policy relevance of the health expectancy indicator; an inventory in european union countries. Health Policy, 36(2):117–129, 1996.
  • Vin [92] Philippe Vincke. Multicriteria decision-aid. John Wiley & Sons, 1992.
  • W [21] Becker W. Composite Indicator Development and Analysis in R with COINr, 2021.
  • Woo [01] Simon N Wood. mgcv: GAMs and generalized ridge regression for R. R news, 1(2):20–25, 2001.