Patterns of convergence and divergence within the convergEU package

Nedka D. Nikiforova, Federico M. Stefanini, Chiara Litardi, Eleonora Peruffo and Massimiliano Mascherini

2024-01-21



1 Finding patterns of convergence and divergence within the convergEU package

The convergEU package makes it possible to obtain patterns of change over time for indicators in the European Union (EU) by invoking the ms_pattern_ori function:

library(convergEU)
help(ms_pattern_ori)

The ms_pattern_ori function returns patterns for both lowBest and highBest types of indicators. Each pattern is identified by a numerical label and a corresponding string label; for example, in the outputs shown below, label 1 corresponds to “Catching up” and label 5 to “Slower pace”.

It is important to note that, when finding patterns for indicators of type “low is better”, we assume that the higher the indicator value, the worse the considered socio-economic feature in a given member state (MS). Instead of creating new labels to tag patterns for this class of indicators, we transform the original indicator, noting that the absolute position of the values is not relevant when judging whether a given pattern is present. Thus, indicators of type “low is better” are transformed by calculating, for each original observation, its distance from the maximum value. If the original indicator decreases then the transformed value increases, and the pattern recognition scheme applies in the same way as for indicators of type “high is better”.
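The transformation described above can be sketched in base R. This is only a minimal illustration: the values in `unemp` are made up, and it assumes the maximum is taken over all observed values of the indicator, which may differ in detail from what the package does internally.

```r
# Hypothetical values of a "low is better" indicator for one member state
unemp <- c(9.5, 8.7, 7.9, 7.2)

# Distance from the maximum: a falling original value gives a rising
# transformed value, so the "high is better" logic can be applied to it
transformed <- max(unemp) - unemp
transformed

# The transformation reverses the direction of every change
all(sign(diff(transformed)) == -sign(diff(unemp)))
```

Because only differences between values matter for pattern recognition, subtracting from the maximum (rather than, say, changing the sign) keeps the transformed values non-negative without affecting the detected pattern.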

The graphical plots for the defined patterns depending on the type of indicators (lowBest or highBest) are available by invoking the patt_legend function:

help(patt_legend)

When considering indicators of type highBest, the graphical representation of the patterns is as follows:

highind <- patt_legend(indiType = "highBest")
highind

while for the lowBest type of indicators the plot of the patterns is:

lowhind <- patt_legend(indiType = "lowBest")
lowhind

For further details on the defined patterns, see the Eurofound report “Monitoring convergence in the European Union Upward convergence in the EU: Concepts, measurements and indicators” (2018, pp. 25-26).

To illustrate the points discussed above in practice, let’s consider a first example based on the emp_20_64_MS dataset, whose indicator is of type highBest. To obtain the patterns for this type of indicator, we invoke the ms_pattern_ori function as follows:

myemp <- ms_pattern_ori(emp_20_64_MS, "time", type = "highBest")

The output of the ms_pattern_ori function consists of the usual three list components: “$res” that contains the results, “$msg” that possibly carries messages for the user and “$err” which is a string containing an error message, if an error occurs:

names(myemp)
#> [1] "res" "msg" "err"
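Since every call returns the same three components, it may be convenient to check “$err” before using “$res”. The helper below, `get_res`, is hypothetical and not part of convergEU; the two mock lists only imitate the structure of a real output so that the sketch runs on its own.

```r
# Hypothetical helper: return $res only when no error was reported
get_res <- function(out) {
  if (!is.null(out$err)) {
    stop("convergEU call failed: ", out$err)
  }
  if (!is.null(out$msg)) {
    message("Note: ", out$msg)
  }
  out$res
}

# Mock outputs with the same three components, used in place of real calls
ok_out  <- list(res = 42, msg = NULL, err = NULL)
bad_out <- list(res = NULL, msg = NULL,
                err = "Error: one or more missing values in the dataframe.")

get_res(ok_out)
tryCatch(get_res(bad_out), error = function(e) conditionMessage(e))
```

With a real call, one would write, for instance, `get_res(ms_pattern_ori(emp_20_64_MS, "time", type = "highBest"))`.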

The “$res” component of the output contains the numerical labels for the patterns as well as their string labels:

mypattemp <- myemp$res$mat_label_tags
mypattempn <- myemp$res$mat_without_summaries
mypattempn
#> # A tibble: 28 × 17
#>    Country `2002/2003` `2003/2004` `2004/2005` `2005/2006` `2006/2007`
#>    <chr>         <int>       <int>       <int>       <int>       <int>
#>  1 AT                4           3           4           4           4
#>  2 BE                6           1           1          21           1
#>  3 BG                1           1           1           1           1
#>  4 CY                4           4           3           4           2
#>  5 CZ                3           3           4           2           2
#>  6 DE                3          19          20           4           4
#>  7 DK                3           4           3           4           3
#>  8 EE                4           4           4           4           2
#>  9 EL                1           1           5           1           5
#> 10 ES                1           1           1           1           5
#> # ℹ 18 more rows
#> # ℹ 11 more variables: `2007/2008` <int>, `2008/2009` <int>, `2009/2010` <int>,
#> #   `2010/2011` <int>, `2011/2012` <int>, `2012/2013` <int>, `2013/2014` <int>,
#> #   `2014/2015` <int>, `2015/2016` <int>, `2016/2017` <int>, `2017/2018` <int>
mypattemp
#> # A tibble: 28 × 17
#>    Country `2002/2003`   `2003/2004`   `2004/2005`       `2005/2006` `2006/2007`
#>    <chr>   <chr>         <chr>         <chr>             <chr>       <chr>      
#>  1 AT      Outperforming Inversion     Outperforming     Outperform… Outperform…
#>  2 BE      Diving        Catching up   Catching up       Other (Ins… Catching up
#>  3 BG      Catching up   Catching up   Catching up       Catching up Catching up
#>  4 CY      Outperforming Outperforming Inversion         Outperform… Flattening 
#>  5 CZ      Inversion     Inversion     Outperforming     Flattening  Flattening 
#>  6 DE      Inversion     Crossing      Crossing reversed Outperform… Outperform…
#>  7 DK      Inversion     Outperforming Inversion         Outperform… Inversion  
#>  8 EE      Outperforming Outperforming Outperforming     Outperform… Flattening 
#>  9 EL      Catching up   Catching up   Slower pace       Catching up Slower pace
#> 10 ES      Catching up   Catching up   Catching up       Catching up Slower pace
#> # ℹ 18 more rows
#> # ℹ 11 more variables: `2007/2008` <chr>, `2008/2009` <chr>, `2009/2010` <chr>,
#> #   `2010/2011` <chr>, `2011/2012` <chr>, `2012/2013` <chr>, `2013/2014` <chr>,
#> #   `2014/2015` <chr>, `2015/2016` <chr>, `2016/2017` <chr>, `2017/2018` <chr>

Let’s look at one of the obtained patterns in more detail. To this end, we consider the time period 2006-2007 and France (“FR”), for which the obtained pattern is “Slower pace”:

mypattemp[["2006/2007"]][12]
#> [1] "Slower pace"

In the corresponding plot of the calculated pattern, the dashed blue line refers to France and the black solid line refers to the EU. The interpretation of the “Slower pace” pattern is straightforward, as illustrated in the Eurofound report “Monitoring convergence in the European Union Upward convergence in the EU: Concepts, measurements and indicators” (2018, p. 25).

A second example relates to an indicator of type “low is better”. To this end, let’s consider the indicator “Unemployment rate by sex, age and educational attainment - annual averages”, for which the data are stored in the une_educ_a.xls file (Subsection 4.2, Tutorial for analyzing convergence with the convergEU package). First, we import the data from the xls file as explained in detail in the Tutorial (Subsection 4.2):

library(readxl)  # provides read_excel()
file_name <- system.file("vign/une_educ_a.xls", package = "convergEU")
myxls2 <- read_excel(file_name,
                     sheet = "Data", range = "A12:AP22", na = ":")
myxls2 <- dplyr::mutate(myxls2, `TIME/GEO` = as.numeric(`TIME/GEO`))

where file_name holds the path (possibly including the drive and folders) to the une_educ_a.xls file. Then, the cluster EU27_2020 of MS is chosen, the data are checked for unsuitable features, and missing values are imputed as follows:

EU27estr <- convergEU_glb()$EU27_2020$memberStates$codeMS
myxls <- dplyr::select(myxls2, `TIME/GEO`, dplyr::all_of(EU27estr))
check_data(myxls)
#> $res
#> NULL
#> 
#> $msg
#> NULL
#> 
#> $err
#> [1] "Error: one or more missing values in the dataframe."
myxls3<- dplyr::rename(myxls,time=`TIME/GEO`)
myxlsf <- impute_dataset(myxls3, timeName ="time",
                         countries=convergEU_glb()$EU27_2020$memberStates$codeMS,
                         headMiss = c("cut", "constant")[2],
                         tailMiss = c("cut", "constant")[2])$res
check_data(myxlsf)
#> $res
#> [1] TRUE
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

The result is the final dataset myxlsf, which is used for calculating patterns.
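When check_data reports missing values, as it does above, it may help to locate them before choosing the imputation settings. The following base R sketch does this on a toy data frame; the country codes AA, BB and CC and all values are invented for illustration, and only the layout (a time column plus one column per country) mirrors the dataset used here.

```r
# Toy data frame: a time column plus one column per (hypothetical) country
toy <- data.frame(
  time = 2009:2013,
  AA = c(NA, 7.1, 7.4, 7.0, 6.8),   # missing at the start of the series
  BB = c(5.2, 5.0, NA, 4.7, 4.5),   # missing in the middle of the series
  CC = c(9.9, 9.6, 9.4, 9.1, 8.8)   # complete
)

# Number of missing values per column
n_miss <- colSums(is.na(toy))
n_miss

# Years in which each incomplete country has a gap
lapply(toy[names(n_miss)[n_miss > 0]], function(x) toy$time[is.na(x)])
```

Gaps at the start or end of a series are the ones governed by the headMiss and tailMiss arguments of impute_dataset.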

The indicator une_educ_a is of type “low is better”; thus, the syntax to find the patterns is as follows:

myres <- ms_pattern_ori(myxlsf, "time", type = "lowBest")

where the “$res” component of the output contains the numerical labels for the patterns as well as their string labels:

mypattl <- myres$res$mat_label_tags
mypattn <- myres$res$mat_without_summaries
mypattn
#> # A tibble: 27 × 10
#>    Country `2009/2010` `2010/2011` `2011/2012` `2012/2013` `2013/2014`
#>    <chr>         <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
#>  1 BE                7           8           7          10           3
#>  2 DK               10           7           7           8           2
#>  3 FR               20          21          21          21          21
#>  4 DE                8           8           8           8           2
#>  5 EL               10          10          19           9           1
#>  6 IE                9           9           9          11           1
#>  7 IT                7           7          10          10           3
#>  8 LU                8          10           7          10          21
#>  9 NL                7           8          10          10           3
#> 10 PT                7          10          10          10           4
#> # ℹ 17 more rows
#> # ℹ 4 more variables: `2014/2015` <dbl>, `2015/2016` <dbl>, `2016/2017` <dbl>,
#> #   `2017/2018` <dbl>
mypattl
#> # A tibble: 27 × 10
#>    Country `2009/2010`       `2010/2011`     `2011/2012` `2012/2013` `2013/2014`
#>    <chr>   <chr>             <chr>           <chr>       <chr>       <chr>      
#>  1 BE      Defending better  Escaping        Defending … Underperfo… Inversion  
#>  2 DK      Underperforming   Defending bett… Defending … Escaping    Flattening 
#>  3 FR      Crossing reversed Other (Inspect… Other (Ins… Other (Ins… Other (Ins…
#>  4 DE      Escaping          Escaping        Escaping    Escaping    Flattening 
#>  5 EL      Underperforming   Underperforming Crossing    Falling aw… Catching up
#>  6 IE      Falling away      Falling away    Falling aw… Recovering  Catching up
#>  7 IT      Defending better  Defending bett… Underperfo… Underperfo… Inversion  
#>  8 LU      Escaping          Underperforming Defending … Underperfo… Other (Ins…
#>  9 NL      Defending better  Escaping        Underperfo… Underperfo… Inversion  
#> 10 PT      Defending better  Underperforming Underperfo… Underperfo… Outperform…
#> # ℹ 17 more rows
#> # ℹ 4 more variables: `2014/2015` <chr>, `2015/2016` <chr>, `2016/2017` <chr>,
#> #   `2017/2018` <chr>

For this indicator, let’s take the time period 2015-2016 and the MS Finland (“FI”) for which the obtained pattern is again “Slower pace”:

mypattl$`2015/2016`[mypattl$Country == "FI"]
#> [1] "Slower pace"
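Selecting a country by position relies on the row order of the table, which follows the column order of the input dataset: alphabetical for emp_20_64_MS, but BE, DK, FR, DE, ... for myxlsf, as the printed tables show. Selecting by country code avoids this dependency. The sketch below uses a small mock table with invented labels, since it is only meant to show the indexing idiom.

```r
# Mock table with the same layout as mat_label_tags (labels are illustrative)
patt <- data.frame(
  Country = c("BE", "DK", "FR", "DE"),
  `2015/2016` = c("Catching up", "Flattening", "Slower pace", "Inversion"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Positional lookup: correct only if the row order is known
patt$`2015/2016`[3]

# Lookup by country code: independent of the row order
patt$`2015/2016`[patt$Country == "FR"]
```

The same idiom applies to the real tables, for example `mypattl$`2015/2016`[mypattl$Country == "FI"]`.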

In this case, given that the indicator is of type “low is better”, the plot for the “Slower pace” pattern uses a dashed blue line for Finland and a black solid line for the EU. Recall that, unlike the previous indicator of type “high is better”, for this type of indicator the results for the “Slower pace” pattern should be interpreted under the assumption that the higher the indicator value, the worse the considered socio-economic feature in a member state.

To further illustrate other possible patterns, consider again the first example related to the emp_20_64_MS dataset (indicators of type “high is better”). For example, let’s take the time period 2011-2012 and the MS Portugal (“PT”) for which the pattern is “Crossing”:

mypattemp$`2011/2012`[23]
#> [1] "Crossing"

where the corresponding plot for this pattern is:

Similarly, for the member country Lithuania (“LT”) in the same time period, the obtained pattern is now “Crossing reversed” and the plot for this pattern is:

2 Types of convergence/divergence within the convergEU package

Convergence and divergence may be strict or weak, upward or downward. In the convergEU package, the function upDo_CoDi is specifically implemented for assessing the type of convergence/divergence occurring for a given indicator, a given collection of member states and a given period of time:

help(upDo_CoDi)

The interpretation depends on the type of indicator, that is, “highBest” or “lowBest”. Let’s consider a first example based on the emp_20_64_MS dataset, in which the indicator “Employment rate” is of type highBest. Suppose that we wish to determine the type of convergence/divergence by taking the year 2008 as the reference time (time_0), the year 2010 as the target time (time_t), and the variance as the measure summarizing dispersion (argument heter_fun):

Empconv<-upDo_CoDi(emp_20_64_MS,
              timeName = "time",
              indiType = "highBest",
              time_0 = 2008,
              time_t = 2010,
              heter_fun = "var")

The output of the upDo_CoDi function consists of the usual three list components: “$res” that contains the results, “$msg” that possibly carries messages for the user and “$err” which is a string containing an error message, if an error occurs:

names(Empconv)
#> [1] "res" "msg" "err"
Empconv$msg
#> NULL
Empconv$err
#> NULL

Looking at the “$res” component in more detail, it contains, for example:

Empconv$res$declaration_type
#> [1] "Convergence"
Empconv$res$declaration_strict
#> [1] "none"
Empconv$res$declaration_weak
#> [1] "Weak downward"
Empconv$res$declaration_split$names_incre
#> [1] "AT" "DE" "LU" "MT" "RO"
Empconv$res$declaration_split$names_decre
#>  [1] "BE" "BG" "CY" "CZ" "DK" "EE" "EL" "ES" "FI" "FR" "HR" "HU" "IE" "IT" "LT"
#> [16] "LV" "NL" "PL" "PT" "SE" "SI" "SK" "UK"
Empconv$res$diffe_MS
#>    AT   BE BG   CY CZ DE   DK    EE   EL   ES   FI   FR   HR   HU IE   IT   LT
#> 1 0.1 -0.4 -6 -1.5 -2  1 -3.8 -10.3 -2.5 -5.7 -2.8 -1.2 -2.8 -1.6 -8 -1.9 -7.7
#>    LU    LV  MT   NL   PL   PT  RO   SE   SI   SK   UK
#> 1 1.9 -11.1 0.9 -0.7 -0.7 -2.8 0.4 -2.3 -2.7 -4.2 -1.7
Empconv$res$diffe_averages
#> [1] -2.860714
Empconv$res$dispersions
#> Time: 2008 Time: 2010 
#>   29.76417   28.44423

Note that if the argument heter_fun is set to var (as in this example) or sd (i.e. the standard deviation), then these statistics are calculated using \(n-1\) as the denominator, i.e. the number of observations decreased by 1. If users prefer to adopt \(n\) as the denominator, the function pop_var may be used as follows:

Empconvpop<-upDo_CoDi(emp_20_64_MS,
                   timeName = "time",
                   indiType = "highBest",
                   time_0 = 2008,
                   time_t = 2010,
                   heter_fun = "pop_var")
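The difference between the two denominators can be checked directly in base R. The values in `x` are hypothetical, and the population variance is computed by hand here rather than with pop_var, so the sketch does not depend on the package.

```r
# Hypothetical indicator values for five member states in one year
x <- c(70.1, 65.3, 72.8, 68.0, 74.4)
n <- length(x)

# Sample variance, as computed by var(): denominator n - 1
var_sample <- var(x)

# Population variance: denominator n
var_pop <- sum((x - mean(x))^2) / n

# The two differ exactly by the factor (n - 1) / n
all.equal(var_pop, var_sample * (n - 1) / n)
```

With 27 or 28 member states the factor is close to 1, so both choices lead to similar dispersion values, although they are not identical.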

User-developed functions are also allowed in the argument heter_fun, as illustrated in the following example for an indicator of type lowBest. To this end, we consider the dataset myxlsf introduced in the previous section, which relates to the indicator “Unemployment rate by sex, age and educational attainment”. We choose the year 2009 as the reference time and the year 2011 as the target time. Moreover, in this case we consider the following user-developed function for summarizing dispersion:

diffQQmu <- function(vettore) {
  (quantile(vettore, 0.75) - quantile(vettore, 0.25)) / mean(vettore)
}
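To see what this function measures, it can be evaluated on a small vector before passing it to upDo_CoDi. In the sketch below the definition is repeated so that the code runs on its own, and the values in `x` are hypothetical.

```r
# The dispersion function from the text: interquartile range divided by the mean
diffQQmu <- function(vettore) {
  (quantile(vettore, 0.75) - quantile(vettore, 0.25)) / mean(vettore)
}

# Hypothetical indicator values
x <- c(2, 4, 6, 8, 10)

# quantile() returns a named value; unname() drops the "75%" label
d <- unname(diffQQmu(x))
d

# Equivalent formulation with the built-in IQR()
all.equal(d, IQR(x) / mean(x))
```

Here the quartiles are 4 and 8 and the mean is 6, so the measure equals 4/6. Being a ratio, it is unit-free, which makes it comparable across indicators measured on different scales.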

This user-developed function diffQQmu is specified in the argument heter_fun of the function upDo_CoDi:

unempconvvar<-upDo_CoDi(myxlsf,
                      timeName = "time",
                      indiType = "lowBest",
                      time_0 = 2009,
                      time_t = 2011,
                      heter_fun = "diffQQmu")
unempconvvar
#> $res
#> $res$declaration_type
#> [1] "Divergence"
#> 
#> $res$declaration_strict
#> [1] "none"
#> 
#> $res$declaration_weak
#> [1] "Weak downward"
#> 
#> $res$declaration_split
#> $res$declaration_split$names_incre
#>  [1] "BE" "DK" "FR" "EL" "IE" "IT" "LU" "NL" "PT" "ES" "FI" "SE" "CY" "CZ" "HU"
#> [16] "LT" "MT" "PL" "SK" "SI" "BG" "HR"
#> 
#> $res$declaration_split$names_decre
#> [1] "DE" "AT" "EE" "LV" "RO"
#> 
#> 
#> $res$diffe_MS
#>    BE  DK FR   DE  EL  IE  IT  LU NL  PT  ES   AT  FI  SE  CY  CZ   EE  HU   LV
#> 1 0.3 2.2  0 -2.7 8.6 5.9 1.1 0.2  1 3.6 4.4 -1.6 1.3 0.7 1.4 0.2 -2.4 1.8 -1.9
#>    LT MT  PL SK  SI   BG   RO  HR
#> 1 8.7  0 3.6  1 4.6 10.9 -0.3 7.5
#> 
#> $res$diffe_averages
#> [1] 2.225926
#> 
#> $res$dispersions
#> Time: 2009 Time: 2011 
#>  0.7080638  0.7504565 
#> 
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

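According to the obtained results, for this indicator there is evidence of divergence of type “weak downward” in the period 2009-2011: dispersion increases from 0.708 to 0.750 and, since the indicator is of type “low is better”, the average increase of about 2.23 is a deterioration.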
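The declarations printed in the two examples of this section can be reproduced from the dispersions and the per-country changes with a few simple rules. The function `declare_sketch` below is hypothetical: its rules are inferred from the two outputs shown above, not taken from the package source, and the labels “Strict upward” and “Strict downward” are assumed names, since only “none” appears in these outputs.

```r
# Hypothetical helper, not part of convergEU
declare_sketch <- function(disp_0, disp_t, diffe_MS, indiType) {
  # For "lowBest" indicators an increase is a deterioration, so flip the sign
  improve <- if (indiType == "lowBest") -diffe_MS else diffe_MS
  strict <- "none"
  if (all(improve > 0)) strict <- "Strict upward"    # assumed label
  if (all(improve < 0)) strict <- "Strict downward"  # assumed label
  list(
    type   = if (disp_t < disp_0) "Convergence" else "Divergence",
    strict = strict,
    weak   = if (mean(improve) > 0) "Weak upward" else "Weak downward"
  )
}

# Per-country changes and dispersions copied from the outputs above
emp_diff <- c(0.1, -0.4, -6, -1.5, -2, 1, -3.8, -10.3, -2.5, -5.7, -2.8, -1.2,
              -2.8, -1.6, -8, -1.9, -7.7, 1.9, -11.1, 0.9, -0.7, -0.7, -2.8,
              0.4, -2.3, -2.7, -4.2, -1.7)
une_diff <- c(0.3, 2.2, 0, -2.7, 8.6, 5.9, 1.1, 0.2, 1, 3.6, 4.4, -1.6, 1.3,
              0.7, 1.4, 0.2, -2.4, 1.8, -1.9, 8.7, 0, 3.6, 1, 4.6, 10.9,
              -0.3, 7.5)

emp <- declare_sketch(29.76417, 28.44423, emp_diff, "highBest")
une <- declare_sketch(0.7080638, 0.7504565, une_diff, "lowBest")
str(emp)
str(une)
```

The sketch treats ties (equal dispersions, or an average change of exactly zero) arbitrarily; how upDo_CoDi handles such cases should be checked in its help page.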





References


The following reference may be consulted to find further details on convergence: