This notebook is part of a GitHub repository: https://github.com/pessini/european-voters
MIT Licensed
Author: Leandro Pessini

Examining factors that influence EU acceptance among European voters

Would you vote for your country to leave or remain in the European Union?

Photo: The Irish Times — BREXIT: THE FACTS

1- Introduction

This analysis investigates a dataset provided by the European Social Survey (ESS), a cross-national survey of the attitudes and behaviour of European citizens. The topics covered by the ESS are very heterogeneous and include media and social trust, politics, immigration, citizen involvement, health and care, the economy, work and well-being.

The focus is on which factors may influence a person to vote for their country to leave or remain a member of the European Union. The selected variables are mostly socio-demographic, such as education, employment status and trade union membership.

Data Dictionary

  • CNTRY Country
  • EDUYRS Years of full-time education completed
  • EISCED Highest level of education, ES - ISCED
  • UEMP3M Ever unemployed and seeking work for a period of more than three months
  • MBTRU Member of trade union or similar organisation
  • VTEURMMB Would vote for your country to remain a member of the European Union or to leave
  • GNDR Gender
  • YRBRN Year of birth
  • AGEA Age of respondent, calculated from year of birth and year of interview
  • ANWEIGHT Analysis weight
  • PSU Primary sampling unit (clustering variable)
  • STRATUM Sampling stratum

Survey Questions

EDUYRS About how many years of education have you completed, whether full-time or part-time? Please report these in full-time equivalents and include compulsory years of schooling.
EISCED Generated variable: Highest level of education, ES - ISCED 9 (What is the highest level of education you have successfully completed?)
UEMP3M Have you ever been unemployed and seeking work for a period of more than three months?
MBTRU Are you or have you ever been a member of a trade union or similar organisation? IF YES, is that currently or previously?
VTEURMMB Imagine there were a referendum in [country] tomorrow about membership of the European Union. Would you vote for [country] to remain a member of the European Union or to leave the European Union?
YRBRN And in what year were you born?

International Standard Classification of Education (ISCED)

ISCED is the reference international classification for organising education programmes and related qualifications by levels and fields. ISCED 2011 (levels of education) has been implemented in all EU data collections since 2014.

Levels

  • ISCED 0: Early childhood education (‘less than primary’ for educational attainment)
  • ISCED 1: Primary education
  • ISCED 2: Lower secondary education
  • ISCED 3: Upper secondary education
  • ISCED 4: Post-secondary non-tertiary education
  • ISCED 5: Short-cycle tertiary education
  • ISCED 6: Bachelor’s or equivalent level
  • ISCED 7: Master’s or equivalent level
  • ISCED 8: Doctoral or equivalent level

More info about ISCED can be found here.

Notebook settings

# Change the default plot size
options(repr.plot.width=15, repr.plot.height=10)
# Suppress warnings globally
options(warn=-1)
# Suppress dplyr's summarise() grouping messages
options(dplyr.summarise.inform = FALSE)

Libraries

# Packages used in this notebook
want = c("dplyr", "ggplot2", "ggthemes", "gghighlight", "foreign", "scales", "survey", "srvyr", "caret", 
         "ggpubr", "forcats")
# Check which packages are already installed
have = want %in% rownames(installed.packages())
# Install any missing packages
if ( any(!have) ) { install.packages( want[!have] ) }
# Load all packages
junk <- lapply(want, library, character.only = TRUE)
# Remove the helper objects
rm(have, want, junk)

Loading dataset

Selecting the variables which will be used for the data analysis

survey_rawdata <- read.spss("ESS9e03_1.sav", use.value.labels=T, max.value.labels=Inf, to.data.frame=TRUE)
variables <- c("cntry", 
               "eduyrs", 
               "eisced",
               "uemp3m", 
               "mbtru", 
               "vteurmmb", 
               "yrbrn", 
               "agea", 
               "gndr", 
               "anweight", 
               "psu", 
               "stratum")
european_survey <- survey_rawdata[,variables]
head(european_survey)
A data.frame: 6 × 12
cntryeduyrseisceduemp3mmbtruvteurmmbyrbrnageagndranweightpsustratum
<fct><fct><fct><fct><fct><fct><fct><fct><fct><dbl><dbl><dbl>
1Austria12ES-ISCED IIIb, lower tier upper secondaryNoNo Remain member of the European Union197543Male 0.06588958168859
2Austria12ES-ISCED IIIb, lower tier upper secondaryNoYes, previouslyRemain member of the European Union195167Male 0.12490674 8879
3Austria12ES-ISCED II, lower secondary NoNo Leave the European Union 197840Female0.68583600 93811
4Austria11ES-ISCED IIIb, lower tier upper secondaryNoNo Remain member of the European Union195563Male 0.11675334199874
5Austria8 ES-ISCED II, lower secondary NoNo Remain member of the European Union194771Female0.31178924 60199
6Austria13ES-ISCED IIIb, lower tier upper secondaryNoYes, previouslyRemain member of the European Union195464Male 0.17386711 6877
paste0("Number of rows in the dataset: ", nrow(european_survey))
'Number of rows in the dataset: 49519'

2- Data Cleaning & Wrangling

# Checking for NA's in the dataset
sapply(european_survey, function(x) sum(is.na(x)))
cntry  eduyrs  eisced  uemp3m  mbtru  vteurmmb  yrbrn  agea  gndr  anweight  psu  stratum
    0     708     107     295    331     13648    222   222     0         0    0        0

Handling Missing Values

# For this analysis, keep only Leave and Remain answers; every other answer becomes NA
european_survey$vteurmmb <- as.character(european_survey$vteurmmb)
european_survey$vteurmmb[european_survey$vteurmmb == "Remain member of the European Union"] <- "Remain"
european_survey$vteurmmb[european_survey$vteurmmb == "Leave the European Union"] <- "Leave"
european_survey$vteurmmb[european_survey$vteurmmb == "Would submit a blank ballot paper"] <- NA
european_survey$vteurmmb[european_survey$vteurmmb == "Would spoil the ballot paper"] <- NA
european_survey$vteurmmb[european_survey$vteurmmb == "Would not vote"] <- NA
european_survey$vteurmmb[european_survey$vteurmmb == "Not eligible to vote"] <- NA
european_survey$vteurmmb <- as.factor(european_survey$vteurmmb)
# Treat responses that cannot be harmonised into ES-ISCED as missing
european_survey$eisced <- as.character(european_survey$eisced)
european_survey$eisced[european_survey$eisced == "Not possible to harmonise into ES-ISCED"] <- NA
european_survey$eisced[european_survey$eisced == "Other"] <- NA
# Keep only complete cases (rows without NA in any selected variable)
df_european_survey <- european_survey[complete.cases(european_survey), ]
sapply(df_european_survey, function(x) sum(is.na(x)))
cntry  eduyrs  eisced  uemp3m  mbtru  vteurmmb  yrbrn  agea  gndr  anweight  psu  stratum
    0       0       0       0      0         0      0     0     0         0    0        0
# Round-trip through character to drop unused factor levels, leaving uemp3m with only Yes and No
df_european_survey$uemp3m <- as.character(df_european_survey$uemp3m)
df_european_survey$uemp3m <- as.factor(df_european_survey$uemp3m)
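The chain of assignments used above to recode vteurmmb can also be written as a single lookup through a named vector. The sketch below uses made-up answers rather than the ESS file: only the two answers of interest are mapped, and any other label (blank ballot, would not vote, and so on) falls through to NA. This mirrors the explicit NA assignments in the notebook. The names raw_vote and vote_lookup are illustrative only.

```r
# Hypothetical raw answers, spelled as in the ESS value labels used above
raw_vote <- c("Remain member of the European Union",
              "Leave the European Union",
              "Would not vote",
              "Would submit a blank ballot paper",
              NA)

# Only the two answers of interest are mapped; anything else indexes to NA
vote_lookup <- c("Remain member of the European Union" = "Remain",
                 "Leave the European Union"            = "Leave")

vote <- factor(unname(vote_lookup[raw_vote]), levels = c("Leave", "Remain"))
print(vote)
```

Fixing the factor levels explicitly keeps "Leave" before "Remain" even when one of the answers is absent from a subset.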

Data Transformation / Aggregation

Aggregating Education Levels by EISCED

Level               ISCED
Low education       Levels 0-2
Medium education    Levels 3-4
High education      Levels 5-8
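Because ISCED levels are ordered, the grouping in this table amounts to cutting a numeric level at 2 and 4. The notebook maps the ES-ISCED text labels instead, which is necessary because eisced stores labels rather than numbers. The sketch below assumes a hypothetical numeric ISCED 2011 level (0 to 8) only to illustrate the grouping. The helper isced_band is not part of the notebook.

```r
# Hypothetical helper: group ISCED 2011 levels (0-8) into three education bands.
# cut() intervals are right-closed: (-Inf, 2] -> Low, (2, 4] -> Medium, (4, Inf) -> High
isced_band <- function(level) {
  cut(level,
      breaks = c(-Inf, 2, 4, Inf),
      labels = c("Low Education", "Medium Education", "High Education"))
}

levels_seen <- 0:8
print(data.frame(isced = levels_seen, band = isced_band(levels_seen)))
```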
# Create a new feature, Education, by aggregating the ES-ISCED levels
# into Low, Medium and High Education
df_european_survey <- df_european_survey %>% 
  mutate(Education = case_when(
      eisced == "ES-ISCED I , less than lower secondary" ~ "Low Education",
      eisced == "ES-ISCED II, lower secondary" ~ "Low Education",
      eisced == "ES-ISCED IIIb, lower tier upper secondary" ~ "Medium Education",
      eisced == "ES-ISCED IIIa, upper tier upper secondary" ~ "Medium Education",
      eisced == "ES-ISCED IV, advanced vocational, sub-degree" ~ "Medium Education",
      eisced == "ES-ISCED V1, lower tertiary education, BA level" ~ "High Education",
      eisced == "ES-ISCED V2, higher tertiary education, >= MA level" ~ "High Education",
      TRUE ~ eisced))
df_european_survey$Education <- as.factor(df_european_survey$Education)
df_european_survey$eisced <- as.factor(df_european_survey$eisced)
# For this analysis, treat both "Yes, currently" and "Yes, previously" as a simple "Yes"
# (the respondent is or has ever been a member of a trade union or similar organisation)
df_european_survey$mbtru <- as.character(df_european_survey$mbtru)
df_european_survey$mbtru[df_european_survey$mbtru == "Yes, currently"] <- "Yes"
df_european_survey$mbtru[df_european_survey$mbtru == "Yes, previously"] <- "Yes"
df_european_survey$mbtru <- as.factor(df_european_survey$mbtru)
# Convert Years of Education to numeric. The column is a factor, so convert via
# as.character(): as.numeric() on a factor returns level codes, not the values
df_european_survey$eduyrs <- as.numeric(as.character(df_european_survey$eduyrs))
# Create an age-band feature (<20, 20-39, 40-65, >65); agea is also a factor
df_european_survey$agea <- as.numeric(as.character(df_european_survey$agea))
df_european_survey <- df_european_survey %>% 
  mutate(Age_Band = case_when(
    agea < 20 ~ "<20",
    agea >= 20 & agea < 40 ~ "20-39",
    agea >= 40 & agea <= 65 ~ "40-65",
    agea > 65 ~ ">65"))
df_european_survey$Age_Band <- as.factor(df_european_survey$Age_Band)
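Both eduyrs and agea come out of read.spss() as factors, as the <fct> column types in the head() output show. Calling as.numeric() on a factor returns its internal level codes, not the printed values. The toy example below uses made-up ages to show the difference. It then builds the same age bands with cut(), assuming ages are whole years (as in agea) so that the right-closed intervals match the case_when() conditions above.

```r
# Made-up ages stored as a factor, as read.spss() returns them with value labels
age_fct <- factor(c("43", "67", "18", "40"))

# Pitfall: these are level codes (levels sort as "18", "40", "43", "67")
codes <- as.numeric(age_fct)                 # 3 4 1 2

# Correct: convert the labels, not the codes
age <- as.numeric(as.character(age_fct))     # 43 67 18 40

# Same bands as the case_when() above, for whole-year ages:
# <=19 -> "<20", 20-39, 40-65, >=66 -> ">65"
age_band <- cut(age,
                breaks = c(-Inf, 19, 39, 65, Inf),
                labels = c("<20", "20-39", "40-65", ">65"))
print(data.frame(codes, age, age_band))
```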

European Regions

Conventionally there are four main geographical regions or subregions in Europe.

  • Northern Europe
  • Western Europe
  • Eastern Europe
  • Southern Europe

Northern Europe refers to the portion of Europe to the north of Western Europe, the English Channel, and the Baltic Sea; it also includes the Baltic republics of Estonia, Latvia, and Lithuania.

Western Europe is bounded by the Atlantic Ocean in the west, the English Channel and the North Sea to the north, and the Alps in the south.

Conventionally, Eastern Europe is the geographical region east of Germany and west of the Ural Mountains. The United Nations geoscheme lists ten countries: the former Eastern Bloc countries Poland, Czechia and Slovakia (formerly Czechoslovakia), Hungary, Romania and Bulgaria; the former Soviet republics Belarus, Ukraine and Moldova; and Russia, whose European part lies in this region.

Southern Europe or Mediterranean Europe refers to the mainly subtropical southern portion of the continent. The region is bounded by the Mediterranean Sea in the south. There are 13 sovereign countries in Southern Europe; seven of those states are members of the European Union.

northern <- c("Denmark","Finland","Ireland","Latvia","Lithuania","Sweden")
western <- c("Austria","Belgium","France","Germany","Netherlands")
eastern <- c("Bulgaria","Czechia","Hungary","Poland","Slovakia")
southern <- c("Slovenia","Cyprus","Spain","Croatia","Italy","Portugal")
df_european_survey <- df_european_survey %>% mutate(Region = case_when(cntry %in% northern ~ "Northern Europe",
                                                                      cntry %in% western ~ "Western Europe",
                                                                      cntry %in% eastern ~ "Eastern Europe",
                                                                      cntry %in% southern ~ "Southern Europe",
                                                                      TRUE ~ "Europe"))
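The case_when() above can also be expressed as a lookup table built from the same four country vectors. In this base-R sketch, the helper assign_region is hypothetical and not part of the notebook. It makes it easy to check which countries fall through to the catch-all "Europe" label, since only countries listed in the four vectors receive a specific region. "Norway" appears below purely as an example of an unlisted country.

```r
northern <- c("Denmark","Finland","Ireland","Latvia","Lithuania","Sweden")
western  <- c("Austria","Belgium","France","Germany","Netherlands")
eastern  <- c("Bulgaria","Czechia","Hungary","Poland","Slovakia")
southern <- c("Slovenia","Cyprus","Spain","Croatia","Italy","Portugal")

regions <- list("Northern Europe" = northern, "Western Europe" = western,
                "Eastern Europe"  = eastern,  "Southern Europe" = southern)

# Named vector mapping each country to its region
region_lookup <- setNames(rep(names(regions), lengths(regions)), unlist(regions))

# Hypothetical helper: countries not in any vector get the catch-all "Europe"
assign_region <- function(cntry) {
  region <- unname(region_lookup[as.character(cntry)])
  region[is.na(region)] <- "Europe"
  region
}

print(assign_region(c("Spain", "Poland", "Norway")))
```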

3- Survey Weights

Survey data often come from complex sample designs and carry weighting adjustments that make the sample look more like the survey's intended population. Because the ESS is a cross-national survey and countries implement different sample designs, weights should be used in all analyses to account for each country's context and so avoid biased results.

Post-stratification weights are intended to reduce the impact of coverage, sampling and nonresponse error. In the ESS they are based on gender, age, education and geographical region.

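Clustering usually produces less precise population estimates than a simple random sample of the same size would, because respondents from the same cluster tend to resemble each other. Ignoring the clustering would make survey results look more precise than they really are, so the cluster and strata information must be included in the analysis.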
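A common way to quantify the precision lost to clustering is Kish's approximate design effect, deff = 1 + (m - 1) * rho. Here m is the average number of respondents per cluster and rho is the intra-cluster correlation. The effective sample size is n / deff. The numbers below are hypothetical and only illustrate the arithmetic. The ESS design in this notebook is handled by the survey package, not by this formula.

```r
# Kish's approximate design effect for cluster sampling
design_effect <- function(m, rho) 1 + (m - 1) * rho

# Effective sample size: the simple-random-sample size with the same precision
effective_n <- function(n, m, rho) n / design_effect(m, rho)

# Hypothetical: 1,450 respondents in clusters of 10, intra-cluster correlation 0.05
deff  <- design_effect(m = 10, rho = 0.05)
n_eff <- effective_n(n = 1450, m = 10, rho = 0.05)
cat("Design effect:", deff, " effective n:", round(n_eff), "\n")
```

Even a small intra-cluster correlation shrinks 1,450 interviews to roughly the precision of 1,000 independent ones in this made-up case.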

According to ESS documentation:

It is recommended that by default you should always use anweight (analysis weight) as a weight in all analysis. This weight is suitable for all types of analysis, including when you are studying just one country, when you compare across countries, or when you are studying groups of countries.

anweight corrects for differential selection probabilities within each country as specified by sample design, for nonresponse, for noncoverage, and for sampling error related to the four post-stratification variables, and takes into account differences in population size across countries.
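To see why the weights matter, compare an unweighted and a weighted proportion of Leave answers. The four respondents and their weights below are made up. The weighted estimate uses base R's weighted.mean(), which gives the same point estimate that a weighted design produces for a mean. The standard errors computed by survey and srvyr additionally depend on the strata and PSUs, which this sketch ignores.

```r
# Made-up respondents: TRUE = would vote Leave
leave  <- c(TRUE, FALSE, FALSE, FALSE)
# Made-up analysis weights: the first respondent represents more people
weight <- c(3, 1, 1, 1)

unweighted <- mean(leave)                    # 1 of 4 respondents
weighted   <- weighted.mean(leave, weight)   # 3 of 6 weight units
cat("Unweighted:", unweighted, " weighted:", weighted, "\n")
```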

Details about how ESS weights the data can be found here.

Two R packages help with complex survey designs: survey and srvyr.

In the ESS dataset the clustering variable is psu, stratification is indicated by stratum, and weighting by anweight.

The srvyr package, which is built on survey, brings dplyr-style syntax to survey analysis.

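# Lonely PSUs - http://r-survey.r-forge.r-project.org/survey/exmample-lonely.html
# "adjust" centres single-PSU strata at the grand mean instead of raising an error
options(survey.lonely.psu = "adjust")
# Declare the design: clusters (psu), strata (stratum) and analysis weights (anweight)
weighted_df_ess <- df_european_survey %>% as_survey_design(ids=psu, strata=stratum, weights=anweight)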
weighted_df_ess
Stratified 1 - level Cluster Sampling design (with replacement)
With (16284) clusters.
Called via srvyr
Sampling variables:
 - ids: psu
 - strata: stratum
 - weights: anweight
Data variables: cntry (fct), eduyrs (dbl), eisced (fct), uemp3m (fct), mbtru
  (fct), vteurmmb (fct), yrbrn (fct), agea (dbl), gndr (fct), anweight (dbl),
  psu (dbl), stratum (dbl), Education (fct), Age_Band (fct), Region (chr)
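The survey.lonely.psu option matters because the variance formula needs at least two PSUs per stratum. Before choosing an option, it can help to count PSUs per stratum. The sketch below does this in base R on a small made-up design table that uses the ESS column names (stratum, psu). With the real data, df_european_survey would take its place.

```r
# Made-up design variables: stratum 1 has two PSUs, strata 2 and 3 have one each
design_vars <- data.frame(stratum = c(1, 1, 1, 2, 2, 3),
                          psu     = c(10, 10, 11, 20, 20, 30))

# Number of distinct PSUs in each stratum
psu_per_stratum <- tapply(design_vars$psu, design_vars$stratum,
                          function(p) length(unique(p)))

# Strata with a single PSU ("lonely PSUs")
lonely <- names(psu_per_stratum)[psu_per_stratum == 1]
print(lonely)
```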

4- Exploratory Data Analysis

# Classify each country's view of the EU: Unfavorable when 16% or more of its weighted
# responses are votes to Leave (the cut-off used below), otherwise Favorable
happiness_EU <- weighted_df_ess %>% 
                    group_by(cntry,vteurmmb) %>%
                    summarise(proportion = survey_mean()) %>%
                    filter(vteurmmb == "Leave") %>% 
                    mutate(EU_Opinion = ifelse(proportion < .16, "Favorable", "Unfavorable")) %>%
                    group_by(EU_Opinion) %>% summarise(total = n()) %>%
                    mutate(prop = total / sum(total), 
                           label = paste0(round(total / sum(total) * 100, 0), "%"), 
                           label_y = cumsum(prop) - 0.5 * prop)
happiness_overview <- happiness_EU %>%
    ggplot(aes(x = "", y = prop)) +
    geom_bar(aes(fill = fct_reorder(EU_Opinion, prop, .desc = FALSE)), lineend = 'round',
             stat = "identity", width = .5, alpha=.9) +
    coord_flip() +
    scale_fill_manual(values = c("#67a9cf", "#ef8a62")) +
    geom_text(aes(y = label_y, label = paste0(label, "\n", EU_Opinion)), size = 8, col = "white", fontface = "bold") +
    labs(x = "", y = "%",
        title = "How happy are member nations with the European Union?",
        subtitle = "Considering 16% or more of votes to Leave the EU as an Unfavorable view") + 
    theme_void() +
    theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
    theme(legend.position = "none",
          plot.title=element_text(vjust=.8,family='', face='bold', colour='#636363', size=25),
          plot.subtitle=element_text(vjust=.8,family='', face='bold', colour='#636363', size=15))

#ef8a62 - Happy
#67a9cf - Not so Happy
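The percentages in this chart are shares of countries, not of respondents. The code first turns each country's weighted Leave proportion into a Favorable or Unfavorable label, then counts the labels. Below is a base-R sketch of that second step, using made-up proportions for five hypothetical countries A to E and the notebook's cut-off of 0.16.

```r
# Made-up weighted Leave proportions for five hypothetical countries
leave_share <- c(A = 0.22, B = 0.18, C = 0.10, D = 0.05, E = 0.08)

# Same rule as the notebook: below 16% counts as Favorable
opinion <- ifelse(leave_share < .16, "Favorable", "Unfavorable")

# Share of countries in each group, formatted like the chart labels
share_of_countries <- prop.table(table(opinion))
chart_labels <- paste0(round(share_of_countries * 100, 0), "%")
print(setNames(chart_labels, names(share_of_countries)))
```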

European Union Views

options(repr.plot.width=12, repr.plot.height=5)
happiness_overview

The majority of countries surveyed have a favorable view of the European Union. However, not everyone is happy with the institution: of the 22 EU member countries surveyed, 32% (7 countries) fall into the Unfavorable group.

# changing the global plot size back
options(repr.plot.width=15, repr.plot.height=10)

What countries hold a negative outlook towards the EU?

countries_by_Vote_Leave <- weighted_df_ess %>% group_by(cntry,vteurmmb) %>% 
    summarise(total = survey_total(), prop = survey_mean()) %>%
    filter(vteurmmb == "Leave") %>%
    arrange(desc(prop)) %>%
    head(15)
countries_by_Vote_Leave %>% 
ungroup() %>%
# Order countries by their Leave share and drop countries outside the top 15
mutate(cntry = reorder(droplevels(cntry), prop),
      label = paste0(round(prop * 100, 0), "%")) %>%
ggplot(aes(x=cntry, y=prop)) + 
    geom_segment(aes(xend = cntry, yend = 0), color = "#67a9cf", size=1.2) +
    geom_point(size = 18, color="#67a9cf") +
    geom_text(fontface="bold", color = "white", size = 5, aes(label = label)) +
    geom_hline(aes(yintercept = .20), colour = "#8da0cb", linetype ="longdash", size = .8) +
    annotate("text", x = 12.5, y = .23, family='', fontface='bold', colour='#636363', size=7,
               label = "2 countries \n with more than 20%") +
    scale_y_continuous(labels = scales::percent) +
    labs(x = "", y = "",
         title = "Countries with the highest proportion of votes to Leave the EU",
        subtitle = "% is approximate") +
    theme_minimal() + coord_flip() +
    theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank(),
          axis.text.y = element_text(face="bold", color="#636363", size=18),
          plot.title=element_text(vjust=1.5,family='', face='bold', colour='#636363', size=25),
          plot.subtitle=element_text(vjust=1.5,family='', face='bold', colour='#636363', size=15))

The countries with the highest proportion of votes to leave the EU are Czechia, Italy, France, Finland and Cyprus. In each of them, more than 17% of respondents would vote for their country to leave the EU in a hypothetical referendum.

Czechia and Italy are the only countries where more than 20% of respondents would vote to leave the EU.

And what about countries that hold strongly positive views of the political union?

countries_by_Vote_Remain <- weighted_df_ess %>% group_by(cntry,vteurmmb) %>% 
    summarise(total = survey_total(), prop = survey_mean()) %>%
    filter(vteurmmb == "Remain") %>%
    arrange(desc(prop)) %>%
    head(15)
countries_by_Vote_Remain %>% 
ungroup() %>%
# Order countries by their Remain share and drop countries outside the top 15
mutate(cntry = reorder(droplevels(cntry), prop),
      label = paste0(round(prop * 100, 0), "%")) %>%
ggplot(aes(x=cntry, y=prop)) + 
    geom_segment(aes(xend = cntry, yend = 0), color = "#ef8a62", size=1.2) +
    geom_point(size = 18, color="#ef8a62") +
    geom_text(fontface="bold", color = "white", size = 5, aes(label = label)) +
    scale_y_continuous(labels = scales::percent) +
    labs(x = "", y = "",
         title = "Countries with the highest proportion of votes to Remain member of the EU",
        subtitle = "% is approximate") +
    theme_minimal() + coord_flip() +
    theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank(),
          axis.text.y = element_text(face="bold", color="#636363", size=18),
          plot.title=element_text(vjust=1.5,family='', face='bold', colour='#636363', size=25),
          plot.subtitle=element_text(vjust=1.5,family='', face='bold', colour='#636363', size=15))

In contrast, in Poland, Ireland, Spain, Portugal and Lithuania more than 92% of respondents would vote for their country to remain a member of the EU.

How do voting intentions vary across European regions?

weighted_df_ess %>% 
    group_by(Region,vteurmmb) %>%
    summarise(total = round(survey_total(),2), proportion = round(survey_mean(),2)) %>%
    mutate(label = paste0(round(proportion * 100, 2), "%"), 
           label_y = cumsum(proportion) - 0.5 * proportion) %>%
    ggplot(aes(x= fct_reorder2(Region, vteurmmb, proportion, .desc = FALSE), y=proportion)) + 
    geom_bar(aes(fill=vteurmmb), position = position_stack(reverse = TRUE) ,stat="identity", width = .4) +
    scale_fill_manual(values = c("#67a9cf", "#ef8a62"))  +
    scale_y_continuous(labels = scales::percent) +
    coord_flip() +
    geom_text(aes(y=label_y, label = paste0(label, "\n", vteurmmb)), 
              col = "white",
              size = 6,
              fontface = "bold") +
    labs(x = "", y = "", fill = "",
        title = "Voting intention by European region")+ 
    theme_minimal() + 
    theme(legend.position = "none",
          axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank(),
          axis.text.y = element_text(face="bold", color="#636363", size=18),
          axis.title.y = element_blank(),
          plot.title=element_text(vjust=.5,family='', face='bold', colour='#636363', size=25),
          plot.subtitle=element_text(vjust=.5,family='', face='bold', colour='#636363', size=15))
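The text labels in these stacked bars sit at the midpoint of each segment, via label_y = cumsum(proportion) - 0.5 * proportion. With position_stack(reverse = TRUE), segments should stack in data order, so the first segment starts at 0. The sketch below uses made-up proportions of 14% Leave and 86% Remain for a single region, which puts the labels at 0.07 and 0.57.

```r
# Made-up proportions for one region, in stacking order
proportion <- c(Leave = 0.14, Remain = 0.86)

# Each label sits halfway through its own segment
label_y <- cumsum(proportion) - 0.5 * proportion
print(label_y)
```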

Northern and Southern Europe are the regions with the highest rejection of the EU: in each, about 14% of respondents intend to vote to leave the Brussels-based institution.

Curiously, these two regions are also home to four of the five countries with the highest intention to remain.

Countries with the happiest citizens regarding the EU:

  1. Poland
  2. Ireland
  3. Spain
  4. Portugal
  5. Lithuania

Ireland and Lithuania are in the Northern region; Spain and Portugal are in the Southern region.

Eastern Europe is the region with the most favorable view of the European Union. Yet not every Eastern country is happy: Czechia, also in the Eastern region, shows the highest intention to vote Leave of all the countries surveyed.

Gender-Age Overview

weighted_df_ess %>%
    group_by(gndr,Age_Band) %>%
    summarise(total = round(survey_total(),2), proportion = survey_mean()) %>%
    mutate(label = paste0(round(proportion * 100, 2), "%"), 
       label_y =