This notebook is part of a GitHub repository: https://github.com/pessini/SFI-Grants-and-Awards
MIT Licensed
Author: Leandro Pessini

SFI - Gender differences in research grant applications

Gender Equality in STEM Research Programmes in Ireland

1- Introduction

This report provides an overview of gender equality in research awards managed by Science Foundation Ireland (SFI), the national foundation for investment in scientific and engineering research. The data covers the period from 2011 to 2018.

The Agreed Programme for Government, published June 2002, provided for establishing SFI as a separate legal entity. In July 2003, SFI was established on a statutory basis under the Industrial Development (Science Foundation Ireland) Act, 2003.

SFI provides awards to support scientists and engineers working in the fields of science and engineering that underpin biotechnology, information and communications technology and sustainable energy and energy-efficient technologies.

The analysis examines gender differences in SFI research grant applications, covering both awarded and declined applications.

Audience

A core principle of data analysis is understanding your audience before designing your visualization. It is important to match your visualization to your viewer’s information needs.

To provide context, this report is framed as a hypothetical presentation to an SFI executive, the Director of Science for Society, who is responsible for overseeing all Science Foundation Ireland research funding programmes and the management of funded awards.

Dataset

The dataset used is the SFI Gender Dashboard 2019. It includes SFI research programmes from 2011 to 2018 that were managed end-to-end in SFI’s Grants and Awards Management System, and it reflects a binary categorisation of gender (male or female). For more information, see the accompanying Data Dictionary.

The dataset is provided by Ireland's Open Data Portal, which holds public data from Irish public sector bodies in areas such as agriculture, economy, housing and transportation.

Libraries

In [1]:
# Change the default plots size 
options(repr.plot.width=15, repr.plot.height=10)
options(warn=-1)
# Suppress summarise info
options(dplyr.summarise.inform = FALSE)
options(dplyr = FALSE)
In [37]:
# Check if the packages that we need are installed
want = c("dplyr", "ggplot2", "ggthemes", "gghighlight", 
         "grid", "foreign", "scales", "ggpubr", "forcats", 
         "stringr", "lubridate", "Hmisc", "psych",
         "gmodels") # assumed source of CrossTable(), which is used in section 3
have = want %in% rownames(installed.packages())
# Install the packages that we miss
if ( any(!have) ) { install.packages( want[!have] ) }
# Load the packages
junk <- lapply(want, library, character.only = TRUE)
# Remove the objects we created
rm(have, want, junk)
In [3]:
sfi.grants.gender <- read.csv('../data/SFIGenderDashboard_TableauPublic_2019.csv')
In [4]:
head(sfi.grants.gender)
A data.frame: 6 × 6
  Programme.Name                                                 Year  Award.Status  Applicant.Gender  Amount.Requested  Amount.funded
  <chr>                                                          <int> <chr>         <chr>             <int>             <int>
1 SFI Investigator Programme / Principal Investigator Programme  2016  Declined      Male              480000            NA
2 SFI Investigator Programme / Principal Investigator Programme  2014  Declined      Female            790000            NA
3 SFI Investigator Project Award                                 2012  Declined      Female            200000            NA
4 SFI Starting Investigator Research Grant                       2018  Declined      Female            400000            NA
5 SFI Investigator Programme / Principal Investigator Programme  2013  Declined      Male              580000            NA
6 SFI Starting Investigator Research Grant                       2015  Declined      Female            400000            NA

2- Data Cleaning & Wrangling

In [29]:
sfi.grants.gender2 <- sfi.grants.gender
In [30]:
sfi.grants.gender2$Award.Status <- as.factor(sfi.grants.gender2$Award.Status)
sfi.grants.gender2$Applicant.Gender <- as.factor(sfi.grants.gender2$Applicant.Gender)
In [31]:
sapply(sfi.grants.gender2, function(x) sum(is.na(x)))
Programme.Name      0
Year                0
Award.Status        0
Applicant.Gender    0
Amount.Requested   59
Amount.funded    1973
In [32]:
# There are 59 NA records for Amount Requested
sfi.grants.gender2 %>% filter(is.na(Amount.Requested)) %>% group_by(Applicant.Gender) %>% summarise(n = n())
A tibble: 2 × 2
Applicant.Gender  n
<fct>             <int>
Female            16
Male              43
In [33]:
# Cleaning NA values for Amount Requested because the analysis will use this variable
sfi.grants.gender2 <- sfi.grants.gender2 %>% filter(!is.na(Amount.Requested))
sapply(sfi.grants.gender2, function(x) sum(is.na(x)))
Programme.Name      0
Year                0
Award.Status        0
Applicant.Gender    0
Amount.Requested    0
Amount.funded    1914
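
The missing-value check and filter above can be reproduced on a small made-up data frame. This is a minimal sketch in base R: the `toy` data frame and its values are hypothetical and only mimic two column names of the SFI dataset. `colSums(is.na(...))` is used here as an equivalent of the `sapply` call.

```r
# Hypothetical toy data mimicking two columns of the SFI dataset
toy <- data.frame(
  Amount.Requested = c(100000, NA, 250000, 400000),
  Amount.funded    = c(90000, NA, NA, NA)
)

# Count missing values per column (same result as the sapply() call above)
na_before <- colSums(is.na(toy))

# Keep only the rows where the requested amount is known
toy_clean <- toy[!is.na(toy$Amount.Requested), ]
na_after <- colSums(is.na(toy_clean))

print(na_before)
print(na_after)
```

Rows with a missing `Amount.funded` are kept on purpose, since in the real data most of them correspond to declined applications.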
In [34]:
Hmisc::describe(sfi.grants.gender2)
sfi.grants.gender2 

 6  Variables      2719  Observations
--------------------------------------------------------------------------------
Programme.Name 
       n  missing distinct 
    2719        0       12 

lowest : SFI Career Development Award                                  SFI Future Research Leaders programme                         SFI Industry Fellowship                                       SFI Investigator Programme / Principal Investigator Programme SFI Investigator Project Award                               
highest: SFI Research Professorship                                    SFI Science Policy Research Programme                         SFI Spokes Fixed call Programme                               SFI Starting Investigator Research Grant                      SFI Technology Innovation Development Award                  
--------------------------------------------------------------------------------
Year 
       n  missing distinct     Info     Mean      Gmd 
    2719        0        8     0.98     2015    2.358 

lowest : 2011 2012 2013 2014 2015, highest: 2014 2015 2016 2017 2018
                                                          
Value       2011  2012  2013  2014  2015  2016  2017  2018
Frequency    151   454   341   296   524   389   305   259
Proportion 0.056 0.167 0.125 0.109 0.193 0.143 0.112 0.095
--------------------------------------------------------------------------------
Award.Status 
       n  missing distinct 
    2719        0        2 
                            
Value       Awarded Declined
Frequency       804     1915
Proportion    0.296    0.704
--------------------------------------------------------------------------------
Applicant.Gender 
       n  missing distinct 
    2719        0        2 
                        
Value      Female   Male
Frequency     719   2000
Proportion  0.264  0.736
--------------------------------------------------------------------------------
Amount.Requested 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    2719        0      284    0.995  1013064  1428586    70000    90000 
     .25      .50      .75      .90      .95 
  100000   420000   870000  1470000  2000000 

lowest :        0    10000    20000    30000    40000
highest: 29990000 30000000 30060000 45380000 47940000
--------------------------------------------------------------------------------
Amount.funded 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
     805     1914      187    0.994   953507  1465202    50000    60000 
     .25      .50      .75      .90      .95 
   90000   240000   700000  1390000  2052000 

lowest :        0     3000    10000    20000    30000
highest: 25210000 27340000 27890000 28820000 44440000
--------------------------------------------------------------------------------
In [35]:
# By Gender
describeBy(sfi.grants.gender[,c("Amount.Requested", "Amount.funded")], sfi.grants.gender$Applicant.Gender)
 Descriptive statistics by group 
group: Female
                 vars   n     mean      sd median  trimmed    mad   min
Amount.Requested    1 719 614617.5 1677506  4e+05 392305.0 444780 10000
Amount.funded       2 213 546244.1 1824061  1e+05 279941.5  74130 10000
                      max    range  skew kurtosis       se
Amount.Requested 29870000 29860000 11.61   163.36  62560.4
Amount.funded    24250000 24240000 10.79   132.59 124982.6
------------------------------------------------------------ 
group: Male
                 vars    n    mean      sd median  trimmed    mad min      max
Amount.Requested    1 2000 1156305 3634810 430000 526062.5 489258   0 47940000
Amount.funded       2  592 1100039 3693443 310000 426940.9 340998   0 44440000
                    range skew kurtosis        se
Amount.Requested 47940000 7.20    58.48  81276.83
Amount.funded    44440000 7.04    56.67 151799.59

Male applicants far outnumber female applicants:

  • Male applicants = 2,000 (73.6%)
  • Female applicants = 719 (26.4%)
In [174]:
# Creating a category based on quantile to categorize the Amount Requested
sfi.grants.gender2 <- sfi.grants.gender2 %>%
  mutate(Category.Amount= cut(Amount.Requested, 
                             breaks=quantile(Amount.Requested, c(0,.25,.50,.75,1), na.rm = TRUE), 
                             labels=c("low","medium","high","very-high")))

# handling amount requested = 0
sfi.grants.gender2$Category.Amount[sfi.grants.gender2$Amount.Requested == 0] <- "low"

sfi.grants.gender2 %>% group_by(Category.Amount) %>% summarise(total= n()) %>% ungroup()
A tibble: 4 × 2
Category.Amount  total
<fct>            <int>
low              802
medium           625
high             618
very-high        674
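
`cut()` builds right-closed intervals by default, so the smallest value (here 0, the 0% quantile) falls outside the first interval and becomes `NA`. That is why the zero amounts are reassigned by hand in the cell above. A possible alternative, sketched here on hypothetical amounts rather than the SFI data, is to pass `include.lowest = TRUE`:

```r
# Hypothetical requested amounts, including a zero
amounts <- c(0, 50000, 100000, 300000, 420000, 600000, 870000, 2000000)

breaks <- quantile(amounts, c(0, .25, .50, .75, 1))
labels <- c("low", "medium", "high", "very-high")

# Default: intervals are (a, b], so the minimum is left uncategorised
default_cut <- cut(amounts, breaks = breaks, labels = labels)

# include.lowest = TRUE closes the first interval on the left: [a, b]
closed_cut <- cut(amounts, breaks = breaks, labels = labels,
                  include.lowest = TRUE)

print(table(default_cut, useNA = "ifany"))
print(table(closed_cut, useNA = "ifany"))
```

Both approaches put the zero amounts in the "low" category; `include.lowest = TRUE` simply avoids the separate assignment step.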
In [36]:
paste0("Number of rows in the dataset: ", nrow(sfi.grants.gender2))
'Number of rows in the dataset: 2719'

3- Exploratory Data Analysis

In [52]:
# Total amount requested and number of requests across all applicants
total_requested <- sfi.grants.gender2 %>% 
    summarise(Total.Amount.Requested = sum(Amount.Requested),
              Total.Requests = n())
total_requested
A data.frame: 1 × 2
Total.Amount.Requested  Total.Requests
<dbl>                   <int>
2754520000              2719
In [66]:
# Total amount requested and share of applications by gender
requests_by_gender <- sfi.grants.gender2 %>% 
                        group_by(Applicant.Gender) %>%
                        summarise(total.amount = round(sum(Amount.Requested),2), 
                                  proportion.applicants = round(n()/total_requested$Total.Requests,2)) %>%
                        mutate(label = paste0(round(proportion.applicants * 100, 2), "%"), 
                               label_y = cumsum(proportion.applicants) - 0.5 * proportion.applicants)
requests_by_gender
A tibble: 2 × 5
Applicant.Gender  total.amount  proportion.applicants  label  label_y
<fct>             <dbl>         <dbl>                  <chr>  <dbl>
Female            441910000     0.26                   26%    0.13
Male              2312610000    0.74                   74%    0.63
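
The `label_y` column places each text label at the midpoint of its stacked bar segment: the cumulative sum gives the upper edge of a segment, and subtracting half of the segment's height moves the position to its centre. A quick check in base R, using the rounded proportions from the table above:

```r
# Proportions taken from the table above (Female, Male)
proportion <- c(Female = 0.26, Male = 0.74)

# Upper edge of each stacked segment
upper_edge <- cumsum(proportion)

# Midpoint of each segment = upper edge minus half the segment height
label_y <- upper_edge - 0.5 * proportion

print(label_y)
```

The result matches the `label_y` values in the table (0.13 and 0.63).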

Total applicants by Gender

In [72]:
options(repr.plot.width=12, repr.plot.height=5)
requests_by_gender %>% 
    ggplot(aes(x = "", y = proportion.applicants)) +
        geom_bar(aes(fill = fct_reorder(Applicant.Gender, proportion.applicants, .desc = FALSE)), lineend = 'round',
                 stat = "identity", width = .3, alpha=.9, position = position_stack(reverse = TRUE)) +
        coord_flip() +
        scale_fill_manual(values = c("#F48898", "#6487FF")) +
        geom_text(aes(y = label_y, label = paste0(label, "\n", Applicant.Gender)), 
                  size = 8, col = "white", fontface = "bold") +
        labs(x = "", y = "%",
            title = "Total applicants by Gender") + 
        theme_void() +
        theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
        theme(legend.position = "none",
              plot.title=element_text(vjust=.8, hjust = .5, family='', face='bold', colour='#636363', size=25))

#F48898 - Pink
#6487FF - Blue

Male applicants account for 74% of all applications, against 26% for female applicants.

In [75]:
# changing the global plot size back
options(repr.plot.width=15, repr.plot.height=10)

How many applications were submitted each year by Gender?

In [147]:
# Total applications by year
total.by.year <- sfi.grants.gender2 %>% 
                    group_by(Year) %>% 
                    summarise(Total.Requests = n())


# by gender
sfi.grants.gender2 %>% 
    group_by(Year, Applicant.Gender) %>% 
    summarise(total = n()) %>%
    ggplot(aes(x= factor(Year), y=total)) + 
    geom_bar(aes(fill=Applicant.Gender), position = position_stack(reverse = TRUE),
             stat="identity", width = .4) +
    scale_fill_manual(values = c("#F48898", "#6487FF")) +
    annotate("segment", x = 5.2, xend = 8, y = 550, yend = 330,
           colour = "#ef8a62", size = 2, arrow = arrow()) +
    annotate("text", x = 7.5, y = 500, family='', fontface='bold', colour='#636363', size=8,
               label = "After 2015 the number \n of applications has been \n decreasing each year") +
    labs(x = "Year", y = "Total Applicants", fill = "",
        title = "Frequency of application throughout the years",
        subtitle = "Breakdown by gender")+ 
    theme_gdocs() + 
    theme(legend.position = "top", 
          legend.direction = "horizontal",
          legend.text = element_text(size=15, face="bold"),
          axis.text.x = element_text(face="bold", color="#636363", size=18),
          plot.title=element_text(vjust=.5,family='', face='bold', colour='#636363', size=25),
          plot.subtitle=element_text(vjust=.5,family='', face='bold', colour='#636363', size=15))

After peaking in 2015 (524 applications), the number of applications decreased in each following year, down to 259 in 2018. In 2015, SFI reported that the number of PhD graduates in STEM research had fallen, which could lead to a skills deficit in future years.

The decline shown in the chart is consistent with that prediction, although the data alone does not establish the cause.

Total awarded and declined applications by gender

In [144]:
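# Totals per award status, joined in the next cell as total.awards.status
total_awarded_declined <- sfi.grants.gender2 %>% 
    group_by(Award.Status) %>%
    summarise(total.awards.status = n())
total_awarded_declined
A tibble: 2 × 2
Award.Status  total.awards.status
<fct>         <int>
Awarded       804
Declined      1915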
In [159]:
sfi.grants.gender2 %>%
    group_by(Award.Status, Applicant.Gender) %>%
    summarise(total = n()) %>% 
    left_join(total_awarded_declined, by = c("Award.Status")) %>%
    mutate(proportion = total/total.awards.status,
           label = paste0(round(proportion * 100, 1), "%"), 
           label_y = cumsum(proportion) - 0.5 * proportion)
A grouped_df: 4 × 7
Award.Status  Applicant.Gender  total  total.awards.status  proportion  label  label_y
<fct>         <fct>             <int>  <int>                <dbl>       <chr>  <dbl>
Awarded       Female            214    804                  0.2661692   26.6%  0.1330846
Awarded       Male              590    804                  0.7338308   73.4%  0.6330846
Declined      Female            505    1915                 0.2637076   26.4%  0.1318538
Declined      Male              1410   1915                 0.7362924   73.6%  0.6318538
In [162]:
total_awarded_declined <- sfi.grants.gender2 %>% 
    group_by(Award.Status) %>%
    summarise(total.awards.status = n())

# Proportion of accepted and denied applications by gender
sfi.grants.gender2 %>%
    group_by(Award.Status, Applicant.Gender) %>%
    summarise(total = n()) %>% 
    left_join(total_awarded_declined, by = c("Award.Status")) %>%
    mutate(proportion = total/total.awards.status,
           label = paste0(round(proportion * 100, 1), "%"), 
           label_y = cumsum(total) - 0.5 * total) %>%
    ggplot(aes(x= Award.Status, y=total)) + 
    geom_bar(aes(fill=Applicant.Gender), position = position_stack(reverse = TRUE),
             stat="identity", width = .3) +
    geom_text(aes(y=label_y, label = paste0(label, "\n", Applicant.Gender)), 
              col = "white",
              size = 6,
              fontface = "bold") +
    scale_fill_manual(values = c("#F48898", "#6487FF")) +
    labs(x = "Award Status", y = "Number of Applications", fill = "",
        title = "Number of Awarded / Declined Applications by gender")+ 
    coord_flip() +
    theme_gdocs() + 
    theme(legend.position = "none",
          axis.text.x = element_text(face="bold", color="#636363", size=16),
          axis.text.y = element_text(face="bold", color="#636363", size=18),
          plot.title=element_text(vjust=.5,family='', face='bold', colour='#636363', size=25))

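The gender split is practically the same among awarded and declined applications.

Awarded: 26.6% Female | 73.4% Male
Declined: 26.4% Female | 73.6% Male

Equivalently, the success rates are almost identical: 214 of 719 applications (29.8%) were awarded for female applicants and 590 of 2,000 (29.5%) for male applicants.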
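
One way to check how close the two genders are is to compute each gender's success rate directly from the counts in the table above. The sketch below uses base R only. The chi-squared test at the end is an illustrative addition and not part of the original analysis.

```r
# Counts taken from the table above
counts <- matrix(
  c(214, 505,
    590, 1410),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Female", "Male"),
                  Status = c("Awarded", "Declined"))
)

# Share of each gender's applications that were awarded or declined
rates <- prop.table(counts, margin = 1)
print(round(rates * 100, 1))

# Pearson chi-squared test of independence between gender and award status
# (illustrative addition, not part of the original notebook)
test <- chisq.test(counts)
print(test$p.value)
```

A large p-value here would mean the data gives no evidence that award status depends on applicant gender. It says nothing about why far fewer women apply in the first place.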

In [12]:
# By Award Status
describeBy(sfi.grants.gender[,c("Amount.Requested", "Amount.funded")], sfi.grants.gender$Award.Status)
 Descriptive statistics by group 
group: Awarded
                 vars   n      mean      sd median  trimmed    mad   min
Amount.Requested    1 804 1024701.5 3616477 265000 405993.8 289107 10000
Amount.funded       2 801  954360.8 3319078 240000 383666.2 266868  3000
                      max    range skew kurtosis       se
Amount.Requested 47940000 47930000 7.78    68.92 127543.3
Amount.funded    44440000 44437000 7.73    68.96 117273.9
------------------------------------------------------------ 
group: Declined
                 vars    n    mean      sd median  trimmed    mad min      max
Amount.Requested    1 1915 1008178 3073772 440000 521161.1 504084   0 45380000
Amount.funded       2    4  782500  895484 655000 782500.0 919212   0  1820000
                    range skew kurtosis        se
Amount.Requested 45380000 7.98    72.19  70240.45
Amount.funded     1820000 0.13    -2.29 447742.02

Looking at the descriptive statistics by award status, the gap between the mean amount requested and the mean amount funded for awarded applications is small, about €70,341 (6.9%).

  • Mean Amount Requested => €1,024,701.50
  • Mean Amount Awarded => €954,360.80

Number of awarded grants by amount requested

In [14]:
CrossTable(total.amount.requested$Award.Status, total.amount.requested$Category.Amount,
          prop.r=TRUE,
          prop.c=FALSE,
          prop.t=FALSE,
          prop.chisq=FALSE,
          digits=2)
 
   Cell Contents
|-------------------------|
|                       N |
|           N / Row Total |
|-------------------------|

 
Total Observations in Table:  2716 

 
                                    | total.amount.requested$Category.Amount 
total.amount.requested$Award.Status |       low |    medium |      high | very-high | Row Total | 
------------------------------------|-----------|-----------|-----------|-----------|-----------|
                            Awarded |       364 |       133 |       121 |       186 |       804 | 
                                    |      0.45 |      0.17 |      0.15 |      0.23 |      0.30 | 
------------------------------------|-----------|-----------|-----------|-----------|-----------|
                           Declined |       435 |       492 |       497 |       488 |      1912 | 
                                    |      0.23 |      0.26 |      0.26 |      0.26 |      0.70 | 
------------------------------------|-----------|-----------|-----------|-----------|-----------|
                       Column Total |       799 |       625 |       618 |       674 |      2716 | 
------------------------------------|-----------|-----------|-----------|-----------|-----------|

 
In [217]:
# Plot the total awarded grants by categories
# (uses sfi.grants.gender2, which carries the Category.Amount column created in section 2)
categories.amount <- sfi.grants.gender2 %>%
  group_by(Category.Amount, Award.Status) %>%
  filter(!is.na(Category.Amount), Award.Status == "Awarded") %>%
  summarise(total = n())

categories.amount %>% 
    ggplot(aes(x= Category.Amount, y=total, fill="")) + 
    geom_bar(position="dodge",stat="identity", width = .6) +
    scale_fill_brewer(palette = "Set2") +
    scale_y_continuous(labels = scales::number) +
    annotate("curve", curvature = -.3, x = 2.5, xend = 1.4, y = 300, yend = 320,
               colour = "#636363", size = 2, arrow = arrow()) +
    annotate("text", x = 3, y = 320, family = "", fontface = 3, size=6,
               label = "45% of awarded applications requested a \"Low\" amount") +
    labs(x = "", y = "Number of applications", 
         title = "Number of awarded grants by amount requested",
         subtitle = "Categories are the quartiles of the amount requested") + 
    theme_minimal() + 
    theme(legend.position = "none",
          axis.text.x = element_text(face="bold", color="#636363", size=18), 
          axis.text.y = element_text(face="bold", color="#636363", size=18),
          plot.title=element_text(vjust=.5,family='', face='bold', colour='#636363', size=25),
          plot.subtitle=element_text(vjust=.5,family='', face='bold', colour='#636363', size=15))

45% of the awarded applications (364 of 804) requested a "Low" amount, i.e. at most €100,000 (the first quartile of the amount requested).
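
The 45% figure is a row share: the fraction of awarded applications that fall in the "Low" category. It differs from the award rate within each category, which is a column share. The sketch below, in base R with the counts copied from the cross table above, computes both so that the two readings are not confused.

```r
# Counts copied from the cross table above
counts <- matrix(
  c(364, 133, 121, 186,
    435, 492, 497, 488),
  nrow = 2, byrow = TRUE,
  dimnames = list(Status = c("Awarded", "Declined"),
                  Category = c("low", "medium", "high", "very-high"))
)

# Row shares: how awarded (or declined) applications spread across categories
row_share <- prop.table(counts, margin = 1)
print(round(row_share, 2))

# Column shares: fraction of applications awarded within each category
col_share <- prop.table(counts, margin = 2)
print(round(col_share["Awarded", ], 2))
```

The row shares reproduce the `N / Row Total` values printed by `CrossTable`. The column shares answer a different question: how likely an application in a given category was to be awarded.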

4- Dashboard


This dashboard was created with Tableau® software.

It presents the main insights of this analysis. The dashboard, along with the data exploration, is available on the Tableau website.

SFI - Gender Dashboard


In [16]:
R.version$version.string
'R version 4.0.2 (2020-06-22)'